IBM® DB2® for Linux®, UNIX®, and Windows®

Best Practices Physical Database Design

Sam Lightstone Program Director and Senior Technical Staff Member Information Management Software

Christopher Tsounis Executive IT Specialist Information Management Technical Sales

Agatha Colangelo DB2 Information Development

Steven Tsounis IT Specialist Information Management Technical Sales


Physical Database Design

Executive summary

Introduction to physical database design

Assumptions about the reader

Goals of physical database design

Datatype selection best practices

Example of virtual views that represent a lookup table for each column

Table normalization and denormalization best practices

Normalization

Third normal form (3NF)

1NF, 2NF, and 3NF of database design

Star schema and snowflake models

Denormalization

IBM Layered Data Architecture

Index design best practices

Clustering indexes

Data clustering and multidimensional clustering (MDC) best practices

Block indexes for MDC tables

Maintaining clustering automatically during INSERT operations

Benefits of using MDC

MDC storage scenario

MDC run time overhead and benefit considerations

Determining when to use MDC versus a clustering index

Database partitioning (shared-nothing hash partitioning) best practices

Balanced Warehouse and Balanced Configuration Units (BCU)

Table (range) partitioning best practices

UNION ALL View (UAV) partitioning best practices

Migrating UAVs to table partitioning

Database partitioning, table partitioning, and MDC in the same database design best practices

Roll-in and roll-out of data with table partitioning and MDC best practices

Rolling-in large data volumes using table partitioning best practices

Materialized query table (MQT) best practices

Post-design tools for improving designs for existing databases

Explain facility best practices

DB2 Design Advisor best practices

MDC selection capability of the DB2 Design Advisor

Best Practices

Conclusion

Further reading

Contributors

Notices

Trademarks


Executive summary

Physical database design is the single most important factor that impacts database performance. Physical database design covers all of the design features that relate to the physical structure of the database such as datatype selection, table normalization and denormalization, indexes, materialized views, data clustering, multidimensional data clustering, table (range) partitioning, and database (hash) partitioning.

Good physical database design reduces hardware resource utilization (I/O, CPU, and network) and improves your administrative efficiency. This, in turn, can help you achieve the following potential benefits to your business:

• Increased performance of applications that use the database, resulting in better response times and higher end-user satisfaction

• Reduced IT administrative costs, giving you the ability to manage a wider scope of databases and respond quicker to changes in application requirements

• Reduced IT hardware costs

• Improved backup and recovery elapsed time

Figure 1 shows an illustration of a physical database system. The three heavy dark-boxed vertical rectangles indicate three distinct database instances. All other square or rectangular boxes represent storage blocks on disk. All symbols represent data values within the table (such as geography or month).

In this example, a table has been hash-partitioned across three instances called P1, P2, and P3. The table has been range-partitioned by month, allowing data to be easily added and deleted by month. Indirectly, this also helps with queries that have predicates by month. Data within each table has been clustered using multidimensional clustering (MDC), and this serves as a further clustering within each range partition. The rows within the table are also indexed using regular row-based (RID-based) indexes. A materialized query table (MQT) is created on the table, which includes aggregated data (such as average sales by geography), which itself has indexing and MDC.


Figure 1 Illustration of a physical database system
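The layering described above can be sketched in DB2 DDL. This is only an illustration under stated assumptions, not the design behind Figure 1: the table SALES, its columns, the distribution key CUST_ID, the date range, and the MQT name are all hypothetical, and DISTRIBUTE BY HASH spreads rows across partitions only in a partitioned database environment.

```sql
-- Hypothetical fact table that combines hash partitioning,
-- range partitioning by month, and MDC on geography.
CREATE TABLE sales (
    sale_date  DATE          NOT NULL,
    region     CHAR(10)      NOT NULL,
    cust_id    INTEGER       NOT NULL,
    amount     DECIMAL(12,2) NOT NULL
)
DISTRIBUTE BY HASH (cust_id)
PARTITION BY RANGE (sale_date)
    (STARTING '2008-01-01' ENDING '2008-12-31' EVERY 1 MONTH)
ORGANIZE BY DIMENSIONS (region);

-- A regular row-based (RID) index on the same table.
CREATE INDEX sales_cust_ix ON sales (cust_id);

-- An MQT holding aggregated data (average sales by geography).
CREATE TABLE sales_by_region AS
    (SELECT region,
            AVG(amount) AS avg_amount,
            COUNT(*)    AS row_count
     FROM sales
     GROUP BY region)
    DATA INITIALLY DEFERRED REFRESH DEFERRED;

-- Populate the MQT from the current contents of SALES.
REFRESH TABLE sales_by_region;
```

Each clause maps to one sentence of the description: DISTRIBUTE BY HASH to the hash partitioning across P1, P2, and P3, PARTITION BY RANGE to the monthly ranges, and ORGANIZE BY DIMENSIONS to the MDC clustering within each range.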


Introduction to physical database design

Database design is performed in three stages:

1. Logical database design: includes gathering of business requirements, and entity relationship modeling.

2. Conversion of the logical design into table definitions (often performed by an application developer): includes pre-deployment design, table definitions, normalization, PK and FK relationships, and basic indexing.

3. Post deployment physical database design (often performed by a database administrator): includes improving performance, reducing I/O, and streamlining administration tasks.

Physical database design covers those aspects of database design that impact the actual structure of the database on disk, items 2 (see footnote 1) and 3 in the list above. Although you can perform logical design independently of the platform that the database will eventually use, many physical database attributes depend on the specifics and semantics of the target DBMS. Physical database design includes the following attributes:

• Datatype selection

• Table normalization

• Table denormalization

• Indexing

• Clustering

• MDC

• Database partitioning

• Range partitioning

• UAV partitioning

• MQTs

• Memory allocation

• Database storage topology

• Database storage object allocation

This paper covers all but “Database storage topology” and “Database storage object allocation,” which are covered in the “Best Practices: Database Storage” white paper. This white paper and others mentioned throughout this paper are available at the DB2 Best Practices website at http://www.ibm.com/developerworks/db2/bestpractices/.

1 This phase is variably referred to in the industry as logical database design or physical database design. It’s known as logical database design in the sense that it can be designed independent of the data server or the particular DBMS used. It is also often performed by the same people who perform the early requirements building and entity relationship modeling. Conversely, it is also called physical database design in the sense that it affects the physical structure of the database and its implementation. For the sake of this document we use the latter assumption, and therefore include it as part of physical database design.

Physical database design is as old as databases themselves (see footnote 2). The first relational databases were prototypes (in the early 1970s). As relational database systems advanced, new techniques were introduced to help improve operational efficiency. The most elementary problems of database design are table normalization and index selection, both of which are discussed below.

Today, we can achieve I/O reductions by properly partitioning data, distributing data, and improving the indexing of data. All of these innovations (which improve database capabilities, expand the scope of physical database design, and increase the number of design choices) have resulted in the increased complexity of optimizing database structures. Although the 1980s and 1990s were dominated by the introduction of new physical database design capabilities, the years since have been dominated by efforts to simplify the process through automation and best practices.

The vast majority of physical database design features and attributes have the primary goal of reducing I/O use at run time. However, to a lesser degree, there are “physical design aspects” that help improve administrative efficiency and reduce CPU or network use. In addition, in the DB2 partitioned environment, the database design influences the degree of parallel processing, for example, parallel query processing.

The best practices presented in this document have been developed with the reality of today’s database systems in mind and specifically address the features and facilities available in DB2 9.5.

Assumptions about the reader

It is assumed that you are familiar with the physical database design features described. Therefore, only a very brief description of each one is provided. The focus of this paper is on the best practices for applying these features. For details on each respective feature, refer to the DB2 product documentation.

2 The relational model for databases was first proposed in 1970 by E. F. Codd at IBM. The first relational database systems to be implemented, using SQL and B+ tree, were IBM’s System R, in 1976, and Ingres at the University of California, Berkeley. The B-tree, from which the B+ tree (the most commonly used indexing storage structure for user-designed indexes) is derived, was first described in the paper “Organization and Maintenance of Large Ordered Indices” by Rudolf Bayer and Edward M. McCreight, 1972.



Goals of physical database design

A high-quality physical database design is one that meets the following goals:

• Minimizes I/O

• Balances design features that optimize query performance concurrently with transaction performance and maintenance operations

• Improves the efficiency of database management, such as roll-in and roll-out of data

• Improves the performance of administration tasks, such as index creation or backup and recovery processing

• Minimizes backup and recovery elapsed time


Datatype selection best practices

When designing a physical database, the selection of appropriate datatypes is an important consideration that should not be overlooked. Often, abbreviated or intuitive codes are used to represent a longer value in columns, or to easily identify what the code represents; for example, an account status column whose codes are OPN, CLS, and INA (representing an account that can be open, closed, or inactive). From a query processing perspective, numeric values can be processed more efficiently than character values, especially when joining values. Therefore, using a numeric datatype can provide a slight benefit.

While using numeric datatypes might mean that interpreting the values that are being stored in a column is more difficult, there are appropriate places where the definitions of numeric values can be stored for retrieval by end users, such as:

o Storing the definitions as a domain value in a data modeling tool such as Rational Data Architect, where the values can be published to a larger team using metadata reporting

o Storing the definition of the values in a table in a database, where the definitions can be joined to the values to provide context, such as text name or description (tables that store values of columns and their descriptions are often referred to as reference tables or lookup tables)

Another concern that is often raised is that, for large databases, this storing of definitions could lead to the proliferation of reference tables. While this is true if an organization chooses to use a reference table for each column that is used to store a code value, it is possible to consolidate these reference tables into either a single or a few reference tables. From these consolidated reference tables, virtual views can be created to represent the lookup table for each column.

Example of virtual views that represent a lookup table for each column

In the following diagram, the TCUSTOMER table has two columns that use code values: CUST_TYPE and CUST_MKT_SEG. In this scenario, a reference table is created for each column that uses a code, resulting in two reference tables, TCUST_TYPE_REF and TCUST_MKT_SEG_REF.


This approach is not flexible because any time a new column is added that uses a code value, a new reference table must be created. A possible solution is to consolidate the reference tables into a single reference table (TREF_MASTER), as shown in the following diagram:

In this diagram, two virtual views, VCUST_TYPE_REF and VCUST_MKT_SEG_REF, were created from the TREF_MASTER table to represent the reference tables in the example above. The benefit to this approach is that end users can still use the reference table (without having to write complex SQL) by simply accessing the reference views for each column. In addition, the DBA will only maintain a single table for all of the reference data, and the proliferation of reference tables is limited.


To understand how the VCUST_TYPE_REF view was created, here is the SQL:

SELECT VALUE AS CUST_TYPE, VALUE_NME AS CUST_TYPE_NME, VALUE_DESC AS CUST_TYPE_DESC
FROM REFTB.TREF_MASTER
WHERE TBL_SCHEMA = 'REFTB' AND TABLE = 'REF_MASTER' AND COLUMN = 'CUST_TYPE'
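The consolidated design can be sketched end to end as follows. This is a minimal sketch under assumptions: the column datatypes, the primary key, and the CUST_ID column of TCUSTOMER are not given in this paper and are hypothetical, and the second view is written by analogy with the first. The names TABLE, COLUMN, and VALUE are written as delimited identifiers here purely as a precaution, because they are also SQL keywords.

```sql
-- Single consolidated reference table (datatypes and key are assumptions).
CREATE TABLE REFTB.TREF_MASTER (
    TBL_SCHEMA  VARCHAR(128) NOT NULL,
    "TABLE"     VARCHAR(128) NOT NULL,
    "COLUMN"    VARCHAR(128) NOT NULL,
    "VALUE"     SMALLINT     NOT NULL,
    VALUE_NME   VARCHAR(30)  NOT NULL,
    VALUE_DESC  VARCHAR(254),
    CONSTRAINT PK_TREF_MASTER
        PRIMARY KEY (TBL_SCHEMA, "TABLE", "COLUMN", "VALUE")
);

-- One view per coded column; each view filters the rows for its column.
CREATE VIEW REFTB.VCUST_TYPE_REF AS
    SELECT "VALUE"    AS CUST_TYPE,
           VALUE_NME  AS CUST_TYPE_NME,
           VALUE_DESC AS CUST_TYPE_DESC
    FROM REFTB.TREF_MASTER
    WHERE TBL_SCHEMA = 'REFTB'
      AND "TABLE"    = 'REF_MASTER'
      AND "COLUMN"   = 'CUST_TYPE';

-- Written by analogy with the view above (an assumption).
CREATE VIEW REFTB.VCUST_MKT_SEG_REF AS
    SELECT "VALUE"    AS CUST_MKT_SEG,
           VALUE_NME  AS CUST_MKT_SEG_NME,
           VALUE_DESC AS CUST_MKT_SEG_DESC
    FROM REFTB.TREF_MASTER
    WHERE TBL_SCHEMA = 'REFTB'
      AND "TABLE"    = 'REF_MASTER'
      AND "COLUMN"   = 'CUST_MKT_SEG';

-- End users join to the view exactly as they would to a dedicated
-- reference table (CUST_ID is a hypothetical column of TCUSTOMER).
SELECT c.CUST_ID, t.CUST_TYPE_NME
FROM TCUSTOMER c
JOIN REFTB.VCUST_TYPE_REF t
  ON t.CUST_TYPE = c.CUST_TYPE;
```

Adding a new coded column then requires only new rows in TREF_MASTER and one more view, not a new table.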

Use the following best practices when selecting datatypes:

• Always try to use a numeric datatype over a character datatype, taking the following considerations into account:

o When creating a column that will hold a Boolean value (“YES” or “NO”), use a decimal (1,0) or similar datatype. Use 0 and 1 as values for the column rather than “N” or “Y”.

o Use integers to represent codes.

o If there will be fewer than 10 code values for a given column, the decimal (1,0) datatype is appropriate. If there will be 10 or more code values, use smallint.

• Store the definitions as a domain value in a data modeling tool, such as Rational Data Architect, where the values can be published to a larger team using metadata reporting.

• Store the definition of the values in a table in a database, where the definitions can be joined to the value to provide context, such as “text name” or “description”.
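As an illustration of these datatype choices, consider the following sketch. The ACCOUNT table and all of its columns are hypothetical; the mapping of the OPN, CLS, and INA codes to 1, 2, and 3 is likewise an assumption made for the example.

```sql
-- Hypothetical table applying the datatype guidance above.
CREATE TABLE account (
    account_id      INTEGER      NOT NULL,
    -- Boolean flag stored as 0 or 1 instead of 'N' or 'Y'.
    is_overdrawn    DECIMAL(1,0) NOT NULL
        CONSTRAINT chk_is_overdrawn CHECK (is_overdrawn IN (0, 1)),
    -- Fewer than 10 code values: 1 = open, 2 = closed, 3 = inactive.
    account_status  DECIMAL(1,0) NOT NULL
        CONSTRAINT chk_account_status CHECK (account_status IN (1, 2, 3)),
    -- 10 or more code values: use SMALLINT.
    branch_code     SMALLINT     NOT NULL,
    CONSTRAINT pk_account PRIMARY KEY (account_id)
);
```

The text meaning of each code would live in a reference table or view, so that queries join on the compact numeric column and fetch the description only when it is needed.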


Table normalization and denormalization best practices

Table normalization is the restructuring of a data model by reducing its relations to their simplest forms. It is a key step in the task of building a logical relational database design. Normalization helps avoid redundancies and inconsistencies in data; it is typically a logical data modeling exercise, whose outcome might be implemented in the physical design.

There are a few goals for deploying a normalized design:

• Eliminate redundant data, for example, storing the same data in more than one table.

• Enforce valid data dependencies by only storing related data in a table, and dividing relational data into multiple related tables.

• Maximize the flexibility of the system for future growth in data structures.

Normalization

The two or three dominant strategies for normalization are:

• Third normal form (3NF), which is used in online transaction processing (OLTP) and many general-purpose databases, including enterprise data warehouses (also called atomic warehouses).

• Star schema and snowflake, which are dimensional model forms for normalization, and are used heavily in data warehousing and OLAP.

Specify non-enforced RI on FK columns to reduce table access for STAR JOINs without incurring the overhead of RI.
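A non-enforced (informational) FK can be declared as in the following sketch. The table and column names are hypothetical. Because the database manager does not check a NOT ENFORCED constraint, the process that loads the fact table must itself guarantee that every key value has a matching dimension row.

```sql
-- Hypothetical dimension and fact tables.
CREATE TABLE product_dim (
    prodkey    INTEGER     NOT NULL,
    prod_desc  VARCHAR(60) NOT NULL,
    CONSTRAINT pk_product_dim PRIMARY KEY (prodkey)
);

CREATE TABLE sales_fact (
    prodkey    INTEGER NOT NULL,
    sale_date  DATE    NOT NULL,
    quantity   INTEGER NOT NULL,
    -- Informational constraint: not checked on INSERT, UPDATE, or LOAD,
    -- but visible to the optimizer.
    CONSTRAINT fk_sales_product
        FOREIGN KEY (prodkey) REFERENCES product_dim (prodkey)
        NOT ENFORCED
        ENABLE QUERY OPTIMIZATION
);
```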

Third normal form (3NF)

3NF is a combination of the rules from first normal form and second normal form. The following rules are specific to 3NF:

• Eliminate repeating groups. Make a separate table for each set of related attributes, and give each table a PK.

• Eliminate duplicate columns and redundant data in each table.

• Move subsets of columnar data that apply to multiple rows of a table into separate tables.

• Create relationships between the tables by using FKs.


• Eliminate columns not dependent on keys. If attributes do not contribute to a description of a key, move them into a separate table.

• Remove columns not dependent upon the PK.

1NF, 2NF, and 3NF of database design

The following diagrams demonstrate the first, second, and third normal forms of database design:

Denormalized model:

First normal form (1NF):

To make the denormalized model comply with 1NF, the repeating group of data elements, the customer address lines, and the customer names were normalized into separate tables.


Second normal form (2NF):

For the model to comply with 2NF, it must comply with 1NF, and every non-key attribute must be fully dependent on the whole of a composite key rather than on only a part of it.

Third normal form (3NF):

For the model to comply with 3NF, any transitive dependencies must be eliminated. Transitive dependencies occur when a value in a non-key field is determined by the value of another non-key field that is not part of a candidate key.
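The three steps can be sketched in DDL. The tables below are hypothetical and are not a transcription of the diagrams; in particular, the assumption that city and state are determined by postal code is made only to illustrate a transitive dependency.

```sql
-- 3NF: city and state_code are assumed to depend on postal_code (a
-- non-key column of the customer), so they move to their own table.
CREATE TABLE postal_area (
    postal_code  CHAR(10)    NOT NULL,
    city         VARCHAR(40) NOT NULL,
    state_code   CHAR(2)     NOT NULL,
    CONSTRAINT pk_postal_area PRIMARY KEY (postal_code)
);

CREATE TABLE customer (
    cust_id      INTEGER     NOT NULL,
    cust_name    VARCHAR(60) NOT NULL,
    postal_code  CHAR(10)    NOT NULL,
    CONSTRAINT pk_customer PRIMARY KEY (cust_id),
    CONSTRAINT fk_customer_postal
        FOREIGN KEY (postal_code) REFERENCES postal_area (postal_code)
);

-- 1NF: the repeating group addr_line1, addr_line2, ... becomes one row
-- per address line in a separate table.
CREATE TABLE customer_addr_line (
    cust_id    INTEGER     NOT NULL,
    line_no    SMALLINT    NOT NULL,
    line_text  VARCHAR(60) NOT NULL,
    CONSTRAINT pk_customer_addr_line PRIMARY KEY (cust_id, line_no),
    CONSTRAINT fk_addr_customer
        FOREIGN KEY (cust_id) REFERENCES customer (cust_id)
);

-- 2NF: prod_desc depends only on prod_id, which is just part of the
-- composite key (order_id, prod_id), so it moves to a product table.
CREATE TABLE product (
    prod_id    INTEGER     NOT NULL,
    prod_desc  VARCHAR(60) NOT NULL,
    CONSTRAINT pk_product PRIMARY KEY (prod_id)
);

CREATE TABLE order_item (
    order_id  INTEGER NOT NULL,
    prod_id   INTEGER NOT NULL,
    quantity  INTEGER NOT NULL,
    CONSTRAINT pk_order_item PRIMARY KEY (order_id, prod_id),
    CONSTRAINT fk_order_item_product
        FOREIGN KEY (prod_id) REFERENCES product (prod_id)
);
```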


Star schema and snowflake models

The star schema and snowflake models have become quite popular for data warehousing BI systems. The basis of star schema is the separation of the facts of a system from its dimensions. Dimensions are defined as attributes of the data, such as the location, or customer name, or part description, and the facts refer to the time-specific events related to the data.

For example, a part description does not typically change over time, so it can be designed as a dimension. Conversely, the number of parts sold daily varies over time and is therefore a fact. A star schema is called that because it is typically characterized by a large central fact table that holds information about events that vary over time, surrounded (conceptually) by a set of dimension tables holding the meta attributes of items that are referenced within the fact events.

A snowflake is basically an extension of a star schema. In a snowflake design, the low cardinality attributes are often moved from a dimension table in a star schema into another dimension table and then a relationship is created between the two dimension tables.
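The difference between the two models can be seen in a small sketch that reuses the part example from above. All table and column names are hypothetical. In a pure star schema the category name would be a column of PART_DIM; in the snowflake variant shown here, that low-cardinality attribute is moved to its own dimension table.

```sql
-- Snowflaked dimension: the low-cardinality category attribute.
CREATE TABLE part_category_dim (
    category_key   SMALLINT    NOT NULL,
    category_name  VARCHAR(40) NOT NULL,
    CONSTRAINT pk_part_category_dim PRIMARY KEY (category_key)
);

-- Dimension table: attributes that do not typically change over time.
CREATE TABLE part_dim (
    part_key      INTEGER     NOT NULL,
    part_desc     VARCHAR(60) NOT NULL,
    category_key  SMALLINT    NOT NULL,
    CONSTRAINT pk_part_dim PRIMARY KEY (part_key),
    CONSTRAINT fk_part_category
        FOREIGN KEY (category_key)
        REFERENCES part_category_dim (category_key)
);

-- Fact table: time-varying events (number of parts sold daily).
CREATE TABLE part_sales_fact (
    part_key   INTEGER NOT NULL,
    sale_date  DATE    NOT NULL,
    qty_sold   INTEGER NOT NULL,
    CONSTRAINT fk_sales_part
        FOREIGN KEY (part_key) REFERENCES part_dim (part_key)
);
```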

Denormalization

In contrast to normalization, denormalization is the process of collapsing tables and therefore possibly increasing the redundancy of data within a database. Denormalization can be useful in reducing the complexity or number of joins, and reducing the complexity of a database by reducing the number of tables. The primary goal of denormalization is to maximize performance of a system and reduce the complexity in administering the system.

IBM Layered Data Architecture

IBM Layered Data Architecture offers multiple levels of granularity. Each layer provides a different level of detail and data summarization appropriate to user needs, which users (analysts and executives) can access. As data ages, it rolls up through the layers (with more tables and less data per table). This architecture is designed specifically for mixed workloads, query performance, rapid incorporation of new data sources, and deployment of new applications.

The layered architecture enables concurrent loading, query, archive and maintenance without compromising query performance. The multiple levels of data granularity are available for multiple types of analytics.

Figure 2 shows the 5 layers (or floors) of the IBM Layered Data Architecture.


Figure 2 IBM Layered Data Architecture

With this model, warehouse administrators can:

1. Use visual modeling tools to optimize the design of multilayered warehouse schemas.

2. Use their preferred extract, transform, and load (ETL) software to bulk-load the staging layer of the warehouse—with scale, speed and rich transformations from myriad enterprise data sources.

3. Use SQL Warehousing Tools (SQWs) to maintain analytic structures in the performance and business access layers—or to replace hand-coded SQL flows anywhere inside the warehouse.

This layered architecture is a powerful paradigm that is too detailed to describe at length here. Refer to “Best Practices for Creating Scalable High Quality Data Warehouses with DB2” in the “Further reading” section for detailed information on this layered architecture.

Use the following normalization and denormalization best practices:


• Use 3NF whenever possible for most OLTP and general-purpose database designs to maintain flexibility in the design of the system. It is a tried-and-true normalization model.

• For data warehouses and data marts that require very high performance, a star schema or snowflake model is typically optimal for dimensional query processing. However, verify that the star schema or snowflake model conforms to the relationships that you designed in the normalized logical data model. More information about logical modeling for users of Rational Data Architect is available in the “Best Practices: Data Life Cycle Management” white paper.

• For broad-based data warehousing that is used for several purposes, such as operational data stores, reporting, OLAP and cubing, use the IBM Layered Data Architecture illustrated in Figure 2.

• Consider denormalizing very narrow tables, ones with a row length of 30 or fewer bytes. Extra tables in a database increase query complexity and complicate administration.


Index design best practices

Indexes are critical for performance. They are used by a database for the following purposes:

• To apply predicates, providing rapid lookup of the location of data in a database and reducing the number of rows navigated

• To avoid sorts for ORDER BY and GROUP BY clauses

• To induce order for joins

• To provide index-only access, which avoids the cost of accessing data pages

• As the only way to enforce uniqueness in a relational database

However, indexes consume additional hardware resources:

• They add extra CPU and I/O cost to UPDATE, INSERT, DELETE, and LOAD operations

• They add to prepare time because they provide more choices for the optimizer

• They can use a significant amount of disk storage

In DB2 database systems, a B+ tree structure is used as the underlying implementation for indexes. All data is stored in the leaf nodes, and the keys are optionally chained in a bidirectional manner to allow both forward and backward index scanning. If DISALLOW REVERSE SCANS is specified then the index cannot be scanned in reverse order.
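The scan-direction options are specified on the CREATE INDEX statement, as in the following sketch. The ORDERS table and the index names are hypothetical, and whether the optimizer actually chooses a backward scan depends on the access plan that it selects.

```sql
-- Hypothetical table.
CREATE TABLE orders (
    order_id    INTEGER NOT NULL,
    order_date  DATE    NOT NULL
);

-- Bidirectionally chained index: can be read forward or backward.
CREATE INDEX orders_date_ix
    ON orders (order_date ASC)
    ALLOW REVERSE SCANS;

-- A descending ORDER BY can be satisfied by scanning the ascending
-- index in reverse, avoiding a sort if the optimizer chooses that plan.
SELECT order_id, order_date
FROM orders
ORDER BY order_date DESC;

-- Forward-only index: cannot be scanned in reverse order.
CREATE INDEX orders_id_ix
    ON orders (order_id)
    DISALLOW REVERSE SCANS;
```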

Clustering indexes

Clustering indexes (also called special indexes) indicate to the database manager that data in the table object should be clustered in a specific order, on disk, according to the definition of the index. For example, if the clustering index is defined on a date key, then the DB2 database manager will attempt to store, in the table object, rows with similar dates in ascending date sequence.

The table in Figure 3 has two row-based indexes defined on it:

• A clustering index on Region

• Another index on Year


Figure 3. A regular table with a clustering index

The value of this clustering is that subsequent queries that have predicates on the clustering attribute need to perform dramatically reduced I/O. For example, a query on sales by date will perform far less I/O if the rows for the selected dates are stored next to each other on disk.

However, clustering indexes are merely an indicator to the database, and as new rows are inserted into the database the DB2 kernel attempts to place these rows near rows with the same or similar attributes. If space is unavailable, the incoming or changed row might be redirected to another location that is unclustered (that is, not near the related rows).

When an INSERT occurs (or an UPDATE to the clustering keys) the DB2 kernel navigates, top down, scanning the clustering index to determine an appropriate location for the row. Therefore, INSERT operations, and some UPDATE operations, on a table with a clustering index incur the overhead of index access that an unclustered table would not. Techniques like “append on” (APPEND ON option on the CREATE and ALTER TABLE statements) can minimize this overhead by placing all new rows at the end of the table. In short, clustering indexes provide only approximate clustering, and data often becomes unclustered over time. The REORG utility can be used to reorganize the data rows back into perfect cluster order, although, for online REORGs, this can be a time-consuming and log-intensive operation.
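Re-establishing cluster order with REORG can be sketched as follows. The example issues the utility through the SYSPROC.ADMIN_CMD procedure so that it can be run as SQL; the schema MYSCHEMA, the table, and the index are hypothetical, and the two REORG calls are alternatives, not steps to run one after the other.

```sql
-- Hypothetical table with a clustering index on a date key.
CREATE TABLE myschema.daily_sales (
    sale_date  DATE          NOT NULL,
    amount     DECIMAL(12,2) NOT NULL
);

CREATE INDEX myschema.daily_sales_date_ix
    ON myschema.daily_sales (sale_date) CLUSTER;

-- Alternative 1: classic REORG, which rebuilds the table in the order
-- of the named index.
CALL SYSPROC.ADMIN_CMD(
    'REORG TABLE myschema.daily_sales INDEX myschema.daily_sales_date_ix');

-- Alternative 2: online (in-place) REORG, which keeps the table
-- available for writes but takes longer and generates more log data.
CALL SYSPROC.ADMIN_CMD(
    'REORG TABLE myschema.daily_sales INDEX myschema.daily_sales_date_ix INPLACE ALLOW WRITE ACCESS');

-- Refresh statistics afterward so that the optimizer sees the
-- improved clustering.
CALL SYSPROC.ADMIN_CMD(
    'RUNSTATS ON TABLE myschema.daily_sales AND INDEXES ALL');
```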


To create a clustering index, add the CLUSTER keyword to the CREATE INDEX statement, as shown in the following example, where a clustering index MyIndex is created on column C1 of table T1. There can be only one clustering index per table.

CREATE INDEX MyIndex ON T1 (C1) CLUSTER

Because data clustering can deteriorate over time when using a clustering index, clustering with MDC is preferred as a best practice because it guarantees clustering at all times and provides the option to cluster along multiple dimensions concurrently. See the discussion on MDC for help on determining which method to use.

Utilize the following index design best practices:

• Index every PK and most FKs in a database. Most joins occur between PKs and FKs, so it is important to build indexes on all PKs and FKs whenever possible. Indexes on FKs also improve the performance of RI checking.

• Explicitly provide an index for the PK. The DB2 database manager indexes the PK automatically with a system-generated name if one is not specified. The system-generated name for an automatically-generated index is difficult to administer.

• Columns frequently referenced in WHERE clauses are good candidates for an index. An exception to this rule is when the predicate provides minimal filtering. An example is an inequality such as WHERE cost <> 4. Indexes are seldom useful for inequalities because of the limited filtering provided.

• Specify indexes on columns used for equality and range queries.

• Create an index for each set of fact table columns that join to a dimension. These columns do not have to be part of an explicit FK. Creating the index allows STAR JOIN access plans that use dynamic bitmap index ANDing. Consider creating indexes on combinations of fact-table columns.

For example, if PRODKEY and STOREKEY join to the product and store dimensions, respectively, consider creating an index on (PRODKEY, STOREKEY). This facilitates a hub or cartesian STAR JOIN access plan.

• Use the db2pd command, which indicates the number of times that indexes were used in order from highest to lowest. This can be helpful in detecting which indexes are commonly used. For example:

db2pd -db MY_DATABASE -tcbstats index

The indexes are referenced using the IID, which can be linked with SYSIBM.SYSINDEXES's IID for the index. At the end of the output (shown below in two sections) is a list of index statistics. “Scans” indicates read access on each index, while the other indicators in the output provide insight on write and update activity to the index.

Left side of report:

Right side of report:

• Use the DB2 Design Advisor to indicate which indexes are never accessed for a specified workload and can therefore be dropped.

• Add indexes only when absolutely necessary. Remember that indexes significantly impact INSERT, UPDATE, and DELETE performance, and they also require storage.

• To reduce the need for frequent reorganization, when using a clustering index specify an appropriate PCTFREE at index creation time to leave a percentage of free space on each index leaf page as it is created. During future activity, rows can be inserted into the index with less likelihood of causing index page splits. Page splits cause index pages not to be contiguous or sequential, which in turn results in decreased efficiency of index page prefetching.

Note: The PCTFREE specified when you create the relational index is retained when the index is reorganized.

Dropping and recreating, or reorganizing, the relational index also creates a new set of pages that are roughly contiguous and sequential and improves index page prefetch. Although more costly in time and resources, the REORG TABLE utility also ensures clustering of the data pages. Clustering has greater benefit for index scans that access a significant number of data pages.

• Examine queries with range predicates or ORDER BY clauses to identify clustering dimensions.

• Clustering indexes incur additional overhead for INSERT and some UPDATE operations. If your workload performs a large number of inserts and updates, you will need to weigh the benefits of clustering for queries against the additional cost to INSERT and UPDATE operations. In many cases, the benefit far outweighs the cost, but not always.


• Avoid or remove redundant indexes. An example of a redundant index is one that contains only an account number column when there is another index that contains the same account number column as its first column. Indexes that use the same or similar columns make query optimization more complicated, use storage, seriously impact INSERT, UPDATE, and DELETE performance, and often have very marginal benefits.

Although the DB2 database system provides dynamic bitmap indexing, index ANDing, and index ORing, it is good practice to specify composite indexes (also referred to as multiple-column indexes) when several columns are frequently specified together in WHERE clauses.

• Choose the leading columns of a composite index to facilitate matching index scans. The leading columns should reflect columns frequently used in WHERE clauses. The DB2 database system navigates only top down through a B-tree index for the leading columns used in a WHERE clause, referred to as a matching index scan. If the leading column of an index is not in a WHERE clause, the optimizer might still use the index, but the optimizer is forced to use a non-matching index scan across the entire index.
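Several of the practices in this list can be sketched together. The tables, indexes, and constraint names below are hypothetical, the PCTFREE value of 10 is an arbitrary illustration and not a recommendation, and the final query uses the SYSCAT.INDEXES catalog view to map the IID values reported by db2pd to index names.

```sql
-- Hypothetical dimension and fact tables.
CREATE TABLE store_dim (
    storekey    INTEGER     NOT NULL,
    store_name  VARCHAR(60) NOT NULL
);

-- Explicitly named PK index: create the unique index first, and the
-- PK constraint then uses it instead of a system-named index.
CREATE UNIQUE INDEX store_dim_pk_ix ON store_dim (storekey);
ALTER TABLE store_dim
    ADD CONSTRAINT pk_store_dim PRIMARY KEY (storekey);

CREATE TABLE store_sales_fact (
    prodkey    INTEGER       NOT NULL,
    storekey   INTEGER       NOT NULL,
    sale_date  DATE          NOT NULL,
    amount     DECIMAL(12,2) NOT NULL
);

-- Composite index on the fact-table columns that join to dimensions.
-- A separate index on (prodkey) alone would be redundant with this one.
CREATE INDEX store_sales_prod_store_ix
    ON store_sales_fact (prodkey, storekey);

-- Clustering index that leaves free space on each leaf page to reduce
-- page splits between reorganizations.
CREATE INDEX store_sales_date_ix
    ON store_sales_fact (sale_date)
    CLUSTER PCTFREE 10;

-- Predicate on the leading column: a matching index scan is possible.
SELECT SUM(amount) FROM store_sales_fact WHERE prodkey = 100;

-- No predicate on the leading column: the composite index can be used
-- only through a non-matching scan of the entire index.
SELECT SUM(amount) FROM store_sales_fact WHERE storekey = 7;

-- Resolve the IID values shown by db2pd -tcbstats to index names.
SELECT indschema, indname, iid, colnames
FROM syscat.indexes
WHERE tabschema = CURRENT SCHEMA
  AND tabname   = 'STORE_SALES_FACT'
ORDER BY iid;
```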


Data clustering and multidimensional clustering (MDC) best practices

MDC is a technique for clustering data along more than one dimension at the same time. However, you can also use MDC for single-dimensional clustering, just as you can use a clustering index. An advantage of an MDC table is that it is designed to always be clustered. A reorganization is never required to re-establish a high-cluster ratio.

To understand MDC, you must first understand some basic terminology:

• Cells are the portion of the table containing data having a unique set of dimension values, that is, the intersection formed by taking a slice from each dimension.

• Blocks are the unit of storage equal to an extent size (one or more pages) that is used to store a cell. Your extent size specification determines the size of the block (or cell).

Block indexes for MDC tables

Unlike traditional indexes created by the CREATE INDEX syntax, which index each row in a table, MDC indexes the rows in the table by block, using what are called block indexes. MDC block indexes are typically 1/1000th of the size of row-based indexes, and provide not only huge savings in storage for the index, but also large efficiencies on all block index operations (such as index scan, index ANDing, and index ORing). INSERT and UPDATE operations are also enhanced because the block index is only updated if a new cell is created.

As shown in Figure 4, block indexes provide a significant reduction in disk usage and significantly faster data access:


Figure 4. How row indexes differ from block indexes

The MDC table shown in Figure 5 is physically organized such that rows having the same Region and Year values are grouped together into separate blocks, or extents.

MDC block indexes are created for each dimension as well as the composite dimension. For example, if the dimensions for a table are Region,Year then a block index is built for Region, for Year, and for the composite dimension Region,Year.
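As a minimal sketch of this behavior, the following creates a two-dimension MDC table and then lists the indexes that the database manager created for it. The table name SALES_MDC and its columns are hypothetical; SYSCAT.INDEXES is the standard catalog view.

```sql
-- Hypothetical two-dimension MDC table (Region, Year).
CREATE TABLE SALES_MDC (
  REGION    CHAR(12)      NOT NULL,
  SALE_YEAR SMALLINT      NOT NULL,
  AMOUNT    DECIMAL(12,2)
) ORGANIZE BY DIMENSIONS (REGION, SALE_YEAR);

-- List the system-generated block indexes.
-- INDEXTYPE 'DIM'  = dimension block index (one per dimension)
-- INDEXTYPE 'BLOK' = composite block index (all dimensions)
SELECT INDNAME, INDEXTYPE, COLNAMES
FROM SYSCAT.INDEXES
WHERE TABSCHEMA = CURRENT SCHEMA
  AND TABNAME = 'SALES_MDC';
```

For this table the query should show one dimension block index on REGION, one on SALE_YEAR, and one composite block index covering both columns.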


Figure 5. A multidimensional clustering (MDC) table

An MDC table defined with even just a single dimension can benefit from these MDC attributes, and can be a viable alternative to a regular table with a clustering index. This decision should be based on many factors, including the queries that make up the workload, and the nature and distribution of the data in the table. A high cardinality column is not a good choice for a single-dimension MDC because you will get a cell for each unique value.

Maintaining clustering automatically during INSERT operations

Automatic maintenance of data clustering in MDC tables is ensured using composite block indexes (see footnote 3). These indexes are used to dynamically manage and maintain the physical clustering of data along the dimensions of the table over the course of INSERT operations. When an insert occurs, the composite block index is probed for the logical cell corresponding to the dimension values of the row to be inserted. The block index is not updated unless a new cell is created.

3 A composite block index is automatically created and contains all columns across all dimensions. It is used to maintain the clustering of data over insert and update activity, and might also be selected by the optimizer to efficiently access data that satisfies values from a subset, or from all, of the column dimensions.


As shown in Figure 6, if the key of the logical cell is found in the index, its list of block IDs (BIDs) gives the complete list of blocks in the table having the dimension values of the logical cell. This limits the number of extents of the table to search for space to insert the row.

Figure 6. Composite block index on YearAndMonth, Region

Because clustering is automatically maintained, reorganization of an MDC table is never needed to re-cluster data. Also, MDC can reuse empty cells that result from the mass deletion of rows without a REORG. However, reorganization can still be used in rare situations to reclaim space. For example, if cells have many sparse blocks where data could fit on fewer blocks, or if the table has many pointer-overflow pairs, a reorganization of the table would compact rows belonging to each logical cell into the minimum number of blocks needed, as well as remove pointer-overflow pairs.

Benefits of using MDC

The value of MDC is profound. It improves complex query performance by 10 times in some cases and you can use it for roll-in and roll-out of data. Other benefits include the following ones:

• MDCs are multi-dimensional. For example, data can be perfectly clustered along DATE and LOCATION dimensions; cells and ranges are created automatically as new data arrives.

• MDCs can be used in conjunction with normal RID-based indexes, range partitioning, and MQTs. Index ANDing or ORing of block-based and RID-based indexes is a possible access path that can be chosen by the DB2 Optimizer.

• MDCs are used with intra-query parallelism, DPF (shared nothing) parallelism, and LOAD, BACKUP, and REORG operations.


• MDC dimensions, unlike range-partitioned tables, are dynamic; new cells get created within the table automatically as unique new data representing new cells arrives in the table either through SQL operations (including JDBC, CLI, and so forth), or through utility operations such as LOAD and IMPORT. Empty cells can also be reused during these operations.

• MDCs maintain clustering, and, as such, do not need REORGs to maintain cluster ratios.

The following example shows how to define an MDC table:

CREATE TABLE T1 (
c1 DATE,
c2 INT,
c3 INT,
c4 DOUBLE,
c5 INT GENERATED ALWAYS AS (INT(c1)/100) )
ORGANIZE BY DIMENSIONS (c5, c3)

The ORGANIZE BY clause defines the clustering dimensions. The table is clustered by C5 and C3 at the same time. C1 is coarsified (see footnote 4) to C5, which contains fewer distinct values (days are reduced to months).

NOTE: The coarsified generated column(s) are used in the MDC block indexes to perform cell-level elimination of data. Calculated columns are fully supported by MDC and the DB2 Optimizer.

The key design challenge of MDC is the careful selection of the clustering dimensions. If you choose clustering dimensions that result in too many cells, storage costs can increase substantially. The reason for this is important to understand. In an MDC table, every cell is allocated as many storage blocks on disk as required. Storage blocks are by design equal to the extent size of the table space that holds the table. The number of storage blocks is 0 if a cell has no data. However, in a typical table a cell stores several rows, resulting in one or more storage blocks being allocated to the cell. For every cell that has data, there is a chain of blocks, which typically ends in a partially filled block. Therefore, there could be wasted storage for each cell (not each block), proportional to the size of the storage block. New blocks are created only when the previous block is full (or nearly full). If rows are deleted and the cell is empty, the database manager can reuse the space and avoid the need for a reorganization (for space reclamation). If the number of cells in the table is very large, the storage waste is large. If the MDC design is poor and results in a huge number of cells, the table storage requirement expands dramatically, and MDC can also be a performance detriment. However, when designed well, MDC tables are only slightly larger than non-MDC tables, and offer profound benefits for clustering and roll-in and roll-out of data (as discussed in the paragraphs that follow). The key is to use low-cardinality columns for the dimensions of an MDC table.

4 The term coarsification refers to a mathematical expression used to reduce the cardinality (the number of distinct values) of a clustering dimension. A common example of a coarsification is the date, where coarsification could be by date, week of the date, month of the date, or quarter of the year.

Figure 7 shows storage block and cell allocation. As shown, each cell contains a set of storage blocks. Most of the blocks are filled with data, but for each cell there is a block at the end of the chain which is partially filled to a lesser or greater degree.

Figure 7. MDC storage by cell

If you have sample or actual data, using SQL, you can measure the number of expected MDC cells for any given potential MDC design, as follows:

SELECT COUNT(*) FROM (SELECT DISTINCT COL1, COL2, COL3 FROM MY_FAV_TABLE) AS NUM_DISTINCT;

COL1, COL2, and COL3 represent the MDC dimensions for a 3-dimensional MDC table. The resulting number multiplied by the extent size of the table will give you an upper bound on the extent growth (not size) of the table when converted to MDC.
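The same query can carry the multiplication described above, and it can evaluate a coarsified dimension before any generated column exists. The following is a sketch under stated assumptions: the table SALES_HISTORY and its columns are hypothetical, and the extent is assumed to be 32 pages of 16 KB each (512 KB per block). Substitute the values for your own table space.

```sql
-- Hypothetical table and columns.
-- Assumed block size: 32 pages x 16 KB = 512 KB.
SELECT COUNT(*)                          AS NUM_CELLS,
       BIGINT(COUNT(*)) * 32 * 16 / 1024 AS MAX_GROWTH_MB
FROM (SELECT DISTINCT INTEGER(SALE_DATE) / 100 AS SALE_MONTH,
                      REGION
      FROM SALES_HISTORY) AS CELLS;
```

MAX_GROWTH_MB corresponds to one block per cell, which is the upper bound on growth discussed above. Comparing it with the current size of the table gives a quick sense of whether a candidate design stays within a tolerable growth percentage.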

As described in the previous section, another key value of MDC is that the DB2 database manager automatically creates indexes for MDC tables over the MDC dimensions of the table. These special indexes (called block indexes) index data by block instead of by row. This results in associated run time performance benefits for queries and minimal overhead for INSERT, UPDATE and DELETE operations.

MDC provides features that facilitate the roll-in and roll-out of data:

o MDC has much less block index I/O during the roll-in process because the block index is only updated once when the block is full (not for every row inserted).

o Inserts are also faster because MDC reuses existing empty blocks without the need for index page splitting.

o Locking is reduced for inserts because they occur at a block level rather than at a row level.

o There is no need to REORG data after roll-in and roll-out.

MDC storage scenario

You want to create an MDC table for a Transaction Fact table on Date, Product Name, and Region. Here are some variables to consider for the MDC creation:

• There are 365 days in a year

• There are 100,000 products for company XYZ

• There are 10 regions for company XYZ

Initial MDC creation

If the MDC table were created strictly on the Date, Product, and Region columns, there would be 1,000,000 new cells created daily (1 x 100,000 x 10) and 365 million cells per year (1,000,000 x 365).

In regions where transactions are low, there will be many sparse pages, and even empty pages. Allocating so many cells, each with at least one block, to hold so little data wastes a large amount of space.

Improving the creation of the MDC

Use functions to coarsify and limit MDC cardinality. For example:

• If you use the month function on the Date, you would have 12 results per year

• If you substring the Product name to pick the first character of the Product name, you could have 26 potential results

• Leave Region as is with 10 results

Using the recommendation in this scenario, every year, the MDC would have 12*26*10 = 3120 cells or about 8-9 cells per day. This would eliminate the sparsity of data on many of the pages, and provide a reasonable cardinality for the MDC to be effective in providing a performance benefit.
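The scenario can be sketched in SQL as follows. The table name, column names, and data types are hypothetical; only the three coarsification choices (month of the date, first character of the product name, region unchanged) come from the scenario above.

```sql
-- Cell counts per year: uncoarsified design vs. coarsified design.
-- Returns 365000000 and 3120.
VALUES (365 * 100000 * 10, 12 * 26 * 10);

-- Coarsification of a DATE with INTEGER(date)/100 (used elsewhere in this paper).
-- Returns 200603.
VALUES (INTEGER(DATE('2006-03-03')) / 100);

-- Hypothetical fact table using the coarsified dimensions from the scenario.
-- Assumes product names are never empty, so SUBSTR(..., 1, 1) is always valid.
CREATE TABLE TRANSACTION_FACT (
  TXN_DATE        DATE          NOT NULL,
  PRODUCT_NAME    VARCHAR(60)   NOT NULL,
  REGION          SMALLINT      NOT NULL,
  AMOUNT          DECIMAL(12,2),
  TXN_MONTH       INTEGER GENERATED ALWAYS AS (MONTH(TXN_DATE)),
  PRODUCT_INITIAL CHAR(1) GENERATED ALWAYS AS (SUBSTR(PRODUCT_NAME, 1, 1))
) ORGANIZE BY DIMENSIONS (TXN_MONTH, PRODUCT_INITIAL, REGION);
```

Two caveats on this sketch. The figure of 26 values assumes every product name starts with a letter of one case; digits or mixed case raise the count. Also, MONTH() repeats every year, so range predicates on TXN_DATE might not translate into block elimination on TXN_MONTH the way they can with an expression that only increases with the date, such as INTEGER(TXN_DATE)/100.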

MDC run time overhead and benefit considerations

MDC is designed to provide large performance benefits for queries and improvement for many DELETE scenarios. Even so, MDC tables do incur overhead over non-clustered tables, while offering significant performance benefits over tables that are clustered using a clustering index. Consider first the overhead of MDC versus an unclustered table:

• INSERT operations on a non-clustered table access each index to add a reference to the inserted row. In contrast, INSERT on an MDC table requires an initial read to the MDC composite block index in order to determine to which cell and block the row belongs, followed (after the insert on the table) by access to each index in order to insert a reference to the row. (Clustering indexes incur a similar overhead).

• If the MDC table includes a generated column to coarsify one of the dimensions, every INSERT will incur a small processing overhead to compute the generated value for that column as all generated columns in DB2 are fully materialized, that is, calculated and stored within the row.

However, when compared to a table clustered with the use of a clustering index, MDC offers significant performance advantages:

• Index maintenance is dramatically reduced during INSERTs compared to the processing required for a clustering index, as the DB2 database manager only updates the block index when the first key is added to a block—unlike a RID-index where every single inserted row to the table requires an update to all indexes. That is, if there are 1000 rows per block, the rate of index updates is 1/1000th what it would be for a RID index.

• The index update is cheaper, because the index is smaller and therefore has fewer levels in the tree. Fewer levels in the B+-tree means less processing to determine the target leaf page for the index entry.

In both cases, whether clustered by a clustering index or by MDC, the DB2 database manager will access the index (the clustering index or the block index) during INSERT to determine the target location of the row. Again, the block index is much smaller, and the height of its tree is usually shorter, resulting in a faster search.

Determining when to use MDC versus a clustering index

MDC provides huge value over a clustering index because the clustering is guaranteed and automatic. In general you can achieve cluster ratios with MDC anywhere between 93% and 100%, depending on the coarsification needed. In contrast, clustering indexes can cluster data close to 100% initially, but the data becomes declustered over time and might require a time-consuming REORG to recluster it. In general, use MDC to create and maintain data clustering in your database unless:


• MDC would require coarsification and you are unable to add a generated column to your table.

• The MDC version of the table results in table growth you are unable or unwilling to incur. Well-designed MDC tables are typically 2-15% larger than non-MDC tables.

• You find that MDC clustering will give you a lower cluster ratio (for example, 93%) due to coarsification and you are willing to incur the periodic REORG processing in order to get the improved clustering that can be achieved with a clustering index.
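When weighing these exceptions, it can help to see how well an existing clustering index is holding up. The following is a minimal sketch that reads the cluster statistics from the catalog; the table name SALES is hypothetical, and the numbers are only as fresh as the last RUNSTATS.

```sql
-- Hypothetical table name; requires current statistics.
-- CLUSTERRATIO is a percentage (0-100), or -1 when detailed index
-- statistics were collected, in which case CLUSTERFACTOR (0-1) is set.
SELECT INDNAME, INDEXTYPE, CLUSTERRATIO, CLUSTERFACTOR
FROM SYSCAT.INDEXES
WHERE TABSCHEMA = CURRENT SCHEMA
  AND TABNAME = 'SALES';
```

A value that drifts downward between reorganizations is the declustering described above, and it is the cost to compare against the storage growth of an MDC version of the table.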

Use the following MDC design best practices:

• Start your selection for MDC candidates by looking for columns that are used as predicates for equality, inequality, range, and sorting. To improve roll-in of data, your dimension should match your roll-in range.

• Strive for density! Remember, an extent is allocated for every existing cell—regardless of the number of rows in that cell. To leverage MDC with optimal space utilization, strive for densely filled blocks.

• Constrain the number of cells in an MDC design. Keep the number of cells reasonably low to limit how much additional storage the table will require when converted to MDC form. 5% to 10% growth for any single table is a reasonable goal. (See the discussion on MDC cells in the “Benefits of using MDC” section.) There are exceptions, where even double the amount of growth is useful, but they are rare.

Note: Block indexes are usually so small as a percentage of the corresponding table size that, in most cases, you can ignore the storage required for them.

• Coarsify some dimensions to improve data density. Use generated columns to create coarsifications of a table column that have much lower column cardinality. For example, create a column on the month-of-year part of a date column, or use (INT(colname))/100 to convert a DATE column with the format Y-M-D to Y-M. For example,

CREATE TABLE Sales (SALES_DATE DATE, REGION CHAR(12), PRODUCT CHAR(30), … MONTH GENERATED ALWAYS AS (INTEGER(SALES_DATE)/100) … ) ORGANIZE BY (MONTH, REGION, PRODUCT)

For the query:


select * from sales where sales_date > '2006-03-03' and sales_date < '2007-01-01' ..

The compiler generates the additional predicates:

month>=200603 and month<=200701

To reduce wasted space, specify a small table space extent size, which reduces your MDC Block Size.

• Don’t select too many dimensions. It is very rare to find useful designs that have more than three MDC dimensions without unreasonable storage requirements.

The more dimensions you have, the more the cardinality of cells will increase exponentially. This makes it extremely hard to constrain the expansion of the MDC table to the design goal of approximately 10% (versus a non-MDC table). If the table expands unreasonably (for example, more than two times its non-MDC size) not only will you require more storage, but the gains of clustering might be lost due to the increase in doing I/O on partially filled blocks.

A simple example: Consider a table with three dimensions worth clustering on, each with 10,000 unique values. If these columns have no correlation between them, then clustering on all three dimensions without coarsification would result in 10,000 x 10,000 x 10,000 cells, with a partially filled block per cell. If each block is 1MB, the overhead from this careless design would be around 500,000 TB!

• Consider single-dimensional MDC. Single-dimensional MDC can still provide massive benefits compared to those of traditional single dimensional clustered indexes. The reasons are that:

o Clustering is guaranteed.

o MDC tables are indexed by block and not by row, resulting in indexes that are roughly 1/1000 the size of traditional row-based indexes.

o DELETE performance using MDC roll-out is improved. RID indexes on MDC are updated asynchronously with DB2 9.5.

o MDC facilitates roll-in of data.

o Use single-dimensional MDC (with coarsification if needed) to enforce clustering instead of using a clustering index. Clustering indexes cluster data on a best effort basis (there are no guarantees of how well they cluster), and over time, they tend to become unclustered. In contrast, MDC guarantees clustering, avoiding the need to reorganize data. (See the coarsification example in the “MDC storage scenario” section.)


• Be prepared to tinker (on a test database). It might take trial and error to find an MDC design that works really well. Use the DB2 Design Advisor with the -m C option (C for clustering search). You can also use the db2mdcsizer utility, which determines space requirements and simplifies administration of MDC tables. This utility is available on AlphaWorks for certain versions of DB2 products. MDC modifications will not impact your application programs.

• Use the MDC selection capability of the DB2 Design Advisor with a representative workload to find suitable MDC dimensions for an existing table.


Database partitioning (shared-nothing hash partitioning) best practices

Database partitioning is a technique for horizontally distributing rows in the database across many database instances that work together to form a single large database server. These instances can be located within a single server, across several physical machines, or a combination. In DB2 products, this is called the Database Partitioning Facility (DPF).

Database partitioning allows the DB2 database manager to scale to hundreds of instances that participate in the larger database system. The scalability of this design can approach near linear scaleout for many complex query workloads. As such, database partitioning has become extremely popular for data warehousing and BI workloads due to its near linear scaleout characteristics and its ability to scale to hundreds of terabytes of data and hundreds of CPUs. The architecture is less popular for OLTP processing due to the inter-instance communication incurred on each transaction, which though small, can still be very significant for short running transactions typically found in OLTP workloads. DPF might be used for OLTP applications that require a cluster of computers for throughput.

Shared-nothing hash partitioning hashes rows to logical data partitions. The primary design goal of hash distribution is to ensure the even distribution of data across all logical nodes (as range partitioning tends to skew data). These partitions might reside within a single server or be distributed across a set of physical machines, as shown in Figure 9:

Figure 9. Table hash-partitioning


The scalability of shared-nothing databases has proven to be nearly linear for a wide range of complex query workloads. Also, the modular nature of the design lends itself to linear scaleout as storage pressures, workload pressures, or both grow. As a result, shared-nothing architectures have dominated data warehousing for the past decade. Database partitioning is implemented without impact on existing application code, and is completely transparent. Partitioning strategies can be modified online with the redistribution utility without affecting application code.

The primary design choice is determining which columns make up the database-partitioning key that is used to hash partition each table. The goals are twofold:

1. Distribute data evenly across database partitions. This requires choosing partitioning columns that have a high cardinality of values to ensure an even distribution of rows across the logical partitions.

2. Minimize shipping of data across database partitions during join processing. Collocation of rows being joined will occur (avoiding movement) if the partitioning key is included in the join predicates of the WHERE clause.
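Both goals can be sketched with two hypothetical tables that share a distribution key. The table and column names are illustrative only, and the sketch assumes a partitioned database in which both tables are created in table spaces belonging to the same database partition group.

```sql
-- Hypothetical tables distributed on the same high-cardinality column.
CREATE TABLE CUSTOMER (
  CUST_ID INTEGER     NOT NULL,
  NAME    VARCHAR(60)
) DISTRIBUTE BY HASH (CUST_ID);

CREATE TABLE ORDERS (
  ORDER_ID   BIGINT        NOT NULL,
  CUST_ID    INTEGER       NOT NULL,
  ORDER_DATE DATE,
  AMOUNT     DECIMAL(12,2)
) DISTRIBUTE BY HASH (CUST_ID);

-- Goal 1: check how evenly the rows are spread across database partitions.
SELECT DBPARTITIONNUM(CUST_ID) AS DBPARTITION,
       COUNT(*)                AS ROW_COUNT
FROM ORDERS
GROUP BY DBPARTITIONNUM(CUST_ID)
ORDER BY 1;

-- Goal 2: the join predicate is on the distribution key of both tables,
-- so matching rows already reside on the same database partition.
SELECT C.NAME, SUM(O.AMOUNT) AS TOTAL
FROM CUSTOMER C
JOIN ORDERS O ON O.CUST_ID = C.CUST_ID
GROUP BY C.NAME;
```

If the row counts in the first query differ widely between partitions, the key is skewed and a different column (or a function on a column) is worth evaluating.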

Another central problem in designing shared-nothing data warehouses is determining the best combinations of memory, CPUs, buses, storage capacity, storage bandwidth, and networks. How much or how many do you need of each of these?

To help solve this problem, IBM provides the IBM Balanced Warehouse™, which is based on DB2 database system’s shared nothing architecture. It was developed through IBM best practices used for successful client implementations.

Balanced Warehouse and Balanced Configuration Units (BCU)

The Balanced Warehouse combines building blocks known as Balanced Configuration Units (BCU). These building blocks are preconfigured, pre-tested, and tuned for performance to provide an ideal volume and ratio of system resources. The BCU combines the best practices for database configuration and hardware components to greatly simplify warehouse setup and deployment. Scores of best practices for resource ratios and database configuration have been incorporated into the Balanced Warehouse.

Figure 10 shows the various Balanced Warehouse offerings for 2007 and 2008 (see footnote 5). The Balanced Warehouse currently offers three classes of offerings: C, D, and E. These three classes offer increasing power and scalability. The C class is an entry-level offering, intended for SMB markets or systems integrators, that can be contained in a single server. D and E class offerings scale out to much larger configurations using DB2 database partitioning capabilities.

5 For an up-to-date version of the Balanced Warehouse offerings refer to the Balanced Warehouse web pages online at: http://www.ibm.com/software/data/infosphere/balanced-warehouse/


Figure 10. Balanced Warehouse offerings (see footnote 6), 2007-2008

Use the following database partitioning best practices:

• Select partitioning keys that have a large number of values (high cardinality) to ensure even distribution of rows across partitions. Unique keys are good candidates. If you are having a difficult time finding a key that can distribute data evenly across partitions, you might want to consider using a function on a column.

• Avoid choosing a partitioning key with a column that is updated frequently; this could incur additional overhead on the update to repartition the row to another partition.

• If possible, as your partitioning key, try to choose a column that has a simple datatype, such as fixed-length character or integer. The hashing performance can benefit from doing this versus selecting a complex datatype.

• To increase collocation, consider using the join column as the partitioning key for a table that is frequently joined (provided that the columns have high cardinality to satisfy the even distribution of rows). Select the minimum number of columns required to achieve high cardinality and even distribution of rows in the partitioning key. Reducing the number of columns in the partitioning key improves the likelihood that the column will be in the join predicates (improving the odds of collocation).

6 Prices reflected in the “Estimated Cost” in this table are current as of May 2008, exclude applicable taxes, and are subject to change by IBM without notice.

• Ensure that unique indexes are a superset of the partitioning key.

• Use replicated MQTs for small tables (tables that are less than 3% of the total database size, or less than 5% of the largest table size are a reasonable rule of thumb) or infrequently updated tables in order to:

o Improve collocation and reduce movement over the network

o Assist in the collocation of joins

o Improve performance of frequently executed joins in a partitioned database environment by allowing the database to manage precomputed values of the table data.

For example:

CREATE TABLE R_EMPLOYEE AS (
SELECT EMPNO, FIRSTNME, MIDINIT, LASTNAME, WORKDEPT
FROM EMPLOYEE )
DATA INITIALLY DEFERRED REFRESH IMMEDIATE
IN REGIONTABLESPACE
REPLICATED;

To update the content of the replicated MQT, run the following statement:

REFRESH TABLE R_EMPLOYEE;

Note: After using the REFRESH statement, you should run RUNSTATS on the replicated table as you would on any other table.

• Collocate the largest dimension-table’s key as the partition key for the fact table, considering the number of distinct values and skew within the corresponding fact-table column.

• Replicate small (less volatile) dimension tables, where “small” is relative and depends on the installation’s available storage.

• Replicate a horizontal or vertical subset of dimensions that don’t match the partitioning key, as follows:

o Partition any remaining dimensions on their primary key (PK).


o After creating a replicated table to improve collocation, remember to collect table and index statistics (or use the DB2 automatic statistics collection feature). Remember to implement the same indexes on the replicated MQTs as you have defined on the base table(s).

o Define replicated MQTs as REFRESH IMMEDIATE if they are small and rarely updated. Try to limit the number of parallel ETL jobs executing when REFRESH IMMEDIATE is specified. A deferred refresh strategy provides less overhead for updates of the base table.

• Distribute large tables across several partitions. Small tables with fewer than one million rows should be located on one database partition only.
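The replicated-MQT practices above (deferred refresh, matching indexes, and statistics after refresh) might be combined as in the following sketch. The schema MYSCHEMA, the trimmed-down EMPLOYEE table, and the index names are hypothetical, and the statements assume a partitioned database environment. ADMIN_CMD is used here only so that RUNSTATS can be issued from an SQL interface.

```sql
-- Hypothetical base table, distributed on its key.
CREATE TABLE MYSCHEMA.EMPLOYEE (
  EMPNO    CHAR(6)     NOT NULL,
  LASTNAME VARCHAR(15) NOT NULL,
  WORKDEPT CHAR(3)
) DISTRIBUTE BY HASH (EMPNO);

CREATE INDEX MYSCHEMA.IX_EMP_DEPT ON MYSCHEMA.EMPLOYEE (WORKDEPT);

-- Replicated MQT with a deferred refresh strategy,
-- which adds less overhead to updates of the base table.
CREATE TABLE MYSCHEMA.R_EMPLOYEE AS (
  SELECT EMPNO, LASTNAME, WORKDEPT
  FROM MYSCHEMA.EMPLOYEE )
  DATA INITIALLY DEFERRED REFRESH DEFERRED
  REPLICATED;

-- Populate the replica.
REFRESH TABLE MYSCHEMA.R_EMPLOYEE;

-- Implement the same index on the replica as on the base table.
CREATE INDEX MYSCHEMA.IX_R_EMP_DEPT ON MYSCHEMA.R_EMPLOYEE (WORKDEPT);

-- Collect table and index statistics on the replica.
CALL SYSPROC.ADMIN_CMD(
  'RUNSTATS ON TABLE MYSCHEMA.R_EMPLOYEE WITH DISTRIBUTION AND INDEXES ALL');
```

With REFRESH DEFERRED the replica is only as current as the last REFRESH TABLE, so the refresh and the statistics collection would typically be scheduled after each batch of changes to the base table.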


Table (range) partitioning best practices

Table partitioning should be used predominantly to facilitate improved roll-in and roll-out of data. It enables an administrator to add a large range of data (such as a new month of data) to a table, en-masse, and perhaps more importantly it allows an administrator to remove data from a table, or from the database, en-masse, almost in an instant (without data movement).

DB2 database systems' unique asynchronous index-cleanup technology means that even while using global indexes that index data across several range partitions, a range can be detached from the table, and the index keys associated with that range become immediately invisible to incoming queries. The keys are subsequently deleted quietly in a background process with negligible impact to the executing database workload.

Table partitioning also offers side benefits of increased query performance through an internal process called partition elimination, which, in many cases, enables the query compiler to select improved execution plans. This is a secondary benefit of table partitioning.

Furthermore, table partitioning enables the division of a table into several ranges that are stored in one or more physical objects within a database logical partition. The goal of table partitioning is to logically organize data to facilitate optimal data access and the roll-out of data. The division of the table into ranges is transparent to the application, and can therefore be designed at any point in the application development cycle.

See “Best Practices: Data Life Cycle Management” white paper for more details on table partitioning. Other attributes and features of table partitioning include the following ones:

• Each range can be in a different table space

• Ranges can be scanned independently

• Performance for certain BI-style queries is improved through partition elimination

• New ALTER ATTACH/DETACH statements for easier roll-in and roll-out of data:

o New ATTACH operation for roll-in

o New DETACH operation for roll-out

• SET INTEGRITY is now online (allowing read/write access to older data)

• For new ranges, ADD plus LOAD operations can be used over ATTACH plus SET INTEGRITY operations


The following example shows how to define a partitioned table:

CREATE TABLE SALES(SALE_DATE DATE, CUSTOMER INT, …)
PARTITION BY RANGE(SALE_DATE)
(STARTING '1/1/2006' ENDING '3/31/2006',
STARTING '4/1/2006' ENDING '6/30/2006',
STARTING '7/1/2006' ENDING '9/30/2006',
STARTING '10/1/2006' ENDING '12/31/2006');

This statement results in the creation of four table objects, each one of which stores a range of data, as shown in Figure 8:

Figure 8. Table partitioning by date range
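Roll-in with ATTACH and roll-out with DETACH, as listed among the table partitioning features, might look like the following sketch. The partition names, the staging and archive table names, and the AMOUNT column are hypothetical; the staging table is assumed to be populated before it is attached.

```sql
-- Hypothetical range-partitioned table with named partitions.
CREATE TABLE SALES (
  SALE_DATE DATE          NOT NULL,
  CUSTOMER  INTEGER,
  AMOUNT    DECIMAL(12,2)
) PARTITION BY RANGE (SALE_DATE)
  (PARTITION Q1_2006 STARTING '2006-01-01' ENDING '2006-03-31',
   PARTITION Q2_2006 STARTING '2006-04-01' ENDING '2006-06-30');

-- Roll-in: a staging table with matching columns, loaded beforehand.
CREATE TABLE SALES_Q3_2006 (
  SALE_DATE DATE          NOT NULL,
  CUSTOMER  INTEGER,
  AMOUNT    DECIMAL(12,2)
);

ALTER TABLE SALES
  ATTACH PARTITION Q3_2006
  STARTING '2006-07-01' ENDING '2006-09-30'
  FROM SALES_Q3_2006;

-- Validate the attached rows and make them visible;
-- existing data stays available for read and write access.
SET INTEGRITY FOR SALES ALLOW WRITE ACCESS IMMEDIATE CHECKED;

-- Roll-out: detach the oldest range into its own table, without data movement.
ALTER TABLE SALES
  DETACH PARTITION Q1_2006 INTO SALES_Q1_2006_ARCHIVE;
```

After the DETACH, SALES_Q1_2006_ARCHIVE is an ordinary table that can be archived or dropped independently of SALES.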

Use the following table partitioning best practices:

• Use table (range) partitioning to rapidly delete (roll-out) ranges of data. Match range-partitioning periods to roll-in and roll-out ranges. For example, if you need to roll-in and roll-out data by month, range partitioning by month is a reasonable strategy.

• Partition on DATE columns. Roll-in and roll-out scenarios are almost always based on dates. Partition elimination (see footnote 7) also improves query execution plan (QEP) selection, and a significant set of those opportunities are based on date predicates.

• Limit the number of ranges. Remember that each range is a table object with a minimum of two extents. Avoid designs with an excessive number of partitions. A rule of thumb is at least 50MB of data in each range (several gigabytes of data per range is best). Make the size of your ranges match the size you typically roll-out.

7 Partition elimination improves your SQL workload performance. Partition Elimination is a strategy used internally by the query compiler. The query compiler automatically determines if it can exploit the table partitioning for this purpose. Typically dates can satisfy the roll-out requirement and often provide partition elimination benefits to many queries.


• When adding new ranges, ADD table partition with a LOAD operation is often faster than the ATTACH of a partition with subsequent SET INTEGRITY operations.

o The LOAD utility has an option to maintain indexes incrementally, and to write only a single log record for the event, regardless of how many rows are inserted into the table. Although the LOAD utility supports concurrent read access to older data, queries need to be drained.

• Consider separating table partitions in separate table spaces to facilitate backup and recovery. Table partitions (ranges) can be backed up and restored by table space.

• Place global indexes, which can be large, in their own individual table space. Placing all the global indexes in a single table space can impact the elapsed time of the BACKUP utility (because the index table space can become much larger than the data table spaces).

• Ensure and maintain the clustering of data by making the range-partitioning key the leading column in a clustered index (no MDC). Data will not be clustered properly if your clustered index is not prefixed by your partition key. For example,

PARTITION BY RANGE (Month, Region)
CREATE INDEX … (Month, Region, Department) CLUSTER

• Use page-level sampling to reduce RUNSTATS time. A sampling rate of 10% to 20% provides good quality statistics with a major performance improvement. For details, see “Best Practices: Writing and Tuning Queries for Optimal Performance” white paper.

• Place table partitions in different table spaces; this allows you to backup new ranges as data is rolled in to the new range, without having to backup the other partitions. This greatly improves the speed and reduces the size of backup images.
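Several of the points above (one table space per range, a clustering index prefixed by the range-partitioning key, and page-level sampling for RUNSTATS) can be sketched together as follows. This is a minimal illustration, not a recommended design: the schema `myschema`, the table `sales`, its columns, and the table spaces `ts_2008q1` and `ts_2008q2` are hypothetical names, and the table spaces are assumed to exist already.

```sql
-- Hypothetical range-partitioned table; each range is placed in its own
-- table space so that it can be backed up and restored independently.
CREATE TABLE myschema.sales (
    sale_date DATE          NOT NULL,
    region    INTEGER       NOT NULL,
    amount    DECIMAL(12,2)
)
PARTITION BY RANGE (sale_date) (
    PARTITION p2008q1 STARTING FROM ('2008-01-01') ENDING AT ('2008-03-31') IN ts_2008q1,
    PARTITION p2008q2 STARTING FROM ('2008-04-01') ENDING AT ('2008-06-30') IN ts_2008q2
);

-- Clustering index whose leading column is the range-partitioning key.
CREATE INDEX myschema.ix_sales_cluster
    ON myschema.sales (sale_date, region) CLUSTER;

-- Page-level (SYSTEM) sampling of roughly 10% of the pages to shorten RUNSTATS.
CALL SYSPROC.ADMIN_CMD(
    'RUNSTATS ON TABLE myschema.sales WITH DISTRIBUTION TABLESAMPLE SYSTEM (10)'
);
```

The quarterly ranges here are only an example; choose range sizes that match what you typically roll out.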


UNION ALL View (UAV) partitioning best practices

Prior to the availability of DB2 9 table partitioning, applications often had a requirement to partition data by ranges. By creating a table for each range with the appropriate constraints, DBAs were able to provide a single system view by creating a UAV over all the tables. For example:

CREATE TABLE TestQ1 (dt DATE);
ALTER TABLE TestQ1 ADD CONSTRAINT q1_chk CHECK (MONTH(dt) IN (1, 2, 3));

Repeat the table create/constraint for each quarter, and then create the view:

CREATE VIEW Test AS SELECT * FROM TestQ1 UNION ALL SELECT * FROM TestQ2;

Table partitioning provides a single view of the table to the compiler and optimizer. This allows more aggressive predicate push-down to the different ranges than UAV, and a more consistent model for partitioning data. Table partitioning is the recommended method for implementing range-based partitioning for most application requirements.

NOTE: UAVs are not a parallel processing method for dividing work across CPUs. The DB2 Database Partitioning Facility (DPF) should be used for that purpose (see the discussion on “Database partitioning”).

As with table partitioning, you can use UAV to store ranges of data in distinct table spaces, providing granularity for BACKUP operations (see the discussion on “Table partitioning”).

The advantages of the UAV design predominantly revolve around the ability to operate on some ranges independent of others, or to design some ranges with unique attributes. Conversely, table partitioning provides a homogeneous view of a range-partitioned table. Although table partitioning is generally preferred, there are advantages to UAVs:

• For replication: Historical tables in UAVs can be compressed. (Use UAVs when replication is needed on certain ranges of data, while other ranges that do not require replication can benefit from compression.)

• UAVs are utilized to reduce the granularity of utility operations (such as REORG and RUNSTATS). Utilities can operate on a given table containing a range. NOTE: REORG is commonly the most important of these. This is valuable when ranges are changing frequently requiring reclustering or recompression of a range. UAVs allow this operation to be performed on the subset of ranges that require it. DB2 9.5 has automatic dictionary rebuild for table partitioning, alleviating the need to REORG a new range for compression.

• Heavily used ranges can be isolated into separate tables containing additional indexes or MQTs to optimize data access.

• UAVs provide end users with a single view of federated data (stored in multiple IBM or non-IBM databases). A UAV can provide a single view of data across several databases.

Table partitioning provides the following advantages over the UAV partitioning approach:

• Preparation time is faster (one table instead of multiple tables in a view)

• Simpler management (one table, not multiple tables)

• Less catalog locking for roll-in and roll-out of ranges

• Unique indexes across all ranges supported

• Better handling of complex queries

• Simpler EXPLAINs (using the explain facility)

Migrating UAVs to table partitioning

The migration of UAVs to table partitioning can be achieved without data movement by following this procedure:

1. Create a partitioned table with a single dummy partition and with a range that does not interfere with existing ranges. This requires the same page size and extent size.

2. Use ALTER TABLE ... ATTACH PARTITION to attach all tables in the UAV.

3. Drop the dummy partition.

4. Run SET INTEGRITY after all TABLE ATTACH commands. To speed up set integrity:

a) Drop all indexes.

b) Recreate indexes after SET INTEGRITY completes.
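The numbered steps above can be sketched roughly as follows. This is an illustration under stated assumptions rather than a tested migration script: `TestPart` and `TestDummy` are hypothetical names, `TestQ1` and `TestQ2` stand for existing UAV branch tables that each hold one quarter of a single year in a `dt DATE` column, and the date ranges are invented for the example. The statements follow the order of the steps as listed; verify any ordering restrictions between DETACH and SET INTEGRITY against the documentation for your release.

```sql
-- Step 1: partitioned table with one dummy range that does not overlap real data.
CREATE TABLE TestPart (dt DATE)
    PARTITION BY RANGE (dt) (
        PARTITION dummy STARTING FROM ('1900-01-01') ENDING AT ('1900-01-31')
    );

-- Step 2: attach each existing branch table as a partition (no data movement).
ALTER TABLE TestPart
    ATTACH PARTITION q1 STARTING FROM ('2008-01-01') ENDING AT ('2008-03-31')
    FROM TestQ1;
ALTER TABLE TestPart
    ATTACH PARTITION q2 STARTING FROM ('2008-04-01') ENDING AT ('2008-06-30')
    FROM TestQ2;

-- Step 3: detach the dummy partition into a stand-alone table, then drop it.
ALTER TABLE TestPart DETACH PARTITION dummy INTO TestDummy;
DROP TABLE TestDummy;

-- Step 4: validate the attached data and make it visible.
SET INTEGRITY FOR TestPart IMMEDIATE CHECKED;
```

Once the partitioned table is validated, the original view can be dropped and, if applications depend on its name, replaced by an alias or a view over the new table.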

Use the following UAV partitioning best practices:

• Use database partitioning to achieve scalability, rather than UAVs.

• As with table partitioning, use UAVs in order to place ranges of data in distinct table spaces, improving BACKUP granularity.


Recommendation: Migrate UAVs to table partitioning, taking the following considerations into account:

• Newly developed applications with range-partition requirements should be implemented with table partitioning rather than with UAVs, unless you have strong requirements for one or more of the UAV advantages listed above.

• UNION ALL applications being migrated to table partitions utilizing deep compression should be implemented with DB2 9.5 in order to benefit from automatic dictionary compression.


Database partitioning, table partitioning, and MDC in the same database design best practices

Database partitioning, table partitioning, and MDC can be implemented simultaneously in the same design.

• Database partitioning can be implemented to help achieve scalability and to ensure the even distribution of data across logical partitions.

• Table Partitioning can be implemented to facilitate Query Partition Elimination and roll-out of data.

• MDC can be implemented to improve Query Performance and facilitate the roll-in of data.

This is a best practice approach for deploying large scale applications.

For example:

CREATE TABLE TestTable (A INT, B INT, C INT, D INT …)
  IN Tablespace A, Tablespace B, Tablespace C …
  INDEX IN Tablespace B
  DISTRIBUTE BY HASH (A)
  PARTITION BY RANGE (B) (STARTING FROM (100) ENDING (300) EVERY (100))
  ORGANIZE BY DIMENSIONS (C,D)

See the “Best Practices: Data Life Cycle Management” white paper for more details.

To deploy large scale applications, implement database partitioning, table partitioning, and MDC in the same database design.


Roll-in and roll-out of data with table partitioning and MDC best practices

Design your partitioning strategy to use table partitioning for your roll-out strategy and to use MDC on a single dimension for your roll-in strategy.

For example, if you roll-in daily and roll-out monthly, specify an MDC on day and a Table Partition Key for month (calculated values are supported).

This approach reduces the number of table partitions and eases the DBA administrative tasks. It takes advantage of the roll-in features of MDC: reduced index I/O with block indexes and reduced logging.


Use table partitioning for roll-out, and MDC on a single dimension for roll-in.
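A minimal sketch of the daily roll-in, monthly roll-out design described above might look like this. All names (`myschema`, `daily_sales`, the columns, and the partition names) are hypothetical, and the calculated month column uses the `INTEGER(date) / 100` expression, which yields a yyyymm value.

```sql
CREATE TABLE myschema.daily_sales (
    sale_date  DATE          NOT NULL,
    amount     DECIMAL(12,2),
    -- Calculated table-partitioning column: yyyymm derived from the date.
    sale_month INTEGER GENERATED ALWAYS AS (INTEGER(sale_date) / 100)
)
-- One range per month, used for roll-out.
PARTITION BY RANGE (sale_month) (
    PARTITION p200801 STARTING FROM (200801) ENDING AT (200801),
    PARTITION p200802 STARTING FROM (200802) ENDING AT (200802)
)
-- Single MDC dimension on day, used for roll-in.
ORGANIZE BY DIMENSIONS (sale_date);

-- Monthly roll-out: detach the oldest month into a stand-alone table.
ALTER TABLE myschema.daily_sales
    DETACH PARTITION p200801 INTO myschema.daily_sales_200801;
```

With this layout each day's rows land in their own MDC blocks, while the number of table partitions grows by only one per month.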


Rolling-in large data volumes using table partitioning best practices

Applications that need to roll-in very large data volumes can speed up the table attachment process by ADDing rather than ATTACHing table partitions, which avoids the need to execute SET INTEGRITY.

There is an alternative to ATTACHing a table partition: you can use ALTER TABLE ... ADD PARTITION to add an empty partition to the partitioned table. After the empty partition has been added, you can populate it using the LOAD utility (with read access to older data) or using inserts (logged).

LOAD will help provide superior performance, and can load either from external files or from a query definition using the “LOAD from cursor” capability.

For applications utilizing Deep Compression, DB2 9.5 facilitates this technique for rolling-in data because it provides Automatic Dictionary Compression, avoiding the need to REORG in order to compress data.


Use the following roll-in and roll-out best practices:

• Use table partitioning to roll-out large volumes of data.

• Use ALTER TABLE ... ADD PARTITION to add an empty partition to the table and populate it using the LOAD utility when using table partitioning for roll-in of data.
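The ADD-then-LOAD approach can be sketched as follows. The table `myschema.sales` is assumed to be an existing table range-partitioned on a DATE column, and the partition name, the table space `ts_2008q3`, and the input file path are hypothetical. When LOAD is invoked through ADMIN_CMD, the input file must reside on the database server.

```sql
-- Add an empty range to the partitioned table; no SET INTEGRITY is required.
ALTER TABLE myschema.sales
    ADD PARTITION p2008q3 STARTING FROM ('2008-07-01') ENDING AT ('2008-09-30')
    IN ts_2008q3;

-- Populate the new range with LOAD while older data stays readable.
CALL SYSPROC.ADMIN_CMD(
    'LOAD FROM /data/sales_2008q3.del OF DEL INSERT INTO myschema.sales ALLOW READ ACCESS'
);
```

On a recoverable database, also consider the LOAD copy options (for example COPY YES or NONRECOVERABLE) so that the table space is not left in a backup-pending state.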


Materialized query table (MQT) best practices

An MQT is a table whose definition is based on the result of a query. The MQT contains pre-computed results. MQTs are a powerful way to improve response times for complex queries, especially queries that might require some of the following types of data or operations:

• Aggregated data over one or more dimensions

• Joins and aggregated data between tables in a group

• Data from a commonly accessed subset of data—that is, from a hot horizontal or vertical database partition

• Repartitioned data from a table, or part of a table, in a partitioned database environment

• Replicated MQTs can reduce network traffic for non-partitioned tables in a DPF environment

In addition to speeding up query performance, MQTs can be used on nicknames of federated data sources to maintain frequently accessed data locally. MQTs can be maintained with SQL or Q Replication (the system-maintained MQT option for Federated Nicknames is not supported).

MQTs are completely transparent to applications. Knowledge of MQTs is integrated into the SQL and XQuery compiler, which determines whether an MQT should be used to answer all or part of a query. As a result, you can create and drop MQTs, without making application code changes, much like you can create and drop indexes without making application code changes.
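As a small illustration of this transparency, the following sketch defines a deferred-refresh MQT over a hypothetical `myschema.orders` table and then runs a query that names only the base table. The table, column, and MQT names are assumptions made for the example; whether the compiler actually routes the query to the MQT depends on cost, statistics, and settings such as the refresh age.

```sql
-- Hypothetical base table.
CREATE TABLE myschema.orders (
    order_id INTEGER       NOT NULL PRIMARY KEY,
    region   INTEGER       NOT NULL,
    amount   DECIMAL(12,2) NOT NULL
);

-- MQT holding pre-computed aggregates per region.
CREATE TABLE myschema.orders_by_region AS (
    SELECT region, COUNT(*) AS order_count, SUM(amount) AS total_amount
    FROM myschema.orders
    GROUP BY region
)
DATA INITIALLY DEFERRED REFRESH DEFERRED;

-- Populate the MQT.
REFRESH TABLE myschema.orders_by_region;

-- Allow the optimizer to consider deferred-refresh MQTs for dynamic SQL.
SET CURRENT REFRESH AGE ANY;

-- The application query references only the base table.
SELECT region, SUM(amount) AS total_amount
FROM myschema.orders
GROUP BY region;
```

Dropping the MQT later requires no change to the query, in the same way that dropping an index would not.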

Figure 11 summarizes the characteristics of MQTs according to their refresh type. In the table, “Optimization” indicates that the DB2 database manager will exploit the deferred MQT where possible, when it processes a query, whereas, “No optimization” indicates that the MQT will not be looked at, since it could be arbitrarily stale; that is, the database manager does not know when the last refresh occurred against the MQT.

Note that MQTs can decrease INSERT performance of the base table.

To assist in problem determination, the DB2 9 explain facility indicates why an MQT was not chosen for an access path.


Figure 11 Summary of MQT characteristics by refresh type

Use the following MQT design best practices:

• Create an MQT by using the same or higher isolation level that is used by the queries for which you intend to use the MQT. The isolation levels, in order of descending restrictiveness, are RR, RS, CS, and UR.

• Focus on frequently-used queries that use a lot of resources. These queries provide the greatest opportunities for performance gains through MQTs.

• Set a limit on the number of MQTs that you are willing to maintain. There are two reasons for this:

o Each MQT uses storage space on disk and incurs additional UPDATE overhead.

o Each MQT adds complexity to the search for the optimal QEP, increasing query compilation time.


• Decide on a limit for the amount of disk space available for MQTs. Generally, do not allocate more than 10% to 20% of the total system storage of a data warehouse for MQTs.

• Consider indexing the MQTs and execute RUNSTATS after index creation. Try to create an MQT that is generally useful to multiple queries. Often such an MQT is not a perfect match for a query and might require indexing. Replicated MQTs should have the same indexing design as the base table.

• Help the query compiler find matching MQTs. (MQT routing is complex.) Give the compiler as much information as possible by using the following techniques:

o Keep statistics on the MQTs up-to-date.

o Use RI on foreign key (FK) columns in the MQT. (To avoid system overhead, specify non-enforced RI.) Make FK columns NOT NULL.

o Avoid problematic MQT designs that make routing difficult. Try to avoid using EXISTS, NOT EXISTS, and SELECT DISTINCT. Unless the MQT is an exact match for a query, these predicates can make it difficult for the query compiler to make use of the MQT.
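The techniques above for helping the compiler match MQTs can be sketched as follows. All object names (`myschema.store`, `myschema.store_sales`, `myschema.sales_by_store`, and the index and constraint names) are hypothetical; the sketch shows a NOT NULL foreign key column with non-enforced RI, an index on the MQT, and statistics collected after index creation.

```sql
CREATE TABLE myschema.store (
    store_id INTEGER NOT NULL PRIMARY KEY,
    region   INTEGER NOT NULL
);

CREATE TABLE myschema.store_sales (
    store_id INTEGER       NOT NULL,   -- FK column declared NOT NULL
    amount   DECIMAL(12,2) NOT NULL,
    -- Informational (non-enforced) RI: no checking overhead, but the
    -- optimizer may still use the relationship.
    CONSTRAINT fk_sales_store FOREIGN KEY (store_id)
        REFERENCES myschema.store (store_id)
        NOT ENFORCED ENABLE QUERY OPTIMIZATION
);

CREATE TABLE myschema.sales_by_store AS (
    SELECT store_id, COUNT(*) AS sale_count, SUM(amount) AS total_amount
    FROM myschema.store_sales
    GROUP BY store_id
)
DATA INITIALLY DEFERRED REFRESH DEFERRED;

REFRESH TABLE myschema.sales_by_store;

-- Index the MQT, then collect statistics so the optimizer can cost it.
CREATE INDEX myschema.ix_sales_by_store ON myschema.sales_by_store (store_id);

CALL SYSPROC.ADMIN_CMD(
    'RUNSTATS ON TABLE myschema.sales_by_store WITH DISTRIBUTION AND INDEXES ALL'
);
```

Because the constraint is not enforced, the application or load process remains responsible for keeping the foreign key values valid.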


Post-design tools for improving designs for existing databases

Explain facility best practices

The explain facility can show you whether design features are being used. For example, it can show you whether indexes are being accessed in a QEP, whether partition elimination is being used, and whether queries are being routed to MQTs.

Consider the fragment shown in Figure 12 of the QEP from the explain facility for query 20 of TPC-H 8.

Figure 12 Fragment of QEP for Query 20 of TPC-H

The QEP clearly shows that the information for PARTSUPP requires access to both the index TPCD.UXPS_PK2KSC and the PARTSUPP table itself. How can you determine the reason?

8 TPC-H: The TPC Benchmark™H (TPC-H) is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions.


Looking at operator (15) you can see that the FETCH operator requires access to the PARTSUPP table because the index includes the PS_PARTKEY and PS_SUPPKEY columns, but does not include the PS_AVAILQTY column. This strongly suggests that by adding the PS_AVAILQTY column to this index, you can avoid accessing the PARTSUPP table in the subplan, thereby improving performance.
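One way to act on this observation is sketched below. The index name `UXPS_PK_SK_AQ` is hypothetical, and the sketch assumes that (PS_PARTKEY, PS_SUPPKEY) is unique in PARTSUPP, as it is in the TPC-H schema, so that PS_AVAILQTY can be carried as an INCLUDE column. If the key were not unique, a regular index on all three columns would be the alternative.

```sql
-- Unique index on the key columns that also carries PS_AVAILQTY,
-- so the subplan can be satisfied by index-only access.
CREATE UNIQUE INDEX TPCD.UXPS_PK_SK_AQ
    ON TPCD.PARTSUPP (PS_PARTKEY, PS_SUPPKEY)
    INCLUDE (PS_AVAILQTY);
```

After creating the index and refreshing statistics, re-run the explain facility to confirm that the FETCH on PARTSUPP no longer appears in the subplan.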

The explain output shown in Figure 13 (from DB2 9.1) indicates which MQTs the optimizer considered but did not choose for a QEP, and explains why. The reason might be due to cost, or due to the fact that the MQT is not similar enough to be matched.

Examples:

explain plan for select c1, count(*) from t1 where c2 >= 10 group by c1;

EXP0073W The following MQT or statistical view was not eligible because one or more data filtering predicates from the query could not be matched with the MQT: "PKSCHO "."MQT2".
EXP0073W The following MQT or statistical view was not eligible because one or more data filtering predicates from the query could not be matched with the MQT: "PKSCHO "."MQT3".
EXP0148W The following MQT or statistical view was considered in query matching: "PKSCHO "."MQT1".
EXP0148W The following MQT or statistical view was considered in query matching: "PKSCHO "."MQT2".
EXP0148W The following MQT or statistical view was considered in query matching: "PKSCHO "."MQT3".
EXP0149W The following MQT was used (from those considered) in query matching: "PKSCHO "."MQT1".

Figure 13 Using the explain facility to understand MQT selection

Utilize the explain facility to help understand your design choices.

DB2 Design Advisor best practices

The DB2 Design Advisor is a key feature of the DB2 autonomic computing initiative. It is a push button solution: given a workload (user provided or system detected) and, optionally, a disk constraint9, the Design Advisor recommends physical database design options that are designed to optimize the execution of the workload provided. The Design Advisor performs extensive “what-if” analysis, data sampling, and correlation modeling to explore thousands of design permutations that humans cannot.

The Design Advisor has the following capabilities:

o Index selection

o MQT selection

o MDC selection

o Partitioning selection (for database partitioning)

o Industry-leading workload compression

9 disk constraint: A limit on the amount of disk space the Design Advisor can consider available for adding new design features. For example, the limit might be 100MB, and that would mean that the new design aspects recommended by the Design Advisor, such as additional indexes or MQTs, should not consume more than an additional 100MB in total.


Many customers have reported using the Design Advisor to make dramatic improvements in physical database design, leading to performance improvements of over five times for individual queries or entire workloads. Of course, you should not apply the results from the Design Advisor without due consideration.

Figure 14 highlights the benefit of the Design Advisor. In this example, a decision-support database running the TPC-H workload and data set was created with a reasonable set of indexes, meaning that a good database designer could have come up with this set and considered it adequate. The Design Advisor was then used to provide additional recommendations for the database, which, when applied, resulted in a 6.5-fold performance gain.

Figure 14 Benefits from DB2 Design Advisor

MDC selection capability of the DB2 Design Advisor

For improved workload performance, use the MDC selection capability of the Design Advisor to obtain recommended clustering dimensions for use in an MDC table, including coarsification on base columns. Only single-column dimensions, and not composite-column dimensions, are considered, although single or multiple dimensions can be recommended for the table.

The MDC selection capability is enabled using the -m <advise type> flag on the db2advis utility. The advise types (“C” for MDC and clustering indexes, “I” for index, “M” for MQT, and “P” for database partitioning) can be used in combination with each other.


The MDC recommendations provided by the Design Advisor are intended to provide optimized density and to limit the amount of table expansion that will occur when the table is converted to MDC. The analysis operations within the advisor include not only the benefits of block-index access, but also the impact of MDC on INSERT, UPDATE, and DELETE operations against the dimensions of the table.

The output includes generated-column expressions for each table for coarsified dimensions that appear in the MDC solution, and an ORGANIZE BY clause recommended for each table.
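To make the shape of such output concrete, the following hand-written sketch shows a coarsified dimension expressed as a generated column together with an ORGANIZE BY clause. It is not actual Design Advisor output: the table `myschema.orders_mdc`, its columns, and the choice of dimensions are hypothetical.

```sql
CREATE TABLE myschema.orders_mdc (
    order_date  DATE          NOT NULL,
    region      INTEGER       NOT NULL,
    amount      DECIMAL(12,2),
    -- Coarsified dimension: the date rolled up to yyyymm, which reduces the
    -- number of distinct cells and therefore the amount of table expansion.
    order_month INTEGER GENERATED ALWAYS AS (INTEGER(order_date) / 100)
)
ORGANIZE BY DIMENSIONS (order_month, region);
```

Clustering on the month instead of the raw date trades some clustering precision for denser blocks.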

Use the following Design Advisor best practices:

• Provide a broad representation of your workload as input, and avoid running the Design Advisor for one query at a time. This allows the Design Advisor to make recommendations that apply to an entire workload rather than to a single query, perhaps to the detriment of other parts of the workload.

• Include as input the INSERT, UPDATE, and DELETE operations that occur in your workload so that the Design Advisor can model the drawbacks and benefits (of adding new design features) to queries. (For example, new indexes have maintenance drawbacks in addition to their value in improving query execution times.)

• Use the MDC selection capability of the DB2 Design Advisor (on tables that are greater than 12 extents in size) to obtain recommended clustering dimensions for use in MDC tables for improved workload performance.

• Use Query Patroller or the DB2 9.5 Workload Manager to automatically capture your actual workload in a format that serves as input to the Design Advisor.


Best Practices

Datatype selection

• Choose numeric datatypes over character datatypes whenever possible.

• Use data modeling tools, such as Rational Data Architect, to publish to a larger team.

• Store value definitions in a table, where the definitions can be joined to values to provide context.

Table normalization and denormalization

• Normalize your tables using Third Normal Form (3NF) for most general-purpose databases, the star schema or snowflake model for dimensional queries, and the IBM Layered Data Architecture for broad-based data warehousing, online analytical processing (OLAP), and business intelligence (BI).

Index design

• Design a basic set of indexes using workload predicates and primary keys (PKs) and foreign keys (FKs). Indexes are the single most important physical database design feature. (Remember that indexes and Refresh Immediate MQTs incur a penalty for INSERT, UPDATE, and DELETE operations.)

Data clustering and MDC

• Use MDC to improve query performance, and for roll-in and roll-out of data.


Database partitioning (shared-nothing hash partitioning)

• Use database partitioning to improve scalability for large BI applications.

• Focus on both high cardinality of the partitioning key and improved collocation of joins when selecting the partitioning key.

• Use hash-partitioning (recommended primarily for data warehousing, which benefits from shared-nothing databases).

Table (range) partitioning

• Use range-clustered tables (RCTs) to provide fast, direct access to data.

• Design table partitions based on roll-in and roll-out characteristics. Partitioning by month or financial quarter is a good strategy.

UNION ALL View (UAV) partitioning

• Use UAVs when replication is needed on certain ranges of data, while other ranges that do not require replication can benefit from compression. UAVs allow you to have different characteristics on different objects that underlay the view. In general, homogeneity provides a cleaner and more maintainable architecture. However, there are exceptions where this ability to mix and match is needed.

• Use database partitioning for scalability of decision support, business intelligence, data warehousing, and reporting workloads, rather than UAVs.

• Use table partitioning to improve recovery efficiency and roll-out efficiency.

Roll-in and roll-out of data with table partitioning and MDC

• Use table partitioning for roll-out, and MDC on a single dimension for roll-in.


Database partitioning, table (range) partitioning, and MDC in the same database

• Implement database partitioning, table partitioning, and MDC in the same database design to deploy large scale applications.

MQTs

• Use replicated MQTs to improve collocation of joins for database partitioning, and query access to aggregated data.

• Help the query compiler find MQTs by keeping MQT statistics up-to-date, by defining functional dependencies, and by defining referential integrity (RI) (including FK columns in the MQT, defined as NOT NULL). Avoid problematic MQT designs that make routing difficult by avoiding the use of EXISTS, NOT EXISTS, and SELECT DISTINCT clauses, unless the MQT is an exact match for the query.

Post-design tools for improving designs for existing databases

• Use the explain facility to help understand your design choices.

• Use the DB2 Design Advisor to generate ideas for physical database design improvements (for indexes, MQTs, and partitioning). When doing so, provide as input a set of queries, not just one query at a time. This allows the Design Advisor to make trade-offs across the workload.

• Utilize the DB2 9.5 Workload Manager (WLM), Query Patroller, Snapshot Scripts, or Statement Event Monitoring to automatically capture SQL statements for input to the explain facility and to the DB2 Design Advisor.


Conclusion

Physical database design is the single most important quality of any database. It affects the scalability, efficiency, maintainability and extensibility of a database like no other aspect of database administration. Although database design can be complex, a good design improves performance and reduces operational risk. Mastery of this talent is undoubtedly the cornerstone of professional database administrators.


Further reading

• DB2 Best Practices http://www.ibm.com/developerworks/db2/bestpractices/

• DB2 9 for Linux, UNIX and Windows manuals http://www.ibm.com/support/docview.wss?rs=71&uid=swg27009552

• DB2 Data Warehouse Edition documentation http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=/com.ibm.dwe.welcome.doc/dwe91docs.html

• IBM Balanced Warehouse documentation http://www.ibm.com/software/data/infosphere/balanced-warehouse/

• IBM Data Warehousing and Business Intelligence documentation http://www.ibm.com/software/data/db2bi/

• IBM DB2 9.5 for Linux, UNIX, and Windows Information Center http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp

• S. Lightstone, T. Teorey, T. Nadeau, “Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more”, Morgan Kaufmann Press, 2007. ISBN: 0123693896

• Sam S. Lightstone, “Best Practices for Creating Scalable High Quality Data Warehouses with DB2”, IBM Information On Demand 2007 Global Conference, October 14 - 19, 2007. Mandalay Bay Resort, Las Vegas, NV

• T. Teorey, S. Lightstone, T. Nadeau, “Database Modeling & Design: Logical Design, 4th edition”, Morgan Kaufmann Press, 2005. ISBN: 0-12-685352-5



Contributors

Kevin L. Beck Information Management Software

Brad Cassells DB2 Information Development

Karl Fleckenstein Senior IT Architect Lead Architect for SAP/DB2 Solutions

John Hornibrook DB2 Query Optimization Development

Norma Mullin Enterprise Data Management Consulting IT Specialist

Reuven Stepansky Senior Managing Specialist North American Lab Services

Tim Vincent Chief Architect DB2 LUW


Notices

This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to:

IBM Director of Licensing IBM Corporation North Castle Drive Armonk, NY 10504-1785 U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

Without limiting the above disclaimers, IBM provides no representations or warranties regarding the accuracy, reliability or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information contained in this document has not been submitted to any formal IBM test and is distributed AS IS. The use of this information or the implementation of any recommendations or techniques herein is a customer responsibility and depends on the customer’s ability to evaluate and integrate them into the customer’s operational environment. While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Anyone attempting to adapt these techniques to their own environment do so at their own risk.

This document and the information contained herein may be used solely in connection with the IBM products discussed in this document.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment.


Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

All statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent goals and objectives only.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE: © Copyright IBM Corporation 2008. All Rights Reserved.

This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. The sample programs are provided "AS IS", without warranty of any kind. IBM shall not be liable for any damages arising out of your use of the sample programs.

Trademarks

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml

Windows is a trademark of Microsoft Corporation in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.
