

Data & Knowledge Engineering 8 (1992) 35-55, North-Holland

An expert system approach for database application tuning

Francesca Cesarini (a), Michele Missikoff (b) and Giovanni Soda (a)
(a) Dipartimento di Sistemi e Informatica, Via S. Marta, 3, 50139 Firenze, Italy
(b) IASI-CNR, viale Manzoni, 30, 00185 Roma, Italy

Abstract

Cesarini, F., M. Missikoff and G. Soda, An expert system approach for database application tuning, Data & Knowledge Engineering 8 (1992) 35-55.

The paper describes a method and the principles for tuning database applications using an expert system. In database applications, maintenance concerns the reorganization of the database, generally decoupled from interventions on the software. In our proposal we consider jointly the internal schema of the database and the application programs, i.e. the calls they issue against the underlying DBMS. There are several cases in which it is possible to reformulate a query so that it is performed more efficiently by the DBMS. The paper first presents a method that allows transactions to be classified on the basis of the operations performed on the database; then the optimal internal schema is given for each class of database operations; finally, it is shown how to transform the result of this work into rules for an expert system. The method has been tested using a commercial RDBMS and an expert system shell. The application program was generated artificially in order to obtain a mix of transactions that would be difficult to find in a single application. The results show that periodically performing database application tuning can significantly improve the performance of the overall application.

Keywords. Database application tuning; expert systems; SQL; query optimization.

1. Introduction

Applications based on relational database systems [8, 9] are becoming increasingly popular, mainly because of the simplicity and the expressiveness of their underlying data model, which facilitates interaction with the casual user and the development and maintenance of database applications.

In an early stage, database systems based on the relational model suffered from poor performance. Many research activities have tackled this problem, particularly from the points of view of database design, physical data organization and query optimization. (Performance improvement due to specialized hardware devices is not considered here.) There is another aspect that is rarely examined explicitly: the improvement related to the syntactic form of transactions, which can substantially affect the execution strategy determined by the query optimizer (and hence the performance).

Many methodologies [4] and tools [7, 3] have been proposed for obtaining good logical design, a fundamental element for good performance. Once the conceptual schema for a given application has been defined, it is necessary to consider the physical organization of the database (internal schema), the run-time query optimization, and the syntactic form of the application transactions.

0169-023X/92/$05.00 © 1992 - Elsevier Science Publishers B.V. All rights reserved


In designing the internal schema of a relational database, choices are determined by the expected use of the database (i.e. by the workload profile) [19, 21]. In general, logical schemas are quite stable over time, while internal schemas, defined at the beginning of the life of the application (on the basis of a forecast on workload profile), must undergo periodical restructuring. Restructuring and physical design have many characteristics in common, and there are several techniques that are valid for both. In order to obtain a good physical organization, quite a few methods [17, 18] and tools [10, 12, 16] have been proposed, essentially based on analytical approaches. The main problem of analytical models is the need to oversimplify reality, whose complexity tends to produce unmanageable models.

The third issue is that of run-time query optimization [22, 11]. The Query Optimizer (QO) analyzes the transactions at run-time with the aim of determining the most advantageous access paths for their execution. Today's relational DBMSs embody effective query optimizers capable of providing significant performance. However, run-time optimization is inherently local (i.e. in general it analyzes one transaction at a time), and makes use of the existing data structures (i.e. its action does not concern any changes in the internal schema, it only regards the correct use of existing ones).

The fourth issue refers to application transactions. The way a transaction is coded influences the behavior of the QO and, consequently, the way data are accessed. For example, let us consider the way the query optimizer of the relational DBMS used in our case study (see Section 5) manages the following transactions when an index on the attribute-name appearing in the WHERE clause has been built:

1) SELECT * FROM (relation-name) WHERE (attr-name) = K

2) SELECT * FROM (relation-name) WHERE (attr-name) + 0 = K

The first transaction is executed with the help of the index which, on the contrary, is not used for the second one. As a matter of fact, the presence of an expression in the WHERE clause prevents the QO from using the index. Indeed, there are some cases in which the sequential scan of a relation (that avoids the use of the index) is more convenient. When most of the transactions require the use of an index, the few that function better with a sequential scan can be modified by introducing a dummy expression into them, as exemplified above.
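This effect can still be reproduced today; the following sketch uses SQLite rather than the Oracle/PC system of the paper's case study (an assumption: the table and index names are illustrative), and inspects the optimizer's plan for the two forms of the query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, salary INTEGER)")
conn.execute("CREATE INDEX idx_salary ON emp (salary)")

def plan(query):
    # EXPLAIN QUERY PLAN reports whether an index is used for the query.
    rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    return " ".join(row[-1] for row in rows)

# 1) Plain predicate: the optimizer can use idx_salary.
p1 = plan("SELECT * FROM emp WHERE salary = 1000")
# 2) Dummy expression in the WHERE clause: the index is no longer applicable.
p2 = plan("SELECT * FROM emp WHERE salary + 0 = 1000")

print(p1)  # mentions the index
print(p2)  # sequential scan of the relation
```

The exact plan text varies across DBMSs, but the pattern (index search versus full scan) is the one the paper exploits.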

For a given transaction, there is (at least) one optimal access path and a syntactic form best suited for it. DBMS manuals report directions on how to structure transactions in a convenient way. The problem of determining the optimal form of transactions, more precisely, of (transaction, access_path) pairs, arises when the physical design is made and each time a restructuring process is required. The activity aimed at solving this problem is referred to as database application tuning.

Tuning extends the activity of physical database design (and restructuring) because its scope is not limited to the internal schema of the database but also impacts the transactions (unlike other approaches, such as [15]).

The tuning process is performed in two phases. In the first phase, called local optimization, each transaction is examined in order to determine the ideal access path and the syntactical form for the path's optimal use. The second phase, called global optimization, has the task of


determining the internal schema that implements the highest number of access paths identified in the previous phase. In this phase conflicts may arise between incompatible access paths (i.e. access paths that cannot coexist in the same internal schema) and so a conflict resolution strategy is required. The optimal tuning of an application is extremely difficult to achieve because of the conflicting needs of different transactions.

The approach presented in this paper is based on expert system (ES) methodology [20] and presents a number of advantages.

An expert system is based on a collection of rules and facts (knowledge base: KB) that can be easily extended and modified. It can be easily updated to embody new tuning strategies (drawn from experience in the field) and reflect modifications of the DBMS and/or application procedures. Furthermore, even incomplete knowledge can be accepted by an expert system, thus making experimentation with partial solutions possible. Finally, an expert system is able to display its line of reasoning and is therefore a useful support tool for the Database Administrator.

The rest of this paper is organized as follows. In Section 2, basic definitions and the principles of the tuning process are presented. In Section 3 we describe the organization of the KB of the X-TUNER expert system, while its architecture is presented in Section 4. In Section 5 we show an example of knowledge base, implemented for a specific DBMS and a simple set of transactions. The last section reports a summary and the conclusions.

2. Database restructuring and tuning

In this section we describe the method for tuning database applications that underlies X-TUNER. As mentioned in the introduction, the tuning process is twofold: it not only operates on the internal schema of the database, as other methods do, but also on the form of the transactions, in order to determine the best (transaction, access_path) pairs.

Before describing the tuning process, we introduce a set of definitions aimed at modelling the elements of a database application that are of interest to the tuning process.

2.1. Database state modelling

A database application is characterized by a set of transactions and a database. For our purposes, the database is considered to have two parts: the data content, referred to as the extensional part, and the internal schema according to which the data part is organized.

A database extension E is a collection of relations, each of which is a subset of the Cartesian product of a given set of domains [8]. The relations, together with their attributes and domains, are defined in the logical schema of the database.

E = {r_1, r_2, ..., r_m}.

Given an application, E evolves in time after each successful update.

For a given DBMS, there is a finite set of physical organizations applicable to database relations, such as indexing, clustering, and sorting. The Database Administrator (DBA) has the task of deciding for each relation what auxiliary data structures should be associated with each attribute (possibly none, in the case of a flat file organization). The internal schema of a database is represented by a set of organization descriptors:

O = {o_ij},  i = 1, ..., m;  j = 1, ..., h.


Each o_ij indicates how relation i is physically organized on attribute j; h is the maximal arity of the database relations.

A relation state is represented by a pair (relation, organization), where the first component of the pair represents the data (set of tuples) and the second defines the internal schema for that relation. A database state is a set of relation states.

At a given time, the kth state of the DB is defined by the set:

s_k = {(r_i, {o_ij})},  where i = 1, ..., m and j ∈ {1, ..., h}.

A pair in s_k denotes a relation state. The second component of the pair is a set, since several types of physical organization can be defined over the same relation at the same time. The null organization corresponds to the flat file.

For a given database state it is possible to define the set of semantically equivalent states. Two states s_x, s_y are semantically equivalent if they agree on the relations, even if they have different organizations:

s_x ≡ s_y  iff  r_i^x = r_i^y  for all i = 1, ..., m.

Restructuring is the phase of tuning that impacts the database by modifying the internal schema while keeping the logical schema and the data values (i.e. the relations) unchanged. We also refer to this as a semantics-preserving transformation.

During the process of restructuring, each pair in s_k is examined and, while the data component is left unchanged, the physical organization component is modified.
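The state model above can be sketched in code (the representation and all names below are illustrative assumptions, not from the paper): a relation state pairs the data with a set of organization descriptors, and two database states are semantically equivalent when their data components coincide.

```python
# A relation state pairs the data (a frozenset of tuples) with its set of
# physical organization descriptors; a database state maps relation names
# to relation states.

def equivalent(s_x, s_y):
    """Semantic equivalence: same relations, organizations may differ."""
    return (s_x.keys() == s_y.keys() and
            all(s_x[r][0] == s_y[r][0] for r in s_x))

emp = frozenset({("smith", 1000), ("jones", 2000)})
s1 = {"emp": (emp, {"index(salary)"})}               # indexed organization
s2 = {"emp": (emp, set())}                           # flat file (null organization)
s3 = {"emp": (frozenset({("smith", 900)}), set())}   # different data

print(equivalent(s1, s2))  # True: restructuring preserves semantics
print(equivalent(s1, s3))  # False
```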

2.2. Database application modelling

In order to model a database application, A, we first need to introduce the finite set of its transactions: T ⊆ U, where U is the universe of possible transactions: U = {t_1, t_2, t_3, ...}. In our work, we only consider query transactions, since update transactions are usually preceded by retrieval of the target data.

The tuning process starts by analyzing each transaction in A in order to determine the proper tuning actions. To this end, it is necessary to define a set of transaction classes, each of which is identified by a given query skeleton and the characteristics of the attributes (such as having unique or non-unique, null or not-null values).

The skeleton of a query is obtained by substituting actual relation names, attribute names and values with non-terminal symbols. A typical query skeleton is presented in the example below:

Ex. SELECT (att_list) FROM (rel_name) WHERE (att_name) = (const)

In determining the tuning actions, besides the class, it is necessary to consider additional extensional information, such as the cardinality of the relation and the selectivity of the WHERE clause.
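The paper does not give an algorithm for deriving skeletons; a minimal sketch (regular expressions and the SQL fragment handled are assumptions) replaces actual relation names, attribute names and constants with non-terminal symbols:

```python
import re

def skeleton(query):
    # Replace the select list, relation name and WHERE predicate of a simple
    # single-relation equality query with non-terminal symbols.
    q = query.strip()
    q = re.sub(r"SELECT\s+.*?\s+FROM", "SELECT (att_list) FROM", q,
               flags=re.I | re.S)
    q = re.sub(r"FROM\s+\w+", "FROM (rel_name)", q, flags=re.I)
    q = re.sub(r"WHERE\s+\w+\s*=\s*\S+", "WHERE (att_name) = (const)", q,
               flags=re.I)
    return q

s = skeleton("SELECT * FROM emp WHERE salary = 1000")
print(s)  # SELECT (att_list) FROM (rel_name) WHERE (att_name) = (const)
```

A production version would work on a parsed query tree rather than on text.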

2.3. The tuning process

The tuning process aims at the transformation of both components: transactions t_x are transformed into semantically equivalent transactions t'_x [13], and the database state s_x into a new state s_y, equivalent to the former, which implements the optimal access path for t'_x.


We define the access path p_x as the set of physical organizations over the relations and attributes accessed by the transaction t_x, that is:

p_x = {o_ij},

where i ∈ {1, ..., m} and j ∈ {1, ..., h} range over the relations and attributes accessed by the transaction t_x, respectively.

The tuning process operates in two phases: first a local optimization phase, which examines each transaction separately, then a global optimization phase, which aims at unifying and integrating the access paths required by the different transactions into a single physical schema (after resolving any conflicts).

Phase 1 - Local optimization
The local optimization phase analyzes one transaction t_x at a time, aiming at determining:
- the class C_i to which t_x belongs;
- the optimal transformation that maps t_x into a semantically equivalent transaction t'_x;
- the new physical schema which guarantees the optimal access path p_x for t'_x.

The local optimization ends with the production of the set of triples L:

L = {(t_x, t'_x, p_x)},  where |L| = |T|.

The above set may contain conflicting elements. A conflict arises whenever two transactions require two incompatible access paths. There are two situations that lead to conflicts:
(a) When incompatible physical organizations are required, such as: (i) a compress and a nocompress index on the same attribute; (ii) clustering on two different attributes; (iii) an index and clustering on the same attribute.
(b) When opposite physical organizations are required, such as the presence and absence of an index on the same attribute.

We will refer to a set of transactions requiring incompatible access paths as a conflict set.
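A partial sketch of pairwise conflict checking, covering one condition of each kind above (the tuple encoding of an access-path requirement is an assumption: the last component is True to prescribe the structure and False to inhibit it):

```python
# Pairs of physical organizations that cannot coexist on the same attribute.
INCOMPATIBLE = {frozenset({"compress-index", "nocompress-index"}),
                frozenset({"index", "cluster"})}

def in_conflict(p_a, p_b):
    for (r1, a1, o1, req1) in p_a:
        for (r2, a2, o2, req2) in p_b:
            if r1 != r2 or a1 != a2:
                continue
            # (b) opposite requirements on the same structure
            if o1 == o2 and req1 != req2:
                return True
            # (a) incompatible organizations prescribed on the same attribute
            if req1 and req2 and frozenset({o1, o2}) in INCOMPATIBLE:
                return True
    return False

p1 = {("emp", "salary", "index", True)}
p2 = {("emp", "salary", "index", False)}  # inhibits the same index
print(in_conflict(p1, p2))  # True: a conflict set {t1, t2} would be built
```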

Phase 2 - Global optimization
In Phase 2 (global optimization), the set of transformations L identified during Phase 1 is taken into account, with the aim of determining a physical schema that implements the access paths identified during Phase 1, after resolving any conflicts. Each time a conflict set is detected, a transaction is selected and granted the optimal access path; for the remaining transactions a suboptimal transformation is considered. The conflict resolution strategy is essentially based upon two criteria: merit and priority.
• Merit of a transaction

We define a set M of merit indicators that associates with each (transaction, database_state) pair a code indicating the effectiveness of the coupling:

M(t_i, s_x) = w.

Merit codes are used for representing the fact that a class of transactions is better executed with certain physical structures and that an unsuited choice can significantly penalize the execution. In our work, the set M was constructed on the basis of DBA manuals and other information acquired by benchmark tests. However, we believe that automatic generation is possible. Merit indicators are used in the implementation of transformation rules.
• Priority of a transaction

An application function f_j represents a set of transactions that always have to be considered jointly. This means that the application is organized into tasks, and the activation of a task implies the activation of a predefined set of transactions.


In a database application, we assume that some functions have higher priority than others. Their priority can be decided by organization managers, in accordance with the DBA. To this end, we associate a priority with each f_j, and then define a partial order over T.

Global optimization starts by identifying the conflict sets, and, for each set, only one transaction is selected for transformation. The selection is essentially performed on the basis of the following criteria: (a) Maximal merit - This criterion allows X-TUNER to select the candidate transaction whose transformation has the highest associated merit. (b) Highest priority - Each transaction is part of an application function, and the functions are ranked within a poset. The rank (priority) of a function is transitively applied to its transactions. Among the conflicting transactions, the one having the highest ranking is the candidate for transformation.

If these two criteria are not sufficient to resolve the conflict set, the selection is made interactively by the DBA or, if necessary, randomly by the system.

For the unselected transactions, their possible subsequent transformations are then considered and new conflict sets may be built. This process is iterated until a transformation has been applied to all the transactions.

In the next sections, we focus our attention on the design and implementation of X-TUNER, the expert system based on the tuning method presented in this section.

3. The knowledge base of X-TUNER

In this section, the structure of the knowledge base of the expert system X-TUNER is illustrated, showing how the tuning process and the information defined in the previous section are implemented in the proposed system.

The KB of X-TUNER is divided into three main sections:
(i) a fact base, which models the database application;
(ii) a rule base, which models the tuning strategy;
(iii) procedural attachments, which implement particular tuning functions not included in the rule base.

The organization of the KB is outlined in Fig. 1.

[Figure 1 outlines the organization of the KB: the fact base (a database description with logical and internal schema, and user transaction descriptions) and the rule base (classification rules, local rules, conflict detection rules, maintained through a rule base editor), together with procedural attachments for merit evaluation and conflict resolution. Batch and interactive tuning connect the system to the database and the application procedures; the Application Programmer and the Database Administrator interact through dedicated interfaces.]

Fig. 1. The overall system architecture.


This figure depicts two different users of X-TUNER: the Application Programmer and the Database Administrator (note that the casual user, able to formulate impromptu queries, is not taken into consideration for tuning). For these two types of user, we have two different interfaces which connect to two sections of the knowledge base: the fact base and the rule base.

3.1. Fact base

The fact base models the database application with its two components: the database and the application transactions.

Database description
The database description contains the logical schema of the database, together with the characteristics of the attributes (i.e. unique or non-unique, null or not-null). At an extensional level, the description contains relation cardinalities and information on the physical organization (i.e. flat files, indexes, and clusters) used for storing the relations in the current database state (internal schema).

Transaction description
The transaction description contains an internal representation of the information related to each application transaction. A transaction description consists of five parts:

(A) transaction id;
(B) transaction code;
(C) transaction class;
(D) selectivity level;
(E) transaction priority.

The transaction id uniquely identifies the transaction within the application. The transaction code is a compressed representation of the application transaction as it appears in the source program. In our implementation, we refer to SQL transactions. A transaction class is determined by the syntactic form (skeleton) of the transaction and the characteristics of the attributes involved (as reported in the database description). The class is the key element for the identification of tuning actions.

The selectivity of the WHERE clause is represented by the ratio between the tuples extracted and the tuples in the database. The priority ranges from 0.1 to 1.0; it is given by the DBA and is used in the conflict resolution phase.

An example of transaction description (with a non-compressed code for the reader's convenience) is shown in Fig. 2.

Most of the information that resides in the fact base can be extracted automatically from the database application; in particular the information about the database can be obtained

Part (A) TR-123
Part (B) SELECT * FROM emp WHERE salary = 1000
Part (C) Class: C2
Part (D) σ ≤ .05
Part (E) priority: 0.7

Fig. 2. Transaction description.


from the object DBMS (if necessary by developing simple ad hoc procedures). The information about the application transactions can be extracted from the source code. The information that cannot be extracted automatically from the object application, such as priority, merit, and selectivity, is supplied by the DBA.

Selectivity appears to be the most crucial type of information. Hypothetically, it is possible to obtain selectivity by actually running the query, but this solution would be extremely expensive. Furthermore, selectivity can change over time as a result of database updates. On the basis of our experience in real applications, and considering the results obtained from an example (see Section 5), we realized that an accurate evaluation of selectivity is not necessary; it is sufficient to code it using a few predefined intervals (which essentially depend on the target DBMS). In our case study, selectivity was only considered for selection transactions, and we identified the following four intervals (the symbol σ indicates the query selectivity):

S1: σ ≤ .05    S2: .05 < σ ≤ .10    S3: .10 < σ ≤ .25    S4: .25 < σ.
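Coding the selectivity ratio into these four intervals is straightforward; a sketch (the helper name is an assumption):

```python
def selectivity_level(extracted, total):
    # Code the ratio of tuples extracted to tuples stored into S1-S4.
    sigma = extracted / total
    if sigma <= 0.05:
        return "S1"
    if sigma <= 0.10:
        return "S2"
    if sigma <= 0.25:
        return "S3"
    return "S4"

print(selectivity_level(3, 100))   # S1
print(selectivity_level(20, 100))  # S3
```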

3.2. Rule base

The rule base contains: (i) classification rules; (ii) local rules; (iii) conflict detection rules.

Classification rules
The classification rules have the task of analyzing the application transactions in order to identify the classes to which they belong. The class a given transaction belongs to is determined by its syntactic form (Q-SKEL) and the characteristics of the attributes (A-CHRT) referred to in the WHERE clause. An example of class definition is reported in Fig. 3.

Class definitions are used in classification rules that take the following form:

if Q-SKEL = x and A-CHRT = y then set transaction_class = z
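A minimal sketch of such rules in code (the rule table, the class names and the second rule are assumptions modeled on the class definition of Fig. 3):

```python
# Each rule fires when both the query skeleton and the attribute
# characteristics match, and assigns the transaction class.
RULES = [
    {"q_skel": "SELECT (att_list) FROM (rel_name) WHERE (att_name) = (const)",
     "a_chrt": "non-unique", "cls": "C2"},
    {"q_skel": "SELECT (att_list) FROM (rel_name) WHERE (att_name) = (const)",
     "a_chrt": "unique", "cls": "C1"},
]

def classify(q_skel, a_chrt):
    for rule in RULES:
        if rule["q_skel"] == q_skel and rule["a_chrt"] == a_chrt:
            return rule["cls"]
    return None  # no classification rule fires

cls = classify("SELECT (att_list) FROM (rel_name) WHERE (att_name) = (const)",
               "non-unique")
print(cls)  # C2
```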

Local rules
Local rules aim at identifying the actions necessary for improving the execution of a transaction by modifying its form and access path. More precisely, in its then part, a local rule specifies the action needed to obtain the optimal form of the transaction (by supplying the skeleton) and the auxiliary structures (to be prescribed or inhibited) necessary for obtaining the desired access path.

With regard to auxiliary structures, two basic actions are admitted: prescribe or inhibit. As a matter of fact, it is necessary to distinguish the case in which a certain index is not present because it is not required (i.e. no transaction needs it for its optimal execution), but nothing would prevent it from being introduced if needed, from the case in which the index is inhibited because its introduction would worsen the execution of one or more transactions.

Q-SKEL: SELECT FROM rel WHERE column = val

A-CHRT: column with non-unique values

Fig. 3. Query class definition: C2.


The action part specifies both the transformation to be performed on the syntax of the transaction and the physical organizations to be built (or avoided) in order to obtain the optimal access path. The result of an action is a (t'_x, p_x) pair. In addition to the primary action, a local rule supplies a list of alternative actions as a secondary choice to be considered during global tuning if the transaction is in a conflict set and is not selected.

The conditions in the if part of the rule refer to a transaction class C_n and to specific characteristics of the database extension, such as an interval for the relation cardinality (in the case of a join operation) or a given level for the query selectivity (in the case of a select operation).

Knowledge about the extension plays an important role because it determines the activation of the rules referring to the most appropriate actions. As already mentioned, there are several rules referring to class C_i in their condition part. The selection of the rules to be fired is then determined by the (D) part of the transaction description. This implies that the skeleton of a transaction and the characteristics of the attributes involved (and therefore its class) are not sufficient for determining the best transformation; relation cardinality and transaction selectivity must also be considered.

The structure of a local rule appears as follows:

if (t_x has_class C_i) and (extension has certain characteristics)
then (action_x,1)
  or (action_x,2)
  ...
  or (action_x,n)

The choice of a specific action is made in the conflict resolution phase. We wish to point out here that the actions are ordered according to the improvement they induce. In ordering them, we do not need to know the exact value of the improvement associated with each action (related to the merit indicators M); we only need to know whether a certain action is better or worse than another.
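A sketch of a local rule as code (the concrete class, selectivity level and actions are assumptions): the then part is an ordered list of alternative actions, ending with the null action that leaves both the transaction and the schema unchanged.

```python
def local_rule(transaction):
    # Actions are ordered by decreasing improvement; only their relative
    # ranking matters, not the exact merit values.
    if transaction["class"] == "C2" and transaction["selectivity"] == "S1":
        return [
            {"form": "keep",
             "prescribe": {("emp", "salary", "index")}, "inhibit": set()},
            {"form": "add dummy expression",
             "prescribe": set(), "inhibit": set()},
            {"form": "keep",
             "prescribe": set(), "inhibit": set()},  # null action
        ]
    return [{"form": "keep", "prescribe": set(), "inhibit": set()}]

actions = local_rule({"class": "C2", "selectivity": "S1"})
print(len(actions))  # 3 ordered alternatives, the last being the null action
```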

Conflict detection rules
Conflict detection rules check pairwise whether transactions require conflicting access paths. They are designed on the basis of the conflict conditions. The types of conflict depend very much on the characteristics of the target DBMS, and especially on the auxiliary structures that it allows to be introduced and combined.

During the implementation of X-TUNER, we realized that it was more effective to perform conflict detection within the conflict resolution process. For this reason, this part of the KB contains the conflict detection criteria, while the actual application of the conflict detection rules is described in the next section.

4. Applying the tuning process

The output of the Local Optimization (LO) phase is a set of actions identified for each transaction, as if they were independent of each other. We now focus on the Global Optimization and, in particular, on the steps necessary to obtain the new internal schema.

As explained in the previous section, each local rule contains a list of transformation actions in its then part; therefore, the set L, produced by the LO phase, is organized as a set of lists:

L = {[action_k,i]} = {[(t_k, t'_k,i, p_k,i)]},

where k = 1, ..., |T| and i = 1, ..., n_k. The maximal range n_k of index i for a transaction t_k represents the maximal number of alternative tuning actions, including the null action. The null action is necessary for guaranteeing the termination of the conflict resolution algorithm that operates on L (see below).

When the restructuring process actually takes place at the end of the tuning process, the current state of the DB is considered in order to minimize the work: existing auxiliary structures are undone only if explicitly inhibited, and required structures are created if not already included.
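The minimal-work restructuring step reduces to simple set operations; a sketch (names and the string encoding of structures are assumptions):

```python
def restructure(existing, prescribed, inhibited):
    # Only the necessary operations are emitted: create structures that are
    # required but missing, drop structures that are explicitly inhibited.
    to_create = prescribed - existing
    to_drop = existing & inhibited
    return to_create, to_drop

existing = {"index(emp.salary)", "index(emp.name)"}
prescribed = {"index(emp.salary)", "cluster(dept.dno)"}
inhibited = {"index(emp.name)"}

create, drop = restructure(existing, prescribed, inhibited)
print(create)  # only the missing cluster is built
print(drop)    # only the inhibited index is undone
```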

4.1. From local to global optimization

The set L produced in the LO phase is examined in order to identify the presence of conflict sets.

If the primary actions are all compatible (i.e. no conflict condition is verified), the optimal database organization O_opt is given by the composition of the auxiliary structures required by each transaction:

O_opt = ∪_k {p_k,1}.

As we previously pointed out, action_x,1 is the most advantageous one on the list of actions that improve the execution of t_x.

When there is a conflict set, only one of the conflicting transactions is selected and its first (i.e. its best) tuning action is adopted (the selected transaction is then referred to as a candidate transaction). Secondary actions are taken into account for the remaining transactions belonging to the conflict set and may again produce conflicting access paths, so it is necessary to proceed iteratively, checking for possible conflicts at every cycle of the global optimization phase.

It is important to note that the effort of determining the optimal set of nonconflicting actions would produce an exponential growth of the search space if the combinations of all the possible actions were examined.

The number of possible actions is:

N_p = Σ_k n_k,  where k = 1, ..., |T|.

In order to reduce the computational burden, it is necessary to adopt a heuristic strategy, as shown below.

The selection of the candidate transaction is made on the basis of merit and priority, defined in the previous section. Whenever these two criteria are not sufficient for selecting a candidate transaction, X-TUNER either prompts the DBA for selection or makes a random choice.

4.2. The conflict resolution algorithm

The algorithm illustrated in Fig. 4 refers to a three-dimensional array B where the element b_i,j,k represents the physical organization o_ij required on the jth attribute of the ith relation by the transaction t_k. The final access path p_k for a given transaction t_k will be: p_k = {b_i,j,k}.


BEGIN
Set action level to 1.
DO-FOREVER:
  WHILE there are transactions in T to be examined DO:
    Apply the action with the selected level to the next transaction t_k;
    Set the values b_i,j,k in B according to the access paths determined by the selected action;
  END_DO.
  WHILE there are conflict types to be examined DO:
    Consider next conflict type;
    Analyze array B to identify the existence of conflicts of the type considered;
    Build the conflict sets for the current conflict type;
  END_DO.
  WHILE there are conflict sets to be examined DO:
    Consider next conflict set;
    Determine the possible candidate transaction to be flagged with *:
      CASE-OF: transaction flags
        CASE: a #-marked transaction is present THEN mark with + all the other transactions;
        CASE: *-marked transactions are present THEN mark with # one among them, mark with + the others;
        CASE: no #- or *-marked transaction is present THEN select the best among unmarked transactions and mark it with *;
    Flag with + the unmarked transactions in the conflict set;
  END_DO.
  Change all the * flags to # flags {the elements of B so determined are asserted irrevocably}.
  IF no +-flagged transactions are present THEN conflict resolution terminated, exit.
  Increment action level for +-flagged transactions to consider the next secondary actions.
  Reset all + flags.
END_DO.
END

Fig. 4. The conflict resolution algorithm.

Three kinds of marks are used in this algorithm: pound (#), star (*) and plus (+). The # flag marks irrevocably processed transactions, the * flag marks candidate transactions, and the + flag marks the unselected conflicting transactions, for which the subsequent actions on the list will be considered.

It is important to note that the conflict sets are not disjoint and, therefore, a transaction may be present in more than one set. At the end of each cycle of the DO-FOREVER loop, the algorithm sets the access paths for a number of transactions (marked with #) not exceeding the number of conflict sets identified.

Conflict resolution is sensitive to the order in which conflicts are examined. A transaction that is selected (i.e. *-flagged) during the analysis of a conflict set must be selected again if it appears in a conflict set examined later. This means that a *-flagged transaction is privileged


[Figure: X-TUNER modules — interactive interface, KB acquisition, local optimization (merit, priority, conflict detection) and global optimization (tuning execution).]

Fig. 5. Layout of tuning process.

when encountered again, unless it gets into conflict with other *-flagged transactions, in which case, only one of them is selected.
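The marking step of one DO-FOREVER cycle can be sketched as follows (an illustrative rendering in Python, not the authors' code; flags follow the figure: '#' fixed irrevocably, '*' candidate, '+' deferred to the next action level):

```python
# Illustrative sketch of one cycle's marking pass over the conflict sets.
def mark_conflict_sets(conflict_sets, flags, merit):
    for cset in conflict_sets:
        stars = [t for t in cset if flags.get(t) == '*']
        if any(flags.get(t) == '#' for t in cset):
            for t in cset:                      # a fixed transaction wins
                if flags.get(t) != '#':
                    flags[t] = '+'
        elif stars:
            keep = max(stars, key=lambda t: merit[t])
            flags[keep] = '#'                   # confirm one candidate
            for t in cset:
                if flags.get(t) != '#':
                    flags[t] = '+'
        else:
            unmarked = [t for t in cset if flags.get(t) is None]
            if unmarked:
                best = max(unmarked, key=lambda t: merit[t])
                flags[best] = '*'               # best unmarked becomes candidate
                for t in unmarked:
                    if t != best:
                        flags[t] = '+'
    for t in [t for t, f in flags.items() if f == '*']:
        flags[t] = '#'                          # end of cycle: * becomes #
    return flags
```

For instance, with conflict sets {t1, t2} and {t1, t3} and t1 having the highest merit, t1 is selected in the first set and confirmed in the second, while t2 and t3 are deferred.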

This algorithm irrevocably selects candidate transactions by making use of local knowledge. An irrevocable strategy performs well when 'infallible' local knowledge is available [14]. In our case, the quality of local knowledge depends on the accuracy of the knowledge base and especially of its classification and local rules.

4.3. The architecture of X-TUNER

Figure 5 illustrates the main functionalities of X-TUNER and the control and information flow through the various modules.

5. An example of KB construction

Having illustrated the principles of tuning and the general architecture of the X-TUNER system, we now examine an example of a simple KB related to the relational DBMS ORACLE (by Oracle Corp.), and, in particular, the version for IBM Personal Computer: Oracle/PC [23] (henceforth referred to as Oracle).

The goal of this study is to assess the feasibility of an expert system based on the method presented in the paper and to show the process of constructing a rule base related to a specific DBMS. An actual measure of the improvement that can be obtained by applying our method depends on the type of application considered, a topic not specifically addressed in the present study.


The application transactions considered in our study are represented by a set of queries typical of corporate applications (such as inventory or personnel administration) [1]. The types of queries considered are:
- selections on unique and non-unique attributes;
- selections with aggregate functions on unique and non-unique attributes;
- relation sorting;
- equijoin.

Five transaction classes were defined on the basis of the above query types (as reported in Appendix A).

The construction of local rules is based on:
(a) the types of auxiliary structures available in Oracle;
(b) the characteristics of the Query Optimizer, especially its behavior when certain access paths are present;
(c) the results of a benchmark for the case study.
In the following, we describe these elements in detail.

5.1. Physical organizations for Oracle databases

Oracle supports the following kinds of physical organizations for its relations: compress and nocompress indexing, and clustering.

Indexes are implemented as B*-trees. They can be declared with unique values if they refer either to key attributes, or to other attributes having unique values. An index can be either compress or nocompress: an attribute value is totally stored in the index only if it is of a nocompress type.

A compress index uses less storage space and therefore allows faster access than a nocompress one. However, there are cases in which a nocompress index produces better response time.

Let us consider a query whose execution needs to access only one indexed attribute:

SELECT AVG(salary) FROM emp WHERE salary < 1000;

In this case, a nocompress index is better than a compress one because the salary values are totally stored in the index and the answer can be produced simply by accessing it.

Clustering is another feature of Oracle. Two relations can be clustered on a common attribute so that the tuples with the same value of this attribute are stored adjacently on the disk. An index is also built on the clustered attributes to speed up access to the tuples.

5.2. Query optimizer characteristics

The tuning strategy must take into account both the behavior of the Query Optimizer and the use it makes of the available auxiliary structures.

The following is a brief summary of the information regarding query execution and optimization in Oracle. A more extensive description can be found in the ORACLE manuals [23].

First of all, the Oracle optimizer analyzes the conditions in the WHERE clause and takes into account the presence of index and cluster organizations. It does not consider relation cardinalities and value distributions, which, on the contrary, are considered by X-TUNER.


Selection conditions and indexes
A relation is accessed by means of an index only if the query contains a WHERE clause with an attribute compared against a constant. For example, an index built on the attribute 'age' is used for answering a query with a WHERE condition like 'age = 30', while a condition like 'age + 0 = 30' causes a sequential scan of the relation (since the attribute appears within an expression).

Join execution
When the optimizer deals with join queries, it selects the driving relation, i.e. the relation to be examined first. If an index is defined on a join attribute, the driving relation is selected so that the join is driven to the index. Therefore, the relation without an index is the driving one; its tuples are examined sequentially and, for each tuple, the second relation is accessed by using the index.

If the access paths of the join relations are the same, the query optimizer accesses the relations in the FROM clause from right to left. If the relations are all without an index (or if an index exists for each of them), the relation on the right is scanned sequentially and the relation on the left is accessed in the specified sequence (using its index if it has one). In this case, it is better for the FROM clause relations to be listed in decreasing order of size, so that the smallest is accessed first.

5.3. Information obtained by benchmarks

In addition to the knowledge about the behavior of the DBMS that can be obtained from technical documentation, it is advisable to do some experimental work in order to determine the effectiveness of the auxiliary structures [1, 2]; in particular, it is important to assess the convenience of using indexes according to query selectivity. To this end, we ran a set of benchmarks [5] on sample relations having attributes with predefined value distributions [6]. The reported results refer to relations with 4096 tuples, each 100 bytes long.

Figure 6 shows the influence of the selectivity factor σ in performing selections on non-unique attributes. Both compress and nocompress indexes improve performance if the selectivity factor is less than, or approximately equal to, 0.25; furthermore, a nocompress index is always slower than a compress one.
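These observations translate into a simple decision rule; the sketch below (our illustration) uses the 5% and 25% selectivity thresholds that also appear in rules 2-4 of Appendix B:

```python
# Sketch: choose the access path for an equality selection on a
# non-unique attribute from its selectivity factor (benchmark-derived
# thresholds: 5% for a plain index, 25% for a compress index).
def access_path_for_selection(sigma):
    if sigma <= 0.05:
        return "index prescribed"
    if sigma <= 0.25:
        return "compress index prescribed"
    return "inhibit indexes (sequential scan)"
```
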

[Figure: execution time (sec) vs. selectivity for selections with no index, a nocompress index and a compress index.]

Fig. 6. Selection with '=' condition on non-unique attribute.

Figure 7 shows the execution time of a query with an aggregate function when the selectivity factor σ is 0.25. As we can see, the nocompress index corresponds to the best organization, while it is better to have no index at all rather than a compress one.

[Figure: execution times (sec) of selections on unique and non-unique attributes with no index, a compress index and a nocompress index.]

Fig. 7. Selections with aggregation function.

Another result refers to sorting. The execution of ORDER BY clauses is very slow because it implies the construction and sorting of a new, temporary table. We found that in many cases it is faster to build an index (which is sorted), execute the selection driven by the index, and then delete the index.
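The rewrite just described can be sketched as SQL generation (index and statement names are illustrative, not from the paper):

```python
# Sketch: replace ORDER BY with a temporary index that is built,
# used to drive the selection in sorted order, and then dropped.
def sort_via_index(rel, column):
    return [
        f"CREATE INDEX tmp_ind ON {rel}({column});",
        f"SELECT * FROM {rel} WHERE {column} > '';",  # index-driven, hence sorted
        "DROP INDEX tmp_ind;",
    ]
```

This mirrors the second alternative of rule 7 in Appendix B, where a dummy comparison against a null value forces the optimizer to walk the (sorted) index.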

All the above considerations form the basis for the construction of the rule base, as illustrated in the following section.

5.4. The rule base

The rule base contains three types of rules: classification, local, and conflict detection rules.

5.4.1 Classification rules
In our study, we identified five classes of transactions. In Appendix A we report the conditions that are tested in the IF part of the rules. The classes identified reflect the types of queries we chose for our study; in a real application, we believe that the number of classes would increase, but remain in the same order of magnitude.

5.4.2 Local rules
A set of local rules suitable for the transactions considered in our case study can be derived from the DBMS characteristics previously described. A significant subset of the rules identified is reported in Appendix B, and is described as follows.

Rules about index construction
A typical problem related to selections is whether or not it is advisable to use an index. On the basis of some benchmark results, we are able to state rules that suggest building an index (of compress or nocompress type) on the basis of query selectivity. When the selectivity factor is high, accessing the relation through the index always worsens the execution time. According to the optimizer's behavior, it is possible to inhibit the use of existing indexes by transforming a query t into a semantically equivalent one, t', where a null value is added to the attribute mentioned in the WHERE clause (or a concatenation with a null string, when the attribute is non-numeric).

Rules 1, 2, 3 and 4 in Appendix B illustrate this issue applied to WHERE clauses with the '=' operator. Note that when an index is inhibited, sequential scanning is implied, and non-clustered relation storage is preferable because of its contiguous allocation.
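The index-inhibiting transformation can be sketched as a small rewrite helper (the function name is ours): add 0 to a numeric attribute, or concatenate a null string to a character one, so the optimizer cannot use the attribute's index.

```python
# Sketch of the rewrite used by rules 2-4: wrap the WHERE attribute
# in an expression so its index cannot drive the access.
def inhibit_index(attr, numeric=True):
    return attr + " + 0" if numeric else attr + " || ''"
```

With this helper, 'age = 30' becomes 'age + 0 = 30', which, as noted in Section 5.2, causes a sequential scan.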

Compress versus nocompress indexes
When a selection query accesses only one attribute, it is important to decide whether to build a compress or a nocompress index. Benchmark results (a synthesis of which is illustrated in Fig. 7) show the selectivity threshold that makes the nocompress index necessary. Rules 5 and 6 implement the criteria identified by these results.

Relation sorting
As previously mentioned, there are cases in which the 'ORDER BY' option should be replaced by the construction of a temporary index. This situation is dealt with by rule 7.

Auxiliary structures for join execution
Join execution needs to be supported by some auxiliary structure. When the organization is flat, the nested loop algorithm (which gives poor performance) is applied. In our tests, it took more than 6 hours to join a relation having 4096 tuples with a relation of 100 tuples; for both relations the tuple size was 100 bytes.

Clustering speeds up joins more than indexing does because the resulting tuples can be retrieved without large disk seeks; on the other hand, a single relation belonging to a cluster loses its contiguous allocation, so that other operations, such as sequential scanning, take longer. For these reasons, in rule 8 of Appendix B we give preference to cluster construction, but we fall back on index construction (as a secondary choice) when the cluster is not allowed by other queries that have higher priority and require a sequential scan.

In any case, due to the behavior of the optimizer, it is always convenient to list the relation having the lowest cardinality on the right. Therefore, we developed rule 8, which transforms the query syntax in order to satisfy this requirement.
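The syntactic part of this transformation can be sketched as follows (our illustration): list the FROM-clause relations in decreasing order of cardinality, so that the smallest relation ends up rightmost and is accessed first by Oracle's right-to-left scan.

```python
# Sketch of rule 8's FROM-clause reordering by decreasing cardinality.
def order_from_clause(rels, card):
    return sorted(rels, key=lambda r: card[r], reverse=True)
```
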

5.4.3 Conflict detection rules
In our case, conflict detection rules are based on the following conflict types:
(a) an index cannot be prescribed and inhibited at the same time;
(b) a cluster cannot be prescribed and inhibited at the same time;
(c) an index cannot be of compress and nocompress type at the same time;
(d) a relation can belong to one cluster only.

The possible conflicts can vary from one DBMS to another, but, in general, we believe that their number would remain in the same order of magnitude.
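Detection of conflict type (a), for instance, can be sketched as a set intersection (the data model is assumed, not from the paper): collect, per (relation, attribute), the transactions that prescribe an index and those that inhibit one.

```python
# Sketch: detect (relation, attribute) pairs where an index is both
# prescribed and inhibited.  prescriptions: iterable of
# (tx, rel, attr, action), action in {'index', 'no-index', ...}.
def index_conflicts(prescriptions):
    wanted = {(r, a) for _, r, a, act in prescriptions if act == "index"}
    banned = {(r, a) for _, r, a, act in prescriptions if act == "no-index"}
    return wanted & banned
```
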

6. Summary and conclusions

In this paper, we presented a method for database application tuning and the architecture of an expert system, X-TUNER, based on the proposed method. X-TUNER is characterized by a KB with two main sections: a fact base and a rule base. The former describes the database application, its transactions and the database characteristics, while the latter contains the rules for the tuning process. Tuning rules are derived from the characteristics of the DBMS at hand, with its internal schema facilities and the functionality of the query optimizer. The characteristics of the DBMS are described to some extent in the technical literature supplied by the vendor and can be investigated more thoroughly by means of tests and experimental work.

Finally, we presented a case study built on top of Oracle/PC, which allowed us to construct a simple KB including the transaction classes and rules reported in the Appendix.

The KB construction was conceived after careful consideration of the optimization techniques reported in the DBA manual and of some results described in the literature on DBMS performance evaluation. As a matter of fact, we took into account a set of transactions proposed for benchmarking database systems [1] because they are valid in a large number of corporate applications.

The approach proposed in this paper appears general enough to be applied in most database applications, and the same is true for the benchmarks aimed at identifying the effectiveness of (transaction, access_path) pairs.

The KB constructed in our case study contains prototypical knowledge relative to database tuning. Part of this knowledge (e.g. the rules concerning selections) is valid for a large number of applications and relational DBMS, but there is a part of the KB that must be redesigned in accordance with the application characteristics and the DBMS used, including its query optimizer and auxiliary structures.

In general, we believe that a DBA can usefully apply the X-TUNER methodology by following the steps outlined in Section 5:
(i) point out the characteristics of the various types of physical organization;
(ii) summarize the query optimizer's features;
(iii) experimentally verify the behavior of the access paths for which the manuals do not give sufficient explanations;
(iv) take selectivity into account by making reference to existing literature or by performing specific tests; and
(v) define a set of local rules, each of which treats a specific issue.

A prototype of X-TUNER was built using the Expert System Shell NEXPERT OBJECT (by Neuron Data). Even in the simple case we considered, the flexibility of the expert system approach was successfully demonstrated.

Future work will address the quantitative assessment of the improvement that X-TUNER can produce in tuning real database applications. It seems difficult to establish a general approach and, therefore, we are looking for a single application which is representative of a significant class of real-world database applications.

Appendix A: Transaction classes

CLASS C1
Q-SKELETON) SELECT
            FROM rel
            WHERE column = val
A-CHRT) column with unique values

CLASS C2
Q-SKELETON) SELECT
            FROM rel
            WHERE column = val
A-CHRT) column with non-unique values

CLASS C3
Q-SKELETON) SELECT funct(column)   /* funct ∈ {AVG, COUNT} */
            FROM rel
            WHERE column op val    /* op ∈ {≤, <, >, ≥} */

CLASS C4
Q-SKELETON) SELECT
            FROM rel
            ORDER BY column
A-CHRT) column with non-null values

CLASS C5
Q-SKELETON) SELECT
            FROM rel1, rel2
            WHERE col1.rel1 = col2.rel2

Appendix B: Local rules

We use the symbol φ to indicate that the action does not provide for constructing or inhibiting any specific physical organization. Moreover, we note that adding a null value (0, or '' for non-numeric attributes) has the effect of inhibiting the use of possible indexes.

1. if (tx has-class C1)
   then ( t'x,1 ← tx,
          Px,1 ← [index prescribed on column] )

2. if (tx has-class C2) and (σ ≤ 5%)
   then ( t'x,1 ← tx,
          Px,1 ← [index prescribed on column] )

3. if (tx has-class C2) and (5% < σ ≤ 25%)
   then ( t'x,1 ← tx,
          Px,1 ← [compress index prescribed on column] )
   or   ( t'x,2 ← [SELECT FROM rel WHERE column + null value = val],
          Px,2 ← [clusters on rel are inhibited] )
   or   ( t'x,3 ← [SELECT FROM rel WHERE column + null value = val],
          Px,3 ← φ )

4. if (tx has-class C2) and (σ > 25%)
   then ( t'x,1 ← [SELECT FROM rel WHERE column + null value = val],
          Px,1 ← [clusters on rel are inhibited] )
   or   ( t'x,2 ← [SELECT FROM rel WHERE column + null value = val],
          Px,2 ← φ )

5. if (tx has-class C3) and (σ ≤ 10%)
   then ( t'x,1 ← tx,
          Px,1 ← [index prescribed on column] )

6. if (tx has-class C3) and (σ > 10%)
   then ( t'x,1 ← tx,
          Px,1 ← [nocompress index prescribed on column] )
   or   ( t'x,2 ← [SELECT FROM rel WHERE column + null value = val],
          Px,2 ← [clusters on rel are inhibited] )
   or   ( t'x,3 ← [SELECT FROM rel WHERE column + null value = val],
          Px,3 ← φ )

7. if (tx has-class C4)
   then ( t'x,1 ← [SELECT FROM rel WHERE column > null value],
          Px,1 ← [index prescribed on column] )
   or   ( t'x,2 ← [CREATE INDEX ind ON rel(column);
                   SELECT FROM rel WHERE column > null value;
                   DROP INDEX ind;],
          Px,2 ← φ )

8. if (tx has-class C5) and (card(reli) < card(relj))
   then ( t'x,1 ← [SELECT FROM relj, reli WHERE col1.rel1 = col2.rel2],
          Px,1 ← [cluster of rel1 and rel2 prescribed on col1 and col2] )
   or   ( t'x,2 ← [SELECT FROM relj, reli WHERE col1.rel1 = col2.rel2],
          Px,2 ← [index prescribed on coli.relj] )
   or   ( t'x,3 ← [SELECT FROM relj, reli WHERE col1.rel1 = col2.rel2],
          Px,3 ← [index prescribed on coli.reli] )

References

[1] D. Bitton, D.J. DeWitt and C. Turbyfill, Benchmarking database systems: A systematic approach, in: Proc. VLDB 83 (1983) 8-19.

[2] R. Bogdanowicz, M. Crocker, D.K. Hsiao, C. Ryder, V. Stone and P. Strawser, Experiments in benchmarking relational database machines, in: H.-O. Leilich and M. Missikoff, eds., Proc. Database Machines (Springer, Berlin, 1983) 106-134.

[3] M. Bouzeghoub, G. Gardarin and E. Metais, Database design tools: An expert system approach, in: Proc. VLDB 85 (1985) 82-95.

[4] S. Ceri, ed., Methodology and Tools for Data Base Design (North-Holland, Amsterdam, 1983).

[5] F. Cesarini and G. Soda, Analysis of data access methods in a relational DBMS for tuning physical database and transactions, RT 12/88, Dipartimento di Sistemi e Informatica, Firenze, 1988.

[6] F. Cesarini and G. Soda, Generating sample relational databases on small machines, to appear in Comput. J.

[7] J. Choobineh, M. Mannino, J.F. Nunamaker and B.R. Konsynski, An expert database design system based on analysis of forms, IEEE Trans. Software Engrg. 14(2) (1988) 242-253.

[8] E.F. Codd, A relational model of data for large shared data banks, Comm. ACM 13(6) (1970) 377-387.

[9] C.J. Date, An Introduction to Database Systems, 4th ed. (Addison-Wesley, Reading, MA, 1986).

[10] S. Finkelstein, M. Schkolnick and P. Tiberio, Physical database design for relational databases, ACM TODS 13(1) (1988) 91-128.

[11] M. Jarke, Current trends in database query processing, in: M.L. Brodie and J. Mylopoulos, eds., Knowledge Base Management Systems (Springer, Berlin, 1986) 111-120.

[12] H. Lam, S.Y.W. Su and R. Koganti, A physical database design evaluation system for CODASYL databases, IEEE Trans. Software Engrg. 14(7) (1988) 1010-1022.

[13] J.J. King, Query Optimization by Semantic Reasoning (UMI Research Press, 1984).

[14] N.J. Nilsson, Principles of Artificial Intelligence (Springer, Berlin, 1982).

[15] D. Reiner, G. Brown, M. Friedell, J. Lehman, R. McKee, P. Rheingans and A. Rosenthal, A database designer's workbench, in: Proc. Fifth Internat. Conf. on Entity-Relationship Approach: Ten Years of Experience, Dijon (1986) 347-360.

[16] P. Rullo and D. Saccà, An automatic physical designer for network model databases, IEEE Trans. Software Engrg. 14(9) (1988) 1293-1306.

[17] M. Schkolnick, Physical database design techniques, in: Proc. NYU Symp. on Database Design (1978).

[18] M. Schkolnick and P. Tiberio, Estimating the cost of updates in a relational database, ACM Trans. Database Systems 10(2) (1985) 163-179.

[19] T.J. Teorey and J.P. Fry, Design of Database Structures (Prentice-Hall, Englewood Cliffs, NJ, 1982).

[20] D.A. Waterman, A Guide to Expert Systems (Addison-Wesley, Reading, MA, 1986).

[21] G. Wiederhold, Database Design, 2nd ed. (McGraw-Hill, New York, 1983).

[22] S. Bing Yao, Optimization of query evaluation algorithms, ACM TODS 4(2) (1979) 133-155.

[23] Database Administrator's Guide, ORACLE Corporation, Menlo Park, CA, 1985.

Francesca Cesarini received her degree in Mathematics from the University of Florence (Italy) in 1968. She became a researcher at the National Council of Research in 1971, and since 1983 she has been Associate Professor of Computer Systems Organization at the Department of Systems and Computer Science of the University of Florence. Her research interests include data structures and algorithms for database management systems, multimedia databases, knowledge representation systems, and the application of AI techniques to deductive databases.

Michele Missikoff received his degree in Physics from the University of Rome 'La Sapienza' in 1972. He worked until 1980 in the research labs of two Italian companies, leaders in electronic equipment and software development respectively. Since 1980 he has been a researcher at IASI, Istituto di Analisi dei Sistemi ed Informatica, of the National Research Council, where he held the position of Coordinator of the Database department till 1985, and then that of Knowledge Engineering, which he currently holds. Since 1974 he has taught at the University of Rome, giving lectures on Operating Systems, Databases, and currently on Principles of Knowledge Bases. His main research interests have been special architectures for database management (database machines) and knowledge bases. Currently the focus of his activity concerns the merging of object-oriented databases and deductive systems.

Giovanni Soda received his degree in Mathematics in 1969 from the University of Florence (Italy). From 1971 he was a researcher at the National Council of Research, where his activity included formal systems for language manipulation. Since 1975 he has been at the University of Florence, where he is presently Associate Professor of Programming Languages at the Department of Systems and Computer Science (DSI). His current interests include knowledge representation systems, deductive databases, and the integration of Artificial Intelligence techniques with neural networks. Prof. Soda is a member of AI*IA, the Italian Association for Artificial Intelligence.