Advanced Database Systems Ass 5

8/2/2019 Advanced Database Systems Ass 5

1/33

Advanced DatabaseSystems

Assignment : No.5

Analyse and Compare the Physical Storage Structures and typesavailable INDEX of the latest versions of: Oracle SQL Server DB2 MySQL Teradata

Define a comparative framework. Recommend one product foorganizations of around 2000-4000employees with sound reason

based on Physical Storage Structures.

SUBMITTED TO: DR. FAROOQUE AZAMSUBMITTED BY:

SHAHID IQBAL USMAN HUMAYUN

AQSA BAJWA MADIHA WARIS

DEPARTMENT: COMPUTER SOFTWAREENGINEERING

COLLEGE OF E&ME, NUST


2/33

Advanced Database Systems Comparison of Latest Database

Management Systems

ABSTRACT:

In this document the detailed analysis of all the latest versions of Database Management Systems

has been done. The Physical Storage Structure and Types of available indexes of Oracle, SQL

server, DB2, MySQL, and Teradata have been explained and then a comparative analysis has been

done on the basis of Physical Storage Structure of all these Database Management Systems. Basedon this comparative analysis one of the Database Management systems is proposed as best fit for

the organizations having 2000 to 4000 employees.

MS-11 College of E&ME, NUST 2


3/33


Management Systems

1. INTRODUCTION:

A database management system (DBMS) is a collection of programs that enables users to create

and maintain a database. Database management systems provide several functions in addition to

simple file management which are as follows:

Allow concurrency

Control security

Maintain data integrity

Provide for backup and recovery

Control redundancy

Allow data independence

Provide non-procedural query language

Perform automatic query optimization [1]

A DBMS is a complex set of software programs that controls the organization, storage,

management, and retrieval of data in a database. Database Management Systems are categorized

according to their data structures or types, sometime DBMS is also known as Data base Manager

[2].

1.1.DATABASE MANAGEMENT SYSTEM SOFTWARES:

The examples of different database management systems are as follows:

Oracle

DB2

SQL Server

MySQL

Teradata

All above stated Database Management Systems should have the following features:

They must provide a way to structure data as records, tables, or objects

They should accept data input from operators and store that data for later retrieval

Query languages for searching, sorting, reporting, and other "decision support" activitiesthat help users correlate and make sense of collected data must be provided

They must provide multiuser access to data, along with security features that prevent some

users from viewing and/or changing certain types of information

They should provide data integrity features that prevent more than one user from accessing

and changing the same information simultaneously

A data dictionary (metadata) that describes the structure of the database, related files, and

record information must be provided [3]



4/33


Management Systems

2. ANALYSIS OF THE PHYSICAL STORAGE STRUCTURES AND

TYPES OF AVAILABLE INDEX OF THE LATEST DBMSs:In this section a complete analysis of the following latest versions of Database Management

Systems has been done on the basis of their Physical Storage Structure and their available

indexes:

a. Oracle

b. Sql server

c. Mysqld. DB2

e. Teradata

A. ORACLE:

An Oracle database is made up of physical and logical structures. Physical structures are those that

can be seen and operated on from the operating system, such as the physical files that store data on

a disk. Whereas logical structures are created and recognized by Oracle Database and are notknown to the operating system. In this section only the physical storage structure in oracle is

discussed as follows:

2.1. PHYSICAL STORAGE STRUCTRUE IN ORACLE:

Structures of an Oracle database include data files on disk that are not directly manipulated byusers of the database. Physical structures exist at the operating system level.

In the Figure 1 below, the physical database structures of an Oracle database, including data files,

redo log files, and control files are shown.



5/33


Management Systems

Figure 1: Oracle Database Storage Structure

2.1.1.Data files:

Files that contain all of the database data that the users of the database save and retrieve using

SELECT and otherDML statements are called data files. A table space comprises one or more data

files.

A single data file is an operating system file on a servers disk drive. This disk may be local to theserver or a drive on a shared storage array. Each data file belongs to only one table space; a table

space can have many data files associated with it.

There are fivephysical data files in the database in the physical structure illustration as shown in

Figure 1: one is used for the SYSTEM table space, one is used for the SYSAUX table space, two

data files are assigned to Table space 1, and the fifth data file is assigned to Table space 2.

2.1.2.Redo Log Files:

Files that contain a record of all changes made to the data in both tables and indexes as well as

changes to the database structures themselves. These files are used to recover changed data thatwas in memory at the time of a crash.

The redo log filesfacilitate the Oracle mechanism to recover from an instance failure or a media

failure. When any changes are made to the database, such as updates to data or creating or

dropping database objects, the changes are recorded to the redo log files first.



6/33


Management Systems

A database has at least two redo log files, and it is recommended that multiple copies of the redo

log files be stored on different disks. If the instance fails, any changed database blocks that were

not yet written to the data files are retrieved from the redo log files and written to the data fileswhen the instance is started again and this process is called instance recovery.

2.1.3.Control Files:

Files that record the physical structure of a database, the database name, and the names and

locations of data files and redo log files are called a control files. The control file maintains themetadata for the physical structure of the entire database.

It stores the following metadata:

The name of the database

The names and locations of the table spaces in the database

The locations of the redo log files and

The information about the last backup of each table space in the database.

Because of the importance of control file file, Oracle best practices recommend that you keep acopy of the control file on at least three different physical disks.

The control file and redo log file contents do not map directly to any database objects, but their

contents and status are available to the DBA by accessing virtual tables called data dictionary

views, which are owned by the SYS schema [4].

2.1.4.Mechanisms for Storing Database Files:

Several mechanisms are available for allocating and managing the storage of these files. The most

common mechanisms include:i. Oracle Automatic Storage Management (Oracle ASM):

Oracle ASM includes a file system designed exclusively for use by Oracle Database.

ii. Operating system file system:

Most Oracle databases store files in a file system, which is a data structure built inside acontiguous disk address space. All operating systems have file managers that allocate and deal

locate disk space into files within a file system.

A file system enables disk space to be allocated too many files. Each file has a name and is madeto appear as a contiguous address space to applications such as Oracle Database. The database can

create, read, write, resize, and delete files.

A file system is commonly built on top of a logical volume constructed by a software packagecalled a logical volume manager (LVM). The LVM enables pieces of multiple physical disks to

be combined into a single contiguous address space that appears as one disk to higher layers of

software.iii. Raw device:

Raw devices are disk partitions or logical volumes not formatted with a file system. The primary

benefit of raw devices is the ability to perform direct I/O and to write larger buffers. In direct I/O,

applications write to and read from the storage device directly, bypassing the operating systembuffer cache.

iv. Cluster file system:

A cluster file system is software that enables multiple computers to share file storage whilemaintaining consistent space allocation and file content. In an Oracle RAC environment, a cluster



7/33


Management Systems

file system makes shared storage appears as a file system shared by many computers in a clustered

environment. With a cluster file system, the failure of a computer in the cluster does not make the

file system unavailable. In an operating system file system, however, if a computer sharing filesthrough NFS or other means fails, then the file system is unavailable [5].

2.2. INDEX TYPES IN ORACLE:Oracle includes numerous data structures to improve the speed of Oracle SQL queries. Taking

advantage of the low cost of disk storage, Oracle includes many new indexing algorithms that

dramatically increase the speed with which Oracle queries are serviced.Oracle has different types of indexes available in it which are designed for certain scenarios and

are described as follow:

b*tree indexes - the most common type (especially in OLTP environments) and the default

type

b*tree cluster indexes - for clusters

hash cluster indexes - for hash clusters

reverse key indexes - useful in Oracle Real Application Cluster (RAC) applications bitmap indexes - common in data warehouse applications

partitioned indexes - also useful for data warehouse applications

function-based indexes

index organized tables

domain indexes

2.2.1.B*Tree Indexes:

B*tree stands for balanced tree. This means that the height of the index is the same for all valuesthereby ensuring that retrieving the data for any one value takes approximately the same amount of

time as for any other value. Oracle b*tree indexes are best used when each value has a high

cardinality (low number of occurrences)for example primary key indexes or unique indexes. One

important point to note is that NULL values are not indexed. They are the most common type ofindex in OLTP systems.

B-tree indexes are used to avoid large sorting operations. For example, a SQL query requiring

10,000 rows to be presented in sorted order will often use a b-tree index to avoid the very large sortrequired to deliver the data to the end user.

Figure 2: Oracle B-tree index



8/33


Management Systems

2.2.2.B*Tree Cluster Indexes:

These are B*tree index defined for clusters. Clusters are two or more tables with one or more

common columns and are usually accessed together (via a join).CREATE INDEX product_orders_ix ON CLUSTER product_orders;

2.2.3.Hash Cluster Indexes:In a hash cluster rows that have the same hash key value (generated by a hash function) are stored

together in the Oracle database. Hash clusters are equivalent to indexed clusters, except the indexkey is replaced with a hash function. This also means that here is no separate index as the hash is

the index.

CREATE CLUSTER emp_dept_cluster (dept_id NUMBER) HASHKEYS 50;

2.2.4.Reverse Key Indexes:

These are typically used in Oracle Real Application Cluster (RAC) applications. In this type of

index the bytes of each of the indexed columns are reversed (but the column order is maintained).

This is useful when new data is always inserted at one end of the index as occurs when using a

sequence as it ensures new index values are created evenly across the leaf blocks preventing theindex from becoming unbalanced which may in turn affect performance.

CREATE INDEX emp_ix ON emp(emp_id) REVERSE;

2.2.5.Bitmap Indexes:

These are commonly used in data warehouse applications for tables with no updates and whose

columns have low cardinality (i.e. there are few distinct values). In this type of index Oracle storesa bitmap for each distinct value in the index with 1 bit for each row in the table. These bitmaps are

expensive to maintain and are therefore not suitable for applications which make a lot of writes to

the data.

For example consider a car manufacturer which records information about cars sold including thecolour of each car. Each colour is likely to occur many times and is therefore suitable for a bitmap

index.

CREATE BITMAP INDEX car_col ON cars(colour) REVERSE;



9/33


Management Systems

Oracle uses a specialized optimizer method called a bitmapped index merge to service this query.

In a bitmapped index merge, each Row-ID, or RID, list is built independently by using the

bitmaps, and a special merge routine is used in order to compare the RID lists and find theintersecting values. Using this methodology, Oracle can provide sub-second response time when

working against multiple low-cardinality columns.

Figure 3: Oracle Bitmap Merge Join

2.2.6.Partitioned Indexes:

Partitioned Indexes are also useful in Oracle data warehouse applications where there is a large

amount of data that is partitioned by a particular dimension such as time.Partition indexes can either be created as local partitioned indexes or global partitioned indexes.

Local partitioned indexes mean that the index is partitioned on the same columns and with the

same number of partitions as the table. For global partitioned indexes the partitioning is userdefined and is not the same as the underlying table.

2.2.7.Function-based Indexes:

These are indexes created on the result of a function modifying a column value. For example

CREATE INDEX upp_ename ON emp(UPPER(ename((;

The function must be deterministic (always return the same value for the same input).

2.2.8.Index Organized Tables:

In an index-organized table all the data is stored in the Oracle database in a B*tree index structuredefined on the table's primary key. This is ideal when related pieces of data must be stored together

or data must be physically stored in a specific order. Index-organized tables are often used for

information retrieval, spatial and OLAP applications.

2.2.9.Domain Indexes:

These indexes are created by user-defined indexing routines and enable the user to define his or herown indexes on custom data types (domains) such as pictures, maps or fingerprints for example.

These types of index require in-depth knowledge about the data and how it will be accessed.



10/33


Management Systems

B. SQL SERVER:

SQL Server does not see data and storage in exactly the same way a DBA or end-user does. DBAsees initialized devices, device fragments allocated to databases, segments defined within

databases, tables defined within segments, and rows stored in tables.

2.3. PHYSICAL STORAGE STERUCTURE IN SQL SERVER:

SQL Server views storage at a lower level as device fragments allocated to databases, pages

allocated to tables and indexes within the database, and information stored on pages.

There are two basic types of storage structures in a database:

Linked data pages and

Index trees.

All information in SQL Server is stored at the page level. When a database is created, all space

allocated to it is divided into a number of pages, each page is 2KB in size.There are five types of pages within SQL Server:

i. Data and log pages

ii. Index pages

iii. Text/image pages

iv. Allocation pages

v. Distribution pages

All pages in SQL Server contain a page header. The page header is 32 bytes in size and contains

the logical page number, the next and previous logical page numbers in the page linkage, theobject_id of the object to which the page belongs, the minimum row size, the next available row

number within the page, and the byte location of the start of the free space on the page.

2.3.1. Data Pages:

A data page is the basic unit of storage within SQL Server. All the other types of pages within a

database are essentially variations of the data page.

All data pages contain a 32-byte header, as described earlier. With a 2KB page (2048 bytes) this

leaves 2016 bytes for storing data within the data page. In SQL Server, data rows cannot cross

page boundaries. The maximum size of a single row is 1962 bytes, including row overhead.

Data pages are linked to one another by using the page pointers (prevpg, nextpg) contained in the

page header. This page linkage enables SQL Server to locate all rows in a table by scanning all

pages in the link. Data page linkage can be thought of as a two-way linked list. This enables SQLServer to easily link new pages into or unlink pages from the page linkage by adjusting the page

pointers.

In addition to the page header, each data page also contains data rows and a row offset table. The

row-offset table grows backward from the end of the page and contains the location or each row on

the data page. Each entry is 2 bytes wide.

2.3.1.1.Data Rows:



11/33


Management Systems

Data is stored on data pages in data rows. The size of each data row is a factor of the sum of the

size of the columns plus the row overhead. Each record in a data page is assigned a row number. A

single byte is used within each row to store the row number.Therefore, SQL Server has a maximum limit of 256 rows per page, because that is the largest value

that can be stored in a single byte (2^8).For a data row containing all fixed-length columns, there

are four bytes of overhead per row:

1 byte to store the number of variable-length columns (in this case, 0)

1 byte to store the row number

2 bytes in the row offset table at the end of the page to store the location of the row on the

page

If a data row contains variable-length columns, there is additional overhead per row. A data row is

variable in size if any column is defined as varchar, varbinary, or allows null values.

In addition to the 4 bytes of overhead described previously, the following bytes are required to

store the actual row width and location of columns within the data row:

2 bytes to store the total row width

1 byte per variable-length column to store the starting location of the column within the

row

1 byte for the column offset table

1 additional byte for each 256-byte boundary passed (adjust table)

2.3.2.Allocation Pages:

Space is allocated to a SQL Server database by the create database and alter database commands.

The space allocated to a database is divided into a number of 2KB pages. Each page is assigned alogical page number starting at page 0 and increased sequentially. The pages are then divided into

allocation units of 256 contiguous 2KB pages, or 512 bytes (1/2 MB) each. The first page of each

allocation unit is an allocation page that controls the allocation of all pages within the allocation

unit.

The allocation pages control the allocation of pages to tables and indexes within the database.

Pages are allocated in contiguous blocks of eight pages called extents. The minimum unit ofallocation within a database is an extent. When a table is created, it is initially assigned a single

extent, or 16KB of space, even if the table contains no rows. There are 32 extents within an

allocation unit (256/8).

An allocation page contains 32 extent structures for each extent within that allocation unit. Each

extent structure is 16 bytes and contains the following information:

Object ID of object to which extent is allocated

Next extent ID in chain

Previous extent ID in chain

Allocation bitmap



12/33


Management Systems

Deallocation bitmap

Index ID (if any) to which the extent is allocated

Status

The allocation bitmap for each extent structure indicates which pages within the allocated extent

are in use by the table. The deallocation bit map is used to identify pages that have become emptyduring a transaction that has not yet been completed. The actual marking of the page as unused

does not occur until the transaction is committed, to prevent another transaction from allocating the

page before the transaction is complete [6].

2.4. INDEX TYPES IN SQL SERVER:

All SQL Server indexes are B-Trees. There is a single root page at the top of the tree, branching

out into N number of pages at each intermediate level until it reaches the bottom, or leaf level, of

the index. The index tree is traversed by following pointers from the upper-level pages down

through the lower-level pages. In addition, each index level is a separate page chain. There may be

many intermediate levels in an index. The number of levels is dependent on the index key width,the type of index, and the number of rows and/or pages in the table. The number of levels is

important in relation to index performance.

2.4.1. Non Clustered Index:

A non clustered index is analogous to an index in a textbook. The data is stored in one place,

the index in another, with pointers to the storage location of the data. The items in the index are

stored in the order of the index key values, but the information in the table is stored in a different

order (which can be dictated by a clustered index). If no clustered index is created on the table, the

rows are not guaranteed to be in any particular order.

Similar to the way you use an index in a book, Microsoft SQL Server 2000 searches for a datavalue by searching the non clustered index to find the location of the data value in the table and

then retrieves the data directly from that location. This makes non clustered indexes the optimal

choice for exact match queries because the index contains entries describing the exact location in

the table of the data values being searched for in the queries. If the underlying table is sorted using

a clustered index, the location is the clustering key value; otherwise, the location is the row ID

(RID) comprised of the file number, page number, and slot number of the row. For example, tosearch for an employee ID (emp_id) in a table that has a non clustered index on the emp_id

column, SQL Server looks through the index to find an entry that lists the exact page and row in

the table where the matching emp_id can be found, and then goes directly to that page and row.



13/33


Management Systems

Figure 4: Non Clustered Index

2.4.1.1. Considerations:

Consider using non clustered indexes for:

Columns that contain a large number of distinct values, such as a combination of lastname and first name (if a clustered index is used for other columns). If there are very few

distinct values, such as only 1 and 0, most queries will not use the index because a table

scan is usually more efficient.

Queries that do not return large result sets. Columns frequently involved in search conditions of a query (WHERE clause) that return

exact matches.

Decision-support-system applications for which joins and grouping are frequently required.

Create multiple non clustered indexes on columns involved in join and grouping

operations, and a clustered index on any foreign key columns.

Covering all columns from one table in a given query. This eliminates accessing the table

or clustered index altogether.

2.4.2.Clustered Indexes:

A clustered index determines the physical order of data in a table. A clustered index is analogous

to a telephone directory, which arranges data by last name. Because the clustered index

dictates the physical storage order of the data in the table, a table can contain only one clusteredindex. However, the index can comprise multiple columns (a composite index), like the way a

telephone directory is organized by last name and first name. Clustered Indexes are very similar to

Oracle's IOT's (Index-Organized Tables).



14/33


Management Systems

Figure 5: Clustered Index

A clustered index is particularly efficient on columns that are often searched for ranges of

values. After the row with the first value is found using the clustered index, rows with subsequent

indexed values are guaranteed to be physically adjacent. For example, if an application frequently

executes a query to retrieve records between a range of dates, a clustered index can quickly locatethe row containing the beginning date, and then retrieve all adjacent rows in the table until the last

date is reached. This can help increase the performance of this type of query. Also, if there is a

column(s) that is used frequently to sort the data retrieved from a table, it can be advantageous to

cluster (physically sort) the table on that column(s) to save the cost of a sort each time the

column(s) is queried.

Clustered indexes are also efficient for finding a specific row when the indexed value is unique.

For example, the fastest way to find a particular employee using the unique employee ID columnemp_idis to create a clustered index or PRIMARY KEY constraint on the emp_idcolumn.

Note PRIMARY KEY constraints create clustered indexes automatically if no clustered

index already exists on the table and a non clustered index is not specified when you create the

PRIMARY KEY constraint.

2.4.2.1. Considerations:

It is important to define the clustered index key with as few columns as possible. If a large

clustered index key is defined, any non clustered indexes that are defined on the same table will be

significantly larger because the non clustered index entries contain the clustering key.

Consider using a clustered index for:

Columns that contain a large number of distinct values.



15/33


Management Systems

Queries that return a range of values using operators such as BETWEEN, >, >=,


16/33


Management Systems

Covering index: A type of index that includes all the columns that are needed to process a

particular query. For example, your query might retrieve the FirstName and LastName

columns from a table, based on a value in the ContactID column. You can create a coveringindex that includes all three columns.

C. My SQL:

The MySQL server provides a database management system with querying and connectivity

capabilities, as well as the ability to have excellent data structure and integration with manydifferent platforms. It can handle large databases reliably and quickly in high-demanding

production environments. The MySQL server also provides rich function such as its connectivity,

speed, and security that make it suitable for accessing databases.

The MySQL server works in a client and server system. This system includes a multiple-threaded

SQL server that supports varied back ends, different client programs and libraries, administrative

tools, and many application programming interfaces (API)s.

2.5.PHYSICAL STORAGE STRUCUTRE IN MySQL:

The storage engines supported by MySQL are explained as follows:

2.5.1.InnoDB:

A transaction-safe (ACID compliant) storage engine for MySQL that has commit, rollback, and

crash-recovery capabilities to protect user data. InnoDB row-level locking (without escalation tocoarser granularity locks) and Oracle-style consistent non locking reads increase multi-user

concurrency and performance.

InnoDB stores user data in clustered indexes to reduce I/O for common queries based on primary

keys. To maintain data integrity, InnoDB also supports FOREIGN KEY referential-integrityconstraints. InnoDB is the default storage engine as of MySQL 5.5.5.

2.5.2.MyISAM:

The MySQL storage engine that is used the most in Web, data warehousing, and other application

environments. MyISAM is supported in all MySQL configurations, and is the default storageengine prior to MySQL 5.5.5.

2.5.3.Memory:

This storage engine stores all data in RAM for extremely fast access in environments that require

quick lookups of reference and other like data. This engine was formerly known as the HEAP

engine.

http://dev.mysql.com/doc/refman/5.5/en/innodb-storage-engine.htmlhttp://dev.mysql.com/doc/refman/5.5/en/myisam-storage-engine.htmlhttp://dev.mysql.com/doc/refman/5.5/en/memory-storage-engine.htmlhttp://dev.mysql.com/doc/refman/5.5/en/myisam-storage-engine.htmlhttp://dev.mysql.com/doc/refman/5.5/en/memory-storage-engine.htmlhttp://dev.mysql.com/doc/refman/5.5/en/innodb-storage-engine.html


17/33


Management Systems

2.5.4.Merge:

Enables a MySQL DBA or developer to logically group a series of identical MyISAM tables and

reference them as one object. Good for VLDB environments such as data warehousing.

2.5.5.Archive:

This storage engine provides the perfect solution for storing and retrieving large amounts of

seldom-referenced historical, archived, or security audit information.

2.5.6.Federated:

It offers the ability to link separate MySQL servers to create one logical database from many

physical servers. This storage engine is very good for distributed or data mart environments.

2.5.7.CSV:

The CSV storage engine stores data in text files using comma-separated values format. It can be

used to easily exchange data between other software and applications that can import and export in

CSV format.

2.5.8.Blackhole:

The Blackhole storage engine accepts but does not store data and retrievals always return an empty

set. The functionality can be used in distributed database design where data is automatically

replicated, but not stored locally.

It is important to remember that you are not restricted to using the same storage engine for anentire server or schema: you can use a different storage engine for each table in your schema [9].

The following table provides an overview of some storage engines provided with MySQL:

Feature MyISAM Memory InnoDB Archive NDB

Storage limits 256TB RAM 64TB None 384EB

Transactions No No Yes No Yes

Locking granularity Table Table Row Table Row

MVCC No No Yes No NoGeospatial data type support Yes No Yes Yes Yes

Geospatial indexing support Yes No No No No

B-tree indexes Yes Yes Yes No Yes

Hash indexes No Yes No No Yes

Full-text search indexes Yes No No No No

http://dev.mysql.com/doc/refman/5.5/en/merge-storage-engine.htmlhttp://dev.mysql.com/doc/refman/5.5/en/archive-storage-engine.htmlhttp://dev.mysql.com/doc/refman/5.5/en/federated-storage-engine.htmlhttp://dev.mysql.com/doc/refman/5.5/en/csv-storage-engine.htmlhttp://dev.mysql.com/doc/refman/5.5/en/blackhole-storage-engine.htmlhttp://dev.mysql.com/doc/refman/5.5/en/merge-storage-engine.htmlhttp://dev.mysql.com/doc/refman/5.5/en/archive-storage-engine.htmlhttp://dev.mysql.com/doc/refman/5.5/en/federated-storage-engine.htmlhttp://dev.mysql.com/doc/refman/5.5/en/csv-storage-engine.htmlhttp://dev.mysql.com/doc/refman/5.5/en/blackhole-storage-engine.html


18/33


Management Systems

Feature MyISAM Memory InnoDB Archive NDB

Clustered indexes No No Yes No No

Data caches No N/A Yes No Yes

Index caches Yes N/A Yes No Yes

Compressed data Yes No Yes Yes No

Encrypted data[d] Yes Yes Yes Yes Yes

Cluster database support No No No No Yes

Replication support[e] Yes Yes Yes Yes Yes

Foreign key support No No Yes No No

Backup / point-in-time recovery[f] Yes Yes Yes Yes Yes

Query cache support Yes Yes Yes Yes Yes

Update statistics for data dictionary Yes Yes Yes Yes Yes

2.6.INDEX TYPES IN MySQL:

MySQL allows four general types of indexes (keys). These indexes can be created either on single

column or multi-columns however both single column index and multi columns index have some

different behaviors.

Primary Indexes (also called clustered indexes)

Unique Key Indexes

Normal Indexes

Full text Indexes2.6.1. Primary Index:

The primary keys are almost always added when creating the table. However, MySQL providesmore than one SQL statements to create primary key indexes.

MySQL includes the standard storage engine, as well as the InnoDB storage engine, which is

touted as a transaction-safe, ACID-compliant database with some additional features over the

standard version. Every InnoDB table has a special index called the clustered index where the datafor the rows is stored. Typically, the clustered index is synonymous with theprimary key.

When a PRIMARY KEY is defined on a table, InnoDB uses it as the clustered index.Primary key for each table must be created. If there is no logical unique and non-null

column or set of columns, a new auto-increment column must be added, whose values arefilled in automatically.

If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally

generates a hidden clustered index on a synthetic column containing row ID values. Therows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is

http://dev.mysql.com/doc/refman/5.5/en/storage-engines.html#ftn.id941594http://dev.mysql.com/doc/refman/5.5/en/storage-engines.html#ftn.id941594http://dev.mysql.com/doc/refman/5.5/en/storage-engines.html#ftn.id941594http://dev.mysql.com/doc/refman/5.5/en/storage-engines.html#ftn.id941640http://dev.mysql.com/doc/refman/5.5/en/storage-engines.html#ftn.id941640http://dev.mysql.com/doc/refman/5.5/en/storage-engines.html#ftn.id941685http://dev.mysql.com/doc/refman/5.5/en/storage-engines.html#ftn.id941685http://www.mysqlfaqs.net/mysql-faqs/Indexes/What-is-single-column-index-or-key-in-MySQLhttp://www.mysqlfaqs.net/mysql-faqs/Indexes/What-is-multi-column-index-or-key-in-MySQLhttp://dev.mysql.com/doc/refman/5.6/en/glossary.html#glos_clustered_indexhttp://dev.mysql.com/doc/refman/5.6/en/glossary.html#glos_primary_keyhttp://dev.mysql.com/doc/refman/5.6/en/glossary.html#glos_auto_incrementhttp://dev.mysql.com/doc/refman/5.5/en/storage-engines.html#ftn.id941594http://dev.mysql.com/doc/refman/5.5/en/storage-engines.html#ftn.id941640http://dev.mysql.com/doc/refman/5.5/en/storage-engines.html#ftn.id941685http://www.mysqlfaqs.net/mysql-faqs/Indexes/What-is-single-column-index-or-key-in-MySQLhttp://www.mysqlfaqs.net/mysql-faqs/Indexes/What-is-multi-column-index-or-key-in-MySQLhttp://dev.mysql.com/doc/refman/5.6/en/glossary.html#glos_clustered_indexhttp://dev.mysql.com/doc/refman/5.6/en/glossary.html#glos_primary_keyhttp://dev.mysql.com/doc/refman/5.6/en/glossary.html#glos_auto_increment


19/33


20/33


Management Systems

column), FULLTEXT indexes are only useful for full text searches done with the MATCH() /

AGAINST() clause.

A full text index can be created for a TEXT, CHAR or VARCHAR type field, or combination offields. FULLTEXT indexes are most often used to search natural language text, such as through

newspaper articles, web page contents and so on. For this reason MySQL has added a number of

features to assist this kind of searching. MySQL does not index any words less than or equal to 3characters in length, nor does it index any words that appear in more than 50% of the rows. This

means that if your table contains 2 or less rows, a search on a FULLTEXT index will never return

anything.

D. DB2:

DB2 Universal Database is a Web-enabled relational database management system that supports

data warehousing and transaction processing. It can be scaled from hand-held computers to single

processors to clusters of computers and is multimedia-capable with image, audio, video, and textsupport.

2.7.PHYSICAL STORAGE STRUCTURE IN DB2:

In DB2, the hierarchy of storage units (from least inclusive to most inclusive) is Page, Extent,

Container, Table space, and Database as shown in Figure 6.

Figure 6: Storage Structure of DB2

2.7.1. Page:



21/33


Management Systems

In DB2, data is stored in a structure called a Page, it is the equivalent of an Oracle Block. The

basic page structure in DB2 is that there is a fixed length page header, a variable length page

trailer, and the space in between is used for data or free space. The information stored in a page canbe data records (rows or indexes) or system information such as a table space spacemap page,

which keeps track of the free space in data pages in a table space. The Page structure has been

shown in Figure 7.

Figure 7: DB2 database page

Database page size can vary and is determined by the buffer pool size. Pages sizes can be 4, 8, 16

and 32KB. It is possible to set up a database instance that has mixed page sizes in it 4KB for

system pages, and 16KB for user data page sizes.

2.7.2.Extent:

An extent in DB2 contains a number of contiguous pages. The page size and extent size in DB2 are

set at the time that the table space is created and cannot be altered easily (i.e. by an ALTER

command), once they are defined they are set for the life of the database. It is inefficient for aDBMS to work on individual pages, the I/O overhead eats into performance. Database I/O works

with extents, in this way a number of contiguous pages are fetched from disk in a single I/O. Table

data cannot be mixed within extents, so one extent is for one table only.

An extent is a multiple of pages, so selection of the page and extent size will have an impact on thetype of application processing you are doing. Typically an OLTP environment will have small

pages (4k,8k), and a small number of pages per extent: 4,6,16. Decision Support and Data

Warehousing applications will have larger pages (16 or 32K), and larger extents (16, 32, 64 or128). The extent size should be chosen based on the table size and anticipated usage, depending on

whether the application is query intensive, transaction intensive or a mixture of both. Smaller

tables are handled more efficiently with smaller extents.



22/33


Management Systems

The extent size is used by DB2 to determine the size of the prefetch block when data is retrieved. It

also determines the number of pages that will be written to a container before skipping to the next

container. If new data needs to be written and there is insufficient space, the write will physicallyallocate a full extent of contiguous pages.

2.7.3. Container:

Containers are how DB2 handles the storage of data on the operating systems file system. Data in

the database is viewed logically as table spaces, tables, rows and so on. Each table space is stored

and managed in one or more containers, and how the table space is defined determines how data in

the table space is handled on disk.

Containers can be managed by the DBA Database Managed Storage, by the Operating System

File System System Managed Storage, or by the Database Automatic Storage.

Containers (and how they are managed) are more concerned with the efficient usage of storage

than the logical considerations of the database.

2.7.4. Table Space:

A tablespace is a very important structure in DB2. Tablespaces are made up of a number of

containers which map directly to the disk. The containers in the tablespace specify the storage

paths or directory structures that will physically hold the data.

Tablespaces contain tables, and hence user data that maps to the user and database applications.

When you create a table, it has to go in a tablespace, and you can specify which tablespace to use

(if you dont, the system will default where it puts the table, and it might not be where you need it).

The tablespace is also what is mapped into RAM. For each tablespace, you have to define the pagesize, and each tablespace must have access to a bufferpool of the same page size, and a system

temporary tablespace of the same page size. You have to specify which buffer pool the table space

is associated with.



23/33


Management Systems

Figure 8:DB2 Extents, Container and Disk

2.7.5. Database:

In DB2 (like Oracle) a database is comprised of a number of tablespaces. The size of a database is

the sum of the tablespaces it contains, and the containers within each tablespace have storage pathswithin them that map to the underlying disk system, whether this is managed by the OS, DBMS or

DBA. When a database is first created, three system tablespaces are created:

The Catalog tablespace:

The catalog tablespace is a Regular tablespace called SYSCATSPACE and it holds the system

catalog tables. There is only one catalog table space per database.

Large table space:

DB2 creates one large table space named USERSPACE1 when a database is created. A large table

space stores all permanent data just as a regular table space does, including LOBs. This table space

type must be DMS.

http://commons.wikimedia.org/wiki/File:DB2_Extents_Containers_and_Disk.jpg


24/33


Management Systems

System temporary table space:

At least one system temporary must exist per database, and the default named TEMPSPACE1 is

created with the database. A system temporary table space stores internal temporary data requiredduring SQL operations such as joins, sorts, index creation, and table reorganization.

At database creation time, the recovery logs for the database are created, and a set of system tables

is created in SYSCATSPACE. These system catalog tables contain information about the

definitions of the database objects (i.e. tables, views, indexes, and packages), and database securityinformation (i.e. which users can have access to these objects.) These tables are updated during the

operation of a database; for example, when a table is created. The information in these tables is

available through a series of routines and views.

After the database is created, the user can create additional Large Tablespaces, Regular

Tablespaces and User Temporary Tablespaces. By default, no User Temporary Tablespaces are

created along with the database [10].

2.8.INDEX TYPES IN DB2:

2.8.1. Unique indexes::

When you define a unique index on a DB2 table, you ensure that no duplicate values of the index

key exist in the table.

2.8.2. Clustering indexes:

When you define a clustering index on a DB2 table, you direct DB2 to insert rows into the table in

the order of the clustering key values. The first index that you define on the table serves implicitly

as the clustering index unless you explicitly specify CLUSTER when you create or alter anotherindex. You can specify CLUSTER for any index, whether or not it is a partitioning index.

2.8.3. Partitioning indexes:

Before DB2 Version 8, when you defined a partitioning index on a table in a partitioned table

space, you specified the partitioning key and the limit key values in the PART VALUES clause ofthe CREATE INDEX statement. This type of partitioning is referred to as index-controlled

partitioning.

A partitioning index is optional; table-controlled partitioning is a replacement for index-controlledpartitioning. For more information, see Moving from index-controlled to table-controlled

partitioning.

2.8.4. Secondary indexes:

A secondary index is any index that is not a partitioning index. You can create an index on a tableto enforce a uniqueness constraint, to cluster data, or most typically to provide access paths to data

for queries.



25/33


Management Systems

The usefulness of an index depends on the columns in its key and on the cardinality of the key.

Columns that you use frequently in performing selection, join, grouping, and ordering operations

are good candidates for keys. In addition, the number of distinct values in an index key for a largetable must be sufficient for DB2 to use the index for data retrieval; otherwise, DB2 could choose to

perform a table space scan.

A secondary index can be partitioned or not. This section discusses the two types of secondary

indexes:

Non partitioned secondary index (NPSI)

Data-partitioned secondary index (DPSI)

2.8.4.1.Non partitioned secondary index (NPSI):

A non partitioned secondary index is any index that is not defined as a partitioning index or a

partitioned index. You can create a non partitioned secondary index on a table that resides in apartitioned table space or a non partitioned table space.

2.8.4.2.Data-partitioned secondary index (DPSI):

A data-partitioned secondary index is any index that is not defined as a partitioning index but isdefined as a partitioned index. You can create a partitioned secondary index only on a table that

resides in a partitioned table space. The partitioning scheme is the same as that of the data in the

underlying table. That is, the index entries that reference data in physical partition 1 of a tablereside in physical partition 1 of the index, and so on [11].

E. TERADATA:2.9.PHYSICAL STORAGE STRUCTURE IN TERADATA:

The physical storage of the database has been virtual for a long time. Virtual in the sense that the

data placement, organization and management are automatically taken care of by the Teradata

Database with minimal user input or administration. There are no tablespaces to define ormaintain, no buffer storage or temporary workspaces to manage, no requirement to define how to

spread data evenly across drives and controllers, and no need to perform periodic table or index

reorganizations. The new Teradata Virtual Storage option can enhance storage management. Tomaintain optimum system balance in a data warehouse from Teradata, this new capability allows

the physical storage to include a mix of devices with different capacities and performance levels.

Additionally, Teradata Virtual Storage manages the data that gets distributed across thoseresources.

2.9.1. Virtualization:

Prior to Teradata 13, all storage in the configuration was assumed to have equivalent capacity and

performance, and the only parameter used to determine where to store a piece of data was the ID of



26/33


Management Systems

the owning AMP (The database engine which is the only entity in the database that can create,

read, lock, unlock, update or delete a particular row of data).

Since each drive (or drive pair with RAID1) is reserved for the use of a single AMP, any data from

that AMP could be written to any of the drives that were reserved for it. All drives have equal

performance, so to help balance the system; all AMPs are given access to an equal number ofdrives. However, this makes it difficult to alter the amount of storage in the system for two

reasons: Any new drives must have the same capacity as the original drives, and enough drives

must be added to maintain an equal number for each AMP.

The link between each disk drive and an AMP is removed in Teradata Database 13 so that those

constraints can be relaxed in Teradata Virtual Storage, allowing storage to be made up of mixed

capacities. Even though there might not be enough high-capacity drives for each AMP, TeradataVirtual Storage will ensure the necessary space is allocated to each AMP to maintain balanced

performance.

2.9.2. Allocation:

In addition to changing how mixed storage is handled, Teradata Virtual Storage can alsomanipulate how it is used by the database. Space in the database can be allocated based on the

owning AMP and the performance requirementor response timefor the data that will be

stored. This is accomplished by measuring the expected performance of each drive within thestorage arrays. These measurements are used to match the correct storage resources with the type

of data, including user data (tables/rows), spool data, indexes, logs, journals, and fallback data and

so on.

2.9.3.Temperature Monitoring:

In data warehousing terms, temperature represents the relative demand for a particular set of data

(i.e., tables, partitions, rows). The datas temperature is described by a few qualitative terms, asopposed to a large range of explicit numerical values. Hot refers to data that is accessed

frequently by users, such as the last 30 days of data in the promotions redemption table. Coldrefers to data that is infrequently accessed.

While temperature refers to the frequency of data access, terms like fast and slow are explicit

to the performance characteristics of storage devices. In other words, the relative performance ofeach storage device is measured by response time.

2.9.4. Data Migration:

Teradata Virtual Storage can automatically move data to different storage locations that best match

its currently measured temperature. This is called migration. If there is a large difference betweenthe datas temperature and the response time of the location where it is currently stored, the data

can be moved to a more appropriate location (either faster or slower).This serves two purposes:

correcting any original placements that were made in predicting the initial data temperature andmodifying the placement of data to match its usage as it progresses through different temperature

cycles.



27/33


Management Systems

2.9.5. Varied Uses:

By more closely integrating the database and storage capabilities, Teradata Virtual Storage moreeffectively and efficiently uses storage for active and multi-temperature data warehouses. Many

organizations will factor Teradata Virtual Storage in their planning efforts to expand their data

warehouses with new subject areas and deeper histories, or additional applications and theassociated processing and storage requirements. As updated storage solutions and components are

released, Teradata Virtual Storage will let customers add them to the data warehouse.

2.10. INDEX TYPES IN TERADATA:

In the Teradata RDBMS, an index is used to define row uniqueness and retrieve data rows; it alsocan be used to enforce the primary key and unique constraints for a table.

The Teradata RDBMS support five types of indexes:

Unique Primary Index (UPI)

Unique Secondary Index (USI)

Non-Unique Primary Index (NUPI)

Non-Unique Secondary Index (NUPI)

Join Index

2.10.1. Unique Primary Index:

Primary index determines the distribution of table rows on the disks controlled by AMPs. In

Teradata RDBMS, a primary index is required for row distribution and storage. When a new row isinserted, its hash code is derived by applying a hashing algorithm to the value in the column(s) of

the primary code (as show in the following figure). Rows having the same primary index value are

stored on the same AMP.

Figure 9: Primary index concept



28/33


Management Systems

2.10.2. Secondary Index:

In addition to a primary index, up to 32 unique and non-unique secondary indexes can be defined

for a table. Comparing to primary indexes, Secondary indexes allow access to information in atable by alternate, less frequently used paths. A secondary index is a sub table that is stored in all

AMPs, but separately from the primary table. The sub tables, which are built and maintained by thesystem, contain the following;

RowIDs of the sub-table rows

Base table index column values

RowIDs of the base table rows (points)

As shown in the following Figure 10, the secondary index sub-table on each AMP is associated

with the base table by the rowID.

Figure 10: Secondary Index Concept

2.10.3. Join Index:

A join index is an indexing structure containing columns from multiple tables; specifically the

resulting columns form one or more tables. Rather than having to join individual tables each timethe join operation is needed, the query can be resolved via a join index and, in most cases,

dramatically improve performance.

Effects of Join Index

Depending on the complexity of the joins, the Join Index helps improve the performance of certaintypes of work. The following need to be considered when manipulating join indexes:



29/33


Management Systems

Load Utilities The join indexes are not supported by MultiLoad and FastLoad utilities,

they must be dropped and recreated after the table has been loaded.

Archive and Restore Archive and Restore cannot be used on join index itself. During arestore of a base table or database, the join index is marked as invalid. The join index must

be dropped and recreated before it can be used again in the execution of queries.

Fallback Protection Join index sub tables cannot be Fallback-protected. Permanent Journal Recovery The join index is not automatically rebuilt during the

recovery process. Instead, the join index is marked as invalid and the join index must be

dropped and recreated before it can be used again in the execution of queries.

Triggers A join index cannot be defined on a table with triggers.

Collecting Statistics In general, there is no benefit in collecting statistics on a join index

for joining columns specified in the join index definition itself. Statistics related to thesecolumns should be collected on the underlying base table rather than on the join index.



30/33


Management Systems

3. COMPARISON:

In this section the comparison of above stated database management systems has been done on the

basis of their indexing and physical storage structure.

3.1.COMPARISON ON THE BASIS OF INDEXES:

ORACLE SQL

SERVER

MySQL DB2 TERADAT

A

B tree index X X X

Primary index (Cluster

index)

X X X X X

Partitioned index X X

Reverse key index X X

Unique index X X X

Normal index X

Function based index X

Domain index X

Non Clustered index X

Secondary Index X X

Bitmap index X X X

Hash index X X X

Full Text index X X X X

Partial index X X X

In the above table the available indexes have been compared with the database management

systems. The X sign indicates the presence of certain index in particular DBMS.

3.2.COMPARISON ON THE BASIS OF PHYSICAL STORAGE STRUCTURE:



31/33


Management Systems

The comparative framework is attached as Annex A

4. BEST RECOMMENDED TOOL:

Oracle is well known for its industry-leading database engine, Oracle Database. Oracle Database is

an extremely reliable, highly scalable, client-server, relational database management system

(RDBMS). It serves as the information repository for a wide range of applications in a large variety

of industries around the world.

Oracle is targeted towards large organizations with the high cost it comes with.

Organizations with strength of 2000 to 4000 employees are considered to be in the large

category.

The physical storage structure of Oracle database includes; data files, redo log files, and

control files.

It allows the user to define logical storage units on available physical storage spaces and

allows for creation and management of data files manually.

Oracle has a powerful Clustering mechanism.

One of the main advantage of oracle is that it can be used on all operating systems i.e.

Linux, windows and many others.

Oracle uses so many security features like username, password, profiles, local

authentication, external authentication, advance security enhancements etc. While DBMS

like MySQL uses only three parameters to authenticate a user namely user name, password

and Location.

Oracle has functionality within the database server to automate the management of its

structure.

Oracle Enterprise Manager provides a Web-based graphical user interface to enable easy

management and monitoring of database.



32/33


Management Systems

Oracle provides alerts, advisories, and monitoring pages to help make decisions regarding

database storage.

Oracle support for manual management provides a very reasonable solution for a storage

structure optimization in this scenario of large organization.

Oracle is recommended as its storage structure is better than Sql Server and TeraData

because it has a large list of indexes and its file structure is also very consistent.

With Oracle, the process of choosing partitioning methods and partitioning keys is the

balancing of query access path, performance, and data load requirements. Specifically the

partitioning constraints and their relationship to disk storage are managed.

Teradata parallelism is automatic, pervasive, and database managed.



33/33


Management Systems

REFERENCES:

[1] http://www.esp.org/db-fund.pdf

[2]http://www.nou.edu.ng/noun/NOUN_OCL/pdf/pdf2/MBA%20758%20Database

%20Management%20System.pdf

[3]http://wiki.answers.com/Q/Discuss_the_capabilities_and_features_of_a_database_management

_system_dbms_what_set_of_functions_does_a_dbms_provide_give_some_examples_of_dbms_so

ftware[4] http://churmura.com/technology/computer-science/physical-storage-structures-in-oracle/29192/

[5] http://docs.oracle.com/cd/E11882_01/server.112/e16508/physical.htm

[6] http://technet.microsoft.com/en-us/library/cc966414.aspx

[7] http://www.akadia.com/services/sqlsrv_data_structure.html

[8] http://dev.mysql.com/doc/refman/5.6/en/innodb-table-and-index.html#

[9] http://docs.oracle.com/cd/E12151_01/doc.150/e12155/oracle_mysql_compared.htm

[10]http://en.wikibooks.org/wiki/Oracle_and_DB2,_Comparison_and_Compatibility/Storage_Mod

el/Physical_Storage/DB2

[11] http://yanghui8.wordpress.com/2008/06/24/db2-v8-types-of-indexes/

Documents

Advanced Database Systems Ass 5