IT 20303 The Relational DBMS Section 06. Relational Database Theory Physical Database Design

IT 20303

• The Relational DBMS

• Section 06

Relational Database Theory

• Physical Database Design


• Physical Database Design: Goals

• Improve performance

– By minimizing disk I/O

• Improve management of the data

– By grouping tables that can be managed as a group


• Physical design decisions are based on:

– Use of the data (volume, frequency)

– Features supported by the specific RDBMS

– Disk storage configuration


• DBA initially sets up the physical database

– Tunes physical parameters on an ongoing basis

• As usage patterns change

• As new hardware/software options become available

• Steps in Physical Design Process

– Determine which tables can be managed as a group

• Many RDBMSs support the concept of a Container (Oracle tablespace, dbspace; Access uses the .mdb file)

– A collection of tables and indexes


– Develop a plan for allocating tables to disk devices

• Consider parallel disk controllers

• Group tables together that are frequently joined

• Distribute heavily accessed tables to different disk devices

– To avoid excessive head movement on one disk


– Build indexes on table columns, based on frequency of use

– Restructure tables if necessary

• Fragment large tables into multiple smaller ones

• De-normalize tables if appropriate



• Example of a Container

[Figure: Table 1, Table 2, ..., Table N are stored in a Tablespace, which maps to an OS File]

• Managing a collection of Tables, Indexes

– Purpose of container concept

• Relate tables, indexes to physical disk files

• Aid in the management of the database

– Example: A tablespace can be taken offline, backed up, and restored while the remainder of the database is online


– Support clustering data from related tables in the same file

• So that related data is read with the same I/O request
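The container idea can be approximated with SQLite, which has no tablespaces: each attached database is one OS file holding a group of tables and indexes. This is only a rough stand-in, not how Oracle or any other RDBMS implements containers. The file names `sales.db` and `archive.db` and all table names are hypothetical.

```python
import os
import sqlite3
import tempfile

def build_containers(directory):
    """Create two database files, each acting as a container for a group of tables."""
    sales_path = os.path.join(directory, "sales.db")      # hypothetical file name
    archive_path = os.path.join(directory, "archive.db")  # hypothetical file name

    conn = sqlite3.connect(sales_path)
    # Attach a second OS file; tables created under "archive" live in that file.
    conn.execute("ATTACH DATABASE ? AS archive", (archive_path,))

    # Frequently joined tables are grouped in the main container.
    conn.execute("CREATE TABLE main.customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE main.orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
    # Rarely used history goes to its own container, managed separately.
    conn.execute("CREATE TABLE archive.orders_history (id INTEGER PRIMARY KEY, customer_id INTEGER)")
    conn.commit()

    # Detaching takes the archive container "offline"; the main tables stay usable.
    conn.execute("DETACH DATABASE archive")
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM main.sqlite_master WHERE type = 'table' ORDER BY name")]
    conn.close()
    return tables, os.path.exists(archive_path)

with tempfile.TemporaryDirectory() as tmp:
    main_tables, archive_file_exists = build_containers(tmp)
    print(main_tables, archive_file_exists)
```

After the detach, the archive file still exists on disk and could be copied or backed up on its own while the main file remains in use.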


• How the RDBMS processes a user request

– RDBMS parses, validates, and optimizes the SQL request

– Determines disk file in which the table is written

• Specific to each RDBMS & OS


– Initiates I/O request to operating system, if necessary

• I/O is requested only if the needed data is not already in the buffers

– Processes execution plan using data in its buffer
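The "I/O only if not in buffers" step can be sketched with a toy buffer pool. This is a simplified model, not any specific RDBMS's buffer manager; the class name `BufferPool`, the least-recently-used replacement policy, and the block numbers are all assumptions made for illustration.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer pool: serves blocks from memory, reads from 'disk' on a miss."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity        # number of blocks that fit in memory
        self.read_block = read_block    # function that simulates a disk read
        self.buffers = OrderedDict()    # block_id -> data, oldest use first
        self.disk_reads = 0

    def get(self, block_id):
        if block_id in self.buffers:
            # Hit: no I/O request, just mark the block as recently used.
            self.buffers.move_to_end(block_id)
            return self.buffers[block_id]
        # Miss: issue the (simulated) I/O request to the operating system.
        self.disk_reads += 1
        data = self.read_block(block_id)
        self.buffers[block_id] = data
        if len(self.buffers) > self.capacity:
            # Assumed policy: evict the least recently used block.
            self.buffers.popitem(last=False)
        return data

pool = BufferPool(capacity=2, read_block=lambda b: f"contents of block {b}")
for block in [1, 2, 1, 1, 3, 2]:
    pool.get(block)
print(pool.disk_reads, list(pool.buffers))
```

Six block requests cause four simulated disk reads here; the repeated requests for block 1 are served from memory.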


• Indexes

– Index is a separate structure (table)

• Points into the data table

• Built on one or more columns in the data table


• Comments on Indexing

– An index can be built on any column or combination of columns

– An index can be unique or non-unique

– An index on the primary key is called the primary index

– Most RDBMSs use an internal row id as the pointer to the row

– Use of the index is transparent to the user


• Use of an index

– Provides access to a row based on data value(s)

– Prevents duplicate values (a unique index is the only way to enforce this)

– Supports sequential processing on the indexed field

– Improves performance
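A minimal sketch of the duplicate-prevention and sequential-processing uses, with SQLite standing in for the RDBMS. The table `employee`, the index `idx_employee_email`, and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_no INTEGER, email TEXT)")
# A unique index makes the RDBMS reject a second row with the same email.
conn.execute("CREATE UNIQUE INDEX idx_employee_email ON employee (email)")
conn.execute("INSERT INTO employee VALUES (1, 'b@example.com')")

def try_insert(emp_no, email):
    """Return True if the row was accepted, False if the unique index rejected it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?)", (emp_no, email))
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_accepted = try_insert(2, "b@example.com")   # same email as emp_no 1
new_value_accepted = try_insert(3, "a@example.com")   # a new email

# Sequential processing on the indexed column.
emails_in_order = [row[0] for row in conn.execute(
    "SELECT email FROM employee ORDER BY email")]
print(duplicate_accepted, new_value_accepted, emails_in_order)
```

The application never names the index in its SQL; the RDBMS applies it on its own, which is what "transparent to the user" means.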


• Use of an index improves performance on Retrieval

– Processing an index is more efficient than processing a table, for reads

• Index is usually small, relative to the table

– Can be held entirely in memory

• The smaller the index value, the more entries per block, and the more likely the index will be in memory
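One way to see the retrieval effect is to ask the RDBMS for its access plan before and after an index is created. The sketch below uses SQLite's `EXPLAIN QUERY PLAN`; other products have their own plan commands and output formats. The table `orders` and index `idx_orders_customer` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, i % 100, i * 1.5) for i in range(1000)],
)

QUERY = "SELECT amount FROM orders WHERE customer_id = 42"

def plan(sql):
    """Return SQLite's plan description (the last column of each plan row)."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_without_index = plan(QUERY)   # expected to describe a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_with_index = plan(QUERY)      # expected to describe an index search

print(plan_without_index)
print(plan_with_index)
```

The query text is identical in both cases; only the plan chosen by the optimizer changes.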


• Most RDBMSs use a type of B-Tree Index

– B-tree indexes were designed for efficient search of a sorted list

– Algorithms exist for managing and maintaining B-trees


• B-trees were introduced by Bayer and McCreight (1972).

– They are a special m-ary balanced tree used in databases because their structure allows records to be inserted, deleted, and retrieved with guaranteed worst-case performance
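A minimal sketch of a B-tree lookup over a small hand-built tree. It covers search only; insertion, deletion, and block splitting are omitted, and real RDBMS indexes are typically a B-tree variant that stores row pointers with the keys. The `Node` class and the key values are assumptions for illustration.

```python
from bisect import bisect_left

class Node:
    """One index block: sorted keys plus child pointers (no children in a leaf)."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

def search(node, key):
    """Return (found, blocks_read) while descending from the root."""
    blocks_read = 0
    while node is not None:
        blocks_read += 1
        i = bisect_left(node.keys, key)          # position of key within this block
        if i < len(node.keys) and node.keys[i] == key:
            return True, blocks_read
        # Follow the child between keys[i-1] and keys[i]; stop at a leaf.
        node = node.children[i] if node.children else None
    return False, blocks_read

# Two-level tree: one root block and three leaf blocks.
root = Node([20, 40], [
    Node([5, 10, 15]),
    Node([25, 30, 35]),
    Node([45, 50, 55]),
])

print(search(root, 30), search(root, 20), search(root, 12))
```

Because the tree is balanced, no lookup reads more blocks than the height of the tree, which is the source of the worst-case guarantee.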


• B-Tree


• Use of index degrades performance on Updates

– Inserting a row is the source of much disk I/O (overhead)

• Every index on the table must be searched and updated also


• Frequently inserting rows leads to index block overflow

– Causes much disk I/O as overflow condition is processed


• Techniques for managing volatile tables (many inserts, deletes)

– Partially fill index blocks when creating the index

– Periodically restructure (Drop, Create) the indexes
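The benefit of partially filling index blocks can be illustrated with a toy model that counts block splits. It uses a flat list of leaf blocks rather than a full B-tree, and the block capacity, fill fractions, and key values are all hypothetical.

```python
from bisect import bisect_right, insort

CAPACITY = 4  # hypothetical number of entries that fit in one index block

def build_blocks(sorted_keys, fill_fraction):
    """Pack sorted keys into blocks, leaving free space when fill_fraction < 1."""
    per_block = max(1, int(CAPACITY * fill_fraction))
    return [sorted_keys[i:i + per_block] for i in range(0, len(sorted_keys), per_block)]

def insert_key(blocks, key):
    """Insert key into the block that covers it; return 1 if that block had to split."""
    first_keys = [block[0] for block in blocks]
    idx = max(0, bisect_right(first_keys, key) - 1)
    insort(blocks[idx], key)
    if len(blocks[idx]) <= CAPACITY:
        return 0
    # Overflow: split the block in two, which costs extra disk I/O in a real index.
    mid = len(blocks[idx]) // 2
    blocks[idx:idx + 1] = [blocks[idx][:mid], blocks[idx][mid:]]
    return 1

def count_splits(fill_fraction, initial_keys, new_keys):
    blocks = build_blocks(sorted(initial_keys), fill_fraction)
    return sum(insert_key(blocks, key) for key in new_keys)

initial = list(range(0, 160, 10))   # 16 existing keys: 0, 10, ..., 150
incoming = [5, 45, 85, 125]         # later inserts spread across the key range

splits_full = count_splits(1.0, initial, incoming)   # blocks created 100% full
splits_half = count_splits(0.5, initial, incoming)   # blocks created 50% full
print(splits_full, splits_half)
```

With these inputs, every insert into the fully packed blocks forces a split, while the half-filled blocks absorb all four inserts. The price is that the half-filled index occupies twice as many blocks.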


• Indexing: Strengths and Weaknesses

– Strengths

• Improves performance on retrieval of data

• Can be built or dropped at any time

• Usage is transparent to the user

– Weaknesses

• Degrades update performance


• De-normalization

– De-normalization means combining two (or more) tables

• Usually done when tables are frequently joined

– De-normalization (joining two tables) depends on usage

• Depends on how applications and users access the data


• De-normalization is done to improve performance

– Tailors data structures for one specific application’s use

– Improves performance of one type of access at expense of others
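A minimal sketch of the trade-off, with SQLite standing in for the RDBMS. Two normalized tables are combined into one stored table so a report needs no join, and a later update to the source table shows the redundancy cost. The tables `customer`, `orders`, `order_report` and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE orders   (order_id INTEGER PRIMARY KEY, cust_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Kent'), (2, 'Akron');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);

    -- De-normalized table: the join is performed once and its result is stored.
    CREATE TABLE order_report AS
        SELECT o.order_id, o.cust_id, c.city, o.amount
        FROM orders o JOIN customer c ON c.cust_id = o.cust_id;
""")

# The report query now reads a single table, with no join at query time.
city_totals = dict(conn.execute(
    "SELECT city, SUM(amount) FROM order_report GROUP BY city"))

# The cost: city is stored redundantly, so an update to customer leaves
# the de-normalized copy out of date (an update anomaly).
conn.execute("UPDATE customer SET city = 'Canton' WHERE cust_id = 1")
stale_rows = conn.execute(
    "SELECT COUNT(*) FROM order_report r JOIN customer c ON c.cust_id = r.cust_id "
    "WHERE r.city <> c.city").fetchone()[0]

print(city_totals, stale_rows)
```

The report application gets faster reads, while every other application that updates `customer` must now also maintain `order_report`.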



• De-normalization Trade-Offs

Normalization                                       De-normalization
Eliminates update anomalies                         Improves performance for specific application(s)
Minimizes data redundancy
Supports simpler logic
Provides application-independent database design
Encourages sharing of data

• When to De-Normalize

– This is EVIL, Do Not Do…

– When does de-normalization have minimal impact?

• Data is accessed primarily on a read-only basis

• Data is accessed primarily by one application


• When to de-normalize

– After database design is done and tables are normalized to 3NF

– After clustering related tables in the same logical container

– After considering trade-offs and usage of data


• Alternatives to de-normalization

– Physical placement of data

• Use of container

• Can improve performance without impacting logical design

– Selective hardware upgrades

• More main memory, expanded storage, cache storage devices


• Fragmentation

– Better alternative to de-normalization

– Means breaking one table into two (or more) tables

• Usually done when one table is very large

• Or when groups of users almost exclusively access a subset of the data in a table


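• Fragmentation can be based on selection (horizontal fragments, subsets of rows) or projection (vertical fragments, subsets of columns)

– Must be able to reconstruct the original table: by union for horizontal fragments, by join for vertical fragments

– Primary key column(s) must be included in all vertical fragments

• Disadvantage is that the DBA must be aware of all the fragmented tables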
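A minimal sketch of both kinds of fragmentation and their reconstruction, with SQLite standing in for the RDBMS. The table `employee`, the fragment names, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, region TEXT, name TEXT, salary REAL);
    INSERT INTO employee VALUES
        (1, 'EAST', 'Ann', 50000), (2, 'WEST', 'Bob', 60000), (3, 'EAST', 'Cy', 55000);

    -- Horizontal fragments: selection on region (each row lands in exactly one).
    CREATE TABLE employee_east AS SELECT * FROM employee WHERE region = 'EAST';
    CREATE TABLE employee_west AS SELECT * FROM employee WHERE region <> 'EAST';

    -- Vertical fragments: projection, each one keeping the primary key.
    CREATE TABLE employee_public AS SELECT emp_id, region, name FROM employee;
    CREATE TABLE employee_pay    AS SELECT emp_id, salary FROM employee;
""")

def rows(sql):
    return conn.execute(sql).fetchall()

original = rows("SELECT emp_id, region, name, salary FROM employee ORDER BY emp_id")

# Horizontal fragments are recombined with a union.
from_union = rows(
    "SELECT * FROM (SELECT * FROM employee_east UNION ALL SELECT * FROM employee_west) "
    "ORDER BY emp_id")

# Vertical fragments are recombined with a join on the primary key.
from_join = rows(
    "SELECT p.emp_id, p.region, p.name, s.salary "
    "FROM employee_public p JOIN employee_pay s ON s.emp_id = p.emp_id "
    "ORDER BY p.emp_id")

print(from_union == original, from_join == original)
```

The join only works because `emp_id` appears in both vertical fragments, which is why the primary key must be carried into each one.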


• Physical Design Review



• Physical Database Design: Goals

• Improve performance

– By minimizing disk I/O

• Improve management of the data

– By grouping tables that can be managed as a group

• Indexing: Strengths and Weaknesses

– Strengths

• Improves performance on retrieval of data

• Can be built or dropped at any time

• Usage is transparent to the user

– Weaknesses

• Degrades update performance


• Questions?

