Upload
betty-freeman
View
236
Download
2
Tags:
Embed Size (px)
Citation preview
Relational Database Theory
• Physical Database Design– Goals
• Improve performance–By minimizing disk I/O
• Improving management of the data
–By grouping tables that can be managed as a group
Relational Database Theory
• Physical design decisions are based on:
– Use of the data (volume, frequency)
– Features supported by the specific RDBMS
– Disk storage configuration
Relational Database Theory
• DBA initially sets up the physical database
– Tunes physical parameters on a ongoing basis
• As usage patterns change
• As new hardware/software options become available
• Steps in Physical Design Process– Determine which tables can be managed
as a group• Many RDBMSs support the concept of
a Container (Oracle Tablespace, db space, Access uses the .mdb)
–A collection of tables, and indexes
Relational Database Theory
– Develop a plan for allocating tables to disk devices• Consider parallel disk controllers• Group tables together that are
frequently joined• Distribute heavily accessed table to
different disk devices–To avoid excessive head movement
on one disk
Relational Database Theory
– Build indexes on table columns, based on frequency of use
– Restructure tables if necessary
• Fragment large tables into multiple smaller ones
• De-normalize tables if appropriate
Relational Database Theory
• Managing a collection of Tables, Indexes– Purpose of container concept
• Relate tables, indexes to physical disk files
• Aid in the management of the database–Example: A tablespace can be
taken offline, backed up, and restored while the remainder of the database is online
Relational Database Theory
– Support clustering data from related tables in the same file
• So that related data is read with the same I/O request
Relational Database Theory
• How the RDBMS processes a user request
– RDBMS parses, validates, and optimizes the SQL request
– Determines disk file in which the table is written
• Specific to each RDBMS & OS
Relational Database Theory
– Initiates I/O request to operating system, if necessary
• I/O is requested if file is not currently in buffers
– Processes execution plan using data in its buffer
Relational Database Theory
• Indexes
– Index is a separate structure (table)
• Points into the data table
• Built on one or more columns in the data table
Relational Database Theory
• Comments on Indexing– An index can be built on any column or
combination of columns– An index can be unique or non-unique– An index on the primary key is called the
primary index– Most RDBMSs use an internal row id as
the pointer to the row– Use of the index is transparent to the user
Relational Database Theory
• Use of an index
– Provides access to a row based on data value(s)
– Avoids duplicates – only way
– Supports sequential processing on the indexed field
– Improves performance
Relational Database Theory
• Use of an index improves performance on Retrieval– Processing an index is more efficient than
processing a table – for reads• Index is usually small, relative to the
table–Can be held entirely in memory
• The smaller the index value, the more entries per block the more likely the index will be in memory
Relational Database Theory
• Most RDBMSs use a type of B-Tree Index
– B-tree indexes were designed for efficient search of a sorted list
– Algorithms exist for managing and maintaining B-trees
Relational Database Theory
• B-trees were introduced by Bayer (1972) and McCreight.
– They are a special m-ary balanced tree used in databases because their structure allows records to be inserted, deleted, and retrieved with guaranteed worst-case performance
Relational Database Theory
• Use of index degrades performance on Updates
– Inserting a row is the source of much disk I/O (overhead)
• Every index on the table must be searched and updated also
Relational Database Theory
• Frequently inserting rows leads to index block overflow
– Causes much disk I/O as overflow condition is processed
Relational Database Theory
• Techniques for managing volatile tables (many interests, deletes)
– Partially fill index blocks when creating the index
– Periodically restructure (Drop, Create) the indexes
Relational Database Theory
• Indexing: Strengths and Weaknesses– Strengths
• Improves performance on retrieval of data
• Can be built or dropped at any time• Usage is transparent to the user
– Weaknesses• Degrades update performance
Relational Database Theory
• De-normalization– De-normalization means combining two
(or more) tables• Usually done when tables are
frequently joined– De-normalization (joining two tables)
depends on usage• Depends on how applications and
users access the data
Relational Database Theory
• De-normalization is done to improve performance
– Tailors data structures for one specific application’s use
– Improves performance of one type of access at expense of others
Relational Database Theory
Relational Database Theory
• De-normalization Trade-Offs
Normalization De-normalization
Eliminates update anomalies Improves performance for specific application(s)
Minimizes data redundancy
Supports simpler logic
Provides application-independent database design
Encourages sharing of data
• When to De-Normalize– This is EVIL, Do Not Do…– When does de-normalization have
minimal impact?• Data is accessed primarily on a
read-only basis• Data is accessed primarily by one
application
Relational Database Theory
• When to de-normalize
– After database design is done and tables are normalized to 3NF
– After clustering related tables in the same logical container
– After considering trade-offs and usage of data
Relational Database Theory
• Alternatives to de-normalization– Physical placement of data
• Use of container• Can improve performance without
impacting logical design– Selective hardware upgrades
• More main memory, expanded storage, cache storage devices
Relational Database Theory
• Fragmentation – Better alternative to de-normalization– Means breaking one table into two (or
more) tables• Usually done when one table is very
large• Or groups of user almost exclusively
access a subset of data in a table
Relational Database Theory
• Fragmentation can be based on selection or projection– Must be able to reconstruct the
original table – by union or join– Primary key column(s) must be
included in all vertical fragments• Disadvantage is that the DBA must be
aware of all the fragmented tables
Relational Database Theory
Relational Database Theory
• Physical Database Design– Goals
• Improve performance–By minimizing disk I/O
• Improving management of the data
–By grouping tables that can be managed as a group
• Indexing: Strengths and Weaknesses– Strengths
• Improves performance on retrieval of data
• Can be built or dropped at any time• Usage is transparent to the user
– Weaknesses• Degrades update performance
Relational Database Theory