TM 5-1Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems
Chapter 5: Physical Database Design and Performance
Jason C. H. Chen, Ph.D.
Professor of MIS
School of Business Administration
Gonzaga University
Spokane, WA 99258
Objectives
• Definition of terms
• Describe the physical database design process
• Choose storage formats for attributes
• Select appropriate file organizations
• Describe three types of file organization
• Describe indexes and their appropriate use
• Translate a database model into efficient structures
• Know when and how to use denormalization
Physical Database Design
• Purpose–translate the logical description of data into the technical specifications for storing and retrieving data
• Goal–create a design for storing data that will provide adequate performance and ensure database integrity, security, and recoverability
Physical Design Process
Inputs:
• Normalized relations
• Volume estimates
• Attribute definitions
• Response time expectations
• Data security needs
• Backup/recovery needs
• Integrity expectations
• DBMS technology used
Leads to
Decisions:
• Attribute data types
• Physical record descriptions (doesn’t always match logical design)
• File organizations
• Indexes and database architectures
• Query optimization
Figure 5-1 Composite usage map (Pine Valley Furniture Company)
What will we learn from the Usage Map?
• Why should Manufactured_Part and Purchased_Part tables be separated?
• Why should SUPPLIER and SUPPLIES tables be combined?
Figure 5-1 Composite usage map (Pine Valley Furniture Company) (cont.)
Database access frequencies are estimated from “transaction volumes.”
Data (transaction) volumes
Figure 5-1 Composite usage map (Pine Valley Furniture Company) (cont.)
Access Frequencies (per hour)
Figure 5-1 Composite usage map (Pine Valley Furniture Company) (cont.)
Usage analysis:
• 14,000 purchased parts accessed per hour
• 8,000 quotations accessed from these 14,000 purchased part accesses
• 7,000 suppliers accessed from these 8,000 quotation accesses
Figure 5-1 Composite usage map (Pine Valley Furniture Company) (cont.)
Usage analysis:
• 7,500 suppliers accessed per hour
• 4,000 quotations accessed from these 7,500 supplier accesses
• 4,000 purchased parts accessed from these 4,000 quotation accesses
Designing Fields
• Field: smallest unit of data in database
• Field design
  – Choosing data type
  – Coding, compression, encryption
  – Controlling data integrity
Choosing Data Types
• CHAR–fixed-length character
• VARCHAR2–variable-length character (memo)
• LONG–variable-length character data of up to 2 GB in Oracle (large text, not a number)
• NUMBER–positive/negative number
• INTEGER–positive/negative whole number
• DATE–actual date
• BLOB–binary large object (good for graphics, sound clips, etc.)
Figure 5-2 Example code look-up table (Pine Valley Furniture Company)
Code saves space, but costs an additional lookup to obtain actual value
Field Data Integrity
• Default value–assumed value if no explicit value
  – e.g., s_state CHAR(2) DEFAULT 'WI';
• Range control–allowable value limitations (constraints or validation rules)
• Null value control–allowing or prohibiting empty fields
• Referential integrity–range control (and null value allowances) for foreign-key to primary-key match-ups
Sarbanes-Oxley Act (SOX) legislates importance of financial data integrity
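The four field-level controls can be tried out in a few lines. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for Oracle (an assumption made so the example runs anywhere), and the state and supplier tables, their columns, and the 0 to 5 rating range are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

conn.executescript("""
CREATE TABLE state (
    state_code CHAR(2) PRIMARY KEY
);
CREATE TABLE supplier (
    s_id     INTEGER PRIMARY KEY,
    s_name   VARCHAR(25) NOT NULL,                     -- null value control
    s_state  CHAR(2) DEFAULT 'WI'                      -- default value
             REFERENCES state (state_code),            -- referential integrity
    s_rating INTEGER CHECK (s_rating BETWEEN 0 AND 5)  -- range control
);
INSERT INTO state VALUES ('WI'), ('WA');
""")

# No s_state is supplied, so the default 'WI' is stored.
conn.execute("INSERT INTO supplier (s_id, s_name, s_rating) VALUES (1, 'Acme', 4)")
default_state = conn.execute(
    "SELECT s_state FROM supplier WHERE s_id = 1").fetchone()[0]

def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

range_violation = rejected(
    "INSERT INTO supplier (s_id, s_name, s_rating) VALUES (2, 'Bolt', 9)")
null_violation = rejected(
    "INSERT INTO supplier (s_id, s_name) VALUES (3, NULL)")
fk_violation = rejected(
    "INSERT INTO supplier (s_id, s_name, s_state) VALUES (4, 'Cog', 'ZZ')")
```

Each of the three rejected inserts breaks exactly one rule, so the checks can be observed independently of one another.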
Handling Missing Data
• Substitute an estimate of the missing value (e.g., using a formula)
• Construct a report listing missing values
• In programs, ignore missing data unless the value is significant (sensitivity testing)
Triggers can be used to perform these operations
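As a rough illustration of the three options, the sketch below works on a small in-memory set of values. The part numbers and prices are hypothetical, and using the mean of the known values as the estimate is just one possible formula.

```python
from statistics import mean

# Hypothetical unit prices; None marks a missing value.
prices = {"P1": 12.0, "P2": None, "P3": 18.0, "P4": None, "P5": 15.0}

# Option 2: a report listing the missing values.
missing_report = sorted(k for k, v in prices.items() if v is None)

# Option 1: substitute an estimate (here, the mean of the known values).
known = [v for v in prices.values() if v is not None]
estimate = mean(known)
filled = {k: (estimate if v is None else v) for k, v in prices.items()}

# Option 3: sensitivity test, comparing a result with the gaps ignored
# against the same result with estimates filled in.
total_ignoring_missing = sum(known)
total_with_estimates = sum(filled.values())
```

If the two totals differ by more than the application can tolerate, the missing values are significant and should not simply be ignored.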
Physical Records
• Physical Record: A group of fields stored in adjacent memory locations and retrieved together as a unit
• Page: The amount of data read or written in one I/O operation
• Blocking Factor: The number of physical records per page
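The blocking factor follows directly from the page size and the record size. The sketch below assumes fixed-length records that do not span pages and ignores any page header overhead; the 4096-byte page and 300-byte record are hypothetical figures.

```python
def blocking_factor(page_size_bytes, record_size_bytes):
    """Whole physical records that fit in one page (records do not span pages)."""
    return page_size_bytes // record_size_bytes

def pages_needed(record_count, page_size_bytes, record_size_bytes):
    """Pages (I/O units) required to hold record_count records."""
    bf = blocking_factor(page_size_bytes, record_size_bytes)
    return -(-record_count // bf)  # ceiling division

bf = blocking_factor(4096, 300)          # 13 records per page
wasted = 4096 - bf * 300                 # 196 bytes unused in each page
pages = pages_needed(10_000, 4096, 300)  # 770 pages for 10,000 records
```

A record size that divides the page size more evenly wastes less space per page, which is one reason physical record layout may differ from the logical design.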
Denormalization
• Transforming normalized relations into unnormalized physical record specifications
• Benefits:
  – Can improve performance (speed) by reducing number of table lookups (i.e., reduce number of necessary join queries)
• Costs (due to data duplication)
  – Wasted storage space
  – Data integrity/consistency threats
• Common denormalization opportunities
  – One-to-one relationship (Fig. 5-3)
  – Many-to-many relationship with attributes (Fig. 5-4)
  – Reference data (1:N relationship where 1-side has data not used in any other relationship) (Fig. 5-5)
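The trade-off can be seen with a small reference-data case. This is a sketch only: sqlite3 stands in for Oracle, and the storage and item tables, their columns, and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized: reference data lives in its own table.
CREATE TABLE storage (
    instr_id     INTEGER PRIMARY KEY,
    where_stored TEXT,
    container    TEXT
);
CREATE TABLE item (
    item_id     INTEGER PRIMARY KEY,
    description TEXT,
    instr_id    INTEGER REFERENCES storage (instr_id)
);
INSERT INTO storage VALUES (1, 'Cold room', 'Crate'), (2, 'Dry shelf', 'Box');
INSERT INTO item VALUES (10, 'Glue', 1), (11, 'Varnish', 1), (12, 'Nails', 2);

-- Denormalized: the storage columns are copied into every item row.
CREATE TABLE item_denorm AS
    SELECT i.item_id, i.description, s.where_stored, s.container
    FROM item i JOIN storage s ON s.instr_id = i.instr_id;
""")

# The normalized read needs a join (an extra table access).
joined = conn.execute("""
    SELECT i.description, s.where_stored
    FROM item i JOIN storage s ON s.instr_id = i.instr_id
    WHERE i.item_id = 11""").fetchone()

# The denormalized read touches one table only.
single = conn.execute(
    "SELECT description, where_stored FROM item_denorm WHERE item_id = 11"
).fetchone()

# The cost: 'Cold room' is now stored once per item instead of once overall.
copies = conn.execute(
    "SELECT COUNT(*) FROM item_denorm WHERE where_stored = 'Cold room'"
).fetchone()[0]
```

Both reads return the same answer, but any change to a storage instruction must now be applied to every copy, which is the consistency threat listed under costs.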
Figure 5-3 A possible denormalization situation: two entities with one-to-one relationship
mandatory optional
Figure 5-4 A possible denormalization situation: a many-to-many relationship with nonkey attributes
Extra table access required
Null description possible
Figure 5-5 A possible denormalization situation: reference data
Extra table access required
Data duplication
Partitioning
• Horizontal Partitioning: Distributing the rows of a table into several separate files
  – Useful for situations where different users need access to different rows
  – Three types: Key Range Partitioning, Hash Partitioning, or Composite Partitioning
• Vertical Partitioning: Distributing the columns of a table into several separate relations
  – Useful for situations where different users need access to different columns
  – The primary key must be repeated in each file
• Combinations of Horizontal and Vertical Partitions often correspond with User Schemas (user views)
Partitioning (cont.)
• Advantages of Partitioning:
  – Efficiency: Records used together are grouped together
  – Local optimization: Each partition can be optimized for performance
  – Security, recovery
  – Load balancing: Partitions stored on different disks, reduces contention
  – Take advantage of parallel processing capability
• Disadvantages of Partitioning:
  – Inconsistent access speed: Slow retrievals across partitions
  – Complexity: Non-transparent partitioning
  – Extra space or update time: Duplicate data; access from multiple partitions
Figure: Magnetic Disk Components
This figure shows how data are recorded on magnetic disks.
Oracle 11g Horizontal Partitioning Methods
• Range partitioning
  – Partitions defined by range of field values
  – Could result in unbalanced distribution of rows
  – Like-valued fields share partitions
• Hash partitioning
  – Partitions defined via hash functions
  – Aims for a balanced distribution of rows (even in practice unless key values are heavily skewed)
  – Partition could contain widely varying valued fields
• List partitioning
  – Based on predefined lists of values for the partitioning key
• Composite partitioning
  – Combination of the other approaches
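The four methods differ only in how a row's partitioning key is mapped to a partition. The sketch below shows that mapping in plain Python; it is not Oracle syntax, and the partition names, year boundaries, state lists, and the use of CRC-32 as the hash function are all assumptions chosen for illustration.

```python
import zlib

def range_partition(order_year):
    """Key-range partitioning: each partition covers a range of key values."""
    if order_year < 2010:
        return "p_before_2010"
    if order_year < 2020:
        return "p_2010s"
    return "p_2020_on"

def hash_partition(customer_id, partitions=4):
    """Hash partitioning: a hash of the key picks the partition number."""
    return zlib.crc32(str(customer_id).encode()) % partitions

REGION_LISTS = {"p_west": {"WA", "OR", "CA"}, "p_midwest": {"WI", "MN", "IL"}}

def list_partition(state):
    """List partitioning: predefined lists of key values map to partitions."""
    for name, states in REGION_LISTS.items():
        if state in states:
            return name
    return "p_other"

def composite_partition(order_year, customer_id):
    """Composite partitioning: range first, then hash within that range."""
    return (range_partition(order_year), hash_partition(customer_id))
```

With range partitioning, rows for neighboring years land together, so a busy decade produces a large partition; the hash version scatters neighboring customer IDs instead.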
Designing Physical Files
• Physical File:
  – A named portion of secondary memory allocated for the purpose of storing physical records
  – Tablespace–named set of disk storage elements in which physical files for database tables can be stored
  – Extent–contiguous section of disk space
• Constructs to link two pieces of data:
  – Sequential storage
  – Pointers: field of data that can be used to locate related fields or records
Figure 5-6 DBMS terminology in an Oracle 11g environment
File Organizations
• Technique for physically arranging records of a file on secondary storage
• Factors for selecting file organization:
  – Fast data retrieval and throughput
  – Efficient storage space utilization
  – Protection from failure and data loss
  – Minimizing need for reorganization
  – Accommodating growth
  – Security from unauthorized use
• Types of file organizations
  – Sequential (Fig. 5-7a): the most efficient with storage space
  – Indexed (Fig. 5-7b): quick retrieval
    • Join Index, Fig. 5-8
  – Hashed (Fig. 5-7c): easiest to update
Figure 5-7a Sequential file organization
Records of the file are stored in sequence by the primary key field values (records 1, 2, …, n)
If not sorted: average time to find desired record = n/2
If sorted: every insert or delete requires a re-sort
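The n/2 figure can be checked by counting record reads. The sketch below assumes every record is equally likely to be requested and uses a hypothetical file of 1,000 keys; under that assumption the exact average is (n + 1)/2, which is approximately n/2.

```python
def sequential_search(records, key):
    """Scan from the first record; return how many records were read."""
    for reads, record_key in enumerate(records, start=1):
        if record_key == key:
            return reads
    return len(records)  # key absent: the whole file was read

n = 1000
records = list(range(1, n + 1))  # hypothetical keys 1..n

# Average reads over one successful search for every key in the file.
average_reads = sum(sequential_search(records, k) for k in records) / n
```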
Indexed File Organizations
• Indexed File Organization: the storage of records either sequentially or nonsequentially with an index that allows software to locate individual records
• Index: a table or other data structure used to determine in a file the location of records that satisfy some condition
• Primary keys are automatically indexed
• Other fields or combinations of fields can also be indexed; these are called secondary keys (or nonunique keys)
• Oracle has a CREATE INDEX operation, and MS ACCESS allows indexes to be created for most field types
Figure 5-7b Indexed file organization
Uses a tree search. Average time to find desired record = depth of the tree
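To see why lookup cost tracks tree depth rather than file size, the sketch below counts probes in a binary search over a sorted key list. This is a simplification: the list stands in for a balanced two-way index tree, whereas real DBMS indexes are typically B-trees with a much higher fan-out and therefore an even smaller depth. The 1,000 keys are hypothetical.

```python
import math

def index_lookup(sorted_keys, key):
    """Binary search over a sorted index; return (position, probes)."""
    low, high, probes = 0, len(sorted_keys) - 1, 0
    while low <= high:
        mid = (low + high) // 2
        probes += 1
        if sorted_keys[mid] == key:
            return mid, probes
        if sorted_keys[mid] < key:
            low = mid + 1
        else:
            high = mid - 1
    return None, probes

n = 1000
index = list(range(1, n + 1))

# Worst case over every key, compared with the depth of a balanced binary tree.
worst_probes = max(index_lookup(index, k)[1] for k in index)
depth_bound = math.floor(math.log2(n)) + 1
```

For 1,000 keys the worst case is 10 probes, against an average of about 500 reads for an unindexed sequential scan of the same file.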
Figure 5-7c Hashed file or index organization
Hash algorithm: usually uses division-remainder to determine record position. Records with the same position are grouped in lists.
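A minimal sketch of the division-remainder scheme follows. The bucket count of 7 and the record keys are hypothetical; each storage position holds a list of the records whose keys leave the same remainder.

```python
BUCKETS = 7  # hypothetical number of storage positions (a prime is a common choice)

def position(key):
    """Division-remainder hash: the remainder is the record position."""
    return key % BUCKETS

def build_file(keys):
    """Group records that hash to the same position into a list."""
    table = [[] for _ in range(BUCKETS)]
    for key in keys:
        table[position(key)].append(key)
    return table

def find(table, key):
    """Go straight to the computed position, then scan its short list."""
    return key in table[position(key)]

hashed = build_file([101, 108, 115, 23, 44, 65])
# 101, 108 and 115 all leave remainder 3; 23, 44 and 65 all leave remainder 2.
```

A lookup never scans the whole file: it computes one position and searches only the list stored there.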
Figure 5-8 Join Indexes–speeds up join operations
a) Join index for common non-key columns
b) Join index for matching foreign key (FK) and primary key (PK)
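A join index can be pictured as a precomputed list of row-ID pairs. The sketch below illustrates case (a), a common non-key column; the store and customer tables, their row IDs, and the City values are hypothetical.

```python
# Hypothetical tables, keyed by row ID: (identifier, city).
store = {10001: ("S4266", "Dayton"), 10002: ("S2654", "Columbus"),
         10003: ("S3789", "Dayton")}
customer = {20001: ("C3861", "Dayton"), 20002: ("C1062", "Toledo"),
            20003: ("C2451", "Dayton")}

# Join index on the common non-key column City: built once, ahead of time.
join_index = [
    (store_rid, cust_rid)
    for store_rid, (_, store_city) in store.items()
    for cust_rid, (_, cust_city) in customer.items()
    if store_city == cust_city
]

def join_via_index():
    """Answer the join by following row-ID pairs; no city comparison is needed."""
    return [(store[s][0], customer[c][0]) for s, c in join_index]
```

The matching work is paid when the index is built and maintained, so the join itself reduces to direct row fetches.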
Understand their general concepts
Clustering Files
• In some relational DBMSs, related records from different tables can be stored together in the same disk area
• Useful for improving performance of join operations
• Primary key records of the main table are stored adjacent to associated foreign key records of the dependent table
• e.g. Oracle has a CREATE CLUSTER command
Cluster
• Clustering is a method of storing tables that are intimately related and often joined together into the same area on disk. For example, instead of the WORKER table being in one section of the disk and the WORKERSKILL table being somewhere else, their rows could be interleaved together in a single area, called a cluster.
• The cluster key is the field (or fields) by which the tables are usually joined in a query (e.g., Worker_ID in the example). To cluster tables, you must own the tables you are going to cluster together.
Fig. 4-10: Mapping an entity with a multivalued attribute
(a) Employee entity type with multivalued attribute
WORKER: Worker_ID, Name, Age, Lodging, {Skill, Ability}
(b) Mapping a multivalued attribute
Multivalued attribute becomes a separate relation with foreign key
[Two relations are created: one contains all of the attributes except the multivalued attribute, and the second contains the primary key of the first plus the multivalued attribute]
One-to-many relationship between original entity and new relation
WORKER (Worker_ID, Name, Age, Lodging)
WORKERSKILL (Worker_ID, Skill, Ability)
The following is the basic syntax of the CREATE CLUSTER command:
CREATE CLUSTER cluster_name (field-1 datatype, field-2 datatype …)
CREATE CLUSTER WORKERandSKILL (Judy NUMBER(2));
Cluster created.
This creates a cluster (a space is set aside, as it would be for a table) with nothing in it. The use of Judy as the name of the cluster key is irrelevant; you’ll never use it again. Its data type, however, must be compatible with the table column that is later mapped to it (Worker_ID, defined as NUMBER(2)).
Next, tables are created to be included in this cluster.
CREATE TABLE WORKER (
  Worker_ID NUMBER(2),
  Name VARCHAR2(25) NOT NULL,
  Age NUMBER(2),
  Lodging VARCHAR2(15)
)
CLUSTER WORKERandSKILL (Worker_ID);
Prior to inserting rows into WORKER, you must create a cluster index:
CREATE INDEX WORKERandSKILL_NDX
ON CLUSTER WORKERandSKILL;
Notice that nowhere does either statement say explicitly that the Worker_ID column goes into the Judy cluster key. The matchup is done by position only: Worker_ID and Judy were both the first objects mentioned in their respective cluster statements. Multiple columns and cluster keys are matched first to first, second to second, third to third, and so on. Now a second table is added to the cluster.
CREATE TABLE WORKERSKILL (
  Worker_ID NUMBER(2),
  Skill VARCHAR2(25) NOT NULL,
  Ability VARCHAR2(15) NOT NULL
)
CLUSTER WORKERandSKILL (Worker_ID);
We will investigate how the tables are stored in the cluster.
Recall that the WORKER table has four fields: Worker_ID, Name, Age, and Lodging. The WORKERSKILL table has three fields: Worker_ID, Skill, and Ability. When these two tables are clustered, each unique Worker_ID is actually stored only once, in the cluster key. To each Worker_ID are attached the columns from both of these tables.
The data from both of these tables is actually stored in a single location, almost as if the cluster were a big table containing data drawn from both of the tables that make it up.
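The storage idea can be mimicked with an in-memory structure keyed by the cluster key. This is a conceptual sketch, not a description of Oracle's on-disk format, and the worker and skill rows below are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical rows; the values are illustrative only.
worker_rows = [(1, "WORKER ONE", 30, "LODGE A"), (2, "WORKER TWO", 41, "LODGE B")]
workerskill_rows = [(1, "SKILL X", "GOOD"), (1, "SKILL Y", "AVERAGE"),
                    (2, "SKILL X", "EXCELLENT")]

# One entry per cluster-key value; rows from both tables hang off it.
cluster = defaultdict(lambda: {"WORKER": [], "WORKERSKILL": []})
for worker_id, name, age, lodging in worker_rows:
    cluster[worker_id]["WORKER"].append((name, age, lodging))
for worker_id, skill, ability in workerskill_rows:
    cluster[worker_id]["WORKERSKILL"].append((skill, ability))

def join_for(worker_id):
    """A join on the cluster key reads a single cluster entry."""
    entry = cluster[worker_id]
    return [(name, skill, ability)
            for name, _, _ in entry["WORKER"]
            for skill, ability in entry["WORKERSKILL"]]
```

Each Worker_ID appears once as a key even though worker 1 has two skill rows, and a join for one worker never has to look outside that worker's entry.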
Figure: How data is stored in clusters
[Figure: a single combined layout with columns AGE, LODGING, NAME (from WORKER table), WORKER_ID (CLUSTER key), and SKILL, ABILITY (from WORKERSKILL table), with rows for WORKER_ID 1 through 19]
Why do we not simply create one table like the one shown above?
Optimizing Query Processing - A Quick Way to Speed Access to Your Data
• Indexes may be used to improve the efficiency of data searches to meet particular search criteria after the table has been in use for some time. Therefore, the ability to create indexes quickly and efficiently at any time is important.
• SQL indexes can be created on the basis of any selected attributes
Optimizing Query Processing (cont.)
For example:
SELECT index_name FROM user_indexes;
SELECT *
FROM inventory
WHERE inv_qoh >= 130;
CREATE INDEX qoh_index
ON inventory (inv_qoh);
Note that this statement defines an index called qoh_index for the inv_qoh column in the inventory table. With this index in place, in the next example the DBMS only needs to look at the rows that satisfy the WHERE condition and can, therefore, produce an answer more quickly.
SQL> SELECT *
  2  FROM inventory
  3  WHERE inv_qoh >= 130;

    INV_ID    ITEM_ID COLOR                INV_SIZE    INV_PRICE    INV_QOH
---------- ---------- -------------------- ---------- ---------- ----------
         3          3 Khaki                S               29.95        150
         4          3 Khaki                M               29.95        147
         6          3 Navy                 S               29.95        139
         7          3 Navy                 M               29.95        137
         9          4 Eggplant             S               59.95        135
        10          4 Eggplant             M               59.95        168
        11          4 Eggplant             L               59.95        187
        19          5 Bright Pink          10              15.99        148
        20          5 Bright Pink          11              15.99        137
        21          5 Bright Pink          12              15.99        134

10 rows selected.

CREATE INDEX qoh_index ON inventory (inv_qoh);

SQL> SELECT *
  2  FROM inventory
  3  WHERE inv_qoh >= 130;

    INV_ID    ITEM_ID COLOR                INV_SIZE    INV_PRICE    INV_QOH
---------- ---------- -------------------- ---------- ---------- ----------
        21          5 Bright Pink          12              15.99        134
         9          4 Eggplant             S               59.95        135
         7          3 Navy                 M               29.95        137
        20          5 Bright Pink          11              15.99        137
         6          3 Navy                 S               29.95        139
         4          3 Khaki                M               29.95        147
        19          5 Bright Pink          10              15.99        148
         3          3 Khaki                S               29.95        150
        10          4 Eggplant             M               59.95        168
        11          4 Eggplant             L               59.95        187

10 rows selected.
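The change in row order between the two runs suggests that the second query was answered by walking the index in inv_qoh order. One way to observe such a switch directly is to ask the DBMS for its access plan. The sketch below does this with sqlite3 as a stand-in for Oracle (an assumption; Oracle has its own EXPLAIN PLAN facility), on a hypothetical two-column inventory table filled with generated values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (inv_id INTEGER PRIMARY KEY, inv_qoh INTEGER)")
conn.executemany("INSERT INTO inventory VALUES (?, ?)",
                 [(i, (i * 37) % 200) for i in range(1, 1001)])

QUERY = "SELECT * FROM inventory WHERE inv_qoh >= 130"

def plan(sql):
    """Return SQLite's textual access plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

plan_before = plan(QUERY)                  # no index yet: the table is scanned
rows_before = sorted(conn.execute(QUERY))

conn.execute("CREATE INDEX qoh_index ON inventory (inv_qoh)")

plan_after = plan(QUERY)                   # the plan now names qoh_index
rows_after = sorted(conn.execute(QUERY))
```

The set of rows returned is identical before and after; only the access path changes, which is why an index can be added at any time without touching the queries.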
Optimizing Query Processing (cont.)
You may even create an index that prevents you from using a value that has been used before. Such a feature is especially useful when the index field (attribute) is a primary key whose values must not be duplicated:
CREATE UNIQUE INDEX <index_name>
ON <tablename> (<key_field>);
SELECT index_name FROM user_indexes;
DROP INDEX <index_name>;
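The three statements can be exercised end to end. In the sketch below sqlite3 again stands in for Oracle, so the catalog query reads SQLite's sqlite_master table where Oracle would use user_indexes; the inventory table and the index name inv_id_index are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (inv_id INTEGER, inv_qoh INTEGER)")
conn.execute("CREATE UNIQUE INDEX inv_id_index ON inventory (inv_id)")
conn.execute("INSERT INTO inventory VALUES (1, 150)")

try:
    conn.execute("INSERT INTO inventory VALUES (1, 99)")  # inv_id 1 again
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

def index_names():
    """List index names from SQLite's catalog (Oracle: user_indexes)."""
    return [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index'")]

names_before_drop = index_names()
conn.execute("DROP INDEX inv_id_index")
names_after_drop = index_names()
```

Once the unique index is dropped, nothing prevents a second row with inv_id 1, so uniqueness here is a property of the index rather than of the table definition.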
Summary on Optimizing Query Processing (cont.)
• The indexes are defined so as to optimize the processing of SELECT statements.
• An index is never explicitly referenced in a SELECT statement; the syntax of SQL does not allow this.
• During the processing of a statement, SQL itself determines whether an existing index will be used.
• An index may be created or deleted at any time.
• When updating, inserting or deleting rows, SQL also maintains the indexes on the tables concerned. This means that, on the one hand, the processing time for SELECT statements is reduced, while, on the other hand, the processing time for update statements (such as INSERT, UPDATE and DELETE) can increase.
Rules for Using Indexes
1. Use on larger tables
2. Index the primary key of each table
3. Index search fields (fields frequently in WHERE clause)
4. Fields in SQL ORDER BY and GROUP BY commands
5. When there are >100 values but not when there are <30 values
Rules for Using Indexes (cont.)
6. Avoid use of indexes for fields with long values; perhaps compress values first
7. If key to index is used to determine location of record, use surrogate (like sequence number) to allow even spread in storage area
8. DBMS may have limit on number of indexes per table and number of bytes per indexed field(s)
9. Be careful of indexing attributes with null values; many DBMSs will not recognize null values in an index search
Query Optimization
• Parallel query processing–possible when working in multiprocessor systems
• Overriding automatic query optimization–allows for query writers to preempt the automated optimization
• Picking data block size–factors to consider include:
  – Block contention, random and sequential row access speed, row size
• Balancing I/O across disk controllers
Query Design Guidelines
1. Understand how indexes are used
2. Keep optimization statistics up-to-date
3. Use compatible data types for fields and literals
4. Write simple queries
5. Break complex queries into multiple, simple parts
6. Don’t use one query inside another
7. Don’t combine a table with itself
Query Design Guidelines (cont.)
8. Create temporary tables for groups of queries
9. Combine update operations
10. Retrieve only the data you need
11. Don’t have the DBMS sort without an index
12. Learn!
  – Learning to Learn and Learning to Change!
13. Consider the total query processing time for ad hoc queries
Database Architectures (Model) (Figure-Extra)
[Figure labels: Legacy Systems; Current Technology; Data Warehouses; multidimensional]
Exercise and Homework
• Exercises (p.234)
  – #1 (index), 5 (storage), 8 (denormalization)
• HW (p.235) [usage map]
  – #19