Part 4
Relational Model
Why Other “Productivity Boosters” Have Failed
- Low level of structural detail
- Record-at-a-time processing orientation (unwilling to give up “control”)
- No sharp distinction between logical and physical
- Limited data independence
- No operations on “aggregates” (files, sets, tables, relations, ...)
Copyright 1971-2004 Thomas P. Sturm Relational Model Part 4, Page 2
Relational Database Model
“Codd's” Model
- E. F. (Ted) Codd, “A Relational Model of Data for Large Shared Data Banks,” CACM V13 #6 (June 1970), pp. 377-87
- Proposed in 1970; the first implementations were developed in the mid-1970s
- Based on the mathematical theory of relations
Codd's definition:
Given sets S1, S2, ..., Sn (not necessarily distinct), R is a relation on these n sets if it is a set of n-tuples each of which has its first element from S1, its second element from S2, and so on.
- We shall refer to Sj as the jth domain of R.
- R is said to have degree n.
- If R has m n-tuples (or just tuples), R is said to have cardinality m.
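Codd's definition can be illustrated with a minimal Python sketch. The domains S1 and S2 below are assumptions chosen for illustration (department numbers and location codes); a relation is then simply a set of tuples drawn from the Cartesian product of its domains.

```python
from itertools import product

# Illustrative domains (assumed for this sketch): department numbers and locations.
S1 = {400, 401, 402, 403}
S2 = {100, 200, 300}

# A relation on S1, S2 is a set of 2-tuples: first element from S1, second from S2.
R = {(400, 200), (401, 200), (402, 100), (403, 300)}

# Every tuple must come from S1 x S2 for R to be a relation on these sets.
is_relation = R <= set(product(S1, S2))

degree = len(next(iter(R)))   # n: number of domains per tuple
cardinality = len(R)          # m: number of tuples
print(is_relation, degree, cardinality)  # prints: True 2 4
```

Because R is a set, duplicate tuples cannot occur and the tuples have no inherent order, which matches the requirements stated for relations.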
The Beginning of Codd's Historic Article
Conceptual Idea of a Relation
Conceptual (but not physical) ideas:
- A relation is a table or a flat file with n columns or fields and m rows or records
- Column (or field) j represents a set of values (from a possible set of values, Sj, the “domain”) for a particular attribute of all the entities
- Each row (or record) represents a set of values for an entity, one for each attribute (column, field)
- Degree - number of columns (fields, domains)
- Cardinality - number of rows (records, entities, tuples)
Translation of Relational Terms
Relational     Loose Term Equivalent
Relation       Table
Tuple          Row
Degree         # of attributes
Cardinality    # of table entries
Domain         field-level edit criteria and integrity constraints
Requirements of a Relation
- All rows of the relation must have the same attributes in the same order
- No repeating groups
- Each row must be unique (no duplicate rows - if there are, they are “cast out”)
- A set of columns that forms an identifier is the table key
Recognizing and Eliminating Repeating Groups
Consider the 3M master sales history file:
1,000,000 entries of the form:
  Co Sect Div Grp Dept Item # Item Desc Qty $ Qty $ Qty $ Qty $ Qty $ Qty $ Qty $ Qty $ Qty $ Qty $
where the first Qty/$ group represents sales this month, the second sales last month, and the last sales 9 months ago. The primary key is the first 6 columns.
This needs to be broken up into two tables, with two different keys, as follows:
  Co Sect Div Grp Dept Item # Item Desc
and
  Co Sect Div Grp Dept Item # Month Year Qty $
- The first table has a 6-part primary key, with 1,000,000 entries.
- The second table has an 8-part primary key, with up to 10,000,000 entries.
- BUT zeroes for both qty and $ need not be stored - those rows can be eliminated. This actually SAVES space.
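The split described above can be sketched in Python. The function name `split_record`, the sample key, and the sample sales figures are all hypothetical; the sketch only shows how one wide record becomes one item row plus one sales row per non-zero month.

```python
def split_record(key, description, sales):
    """Split one wide record into an item row and per-month sales rows.

    key:   the 6-part key (Co, Sect, Div, Grp, Dept, Item #)
    sales: list of (month, year, qty, dollars), one per repeating group
    """
    item_row = key + (description,)
    # Months with zero quantity and zero dollars are simply not stored.
    sales_rows = [key + (month, year, qty, dollars)
                  for (month, year, qty, dollars) in sales
                  if qty != 0 or dollars != 0]
    return item_row, sales_rows

# Hypothetical sample data: three monthly groups, one of them all zeroes.
key = (1, 2, 3, 4, 5, 1001)
sales = [(6, 2004, 12, 240.0), (5, 2004, 0, 0.0), (4, 2004, 3, 60.0)]
item_row, sales_rows = split_record(key, "sample item", sales)
print(len(item_row), len(sales_rows))  # prints: 7 2
```

Each sales row carries the 6-part key plus Month and Year, which together form the 8-part key of the second table.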
Advantages of the Relational Model
- Logical, not physical, model - easy to communicate
- Data independence - implementation independent
- Record interconnections are dynamically generated based on data value (no user-visible navigation links)
- Set-at-a-time database operations (relational operators): locate, permute, join, select, project, derive, order, format, present
- Join - the operator that “connects” tables - is unrestricted; it is not necessary to pre-define access paths
Differences in the Relational Model
Relational      Non-relational
Set-oriented    Navigational
High-level      Low-level
What            How
Relational View of Sample Database
department
employee
task
project
Details of Department Relation
DeptNo  Dname        Loc  Dbudget
400     programming  200   150000
401     financial    200   275000
402     academic     100   390000
403     support      300     7000
Columns are attributes (DeptNo is domain 1, Dbudget is domain 4); each row is a tuple representing one entity.
Organization of Relations in Sample Database
Relation (Entity type)
Attributes (Key underlined)
emp (Ename, Job, Mgr, Hired, Rate, Bonus, DeptNo)
dept (DeptNo, Dname, Loc, Dbudget)
task (Ename, Project_id, Tname, Hours)
proj (Project_id, Description, Pbudget, Due_date)
(Single Table) Relational Operations
Input specification -> operation
- named file, view, or relation -> locate relation
- boolean entity selection expression -> selection
- named attributes -> projection
- derivation rules -> entry-level derivations
- ordering specification -> order
- set-function specification -> file-level derivations
- format, edit spec., destination -> formatting & presentation
Relational Algebra
- Relational operators take one or two relations as their “operands” or arguments
- Result of applying a relational operator to a relation (or pair of relations) is another relation
- Consequently, relational operators can be used in sequence to achieve the desired results
Locate Relation
- Table may only “logically” exist
  (Figure: a user, view 1 and view 2, relations 1 through 4, and files 1 through 3)
- Table may contain “derived” elements
  finance table (view):
  Ename   Rate  Bonus  task_hours
  smith   35           165
  jones   35           145
  king    18            49
  turner  75    1000    57
  (Ename, Rate, and Bonus come from the emp table; task_hours comes from the task table)
- Cardinality may be determined by user type (how much you get depends on who you are)
Selection
- Construct a new table by taking a subset of the tuples in the relation
- Done by taking a horizontal subset of the table
- Select all rows of the relation that satisfy some specified condition
- In SQL: SELECT * FROM EMP WHERE JOB = 'PROGRAMMER';
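Selection can be sketched as a row filter. Since the slides do not show the contents of the emp table, this example uses the dept relation instead; the helper name `select` is an assumption made for illustration.

```python
# The dept relation from the sample database, one dict per tuple.
DEPT = [
    {"DeptNo": 400, "Dname": "programming", "Loc": 200, "Dbudget": 150000},
    {"DeptNo": 401, "Dname": "financial",   "Loc": 200, "Dbudget": 275000},
    {"DeptNo": 402, "Dname": "academic",    "Loc": 100, "Dbudget": 390000},
    {"DeptNo": 403, "Dname": "support",     "Loc": 300, "Dbudget": 7000},
]

def select(relation, predicate):
    """Return the horizontal subset of rows for which predicate(row) is true."""
    return [row for row in relation if predicate(row)]

# Roughly: SELECT * FROM DEPT WHERE LOC = 200;
at_200 = select(DEPT, lambda row: row["Loc"] == 200)
print([row["DeptNo"] for row in at_200])  # prints: [400, 401]
```

The result has the same attributes as the input and fewer rows, so it is itself a relation that further operators can consume.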
Projection
- Construct a new table by taking a subset of the attributes in a relation
- Done by taking vertical slices out of the table
- Extract all columns whose names have been specified
- (Prior to permanent storage) Remove all duplicate rows in the resulting table
- In SQL: SELECT ENAME, JOB, RATE FROM EMP;
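A minimal sketch of projection, including the duplicate-removal step, follows. It uses the dept relation because its contents are shown in the slides; the helper name `project` is an assumption.

```python
# The dept relation as (DeptNo, Dname, Loc, Dbudget) rows keyed by attribute name.
DEPT = [
    {"DeptNo": 400, "Dname": "programming", "Loc": 200, "Dbudget": 150000},
    {"DeptNo": 401, "Dname": "financial",   "Loc": 200, "Dbudget": 275000},
    {"DeptNo": 402, "Dname": "academic",    "Loc": 100, "Dbudget": 390000},
    {"DeptNo": 403, "Dname": "support",     "Loc": 300, "Dbudget": 7000},
]

def project(relation, attrs):
    """Keep only the named attributes, casting out duplicate rows."""
    seen, result = set(), []
    for row in relation:
        values = tuple(row[a] for a in attrs)
        if values not in seen:          # duplicates are cast out
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result

locations = project(DEPT, ["Loc"])
print(locations)  # prints: [{'Loc': 200}, {'Loc': 100}, {'Loc': 300}]
```

Two departments share location 200, so the projected table has three rows, not four. In SQL the equivalent duplicate removal requires SELECT DISTINCT.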
Entry-Level Derivations
- Construct a new table by appending new attributes to the relation that are derived from existing attributes
- Done by extending the table horizontally with generated columns
- Specify a rule for generating the new columns from existing columns
- New columns generated may be named for future reference
- In SQL: SELECT ENAME, RATE, BONUS, TASK_HOURS, RATE*TASK_HOURS + BONUS AS EXTEND FROM FINANCE;
finance table (view):
Ename   Rate  Bonus  task_hours  extend
smith   35           165         5775
jones   35           145         5075
king    18            49          882
turner  75    1000    57         5275
New column (extend) generated by the formula (Rate × task_hours) + Bonus. A blank Bonus is treated as zero here; if Bonus were stored as NULL, the SQL expression would need COALESCE(BONUS, 0) to produce these values.
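The derivation rule can be checked with a short Python sketch over the finance view. A missing Bonus is represented as `None` and treated as zero, which is an assumption needed to reproduce the extend values shown; the function name `extend` is likewise illustrative.

```python
# finance view rows: (Ename, Rate, Bonus, task_hours); None marks a blank Bonus.
FINANCE = [
    ("smith", 35, None, 165),
    ("jones", 35, None, 145),
    ("king", 18, None, 49),
    ("turner", 75, 1000, 57),
]

def extend(row):
    """Append the derived column (Rate * task_hours) + Bonus to one row."""
    ename, rate, bonus, hours = row
    return row + (rate * hours + (bonus or 0),)   # blank Bonus counts as 0

extended = [extend(row) for row in FINANCE]
print([row[-1] for row in extended])  # prints: [5775, 5075, 882, 5275]
```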
Order
• Construct a “new” table by permuting the tuples in a relation
• Done by rearranging the rows of the table
• Sort the rows of the relation using any combination of the attributes as sorting keys
• Actually, two relations differing only in the positions of their tuples (rows) are considered the same
  (This is also true of permuting the attributes, provided the names and domains are carried along)
• In SQL: SELECT * FROM DEPT ORDER BY DNAME;
DeptNo  Dname        Loc  Dbudget
400     programming  200   150000
401     financial    200   275000
402     academic     100   390000
403     support      300     7000
Order by Dname to obtain:
DeptNo  Dname        Loc  Dbudget
402     academic     100   390000
401     financial    200   275000
400     programming  200   150000
403     support      300     7000
File-Level Derivations
- Obtain the results of set-level functions (everything is a set in the relational model)
- Done by obtaining one or more “tables” of results (many are “tables” with one row and one attribute)
- Perform sum, count, count unique, minimum, maximum, mean, variance, standard deviation, etc. on specified columns
- Function results are generally not “appended” to the existing “table” in the relational sense - they represent “new” tables - with their own domains
- In SQL: SELECT COUNT(DEPTNO), COUNT(DISTINCT LOC), SUM(DBUDGET) FROM DEPT;
DeptNo  Dname        Loc  Dbudget
400     programming  200   150000
401     financial    200   275000
402     academic     100   390000
403     support      300     7000
Result:
COUNT(DEPTNO)  COUNT(DISTINCT LOC)  SUM(DBUDGET)
4              3                    822000
(count of departments, count of unique locations, total of all budgets)
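The aggregate query can be run as written using Python's standard `sqlite3` module. SQLite is used here only as a convenient stand-in engine (an assumption; the slides do not name a product for this query), with the budgets taken from the Department relation shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dept (DeptNo INTEGER, Dname TEXT, Loc INTEGER, Dbudget INTEGER)"
)
conn.executemany(
    "INSERT INTO dept VALUES (?, ?, ?, ?)",
    [
        (400, "programming", 200, 150000),
        (401, "financial", 200, 275000),
        (402, "academic", 100, 390000),
        (403, "support", 300, 7000),
    ],
)

# The result is a new one-row table, not columns appended to dept.
result = conn.execute(
    "SELECT COUNT(DeptNo), COUNT(DISTINCT Loc), SUM(Dbudget) FROM dept"
).fetchone()
print(result)  # prints: (4, 3, 822000)
```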
Formatting and Presentation
Prepare the results in a form suitable for:
- transmission
- storage
- printing
- plotting
- screen display, etc.
Results may be sent to
- user screens
- hard-copy devices
- other software packages (e.g. Lotus 1-2-3), etc.
Specifications may be given through
- the query language (interactive or embedded)
- a separate report writer
- a variety of screen-oriented tools
Two Table Relational Operations
Cartesian Product
- All rows of the second table appended to all rows of the first table
- No compatibility requirements
Join
- A form of parallel table lookup
- Both tables must share a domain
Union
- All rows of the second table appended to the rows of the first table
- Both tables must have the same domains
Set Difference
- All rows of the first table whose keys do not appear as keys in the second table
- Both tables must share the same domains for their keys
Cartesian Product
If R1 and R2 are relations, the Cartesian product is written R1 × R2 (in relational algebra) or SELECT * FROM R1, R2; (in SQL)
A new relation is generated that consists of every tuple in R1 followed by every tuple in R2
relation empl          relation group
name   age  dept       dept  loc
able   20   35         35    100
baker  40   45         45    200
codd   60   45         25    100
date   30   25
Cartesian product empl × group
empl.name  empl.age  empl.dept  group.dept  group.loc
able       20        35         35          100
able       20        35         45          200
able       20        35         25          100
baker      40        45         35          100
baker      40        45         45          200
baker      40        45         25          100
codd       60        45         35          100
codd       60        45         45          200
codd       60        45         25          100
date       30        25         35          100
date       30        25         45          200
date       30        25         25          100
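The product table above can be reproduced with `itertools.product`, as a minimal sketch using the empl and group relations from this slide.

```python
from itertools import product

# (name, age, dept) and (dept, loc) tuples from the slide.
empl = [("able", 20, 35), ("baker", 40, 45), ("codd", 60, 45), ("date", 30, 25)]
group = [(35, 100), (45, 200), (25, 100)]

# Every empl tuple followed by every group tuple.
cartesian = [e + g for e, g in product(empl, group)]

print(len(cartesian))   # prints: 12
print(cartesian[0])     # prints: ('able', 20, 35, 35, 100)
```

The cardinality of the product is the product of the cardinalities (4 × 3 = 12), and its degree is the sum of the degrees (3 + 2 = 5).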
Join Operation
- Form the Cartesian product between two relations
- Cast out duplicates (assuming projection is done also)
- Apply join conditions to select a subset of the Cartesian product (selection)
- There are a variety of different join types, differentiated by
  • which relations are used
  • what the join conditions are
  • what results are desired
Natural Join Operation (Simple join, inner equijoin)
- Start with two different tables, form the Cartesian product (e.g. empl × group)
empl.name  empl.age  empl.dept  group.dept  group.loc
able       20        35         35          100
able       20        35         45          200
able       20        35         25          100
baker      40        45         35          100
baker      40        45         45          200
baker      40        45         25          100
codd       60        45         35          100
codd       60        45         45          200
codd       60        45         25          100
date       30        25         35          100
date       30        25         45          200
date       30        25         25          100
- Select rows where values of a pair of fields are equal (e.g. empl.dept and group.dept)
empl.name  empl.age  empl.dept  group.dept  group.loc
able       20        35         35          100
baker      40        45         45          200
codd       60        45         45          200
date       30        25         25          100
- Project all except the duplicated column
empl.name  empl.age  dept  group.loc
able       20        35    100
baker      40        45    200
codd       60        45    200
date       30        25    100
Expressing the Natural Join
- The natural join is written, in the relational algebra:
  empl × group WHERE empl.dept = group.dept
- The natural join is written, in SQL:
  SELECT * FROM EMPL, GROUP WHERE EMPL.DEPT = GROUP.DEPT;
- The natural join performs a “table lookup” function by “looking up” data from the second table for a field in the first table
- Unfortunately, if no match is found for an item “looked up” in the first table, that row in the first table is “lost”
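The three steps of the natural join (product, selection on equal dept values, projection of the duplicated column) can be sketched in Python with the empl and group relations from the earlier slides. This is an illustrative sketch, not how a database engine would actually evaluate a join.

```python
from itertools import product

empl = [("able", 20, 35), ("baker", 40, 45), ("codd", 60, 45), ("date", 30, 25)]
group = [(35, 100), (45, 200), (25, 100)]

natural_join = [
    (name, age, dept, loc)                        # project: drop the duplicate dept
    for (name, age, dept), (gdept, loc) in product(empl, group)   # product
    if dept == gdept                              # select: empl.dept = group.dept
]
print(natural_join)
# prints: [('able', 20, 35, 100), ('baker', 40, 45, 200), ('codd', 60, 45, 200), ('date', 30, 25, 100)]
```

Any empl row whose dept had no partner in group would simply be absent from the result, which is the “lost” row behavior noted above.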
Inner and Outer Joins
relation empl          relation group
name  age  dept        dept  loc
fox   30   25          25    100
gun   35   27          30    200
hal   27   30          40    150
Inner join produces:
empl.name  empl.age  dept  group.loc
fox        30        25    100
hal        27        30    200
Outer join produces:
empl.name  empl.age  dept  group.loc
fox        30        25    100
hal        27        30    200
gun        35        27
                     40    150
Outer Joins
Outer Join
• selects from two different tables
• keeps rows where values of a pair of fields are equal
• if a row in either original table does not appear as a part of any row after the inner join, that original row is appended to the join table with null values in the remaining fields
• In Oracle SQL (ANSI join syntax): SELECT * FROM EMPL FULL OUTER JOIN GROUP ON EMPL.DEPT = GROUP.DEPT;
  (Oracle's older (+) operator cannot be applied to both tables in the same condition)
Left Outer Join
• only members from the left original table are appended
Right Outer Join
• only members from the right original table are appended
Left and Right Outer Joins
Left outer join produces:
empl.name  empl.age  dept  group.loc
fox        30        25    100
hal        27        30    200
gun        35        27
• In Oracle SQL: SELECT * FROM EMPL, GROUP WHERE EMPL.DEPT = GROUP.DEPT (+);
Right outer join produces:
empl.name  empl.age  dept  group.loc
fox        30        25    100
hal        27        30    200
                     40    150
• In Oracle SQL: SELECT * FROM EMPL, GROUP WHERE EMPL.DEPT (+) = GROUP.DEPT;
(The (+) marks the table whose columns are filled with nulls when no match exists.)
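The inner, left, right, and full outer joins on the fox/gun/hal example can be compared with one small Python sketch. The function `outer_join` and its `keep_left` / `keep_right` flags are hypothetical names introduced for illustration; `None` stands in for a null value.

```python
empl = [("fox", 30, 25), ("gun", 35, 27), ("hal", 27, 30)]   # (name, age, dept)
group = [(25, 100), (30, 200), (40, 150)]                    # (dept, loc)

def outer_join(empl, group, keep_left=True, keep_right=True):
    """Equijoin on dept, optionally appending unmatched rows padded with None."""
    inner = [(n, a, d, loc) for (n, a, d) in empl for (gd, loc) in group if d == gd]
    matched = {d for (_, _, d, _) in inner}
    rows = list(inner)
    if keep_left:    # unmatched empl rows get a null loc
        rows += [(n, a, d, None) for (n, a, d) in empl if d not in matched]
    if keep_right:   # unmatched group rows get null name and age
        rows += [(None, None, gd, loc) for (gd, loc) in group if gd not in matched]
    return rows

inner = outer_join(empl, group, keep_left=False, keep_right=False)
left = outer_join(empl, group, keep_right=False)
right = outer_join(empl, group, keep_left=False)
full = outer_join(empl, group)
print(len(inner), len(left), len(right), len(full))  # prints: 2 3 3 4
```

The full outer join contains the two matched rows plus gun (no department 27 in group) and department 40 (no employee assigned).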
Other Types of Joins
Self join
• joins a table with itself
• e.g. desire a list of all employees with the department number of their manager
• two fields in the table must have the same domain
Non-equijoin (θ join or theta join)
• the condition for keeping rows is not an equality
• inequality (<, >, ≤, ≥, !=)
• wild-card match (= 'S*', = '*MAN', = '1?')
• other (IN, BETWEEN, LIKE, IS NULL, NOT)
Relational Algebra Notation
Project:       R2 = πR1 (attribute 1, attribute 2, ...)
Select:        R3 = R2 WHERE condition
Product (or Cartesian Product):
               R5 = R3 × R4
Join:          R7 = R5 × R6 WHERE condition
Union:         R9 = R7 UNION R8  or  R9 = R7 + R8
Difference:    R11 = R9 - R10
Intersection:  R13 = R11 INTERSECT R12
Division:      R15 = R13 / R14
Union, Difference, and Intersection
Suppose sets A and B have duplicated rows, as shown by the crosshatched region below
(Figure: overlapping regions A and B, with the overlap crosshatched)
- Union: Everything in one relation or the other or both, but only one copy
- Difference: Everything in the first relation that is NOT duplicated in the second relation
- Intersection: One copy of everything that appears in both relations
(Figures: A + B, Union; A - B, Difference; A Intersect B, Intersection)
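Because relations are sets of tuples, Python's built-in set operators give a direct sketch of union, difference, and intersection. The relations A and B below are hypothetical sample data with the same domains (dept, loc).

```python
# Hypothetical relations over the same domains: (dept, loc).
A = {(25, 100), (30, 200), (35, 100)}
B = {(30, 200), (35, 100), (40, 150)}

union = A | B          # everything in either relation, one copy of each
difference = A - B     # rows of A that do not also appear in B
intersection = A & B   # one copy of each row that appears in both

print(sorted(union))         # prints: [(25, 100), (30, 200), (35, 100), (40, 150)]
print(sorted(difference))    # prints: [(25, 100)]
print(sorted(intersection))  # prints: [(30, 200), (35, 100)]
```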
Division
- Division is a rather unusual construction in the relational algebra.
- It is generally used to establish association rather than causation.
- It is useful in data mining applications.
- If we divide relation A by relation B, then the result is those first parts of rows in A that are paired, in A, with every row of relation B.
For example, if relation A is:
Ename   Project_id
allen   admit
allen   billing
barger  admit
barger  alumni
barger  billing
jones   billing
jones   budget
king    admit
...and if relation B is:
Project_id
admit
billing
... then A / B is
Ename
allen
barger
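Division can be sketched in a few lines of Python using the A and B relations from this example. The function name `divide` is an assumption; it keeps each Ename that is paired in A with every Project_id in B.

```python
A = {
    ("allen", "admit"), ("allen", "billing"),
    ("barger", "admit"), ("barger", "alumni"), ("barger", "billing"),
    ("jones", "billing"), ("jones", "budget"),
    ("king", "admit"),
}
B = {"admit", "billing"}

def divide(a, b):
    """Return the first parts of rows in a that appear with every value in b."""
    candidates = {ename for (ename, _) in a}
    return {ename for ename in candidates
            if all((ename, project_id) in a for project_id in b)}

print(sorted(divide(A, B)))  # prints: ['allen', 'barger']
```

jones is excluded for lacking admit and king for lacking billing; barger's extra alumni row does not matter.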
Example Use of Relational Operators
1. Retrieve information on “large” projects - projects with a budget in excess of $10,000
   R1 = proj WHERE Pbudget > 10000
2. The project_id's of large projects
   R2 = πR1 (Project_id)
3. Budgets and names of projects 'olson' is working on
   R3 = proj × task (Ename, Description, Pbudget) WHERE task.Project_id = proj.Project_id
   R4 = R3 WHERE Ename = 'olson'
   R5 = πR4 (Description, Pbudget)
4. Budgets and names of projects 'olson' is working on - in a single expression
   R6 = proj × task (Description, Pbudget) WHERE task.Project_id = proj.Project_id AND Ename = 'olson'
Example Use of SQL
1. Retrieve information on “large” projects - projects with a budget in excess of $10,000
   SELECT * FROM PROJ WHERE PBUDGET > 10000;
2. The project_id's of large projects
   SELECT PROJECT_ID FROM PROJ WHERE PBUDGET > 10000;
3. Budgets and names of projects 'olson' is working on
   SELECT PROJ.DESCRIPTION, PROJ.PBUDGET FROM PROJ, TASK WHERE TASK.PROJECT_ID = PROJ.PROJECT_ID AND TASK.ENAME = 'olson';
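The three queries can be tried out with Python's standard `sqlite3` module. The slides do not show the contents of proj and task, so every row below is hypothetical sample data, and SQLite is an assumed stand-in engine. An ORDER BY is added to the third query only to make the printed order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE proj (Project_id TEXT, Description TEXT, Pbudget INTEGER, Due_date TEXT);
CREATE TABLE task (Ename TEXT, Project_id TEXT, Tname TEXT, Hours INTEGER);
-- Hypothetical sample rows, not taken from the slides.
INSERT INTO proj VALUES ('admit', 'admissions system', 25000, '2005-01-31');
INSERT INTO proj VALUES ('billing', 'billing rewrite', 8000, '2005-03-15');
INSERT INTO task VALUES ('olson', 'admit', 'design', 40);
INSERT INTO task VALUES ('olson', 'billing', 'test', 10);
INSERT INTO task VALUES ('king', 'admit', 'code', 60);
""")

# 1. Large projects.
large = conn.execute("SELECT * FROM proj WHERE Pbudget > 10000").fetchall()

# 2. Project_ids of large projects.
large_ids = conn.execute(
    "SELECT Project_id FROM proj WHERE Pbudget > 10000"
).fetchall()

# 3. Budgets and names of projects 'olson' is working on.
olson = conn.execute(
    "SELECT proj.Description, proj.Pbudget FROM proj, task "
    "WHERE task.Project_id = proj.Project_id AND task.Ename = 'olson' "
    "ORDER BY proj.Project_id"
).fetchall()

print(large_ids)  # prints: [('admit',)]
print(olson)      # prints: [('admissions system', 25000), ('billing rewrite', 8000)]
```

String comparison is case-sensitive here, so the literal in the WHERE clause must match the stored value 'olson' exactly.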