Upload
others
View
13
Download
0
Embed Size (px)
Citation preview
1
Converting ER to Relational Schema
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
1
Project P-number P-name Due-Date
Employee SSN E-Name Office
0..* Assignment 0..*
0..* Manager 1..1
2
Translate each entity into a table, with keys.
Entity :
can be represented as a table in the relational model
has a key … which becomes a key for the table
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
CREATE TABLE Employee (SSN CHAR(11) NOT NULL,
E-Name CHAR(20),
Office INTEGER,
PRIMARY KEY (SSN))
Employee
SSN E-Name
Office
3
Translate each many-to-many relationship set into a table
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
3
What are the attributes and
what is the key for Assignment?
Project(P-number, P-name, Due-Date) Employee(SSN, E-Name, Office)
Project P-number P-name Due-Date
Employee SSN E-Name Office
0..* Assignment 0..*
0..* Manager 1..1
4
Answer: Assignment(P-Number, SSN)
P-Number is a foreign key for Project SSN is a foreign key for Employee
Project(P-Number, P-Due-Date)
Employee(SSN, E-Name, Office)
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
Project P-number P-name Due-Date
Employee SSN E-Name Office
0..* Assignment 0..*
0..* Manager 1..1
What do we do with a one-to-many relationship?
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 5
For example, what do we do with Manager?
Project(P-number, P-name, Due-Date)
Employee(SSN, E-Name, Office)
Project P-number P-name Due-Date
Employee SSN E-Name Office
0..* Assignment 0..*
0..* Manager 1..1
Create a foreign key for a 1-to-many relationship.
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 6
Project(P-number, P-name, Due-Date, Manager) Employee(SSN, E-Name, Office)
Manager is a foreign key (referencing the Employee relation)
value of Manager must match an SSN in Employee
Project P-number P-name Due-Date
Employee SSN E-Name Office
0..* Assignment 0..*
0..* Manager 1..1
Or...Create a table for a 1-many relationship.
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 7
Project(P-number, P-name, Due-Date, Manager) Employee(SSN, E-Name, Office)
vs.
Project(P-number, P-name, Due-Date) Employee(SSN, E-Name, Office)
Manager(P-number, SSN) What are the tradeoffs between these two?
Note:
P-number
is the key
for Manager
Project P-number P-name Due-Date
Employee SSN E-Name Office
0..* Assignment 0..*
0..* Manager 1..1
Project (P-number, P-name, Due-Date) Employee (SSN, E-Name, Office, Managed-project)
vs. Project (P-number, P-name, Due-Date)
Employee (SSN, E-Name, Office) Manager (P-number, SSN)
What if SSN is the key for Manager?
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 8
Project P-number P-name Due-Date
Employee SSN E-Name Office
0..* Assignment 0..*
0..1 Manager 0..*
Add attributes to the table for the relationship
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 9
Assignment(P-number, SSN, role, start-date, end-date) Project(P-number, P-name, Due-Date) Employee(SSN, E-Name, Office)
role start-date end-date
Project P-number P-name Due-Date
Employee SSN E-Name Office
0..* Assignment 0..*
0..* Manager 1..1
What if a 1-to-many relationship has an attribute?
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 10
Project(P-number, P-name, Due-Date, Manager) Employee(SSN, E-Name, Office)
Employee SSN E-Name Office
Manager
start-date end-date
Project P-number P-name Due-Date
Add attributes to the table for the relationship?
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 11
Project(P-number, P-name, Due-Date, Manager, start-date, end-date) Employee(SSN, E-Name, Office)
Employee SSN E-Name Office
Manager
start-date end-date
Project P-number P-name Due-Date
Add attributes to the table for the relationship?
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 12
Project(P-number, P-name, Due-Date, Manages(P-number, SSN, start-date, end-date) Employee(SSN, E-Name, Office)
Employee SSN E-Name Office
Manager
start-date end-date
Project P-number P-name Due-Date
No, bad idea. You should use an extra table for the relationship.
Notice the key for the Manages table.
13
Minimum cardinality of 1 in SQL We can require any table to be in a binary relationship using a foreign key which is required to be NOT NULL (but little else without resorting to CHECK constraints)
CREATE TABLE Department ( d-code INTEGER,
d-name CHAR(20), manager-ssn CHAR(9) NOT NULL, since DATE, PRIMARY KEY (d-code), FOREIGN KEY (manager-ssn) REFERENCES Employee, ON DELETE NO ACTION)
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
14
Weak Entity Sets
Policy
dep-name
cost
Insures
strong
entity
set
identifying
relationship
set
weak
entity
set
Employee
SSN
name
office
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
Translating Weak Entity Sets
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
Weak entity sets and identifying relationship sets are translated into a single table. Must include key of strong entity set, as a foreign key.
When the owner entity is deleted, all owned weak entities must also be deleted.
CREATE TABLE Insurance_Policy ( dep-name CHAR(20),
cost REAL, ssn CHAR(11) NOT NULL, PRIMARY KEY (dep-name, ssn), FOREIGN KEY (ssn) REFERENCES Employee, ON DELETE CASCADE)
Translating entities involved in ISA
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 16
person
ssn
name
age
employee
position
salary
dependent
gender listed on insurance
You have three choices:
1. One table: person with all
attributes
2. Two tables: employee
and dependent with all
attributes from person
copied down
3. Three tables: one for
person, employee, and
dependent
Option 1: Use one table (for person)
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 17
person
ssn
name
age
employee
position
salary
dependent
gender listed on insurance
person(ssn, name, age, position, salary, gender) listed-on-insurance(emp-ssn, dep-ssn) where emp-ssn references person.ssn and dep-ssn references dep-ssn
Advantages:
1. no join needed
2. no redundant attributes
Disadvantages:
1. may have lots of nulls
2. slightly more difficult to
find just employee or just
dependent
Option 2: Use two tables (employee & dependent)
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 18
person
ssn
name
age
employee
position
salary
dependent
gender listed on insurance
employee( ssn, name, age, position, salary) dependent(ssn, name, age, gender, insurer) where insurer references employee.ssn This works if each dependent is on just one insurance.
Advantages:
1. fewer nulls
2. easy to get just
employee or just
dependent
Disadvantages:
1. Not possible to have a
person who is not an
employee or person
2. name and age are
stored redundantly
Option 3: Use three tables (person, employee, dependent)
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 19
person
ssn
name
age
employee
position
salary
dependent
gender listed on insurance
person(ssn, name, age) employee(ssn, position, salary) where ssn refs person.ssn dependent(ssn, gender, insurer) where ssn refs person.ssn where insurer references employee.ssn This works if each dependent is on just one insurance.
Advantages:
1. there can be persons
who are not employees
or dependents
2. no redundancy
3. no need to use nulls
4. matches the conceptual
model
Disadvantages:
1. joins are required to get
all of the attributes
20
Translation Steps: ER to Tables
• Create table and choose key for each entity; include single-valued attributes.
• Create table for each weak entity; include single-valued attributes. Include key of owner as a foreign key in the weak entity. Set key as foreign key of owner plus local, partial key.
• For each 1:1 relationship, add a foreign key to one of the entity sets involved in the relationship (a foreign key to the other entity in the relationship)*.
• For each 1:N relationship, add a foreign key to the entity set on the N-side of the relationship (to reference the entity set on the 1-side of the relationship)*.
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
21
Translation Steps: ER to Tables (cont.)
• For each M:N relationship set, create a new table. Include a foreign key for each participant entity set, in the relationship set. The key for the new table is the set of all such foreign keys.
• For each multi-valued attribute, construct a separate table. Repeat the key for the entity in this new table. It will serve as both the key for this table as well as a foreign key to the original table for the entity.
This algorithm from Elmasri & Navathe, Fundamentals of Database Systems
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
22
Files, Indexes, Query Optimization
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
23
Query Optimization … Overview
Which query plan is the fastest?
How many query plans are there?
How can we estimate the cost of a plan?
But wait, how are queries (query operators) implemented?
But wait, how are the files stored?
CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission
24
We’ll just introduce these ideas and we’ll start from bottom
CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission
Query Optimization
Relational Operator Algs.
Files and Access Methods
Buffer Management
Disk Space Management
DB
Relational Algebra Query Tree
Search for a cheap plan
Join algorithms, …
Heap, Index, …
Operating system level
Issues (may be handled by
DBMS or by O/S)
how a
disk works 1
2
3
4
25
Disk:105 to 106 times slower than memory (we’ll use in this 106 example)
Disk access time (all three costs together) is about 2 to 7 milliseconds
Memory access time: 50 to 70 nanoseconds
7 milliseconds vs. 70 nanoseconds therefore disk access is 100,000 times slower than memory access
Contrast 1 second (pick up a piece of paper) vs. 100,000 seconds (drive to SF and back – about 28 hours)
CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission
26
Cost of Accessing Data on Disk
Time to access (read/write) a disk block: seek time (moving arms to position disk head on track) rotational delay (waiting for block to rotate under head) transfer time (actually moving data to/from disk surface)
Key to lower I/O cost: reduce seek/rotation delays! (you have to wait for the transfer time, no matter what)
Query cost is often measured in the number of page I/Os – often simplified to assume each page I/O costs the same
CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission
27
Block (page) size vs. record size
Page –smallest unit of transfer supported by OS
Block – Multiple of page, smallest unit of transfer supported by an application or a disk drive.
Block and page are often used interchangeably.
“typical” record size … maybe a few hundred up to few thousand bytes
“typical” page size 4K, 8K
When would we choose block size to be larger?
When would we choose block size to be smaller?
CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission
28
Index for a File
An Index is a data structure that speeds up selections on the search key field(s)
An index starts with a search key k and gives you a data entry k*.
Given k*, you can get to the record(s) with the search key k in one I/O.
CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission
29
B+ Tree Indexes
CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission
Leaf pages contain data entries, and are chained (prev & next) Non-leaf pages have index entries; only used to direct searches:
P 0 K 1 P 1 K 2 P 2 K m P m
index entry
Non-leaf
Pages
Pages (Sorted by search key)
Leaf
30
Smith, 44, 3000
Ashby, 25, 3000
Bristow, 30, 2007
Basu, 33, 4003
Tracy, 44, 5004
Cass, 50, 5004
Daniels, 22, 6003
Jones, 40, 6003
Index is on the left; data is on the right
CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission
Search key is “Name”
Records
are sorted
by “Name”
in the file
Index Data File
Each page
contains 3
records.
Ashby
Basu
Cass
Bristow
Daniels
Jones
Tracy
Smith
31
Smith, 44, 3000
Ashby, 25, 3000
Bristow, 30, 2007
Basu, 33, 4003
Tracy, 44, 5004
Cass, 50, 5004
Daniels, 22, 6003
Jones, 40, 6003
This index is dense (one index entry for each data record) and clustered (data are sorted based on search key)
CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission
Search key is “Name”
Records
are sorted
by “Name”
in the file;
Clustered!
Index Data File
Each page
contains 3
records.
Ashby
Basu
Cass
Bristow
Daniels
Jones
Tracy
Smith
32
Smith, 44, 3000
Ashby, 25, 3000
Bristow, 30, 2007
Basu, 33, 4003
Tracy, 44, 5004
Cass, 50, 5004
Daniels, 22, 6003
Jones, 40, 6003
Ashby
Cass
Smith
This index is: sparse (one index entry per PAGE of data) and clustered (data is sorted on search key)
CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission
Search key is “Name”
Records
are sorted
by “Name”
in the file;
clustered!
Index Data File
Each page
contains 3
records.
33
Sparse, clustered index Good for range search
The data is sorted according to the search key. There is one entry for each page of data records.
Consider a phone book – with the heading on each page such as “Mcinroy – Mckee” or “Lowe – Lozano”. This tells us that all names that fall between Lowe and Lozano – will be on this page.
On disk, one I/O operation gets us a whole page of entries – sorted by name. (~350 names)
CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission
34
Smith, 44, 3000
Ashby, 25, 3000
Bristow, 30, 2007
Basu, 44, 4003
Tracy, 33, 5004
Cass, 50, 5004
Daniels, 22, 6003
Jones, 40, 6003
22
25
30
40
44
44
50
33
This index is unclustered (records NOT Sorted on Search Key)
and dense (one entry for every data record)
CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission
Search key is “Age”
Index Data File
35
Unclustered indexes
In an unclustered index (also called a secondary index) the underlying data is NOT sorted according to the search key.
Example: imagine building an index on a phone book based on phone number. You MUST put an entry in the index for every single person in the phone book.
Thus … an unclustered index MUST be dense; there is no other choice.
For a range search, we need one I/O for every data record!
CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission
36
I/Os using a dense, unclustered index during a range search
CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission
Non-leaf
Pages
Pages (Sorted by search key)
Leaf
Every data record = 1 I/O!
You may re-read some
pages!
Cost to scan data file
in sorted order =
M*N = no. of records
in the file.
37
Indexes
DBMSs often create a clustered index on all primary keys.
Note: primary keys are the values that must be used in foreign keys.
Only one clustered index per table! Why?
You need to decide whether you want additional (unclustered/secondary) indexes.
You need to decide if you want composite indexes.
CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission
38
Join Algorithms – an Introduction
R ⋈ S is very common! And R S followed by a selection is inefficient. So we process joins (rather than cross product) whenever possible. Lots of effort invested in join algorithms.
Assume: M pages in R, pR tuples per page, N pages in S, pS tuples per page.
In our examples, R is Reserves and S is Sailors.
CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission
SELECT * FROM Reserves R1, Sailors S1 WHERE R1.sid=S1.sid
Simple Nested Loops Join
CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission
Join on ith column of R and jth column of S
foreach tuple r in R do
foreach tuple s in S do
if ri == sj then add <r, s> to result
1. In the outer loop, read the first table – tuple-by-tuple.
2. For each tuple in the outer loop, then compare it to each tuple in the
second table … in the inner loop. This requires that we read the entire
table in the second loop – for each tuple in the outer loop.
Simple Nested Loops Join
CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission
2 ...
12 …
6 ...
1 …
5 …
27 …
Table 1
on disk
... 2
… 13
… 12
… 27
Table 2
on disk
… 1
… 5
Memory Buffers:
Simple Nested Loops Join
CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission
2 ...
12 …
6 ...
1 …
5 …
27 …
Table 1
on disk
... 2
… 13
… 12
… 27
Table 2
on disk
… 1
… 5
Memory Buffers:
2 ...
12 …
6 ...
... 2
… 13
Query Answer
2 … … 2
42
Simple Nested Loops Join
CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission
2 ...
12 …
6 ...
1 …
5 …
27 …
Table 1
on disk
... 2
… 13
… 12
… 27
Table 2
on disk
… 1
… 5
Memory Buffers:
2 ...
12 …
6 ...
... 2
… 13
Query Answer
2 … … 2
No match: Discard!
43
Simple Nested Loops Join
CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission
2 ...
12 …
6 ...
1 …
5 …
27 …
Table 1
on disk
... 2
… 13
… 12
… 27
Table 2
on disk
… 1
… 5
Memory Buffers:
2 ...
12 …
6 ...
Query Answer
2 … … 2
No match: Discard!
… 12
… 27
44
Simple Nested Loops Join
CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission
2 ...
12 …
6 ...
1 …
5 …
27 …
Table 1
on disk
... 2
… 13
… 12
… 27
Table 2
on disk
… 1
… 5
Memory Buffers:
2 ...
12 …
6 ...
Query Answer
2 … … 2
No match: Discard!
… 12
… 27
45
Simple Nested Loops Join
CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission
2 ...
12 …
6 ...
1 …
5 …
27 …
Table 1
on disk
... 2
… 13
… 12
… 27
Table 2
on disk
… 1
… 5
Memory Buffers:
2 ...
12 …
6 ...
Query Answer
2 … … 2
No match: Discard!
… 1
… 5
46
Simple Nested Loops Join
CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission
2 ...
12 …
6 ...
1 …
5 …
27 …
Table 1
on disk
... 2
… 13
… 12
… 27
Table 2
on disk
… 1
… 5
Memory Buffers:
2 ...
12 …
6 ...
Query Answer
2 … … 2
No match: Discard!
… 1
… 5
At this point, we have read the entire table in the inner loop.
So, we advance to the second tuple in the outer loop.
47
Simple Nested Loops Join
CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission
2 ...
12 …
6 ...
1 …
5 …
27 …
Table 1
on disk
... 2
… 13
… 12
… 27
Table 2
on disk
… 1
… 5
Memory Buffers:
2 ...
12 …
6 ...
... 2
… 13
Query Answer
2 … … 2
No match: Discard!
48
Simple Nested Loops Join
CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission
2 ...
12 …
6 ...
1 …
5 …
27 …
Table 1
on disk
... 2
… 13
… 12
… 27
Table 2
on disk
… 1
… 5
Memory Buffers:
2 ...
12 …
6 ...
... 2
… 13
Query Answer
2 … … 2
No match: Discard!
49
Simple Nested Loops Join
CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission
2 ...
12 …
6 ...
1 …
5 …
27 …
Table 1
on disk
... 2
… 13
… 12
… 27
Table 2
on disk
… 1
… 5
Memory Buffers:
2 ...
12 …
6 ...
Query Answer
2 … … 2
12 … … 12
Match!
… 12
… 27
And so forth …
Does this algorithm work for R1.sid < S1.sid? Does this algorithm work for cross product?
50
Cost of Simple Nested Loops Join
For each tuple in the outer relation R, we scan the entire inner relation S, tuple by tuple. Cost: M + (pR * M) * N = 1000 + 100*1000*500 I/Os
50,001,000 I/Os 500,010 seconds 6 days
CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission
Join on ith column of R and jth column of S
foreach tuple r in R do
foreach tuple s in S do
if ri == sj then add <r, s> to result
We assume approximately 100 I/Os per second
M = 1000 pages in R, pR = 100 tuples per page,
N = 500 pages in S, pS = 80 tuples per page.
Better: Page-Oriented Nested Loops Join
2 ...
12 …
6 ...
1 …
5 …
27 …
Table 1
on disk
... 2
… 13
… 12
… 27
Table 2
on disk
… 1
… 5
Memory Buffers:
2 ...
12 …
6 ...
... 2
… 13
Once we’ve got these two pages in memory, check every combination from one page to the other page!
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 51
52
Page-Oriented Nested Loops Join (cont.)
2 ...
12 …
6 ...
1 …
5 …
27 …
Table 1
on disk
... 2
… 13
… 12
… 27
Table 2
on disk
… 1
… 5
Memory Buffers:
2 ...
12 …
6 ...
Do the same thing…compare all
combinations in memory - between
these two pages!
… 12
… 27 This page is
still in memory. Get the
next page.
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
53
Cost of Page-oriented Nested Loops Join
For each page of R, get each page of S, write out matching pairs of tuples <r, s>. Cost: M + M*N = 1000 + 1000*500 = 501,000 (R outer) Cost: N + N*M = 500 + 500*1000 = 500,500 (S outer) Therefore, use smaller relation as outer relation. 500,000 I/Os 500 seconds = 8.3 minutes (assume 1000 I/Os/second)
Compare with simple nested loops: M + (pR * M) * N = 1000 + 100*1000*500 = 50,001,000
for each page of tuples r in R do
for each page of tuples s in S do (match all combinations in
memory)
if ri == sj then add <r, s> to result
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
54
Page-oriented Nested Loops Join
For each page of R, get each page of S, write out matching pairs of tuples <r, s>.
Cost: M + M*N = 1000 + 1000*500 = 501,000 (R outer) Cost: N + N*M = 500 + 500*1000 = 500,500 (S outer) Therefore – typically use smaller relation as outer relation. 500,000 I/Os 8.3 minutes
for each page of tuples r in R do
for each page of tuples s in S do (match all combinations in
memory)
if ri == sj then add <r, s> to result
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
55
The best loops-based join algorithm: Block Nested-Loops Join (use a block of buffers)
Algorithm: One page is assigned to be the output buffer (not shown on this slide)
One page assigned to input from S, B-2 pages assigned to input from R Until all of R has been read {
Read in B-2 pages of R
For each page in S {
Read in the single S page
Check pairs of tuples in memory and output if they match } }
Cost: M + (M/(B-2))*N. For B=35, cost is 1000 + 1000*500/33 = 16,000 I/Os 16 seconds
2 ...
12 …
6 ...
1 …
5 …
27 …
R on disk ... 2
… 13
… 12
… 27
S on disk
… 1
… 5
B pages of Memory Buffer
1 …
5 …
27 …
2 ...
12 …
6 ...
... 2
… 13
…
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
56
Comparing the nested loops join algorithms
Algorithm Number of times to read inner table
Cost formula (with example M = 1000, pr = 100, b = 52, N = 500 (we don’t use ps)
simple nested loops join
Once for each row in outer table
M + (M*pr) * N 1000 + (1000*100) * 500 1000 + 100,000 * 500 1000 + 50,000,000
page-oriented nested loops join
Once for each PAGE of rows in outer table
M + M*N 1000 + 1000 * 500 1000 + 500,000
block nested loops join
Once for each b-2 pages of rows in outer table
M + (M/(b-2) * N 1000 + (1000/50) * 500 1000 + 20 * 500 1000 + 10,000
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
57
A few comments about costs
These cost formulas are much simpler that real formulas:
Real formulas take other costs into account (including CPU time)
Real systems use buffers and caches to assist with I/Os. (So what we estimate as independent I/Os aren’t necessarily actually I/Os; they might be “reading” data from memory or from a disk cache.)
Note: you can’t control/know what happens … e.g., when you read data from a table or query answer using a cursor. (The OS/DBMS handles these details.)
These examples are tiny! These tables would fit in memory.
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
58
Index Nested Loops Join
If there is an index on the search key sj then can use the index on the inner table - get matching tuples!
Cost: M + ( (M*pR) * cost of finding matching S tuples)
M + ((M*pR) * (I/Os to find index + 1 to get the data)
= 500 + (500*80*4) = 160,500 1/2 hour (Reserves as inner)
= 1000 + (1000*100*3) = 301,000 1 hour (Sailor as inner)
foreach tuple r in R do foreach tuple s in S where ri == sj
Use the Index to find s do add <r, s> to result
For each R tuple, cost of probing S index is about 2-4 for B+ tree.
80,500 (Reserves as inner) 15 min.; 151,00 (Sailor as inner) 30 min.
These could be smaller – if top levels of B+ tree are in memory.
Could be 0… if entire index is in memory.
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
59
Index nested loops join
Advantage: access exactly the right records in the inner loop (for equi-join).
You can use a clustered or an unclustered index for an index nested loops join.
Disadvantage: (necessarily) you have one I/O per row (in the table in the outer loop) rather than one I/O per page (of rows in the outer loop).
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
60
Hash Join
Simple case – the smallest table, say S, fits in main memory Build an in-memory hash index for S
Proceed as for index nested-loops join
Harder case – neither R nor S fits in memory We won’t cover this case in the class. Divide them both in the same way (1 pass) so that each
partition of S fits in memory
Do in-memory matching on each pair of matching partitions
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
Hash Join (simple case; small table fits in memory)
…
1. Blocks of rows of R R fits in main memory
2. For each row of R, compute h(joinval)
Bucket 1
Bucket 2
Bucket n
…
First: read and process R
Then: process S against the buckets
…
Blocks of rows of S 1. Read one page after another until finished.
2. For each row of S, compute h(joinval)
3. Then join this row from S with all matching rows from R; they are all in Bucket n. Do all rows in Bucket n join with this row from S?
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 61
3. Then put row in the proper bucket
62
Algorithms for other operators
Table scan
Index retrieval (select operator)
Index-only scan (project operator)
What might a simple algorithm be for eliminating duplicates (e.g., for DISTINCT or for UNIOIN)?
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
63
Query Optimization
Translate SQL query into a query tree (operators: relational algebra plus a few other ones)
Generate other, equivalent query trees (e.g., using relational algebra equivalences)
For each possible query tree:
select an algorithm for each operator (producing a query plan)
estimate the cost of the plan
Choose the plan with lowest estimated cost - of the plans considered (which is not necessarily all possible plans)
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
Initial Query Tree - Equivalent to SQL (without any algorithms selected)
SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid = S.sid AND
R.bid = 100 AND
S.rating > 5;
Reserves Sailors
sid=sid
bid=100 rating > 5
sname
Relational Algebra Tree: SQL Query:
65
Relational Algebra Equivalence: Cascade of (and Uncascade of) Selects
c1… cn(R) c1( … cn(R))
This symbol means equivalence.
So you can replace c1( … cn(R)) with c1… cn(R)
And you can replace c1… cn(R) with c1( … cn(R))
If you have several conditions connected by “AND” in a select operator, then you can apply them one at a time.
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
Example: c1… cn(R) c1( … cn(R))
SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5;
Reserves Sailors
sid=sid
rating > 5
sname
Reserves Sailors
sid=sid
bid=100 rating > 5
sname bid=100
Example: c(R⋈S) c(R)⋈S
SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5;
Reserves Sailors
sid=sid
rating > 5
sname
bid=100
This applies only to
the Sailors table!
Reserves
Sailors
sid=sid
rating > 5
sname
bid=100
68
Example: c(R⋈S) c(R)⋈S (cont.)
SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5;
Reserves
Sailors
sid=sid
rating > 5
sname
bid=100
What are the advantages of “pushing” a
select past a join operator?
What are the disadvantages of “pushing” a
select past a join operator?
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
69
Relational Algebra Equivalences
Selections: c1… cn(R) c1( … cn(R)) Selects Cascade
c1(c2(R)) c2(c1(R)) Selects Commute
Projections: a (R) a (a1 (… an(R))) If each ai contains a. Only last project matters
Joins: R⋈(S⋈T) (R⋈S)⋈T Joins are Associative
R⋈S S⋈R Joins Commute
Try to prove that: R⋈(S⋈T) (T⋈R)⋈S
Try it out with: is agent ⋈ (languagerel ⋈ language) equivalent to:
(language⋈ agent) ⋈ languagerel
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
Original Query Tree (without any algorithms selected)
SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5;
Reserves Sailors
sid=sid
bid=100 rating > 5
sname
RA Tree: SQL Query:
A Plan for the Original Query Tree: How shall we run the query plan?
Reserves Sailors
sid=sid
bid=100 rating > 5
sname
One way to execute this query
is to perform each operator (starting
from the bottom) and always write
the intermediate results out to disk.
We could choose a join algorithm
and then do everything else with
a sequential table scan. nested Loops join
(write temp file to disk)
seq. scan
(write temp file to disk)
seq. scan
But wait … select and project operators operate on just one row at a time
Reserves Sailors
sid=sid
bid=100 rating > 5
sname
We can do a select (to filter out
unwanted rows) and we can do a
project (to drop columns) while
the row is in memory.
Some previous operator must
first read the rows into memory;
then … we can do select
and project “on the fly” which
means “in memory.”
Nested
Loops join
(write temp file to disk)
seq. scan
(write temp file to disk)
seq. scan
On-the-fly
On-the-fly
“On the fly” is “free”
On the fly means that we
evaluate the operator in
memory - while we have the
tuple available.
“On the fly” induces no I/O cost!
Reserves Sailors
sid=sid
bid=100 rating > 5
sname
(Nested Loops Join)
(On-the-fly)
(On-the-fly)
Plan:
SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid = S.sid AND
R.bid = 100 AND S.rating > 5;
74
Limitations of “On the fly”
Can only happen if:
Computation can be done entirely on tuples in memory
Results do not need to be materialized
An earlier operator has read the table
Cannot be used for first table scan!
Cannot apply to all operations such as join or aggregates, etc.
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
Cost of plan 1 no index (Sailors – inner loop)
M = # of pages in outer table
N = # of pages in inner table
Cost of page-oriented nested loops join is:
M + M * N
1000 + 1000 * 500 = 501,000
And the “on-the-fly” operations have no I/O - so plan cost is 501,000
SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid = S.sid AND
R.bid = 100 AND S.rating > 5;
Reserves Sailors
sid=sid
sname
(On-the-fly)
(On-the-fly)
Plan:
(Nested Loops Join)
bid=100 rating > 5
76
Create other plans
Use relational algebra equivalences to produce new query trees. Advantage: you are sure that the new tree is equivalent
to the original, because equivalences have proofs
Assign different algorithms to operators
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
Cost of plan 2 (Reserves on inner loop)
SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid = S.sid AND
R.bid = 100 AND S.rating > 5;
Sailors as the outer relation rather than Reserves.
M = # of pages in outer table
N = # of pages in inner table
Cost of page-oriented nested loops join is:
500 + 500 * 1000 = 500,500
And the “on-the-fly” operations have no I/O - so plan cost is 500,500
Reserves Sailors
sid=sid
bid=100 rating > 5
sname
(On-the-fly)
(On-the-fly)
Plan:
(Page-Oriented Nested Loops Join)
⋈
Cost of plan 3 Index nested loops
SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid = S.sid AND
R.bid = 100 AND S.rating > 5;
What is the cost of the plan shown?
M + M * pr * (# of index I/Os + 1)
1000 + (1000*100*(3+1)) if 3 index I/Os or
1000 + (1000 * 100 * (0 + 1)) if 0 index I/Ox
Thus …
1000 + 300,000 = 301,000 or
1000 + 100,000 = 101,000
Sailors Reserves
sid=sid
bid=100 rating > 5
sname
(On-the-fly)
(On-the-fly)
Plan:
(Index nested loops)
⋈
⋈
Cost of plan 4 Push down selects
Reserves Sailors
sid=sid
sname (On-the-fly)
Plan:
SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid = S.sid AND
R.bid = 100 AND S.rating > 5;
(Page-Oriented Nested Loops Join)
Apply this equivalence:
c(R⋈S) c(R)⋈S
To the previous query tree to get an equivalent query tree.
What is the cost of the plan shown?
Scanning sailors and reserves cost M+N I/Os. But here we read intermediate files.
What about the cost of the join? It depends on how many reservations there are for boat 100 and how many sailors have a rating >5.
How would you find this information?
Statistics will help.
rating > 5 bid=100
(On-the-fly)
(Table scan) (Table scan)
(On-the-fly)
80
To estimate cost, we need table sizes
For all operators beyond the leaf level of the query plan, the input tables are the result of some earlier query.
Thus, we need to estimate the size of intermediate results! (This can be difficult. This is one reason why the cost estimates may not be very good. Estimation errors tend to compound.)
For example, what information would you need to estimate how many reservations are for bid 100 and how many sailors have a rating >5?
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
81
DBMS Usually Maintains Some Statistics in the DB Catalog
Catalogs typically contain at least: # tuples and # pages for each table.
# distinct key values and # pages for each index.
Index height, low/high key values for each tree index.
Catalogs are updated periodically - say, once a week or once a month. Perhaps they’re updated during the backup.
Simplest case: assume that all attribute values are uniformly distributed. Thus if gender was an attribute, the optimizer would
assume that half of the rows have the male value and other half have the female value. (This might be grossly inaccurate.)
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
82
Calculating Selectivities
Assume that rating values range from 1 to 10, and that bid values range from 1 to 100.
What percentage of the incoming tuples, to the operator
bid=100 , will be output?
What about rating > 5 ?
bid=100 rating > 5 ?
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
83
Doing better than a uniform distribution
The DBMS might gather more detailed information about how the values of attributes are distributed (e.g., histograms of the values in a field) and store it in the catalog. Suppose there was an attribute degree-program with three possible values: “BS CS” “MS CS” “PhD CS” Then the DBMS might count the values and know that there are 428
“BS CS” values, 98 “MS CS” values and 25 “PhD CS” values.
This allows much better estimate of the reduction factor.
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
84
Independence of Reduction Factors So far, we have gathered statistics and shown how to use them,
assuming uniform distributions.
But what about a select operator with terms separated by AND? Example: bid=100 rating > 5
We assume that all terms are independent!
Thus, if one attribute is class and the other is number-of-hours - the query optimizer might assume that class is uniformly distributed over {Fresh, Soph, Jun, Sen} and that number-of-hours is uniformly distributed over {0, 1, …, 205} But, we know that class correlates with number-of-hours!
Might even be that number-of-hours class.
What percentage of the incoming tuples, to the operator bid=100 rating > 5 , will be output?
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
85
Enumerating Plans for Multiple Joins
Back to the problem of generating plans.
Are we trying to generate as many plans as possible? No, best to generate few, as long as cheap plans are among them.
In System R: only left-deep join trees are considered.
B A
C
D
B A
C
D
C D B A
This one is left-deep - the other two are not.
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
86
Queries Over Multiple Relations (Joins)
Left-deep trees allow us to generate all fully pipelined plans.
Intermediate results not written to temporary files.
Not all left-deep trees are fully pipelined (e.g., SM join).
Using only left-deep plans (obviously) restricts the search space. (So optimizer may not find the optimal plan.)
B A
C
D
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
Nested Queries
Nested block is optimized independently, with the outer tuple considered as providing a selection condition.
Outer block is optimized with the cost of ‘calling’ the nested block computation taken into account.
Implicit ordering of these blocks means that some good strategies are not considered.
The non-nested version of the query is typically optimized better. The optimizer might not find it from the nested version, so you may need to explicitly unnest the query.
SELECT S.sname
FROM Sailors S WHERE EXISTS
(SELECT * FROM Reserves R
WHERE R.bid=103
AND R.sid=S.sid)
Nested block to optimize:
SELECT *
FROM Reserves R
WHERE R.bid=103
AND S.sid= outer value
Equivalent non-nested query:
SELECT S.sname
FROM Sailors S, Reserves R
WHERE S.sid=R.sid
AND R.bid=103
88
Summary – Algorithms for Relational Algebra Operators
A virtue of relational DBMSs: queries are composed of a few basic operators; the implementation of these operators can be carefully tuned (and it is important to do this!).
Many alternative implementation techniques for each operator; no universally superior technique for most operators.
Must consider available alternatives for each operation in a query and choose best one based on system statistics, etc. This is part of the broader task of optimizing a query composed of several ops.
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
89
Query Optimizers Don’t (Always) Find the Best Plan
There are usually more plans than the optimizer can consider, even if only left deep plans are considered.
The optimizer might not even try to generate all possible plans (it won’t be able to consider all of them anyway).
Sometimes the optimizer will compare the optimization cost to the estimated execution cost and quit early.
The optimizer chooses the plan with the lowest ESTIMATED cost. Actual costs may differ.
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
90
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
91
Physical DB Design
Now that we know how query optimizers work…
How do we choose the file organizations and indices?
How do we decide whether to modify the schema
for our database?
This is called “physical design” of a database.
This is sort of like query optimization - backwards.
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
91
92
Some Important Physical Database Design Issues
Mapping tables to physical storage/file types. (We won’t talk about this part because we haven’t discussed physical storage in a DBMS.)
Choosing the indexes
Modifying the conceptual schema
Denormalization
Horizontal/vertical decomposition
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
92
93
We Need to Understand the Workload to Choose Indices
For each query in the workload: Which relations does it access?
Which attributes are retrieved?
Which attributes are involved in selection/join conditions? How selective are these conditions likely to be? (That is, how many matching records will there be?
For each update in the workload: Which attributes are involved in selection/join conditions?
How selective are these conditions likely to be?
What type of updates (i.e., INSERT/DELETE/UPDATE) are prominent, and which tables and attributes are affected?
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
93
94
Tips for index selection
Don’t use indexes on small tables (< 200 rows)
Don’t use indexes on columns with few values (T/F, gender, state)
For most systems, indexes on primary keys and foreign keys are sufficient
Don’t forget to add indexes when the schema changes!
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
94
Vertical decomposition: split one table into two – choose which attributes go into each table
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 95
Course c# cname instructor room days
Course2 c# cname
Course1 c# room days
Course3 c# instructor
Take this table:
And replace it with these three tables:
Advantage: if queries use only a subset of the attributes, smaller
rows (fewer attributes) means faster access.
Disadvantage: if queries need all attributes, then you have to join.
Horizontal partition/decomposition (split data into two identical tables – with two different names)
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 96
Course c# cname instructor room days
Undergraduate-Course c# cname instructor room days
Graduate-Course c# cname instructor room days
97
Use a view to hide changes
A query can use any of the three.
Updates will need to know which table to use.
CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012
97
CREATE VIEW Courses(cid, sid, jid, did, pid, qty, val) AS SELECT * FROM Undergraduate-courses UNION
SELECT * FROM Graduate-courses
Undergraduate-Course (c#, cname, instructor, room, days) Graduate-Course (c#, cname, instructor, room, days)
98
CS386/586 Introduction to Database Systems, © Lois Delcambre 1999-2010 Slide 27 Some slides
adapted from R. Ramakrishnan, with permission Lecture 4
Views
A view is a query, with a name, that is stored in the database.
Example view definition:
CREATE VIEW gstudents AS SELECT S.* FROM student S WHERE s.gpa >= 2.5
Views can be used like base tables, in a query or in any other view. Views are like macros.
99
CS386/586 Introduction to Database Systems, © Lois Delcambre 1999-2010 Slide 27 Some slides
adapted from R. Ramakrishnan, with permission Lecture 4
Views can be used to make queries simpler
Completed(StudID, Course)
CREATE VIEW gstudents AS SELECT S.* FROM student S WHERE s.gpa >= 2.5
Queries about “good” students only are easier to write
SELECT S.name, S.phone
FROM gstudent S NATURAL JOIN completed C
WHERE C.course = ‘CS386’;
100
How are views implemented?
When you enter a query that mentions a view in the from clause, the DBMS expands/rewrites your query to include the view definition.
SELECT S.name, S.phone
FROM gstudent S NATURAL JOIN completed C
WHERE C.course = ‘CS386’;
is rewritten as:
SELECT S.name, S.phone
FROM (SELECT S.* FROM student S WHERE s.gpa >= 2.5) AS S NATURAL JOIN completed C
WHERE C.course = ‘CS386’;
CS386/586 Introduction to Database Systems, © Lois Delcambre 1999-2010 Slide 27
Some slides adapted from R. Ramakrishnan, with permission Lecture 4
101
CS386/586 Introduction to Database Systems, © Lois Delcambre 1999-2010 Slide 27 Some slides
adapted from R. Ramakrishnan, with permission Lecture 4
Views used for Security
This views presents a table (called sstudent) that is the student relation without the gpa field.
CREATE VIEW sstudent AS SELECT studID, name, address FROM student
Can you think of some other security examples?
102
CS386/586 Introduction to Database Systems, © Lois Delcambre 1999-2010 Slide 27 Some slides
adapted from R. Ramakrishnan, with permission Lecture 4
Views used to support extensibility
A company’s database includes a relation: Part (PartID: Char(4), weight: real,…)
Weight is stored in pounds
Company is purchased by a firm that uses metric weights
Databases must be integrated and the weight must be in metric units
But there’s much old software using pounds.
Solution: views!
103
CS386/586 Introduction to Database Systems, © Lois Delcambre 1999-2010 Slide 27 Some slides
adapted from R. Ramakrishnan, with permission Lecture 4
Views to support extensibility (cont.)
Solution:
1. Define a new base table with kilograms, NewPart, for the integrated company.
2. CREATE VIEW Part AS SELECT PartID, 2.2046*weight, …(no other changes)… FROM NewPart
3. Old programs still call the table “Part”; now they are referencing a view; before they references the table.
104
CS386/586 Introduction to Database Systems, © Lois Delcambre 1999-2010 Slide 27 Some slides
adapted from R. Ramakrishnan, with permission Lecture 4
Data Independence
A database management system supports data independence if application programs are immune to changes in the conceptual and physical schemas.
Why is this important? Because everything can change.
How does the relational model achieve logical (conceptual) data independence?
105
CS386/586 Introduction to Database Systems, © Lois Delcambre 1999-2010 Slide 27 Some slides
adapted from R. Ramakrishnan, with permission Lecture 4
Levels of Abstraction – support Data Independence
Physical Schema
Conceptual Schema
ES 1 ES 2 ES 3
Physical storage; DBA
Logical storage; data
designer
External view; user and
data designer
Logical data indepence
Physical data indepence
106
CS386/586 Introduction to Database Systems, © Lois Delcambre 1999-2010 Slide 27 Some slides
adapted from R. Ramakrishnan, with permission Lecture 4
Levels of Abstraction in a relational DBMS
Physical Schema
Conceptual Schema
ES 1 ES 2 ES 3
Physical schema: the
tables – after they are modified
by the DBA, plus indexes, files,
etc.
Conceptual schema: ERDs
Logical schema: tables
External schemas are Views
107
CS386/586 Introduction to Database Systems, © Lois Delcambre 1999-2010 Slide 27 Some slides
adapted from R. Ramakrishnan, with permission Lecture 4
A few Comments on Views
View versus temporary table
What is the difference?
Which is better?
Some DBMS’ offer a way around the view update problem
Triggers
Rules
Active research
108
CS386/586 Introduction to Database Systems, © Lois Delcambre 1999-2010 Slide 27 Some slides
adapted from R. Ramakrishnan, with permission Lecture 4
Summary
A view is a stored query definition; rows are not computed until they are needed.
Views can be very useful
Easier query writing, security, extensibility
Views cannot always be unambiguously updated
Three levels of abstraction in a DBMS supports data independence: logical and physical