108
1 Converting ER to Relational Schema CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 1 Project P-number P-name Due-Date Employee SSN E-Name Office 0..* Assignment 0..* 0..* Manager 1..1

Converting ER to Relational Schema

  • Upload
    others

  • View
    13

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Converting ER to Relational Schema

1

Converting ER to Relational Schema

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

1

Project P-number P-name Due-Date

Employee SSN E-Name Office

0..* Assignment 0..*

0..* Manager 1..1

Page 2: Converting ER to Relational Schema

2

Translate each entity into a table, with keys.

Entity :

can be represented as a table in the relational model

has a key … which becomes a key for the table

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

CREATE TABLE Employee (SSN CHAR(11) NOT NULL,

E-Name CHAR(20),

Office INTEGER,

PRIMARY KEY (SSN))

Employee

SSN E-Name

Office

Page 3: Converting ER to Relational Schema

3

Translate each many-to-many relationship set into a table

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

3

What are the attributes and

what is the key for Assignment?

Project(P-number, P-name, Due-Date) Employee(SSN, E-Name, Office)

Project P-number P-name Due-Date

Employee SSN E-Name Office

0..* Assignment 0..*

0..* Manager 1..1

Page 4: Converting ER to Relational Schema

4

Answer: Assignment(P-Number, SSN)

P-Number is a foreign key for Project SSN is a foreign key for Employee

Project(P-Number, P-Due-Date)

Employee(SSN, E-Name, Office)

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Project P-number P-name Due-Date

Employee SSN E-Name Office

0..* Assignment 0..*

0..* Manager 1..1

Page 5: Converting ER to Relational Schema

What do we do with a one-to-many relationship?

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 5

For example, what do we do with Manager?

Project(P-number, P-name, Due-Date)

Employee(SSN, E-Name, Office)

Project P-number P-name Due-Date

Employee SSN E-Name Office

0..* Assignment 0..*

0..* Manager 1..1

Page 6: Converting ER to Relational Schema

Create a foreign key for a 1-to-many relationship.

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 6

Project(P-number, P-name, Due-Date, Manager) Employee(SSN, E-Name, Office)

Manager is a foreign key (referencing the Employee relation)

value of Manager must match an SSN in Employee

Project P-number P-name Due-Date

Employee SSN E-Name Office

0..* Assignment 0..*

0..* Manager 1..1

Page 7: Converting ER to Relational Schema

Or...Create a table for a 1-many relationship.

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 7

Project(P-number, P-name, Due-Date, Manager) Employee(SSN, E-Name, Office)

vs.

Project(P-number, P-name, Due-Date) Employee(SSN, E-Name, Office)

Manager(P-number, SSN) What are the tradeoffs between these two?

Note:

P-number

is the key

for Manager

Project P-number P-name Due-Date

Employee SSN E-Name Office

0..* Assignment 0..*

0..* Manager 1..1

Page 8: Converting ER to Relational Schema

Project (P-number, P-name, Due-Date) Employee (SSN, E-Name, Office, Managed-project)

vs. Project (P-number, P-name, Due-Date)

Employee (SSN, E-Name, Office) Manager (P-number, SSN)

What if SSN is the key for Manager?

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 8

Project P-number P-name Due-Date

Employee SSN E-Name Office

0..* Assignment 0..*

0..1 Manager 0..*

Page 9: Converting ER to Relational Schema

Add attributes to the table for the relationship

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 9

Assignment(P-number, SSN, role, start-date, end-date) Project(P-number, P-name, Due-Date) Employee(SSN, E-Name, Office)

role start-date end-date

Project P-number P-name Due-Date

Employee SSN E-Name Office

0..* Assignment 0..*

0..* Manager 1..1

Page 10: Converting ER to Relational Schema

What if a 1-to-many relationship has an attribute?

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 10

Project(P-number, P-name, Due-Date, Manager) Employee(SSN, E-Name, Office)

Employee SSN E-Name Office

Manager

start-date end-date

Project P-number P-name Due-Date

Page 11: Converting ER to Relational Schema

Add attributes to the table for the relationship?

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 11

Project(P-number, P-name, Due-Date, Manager, start-date, end-date) Employee(SSN, E-Name, Office)

Employee SSN E-Name Office

Manager

start-date end-date

Project P-number P-name Due-Date

Page 12: Converting ER to Relational Schema

Add attributes to the table for the relationship?

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 12

Project(P-number, P-name, Due-Date, Manages(P-number, SSN, start-date, end-date) Employee(SSN, E-Name, Office)

Employee SSN E-Name Office

Manager

start-date end-date

Project P-number P-name Due-Date

No, bad idea. You should use an extra table for the relationship.

Notice the key for the Manages table.

Page 13: Converting ER to Relational Schema

13

Minimum cardinality of 1 in SQL We can require any table to be in a binary relationship using a foreign key which is required to be NOT NULL (but little else without resorting to CHECK constraints)

CREATE TABLE Department ( d-code INTEGER,

d-name CHAR(20), manager-ssn CHAR(9) NOT NULL, since DATE, PRIMARY KEY (d-code), FOREIGN KEY (manager-ssn) REFERENCES Employee, ON DELETE NO ACTION)

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 14: Converting ER to Relational Schema

14

Weak Entity Sets

Policy

dep-name

cost

Insures

strong

entity

set

identifying

relationship

set

weak

entity

set

Employee

SSN

name

office

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 15: Converting ER to Relational Schema

Translating Weak Entity Sets

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Weak entity sets and identifying relationship sets are translated into a single table. Must include key of strong entity set, as a foreign key.

When the owner entity is deleted, all owned weak entities must also be deleted.

CREATE TABLE Insurance_Policy ( dep-name CHAR(20),

cost REAL, ssn CHAR(11) NOT NULL, PRIMARY KEY (dep-name, ssn), FOREIGN KEY (ssn) REFERENCES Employee, ON DELETE CASCADE)

Page 16: Converting ER to Relational Schema

Translating entities involved in ISA

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 16

person

ssn

name

age

employee

position

salary

dependent

gender listed on insurance

You have three choices:

1. One table: person with all

attributes

2. Two tables: employee

and dependent with all

attributes from person

copied down

3. Three tables: one for

person, employee, and

dependent

Page 17: Converting ER to Relational Schema

Option 1: Use one table (for person)

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 17

person

ssn

name

age

employee

position

salary

dependent

gender listed on insurance

person(ssn, name, age, position, salary, gender) listed-on-insurance(emp-ssn, dep-ssn) where emp-ssn references person.ssn and dep-ssn references dep-ssn

Advantages:

1. no join needed

2. no redundant attributes

Disadvantages:

1. may have lots of nulls

2. slightly more difficult to

find just employee or just

dependent

Page 18: Converting ER to Relational Schema

Option 2: Use two tables (employee & dependent)

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 18

person

ssn

name

age

employee

position

salary

dependent

gender listed on insurance

employee( ssn, name, age, position, salary) dependent(ssn, name, age, gender, insurer) where insurer references employee.ssn This works if each dependent is on just one insurance.

Advantages:

1. fewer nulls

2. easy to get just

employee or just

dependent

Disadvantages:

1. Not possible to have a

person who is not an

employee or person

2. name and age are

stored redundantly

Page 19: Converting ER to Relational Schema

Option 3: Use three tables (person, employee, dependent)

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 19

person

ssn

name

age

employee

position

salary

dependent

gender listed on insurance

person(ssn, name, age) employee(ssn, position, salary) where ssn refs person.ssn dependent(ssn, gender, insurer) where ssn refs person.ssn where insurer references employee.ssn This works if each dependent is on just one insurance.

Advantages:

1. there can be persons

who are not employees

or dependents

2. no redundancy

3. no need to use nulls

4. matches the conceptual

model

Disadvantages:

1. joins are required to get

all of the attributes

Page 20: Converting ER to Relational Schema

20

Translation Steps: ER to Tables

• Create table and choose key for each entity; include single-valued attributes.

• Create table for each weak entity; include single-valued attributes. Include key of owner as a foreign key in the weak entity. Set key as foreign key of owner plus local, partial key.

• For each 1:1 relationship, add a foreign key to one of the entity sets involved in the relationship (a foreign key to the other entity in the relationship)*.

• For each 1:N relationship, add a foreign key to the entity set on the N-side of the relationship (to reference the entity set on the 1-side of the relationship)*.

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 21: Converting ER to Relational Schema

21

Translation Steps: ER to Tables (cont.)

• For each M:N relationship set, create a new table. Include a foreign key for each participant entity set, in the relationship set. The key for the new table is the set of all such foreign keys.

• For each multi-valued attribute, construct a separate table. Repeat the key for the entity in this new table. It will serve as both the key for this table as well as a foreign key to the original table for the entity.

This algorithm from Elmasri & Navathe, Fundamentals of Database Systems

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 22: Converting ER to Relational Schema

22

Files, Indexes, Query Optimization

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 23: Converting ER to Relational Schema

23

Query Optimization … Overview

Which query plan is the fastest?

How many query plans are there?

How can we estimate the cost of a plan?

But wait, how are queries (query operators) implemented?

But wait, how are the files stored?

CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission

Page 24: Converting ER to Relational Schema

24

We’ll just introduce these ideas and we’ll start from bottom

CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission

Query Optimization

Relational Operator Algs.

Files and Access Methods

Buffer Management

Disk Space Management

DB

Relational Algebra Query Tree

Search for a cheap plan

Join algorithms, …

Heap, Index, …

Operating system level

Issues (may be handled by

DBMS or by O/S)

how a

disk works 1

2

3

4

Page 25: Converting ER to Relational Schema

25

Disk:105 to 106 times slower than memory (we’ll use in this 106 example)

Disk access time (all three costs together) is about 2 to 7 milliseconds

Memory access time: 50 to 70 nanoseconds

7 milliseconds vs. 70 nanoseconds therefore disk access is 100,000 times slower than memory access

Contrast 1 second (pick up a piece of paper) vs. 100,000 seconds (drive to SF and back – about 28 hours)

CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission

Page 26: Converting ER to Relational Schema

26

Cost of Accessing Data on Disk

Time to access (read/write) a disk block: seek time (moving arms to position disk head on track) rotational delay (waiting for block to rotate under head) transfer time (actually moving data to/from disk surface)

Key to lower I/O cost: reduce seek/rotation delays! (you have to wait for the transfer time, no matter what)

Query cost is often measured in the number of page I/Os – often simplified to assume each page I/O costs the same

CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission

Page 27: Converting ER to Relational Schema

27

Block (page) size vs. record size

Page –smallest unit of transfer supported by OS

Block – Multiple of page, smallest unit of transfer supported by an application or a disk drive.

Block and page are often used interchangeably.

“typical” record size … maybe a few hundred up to few thousand bytes

“typical” page size 4K, 8K

When would we choose block size to be larger?

When would we choose block size to be smaller?

CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission

Page 28: Converting ER to Relational Schema

28

Index for a File

An Index is a data structure that speeds up selections on the search key field(s)

An index starts with a search key k and gives you a data entry k*.

Given k*, you can get to the record(s) with the search key k in one I/O.

CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission

Page 29: Converting ER to Relational Schema

29

B+ Tree Indexes

CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission

Leaf pages contain data entries, and are chained (prev & next) Non-leaf pages have index entries; only used to direct searches:

P 0 K 1 P 1 K 2 P 2 K m P m

index entry

Non-leaf

Pages

Pages (Sorted by search key)

Leaf

Page 30: Converting ER to Relational Schema

30

Smith, 44, 3000

Ashby, 25, 3000

Bristow, 30, 2007

Basu, 33, 4003

Tracy, 44, 5004

Cass, 50, 5004

Daniels, 22, 6003

Jones, 40, 6003

Index is on the left; data is on the right

CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission

Search key is “Name”

Records

are sorted

by “Name”

in the file

Index Data File

Each page

contains 3

records.

Ashby

Basu

Cass

Bristow

Daniels

Jones

Tracy

Smith

Page 31: Converting ER to Relational Schema

31

Smith, 44, 3000

Ashby, 25, 3000

Bristow, 30, 2007

Basu, 33, 4003

Tracy, 44, 5004

Cass, 50, 5004

Daniels, 22, 6003

Jones, 40, 6003

This index is dense (one index entry for each data record) and clustered (data are sorted based on search key)

CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission

Search key is “Name”

Records

are sorted

by “Name”

in the file;

Clustered!

Index Data File

Each page

contains 3

records.

Ashby

Basu

Cass

Bristow

Daniels

Jones

Tracy

Smith

Page 32: Converting ER to Relational Schema

32

Smith, 44, 3000

Ashby, 25, 3000

Bristow, 30, 2007

Basu, 33, 4003

Tracy, 44, 5004

Cass, 50, 5004

Daniels, 22, 6003

Jones, 40, 6003

Ashby

Cass

Smith

This index is: sparse (one index entry per PAGE of data) and clustered (data is sorted on search key)

CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission

Search key is “Name”

Records

are sorted

by “Name”

in the file;

clustered!

Index Data File

Each page

contains 3

records.

Page 33: Converting ER to Relational Schema

33

Sparse, clustered index Good for range search

The data is sorted according to the search key. There is one entry for each page of data records.

Consider a phone book – with the heading on each page such as “Mcinroy – Mckee” or “Lowe – Lozano”. This tells us that all names that fall between Lowe and Lozano – will be on this page.

On disk, one I/O operation gets us a whole page of entries – sorted by name. (~350 names)

CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission

Page 34: Converting ER to Relational Schema

34

Smith, 44, 3000

Ashby, 25, 3000

Bristow, 30, 2007

Basu, 44, 4003

Tracy, 33, 5004

Cass, 50, 5004

Daniels, 22, 6003

Jones, 40, 6003

22

25

30

40

44

44

50

33

This index is unclustered (records NOT Sorted on Search Key)

and dense (one entry for every data record)

CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission

Search key is “Age”

Index Data File

Page 35: Converting ER to Relational Schema

35

Unclustered indexes

In an unclustered index (also called a secondary index) the underlying data is NOT sorted according to the search key.

Example: imagine building an index on a phone book based on phone number. You MUST put an entry in the index for every single person in the phone book.

Thus … an unclustered index MUST be dense; there is no other choice.

For a range search, we need one I/O for every data record!

CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission

Page 36: Converting ER to Relational Schema

36

I/Os using a dense, unclustered index during a range search

CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission

Non-leaf

Pages

Pages (Sorted by search key)

Leaf

Every data record = 1 I/O!

You may re-read some

pages!

Cost to scan data file

in sorted order =

M*N = no. of records

in the file.

Page 37: Converting ER to Relational Schema

37

Indexes

DBMSs often create a clustered index on all primary keys.

Note: primary keys are the values that must be used in foreign keys.

Only one clustered index per table! Why?

You need to decide whether you want additional (unclustered/secondary) indexes.

You need to decide if you want composite indexes.

CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission

Page 38: Converting ER to Relational Schema

38

Join Algorithms – an Introduction

R ⋈ S is very common! And R S followed by a selection is inefficient. So we process joins (rather than cross product) whenever possible. Lots of effort invested in join algorithms.

Assume: M pages in R, pR tuples per page, N pages in S, pS tuples per page.

In our examples, R is Reserves and S is Sailors.

CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission

SELECT * FROM Reserves R1, Sailors S1 WHERE R1.sid=S1.sid

Page 39: Converting ER to Relational Schema

Simple Nested Loops Join

CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission

Join on ith column of R and jth column of S

foreach tuple r in R do

foreach tuple s in S do

if ri == sj then add <r, s> to result

1. In the outer loop, read the first table – tuple-by-tuple.

2. For each tuple in the outer loop, then compare it to each tuple in the

second table … in the inner loop. This requires that we read the entire

table in the second loop – for each tuple in the outer loop.

Page 40: Converting ER to Relational Schema

Simple Nested Loops Join

CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission

2 ...

12 …

6 ...

1 …

5 …

27 …

Table 1

on disk

... 2

… 13

… 12

… 27

Table 2

on disk

… 1

… 5

Memory Buffers:

Page 41: Converting ER to Relational Schema

Simple Nested Loops Join

CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission

2 ...

12 …

6 ...

1 …

5 …

27 …

Table 1

on disk

... 2

… 13

… 12

… 27

Table 2

on disk

… 1

… 5

Memory Buffers:

2 ...

12 …

6 ...

... 2

… 13

Query Answer

2 … … 2

Page 42: Converting ER to Relational Schema

42

Simple Nested Loops Join

CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission

2 ...

12 …

6 ...

1 …

5 …

27 …

Table 1

on disk

... 2

… 13

… 12

… 27

Table 2

on disk

… 1

… 5

Memory Buffers:

2 ...

12 …

6 ...

... 2

… 13

Query Answer

2 … … 2

No match: Discard!

Page 43: Converting ER to Relational Schema

43

Simple Nested Loops Join

CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission

2 ...

12 …

6 ...

1 …

5 …

27 …

Table 1

on disk

... 2

… 13

… 12

… 27

Table 2

on disk

… 1

… 5

Memory Buffers:

2 ...

12 …

6 ...

Query Answer

2 … … 2

No match: Discard!

… 12

… 27

Page 44: Converting ER to Relational Schema

44

Simple Nested Loops Join

CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission

2 ...

12 …

6 ...

1 …

5 …

27 …

Table 1

on disk

... 2

… 13

… 12

… 27

Table 2

on disk

… 1

… 5

Memory Buffers:

2 ...

12 …

6 ...

Query Answer

2 … … 2

No match: Discard!

… 12

… 27

Page 45: Converting ER to Relational Schema

45

Simple Nested Loops Join

CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission

2 ...

12 …

6 ...

1 …

5 …

27 …

Table 1

on disk

... 2

… 13

… 12

… 27

Table 2

on disk

… 1

… 5

Memory Buffers:

2 ...

12 …

6 ...

Query Answer

2 … … 2

No match: Discard!

… 1

… 5

Page 46: Converting ER to Relational Schema

46

Simple Nested Loops Join

CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission

2 ...

12 …

6 ...

1 …

5 …

27 …

Table 1

on disk

... 2

… 13

… 12

… 27

Table 2

on disk

… 1

… 5

Memory Buffers:

2 ...

12 …

6 ...

Query Answer

2 … … 2

No match: Discard!

… 1

… 5

At this point, we have read the entire table in the inner loop.

So, we advance to the second tuple in the outer loop.

Page 47: Converting ER to Relational Schema

47

Simple Nested Loops Join

CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission

2 ...

12 …

6 ...

1 …

5 …

27 …

Table 1

on disk

... 2

… 13

… 12

… 27

Table 2

on disk

… 1

… 5

Memory Buffers:

2 ...

12 …

6 ...

... 2

… 13

Query Answer

2 … … 2

No match: Discard!

Page 48: Converting ER to Relational Schema

48

Simple Nested Loops Join

CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission

2 ...

12 …

6 ...

1 …

5 …

27 …

Table 1

on disk

... 2

… 13

… 12

… 27

Table 2

on disk

… 1

… 5

Memory Buffers:

2 ...

12 …

6 ...

... 2

… 13

Query Answer

2 … … 2

No match: Discard!

Page 49: Converting ER to Relational Schema

49

Simple Nested Loops Join

CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission

2 ...

12 …

6 ...

1 …

5 …

27 …

Table 1

on disk

... 2

… 13

… 12

… 27

Table 2

on disk

… 1

… 5

Memory Buffers:

2 ...

12 …

6 ...

Query Answer

2 … … 2

12 … … 12

Match!

… 12

… 27

And so forth …

Does this algorithm work for R1.sid < S1.sid? Does this algorithm work for cross product?

Page 50: Converting ER to Relational Schema

50

Cost of Simple Nested Loops Join

For each tuple in the outer relation R, we scan the entire inner relation S, tuple by tuple. Cost: M + (pR * M) * N = 1000 + 100*1000*500 I/Os

50,001,000 I/Os 500,010 seconds 6 days

CS386/586 © Lois Delcambre, Some slides adapted from R. Ramakrishnana, et al, and J. Terwilliger, with permission

Join on ith column of R and jth column of S

foreach tuple r in R do

foreach tuple s in S do

if ri == sj then add <r, s> to result

We assume approximately 100 I/Os per second

M = 1000 pages in R, pR = 100 tuples per page,

N = 500 pages in S, pS = 80 tuples per page.

Page 51: Converting ER to Relational Schema

Better: Page-Oriented Nested Loops Join

2 ...

12 …

6 ...

1 …

5 …

27 …

Table 1

on disk

... 2

… 13

… 12

… 27

Table 2

on disk

… 1

… 5

Memory Buffers:

2 ...

12 …

6 ...

... 2

… 13

Once we’ve got these two pages in memory, check every combination from one page to the other page!

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 51

Page 52: Converting ER to Relational Schema

52

Page-Oriented Nested Loops Join (cont.)

2 ...

12 …

6 ...

1 …

5 …

27 …

Table 1

on disk

... 2

… 13

… 12

… 27

Table 2

on disk

… 1

… 5

Memory Buffers:

2 ...

12 …

6 ...

Do the same thing…compare all

combinations in memory - between

these two pages!

… 12

… 27 This page is

still in memory. Get the

next page.

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 53: Converting ER to Relational Schema

53

Cost of Page-oriented Nested Loops Join

For each page of R, get each page of S, write out matching pairs of tuples <r, s>. Cost: M + M*N = 1000 + 1000*500 = 501,000 (R outer) Cost: N + N*M = 500 + 500*1000 = 500,500 (S outer) Therefore, use smaller relation as outer relation. 500,000 I/Os 500 seconds = 8.3 minutes (assume 1000 I/Os/second)

Compare with simple nested loops: M + (pR * M) * N = 1000 + 100*1000*500 = 50,001,000

for each page of tuples r in R do

for each page of tuples s in S do (match all combinations in

memory)

if ri == sj then add <r, s> to result

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 54: Converting ER to Relational Schema

54

Page-oriented Nested Loops Join

For each page of R, get each page of S, write out matching pairs of tuples <r, s>.

Cost: M + M*N = 1000 + 1000*500 = 501,000 (R outer) Cost: N + N*M = 500 + 500*1000 = 500,500 (S outer) Therefore – typically use smaller relation as outer relation. 500,000 I/Os 8.3 minutes

for each page of tuples r in R do

for each page of tuples s in S do (match all combinations in

memory)

if ri == sj then add <r, s> to result

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 55: Converting ER to Relational Schema

55

The best loops-based join algorithm: Block Nested-Loops Join (use a block of buffers)

Algorithm: One page is assigned to be the output buffer (not shown on this slide)

One page assigned to input from S, B-2 pages assigned to input from R Until all of R has been read {

Read in B-2 pages of R

For each page in S {

Read in the single S page

Check pairs of tuples in memory and output if they match } }

Cost: M + (M/(B-2))*N. For B=35, cost is 1000 + 1000*500/33 = 16,000 I/Os 16 seconds

2 ...

12 …

6 ...

1 …

5 …

27 …

R on disk ... 2

… 13

… 12

… 27

S on disk

… 1

… 5

B pages of Memory Buffer

1 …

5 …

27 …

2 ...

12 …

6 ...

... 2

… 13

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 56: Converting ER to Relational Schema

56

Comparing the nested loops join algorithms

Algorithm Number of times to read inner table

Cost formula (with example M = 1000, pr = 100, b = 52, N = 500 (we don’t use ps)

simple nested loops join

Once for each row in outer table

M + (M*pr) * N 1000 + (1000*100) * 500 1000 + 100,000 * 500 1000 + 50,000,000

page-oriented nested loops join

Once for each PAGE of rows in outer table

M + M*N 1000 + 1000 * 500 1000 + 500,000

block nested loops join

Once for each b-2 pages of rows in outer table

M + (M/(b-2) * N 1000 + (1000/50) * 500 1000 + 20 * 500 1000 + 10,000

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 57: Converting ER to Relational Schema

57

A few comments about costs

These cost formulas are much simpler that real formulas:

Real formulas take other costs into account (including CPU time)

Real systems use buffers and caches to assist with I/Os. (So what we estimate as independent I/Os aren’t necessarily actually I/Os; they might be “reading” data from memory or from a disk cache.)

Note: you can’t control/know what happens … e.g., when you read data from a table or query answer using a cursor. (The OS/DBMS handles these details.)

These examples are tiny! These tables would fit in memory.

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 58: Converting ER to Relational Schema

58

Index Nested Loops Join

If there is an index on the search key sj then can use the index on the inner table - get matching tuples!

Cost: M + ( (M*pR) * cost of finding matching S tuples)

M + ((M*pR) * (I/Os to find index + 1 to get the data)

= 500 + (500*80*4) = 160,500 1/2 hour (Reserves as inner)

= 1000 + (1000*100*3) = 301,000 1 hour (Sailor as inner)

foreach tuple r in R do foreach tuple s in S where ri == sj

Use the Index to find s do add <r, s> to result

For each R tuple, cost of probing S index is about 2-4 for B+ tree.

80,500 (Reserves as inner) 15 min.; 151,00 (Sailor as inner) 30 min.

These could be smaller – if top levels of B+ tree are in memory.

Could be 0… if entire index is in memory.

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 59: Converting ER to Relational Schema

59

Index nested loops join

Advantage: access exactly the right records in the inner loop (for equi-join).

You can use a clustered or an unclustered index for an index nested loops join.

Disadvantage: (necessarily) you have one I/O per row (in the table in the outer loop) rather than one I/O per page (of rows in the outer loop).

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 60: Converting ER to Relational Schema

60

Hash Join

Simple case – the smallest table, say S, fits in main memory Build an in-memory hash index for S

Proceed as for index nested-loops join

Harder case – neither R nor S fits in memory We won’t cover this case in the class. Divide them both in the same way (1 pass) so that each

partition of S fits in memory

Do in-memory matching on each pair of matching partitions

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 61: Converting ER to Relational Schema

Hash Join (simple case; small table fits in memory)

1. Blocks of rows of R R fits in main memory

2. For each row of R, compute h(joinval)

Bucket 1

Bucket 2

Bucket n

First: read and process R

Then: process S against the buckets

Blocks of rows of S 1. Read one page after another until finished.

2. For each row of S, compute h(joinval)

3. Then join this row from S with all matching rows from R; they are all in Bucket n. Do all rows in Bucket n join with this row from S?

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 61

3. Then put row in the proper bucket

Page 62: Converting ER to Relational Schema

62

Algorithms for other operators

Table scan

Index retrieval (select operator)

Index-only scan (project operator)

What might a simple algorithm be for eliminating duplicates (e.g., for DISTINCT or for UNIOIN)?

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 63: Converting ER to Relational Schema

63

Query Optimization

Translate SQL query into a query tree (operators: relational algebra plus a few other ones)

Generate other, equivalent query trees (e.g., using relational algebra equivalences)

For each possible query tree:

select an algorithm for each operator (producing a query plan)

estimate the cost of the plan

Choose the plan with lowest estimated cost - of the plans considered (which is not necessarily all possible plans)

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 64: Converting ER to Relational Schema

Initial Query Tree - Equivalent to SQL (without any algorithms selected)

SELECT S.sname

FROM Reserves R, Sailors S

WHERE R.sid = S.sid AND

R.bid = 100 AND

S.rating > 5;

Reserves Sailors

sid=sid

bid=100 rating > 5

sname

Relational Algebra Tree: SQL Query:

Page 65: Converting ER to Relational Schema

65

Relational Algebra Equivalence: Cascade of (and Uncascade of) Selects

c1… cn(R) c1( … cn(R))

This symbol means equivalence.

So you can replace c1( … cn(R)) with c1… cn(R)

And you can replace c1… cn(R) with c1( … cn(R))

If you have several conditions connected by “AND” in a select operator, then you can apply them one at a time.

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 66: Converting ER to Relational Schema

Example: c1… cn(R) c1( … cn(R))

SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5;

Reserves Sailors

sid=sid

rating > 5

sname

Reserves Sailors

sid=sid

bid=100 rating > 5

sname bid=100

Page 67: Converting ER to Relational Schema

Example: c(R⋈S) c(R)⋈S

SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5;

Reserves Sailors

sid=sid

rating > 5

sname

bid=100

This applies only to

the Sailors table!

Reserves

Sailors

sid=sid

rating > 5

sname

bid=100

Page 68: Converting ER to Relational Schema

68

Example: c(R⋈S) c(R)⋈S (cont.)

SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5;

Reserves

Sailors

sid=sid

rating > 5

sname

bid=100

What are the advantages of “pushing” a

select past a join operator?

What are the disadvantages of “pushing” a

select past a join operator?

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 69: Converting ER to Relational Schema

69

Relational Algebra Equivalences

Selections: c1… cn(R) c1( … cn(R)) Selects Cascade

c1(c2(R)) c2(c1(R)) Selects Commute

Projections: a (R) a (a1 (… an(R))) If each ai contains a. Only last project matters

Joins: R⋈(S⋈T) (R⋈S)⋈T Joins are Associative

R⋈S S⋈R Joins Commute

Try to prove that: R⋈(S⋈T) (T⋈R)⋈S

Try it out with: is agent ⋈ (languagerel ⋈ language) equivalent to:

(language⋈ agent) ⋈ languagerel

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 70: Converting ER to Relational Schema

Original Query Tree (without any algorithms selected)

SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5;

Reserves Sailors

sid=sid

bid=100 rating > 5

sname

RA Tree: SQL Query:

Page 71: Converting ER to Relational Schema

A Plan for the Original Query Tree: How shall we run the query plan?

Reserves Sailors

sid=sid

bid=100 rating > 5

sname

One way to execute this query

is to perform each operator (starting

from the bottom) and always write

the intermediate results out to disk.

We could choose a join algorithm

and then do everything else with

a sequential table scan. nested Loops join

(write temp file to disk)

seq. scan

(write temp file to disk)

seq. scan

Page 72: Converting ER to Relational Schema

But wait … select and project operators operate on just one row at a time

Reserves Sailors

sid=sid

bid=100 rating > 5

sname

We can do a select (to filter out

unwanted rows) and we can do a

project (to drop columns) while

the row is in memory.

Some previous operator must

first read the rows into memory;

then … we can do select

and project “on the fly” which

means “in memory.”

Nested

Loops join

(write temp file to disk)

seq. scan

(write temp file to disk)

seq. scan

On-the-fly

On-the-fly

Page 73: Converting ER to Relational Schema

“On the fly” is “free”

On the fly means that we

evaluate the operator in

memory - while we have the

tuple available.

“On the fly” induces no I/O cost!

Reserves Sailors

sid=sid

bid=100 rating > 5

sname

(Nested Loops Join)

(On-the-fly)

(On-the-fly)

Plan:

SELECT S.sname

FROM Reserves R, Sailors S

WHERE R.sid = S.sid AND

R.bid = 100 AND S.rating > 5;

Page 74: Converting ER to Relational Schema

74

Limitations of “On the fly”

Can only happen if:

Computation can be done entirely on tuples in memory

Results do not need to be materialized

An earlier operator has read the table

Cannot be used for first table scan!

Cannot apply to all operations such as join or aggregates, etc.

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 75: Converting ER to Relational Schema

Cost of plan 1 no index (Sailors – inner loop)

M = # of pages in outer table

N = # of pages in inner table

Cost of page-oriented nested loops join is:

M + M * N

1000 + 1000 * 500 = 501,000

And the “on-the-fly” operations have no I/O - so plan cost is 501,000

SELECT S.sname

FROM Reserves R, Sailors S

WHERE R.sid = S.sid AND

R.bid = 100 AND S.rating > 5;

Reserves Sailors

sid=sid

sname

(On-the-fly)

(On-the-fly)

Plan:

(Nested Loops Join)

bid=100 rating > 5

Page 76: Converting ER to Relational Schema

76

Create other plans

Use relational algebra equivalences to produce new query trees. Advantage: you are sure that the new tree is equivalent

to the original, because equivalences have proofs

Assign different algorithms to operators

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 77: Converting ER to Relational Schema

Cost of plan 2 (Reserves on inner loop)

SELECT S.sname

FROM Reserves R, Sailors S

WHERE R.sid = S.sid AND

R.bid = 100 AND S.rating > 5;

Sailors as the outer relation rather than Reserves.

M = # of pages in outer table

N = # of pages in inner table

Cost of page-oriented nested loops join is:

500 + 500 * 1000 = 500,500

And the “on-the-fly” operations have no I/O - so plan cost is 500,500

Reserves Sailors

sid=sid

bid=100 rating > 5

sname

(On-the-fly)

(On-the-fly)

Plan:

(Page-Oriented Nested Loops Join)

Page 78: Converting ER to Relational Schema

Cost of plan 3 Index nested loops

SELECT S.sname

FROM Reserves R, Sailors S

WHERE R.sid = S.sid AND

R.bid = 100 AND S.rating > 5;

What is the cost of the plan shown?

M + M * pr * (# of index I/Os + 1)

1000 + (1000*100*(3+1)) if 3 index I/Os or

1000 + (1000 * 100 * (0 + 1)) if 0 index I/Ox

Thus …

1000 + 300,000 = 301,000 or

1000 + 100,000 = 101,000

Sailors Reserves

sid=sid

bid=100 rating > 5

sname

(On-the-fly)

(On-the-fly)

Plan:

(Index nested loops)

Page 79: Converting ER to Relational Schema

Cost of plan 4 Push down selects

Reserves Sailors

sid=sid

sname (On-the-fly)

Plan:

SELECT S.sname

FROM Reserves R, Sailors S

WHERE R.sid = S.sid AND

R.bid = 100 AND S.rating > 5;

(Page-Oriented Nested Loops Join)

Apply this equivalence:

c(R⋈S) c(R)⋈S

To the previous query tree to get an equivalent query tree.

What is the cost of the plan shown?

Scanning sailors and reserves cost M+N I/Os. But here we read intermediate files.

What about the cost of the join? It depends on how many reservations there are for boat 100 and how many sailors have a rating >5.

How would you find this information?

Statistics will help.

rating > 5 bid=100

(On-the-fly)

(Table scan) (Table scan)

(On-the-fly)

Page 80: Converting ER to Relational Schema

80

To estimate cost, we need table sizes

For all operators beyond the leaf level of the query plan, the input tables are the result of some earlier query.

Thus, we need to estimate the size of intermediate results! (This can be difficult. This is one reason why the cost estimates may not be very good. Estimation errors tend to compound.)

For example, what information would you need to estimate how many reservations are for bid 100 and how many sailors have a rating >5?

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 81: Converting ER to Relational Schema

81

DBMS Usually Maintains Some Statistics in the DB Catalog

Catalogs typically contain at least: # tuples and # pages for each table.

# distinct key values and # pages for each index.

Index height, low/high key values for each tree index.

Catalogs are updated periodically - say, once a week or once a month. Perhaps they’re updated during the backup.

Simplest case: assume that all attribute values are uniformly distributed. Thus if gender was an attribute, the optimizer would

assume that half of the rows have the male value and other half have the female value. (This might be grossly inaccurate.)

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 82: Converting ER to Relational Schema

82

Calculating Selectivities

Assume that rating values range from 1 to 10, and that bid values range from 1 to 100.

What percentage of the incoming tuples, to the operator

bid=100 , will be output?

What about rating > 5 ?

bid=100 rating > 5 ?

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 83: Converting ER to Relational Schema

83

Doing better than a uniform distribution

The DBMS might gather more detailed information about how the values of attributes are distributed (e.g., histograms of the values in a field) and store it in the catalog. Suppose there was an attribute degree-program with three possible values: “BS CS” “MS CS” “PhD CS” Then the DBMS might count the values and know that there are 428

“BS CS” values, 98 “MS CS” values and 25 “PhD CS” values.

This allows much better estimate of the reduction factor.

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 84: Converting ER to Relational Schema

84

Independence of Reduction Factors So far, we have gathered statistics and shown how to use them,

assuming uniform distributions.

But what about a select operator with terms separated by AND? Example: bid=100 rating > 5

We assume that all terms are independent!

Thus, if one attribute is class and the other is number-of-hours - the query optimizer might assume that class is uniformly distributed over {Fresh, Soph, Jun, Sen} and that number-of-hours is uniformly distributed over {0, 1, …, 205} But, we know that class correlates with number-of-hours!

Might even be that number-of-hours class.

What percentage of the incoming tuples, to the operator bid=100 rating > 5 , will be output?

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 85: Converting ER to Relational Schema

85

Enumerating Plans for Multiple Joins

Back to the problem of generating plans.

Are we trying to generate as many plans as possible? No, best to generate few, as long as cheap plans are among them.

In System R: only left-deep join trees are considered.

B A

C

D

B A

C

D

C D B A

This one is left-deep - the other two are not.

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 86: Converting ER to Relational Schema

86

Queries Over Multiple Relations (Joins)

Left-deep trees allow us to generate all fully pipelined plans.

Intermediate results not written to temporary files.

Not all left-deep trees are fully pipelined (e.g., SM join).

Using only left-deep plans (obviously) restricts the search space. (So optimizer may not find the optimal plan.)

B A

C

D

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 87: Converting ER to Relational Schema

Nested Queries

Nested block is optimized independently, with the outer tuple considered as providing a selection condition.

Outer block is optimized with the cost of ‘calling’ the nested block computation taken into account.

Implicit ordering of these blocks means that some good strategies are not considered.

The non-nested version of the query is typically optimized better. The optimizer might not find it from the nested version, so you may need to explicitly unnest the query.

SELECT S.sname

FROM Sailors S WHERE EXISTS

(SELECT * FROM Reserves R

WHERE R.bid=103

AND R.sid=S.sid)

Nested block to optimize:

SELECT *

FROM Reserves R

WHERE R.bid=103

AND S.sid= outer value

Equivalent non-nested query:

SELECT S.sname

FROM Sailors S, Reserves R

WHERE S.sid=R.sid

AND R.bid=103

Page 88: Converting ER to Relational Schema

88

Summary – Algorithms for Relational Algebra Operators

A virtue of relational DBMSs: queries are composed of a few basic operators; the implementation of these operators can be carefully tuned (and it is important to do this!).

Many alternative implementation techniques for each operator; no universally superior technique for most operators.

Must consider available alternatives for each operation in a query and choose best one based on system statistics, etc. This is part of the broader task of optimizing a query composed of several ops.

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 89: Converting ER to Relational Schema

89

Query Optimizers Don’t (Always) Find the Best Plan

There are usually more plans than the optimizer can consider, even if only left deep plans are considered.

The optimizer might not even try to generate all possible plans (it won’t be able to consider all of them anyway).

Sometimes the optimizer will compare the optimization cost to the estimated execution cost and quit early.

The optimizer chooses the plan with the lowest ESTIMATED cost. Actual costs may differ.

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 90: Converting ER to Relational Schema

90

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

Page 91: Converting ER to Relational Schema

91

Physical DB Design

Now that we know how query optimizers work…

How do we choose the file organizations and indices?

How do we decide whether to modify the schema

for our database?

This is called “physical design” of a database.

This is sort of like query optimization - backwards.

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

91

Page 92: Converting ER to Relational Schema

92

Some Important Physical Database Design Issues

Mapping tables to physical storage/file types. (We won’t talk about this part because we haven’t discussed physical storage in a DBMS.)

Choosing the indexes

Modifying the conceptual schema

Denormalization

Horizontal/vertical decomposition

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

92

Page 93: Converting ER to Relational Schema

93

We Need to Understand the Workload to Choose Indices

For each query in the workload: Which relations does it access?

Which attributes are retrieved?

Which attributes are involved in selection/join conditions? How selective are these conditions likely to be? (That is, how many matching records will there be?

For each update in the workload: Which attributes are involved in selection/join conditions?

How selective are these conditions likely to be?

What type of updates (i.e., INSERT/DELETE/UPDATE) are prominent, and which tables and attributes are affected?

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

93

Page 94: Converting ER to Relational Schema

94

Tips for index selection

Don’t use indexes on small tables (< 200 rows)

Don’t use indexes on columns with few values (T/F, gender, state)

For most systems, indexes on primary keys and foreign keys are sufficient

Don’t forget to add indexes when the schema changes!

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

94

Page 95: Converting ER to Relational Schema

Vertical decomposition: split one table into two – choose which attributes go into each table

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 95

Course c# cname instructor room days

Course2 c# cname

Course1 c# room days

Course3 c# instructor

Take this table:

And replace it with these three tables:

Advantage: if queries use only a subset of the attributes, smaller

rows (fewer attributes) means faster access.

Disadvantage: if queries need all attributes, then you have to join.

Page 96: Converting ER to Relational Schema

Horizontal partition/decomposition (split data into two identical tables – with two different names)

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012 96

Course c# cname instructor room days

Undergraduate-Course c# cname instructor room days

Graduate-Course c# cname instructor room days

Page 97: Converting ER to Relational Schema

97

Use a view to hide changes

A query can use any of the three.

Updates will need to know which table to use.

CS386/586 Introduction to Database Systems, ©Lois Delcambre, David Maier 1999-2012

97

CREATE VIEW Courses(cid, sid, jid, did, pid, qty, val) AS SELECT * FROM Undergraduate-courses UNION

SELECT * FROM Graduate-courses

Undergraduate-Course (c#, cname, instructor, room, days) Graduate-Course (c#, cname, instructor, room, days)

Page 98: Converting ER to Relational Schema

98

CS386/586 Introduction to Database Systems, © Lois Delcambre 1999-2010 Slide 27 Some slides

adapted from R. Ramakrishnan, with permission Lecture 4

Views

A view is a query, with a name, that is stored in the database.

Example view definition:

CREATE VIEW gstudents AS SELECT S.* FROM student S WHERE s.gpa >= 2.5

Views can be used like base tables, in a query or in any other view. Views are like macros.

Page 99: Converting ER to Relational Schema

99

CS386/586 Introduction to Database Systems, © Lois Delcambre 1999-2010 Slide 27 Some slides

adapted from R. Ramakrishnan, with permission Lecture 4

Views can be used to make queries simpler

Completed(StudID, Course)

CREATE VIEW gstudents AS SELECT S.* FROM student S WHERE s.gpa >= 2.5

Queries about “good” students only are easier to write

SELECT S.name, S.phone

FROM gstudent S NATURAL JOIN completed C

WHERE C.course = ‘CS386’;

Page 100: Converting ER to Relational Schema

100

How are views implemented?

When you enter a query that mentions a view in the from clause, the DBMS expands/rewrites your query to include the view definition.

SELECT S.name, S.phone

FROM gstudent S NATURAL JOIN completed C

WHERE C.course = ‘CS386’;

is rewritten as:

SELECT S.name, S.phone

FROM (SELECT S.* FROM student S WHERE s.gpa >= 2.5) AS S NATURAL JOIN completed C

WHERE C.course = ‘CS386’;

CS386/586 Introduction to Database Systems, © Lois Delcambre 1999-2010 Slide 27

Some slides adapted from R. Ramakrishnan, with permission Lecture 4

Page 101: Converting ER to Relational Schema

101

CS386/586 Introduction to Database Systems, © Lois Delcambre 1999-2010 Slide 27 Some slides

adapted from R. Ramakrishnan, with permission Lecture 4

Views used for Security

This views presents a table (called sstudent) that is the student relation without the gpa field.

CREATE VIEW sstudent AS SELECT studID, name, address FROM student

Can you think of some other security examples?

Page 102: Converting ER to Relational Schema

102

CS386/586 Introduction to Database Systems, © Lois Delcambre 1999-2010 Slide 27 Some slides

adapted from R. Ramakrishnan, with permission Lecture 4

Views used to support extensibility

A company’s database includes a relation: Part (PartID: Char(4), weight: real,…)

Weight is stored in pounds

Company is purchased by a firm that uses metric weights

Databases must be integrated and the weight must be in metric units

But there’s much old software using pounds.

Solution: views!

Page 103: Converting ER to Relational Schema

103

CS386/586 Introduction to Database Systems, © Lois Delcambre 1999-2010 Slide 27 Some slides

adapted from R. Ramakrishnan, with permission Lecture 4

Views to support extensibility (cont.)

Solution:

1. Define a new base table with kilograms, NewPart, for the integrated company.

2. CREATE VIEW Part AS SELECT PartID, 2.2046*weight, …(no other changes)… FROM NewPart

3. Old programs still call the table “Part”; now they are referencing a view; before they references the table.

Page 104: Converting ER to Relational Schema

104

CS386/586 Introduction to Database Systems, © Lois Delcambre 1999-2010 Slide 27 Some slides

adapted from R. Ramakrishnan, with permission Lecture 4

Data Independence

A database management system supports data independence if application programs are immune to changes in the conceptual and physical schemas.

Why is this important? Because everything can change.

How does the relational model achieve logical (conceptual) data independence?

Page 105: Converting ER to Relational Schema

105

CS386/586 Introduction to Database Systems, © Lois Delcambre 1999-2010 Slide 27 Some slides

adapted from R. Ramakrishnan, with permission Lecture 4

Levels of Abstraction – support Data Independence

Physical Schema

Conceptual Schema

ES 1 ES 2 ES 3

Physical storage; DBA

Logical storage; data

designer

External view; user and

data designer

Logical data indepence

Physical data indepence

Page 106: Converting ER to Relational Schema

106

CS386/586 Introduction to Database Systems, © Lois Delcambre 1999-2010 Slide 27 Some slides

adapted from R. Ramakrishnan, with permission Lecture 4

Levels of Abstraction in a relational DBMS

Physical Schema

Conceptual Schema

ES 1 ES 2 ES 3

Physical schema: the

tables – after they are modified

by the DBA, plus indexes, files,

etc.

Conceptual schema: ERDs

Logical schema: tables

External schemas are Views

Page 107: Converting ER to Relational Schema

107

CS386/586 Introduction to Database Systems, © Lois Delcambre 1999-2010 Slide 27 Some slides

adapted from R. Ramakrishnan, with permission Lecture 4

A few Comments on Views

View versus temporary table

What is the difference?

Which is better?

Some DBMS’ offer a way around the view update problem

Triggers

Rules

Active research

Page 108: Converting ER to Relational Schema

108

CS386/586 Introduction to Database Systems, © Lois Delcambre 1999-2010 Slide 27 Some slides

adapted from R. Ramakrishnan, with permission Lecture 4

Summary

A view is a stored query definition; rows are not computed until they are needed.

Views can be very useful

Easier query writing, security, extensibility

Views cannot always be unambiguously updated

Three levels of abstraction in a DBMS supports data independence: logical and physical