SQL Unit 18: Data Management: Databases and Organizations
Richard Watson

Summary of Selections from Chapters 9, 10 prepared by Kirk Scott

Chapter 9, The Relational Model and Relational Algebra

• Generally speaking, the contents of this chapter should not be too difficult

• The idea is that most of the information has been introduced inductively in the foregoing sections

• This chapter puts some of the earlier information into context and sums up the idea of relational databases

Background

• Databases existed before the development of the relational model

• They were based on networks or hierarchies.

• In other words, their implementation was based on linked data structures.

• These kinds of databases were not easy to understand or code.

• Note how everything is cyclical: O-O databases are basically modern hierarchical databases.

• The general idea of storing data in tables was an obvious alternative from the beginning.

• However, it was unclear whether the relational data model was a practical alternative for large data sets.

• The main apparent problem was performance.

• Linked code can run quickly.

• Performing joins, for example, by traversing two tables is not very efficient.

• More basic questions included, “What is a database?” and, “What should the user interface be like?”

• E. F. Codd is recognized as the main figure in the development of the relational model as a practical alternative to existing dbms’s.

• He and others addressed all of these questions.

• Normalization was one of the results of this work.

• It illustrates the value of theory.

• Garden variety idiots think they understand tables.

• “What is there not to understand?” they think.

• Without a theoretical understanding there was no clarity about tables or how to use them.

• The two remaining areas of development were related.

• Efficient algorithms for performing relational operations were necessary.

• Would the user be exposed to the implementations of the operations?

• Or would the user be given a different language as an interface?

• Codd made the following observations about existing systems:

• 1. They forced programmers to write low level code

• This meant that queries were more difficult to write, took longer to write, and typically required debugging because they were error prone.

• 2. No commands were available for processing multiple records at a time.

• Existing systems used procedural algorithms which used loops to traverse linked data structures.

• Linked list traversal can be efficient, but the code can be difficult to write

• By definition, inside the loop one record at a time was accessed.

• The relational model was inherently set-based.

• It would be desirable to give the user a set-based interface.

• That would require the implementation of set level commands in the db internals.

• Efficient implementations would be needed before the relational model could be adopted.

• 3. The existing systems were not amenable to ad hoc querying.

• Trained programmers are needed in order to write procedural code.

• SQL is simple enough that an end user can learn it (maybe).

• Also, the development time for an SQL query is short enough that it becomes practical to write one-time queries, not suites of programs.

• Observations about existing systems and the contrasts with the relational model led Codd to these three goals for a database management system:

• 1. Data independence

• 2. Communicability

• 3. Set processing

• 1. Data independence

• The users of databases should not have to worry about how the data is physically stored.

• They should be free to envision the data simply as a collection of related tables, regardless of the physical implementation.

• Any physical level questions would be at the operating system or database administrator level.

• 2. Communicability

• The basic idea here is that the relational model, based on tables, records, keys, and values, is relatively easily understood by both users and programmers, making it easier for clients and developers to work together.

• This is in marked contrast to earlier database models.

• 3. Set processing

• This is basically just a repetition of information given above.

• The beauty of the relational model is that it allows queries to be non-procedural and still supports the retrieval of multiple records.

• The model is “tell what you want” rather than “tell how to get it”.
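
The contrast between record-at-a-time processing and set processing can be sketched as follows. This is only an illustration, not part of the original notes: it assumes Python's built-in sqlite3 module as a stand-in dbms and a small hypothetical table named TableX.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableX (xid TEXT PRIMARY KEY NOT NULL, xone TEXT)")
conn.executemany("INSERT INTO TableX VALUES (?, ?)",
                 [("a", "g"), ("b", "h"), ("c", "i"), ("d", "i")])

# "Tell how to get it": visit the records one at a time inside a loop.
procedural = []
for xid, xone in conn.execute("SELECT xid, xone FROM TableX"):
    if xone == "i":
        procedural.append(xid)
procedural.sort()

# "Tell what you want": state the condition and let the dbms find the set of rows.
declarative = [row[0] for row in conn.execute(
    "SELECT xid FROM TableX WHERE xone = 'i' ORDER BY xid")]

print(procedural, declarative)
```

Both approaches return the same rows; the difference is that in the second one the looping is left to the dbms.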

The Major Components of the Relational Model

• The relational data model has three major components:

• Data structures

• Integrity rules

• Operators used to retrieve, derive, or modify data

Data Structures

• The following terms summarize the data structures that the relational model is based on:

• Domains (fields)

• Relations (collections of fields)

• Primary key

• Candidate key = Alternate key

• Foreign key

• Relational database (relations in a primary to foreign key relationship)

Integrity Rules

• These are the integrity rules of the relational model:

• Entity integrity
  – The primary key is unique and not null

• Referential integrity
  – Every foreign key value has to have a matching primary key value
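
A minimal sketch of how a dbms enforces the two integrity rules is given below. It assumes SQLite through Python's sqlite3 module, and the tables Parent and Child are hypothetical names made up for the illustration. Two SQLite-specific details are worth flagging: foreign keys are only enforced after `PRAGMA foreign_keys = ON`, and a non-integer primary key has to be declared NOT NULL explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute("CREATE TABLE Parent (pid TEXT PRIMARY KEY NOT NULL)")
conn.execute("CREATE TABLE Child (cid TEXT PRIMARY KEY NOT NULL, "
             "pid TEXT REFERENCES Parent(pid))")
conn.execute("INSERT INTO Parent VALUES ('p1')")
conn.execute("INSERT INTO Child VALUES ('c1', 'p1')")

def rejected(sql):
    """Return True if the dbms refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_pk = rejected("INSERT INTO Parent VALUES ('p1')")         # entity integrity
null_pk = rejected("INSERT INTO Parent VALUES (NULL)")              # entity integrity
orphan_fk = rejected("INSERT INTO Child VALUES ('c2', 'missing')")  # referential integrity

print(duplicate_pk, null_pk, orphan_fk)
```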

Operators = Manipulation Languages

• A complete dbms has to support two kinds of functionality

• The two kinds of functionality may be together in one language or they may be implemented in separate forms:

• DDL = data definition language = defining the database tables

• DML = data manipulation language = inserting, updating, and deleting data

• There are essentially four language or manipulation options when it comes to relational databases:

• Relational calculus

• Relational algebra

• SQL

• QBE

Relational Calculus

• Relational calculus is based on the mathematical underpinnings of the relational model

• It has never been implemented as a language in a widely accepted dbms product

• Relational calculus will not be pursued at all

Relational Algebra

• Relational algebra also emphasizes the mathematical underpinnings of the relational model

• Quel, the query language of Ingres, was based on relational algebra

• In the marketplace, it has largely been superseded by SQL

• Relational algebra will be pursued for two reasons:

• It provides a useful vocabulary for talking about queries

• Even without delving into the theory, it is possible to make some useful observations about the necessary contents of a query language based on relational algebra concepts

• Relational algebra is fundamentally based on 8 operations:

• 1. Restrict (select): This picks a subset of rows from a table

• 2. Project: This picks a subset of columns from a table

• 3. Product: This forms all possible pairings of the rows of two tables

• 4. Union: This forms a vertical combination of the rows of two tables

• 5. Intersect: This finds the rows that appear in both of two tables

• 6. Difference: This finds the rows that appear in one table but not another

• 7. Join: This finds a subset of the rows of a product, typically where corresponding field values match

• 8. Divide:

• Relational divide is not as simple as the other concepts and has not been fully explained yet.

• For the sake of completeness it will be explained in the following overheads.

• After explaining division, the discussion will return to relational concepts in general.
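
The first seven operations can each be sketched with a one-line SQL query. The example below is an illustration only: it assumes SQLite through Python's sqlite3 module, and the tables A and B (with columns k and v) are hypothetical, chosen so that union, intersect, and difference are well defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (k TEXT, v TEXT);
    CREATE TABLE B (k TEXT, v TEXT);
    INSERT INTO A VALUES ('a', 'g'), ('b', 'h'), ('c', 'i');
    INSERT INTO B VALUES ('b', 'h'), ('c', 'i'), ('d', 'j');
""")

def q(sql):
    """Run a query and return its rows as a sorted list of tuples."""
    return sorted(conn.execute(sql).fetchall())

restrict = q("SELECT * FROM A WHERE v = 'h'")                    # subset of rows
project = q("SELECT DISTINCT k FROM A")                          # subset of columns
product = q("SELECT * FROM A CROSS JOIN B")                      # all pairings of rows
union = q("SELECT * FROM A UNION SELECT * FROM B")               # rows in either table
intersect = q("SELECT * FROM A INTERSECT SELECT * FROM B")       # rows in both tables
difference = q("SELECT * FROM A EXCEPT SELECT * FROM B")         # rows in A but not B
join = q("SELECT A.k, A.v, B.v FROM A JOIN B ON A.k = B.k")      # matching rows of the product

print(restrict, project, len(product), union, intersect, difference, join)
```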

Relational Division

• Divide should be (and is) the converse of product.

• Division was mentioned in passing in the unit on SQL querying that covered double NOT EXISTS.

• In the context of the products of relations, the logical concept of FOR ALL is closely linked to the concept of division.

• Practically speaking, division will be accomplished using double NOT EXISTS.

Relational Division Example

• The plan for this section is to explain relational division with the help of a few examples.

• These examples are actually the last four questions on the assignment for this unit.

• (Note that the current offering of the course may not include this assignment for credit.)

• The answers to these questions will be given here as part of the explanation.

• If you do the assignment, your goal should not be to copy the answers given.

• Instead, after having read the explanatory material, hopefully enough of it will stick in your memory that you can come up with the correct answer on your own.

• If not, you can refer back to the explanations again.

• TableX, TableY, and TableZ are given for the questions/examples.

• TableY is the table in the middle in an m-n relationship between TableX and TableZ

• TableY contains a subset of the Cartesian product of the pk (id) fields of TableX and TableZ

• The tables are shown on the following overhead.

TableX

xid   xone
a     g
b     h
c     i
d     i

TableY

xid   zid
a     r
a     s
a     t
b     r
b     s
c     r
d     r
d     s
d     t

TableZ

zid   zone
r     l
s     m
t     n

• Two tables are needed in order to do division.

• In this example we are interested in the quotient of TableY and TableZ.

• In other words, we’re interested in finding TableY DIVIDED BY TableZ.

• TableX is included in the example in order to help visualize the relationship between TableY and TableZ.

• Dividing TableY by TableZ won’t yield TableX.

• It will yield a subset of TableX.

• This is because TableY isn’t the full Cartesian product of TableX and TableZ.

• If TableY were the full Cartesian product of TableX and TableZ, then TableY DIVIDED BY TableZ would give TableX.

• Part of the goal of this discussion is to show how relational division can be accomplished in SQL.

• Division in SQL is done by means of double NOT EXISTS queries

• The familiar structure of such queries consists of double nesting with three tables

• Therefore, it’s convenient to have TableX available along with TableY and TableZ

• Now consider TableY and TableZ.

• The first column of TableY is the field xid.

• The second column of TableY is the field zid.

• TableY and TableZ have the field zid in common.

• The second field of TableZ, field zone, does not play a role in the division.

• The division of the two tables is based on the common field, zid.

• The result of the division will be in terms of the first field in TableY, xid.

• The definition of relational division can be explained using these two tables as an example.

• The verbal expression of what TableY DIVIDED BY TableZ is supposed to produce as a result is this:

• It should find all of those values of xid, the first field in TableY, where those values of xid are matched with every value of zid, the common field, that appears in TableZ.

• The verbal expression can be restated in this way:

• The division of the two tables should find those values of xid in TableY that are in a Cartesian product with the values of zid in TableZ.

• The division operation will not include in the results any values of xid in TableY that are not matched with every value of zid in TableZ.

• TableY divided by TableZ on the fields TableY.zid and TableZ.zid, respectively, gives a one column result table containing xid values taken from TableY.

• TableX, TableY, and TableZ are repeated on the next overhead.

• The result of dividing TableY by TableZ is shown on the overhead following that one.

TableX

xid   xone
a     g
b     h
c     i
d     i

TableY

xid   zid
a     r
a     s
a     t
b     r
b     s
c     r
d     r
d     s
d     t

TableZ

zid   zone
r     l
s     m
t     n

TableY DIVIDED BY TableZ

xid
a
d

Another Example

• In order to help the idea stick, another example is explained here verbally without completely illustrating it with tables.

• Suppose some TableR was the full Cartesian product of the xid values in TableX and the zid values in TableZ.

• What would the result be of dividing TableR by TableZ on their common field zid?

• Except for the fact that it's stated verbally rather than completely illustrated, this question is easier than the first one.

• In this example TableR replaces TableY.

• If TableR is the Cartesian product of TableX.xid and TableZ.zid, then every xid value in TableR will be in the result of TableR divided by TableZ.

• In other words, the actual results of the division would be the table shown on the next overhead.

TableR DIVIDED BY TableZ

xid
a
b
c
d
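
The verbal definition of division can be checked directly against both examples. The following is a sketch in plain Python, not the SQL technique these notes build toward; the function name `divide` and the variable names are made up for the illustration, and the data is copied from the tables above.

```python
# (xid, zid) pairs from TableY and the zid values from TableZ.
table_y = {("a", "r"), ("a", "s"), ("a", "t"), ("b", "r"), ("b", "s"),
           ("c", "r"), ("d", "r"), ("d", "s"), ("d", "t")}
x_ids = ["a", "b", "c", "d"]
z_ids = ["r", "s", "t"]

# TableR: the full Cartesian product of the xid values and the zid values.
table_r = {(x, z) for x in x_ids for z in z_ids}

def divide(dividend, divisor):
    """Return the xid values that are paired in `dividend` with every value in `divisor`."""
    return sorted({x for x, _ in dividend
                   if all((x, z) in dividend for z in divisor)})

y_divided_by_z = divide(table_y, z_ids)
r_divided_by_z = divide(table_r, z_ids)

print(y_divided_by_z)
print(r_divided_by_z)
```

Only a and d are paired with all of r, s, and t in TableY, while every xid qualifies in the full product TableR.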

Relational Division Using SQL

• From a mathematical point of view, relational division is a binary operation.

• Using SQL syntax, relational division can be accomplished with double NOT EXISTS.

• Double NOT EXISTS on three different tables is easier to keep track of than double NOT EXISTS on two tables, where one table appears once and the other table appears twice in the query.

• That’s why TableX is included in the example.

• Relationally, the operation of interest is TableY divided by TableZ

• Operationally, this means finding those values of TableX.xid which are paired in TableY with all of the values of TableZ.zid

• In other words, find those TableX.xid values where there does not exist a TableZ.zid value that it's not matched with in TableY.

• The desired results can be phrased using universal quantification, all, or double negation.

• This is the indication that in SQL the desired result can be obtained with a double NOT EXISTS query.

• If this query is written correctly, the result set of TableX.xid values will equal the set of TableY.xid values that would result from dividing TableY by TableZ on the fields TableY.zid and TableZ.zid, respectively.

• The desired query is shown on the next overhead.

SELECT xid
FROM TableX
WHERE NOT EXISTS
  (SELECT *
   FROM TableZ
   WHERE NOT EXISTS
     (SELECT *
      FROM TableY
      WHERE TableX.xid = TableY.xid
      AND TableY.zid = TableZ.zid));
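
One way to try this query out is sketched below. It assumes SQLite through Python's sqlite3 module as the dbms; the column types and key declarations are assumptions added for the sketch, and an ORDER BY has been appended only so that the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableX (xid TEXT PRIMARY KEY NOT NULL, xone TEXT);
    CREATE TABLE TableZ (zid TEXT PRIMARY KEY NOT NULL, zone TEXT);
    CREATE TABLE TableY (xid TEXT NOT NULL, zid TEXT NOT NULL,
                         PRIMARY KEY (xid, zid));
    INSERT INTO TableX VALUES ('a','g'), ('b','h'), ('c','i'), ('d','i');
    INSERT INTO TableZ VALUES ('r','l'), ('s','m'), ('t','n');
    INSERT INTO TableY VALUES ('a','r'), ('a','s'), ('a','t'), ('b','r'), ('b','s'),
                              ('c','r'), ('d','r'), ('d','s'), ('d','t');
""")

# The double NOT EXISTS query: xid values for which no zid is left unmatched.
division_sql = """
    SELECT xid
    FROM TableX
    WHERE NOT EXISTS
      (SELECT *
       FROM TableZ
       WHERE NOT EXISTS
         (SELECT *
          FROM TableY
          WHERE TableX.xid = TableY.xid
          AND TableY.zid = TableZ.zid))
    ORDER BY xid
"""
quotient = [row[0] for row in conn.execute(division_sql)]
print(quotient)
```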

• Phrased informally, as was done in the unit that covered the double not exists queries, this query asks for those values of xid in TableX where there is not a zid value in TableZ that it's not matched with, through the table in the middle, TableY.

• Notice that this query follows the pattern for double NOT EXISTS queries

• The first query opens the left base table.

• The second query opens the right base table.

• The third query opens the table in the middle.

• For reasons of scope, both of the joining conditions are in the third query.

• It is also possible to write such a query using just the two tables that are involved in the division.

• When the double NOT EXISTS query was considered, an example was given where all of the relevant fields were in the table in the middle, and that table could be opened three times with aliases in order to achieve the desired results.

• In the division example the table in the middle, TableY, is both the thing that is being divided (the dividend) and the thing that has the result field in it (the quotient).

• TableZ is the thing you're dividing by (the divisor).

• Using the terms for division as the aliases, TableY can be substituted for TableX in the previous example.

• This is possible because the result field of interest is xid, which is in TableY as well as TableX.

• The desired query is shown on the next overhead.

SELECT DISTINCT xid
FROM TableY AS Quotient
WHERE NOT EXISTS
  (SELECT *
   FROM TableZ AS Divisor
   WHERE NOT EXISTS
     (SELECT *
      FROM TableY AS Dividend
      WHERE Quotient.xid = Dividend.xid
      AND Dividend.zid = Divisor.zid));
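
The two-table version can be tried the same way. This sketch again assumes SQLite through Python's sqlite3 module, with column types chosen for the illustration. It also runs the query without DISTINCT, to show why the keyword is needed once the outer query opens TableY rather than TableX: each qualifying xid appears in TableY once per matching zid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableZ (zid TEXT PRIMARY KEY NOT NULL, zone TEXT);
    CREATE TABLE TableY (xid TEXT NOT NULL, zid TEXT NOT NULL,
                         PRIMARY KEY (xid, zid));
    INSERT INTO TableZ VALUES ('r','l'), ('s','m'), ('t','n');
    INSERT INTO TableY VALUES ('a','r'), ('a','s'), ('a','t'), ('b','r'), ('b','s'),
                              ('c','r'), ('d','r'), ('d','s'), ('d','t');
""")

two_table_sql = """
    SELECT DISTINCT xid
    FROM TableY AS Quotient
    WHERE NOT EXISTS
      (SELECT *
       FROM TableZ AS Divisor
       WHERE NOT EXISTS
         (SELECT *
          FROM TableY AS Dividend
          WHERE Quotient.xid = Dividend.xid
          AND Dividend.zid = Divisor.zid))
    ORDER BY xid
"""
with_distinct = [row[0] for row in conn.execute(two_table_sql)]

# Same query without DISTINCT: one output row per qualifying row of TableY.
without_distinct = [row[0] for row in conn.execute(
    two_table_sql.replace("SELECT DISTINCT xid", "SELECT xid", 1))]

print(with_distinct)
print(without_distinct)
```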

• What does the division operation have to do with the Cartesian product?

• Are division and product complementary or inverse operations in a relational system?

• If TableY were the full Cartesian product of the xid from TableX and the zid from TableZ, then TableY divided by TableZ would return all of the xid values in TableX.

• Yes, they’re relational inverses.

• The special case of the first example that was used to illustrate division is actually the more common case.

• TableY is not a full Cartesian product of TableX and TableZ.

• Only some of the values of xid have been matched with all of the values of zid.

• Division is also defined in this case, as explained above.

• In essence, division finds the inverse for any values that could or would have been the result of a Cartesian product.

• Relational division ignores those values that did not participate in a full Cartesian product.

• As you may already have noted, relational algebra is not the same as arithmetic algebra.

• If it were, we would be working with numbers, not relations.

• It seems that in the special case, which is the common case, relational division is not a full inverse.

• However, there is another way of viewing this.

• When doing integer division, there is a remainder.

• In a sense, when doing relational division there is also a remainder.

• Those values of xid in TableY which did not participate in a full Cartesian product are left over.

• Those values are in some sense the remainder upon relational division.

• For those interested in things mathematical and logical, it is interesting that the SQL syntax for implementing relational division is the same as the syntax for implementing the logical quantifier FOR ALL.

• Pursuing an explanation of this aspect of the situation is beyond the scope of these notes.

Relational Algebra

• This, then, is the full list of the eight relational algebra operations:

• Restrict (Select)

• Project

• Product

• Union

• Intersect

• Difference

• Join

• Divide

A Primitive Set of Relational Operators

• The truth is that there are only five basic relational operations:

• Restrict

• Project

• Product

• Union

• Difference

• The five basic operations are basic for the following reason:

• They cannot be defined in terms of any of the other basic operations

• Put another way, the effects they achieve cannot be achieved using any other combination of basic operations

• However, the remaining 3 operations can be defined in terms of the basic 5.

• The assertion that these five operations are basic will not be demonstrated.

• However, for those who are interested in the question, the following can be noted:

• The five basic operations can be viewed as corresponding to basic operations in a simple algebraic system.

• To a mathematician, the “basicness” of the operations would not be in doubt.

• The three non-basic operations are join, intersection, and division.

• Showing that these three can be defined in terms of the other five will be pursued.

Defining the Join

• The join can be defined in terms of the Cartesian product, selection, and projection

• First, form the Cartesian product of two tables.

• Then do a selection (restriction) which applies the joining condition to the two corresponding fields which are internal to the product table.

• Then do a projection to obtain only those fields that you want.
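
The three steps can be compared with a built-in join in a short sketch. It assumes SQLite through Python's sqlite3 module and reuses TableX and TableY from the division example; the column types and the helper function `q` are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableX (xid TEXT PRIMARY KEY NOT NULL, xone TEXT);
    CREATE TABLE TableY (xid TEXT NOT NULL, zid TEXT NOT NULL,
                         PRIMARY KEY (xid, zid));
    INSERT INTO TableX VALUES ('a','g'), ('b','h'), ('c','i'), ('d','i');
    INSERT INTO TableY VALUES ('a','r'), ('a','s'), ('a','t'), ('b','r'), ('b','s'),
                              ('c','r'), ('d','r'), ('d','s'), ('d','t');
""")

def q(sql):
    """Run a query and return its rows as a sorted list of tuples."""
    return sorted(conn.execute(sql).fetchall())

# Product (CROSS JOIN), then restriction (WHERE), then projection (SELECT list).
product_restrict_project = q("""
    SELECT TableX.xid, TableX.xone, TableY.zid
    FROM TableX CROSS JOIN TableY
    WHERE TableX.xid = TableY.xid
""")

# The same result requested with join syntax.
join = q("""
    SELECT TableX.xid, TableX.xone, TableY.zid
    FROM TableX JOIN TableY ON TableX.xid = TableY.xid
""")

product_size = len(q("SELECT * FROM TableX CROSS JOIN TableY"))
print(product_size, len(join), product_restrict_project == join)
```

The full product has 36 rows, and the restriction keeps the 9 whose xid values match.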

Defining Intersection

• The intersection can be defined in terms of the union and set differences

• This is illustrated on the next overhead with the help of some Venn diagrams.

• A intersect B = (A union B) – (A – B) – (B – A)

[Venn diagram: two overlapping circles A and B, with the non-overlapping regions labeled A – B and B – A]
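
The identity can be spot-checked with Python's built-in set type, treating each relation as a set of rows. The values in A and B are arbitrary examples chosen for the sketch, and one example does not prove the identity in general.

```python
# Two hypothetical single-column relations, written as sets of values.
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

# Intersection built only from union and difference.
via_basic_ops = (A | B) - (A - B) - (B - A)

# Python's own intersection operator, for comparison.
direct = A & B

print(via_basic_ops, direct)
```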

Defining Division

• Just as defining division was a bit messy, explaining why it isn’t a basic operation is also a bit messy.

• Let TableX, TableY, and TableZ again be given as a starting point for the discussion.

• TableY is a partial product, not necessarily a full Cartesian product, of TableX and TableZ

• We’re interested in dividing TableY by TableZ.

• We would like to find a sequence of basic relational algebra operations that will result in the same contents as TableY divided by TableZ.

• Let TableC = the Cartesian product of TableX and TableZ.

• Consider the difference TableC – TableY.

• Let some xid be in TableY.

• Consider such an xid that is matched with every zid in TableZ.

• When you find the difference, TableC – TableY, all occurrences of the xid value in the result of TableY divided by TableZ would be eliminated by the subtraction.

• In the result of the subtraction, no xid value would remain that was in TableY divided by TableZ

• TableY can hold xid values that don’t participate in the full Cartesian product

• These are the remainder xid values.

• Only some occurrences of the remainder values would be eliminated by the subtraction.

• In other words, in TableC – TableY, some remainder values in TableC, the full Cartesian product, would not be eliminated.

• Now do a projection on TableX on the xid column, giving a single column table, TableAllXid, containing all values of xid.

• Also do a projection on (TableC – TableY) on the xid column, giving a single column table, TableRemainders, containing all of the remainder values of xid.

• Then the result of the division would be TableAllXid – TableRemainders.

• In summary:

• TableC = TableX CARTESIAN PRODUCT TableZ

• TableRemainders = projection on xid(TableC – TableY)

• TableAllXid = projection on xid(TableX)

• TableY DIVIDED BY TableZ = TableAllXid – TableRemainders

• In short, division can be accomplished with a combination of a Cartesian product, two subtractions, and two projections

• Division is not a basic operation because it can be accomplished by a combination of basic operations.
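
The summary above can be traced with a small sketch. It assumes relations are modelled as Python sets of tuples, that TableX holds only an xid column and TableZ only a zid column, and that the sample values (x1, z1, and so on) are made up for illustration.

```python
# Relations modelled as sets of tuples (an assumption made for illustration).
table_x = {("x1",), ("x2",), ("x3",)}        # all xid values
table_z = {("z1",), ("z2",)}                 # all zid values
table_y = {
    ("x1", "z1"), ("x1", "z2"),              # x1 is matched with every zid
    ("x2", "z1"),                            # x2 is a remainder value
    ("x3", "z2"),                            # x3 is a remainder value
}


def project_xid(relation):
    """Projection on the first (xid) column; a set drops duplicates."""
    return {(row[0],) for row in relation}


# TableC = TableX CARTESIAN PRODUCT TableZ
table_c = {x + z for x in table_x for z in table_z}

# TableRemainders = projection on xid(TableC - TableY)
table_remainders = project_xid(table_c - table_y)

# TableAllXid = projection on xid(TableX)
table_all_xid = project_xid(table_x)

# TableY DIVIDED BY TableZ = TableAllXid - TableRemainders
quotient = table_all_xid - table_remainders
print(quotient)
```

Only x1 survives, because it is the only xid paired with every zid in TableY.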


Who Cares About the Primitive Operators?

• Some database management systems used query languages based directly on relational theory rather than SQL.

• The Quel language of Ingres is an example; strictly speaking, it is based on the relational calculus, which has the same expressive power as relational algebra.

• These languages have largely been supplanted by SQL.

• The point of the basic relational operators is that a system with a language that can accomplish what the five basic operators can accomplish is known as relationally complete.


• In other words, all data stored in the database is retrievable.

• All systems can be measured against this standard.

• Theoretically speaking, SQL is a bit of a syntactical mish-mash.

• Whether successful or not, the designers’ goal was to make it friendly to users, not necessarily theoretically beautiful.


• In any case, SQL is relationally complete.

• This is easily established by showing that it supports the five basic operations.

• 1. The WHERE clause implements restriction (selection).


• 2. The listing of the desired fields in a SELECT statement implements projection.

• 3. A join without a joining condition implements the Cartesian product.

• 4. SQL has a UNION operator, so it implements union.


• 5. Finally, relational subtraction is implemented through NOT EXISTS.

• Let relations A and B be given.

• Let A and B be union compatible.

• In other words, they have the same set of attributes.

• For the sake of illustration, let the attributes simply be named 1, 2, …, n.


• Then this SQL query would find A – B:
• SELECT *
• FROM A
• WHERE NOT EXISTS
• (SELECT *
• FROM B
• WHERE A.1 = B.1 AND A.2 = B.2 AND …
• AND A.n = B.n)


• In other words, find all of those records of A where there is no record in B that is exactly the same.

• Any record of A where there was a record in B that was exactly the same would be subtracted out.
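
The five operations can be tried out in a small sketch. It assumes Python's built-in sqlite3 module as the dbms, two made-up union compatible tables A and B, and the column names c1 and c2 standing in for attributes 1 and 2 (bare numbers are not valid column names).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (c1 INTEGER, c2 TEXT);
    CREATE TABLE B (c1 INTEGER, c2 TEXT);
    INSERT INTO A VALUES (1, 'p'), (2, 'q'), (3, 'r');
    INSERT INTO B VALUES (2, 'q'), (4, 's');
""")


def run(sql):
    """Run a query and return its rows in sorted order."""
    return sorted(conn.execute(sql).fetchall())


restriction = run("SELECT * FROM A WHERE c1 > 1")            # 1. WHERE
projection = run("SELECT c2 FROM A")                         # 2. field list
product = run("SELECT * FROM A, B")                          # 3. no join condition
union = run("SELECT * FROM A UNION SELECT * FROM B")         # 4. UNION
difference = run("""
    SELECT *
    FROM A
    WHERE NOT EXISTS
        (SELECT *
         FROM B
         WHERE A.c1 = B.c1 AND A.c2 = B.c2)
""")                                                         # 5. NOT EXISTS

# SQLite also offers EXCEPT; on this data it gives the same rows.
difference_except = run("SELECT * FROM A EXCEPT SELECT * FROM B")
print(difference, difference_except)
```

The record (2, 'q') appears in both tables, so it is the one subtracted out of A.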


• As you know, SQL also supports joining with separate syntax.

• This is part of what makes SQL a mish-mash, but in this instance, it certainly helps make SQL more user friendly.
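
As a hedged illustration of why the separate join syntax adds convenience rather than new power, the sketch below runs the same join two ways: as a Cartesian product followed by a restriction, and with JOIN … ON. The emp and dept tables and their contents are hypothetical, and sqlite3 stands in for the dbms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (deptid INTEGER PRIMARY KEY, deptname TEXT);
    CREATE TABLE emp (empid INTEGER PRIMARY KEY, empname TEXT, deptid INTEGER);
    INSERT INTO dept VALUES (10, 'Sales'), (20, 'Research');
    INSERT INTO emp VALUES (1, 'Ann', 10), (2, 'Bob', 20), (3, 'Cy', 10);
""")

# Cartesian product (two tables in FROM) plus restriction (WHERE).
product_then_restrict = conn.execute("""
    SELECT empname, deptname
    FROM emp, dept
    WHERE emp.deptid = dept.deptid
    ORDER BY empid
""").fetchall()

# The same result written with the separate join syntax.
join_syntax = conn.execute("""
    SELECT empname, deptname
    FROM emp JOIN dept ON emp.deptid = dept.deptid
    ORDER BY empid
""").fetchall()

print(product_then_restrict == join_syntax)
```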


A Fully Relational Database

• As stated at the beginning, a relational dbms has three components:

• Structures: domains and relations

• Integrity rules: entity and referential

• A manipulation language: DDL, DML.

• For example, relational algebra, or something else which is relationally complete.

• Relational completeness has just been explained.

• The query language supports the five basic operations.


• Do not confuse the phrase “relationally complete” with the phrase “fully relational”.

• The book notes that there are commercially available systems that advertise themselves as relational but which have certain limitations.

• For example, the systems may have an implementation of SQL but not support domains or integrity rules.

• The question is, is it fair to call these systems relational?


• The answer is that they are not fully relational.

• E. F. Codd was one of the people instrumental in developing relational databases.

• He came up with a list of 12 characteristics that a fully relational dbms implementation should have.

• These are the accepted measuring stick for whether a system is fully relational.


• When considering the current state of the dbms market it is worth noting that Codd’s rules were enunciated in 1985.

• When reading the rules, it may be helpful to read them “negatively.”

• In other words, for every rule there is or has been a dbms advertised as relational that did not have that characteristic.


Codd’s Rules for a Fully Relational Database

• 1. The information rule

• Regardless of the underlying implementation, from the user’s point of view, there is only one logical representation of data in a database:

• Values stored in fields stored in tables.


• 2. The guaranteed access rule

• Every value in a database has to be accessible by specifying the table name, the column name, and the primary key value of the row in which it’s stored.
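
A minimal sketch of guaranteed access, assuming sqlite3 as the dbms and a hypothetical item table: the table name, the column name, and a primary key value together pick out exactly one stored value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (itemid INTEGER PRIMARY KEY, itemname TEXT, price INTEGER);
    INSERT INTO item VALUES (1, 'pen', 3), (2, 'ink', 7);
""")

# Table name (item) + column name (price) + primary key value (2).
(price,) = conn.execute(
    "SELECT price FROM item WHERE itemid = ?", (2,)
).fetchone()
print(price)
```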


• 3. Systematic treatment of null values

• The system has to support the semantics of null.

• It can’t rely on devices such as storing blanks or 0’s or other default values to signify null.

• The system also has to support the syntax of null in the query language.
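
The difference between a null and a stored 0 can be seen in a short sketch. It assumes sqlite3 as the dbms and a hypothetical reading table in which one temperature is genuinely 0 and another is unknown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reading (id INTEGER PRIMARY KEY, temp REAL);
    INSERT INTO reading VALUES (1, 0.0), (2, NULL), (3, 12.5);
""")

# Syntax of null: IS NULL finds the unknown value, = 0 finds the real zero.
missing = conn.execute("SELECT id FROM reading WHERE temp IS NULL").fetchall()
zero = conn.execute("SELECT id FROM reading WHERE temp = 0").fetchall()

# Semantics of null: comparing with = NULL is never true, so no rows match.
equals_null = conn.execute("SELECT id FROM reading WHERE temp = NULL").fetchall()

# Aggregates skip nulls: COUNT(temp) and AVG(temp) use only the two known values.
count_all, count_temp, avg_temp = conn.execute(
    "SELECT COUNT(*), COUNT(temp), AVG(temp) FROM reading"
).fetchone()
print(missing, zero, equals_null, count_all, count_temp, avg_temp)
```

Had the unknown reading been stored as 0, the average would have been computed over three values instead of two.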


• 4. Active online catalog of the relational model

• The system has to maintain an online catalog.

• This will include tables like SYSTABLE, SYSCOLUMN, SYSINDEX, etc.

• It should be possible for the user to query the catalog and find out all of the information about a given user database.

• Note that informally a data dictionary is at least a partial representation of the contents of the system catalog.
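
Catalog table names differ from one dbms to another. As a sketch, assuming SQLite (through Python's sqlite3 module) rather than a system with SYSTABLE-style names, the catalog is a table called sqlite_master, and column details come from PRAGMA table_info. The customer table and its index are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (custid INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE INDEX customer_name_idx ON customer (name);
""")

# The catalog is queried with ordinary SELECT statements.
objects = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()

# Each row of table_info describes one column; position 1 holds its name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
print(objects, columns)
```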


• 5. The comprehensive data sublanguage rule

• The system has to have a language or languages that support the following:

– Data definition
– Data manipulation
– Security and integrity
– Transaction processing
– Interactive querying and querying embedded in a programming language

• Even if a graphical user interface is provided, a text based language supporting these functions has to be provided

• Note that SQL meets all of these requirements


• 6. The view updating rule

• The dbms has to be able to update any view that is theoretically updatable.

• Comment mode on:

• Note that when views were covered, it was explained that a change to a view should cause a change in the underlying table(s).

• This rule tells you that some systems have not implemented views in this theoretically correct way.
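
SQLite is one concrete case of a system that falls short of this rule: by default its views are read-only. The sketch below, which assumes Python's sqlite3 module and a hypothetical emp table, tries to update a single-table view that is theoretically updatable and records whether the system accepts the statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (empid INTEGER PRIMARY KEY, empname TEXT, salary INTEGER);
    INSERT INTO emp VALUES (1, 'Ann', 100), (2, 'Bob', 200);
    CREATE VIEW emp_names AS SELECT empid, empname FROM emp;
""")

# The view maps one-to-one onto rows of emp, so the update is well defined
# in theory, but SQLite refuses to modify a view directly.
try:
    conn.execute("UPDATE emp_names SET empname = 'Anne' WHERE empid = 1")
    view_update_accepted = True
except sqlite3.OperationalError:
    view_update_accepted = False

stored_name = conn.execute(
    "SELECT empname FROM emp WHERE empid = 1"
).fetchone()[0]
print(view_update_accepted, stored_name)
```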


• 7. High-level insert, update, and delete

• The system has to support set-at-a-time operations.

• In other words, it has to be possible to insert, update, and delete multiple records at a time.

• Comment mode on:

• Note that this is a swipe at graphical user interface-only systems.

• Without a real language, like SQL, it is unlikely that a graphically based system will be able to support multiple inserts, updates, and deletes.
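
A small sketch of set-at-a-time operations, assuming sqlite3 as the dbms and a hypothetical item table: a single UPDATE and a single DELETE each act on every row that satisfies the condition, with no row-by-row loop in the program.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (itemid INTEGER PRIMARY KEY, category TEXT, price INTEGER);
    INSERT INTO item VALUES (1, 'toy', 5), (2, 'toy', 8), (3, 'book', 12), (4, 'toy', 2);
""")

# One statement updates all three toy rows; rowcount reports how many changed.
updated = conn.execute(
    "UPDATE item SET price = price * 2 WHERE category = 'toy'"
).rowcount

# One statement deletes every row that is still priced under 10.
deleted = conn.execute("DELETE FROM item WHERE price < 10").rowcount

remaining = conn.execute(
    "SELECT itemid, price FROM item ORDER BY itemid"
).fetchall()
print(updated, deleted, remaining)
```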


• 8. Physical data independence

• The logical appearance of tables and data to users will not change even if there is some change in their physical storage.

• For example, a database may be ported to a different machine, hard drive, etc.

• As long as the dbms is the same, the db should seem unchanged.

• This is also true for changes such as adding indexes.


• A user may notice a change in performance, but every query should still run, and it should not be necessary for the user to write queries with syntax that specifies that an index should be used when executing them.

• The system itself is responsible for all access issues at the physical level.
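
Physical data independence with respect to indexes can be sketched as follows, assuming sqlite3 as the dbms and a hypothetical emp table: the same query text is run before and after an index is added, and nothing in the query mentions the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (empid INTEGER PRIMARY KEY, empname TEXT, deptid INTEGER);
    INSERT INTO emp VALUES (1, 'Ann', 10), (2, 'Bob', 20), (3, 'Cy', 10);
""")

query = "SELECT empname FROM emp WHERE deptid = 10 ORDER BY empname"

before = conn.execute(query).fetchall()

# A purely physical change: the dbms decides whether and how to use the index.
conn.execute("CREATE INDEX emp_deptid_idx ON emp (deptid)")

after = conn.execute(query).fetchall()
print(before == after)
```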


• 9. Logical data independence

• Information-preserving changes to the base tables should not affect queries or applications.

• For example, adding a new table to a db (even if it’s in a relationship with an existing table) should in no way affect any pre-existing applications.

• Or, adding a new field to a table shouldn’t affect existing queries.
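
Both examples, a new field and a new related table, can be checked with a sketch. It assumes sqlite3 as the dbms, hypothetical emp and project tables, and an existing query that names its columns explicitly (a query written with SELECT * would see the new column).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (empid INTEGER PRIMARY KEY, empname TEXT);
    INSERT INTO emp VALUES (1, 'Ann'), (2, 'Bob');
""")

existing_query = "SELECT empid, empname FROM emp ORDER BY empid"
before = conn.execute(existing_query).fetchall()

# Information-preserving changes: a new field and a new related table.
conn.execute("ALTER TABLE emp ADD COLUMN phone TEXT")
conn.execute("""
    CREATE TABLE project (
        projid INTEGER PRIMARY KEY,
        empid INTEGER REFERENCES emp (empid)
    )
""")

after = conn.execute(existing_query).fetchall()
print(before == after)
```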


• 10. Integrity independence

• Integrity constraints should be part of the dbms’s function.

• Application programs should not have to contain the logic for maintaining the constraints.

• It should be possible to change the constraints in the system without affecting existing applications.

• Note that this should not be confused with data integrity, which is a user problem.
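
In the sketch below the constraints are declared once in the table definitions, and the application code contains no checking logic of its own. It assumes sqlite3 as the dbms (where foreign key checks must be switched on per connection) and hypothetical dept and emp tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves these checks off by default
conn.executescript("""
    CREATE TABLE dept (deptid INTEGER PRIMARY KEY);
    CREATE TABLE emp (
        empid INTEGER PRIMARY KEY,
        salary INTEGER CHECK (salary >= 0),
        deptid INTEGER NOT NULL REFERENCES dept (deptid)
    );
    INSERT INTO dept VALUES (10);
""")


def try_insert(row):
    """The application just inserts; the dbms decides whether the row is legal."""
    try:
        conn.execute("INSERT INTO emp VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    try_insert((1, 500, 10)),   # valid row
    try_insert((2, 500, 99)),   # no such department: referential integrity
    try_insert((3, -1, 10)),    # negative salary: CHECK constraint
]
print(results)
```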


• 11. Distribution independence

• If a dbms advertises itself as distributed, the distribution should be entirely transparent.

• In other words, all tables, data and applications should be accessible and work in the same way as they do without distribution, without any changes needed on the part of the user.


• 12. The non-subversion rule

• It should not be possible to get around the security or integrity constraints by using some other interface or access into the database.


• Rule 0

• At a later time Codd also stated this rule:

• The dbms should make it possible to manage a database entirely through its relational capacities.

• In other words, you may supply a graphical user interface or some user tools that are not explicitly relational, but you also have to provide the relational interface.


• By way of explanation, the author now introduces another phrase, “totally relational”.

• The idea is that the system won’t allow non-relational tools to subvert the database.

• It also has a complete set of relational tools to manage the database.

• If these two conditions are met, along with the other 12 (plus rule 0), the dbms is totally relational, even though it may also provide other kinds of interfaces for convenience.


Chapter 10, SQL

• Chapter 10 in the book reviews SQL syntax and then presents some additional information

• The syntax review will be ignored

• The additional information will be summarized


User Defined Functions

• SQL allows for the creation of a user defined function

• The syntax is CREATE FUNCTION…

• The specifics aren’t important

• The general idea is that the user can create a simple numerical/arithmetic function
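
SQLite has no CREATE FUNCTION statement, so the sketch below is only an analogue of the idea: Python's sqlite3 module lets a program register a simple arithmetic function that queries can then call by name. The function name fahrenheit and the reading table are hypothetical.

```python
import sqlite3


def fahrenheit(celsius):
    """Simple arithmetic function to be called from SQL."""
    return celsius * 9 / 5 + 32


conn = sqlite3.connect(":memory:")
# Register the function under the SQL name "fahrenheit", taking 1 argument.
conn.create_function("fahrenheit", 1, fahrenheit)

conn.executescript("""
    CREATE TABLE reading (id INTEGER PRIMARY KEY, celsius REAL);
    INSERT INTO reading VALUES (1, 0), (2, 100);
""")

converted = conn.execute(
    "SELECT id, fahrenheit(celsius) FROM reading ORDER BY id"
).fetchall()
print(converted)
```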


User Defined Procedures

• SQL allows for the creation of a user defined procedure

• The syntax is CREATE PROCEDURE…

• The specifics aren’t important

• The general idea is that the user can package together a sequence of SQL commands/operations/queries in order to support multi-part transactions


User Defined Triggers

• SQL allows for the creation of a user defined trigger

• The syntax is CREATE TRIGGER…

• The specifics aren’t important

• The general idea is that the user can create a type of stored procedure which is automatically triggered when some action is taken on the database such as inserting, updating, or deleting the rows of a table

• Triggers can be used to enforce business rules, data integrity checking, transaction logging, etc.
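
A transaction-logging trigger might look like the sketch below. It uses SQLite's dialect of CREATE TRIGGER through Python's sqlite3 module (other systems differ in the details), and the account and account_log tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (acctid INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE account_log (acctid INTEGER, old_balance INTEGER, new_balance INTEGER);

    -- Fires automatically for each row whose balance is updated.
    CREATE TRIGGER log_balance AFTER UPDATE OF balance ON account
    BEGIN
        INSERT INTO account_log VALUES (OLD.acctid, OLD.balance, NEW.balance);
    END;

    INSERT INTO account VALUES (1, 100), (2, 50);
""")

# The application only updates the account; it never touches the log itself.
conn.execute("UPDATE account SET balance = 70 WHERE acctid = 1")

log_rows = conn.execute("SELECT * FROM account_log").fetchall()
print(log_rows)
```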


Database Security

• SQL supports security by making it possible to grant or revoke the ability to take certain actions to individuals or groups of users

• This is the basic syntax:

• GRANT privilege(s) ON object(s) TO user(s) [WITH GRANT OPTION]


• These are the privileges that apply to base tables and views:

• SELECT, INSERT, UPDATE, DELETE

• These are the privileges that apply only to base tables:

• ALTER, INDEX

• It is also possible to specify the following:

• ALL PRIVILEGES


• Users can be given as a list of userids or as PUBLIC, meaning potentially all users

• The WITH GRANT OPTION tells whether or not a user who has been granted a privilege also has the right to grant it to another user

• Privileges can be withdrawn with REVOKE

• If REVOKE is issued on a user who granted a privilege to another user, the privilege is also revoked from this other user (cascading REVOKE)


The System Catalog

• The system catalog was touched on briefly in the previous chapter

• The catalog is a db in its own right

• By querying tables like SYSCATALOG, SYSCOLUMNS, SYSINDEXES, etc., it is possible to find out everything there is to know about the databases recorded in the catalog

• Note: It is a mystery why SYSCOLUMNS is plural rather than singular in this discussion


Natural Language Processing

• Some vendors may offer natural language processing as a feature of their dbms

• This would allow users to write queries in English

• The system would translate them to SQL


• This is problematic because of the possible ambiguities in English

• It is also possibly problematic because a user who doesn’t understand the database well enough to apply SQL to it may not be able to form clear, meaningful queries against the database in English


Database Connectivity and Drivers

• ODBC stands for Open Database Connectivity, and JDBC stands for Java Database Connectivity

• These are standards/technologies with the following purpose:

• In a client-server environment, a client can use a server database where the server dbms may be one of several different kinds

• This is accomplished by defining one standard interface and writing a driver for each kind of dbms which supports the common interface


Embedded SQL

• This topic will be relevant at the end of the course when considering PHP

• By that time, you will be working on your project

• Most likely you will figure out how this works just by following examples, not by listening to lectures on the topic


• SQL can be used as a stand-alone language for ad hoc queries

• Procedural programming languages also have syntax allowing for SQL statements to be embedded in them

• This allows a program to process the results of a query, for example

• It also allows a program to enter data into tables
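
The course project uses PHP, but the same pattern can be sketched in Python as a stand-in, assuming the standard sqlite3 module and a hypothetical sale table: the program enters data with one embedded statement and processes the rows returned by another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sale (saleid INTEGER PRIMARY KEY, item TEXT, qty INTEGER, price INTEGER)"
)

# The program enters data into the table; the ? placeholders receive
# the values from ordinary program variables.
new_sales = [("pen", 3, 2), ("ink", 1, 7), ("pad", 2, 4)]
conn.executemany("INSERT INTO sale (item, qty, price) VALUES (?, ?, ?)", new_sales)
conn.commit()

# The program processes the results of a query row by row.
total = 0
for item, qty, price in conn.execute(
    "SELECT item, qty, price FROM sale ORDER BY saleid"
):
    total += qty * price

print(total)
```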


SQL Standardization

• SQL was first standardized in the 1980s

• For example, there was a standard known as SQL-89

• SQL-92, also known as SQL2, is the current gold standard

• In other words, most vendors support this standard, potentially with additional features


• SQL-99 added object-oriented features

• It is not clear yet whether vendors will follow this standard or go their own way

• It’s also not clear whether it’s an improvement to keep adding new features to a standard that has been relatively simple and successful

• SQLJ refers to another direction taken in SQL standardization, trying to integrate it with Java


Summary

• Purists may quibble about one or more features of SQL

• Also, SQL keeps on developing and it’s not clear all of the developments will succeed in the marketplace

• However, the core of SQL has been around for some time

• There is no sign that SQL is going to go away any sooner than relational database management systems are going to go away.


The End


• Why is there a remainder in relational division?

• What's left behind are those values that couldn't have been the result of a product in the first place, because they are not matched with all of the other values.

• In any event, relational division is certainly related to, and complementary to, the operation of finding a product.


• Those xid values that were not eliminated in TableC – TableY would be the same as those xid values that were in the remainder.

• The remainder values, by definition, are those that didn’t match with all of the zid values.

• That is why there will be remainder values left after the set subtraction