MIS 335 - Database Systems Relational Model
Ahmet Onur Durahim http://www.mis.boun.edu.tr/durahim/
Learning Objectives
• Relational Model Definitions
• SQL (DDL) – Create, Drop
• SQL (DML) – Add, Delete, Update
• Key Constraints and Referential Integrity
• Views and Security
• Convert ER models to Relational models
Relational Model
• Introduced by Edgar F. “Ted” Codd in 1970
– Implemented by IBM (System R)
• Most widely used model
– Vendors: IBM, Informix, Microsoft, Oracle, Sybase, etc.
• “Legacy systems” in older models
– IBM’s IMS (hierarchical data model), CODASYL (network data model)
• Major advantages
– Simple data representation
– The ease with which complex queries can be expressed
• Recent competitor: object-oriented model
– ObjectStore, Versant, Ontos
– A synthesis emerging: object-relational model
• Informix Universal Server, UniSQL, O2, Oracle, DB2
Relational DB Definitions
• Relational database: a set of relations
• Relation: made up of 2 parts:
– Instance: a table, with rows and columns.
• #Rows = cardinality, #fields = degree / arity
– Schema: specifies name of relation, plus name, type, and domain of each column/field.
• e.g. Students (sid: string, name: string, login: string, age: integer, gpa: real)
• Can think of a relation as a set of rows (tuples) (i.e., all rows are distinct)
Example Instance of Students Relation
• Cardinality = 3, degree = 5, all rows distinct
• Do all columns in a relation instance have to be distinct?
sid name login age gpa
53666 Jones jones@cs 18 3.4
53688 Smith smith@eecs 18 3.2
53650 Smith smith@math 19 3.8
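The definitions above can be checked mechanically. The following is a minimal Python sketch, assuming the instance is held as a set of tuples; the variable names are illustrative and not part of the slides.

```python
# Students instance from the slide, stored as a set of tuples (all rows distinct).
schema = ("sid", "name", "login", "age", "gpa")
students = {
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Smith", "smith@eecs", 18, 3.2),
    ("53650", "Smith", "smith@math", 19, 3.8),
}

cardinality = len(students)  # number of rows
degree = len(schema)         # number of fields (arity)

# Rows must be distinct, but values within a single column may repeat.
names = [row[schema.index("name")] for row in students]
name_column_has_duplicates = len(names) != len(set(names))

print(cardinality, degree, name_column_has_duplicates)
```

The name column holds Smith twice, yet the relation is still a valid set because the two rows differ in other fields.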
Relational Query Languages
• A major strength of the relational model
– supports simple, powerful querying of data
• Queries can be written intuitively, and the DBMS is responsible for efficient evaluation
– The key: precise semantics for relational queries
– Allows the optimizer to extensively re-order operations, and still ensure that the answer does not change
The SQL Query Language
• Developed by IBM (System R) in the 1970s
• Need for a standard since it is used by many vendors
• Standards:
– SQL-86
– SQL-89 (minor revision)
– SQL-92 (major revision)
– SQL-99 (major extensions)
– SQL-2003/2006/2008/2011
SQL Standards (source: Wikipedia)
Year | Name | Alias | Comments
1986 | SQL-86 | SQL-86 | First formalized by ANSI
1989 | SQL-89 | FIPS 127-1 | Minor revision, in which the major addition was integrity constraints
1992 | SQL-92 | SQL2, FIPS 127-2 | Major revision (ISO 9075), Entry Level SQL-92 adopted as FIPS 127-2
1999 | SQL:1999 | SQL3 | Added regular expression matching, recursive queries (e.g. transitive closure), triggers, support for procedural and control-of-flow statements, non-scalar types, and some object-oriented features (e.g. structured types). Support for embedding SQL in Java (SQL/OLB) and vice-versa (SQL/JRT)
2003 | SQL:2003 | SQL 2003 | Introduced XML-related features (SQL/XML), window functions, standardized sequences, and columns with auto-generated values (including identity-columns)
2006 | SQL:2006 | SQL 2006 | ISO/IEC 9075-14:2006 defines ways in which SQL can be used in conjunction with XML. It defines ways of importing and storing XML data in an SQL database, manipulating it within the database and publishing both XML and conventional SQL-data in XML form. In addition, it enables applications to integrate into their SQL code the use of XQuery, the XML Query Language published by the World Wide Web Consortium (W3C), to concurrently access ordinary SQL-data and XML documents.
2008 | SQL:2008 | SQL 2008 | Legalizes ORDER BY outside cursor definitions. Adds INSTEAD OF triggers. Adds the TRUNCATE statement
2011 | SQL:2011 | |
sid name login age gpa
53666 Jones jones@cs 18 3.4
53688 Shero shero@cs 18 3.2
53650 Shero shero@math 19 3.8
The SQL Query Language
• Find all 18 year old students
SELECT *
FROM Students S
WHERE S.age=18
sid name login age gpa
53666 Jones jones@cs 18 3.4
53688 Shero shero@cs 18 3.2
The SQL Query Language
• Find just names and logins
SELECT S.name, S.login
FROM Students S
WHERE S.age=18
name login
Jones jones@cs
Shero shero@cs
sid name login age gpa
53666 Jones jones@cs 18 3.4
53688 Shero shero@cs 18 3.2
53650 Shero shero@math 19 3.8
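The two queries above can be run as written. This is a minimal sketch that assumes SQLite (through Python's standard sqlite3 module) as the DBMS, which the slides do not prescribe; the ORDER BY clause is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid CHAR(20), name CHAR(20), login CHAR(10), "
    "age INTEGER, gpa REAL)"
)
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", [
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Shero", "shero@cs", 18, 3.2),
    ("53650", "Shero", "shero@math", 19, 3.8),
])

# Selection: keep whole rows that satisfy the condition.
all_18 = conn.execute(
    "SELECT * FROM Students S WHERE S.age = 18 ORDER BY S.sid"
).fetchall()

# Selection plus projection: keep only the name and login columns.
names_logins = conn.execute(
    "SELECT S.name, S.login FROM Students S WHERE S.age = 18 ORDER BY S.sid"
).fetchall()

print(all_18)
print(names_logins)
```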
Querying Multiple Relations
Enrolled
sid cid grade
53831 Carnatic101 C
53831 Reggae203 B
53650 Topology112 A
53666 History105 B
Students
sid name login age gpa
53666 Jones jones@cs 18 3.4
53650 Shero shero@cs 18 3.2
SELECT S.name, E.cid
FROM Students S, Enrolled E
WHERE S.sid = E.sid AND E.grade = 'A'
S.name E.cid
Shero Topology112
Names of students who received an A in a course they are enrolled in, together with the cid of that course
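The join can be reproduced on the two instances shown on this slide. A minimal sketch, again assuming SQLite via Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid CHAR(20), name CHAR(20), login CHAR(10), "
    "age INTEGER, gpa REAL)"
)
conn.execute("CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2))")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", [
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53650", "Shero", "shero@cs", 18, 3.2),
])
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)", [
    ("53831", "Carnatic101", "C"),
    ("53831", "Reggae203", "B"),
    ("53650", "Topology112", "A"),
    ("53666", "History105", "B"),
])

# Pair each student with their enrollments, then keep only the A grades.
a_grades = conn.execute(
    "SELECT S.name, E.cid FROM Students S, Enrolled E "
    "WHERE S.sid = E.sid AND E.grade = 'A'"
).fetchall()
print(a_grades)
```

Only one Enrolled row has grade A, and its sid matches Shero, so the result has a single row.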
Creating Relations in SQL
CREATE TABLE Students (sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa REAL)
sid name login age gpa
53666 Jones jones@cs 18 3.4
53688 Shero shero@cs 18 3.2
• Creates the Students relation
• Observe that the type (domain) of each field is specified, and enforced by the DBMS whenever tuples are added or modified
Creating Relations in SQL
CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2))
sid cid grade
53666 MS 413 BA
53688 MIS 335 BB
• Creates the Enrolled relation
• The Enrolled table holds information about courses that students take
Destroying and Altering Relations
• Destroys the relation Enrolled
• The schema information and the tuples are deleted
DROP TABLE Enrolled
• The schema of Students is altered by adding a new field
• Every tuple in the current instance is extended with a null value in the new field
ALTER TABLE Students ADD COLUMN firstYear INTEGER
sid name login age firstYear gpa
53666 Jones jones@cs 18 null 3.4
53688 Shero shero@cs 18 null 3.2
ALTER TABLE Students DROP COLUMN age
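A minimal sketch of the DROP TABLE and ADD COLUMN statements, assuming SQLite via Python's sqlite3 module. Unlike the slide's illustration, SQLite appends the new column after gpa, and DROP COLUMN is left out because older SQLite versions do not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid CHAR(20), name CHAR(20), login CHAR(10), "
    "age INTEGER, gpa REAL)"
)
conn.execute("CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2))")
conn.execute("INSERT INTO Students VALUES ('53666', 'Jones', 'jones@cs', 18, 3.4)")

# Existing tuples are extended with null in the new field.
conn.execute("ALTER TABLE Students ADD COLUMN firstYear INTEGER")
row = conn.execute("SELECT sid, firstYear FROM Students").fetchone()

# DROP TABLE removes both the schema information and the tuples.
conn.execute("DROP TABLE Enrolled")
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]

print(row, tables)
```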
Adding and Deleting Tuples
• Can insert a single tuple
INSERT INTO Students (sid, name, login, age, gpa) VALUES (53688, 'Smith', 'smith@ee', 18, 3.2)
sid name login age gpa
53666 Jones jones@cs 18 3.4
53688 Smith smith@ee 18 3.2
Adding and Deleting Tuples
• Can delete all tuples satisfying some condition
– name = 'Ahmet'
DELETE FROM Students S WHERE S.name = 'Ahmet'
• Powerful variants of these commands are available – more later
Updating Tuples
• Can modify/update some attributes of a tuple satisfying some condition
– name = 'Ahmet', update age to 20 and increase gpa by 0.3
UPDATE Students S SET S.age = 20, S.gpa = S.gpa + 0.3 WHERE S.name = 'Ahmet'
sid name login age firstYear gpa
53666 Jones jones@cs 18 null 3.4
53688 Ahmet ahmet@cs 18 null 3.2
sid name login age firstYear gpa
53666 Jones jones@cs 18 null 3.4
53688 Ahmet ahmet@cs 20 null 3.5
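The insert, update, and delete statements can be exercised in sequence. A minimal sketch assuming SQLite via Python's sqlite3 module; table aliases and qualified SET targets are dropped here because not every DBMS accepts them in UPDATE and DELETE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid CHAR(20), name CHAR(20), login CHAR(10), "
    "age INTEGER, gpa REAL)"
)
conn.execute(
    "INSERT INTO Students (sid, name, login, age, gpa) "
    "VALUES ('53666', 'Jones', 'jones@cs', 18, 3.4)"
)
conn.execute(
    "INSERT INTO Students (sid, name, login, age, gpa) "
    "VALUES ('53688', 'Ahmet', 'ahmet@cs', 18, 3.2)"
)

# Update every tuple that satisfies the condition.
conn.execute("UPDATE Students SET age = 20, gpa = gpa + 0.3 WHERE name = 'Ahmet'")
age, gpa = conn.execute(
    "SELECT age, gpa FROM Students WHERE name = 'Ahmet'"
).fetchone()

# Delete every tuple that satisfies the condition.
deleted = conn.execute("DELETE FROM Students WHERE name = 'Ahmet'").rowcount
remaining = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]

print(age, round(gpa, 2), deleted, remaining)
```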
Integrity Constraints (ICs)
• IC: condition specified on a DB schema that must be true for any instance of the database
– e.g., domain constraints
– ICs are specified when schema is defined.
– ICs are checked when relations are modified.
• A legal instance of a relation is one that satisfies all specified ICs.
– DBMS should not allow illegal instances.
• DBMS checks for violations of ICs and disallows changes to the data that violate the specified ICs
– Stored data is more faithful to real-world meaning
– Avoids data entry errors, too!
Primary Key Constraints
• A set of fields is a key for a relation if:
1. No two distinct tuples can have same values in all key fields, and
2. This is not true for any subset of the key
• If Part 2 is false, the set of fields is a superkey
• If there is more than one key for a relation, one of the keys is chosen (by DBA) to be the primary key
– e.g., sid is a key for Students. (What about name?)
– The set {sid, gpa} is a superkey
sid name login age gpa
53666 Jones jones@cs 18 3.4
53688 Shero shero@cs 18 3.2
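The two-part key definition can be sketched in Python. Note the limitation: a key is a statement about every legal instance, so a check against one instance can only refute a candidate, never confirm it. The helper names is_unique and is_minimal_key are hypothetical.

```python
from itertools import combinations

schema = ("sid", "name", "login", "age", "gpa")
students = [
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Shero", "shero@cs", 18, 3.2),
]

def is_unique(rows, attrs):
    """Part 1: no two rows agree on all of the given attributes."""
    idx = [schema.index(a) for a in attrs]
    projected = [tuple(r[i] for i in idx) for r in rows]
    return len(projected) == len(set(projected))

def is_minimal_key(rows, attrs):
    """Parts 1 and 2: unique, and no proper non-empty subset is unique."""
    if not is_unique(rows, attrs):
        return False
    return not any(
        is_unique(rows, subset)
        for k in range(1, len(attrs))
        for subset in combinations(attrs, k)
    )

print(is_minimal_key(students, ("sid",)))        # sid alone passes both parts
print(is_unique(students, ("sid", "gpa")))       # {sid, gpa} passes part 1
print(is_minimal_key(students, ("sid", "gpa")))  # but fails part 2: a superkey
print(is_unique(students, ("age",)))             # both students are 18
```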
Primary and Candidate Key in SQL
• For a given student and course, there is a single grade
CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid, cid))
Primary and Candidate Key in SQL
• Possibly many candidate keys (specified using UNIQUE), one of which is chosen as the primary key
• Used carelessly, an IC can prevent the storage of database instances that arise in practice!
CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid), UNIQUE (cid, grade))
• Students can take only one course, and receive a single grade for that course
• Further, no two students in a course receive the same grade
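The difference between the two Enrolled definitions shows up as soon as tuples are inserted. A minimal sketch assuming SQLite via Python's sqlite3 module; the table names Enrolled_ok and Enrolled_strict and the helper rejected are hypothetical, introduced only to hold both variants side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Enrolled_ok (sid CHAR(20), cid CHAR(20), grade CHAR(2), "
    "PRIMARY KEY (sid, cid))"
)
conn.execute(
    "CREATE TABLE Enrolled_strict (sid CHAR(20), cid CHAR(20), grade CHAR(2), "
    "PRIMARY KEY (sid), UNIQUE (cid, grade))"
)

def rejected(sql):
    """Return True if the DBMS refuses the statement as a constraint violation."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

# Key (sid, cid): one grade per student and course, several courses per student.
conn.execute("INSERT INTO Enrolled_ok VALUES ('53666', 'History105', 'B')")
second_course_ok = not rejected(
    "INSERT INTO Enrolled_ok VALUES ('53666', 'Reggae203', 'B')")
second_grade_rejected = rejected(
    "INSERT INTO Enrolled_ok VALUES ('53666', 'History105', 'A')")

# Key (sid) with UNIQUE (cid, grade): far more restrictive than intended.
conn.execute("INSERT INTO Enrolled_strict VALUES ('53666', 'History105', 'B')")
strict_second_course_rejected = rejected(
    "INSERT INTO Enrolled_strict VALUES ('53666', 'Reggae203', 'B')")
strict_same_grade_rejected = rejected(
    "INSERT INTO Enrolled_strict VALUES ('53688', 'History105', 'B')")

print(second_course_ok, second_grade_rejected,
      strict_second_course_rejected, strict_same_grade_rejected)
```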
Foreign Keys, Referential Integrity
• Foreign key: Set of fields in one relation that is used to 'refer' to a tuple in another relation.
– Must correspond to primary key of the second relation
– Like a 'logical pointer'
• E.g. sid is a foreign key referring to Students:
Enrolled
sid cid grade
53831 Carnatic101 C
53831 Reggae203 B
53650 Topology112 A
53666 History105 B
Students
sid name login age gpa
53666 Jones jones@cs 18 3.4
53688 Shero shero@cs 18 3.2
Foreign Keys, Referential Integrity
• If all foreign key constraints are enforced, referential integrity is achieved,
– i.e., no dangling references.
• The foreign key in the referencing (Enrolled) relation must match the primary key of the referenced relation (Students)
– Have the same number of columns and compatible data types
Enrolled (stdid is the foreign key)
stdid cid grade
53831 Carnatic101 C
53831 Reggae203 B
53650 Topology112 A
53666 History105 B
Students (sid is the primary key)
sid name login age gpa
53666 Jones jones@cs 18 3.4
53688 Shero shero@cs 18 3.2
Foreign Keys in SQL
• Only students listed in the Students relation should be allowed to enroll for courses
CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid, cid), FOREIGN KEY (sid) REFERENCES Students)
Enrolled
sid cid grade
53650 Carnatic101 C
53666 Reggae203 B
53650 Topology112 A
53666 History105 B
Students
sid name login age gpa
53666 Jones jones@cs 18 3.4
53650 Shero shero@cs 18 3.2
Enforcing Referential Integrity
• Consider Students and Enrolled
– sid in Enrolled is a foreign key that references Students
• What should be done if an Enrolled tuple with a non-existent student id is inserted?
• Reject it!
Enrolled
sid cid grade
53650 Carnatic101 C
53666 Reggae203 B
53650 Topology112 A
53666 History105 B
Students
sid name login age gpa
53666 Jones jones@cs 18 3.4
53650 Shero shero@cs 18 3.2
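Rejection of a dangling reference can be observed directly. A minimal sketch assuming SQLite via Python's sqlite3 module, where foreign key enforcement must be switched on per connection with PRAGMA foreign_keys; the primary key on Students is an addition that the foreign key needs as its target.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute(
    "CREATE TABLE Students (sid CHAR(20), name CHAR(20), login CHAR(10), "
    "age INTEGER, gpa REAL, PRIMARY KEY (sid))"
)
conn.execute(
    "CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), "
    "PRIMARY KEY (sid, cid), FOREIGN KEY (sid) REFERENCES Students(sid))"
)
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", [
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53650", "Shero", "shero@cs", 18, 3.2),
])

# A reference to an existing student is accepted.
conn.execute("INSERT INTO Enrolled VALUES ('53650', 'Carnatic101', 'C')")

# A reference to a non-existent student id is rejected.
try:
    conn.execute("INSERT INTO Enrolled VALUES ('99999', 'Reggae203', 'B')")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

enrolled_count = conn.execute("SELECT COUNT(*) FROM Enrolled").fetchone()[0]
print(dangling_rejected, enrolled_count)
```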
Enforcing Referential Integrity
• What should be done if a Students tuple (with sid=53650) is deleted?
– Also delete all Enrolled tuples that refer to it
– Disallow deletion of a Students tuple that is referred to
– Set sid in Enrolled tuples that refer to it to a default sid (=99999)
– In SQL, also: Set sid in Enrolled tuples that refer to it to a special value null, denoting 'unknown' or 'inapplicable'
Enrolled
sid cid grade
53650 Carnatic101 C
53666 Reggae203 B
53650 Topology112 A
53666 History105 B
Students
sid name login age gpa
53666 Jones jones@cs 18 3.4
53650 Shero shero@cs 18 3.2
Enrolled, with references to 53650 set to the default sid (99999)
sid cid grade
99999 Carnatic101 C
53666 Reggae203 B
99999 Topology112 A
53666 History105 B
Enrolled, with references to 53650 set to null
sid cid grade
null Carnatic101 C
53666 Reggae203 B
null Topology112 A
53666 History105 B
Enforcing Referential Integrity
• What should be done if the primary key of a Students tuple is updated?
• We have options similar to the previous case
– Also update all Enrolled tuples that refer to it
– Disallow update of a Students tuple that is referred to
– Set sid in Enrolled tuples that refer to it to a default sid
– Set sid in Enrolled tuples that refer to it to a special value null, denoting 'unknown' or 'inapplicable'
Referential Integrity in SQL
• SQL-92 and SQL:1999 support all 4 options on deletes and updates
– Default is NO ACTION (delete/update is rejected)
– CASCADE (also delete all tuples that refer to deleted tuple)
– SET NULL / SET DEFAULT (sets foreign key value of referencing tuple)
CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid, cid), FOREIGN KEY (sid) REFERENCES Students ON DELETE CASCADE ON UPDATE SET DEFAULT)
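The four delete options can be compared on the same data. A minimal sketch assuming SQLite via Python's sqlite3 module; the function delete_student, the placeholder student 99999, and the omission of the Enrolled primary key (so that sid may become null) are simplifications made for this illustration.

```python
import sqlite3

def delete_student(on_delete):
    """Delete student 53650 under the given ON DELETE action.

    Returns 'rejected', or the sid values left in Enrolled (ordered by cid).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE Students (sid CHAR(20) PRIMARY KEY, name CHAR(20))")
    conn.execute(
        "CREATE TABLE Enrolled (sid CHAR(20) DEFAULT '99999', cid CHAR(20), "
        "grade CHAR(2), "
        f"FOREIGN KEY (sid) REFERENCES Students(sid) ON DELETE {on_delete})"
    )
    conn.executemany("INSERT INTO Students VALUES (?, ?)", [
        ("53666", "Jones"), ("53650", "Shero"),
        ("99999", "placeholder"),  # target row for SET DEFAULT
    ])
    conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)", [
        ("53650", "Carnatic101", "C"), ("53666", "Reggae203", "B"),
        ("53650", "Topology112", "A"), ("53666", "History105", "B"),
    ])
    try:
        conn.execute("DELETE FROM Students WHERE sid = '53650'")
    except sqlite3.IntegrityError:
        return "rejected"
    return [r[0] for r in conn.execute("SELECT sid FROM Enrolled ORDER BY cid")]

for action in ("NO ACTION", "CASCADE", "SET NULL", "SET DEFAULT"):
    print(action, delete_student(action))
```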
Where do ICs Come From?
• ICs are based upon the semantics of the real-world enterprise that is being described in the database relations.
• Domain, primary key, and foreign key ICs are the most common, but it may be necessary to specify more general ICs
– Student ages must be within a certain range of values
• All students must be at least 16 years old
• The DBMS rejects inserts and updates that violate the constraint
– Table constraints: associated with a single table and checked whenever that table is modified
– Assertions: involve several tables and are checked whenever any of these tables is modified
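The age rule above is a table constraint and can be written as a CHECK clause. A minimal sketch assuming SQLite via Python's sqlite3 module; the helper rejected and the second student are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid CHAR(20), name CHAR(20), age INTEGER, "
    "PRIMARY KEY (sid), CHECK (age >= 16))"
)

def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

conn.execute("INSERT INTO Students VALUES ('53666', 'Jones', 18)")

# The constraint is checked on inserts and on updates alike.
young_insert_rejected = rejected("INSERT INTO Students VALUES ('53700', 'Kid', 15)")
young_update_rejected = rejected("UPDATE Students SET age = 12 WHERE sid = '53666'")

print(young_insert_rejected, young_update_rejected)
```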
Views
• A view is just a relation, but we store a definition, rather than a set of tuples
CREATE VIEW YoungActiveStudents (name, grade) AS
  SELECT S.name, E.grade
  FROM Students S, Enrolled E
  WHERE S.sid = E.sid AND S.age < 21
• Views can be dropped using the DROP VIEW command
– How to handle DROP TABLE if there’s a view on the table?
• DROP TABLE command has options to let the user specify this
Views and Security
• Used to restrict data access and/or simplify data access
– Views can be used to present necessary information (or a summary), while hiding details in underlying relation(s)
– Given YoungActiveStudents, but not Students or Enrolled, we can find students who have enrolled, but not the cid’s of the courses they are enrolled in
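The hiding effect of the view is visible in its column list. A minimal sketch assuming SQLite via Python's sqlite3 module; SQLite has no user accounts, so the access restriction itself (granting the view but not the base tables) is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid CHAR(20), name CHAR(20), login CHAR(10), "
    "age INTEGER, gpa REAL)"
)
conn.execute("CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2))")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", [
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53650", "Shero", "shero@cs", 18, 3.2),
])
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)", [
    ("53650", "Carnatic101", "C"), ("53666", "Reggae203", "B"),
    ("53650", "Topology112", "A"), ("53666", "History105", "B"),
])

# Only the definition is stored; rows are computed when the view is queried.
conn.execute(
    "CREATE VIEW YoungActiveStudents (name, grade) AS "
    "SELECT S.name, E.grade FROM Students S, Enrolled E "
    "WHERE S.sid = E.sid AND S.age < 21"
)

cursor = conn.execute("SELECT * FROM YoungActiveStudents ORDER BY name, grade")
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()

print(columns)  # the cid column is not exposed
print(rows)
```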
Logical DB Design: ER to Relational
Entity sets to tables
[ER diagram: entity set Employees with attributes ssn, name, lot]
CREATE TABLE Employees (ssn CHAR(11), name CHAR(20), lot INTEGER, PRIMARY KEY (ssn))
Relationship Sets to Tables
In translating a relationship set to a relation, attributes of the relation must include:
– Keys for each participating entity set (as foreign keys)
• This set of attributes forms a superkey for the relation
– All descriptive attributes
[ER diagram: Employees (ssn, name, lot), Works_In (since), Departments (did, dname, budget)]
CREATE TABLE Works_In (ssn CHAR(11), did INTEGER, since DATE, PRIMARY KEY (ssn, did), FOREIGN KEY (ssn) REFERENCES Employees, FOREIGN KEY (did) REFERENCES Departments)
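Because Works_In has no key constraint, its key is the pair (ssn, did), which allows one employee to work in several departments. A minimal sketch assuming SQLite via Python's sqlite3 module; the employee and department rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE Employees (ssn CHAR(11), name CHAR(20), lot INTEGER, "
    "PRIMARY KEY (ssn))"
)
conn.execute(
    "CREATE TABLE Departments (did INTEGER, dname CHAR(20), budget REAL, "
    "PRIMARY KEY (did))"
)
conn.execute(
    "CREATE TABLE Works_In (ssn CHAR(11), did INTEGER, since DATE, "
    "PRIMARY KEY (ssn, did), "
    "FOREIGN KEY (ssn) REFERENCES Employees(ssn), "
    "FOREIGN KEY (did) REFERENCES Departments(did))"
)
conn.execute("INSERT INTO Employees VALUES ('111-11-1111', 'Ayse', 10)")
conn.executemany("INSERT INTO Departments VALUES (?, ?, ?)", [
    (1, "Toy", 1000.0), (2, "Books", 2000.0),
])

# The same employee may appear with different departments.
conn.executemany("INSERT INTO Works_In VALUES (?, ?, ?)", [
    ("111-11-1111", 1, "2020-01-01"),
    ("111-11-1111", 2, "2021-01-01"),
])
dept_count = conn.execute(
    "SELECT COUNT(*) FROM Works_In WHERE ssn = '111-11-1111'"
).fetchone()[0]

# The same (ssn, did) pair may not be recorded twice.
try:
    conn.execute("INSERT INTO Works_In VALUES ('111-11-1111', 1, '2022-01-01')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(dept_count, duplicate_rejected)
```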
Relationship Sets to Tables
Unary relationships
[ER diagram: Employees (ssn, name) related to itself by Reports_To, with roles supervisor and subordinate]
CREATE TABLE Reports_To (supervisor_ssn CHAR(11), subordinate_ssn CHAR(11), PRIMARY KEY (supervisor_ssn, subordinate_ssn), FOREIGN KEY (supervisor_ssn) REFERENCES Employees(ssn), FOREIGN KEY (subordinate_ssn) REFERENCES Employees(ssn))
Review: Key Constraints
• Each dept has at most one manager, according to the key constraint on Manages
[ER diagram: Employees (ssn, name, lot), Manages (since), Departments (did, dname, budget); legend: many-to-many, 1-to-1, 1-to-many, many-to-1]
Translation to relational model?
CREATE TABLE Manages (ssn CHAR(11), did INTEGER, since DATE, PRIMARY KEY (did), FOREIGN KEY (ssn) REFERENCES Employees, FOREIGN KEY (did) REFERENCES Departments)
Review: Key Constraints
• Map relationship to a table:
– Because each department has at most one manager, no two tuples can have the same did value but differ on the ssn value
– Note that did is the key now
– Separate tables for Employees and Departments
[ER diagram: Employees (ssn, name, lot), Manages (since), Departments (did, dname, budget)]
CREATE TABLE Dept_Mgr (did INTEGER, dname CHAR(20), budget REAL, ssn CHAR(11), since DATE, PRIMARY KEY (did), FOREIGN KEY (ssn) REFERENCES Employees)
Review: Key Constraints
• Since each dept has a unique manager, we could instead combine Manages and Departments
[ER diagram: Employees (ssn, name, lot), Manages (since), Departments (did, dname, budget)]
Review: Participation Constraints
• Does every department have a manager?
– If so, this is a participation constraint
– The participation of Departments in Manages is said to be total (vs. partial)
• Every did value in Departments table must appear in a row of the Manages table (with a non-null ssn value!)
[ER diagram: Employees (ssn, name, lot), Works_In (since), Manages (since), Departments (did, dname)]
Participation Constraints in SQL
• We can capture participation constraints involving one entity set in a binary relationship, but little else (without resorting to CHECK constraints)
CREATE TABLE Dept_Mgr ( did INTEGER, dname CHAR(20), budget REAL, ssn CHAR(11) NOT NULL, since DATE, PRIMARY KEY (did), FOREIGN KEY (ssn) REFERENCES Employees ON DELETE NO ACTION)
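The three guarantees of the combined Dept_Mgr table can be tested one by one. A minimal sketch assuming SQLite via Python's sqlite3 module; the helper rejected and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE Employees (ssn CHAR(11), name CHAR(20), lot INTEGER, "
    "PRIMARY KEY (ssn))"
)
conn.execute(
    "CREATE TABLE Dept_Mgr (did INTEGER, dname CHAR(20), budget REAL, "
    "ssn CHAR(11) NOT NULL, since DATE, PRIMARY KEY (did), "
    "FOREIGN KEY (ssn) REFERENCES Employees(ssn) ON DELETE NO ACTION)"
)

def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

conn.executemany("INSERT INTO Employees VALUES (?, ?, ?)", [
    ("111-11-1111", "Ayse", 10), ("222-22-2222", "Mehmet", 12),
])
conn.execute(
    "INSERT INTO Dept_Mgr VALUES (1, 'Toy', 1000.0, '111-11-1111', '2020-01-01')"
)

# Total participation: a department row cannot exist without a manager.
no_manager_rejected = rejected(
    "INSERT INTO Dept_Mgr VALUES (2, 'Books', 2000.0, NULL, NULL)")

# Key constraint: did is the key, so a department has at most one manager.
second_manager_rejected = rejected(
    "INSERT INTO Dept_Mgr VALUES (1, 'Toy', 1000.0, '222-22-2222', '2021-01-01')")

# NO ACTION: an employee who manages a department cannot be deleted.
manager_delete_rejected = rejected(
    "DELETE FROM Employees WHERE ssn = '111-11-1111'")

print(no_manager_rejected, second_manager_rejected, manager_delete_rejected)
```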
Participation Constraints in SQL
• We cannot capture participation constraints using the translation approach that creates a distinct table for the relationship set
• The NOT NULL constraint would prevent the firing of a manager, but does not ensure that a manager is initially appointed for each department
CREATE TABLE Manages (ssn CHAR(11) NOT NULL, did INTEGER NOT NULL, since DATE, PRIMARY KEY (did), FOREIGN KEY (ssn) REFERENCES Employees, FOREIGN KEY (did) REFERENCES Departments)
CREATE TABLE Department (did INTEGER, ….
Assertion to capture Works_In total participation
• If we want to enforce a total participation constraint such as “each employee must work in at least one department”, we can use assertions as follows:
CREATE TABLE Product (id INT NOT NULL, category INT NOT NULL, price DECIMAL, PRIMARY KEY (id, category) )
CREATE TABLE Product_Order (no INT NOT NULL, prd_cat INT NOT NULL, prdid INT NOT NULL, custid INT NOT NULL, PRIMARY KEY (no), FOREIGN KEY (prd_cat, prdid) REFERENCES Product(category, id) ON UPDATE CASCADE, FOREIGN KEY (custid) REFERENCES Customer(id))
CREATE TABLE Customer (id INT NOT NULL, PRIMARY KEY (id) )
Review: Weak Entities
• A weak entity can be identified uniquely only by considering the primary key of another (owner) entity.
– Owner entity set and weak entity set must participate in a one-to-many relationship set (1 owner, many weak entities)
– Weak entity set must have total participation in this identifying relationship set
– It has both a key and total participation constraints
[ER diagram: Employees (ssn, name), Policy (cost), Dependents (pname, age)]
Translating Weak Entity Sets
• Weak entity set and identifying relationship set are translated into a single table
– When the owner entity is deleted, all owned weak entities must also be deleted
CREATE TABLE Depd_Policy ( pname CHAR(20), age INTEGER, cost REAL, ssn CHAR(11), PRIMARY KEY (pname, ssn), FOREIGN KEY (ssn) REFERENCES Employees ON DELETE CASCADE)
ssn cannot be null since it is part of the PK
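The cascade on the owner's deletion can be observed with a couple of rows. A minimal sketch assuming SQLite via Python's sqlite3 module; the employee and dependents are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE Employees (ssn CHAR(11), name CHAR(20), lot INTEGER, "
    "PRIMARY KEY (ssn))"
)
conn.execute(
    "CREATE TABLE Depd_Policy (pname CHAR(20), age INTEGER, cost REAL, "
    "ssn CHAR(11), PRIMARY KEY (pname, ssn), "
    "FOREIGN KEY (ssn) REFERENCES Employees(ssn) ON DELETE CASCADE)"
)
conn.execute("INSERT INTO Employees VALUES ('111-11-1111', 'Ayse', 10)")
conn.executemany("INSERT INTO Depd_Policy VALUES (?, ?, ?, ?)", [
    ("Can", 5, 100.0, "111-11-1111"),
    ("Elif", 8, 120.0, "111-11-1111"),
])

before = conn.execute("SELECT COUNT(*) FROM Depd_Policy").fetchone()[0]

# Deleting the owner removes every weak entity it owns.
conn.execute("DELETE FROM Employees WHERE ssn = '111-11-1111'")
after = conn.execute("SELECT COUNT(*) FROM Depd_Policy").fetchone()[0]

print(before, after)
```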
Review: ISA Hierarchies
• As in C++, or other PLs, attributes are inherited
• If we declare A ISA B, every A entity is also considered to be a B entity
[ER diagram: Employees (ssn, name); Hourly_Emps (hourly_wages, hours_worked) and Contract_Emps (contractid) are each related to Employees by ISA]
• Overlap constraints: Can Joe be an Hourly_Emps as well as a Contract_Emps entity? (Allowed/disallowed)
• Covering constraints: Does every Employees entity also have to be an Hourly_Emps or a Contract_Emps entity? (Yes/no)
Translating ISA Hierarchies to Relations
• General approach: 3 relations: map each of the entity sets Employees, Hourly_Emps and Contract_Emps to a distinct relation
– Hourly_Emps: Every employee is recorded in Employees.
• For hourly emps, extra info is recorded in Hourly_Emps (hourly_wages, hours_worked, ssn)
• Must delete the Hourly_Emps tuple if the referenced Employees tuple is deleted
– Queries involving all employees are easy, but those involving just Hourly_Emps require a join to get some attributes
• Alternative: Just Hourly_Emps and Contract_Emps
– Hourly_Emps: ssn, name, lot, hourly_wages, hours_worked
– Each employee must be in one of these two subclasses
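The general three-relation approach can be sketched for the Hourly_Emps branch, showing the join it requires and the cascade on delete. This assumes SQLite via Python's sqlite3 module; the column types and sample rows are assumptions, since the slides give only attribute names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE Employees (ssn CHAR(11), name CHAR(20), lot INTEGER, "
    "PRIMARY KEY (ssn))"
)
conn.execute(
    "CREATE TABLE Hourly_Emps (hourly_wages REAL, hours_worked INTEGER, "
    "ssn CHAR(11), PRIMARY KEY (ssn), "
    "FOREIGN KEY (ssn) REFERENCES Employees(ssn) ON DELETE CASCADE)"
)
conn.executemany("INSERT INTO Employees VALUES (?, ?, ?)", [
    ("111-11-1111", "Ayse", 10), ("222-22-2222", "Mehmet", 12),
])
conn.execute("INSERT INTO Hourly_Emps VALUES (15.0, 40, '111-11-1111')")

# Inherited attributes (name) live in Employees, so a join is needed.
hourly = conn.execute(
    "SELECT E.name, H.hourly_wages FROM Employees E, Hourly_Emps H "
    "WHERE E.ssn = H.ssn"
).fetchall()

# Deleting the Employees tuple also deletes the Hourly_Emps tuple.
conn.execute("DELETE FROM Employees WHERE ssn = '111-11-1111'")
hourly_left = conn.execute("SELECT COUNT(*) FROM Hourly_Emps").fetchone()[0]

print(hourly, hourly_left)
```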
Aggregation
• Employees, Projects, and Departments entity sets and the Sponsors relationship set are mapped as described previously
• For the Monitors relationship set, create a relation with the following attributes:
– The key attributes of Employees => ssn
– The key attributes of Sponsors => (did, pid)
– The descriptive attributes of Monitors => until
[ER diagram: Employees (ssn, name) related by Monitors (until) to the aggregation of Projects (pid, started_on, pbudget), Sponsors (since), and Departments (did, dname, budget)]
Review: Binary vs. Ternary Relationships
[ER diagram, bad design: ternary relationship Covers among Employees (ssn, name), Policies (policyid, cost), and Dependents (pname, age)]
[ER diagram, better design: binary relationships Purchaser (between Employees and Policies) and Beneficiary (between Policies and Dependents)]
• If we have additional requirements:
– A policy cannot be owned jointly by two or more employees
– Every policy must be owned by some employee
– Dependents is a weak entity, and uniquely identified by taking pname in conjunction with policyid of a policy entity
• ER diagram is inaccurate
• What are the additional constraints in the 2nd diagram?
Binary vs. Ternary Relationships
• The key constraints allow us to combine Purchaser with Policies and Beneficiary with Dependents
• Participation constraints lead to NOT NULL constraints
• What if Policies is a weak entity set?
CREATE TABLE Policies ( policyid INTEGER, cost REAL, ssn CHAR(11) NOT NULL, PRIMARY KEY (policyid), FOREIGN KEY (ssn) REFERENCES Employees ON DELETE CASCADE)
CREATE TABLE Dependents ( pname CHAR(20), age INTEGER, policyid INTEGER, PRIMARY KEY (pname, policyid), FOREIGN KEY (policyid) REFERENCES Policies ON DELETE CASCADE)
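The two tables above chain their constraints: a policy needs an owner, and deleting an employee removes the policies and, through them, the dependents. A minimal sketch assuming SQLite via Python's sqlite3 module, with hypothetical sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE Employees (ssn CHAR(11), name CHAR(20), lot INTEGER, "
    "PRIMARY KEY (ssn))"
)
conn.execute(
    "CREATE TABLE Policies (policyid INTEGER, cost REAL, ssn CHAR(11) NOT NULL, "
    "PRIMARY KEY (policyid), "
    "FOREIGN KEY (ssn) REFERENCES Employees(ssn) ON DELETE CASCADE)"
)
conn.execute(
    "CREATE TABLE Dependents (pname CHAR(20), age INTEGER, policyid INTEGER, "
    "PRIMARY KEY (pname, policyid), "
    "FOREIGN KEY (policyid) REFERENCES Policies(policyid) ON DELETE CASCADE)"
)
conn.execute("INSERT INTO Employees VALUES ('111-11-1111', 'Ayse', 10)")
conn.execute("INSERT INTO Policies VALUES (1, 250.0, '111-11-1111')")
conn.execute("INSERT INTO Dependents VALUES ('Can', 5, 1)")

# Participation constraint: a policy without an owner is rejected.
try:
    conn.execute("INSERT INTO Policies VALUES (2, 300.0, NULL)")
    ownerless_rejected = False
except sqlite3.IntegrityError:
    ownerless_rejected = True

# Deleting the employee cascades to Policies and then to Dependents.
conn.execute("DELETE FROM Employees WHERE ssn = '111-11-1111'")
policies_left = conn.execute("SELECT COUNT(*) FROM Policies").fetchone()[0]
dependents_left = conn.execute("SELECT COUNT(*) FROM Dependents").fetchone()[0]

print(ownerless_rejected, policies_left, dependents_left)
```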
Relational Model: Summary
• A tabular representation of data
• Simple and intuitive, currently the most widely used
• Integrity constraints can be specified by the DBA, based on application semantics. DBMS checks for violations – Two important ICs: primary and foreign keys
– In addition, we always have domain constraints
• Powerful and natural query languages exist
• Rules to translate ER to relational model
Exercise - 1
• Answer the following questions, which are based on the following relational schema:
• Give an example of a foreign key constraint that involves the Dept relation. What are the options for enforcing this constraint when a user attempts to delete a Dept tuple?
• Write the SQL statements required to create the preceding relations, including appropriate versions of all primary and foreign key integrity constraints
• Define the Dept relation in SQL so that every department is guaranteed to have a manager
• Write an SQL statement to add John Doe as an employee with eid = 101, age = 32 and salary = 15,000
• Write an SQL statement to give every employee a 10 percent raise
• Write an SQL statement to delete the Toy department. Given the referential integrity constraints you chose for this schema, explain what happens when this statement is executed