2. Relational Data Model
Topics:
1. Relational Data Model Structure
2. Relational Data Model Operations
2.1 Relational Data Model Structure
Topics:
Why relational data model
Brief history of relational data model
Basic concepts
Terminology
Schemas
Properties of relations
Relation keys
2.1 Relational Data Model Structure Why Relational Data Model?
We devote more time to the relational data model than to any other data model, for the following reasons:
The model is easy to understand.
It has simple concepts: tables, columns, rows and constraints.
It has a mathematical foundation.
Many database management system products are based on the relational data model.
2.1 Relational Data Model Structure Brief History of Relational Data Model
Introduced by E.F. Codd of IBM in 1970.
A prototype RDBMS called System R was developed by IBM in the late 1970s.
SQL was developed by IBM as a language for RDBMSs.
The commercial RDBMSs SQL/DS and DB2 were developed by IBM Corporation, and Oracle by Oracle Corporation.
INGRES was developed at the University of California, Berkeley, and later made available as a commercial RDBMS. It used the query language QUEL.
Many more commercial RDBMSs were developed in the 1980s.
2.1 Relational Data Model Structure Concepts
Basic concepts of the relational data model:
A database consists of one or more relations.
A relation is described by a name and the names of one or more attributes, and it consists of zero or more tuples.
A tuple consists of one value for each attribute of the relation.
An attribute takes its values from a domain, and each tuple holds exactly one value for each attribute.
A domain consists of all allowed atomic values for one or more attributes.
2.1 Relational Data Model Structure Some More Concepts
• The degree of a relation is the number of its attributes.
• The cardinality of a relation is the number of its tuples.
2.1 Relational Data Model Structure Relation
A relation is analogous to a table of rows and columns as shown below:
Relation: Employees
Employee Name | Designation | Department Name
Niranjan | Software Engineer | Development
Praveen | Software Engineer | Development
Dinesh | Director | Marketing
Harish | Manager | Administration
The column headings are the attributes, and each line below them is a tuple: 12 values in 4 tuples and 3 attributes.
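The degree and cardinality of the Employees relation can be computed directly. A minimal sketch, assuming the relation is held in Python as a tuple of attribute names plus a list of value tuples (a representation chosen only for illustration):

```python
# Employees relation: attribute names and tuples (illustrative representation).
ATTRIBUTES = ("Employee Name", "Designation", "Department Name")
TUPLES = [
    ("Niranjan", "Software Engineer", "Development"),
    ("Praveen", "Software Engineer", "Development"),
    ("Dinesh", "Director", "Marketing"),
    ("Harish", "Manager", "Administration"),
]


def degree(attributes):
    """Degree: the number of attributes of the relation."""
    return len(attributes)


def cardinality(tuples):
    """Cardinality: the number of distinct tuples of the relation."""
    return len(set(tuples))


print(degree(ATTRIBUTES), cardinality(TUPLES))    # 3 4
print(degree(ATTRIBUTES) * cardinality(TUPLES))   # 12 values in total
```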
2.1 Relational Data Model Structure Alternative Terminology
The terms relation, attribute and tuple come from mathematics. Developers and many RDBMSs use the terms table, column and row instead:
Table: EMPLOYEES
Employee Name | Designation | Department Name
Niranjan | Software Engineer | Development
Praveen | Software Engineer | Development
Dinesh | Director | Marketing
Harish | Manager | Administration
The first line holds the column names; each line below it is a row, and each vertical list of values is a column.
2.1 Relational Data Model Structure One More Alternative Terminology
The term record is also used instead of tuple/row, and the term field instead of attribute/column.
Table: EMPLOYEES
Employee Name | Designation | Department Name
Niranjan | Software Engineer | Development
Praveen | Software Engineer | Development
Dinesh | Director | Marketing
Harish | Manager | Administration
The first line holds the field names; each line below it is a record, and each vertical list of values is a field.
2.1 Relational Data Model Structure Schema
A schema is a description of data.
A relation schema describes a relation by its attributes and their domains. A database schema is the collection of relation schemas, together with the schemas of the relationships between the relations.
2.1 Relational Data Model Structure Properties of Relations
Every relation has the following properties:
It has a distinct name.
Each attribute has a distinct name.
Each attribute has an atomic value in a tuple.
All values of an attribute come from the same domain.
Each tuple is distinct (there are no duplicate tuples).
There is no significance to the order of attributes.
There is no significance to the order of tuples.
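Several of these properties can be tested mechanically on a single relation. A minimal sketch, assuming a relation is given as a tuple of attribute names and a list of Python tuples; the atomicity and domain checks are simplifications (a collection-valued value counts as non-atomic, and a domain is approximated by a Python type). Distinct relation names are a database-level rule, and the two ordering properties describe what the model ignores, so neither is checked here.

```python
def check_relation(attributes, tuples):
    """Return messages describing violated relation properties (empty list if none)."""
    problems = []
    # Each attribute has a distinct name.
    if len(set(attributes)) != len(attributes):
        problems.append("duplicate attribute names")
    for t in tuples:
        # Each tuple supplies exactly one value per attribute.
        if len(t) != len(attributes):
            problems.append(f"tuple {t!r} does not have one value per attribute")
        # Atomic values only (simplified: collections are treated as non-atomic).
        for value in t:
            if isinstance(value, (list, tuple, set, frozenset, dict)):
                problems.append(f"non-atomic value {value!r}")
    # All values of an attribute come from one domain (approximated by Python type).
    for i, name in enumerate(attributes):
        types = {type(t[i]) for t in tuples if i < len(t)}
        if len(types) > 1:
            problems.append(f"mixed domains in attribute {name!r}")
    # No duplicate tuples (compared with ==, so unhashable values are allowed).
    if any(t in tuples[:i] for i, t in enumerate(tuples)):
        problems.append("duplicate tuples")
    return problems


print(check_relation(
    ("Employee Name", "Designation", "Department Name"),
    [("Niranjan", "Software Engineer", "Development"),
     ("Praveen", "Software Engineer", "Development"),
     ("Praveen", "Software Engineer", "Development")],
))  # ['duplicate tuples']
```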
2.1 Relational Data Model Structure Distinct Relation Names
Each relation (table) in a database must have a distinct name. For example, the following three relation names in a database are valid:
EMPLOYEES, DEPARTMENTS, PROJECTS
The following three names are invalid, because the name DEPARTMENTS is duplicated:
EMPLOYEES, DEPARTMENTS, DEPARTMENTS
2.1 Relational Data Model Structure Distinct Attribute Names
Each relation (table) in a database must have distinct names for its attributes (columns). For example, relation EMPLOYEES with the following attribute names is valid:
Employee Name, Designation, Department Name
It is invalid to give a relation duplicate attribute names, as shown below:
Employee Name, Designation, Designation, Department Name
Here the attribute name Designation is duplicated.
2.1 Relational Data Model Structure Atomic Values
Each value of each attribute in a relation is atomic, which means the value cannot be divided. A tuple cannot have multiple values for an attribute. The following table shows an attempt to store two designations, Manager and Director, for employee Jimson K John. This is not possible:
Employee Name | Designation | Department Name
Niranjan | Software Engineer | Development
Praveen | Software Engineer | Development
Dinesh | Director | Marketing
Jimson K John | Manager, Director | Development
2.1 Relational Data Model Structure Values From The Same Domain
All values of an attribute must come from the same domain, and all values of a domain are expected to be of the same data type. For example, if an attribute stores employees' ages as integers such as 25, you can use neither the word form “twenty five” nor a non-integer such as 25.5, because neither value is an integer.
2.1 Relational Data Model Structure No Duplicate Tuples
Each relation in a database is expected to have unique tuples: no two tuples are identical in their values for all attributes. The following relation violates this property:
Employee Name | Designation | Department Name
Niranjan | Software Engineer | Development
Praveen | Software Engineer | Development
Dinesh | Director | Marketing
Praveen | Software Engineer | Development
There are two identical tuples for employee “Praveen”.
2.1 Relational Data Model Structure No Significance to Order of Attributes
There is no significance to the order of attributes in a relation, and database languages do not depend on it. The following three schemas describe the same relation:
Employees(Employee Name, Designation, Department Name)
Employees(Designation, Employee Name, Department Name)
Employees(Department Name, Employee Name, Designation)
2.1 Relational Data Model Structure No Significance to Order of Tuples
The order of tuples does not affect the results of operations on relations. For example, the following two tables are equivalent:
First ordering:
Employee Name | Designation | Department Name
Niranjan | Software Engineer | Development
Praveen | Software Engineer | Development
Dinesh | Director | Marketing
Harish | Director | Administration
Second ordering:
Employee Name | Designation | Department Name
Harish | Director | Administration
Niranjan | Software Engineer | Development
Dinesh | Director | Marketing
Praveen | Software Engineer | Development
2.1 Relational Data Model Structure Relation Keys
One property of a relation is that its tuples are distinct; that is, no two tuples have the same values for all attributes. To keep tuples distinct, the values of one or more attributes taken together must differ in every tuple. A set of attributes that uniquely identifies a tuple is called a key. There are several kinds of keys.
2.1 Relational Data Model Structure Relation Keys - Super Key
A Super Key is a set of one or more attributes that uniquely identifies a tuple within a relation.
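Example relation (Employees):
Eno | Ename | Designation | PAN | Dno
1001 | Niranjan | Software Engineer | NIR01ABC01 | 10
1002 | Praveen | Trainee | PRA02ABC02 | 10
1003 | Prashanth | Admin | PRA03ABC03 | 20
1004 | Srilatha | Software Engineer | SRI04ABC04 | 10
1005 | Sagar | Manager | SAG05ABC05 | 10
For example, {Eno} is a super key, and so is any set of attributes that contains Eno, such as {Eno, Ename} or {Eno, PAN, Dno}.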
There can be multiple super keys for a relation.
A key that consists of more than one attribute is called a composite key.
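The super key definition can be tested against the example tuples. A minimal sketch, assuming the relation is stored as a tuple of attribute names plus a list of value tuples; `is_superkey` is a hypothetical helper. Note that checking one instance can only disprove a super key: whether a set of attributes is a super key is really a property of the schema, which the designer decides.

```python
# Employees relation from the example above.
ATTRS = ("Eno", "Ename", "Designation", "PAN", "Dno")
EMPLOYEES = [
    (1001, "Niranjan", "Software Engineer", "NIR01ABC01", 10),
    (1002, "Praveen", "Trainee", "PRA02ABC02", 10),
    (1003, "Prashanth", "Admin", "PRA03ABC03", 20),
    (1004, "Srilatha", "Software Engineer", "SRI04ABC04", 10),
    (1005, "Sagar", "Manager", "SAG05ABC05", 10),
]


def is_superkey(attrs, rows, key):
    """True if no two rows agree on every attribute in `key` (in this instance)."""
    positions = [attrs.index(a) for a in key]
    seen = set()
    for row in rows:
        value = tuple(row[p] for p in positions)
        if value in seen:
            return False
        seen.add(value)
    return True


print(is_superkey(ATTRS, EMPLOYEES, ("Eno",)))                 # True
print(is_superkey(ATTRS, EMPLOYEES, ("Eno", "Dno")))           # True: superset of a super key
print(is_superkey(ATTRS, EMPLOYEES, ("Designation", "Dno")))   # False: (Software Engineer, 10) repeats
```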
2.1 Relational Data Model Structure Relation Keys - Candidate Key
A Candidate Key is a super key such that no proper subset is a super key.
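Example relation (Employees):
Eno | Ename | Designation | PAN | Dno
1001 | Niranjan | Software Engineer | NIR01ABC01 | 10
1002 | Praveen | Trainee | PRA02ABC02 | 10
1003 | Prashanth | Admin | PRA03ABC03 | 20
1004 | Srilatha | Software Engineer | SRI04ABC04 | 10
1005 | Sagar | Manager | SAG05ABC05 | 10
For example, {Eno} is a candidate key: it is a super key, and it cannot be reduced further. Assuming each employee has a unique PAN, {PAN} is also a candidate key, whereas {Eno, PAN} is a super key but not a candidate key.
A candidate key is minimal: it contains only the attributes needed to uniquely identify tuples, so removing any attribute from it would break uniqueness.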
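Minimality suggests a simple search: try attribute sets from smallest to largest and keep each super key that does not contain a key already found. A minimal sketch over the example tuples, with hypothetical helper names. Because it only sees this instance, it also reports {Ename}: the five names happen to be unique here, but the designer knows names can repeat, so Ename is not a candidate key of the schema.

```python
from itertools import combinations

ATTRS = ("Eno", "Ename", "Designation", "PAN", "Dno")
EMPLOYEES = [
    (1001, "Niranjan", "Software Engineer", "NIR01ABC01", 10),
    (1002, "Praveen", "Trainee", "PRA02ABC02", 10),
    (1003, "Prashanth", "Admin", "PRA03ABC03", 20),
    (1004, "Srilatha", "Software Engineer", "SRI04ABC04", 10),
    (1005, "Sagar", "Manager", "SAG05ABC05", 10),
]


def is_superkey(attrs, rows, key):
    """True if the values of `key` are unique across all rows of this instance."""
    positions = [attrs.index(a) for a in key]
    values = [tuple(row[p] for p in positions) for row in rows]
    return len(set(values)) == len(values)


def candidate_keys(attrs, rows):
    """Minimal super keys of this instance, smallest first."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for key in combinations(attrs, size):
            # A set containing a key already found is a super key but not minimal.
            if any(set(k) <= set(key) for k in keys):
                continue
            if is_superkey(attrs, rows, key):
                keys.append(key)
    return keys


print(candidate_keys(ATTRS, EMPLOYEES))  # [('Eno',), ('Ename',), ('PAN',)]
```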
2.1 Relational Data Model Structure Relation Keys - Primary Key
A Primary Key is a candidate key that is chosen by the database designer as the principal means of identifying tuples uniquely within a relation.
Example relation (Employees):
Eno | Ename | Designation | PAN | Dno
1001 | Niranjan | Software Engineer | NIR01ABC01 | 10
1002 | Praveen | Trainee | PRA02ABC02 | 10
1003 | Prashanth | Admin | PRA03ABC03 | 20
1004 | Srilatha | Software Engineer | SRI04ABC04 | 10
1005 | Sagar | Manager | SAG05ABC05 | 10
Primary key: Eno
By definition, a relation has only one primary key.
Candidate keys other than primary key are called alternate keys.
2.1 Relational Data Model Structure Relation Keys - Foreign Key
A Foreign key is a set of one or more attributes within one relation that matches a candidate key of another relation or possibly the same relation.
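The relation that contains the foreign key is called the referencing relation, whereas the relation that contains the referenced candidate key is called the referenced relation. For example:
SUBJECTS(subj_id, subj_title), primary key subj_id
BOOKS(book_id, book_title, subj_id), primary key book_id; foreign key subj_id references SUBJECTS
CHAPTERS(book_id, ch_no, ch_title), primary key (book_id, ch_no); foreign key book_id references BOOKS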
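A foreign key is only meaningful if every value in the referencing relation matches a key value in the referenced relation. A minimal sketch of checking this for BOOKS.subj_id against SUBJECTS.subj_id; the sample tuples and the helper name `dangling_references` are hypothetical (the example gives only the schemas), and only single-attribute foreign keys are handled.

```python
# Hypothetical sample tuples for SUBJECTS and BOOKS.
SUBJECTS = [
    {"subj_id": "S1", "subj_title": "Databases"},
    {"subj_id": "S2", "subj_title": "Networks"},
]
BOOKS = [
    {"book_id": "B1", "book_title": "Relational Design", "subj_id": "S1"},
    {"book_id": "B2", "book_title": "Routing Basics", "subj_id": "S2"},
    {"book_id": "B3", "book_title": "Unfiled Notes", "subj_id": "S9"},  # S9 is not a subject
]


def dangling_references(referencing, fk, referenced, key):
    """Tuples of `referencing` whose foreign-key value matches no key value in `referenced`."""
    key_values = {row[key] for row in referenced}
    return [row for row in referencing if row[fk] not in key_values]


print([row["book_id"] for row in dangling_references(BOOKS, "subj_id", SUBJECTS, "subj_id")])
# ['B3']
```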
2.2 Relational Data Model Operations
2.2 Relational Data Model Operations Description Language
We will use relational algebra to describe operations on the relational data model. Relational algebra was introduced by E.F. Codd in 1971. Operations can also be specified in relational calculus (tuple relational calculus and domain relational calculus), but we will not cover it.
2.2 Relational Data Model Operations Relational Algebra
Relational algebra is a set-oriented language: one statement operates on all tuples of one or more relations at once, without any looping constructs.
The five fundamental operations are Selection, Projection, Cartesian Product, Union and Set Difference.
Other operations include Join (with a few variations), Intersection and Division.
Selection and Projection are unary operations: they operate on one relation. The other operations listed here are binary operations: they operate on two relations.
2.2 Relational Data Model Operations Relational Algebra Operations
The Relational Algebra operations are as follows:
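SNO | Operation | Notation
1 | Selection | $\sigma_{predicate}(R)$
2 | Projection | $\Pi_{a_1,\ldots,a_n}(R)$
3 | Union | $R \cup S$
4 | Set difference | $R - S$
5 | Intersection | $R \cap S$
6 | Division | $R \div S$
7 | Aggregate | $\mathcal{G}_{AL}(R)$
8 | Grouping | ${}_{GA}\mathcal{G}_{AL}(R)$
9 | Cartesian product | $R \times S$
10 | Theta join | $R \bowtie_F S$
11 | Equijoin | $R \bowtie_F S$, where F contains only =
12 | Natural join | $R \bowtie S$
13 | Outer join | $R ⟕ S$ (left outer join)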
2.2 Relational Data Model Operations Relational Algebra – Selection
The Selection operation produces a relation that contains only those tuples that satisfy the specified predicate. A predicate is a condition, built from attributes of the relation and constants, that evaluates to a boolean value (true or false). For example, the selection $\sigma_{Salary > 5000}(Employees)$ returns only those tuples that satisfy the predicate Salary > 5000.
Table: Employees
Eno | Ename | Designation | Salary | Dno
1001 | Niranjan | Software Engineer | 10000 | 10
1002 | Praveen | Trainee | 5000 | 10
1003 | Prashanth | Admin | 6000 | 20
1004 | Sugumar | Software Engineer | 8000 | 10
1005 | Manjunath | Software Engineer | 5000 | 10
Result of selection:
Eno | Ename | Designation | Salary | Dno
1001 | Niranjan | Software Engineer | 10000 | 10
1003 | Prashanth | Admin | 6000 | 20
1004 | Sugumar | Software Engineer | 8000 | 10
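Selection amounts to filtering tuples with a predicate. A minimal sketch, assuming each tuple is represented as a Python dict keyed by attribute name and the predicate is a Python function; `select` is a hypothetical helper, not part of any library.

```python
# Employees relation from the example above, one dict per tuple.
EMPLOYEES = [
    {"Eno": 1001, "Ename": "Niranjan", "Designation": "Software Engineer", "Salary": 10000, "Dno": 10},
    {"Eno": 1002, "Ename": "Praveen", "Designation": "Trainee", "Salary": 5000, "Dno": 10},
    {"Eno": 1003, "Ename": "Prashanth", "Designation": "Admin", "Salary": 6000, "Dno": 20},
    {"Eno": 1004, "Ename": "Sugumar", "Designation": "Software Engineer", "Salary": 8000, "Dno": 10},
    {"Eno": 1005, "Ename": "Manjunath", "Designation": "Software Engineer", "Salary": 5000, "Dno": 10},
]


def select(relation, predicate):
    """sigma_predicate(relation): keep the tuples for which predicate(t) is true."""
    return [t for t in relation if predicate(t)]


result = select(EMPLOYEES, lambda t: t["Salary"] > 5000)
print([t["Eno"] for t in result])  # [1001, 1003, 1004]
```

Note that the two employees earning exactly 5000 are excluded, because the predicate uses a strict comparison.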
2.2 Relational Data Model Operations Relational Algebra – Projection
The Projection operation produces a relation that contains a vertical subset of the given relation: only the specified attributes, with any duplicate tuples removed.
Given the following relation Employees, the operation $\Pi_{Eno, Ename, Salary}(Employees)$ gives the relation shown below:
Employees:
Eno | Ename | Designation | Salary | Dno
1001 | Niranjan | Software Engineer | 10000 | 10
1002 | Praveen | Trainee | 5000 | 10
1003 | Prashanth | Admin | 6000 | 20
1004 | Sugumar | Software Engineer | 8000 | 10
1005 | Manjunath | Trainee | 5000 | 10
Result relation:
Eno | Ename | Salary
1001 | Niranjan | 10000
1002 | Praveen | 5000
1003 | Prashanth | 6000
1004 | Sugumar | 8000
1005 | Manjunath | 5000
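Projection keeps the named attributes and, because the result is a relation, drops any duplicate tuples that appear. A minimal sketch, assuming dict-per-tuple relations and a hypothetical `project` helper; projecting on Designation alone shows the duplicate elimination.

```python
EMPLOYEES = [
    {"Eno": 1001, "Ename": "Niranjan", "Designation": "Software Engineer", "Salary": 10000, "Dno": 10},
    {"Eno": 1002, "Ename": "Praveen", "Designation": "Trainee", "Salary": 5000, "Dno": 10},
    {"Eno": 1003, "Ename": "Prashanth", "Designation": "Admin", "Salary": 6000, "Dno": 20},
    {"Eno": 1004, "Ename": "Sugumar", "Designation": "Software Engineer", "Salary": 8000, "Dno": 10},
    {"Eno": 1005, "Ename": "Manjunath", "Designation": "Trainee", "Salary": 5000, "Dno": 10},
]


def project(relation, attributes):
    """Pi_attributes(relation): keep only the named attributes, dropping duplicate tuples."""
    result, seen = [], set()
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:          # first occurrence only
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result


print(len(project(EMPLOYEES, ["Eno", "Ename", "Salary"])))  # 5
print(project(EMPLOYEES, ["Designation"]))  # three tuples, not five
```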
2.2 Relational Data Model Operations Relational Algebra – Union
Given two relations, the Union operation produces a relation that contains all tuples that are in either relation or in both.
Given the following relations Employees and Contract_Employees, the operation $Employees \cup Contract\_Employees$ gives the relation shown below:
Employees:
Eno | Ename | Dno
1001 | Niranjan | 10
1002 | Praveen | 10
1003 | Prashanth | 20
1004 | Sugumar | 10
1005 | Manjunath | 10
Contract_Employees:
Eno | Ename | Dno
5001 | Ravi | 10
5002 | Akshay | 10
Result relation:
Eno | Ename | Dno
1001 | Niranjan | 10
1002 | Praveen | 10
1003 | Prashanth | 20
1004 | Sugumar | 10
1005 | Manjunath | 10
5001 | Ravi | 10
5002 | Akshay | 10
Notes: (1) The two relations must be union-compatible, i.e., have the same number of attributes, with corresponding attributes drawn from the same domains. (2) The result does not contain any duplicate tuples.
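A minimal sketch of union, assuming a relation is modelled as a pair (heading, rows): a tuple of attribute names and a list of value tuples. The union-compatibility test here is a simplification that compares attribute names and order; a real system compares domains.

```python
# A relation is modelled as (heading, rows).
EMPLOYEES = (("Eno", "Ename", "Dno"), [
    (1001, "Niranjan", 10), (1002, "Praveen", 10), (1003, "Prashanth", 20),
    (1004, "Sugumar", 10), (1005, "Manjunath", 10),
])
CONTRACT_EMPLOYEES = (("Eno", "Ename", "Dno"), [(5001, "Ravi", 10), (5002, "Akshay", 10)])


def union(r, s):
    """R union S: every tuple that is in R, in S, or in both, listed once."""
    (r_heading, r_rows), (s_heading, s_rows) = r, s
    # Simplified union-compatibility check: same attribute names in the same order.
    if r_heading != s_heading:
        raise ValueError("relations are not union-compatible")
    # dict.fromkeys keeps the first occurrence of each tuple and preserves order.
    return r_heading, list(dict.fromkeys(r_rows + s_rows))


heading, rows = union(EMPLOYEES, CONTRACT_EMPLOYEES)
print(len(rows))  # 7
```

Taking the union of a relation with itself returns the same five tuples, which illustrates note (2).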
2.2 Relational Data Model Operations Relational Algebra – Set Difference
Given two relations, the Set Difference operation produces a relation that contains all tuples of the first relation that are not in the second relation.
Given the following relations Employees and Trainee_Employees, the operation $Employees - Trainee\_Employees$ gives the relation shown below:
Employees:
Eno | Ename | Dno
1001 | Niranjan | 10
1002 | Praveen | 10
1003 | Prashanth | 20
1004 | Sugumar | 10
1005 | Manjunath | 10
Trainee_Employees:
Eno | Ename | Dno
5001 | Ravi | 10
5002 | Akshay | 10
1002 | Praveen | 10
1005 | Manjunath | 10
Result relation:
Eno | Ename | Dno
1001 | Niranjan | 10
1003 | Prashanth | 20
1004 | Sugumar | 10
Note that the two relations must be union-compatible.
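A minimal sketch of set difference under the same assumed (heading, rows) representation and simplified compatibility check. Reversing the operands also shows that difference, unlike union, is not commutative.

```python
EMPLOYEES = (("Eno", "Ename", "Dno"), [
    (1001, "Niranjan", 10), (1002, "Praveen", 10), (1003, "Prashanth", 20),
    (1004, "Sugumar", 10), (1005, "Manjunath", 10),
])
TRAINEE_EMPLOYEES = (("Eno", "Ename", "Dno"), [
    (5001, "Ravi", 10), (5002, "Akshay", 10), (1002, "Praveen", 10), (1005, "Manjunath", 10),
])


def difference(r, s):
    """R - S: tuples of R that are not in S."""
    (r_heading, r_rows), (s_heading, s_rows) = r, s
    if r_heading != s_heading:  # simplified union-compatibility check
        raise ValueError("relations are not union-compatible")
    excluded = set(s_rows)
    return r_heading, [t for t in dict.fromkeys(r_rows) if t not in excluded]


heading, rows = difference(EMPLOYEES, TRAINEE_EMPLOYEES)
print([t[0] for t in rows])  # [1001, 1003, 1004]
print([t[0] for t in difference(TRAINEE_EMPLOYEES, EMPLOYEES)[1]])  # [5001, 5002]
```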
2.2 Relational Data Model Operations Relational Algebra – Intersection
Given two relations, the Intersection operation produces a relation that contains the tuples common to both relations.
Given the following relations Employees and Trainee_Employees, the operation $Employees \cap Trainee\_Employees$ gives the relation shown below:
Employees:
Eno | Ename | Dno
1001 | Niranjan | 10
1002 | Praveen | 10
1003 | Prashanth | 20
1004 | Sugumar | 10
1005 | Manjunath | 10
Trainee_Employees:
Eno | Ename | Dno
5001 | Ravi | 10
5002 | Akshay | 10
1002 | Praveen | 10
1005 | Manjunath | 10
Result relation:
Eno | Ename | Dno
1002 | Praveen | 10
1005 | Manjunath | 10
Note that the two relations must be union-compatible.
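Intersection is not one of the five fundamental operations because it can be derived from set difference: R ∩ S = R − (R − S). A minimal sketch that computes it exactly that way, under the same assumed (heading, rows) representation:

```python
EMPLOYEES = (("Eno", "Ename", "Dno"), [
    (1001, "Niranjan", 10), (1002, "Praveen", 10), (1003, "Prashanth", 20),
    (1004, "Sugumar", 10), (1005, "Manjunath", 10),
])
TRAINEE_EMPLOYEES = (("Eno", "Ename", "Dno"), [
    (5001, "Ravi", 10), (5002, "Akshay", 10), (1002, "Praveen", 10), (1005, "Manjunath", 10),
])


def difference(r, s):
    """R - S: tuples of R that are not in S."""
    (r_heading, r_rows), (s_heading, s_rows) = r, s
    if r_heading != s_heading:  # simplified union-compatibility check
        raise ValueError("relations are not union-compatible")
    excluded = set(s_rows)
    return r_heading, [t for t in dict.fromkeys(r_rows) if t not in excluded]


def intersection(r, s):
    """R intersect S, derived from difference: R - (R - S)."""
    return difference(r, difference(r, s))


heading, rows = intersection(EMPLOYEES, TRAINEE_EMPLOYEES)
print([t[0] for t in rows])  # [1002, 1005]
```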
2.2 Relational Data Model Operations Relational Algebra – Cartesian Product
Given two relations, the Cartesian Product operation produces a relation that is a concatenation of every tuple of the first relation with every tuple of the second relation.
The Cartesian product of relations Employees and Departments, written as $Employees \times Departments$, is shown below:
Employees (2 tuples × 3 attributes):
Eno | Ename | Dno
1001 | Niranjan | 10
1003 | Prashanth | 20
Departments (4 tuples × 2 attributes):
Dno | Dname
10 | Development
20 | Admin
30 | Marketing
40 | Research
Result (8 tuples × 5 attributes):
Eno | Ename | Dno | Dno | Dname
1001 | Niranjan | 10 | 10 | Development
1001 | Niranjan | 10 | 20 | Admin
1001 | Niranjan | 10 | 30 | Marketing
1001 | Niranjan | 10 | 40 | Research
1003 | Prashanth | 20 | 10 | Development
1003 | Prashanth | 20 | 20 | Admin
1003 | Prashanth | 20 | 30 | Marketing
1003 | Prashanth | 20 | 40 | Research
The result has 2 × 4 = 8 tuples and 3 + 2 = 5 attributes.
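A minimal sketch of the Cartesian product, assuming the (heading, rows) representation. Prefixing attribute names with the relation name is an illustrative choice that keeps the two Dno columns distinguishable; it is not part of the operation itself.

```python
EMPLOYEES = (("Eno", "Ename", "Dno"), [(1001, "Niranjan", 10), (1003, "Prashanth", 20)])
DEPARTMENTS = (("Dno", "Dname"), [(10, "Development"), (20, "Admin"), (30, "Marketing"), (40, "Research")])


def product(r, r_name, s, s_name):
    """R x S: concatenate every tuple of R with every tuple of S."""
    (r_heading, r_rows), (s_heading, s_rows) = r, s
    # Qualify attribute names so duplicates such as Dno stay distinct.
    heading = tuple(f"{r_name}.{a}" for a in r_heading) + tuple(f"{s_name}.{a}" for a in s_heading)
    rows = [rt + st for rt in r_rows for st in s_rows]
    return heading, rows


heading, rows = product(EMPLOYEES, "Employees", DEPARTMENTS, "Departments")
print(len(rows), len(heading))  # 8 5
```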
2.2 Relational Data Model Operations Relational Algebra – Theta Join
The Theta Join (θ join) operation produces a relation that contains the tuples of the Cartesian product of the two given relations that satisfy a predicate F.
The theta join of relations Employees and Departments, written as $Employees \bowtie_F Departments$, where F is (Employees.Dno < Departments.Dno), is shown below:
Employees:
Eno | Ename | Dno
1001 | Niranjan | 10
1003 | Prashanth | 20
Departments:
Dno | Dname
10 | Development
20 | Admin
30 | Marketing
Result relation:
Eno | Ename | Dno | Dno | Dname
1001 | Niranjan | 10 | 20 | Admin
1001 | Niranjan | 10 | 30 | Marketing
1003 | Prashanth | 20 | 30 | Marketing
In general, a theta join can be expressed with selection and Cartesian product:
$R \bowtie_F S = \sigma_F(R \times S)$
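The identity $R \bowtie_F S = \sigma_F(R \times S)$ translates directly into code. A minimal sketch under the assumed (heading, rows) representation, where the predicate receives each product tuple as a dict keyed by qualified attribute names:

```python
EMPLOYEES = (("Eno", "Ename", "Dno"), [(1001, "Niranjan", 10), (1003, "Prashanth", 20)])
DEPARTMENTS = (("Dno", "Dname"), [(10, "Development"), (20, "Admin"), (30, "Marketing")])


def product(r, r_name, s, s_name):
    """R x S with attribute names qualified by relation name."""
    (r_heading, r_rows), (s_heading, s_rows) = r, s
    heading = tuple(f"{r_name}.{a}" for a in r_heading) + tuple(f"{s_name}.{a}" for a in s_heading)
    return heading, [rt + st for rt in r_rows for st in s_rows]


def theta_join(r, r_name, s, s_name, predicate):
    """R join_F S computed literally as sigma_F(R x S)."""
    heading, rows = product(r, r_name, s, s_name)
    return heading, [row for row in rows if predicate(dict(zip(heading, row)))]


heading, rows = theta_join(EMPLOYEES, "Employees", DEPARTMENTS, "Departments",
                           lambda t: t["Employees.Dno"] < t["Departments.Dno"])
for row in rows:
    print(row)
```

Building the full product first is the simplest correct strategy but not an efficient one; database systems normally avoid materialising it.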
2.2 Relational Data Model Operations Relational Algebra – Equijoin
The Equijoin operation produces a relation that contains the tuples of the Cartesian product of the two given relations that satisfy a predicate F containing only equality operators (=).
The equijoin of relations Employees and Departments, written as $Employees \bowtie_F Departments$, where F is (Employees.Dno = Departments.Dno), is shown below:
Employees:
Eno | Ename | Dno
1001 | Niranjan | 10
1003 | Prashanth | 20
Departments:
Dno | Dname
10 | Development
20 | Admin
30 | Marketing
Result relation:
Eno | Ename | Dno | Dno | Dname
1001 | Niranjan | 10 | 10 | Development
1003 | Prashanth | 20 | 20 | Admin
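Because an equijoin predicate only tests equality, it can be evaluated without forming the Cartesian product. The sketch below uses a hash join, one common evaluation strategy (named here as an illustration, not as the only method): index one relation by its join attribute, then probe that index with each tuple of the other. The (heading, rows) representation and the `equijoin` helper are assumptions.

```python
EMPLOYEES = (("Eno", "Ename", "Dno"), [(1001, "Niranjan", 10), (1003, "Prashanth", 20)])
DEPARTMENTS = (("Dno", "Dname"), [(10, "Development"), (20, "Admin"), (30, "Marketing")])


def equijoin(r, s, r_attr, s_attr):
    """R join S on r_attr = s_attr, evaluated as a hash join; both join columns are kept."""
    (r_heading, r_rows), (s_heading, s_rows) = r, s
    i, j = r_heading.index(r_attr), s_heading.index(s_attr)
    # Build phase: group S tuples by their join-attribute value.
    buckets = {}
    for st in s_rows:
        buckets.setdefault(st[j], []).append(st)
    # Probe phase: each R tuple is paired with the S tuples that share its value.
    rows = [rt + st for rt in r_rows for st in buckets.get(rt[i], [])]
    return r_heading + s_heading, rows


heading, rows = equijoin(EMPLOYEES, DEPARTMENTS, "Dno", "Dno")
print(rows)  # [(1001, 'Niranjan', 10, 10, 'Development'), (1003, 'Prashanth', 20, 20, 'Admin')]
```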
2.2 Relational Data Model Operations Relational Algebra – Natural Join
The Natural Join operation is an equijoin of the two given relations over all their common attributes, in which the result contains only one occurrence of each common attribute.
The natural join of relations Employees and Departments, written as $Employees \bowtie Departments$, is shown below. The common attribute is Dno.
Employees:
Eno | Ename | Dno
1001 | Niranjan | 10
1003 | Prashanth | 20
Departments:
Dno | Dname
10 | Development
20 | Admin
30 | Marketing
Result relation:
Eno | Ename | Dno | Dname
1001 | Niranjan | 10 | Development
1003 | Prashanth | 20 | Admin
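A minimal sketch of the natural join under the assumed (heading, rows) representation: find the common attributes by name, require equal values on all of them, and keep each common attribute once.

```python
EMPLOYEES = (("Eno", "Ename", "Dno"), [(1001, "Niranjan", 10), (1003, "Prashanth", 20)])
DEPARTMENTS = (("Dno", "Dname"), [(10, "Development"), (20, "Admin"), (30, "Marketing")])


def natural_join(r, s):
    """R natural-join S over all attributes with the same name."""
    (r_heading, r_rows), (s_heading, s_rows) = r, s
    common = [a for a in r_heading if a in s_heading]
    s_extra = [a for a in s_heading if a not in common]  # S attributes not already in R
    rows = []
    for rt in r_rows:
        rd = dict(zip(r_heading, rt))
        for st in s_rows:
            sd = dict(zip(s_heading, st))
            if all(rd[a] == sd[a] for a in common):
                rows.append(rt + tuple(sd[a] for a in s_extra))
    return r_heading + tuple(s_extra), rows


heading, rows = natural_join(EMPLOYEES, DEPARTMENTS)
print(heading)  # ('Eno', 'Ename', 'Dno', 'Dname')
print(rows)     # [(1001, 'Niranjan', 10, 'Development'), (1003, 'Prashanth', 20, 'Admin')]
```

If the two relations share no attribute, `all()` over an empty list is true, so the function degenerates to a Cartesian product, matching the usual definition.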
2.2 Relational Data Model Operations Relational Algebra – Left Outer Join
The Left Outer Join operation of two given relations is a join in which tuples from the left relation that have no matching tuples in the right relation for the specified predicate are also included in the result, with null values for the attributes of the right relation.
The left outer join of relations Courses and Students, written as $Courses ⟕_P Students$ with predicate P: (Courses.Cno = Students.Cno), is shown below:
Courses:
Cno | Cname
C01 | Java
C02 | C++
C03 | DBMS
C04 | Android
C05 | jQuery
Students:
Sno | Sname | Cno
101 | Akhil | C01
102 | Akhil | C03
103 | Prithvi | C03
104 | Chetan | C01
Result relation:
Cno | Cname | Sno | Sname | Cno
C01 | Java | 101 | Akhil | C01
C01 | Java | 104 | Chetan | C01
C02 | C++ | NULL | NULL | NULL
C03 | DBMS | 102 | Akhil | C03
C03 | DBMS | 103 | Prithvi | C03
C04 | Android | NULL | NULL | NULL
C05 | jQuery | NULL | NULL | NULL
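A minimal sketch of the left outer join, assuming the (heading, rows) representation, with Python's None standing in for NULL and the predicate given as a function of one tuple from each relation:

```python
COURSES = (("Cno", "Cname"), [
    ("C01", "Java"), ("C02", "C++"), ("C03", "DBMS"), ("C04", "Android"), ("C05", "jQuery"),
])
STUDENTS = (("Sno", "Sname", "Cno"), [
    (101, "Akhil", "C01"), (102, "Akhil", "C03"), (103, "Prithvi", "C03"), (104, "Chetan", "C01"),
])


def left_outer_join(r, s, predicate):
    """Matched (R, S) pairs, plus every unmatched R tuple padded with None (NULL)."""
    (r_heading, r_rows), (s_heading, s_rows) = r, s
    padding = (None,) * len(s_heading)
    rows = []
    for rt in r_rows:
        matches = [st for st in s_rows if predicate(rt, st)]
        if matches:
            rows.extend(rt + st for st in matches)
        else:
            rows.append(rt + padding)  # keep the left tuple even without a match
    return r_heading + s_heading, rows


# Predicate P: Courses.Cno (position 0) = Students.Cno (position 2).
heading, rows = left_outer_join(COURSES, STUDENTS, lambda c, st: c[0] == st[2])
for row in rows:
    print(row)
```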
2.2 Relational Data Model Operations Relational Algebra – Semijoin
The (Left) Semijoin operation of two given relations is a join in which each tuple from the left relation that has at least one matching tuple in the right relation is included in the result relation.
The semijoin of relations Courses and Students, written as $Courses \ltimes_P Students$ with predicate P: (Courses.Cno = Students.Cno), is shown below:
Courses:
Cno | Cname
C01 | Java
C02 | C++
C03 | DBMS
C04 | Android
C05 | jQuery
Students:
Sno | Sname | Cno
101 | Akhil | C01
102 | Akhil | C03
103 | Prithvi | C03
104 | Chetan | C01
Result relation:
Cno | Cname
C01 | Java
C03 | DBMS
Note: Columns from the right relation are not included in the result.
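A minimal sketch of the semijoin under the same assumed representation. Each left tuple appears at most once, however many right tuples match it, and only the left relation's attributes are returned.

```python
COURSES = (("Cno", "Cname"), [
    ("C01", "Java"), ("C02", "C++"), ("C03", "DBMS"), ("C04", "Android"), ("C05", "jQuery"),
])
STUDENTS = (("Sno", "Sname", "Cno"), [
    (101, "Akhil", "C01"), (102, "Akhil", "C03"), (103, "Prithvi", "C03"), (104, "Chetan", "C01"),
])


def semijoin(r, s, predicate):
    """Tuples of R with at least one matching tuple in S; only R's attributes are kept."""
    (r_heading, r_rows), (_, s_rows) = r, s
    return r_heading, [rt for rt in r_rows if any(predicate(rt, st) for st in s_rows)]


heading, rows = semijoin(COURSES, STUDENTS, lambda c, st: c[0] == st[2])
print(rows)  # [('C01', 'Java'), ('C03', 'DBMS')]
```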
2.2 Relational Data Model Operations Relational Algebra – Anti-Semijoin
The (Left) Anti-Semijoin operation of two given relations is a join in which each tuple from the left relation that does not have a matching tuple in the right relation is included in the result relation.
The anti-semijoin of relations Courses and Students, with predicate P: (Courses.Cno = Students.Cno), is shown below:
Courses:
Cno | Cname
C01 | Java
C02 | C++
C03 | DBMS
C04 | Android
C05 | jQuery
Students:
Sno | Sname | Cno
101 | Akhil | C01
102 | Akhil | C03
103 | Prithvi | C03
104 | Chetan | C01
Result relation:
Cno | Cname
C02 | C++
C04 | Android
C05 | jQuery
Note: Columns from the right relation are not included in the result.
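The anti-semijoin is the complement of the semijoin within the left relation: the same test, negated. A minimal sketch under the assumed (heading, rows) representation:

```python
COURSES = (("Cno", "Cname"), [
    ("C01", "Java"), ("C02", "C++"), ("C03", "DBMS"), ("C04", "Android"), ("C05", "jQuery"),
])
STUDENTS = (("Sno", "Sname", "Cno"), [
    (101, "Akhil", "C01"), (102, "Akhil", "C03"), (103, "Prithvi", "C03"), (104, "Chetan", "C01"),
])


def anti_semijoin(r, s, predicate):
    """Tuples of R with no matching tuple in S; only R's attributes are kept."""
    (r_heading, r_rows), (_, s_rows) = r, s
    return r_heading, [rt for rt in r_rows if not any(predicate(rt, st) for st in s_rows)]


heading, rows = anti_semijoin(COURSES, STUDENTS, lambda c, st: c[0] == st[2])
print(rows)  # [('C02', 'C++'), ('C04', 'Android'), ('C05', 'jQuery')]
```

This answers questions such as "which courses have no students?".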
2.2 Relational Data Model Operations Relational Algebra – Division
Given relations R and S, where every attribute of S is also an attribute of R, the Division operation $R \div S$ produces a relation over the attributes of R that are not attributes of S. A tuple t belongs to the result if, for every tuple s of S, the combination of t and s is a tuple of R.
The division of relation Students by relation Special_Courses, written as $Students \div Special\_Courses$, is shown below:
Students:
Sno | Sname | Cno
101 | Akhil | C01
101 | Akhil | C03
102 | Prithvi | C04
103 | Chetan | C01
104 | Pranoy | C01
104 | Pranoy | C03
102 | Prithvi | C02
Special_Courses:
Cno
C01
C03
Result relation:
Sno | Sname
101 | Akhil
104 | Pranoy
This answers a typical question: which students took both C01 and C03?
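The definition of division can be implemented by collecting, for each candidate result tuple, the set of divisor values it is paired with, and keeping it only if that set covers the whole divisor. A minimal sketch under the assumed (heading, rows) representation; `divide` is a hypothetical helper.

```python
STUDENTS = (("Sno", "Sname", "Cno"), [
    (101, "Akhil", "C01"), (101, "Akhil", "C03"), (102, "Prithvi", "C04"),
    (103, "Chetan", "C01"), (104, "Pranoy", "C01"), (104, "Pranoy", "C03"),
    (102, "Prithvi", "C02"),
])
SPECIAL_COURSES = (("Cno",), [("C01",), ("C03",)])


def divide(r, s):
    """R / S: tuples over R's non-S attributes that are paired in R with every tuple of S."""
    (r_heading, r_rows), (s_heading, s_rows) = r, s
    a_pos = [i for i, a in enumerate(r_heading) if a not in s_heading]  # result attributes
    b_pos = [r_heading.index(b) for b in s_heading]                      # S's attributes inside R
    required = set(s_rows)
    paired = {}
    for rt in r_rows:
        key = tuple(rt[i] for i in a_pos)
        paired.setdefault(key, set()).add(tuple(rt[i] for i in b_pos))
    heading = tuple(r_heading[i] for i in a_pos)
    return heading, [key for key, values in paired.items() if required <= values]


heading, rows = divide(STUDENTS, SPECIAL_COURSES)
print(heading, rows)  # ('Sno', 'Sname') [(101, 'Akhil'), (104, 'Pranoy')]
```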
2.2 Relational Data Model Operations Relational Algebra – Division
Another example of the division operation: suppose you have a table of prospective candidates and want to recruit those who have all the required skills listed in another table. The Division operator, $Candidates \div Skills$, gives the result:
Candidates:
Candidate Name | Skill
Chetan | Objective-C
Chetan | C++
Chetan | Java
Sparsh | C++
Sparsh | JSP
Sparsh | Java
Sparsh | HTML
Pranoy | Java
Pranoy | C++
Akhil | PHP
Akhil | C++
Akhil | Objective-C
Akhil | Java
Akhil | JSP
Skills:
Skill
Java
C++
JSP
Result relation:
Candidate Name
Akhil
Sparsh
Note that the table of Skills could be derived as a result of another operator on some other tables.
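Division is not a fundamental operation: it can be expressed as $R \div S = \Pi_A(R) - \Pi_A((\Pi_A(R) \times S) - R)$, where A is the set of attributes of R that are not in S. In words, take every candidate, pair it with every required skill, find the pairs missing from R, and remove the candidates involved. A minimal sketch of that identity using Python sets, under the assumed (heading, rows) representation:

```python
CANDIDATES = (("Candidate Name", "Skill"), [
    ("Chetan", "Objective-C"), ("Chetan", "C++"), ("Chetan", "Java"),
    ("Sparsh", "C++"), ("Sparsh", "JSP"), ("Sparsh", "Java"), ("Sparsh", "HTML"),
    ("Pranoy", "Java"), ("Pranoy", "C++"),
    ("Akhil", "PHP"), ("Akhil", "C++"), ("Akhil", "Objective-C"), ("Akhil", "Java"), ("Akhil", "JSP"),
])
SKILLS = (("Skill",), [("Java",), ("C++",), ("JSP",)])


def divide(r, s):
    """R / S computed as Pi_A(R) - Pi_A((Pi_A(R) x S) - R)."""
    (r_heading, r_rows), (s_heading, s_rows) = r, s
    a_pos = [i for i, a in enumerate(r_heading) if a not in s_heading]
    b_pos = [r_heading.index(b) for b in s_heading]
    # R rewritten as (A-part, B-part) pairs so it can be compared with Pi_A(R) x S.
    r_pairs = {(tuple(t[i] for i in a_pos), tuple(t[i] for i in b_pos)) for t in r_rows}
    pi_a_r = {a for a, _ in r_pairs}                       # Pi_A(R)
    all_pairs = {(a, b) for a in pi_a_r for b in s_rows}   # Pi_A(R) x S
    missing = all_pairs - r_pairs                          # (Pi_A(R) x S) - R
    heading = tuple(r_heading[i] for i in a_pos)
    return heading, pi_a_r - {a for a, _ in missing}       # Pi_A(R) - Pi_A(missing)


heading, result = divide(CANDIDATES, SKILLS)
print(sorted(result))  # [('Akhil',), ('Sparsh',)]
```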
2.2 Relational Data Model Operations Relational Algebra – Aggregate
The Aggregate operator, $\mathcal{G}_{AL}(R)$, produces a relation by applying an aggregate function list to a given relation. An aggregate function returns one value computed from a collection of values. An aggregate function list, AL, consists of pairs of an aggregate function and an attribute. Typical aggregate functions are COUNT, SUM, AVG, MAX and MIN. The aggregate operation $\mathcal{G}_{SUM(Salary),\,COUNT(Eno)}(Employees)$ is shown below:
Employees:
Eno | Ename | Salary | Dno
1001 | Niranjan | 10000 | 10
1002 | Praveen | 5000 | 10
1003 | Prashanth | 6000 | 20
1004 | Sugumar | 8000 | 10
1005 | Manjunath | 5000 | 10
1006 | Dinesh | 25000 | 30
1007 | Harish | 15000 | 20
Result relation:
SUM(Salary) | COUNT(Eno)
74000 | 7
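A minimal sketch of the aggregate operator, assuming dict-per-tuple relations and an aggregate function list given as hypothetical (name, function, attribute) triples; Python's built-in `sum` and `len` stand in for SUM and COUNT. The result is a single tuple.

```python
EMPLOYEES = [
    {"Eno": 1001, "Ename": "Niranjan", "Salary": 10000, "Dno": 10},
    {"Eno": 1002, "Ename": "Praveen", "Salary": 5000, "Dno": 10},
    {"Eno": 1003, "Ename": "Prashanth", "Salary": 6000, "Dno": 20},
    {"Eno": 1004, "Ename": "Sugumar", "Salary": 8000, "Dno": 10},
    {"Eno": 1005, "Ename": "Manjunath", "Salary": 5000, "Dno": 10},
    {"Eno": 1006, "Ename": "Dinesh", "Salary": 25000, "Dno": 30},
    {"Eno": 1007, "Ename": "Harish", "Salary": 15000, "Dno": 20},
]


def aggregate(relation, function_list):
    """G_AL(relation): apply each (name, function, attribute) to the whole relation."""
    return {f"{name}({attr})": func([t[attr] for t in relation])
            for name, func, attr in function_list}


AL = [("SUM", sum, "Salary"), ("COUNT", len, "Eno")]
print(aggregate(EMPLOYEES, AL))  # {'SUM(Salary)': 74000, 'COUNT(Eno)': 7}
```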
2.2 Relational Data Model Operations Relational Algebra – Grouping
The Grouping operator, ${}_{GA}\mathcal{G}_{AL}(R)$, groups the tuples of the given relation by the grouping attributes, GA, applies each aggregate function of the aggregate function list, AL, to each group, and produces a relation that contains the grouping attributes and the result of each aggregate function. Typical aggregate functions are COUNT, SUM, AVG, MAX and MIN. The grouping operation ${}_{Dno}\mathcal{G}_{SUM(Salary),\,COUNT(Eno)}(Employees)$ is shown below:
Employees:
Eno | Ename | Salary | Dno
1001 | Niranjan | 10000 | 10
1002 | Praveen | 5000 | 10
1003 | Prashanth | 6000 | 20
1004 | Sugumar | 8000 | 10
1005 | Manjunath | 5000 | 10
1006 | Dinesh | 25000 | 30
1007 | Harish | 15000 | 20
Result relation:
Dno | SUM(Salary) | COUNT(Eno)
10 | 28000 | 4
20 | 21000 | 2
30 | 25000 | 1
In the result, Dno is the grouping column, SUM(Salary) and COUNT(Eno) are the aggregate columns, and each row holds the aggregate values for one group.
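Grouping partitions the tuples by the grouping attributes and then runs the aggregate function list on each partition. A minimal sketch under the same assumptions as the aggregate example (dict-per-tuple relations, hypothetical (name, function, attribute) triples); groups appear in order of first occurrence.

```python
EMPLOYEES = [
    {"Eno": 1001, "Ename": "Niranjan", "Salary": 10000, "Dno": 10},
    {"Eno": 1002, "Ename": "Praveen", "Salary": 5000, "Dno": 10},
    {"Eno": 1003, "Ename": "Prashanth", "Salary": 6000, "Dno": 20},
    {"Eno": 1004, "Ename": "Sugumar", "Salary": 8000, "Dno": 10},
    {"Eno": 1005, "Ename": "Manjunath", "Salary": 5000, "Dno": 10},
    {"Eno": 1006, "Ename": "Dinesh", "Salary": 25000, "Dno": 30},
    {"Eno": 1007, "Ename": "Harish", "Salary": 15000, "Dno": 20},
]


def group_aggregate(relation, group_attrs, function_list):
    """GA G_AL(relation): one result tuple per distinct value of the grouping attributes."""
    groups = {}
    for t in relation:
        groups.setdefault(tuple(t[a] for a in group_attrs), []).append(t)
    result = []
    for key, members in groups.items():
        row = dict(zip(group_attrs, key))          # grouping columns
        for name, func, attr in function_list:     # aggregate columns
            row[f"{name}({attr})"] = func([t[attr] for t in members])
        result.append(row)
    return result


for row in group_aggregate(EMPLOYEES, ["Dno"], [("SUM", sum, "Salary"), ("COUNT", len, "Eno")]):
    print(row)
```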