
Database Systems - Relational Data Model (Chapter 2)


Page 1: Database Systems - Relational Data Model (Chapter 2)

2. Relational Data Model

Topics:

1. Relational Data Model Structure

2. Relational Data Model Operations

Page 2: Database Systems - Relational Data Model (Chapter 2)

2.1 Relational Data Model Structure

Page 3: Database Systems - Relational Data Model (Chapter 2)

2.1 Relational Data Model Structure

Topics:

Why relational data model

Brief history of relational data model

Basic concepts

Terminology

Schemas

Properties of relations

Relation keys

Page 4: Database Systems - Relational Data Model (Chapter 2)

2.1 Relational Data Model Structure Why Relational Data Model?

We will devote more time to the relational data model than to any other data model, for the following reasons:

The model is easy to understand.

It has simple concepts: tables, columns, rows and constraints.

It has a mathematical foundation.

Many database management system products are based on the relational data model.

Page 5: Database Systems - Relational Data Model (Chapter 2)

2.1 Relational Data Model Structure Brief History of Relational Data Model

Introduced by E.F. Codd of IBM in 1970.

A prototype RDBMS called System R was developed by IBM in the late 1970s.

SQL was developed by IBM as a language for RDBMSs.

The commercial RDBMSs DB2 and SQL/DS were developed by IBM Corporation, and Oracle by Oracle Corporation.

INGRES was developed at the University of California at Berkeley and later made available as a commercial RDBMS. It used the language QUEL.

Many more commercial RDBMSs were developed in the 1980s.

Page 6: Database Systems - Relational Data Model (Chapter 2)

2.1 Relational Data Model Structure Concepts

Basic concepts of the relational data model:

A database consists of one or more relations.

A relation is described by a name and the names of one or more attributes, and it consists of zero or more tuples.

A tuple consists of one value for each attribute of the relation.

An attribute takes values from a domain and there exists one value for each attribute in a tuple.

A domain consists of all allowed atomic values for one or more attributes.

Page 7: Database Systems - Relational Data Model (Chapter 2)

2.1 Relational Data Model - Structure Some more Concepts

• Degree of a relation is the number of attributes of the relation.

• Cardinality of a relation is the number of tuples of the relation.
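
The concepts above can be sketched in a few lines of Python. This is only an illustration, not part of any RDBMS: the `Relation` class and its field names are assumptions made for the example, and the rows are the Employees data used on the following slides.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    """Hypothetical model of a relation: a name, a heading and a set of tuples."""
    name: str
    attributes: tuple
    tuples: frozenset

    @property
    def degree(self) -> int:
        # Degree = number of attributes.
        return len(self.attributes)

    @property
    def cardinality(self) -> int:
        # Cardinality = number of tuples.
        return len(self.tuples)


employees = Relation(
    name="Employees",
    attributes=("Employee Name", "Designation", "Department Name"),
    tuples=frozenset({
        ("Niranjan", "Software Engineer", "Development"),
        ("Praveen", "Software Engineer", "Development"),
        ("Dinesh", "Director", "Marketing"),
        ("Harish", "Manager", "Administration"),
    }),
)

print(employees.degree, employees.cardinality)
```

Storing the tuples in a `frozenset` is a deliberate choice here: a set has no duplicates and no order, which matches the properties of relations discussed later.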

Page 8: Database Systems - Relational Data Model (Chapter 2)

2.1 Relational Data Model Structure Relation

A relation is analogous to a table of rows and columns, as shown below:

| Employee Name | Designation | Department Name |
|---|---|---|
| Niranjan | Software Engineer | Development |
| Praveen | Software Engineer | Development |
| Dinesh | Director | Marketing |
| Harish | Manager | Administration |

Relation: Employees. The column headings are the attributes and each row is a tuple, giving 12 values in 4 tuples and 3 attributes.

Page 9: Database Systems - Relational Data Model (Chapter 2)

2.1 Relational Data Model Structure Alternative Terminology

The terms relation, attribute and tuple come from mathematics. The corresponding terms used by developers and many RDBMSs are table, column and row:

| Employee Name | Designation | Department Name |
|---|---|---|
| Niranjan | Software Engineer | Development |
| Praveen | Software Engineer | Development |
| Dinesh | Director | Marketing |
| Harish | Manager | Administration |

Table: EMPLOYEES. The headings are the column names, each vertical slice is a column and each horizontal line is a row.

Page 10: Database Systems - Relational Data Model (Chapter 2)

2.1 Relational Data Model Structure One More Alternative Terminology

The term record is used instead of tuple/row, and the term field is used instead of attribute/column.

| Employee Name | Designation | Department Name |
|---|---|---|
| Niranjan | Software Engineer | Development |
| Praveen | Software Engineer | Development |
| Dinesh | Director | Marketing |
| Harish | Manager | Administration |

Table: EMPLOYEES. The headings are the field names, each vertical slice is a field and each horizontal line is a record.

Page 11: Database Systems - Relational Data Model (Chapter 2)

2.1 Relational Data Model Structure Schema

A schema is a description of data.

A relation schema is a description of a relation in terms of its attributes and their domains.

A database schema is the collection of relation schemas, together with the schemas of the relationships between the relations.

Page 12: Database Systems - Relational Data Model (Chapter 2)

2.1 Relational Data Model Structure Properties of Relations

Every relation has the following properties:

The relation has a distinct name.

Each attribute has a distinct name.

Each attribute has an atomic value in each tuple.

All values for an attribute are from the same domain.

Each tuple is distinct (there are no duplicate tuples).

There is no significance to the order of attributes.

There is no significance to the order of tuples.

Page 13: Database Systems - Relational Data Model (Chapter 2)

2.1 Relational Data Model Structure Distinct Relation Names

Each relation or table in a database should have a distinct name.

For example, three relations with the following names are valid in a database:

EMPLOYEES, DEPARTMENTS, PROJECTS

The following names for three relations in a database are invalid:

EMPLOYEES, DEPARTMENTS, DEPARTMENTS

The name DEPARTMENTS is a duplicate.

Page 14: Database Systems - Relational Data Model (Chapter 2)

2.1 Relational Data Model Structure Distinct Attribute Names

Each relation or table in a database should have distinct names for its attributes or columns.

For example, the relation EMPLOYEES with the following attribute names is valid:

Employee Name, Designation, Department Name

It is invalid to specify duplicate attribute names for a relation, as shown below:

Employee Name, Designation, Designation, Department Name

The attribute Designation is a duplicate.

Page 15: Database Systems - Relational Data Model (Chapter 2)

2.1 Relational Data Model Structure Atomic Values

Each value of each attribute in a relation is atomic, which means the value cannot be divided. A tuple cannot have multiple values for an attribute. The last row of the following table tries to store two designations, Manager and Director, for a single employee. This is not possible:

| Employee Name | Designation | Department Name |
|---|---|---|
| Niranjan | Software Engineer | Development |
| Praveen | Software Engineer | Development |
| Dinesh | Director | Marketing |
| Jimson K John | Manager, Director | Development |

Page 16: Database Systems - Relational Data Model (Chapter 2)

2.1 Relational Data Model Structure Values From The Same Domain

Each value of an attribute must come from a single domain, and all values of a domain are expected to be of the same data type. For example, if an attribute stores the age of employees as an integer such as 25, you can use neither the word form "twenty five" nor a non-integer number such as 25.5, because these two values are not integers.
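
A domain check of this kind can be sketched as a small Python function. The function name `in_age_domain` is hypothetical, and the extra rule that an age is not negative is an assumption added for the example, not something stated on the slide.

```python
def in_age_domain(value: object) -> bool:
    """Return True only if value is a whole, non-negative number.

    bool is excluded explicitly because Python treats True/False as integers.
    The non-negative rule is an assumption made for this sketch.
    """
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


for candidate in (25, "twenty five", 25.5):
    print(repr(candidate), in_age_domain(candidate))
```

Only the integer 25 passes the check; the word form and the non-integer number are rejected because they lie outside the domain.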

Page 17: Database Systems - Relational Data Model (Chapter 2)

2.1 Relational Data Model Structure No Duplicate Tuples

Each relation in a database is expected to have unique tuples. No two tuples are identical in values for all attributes. The following relation violates this property:

| Employee Name | Designation | Department Name |
|---|---|---|
| Niranjan | Software Engineer | Development |
| Praveen | Software Engineer | Development |
| Dinesh | Director | Marketing |
| Praveen | Software Engineer | Development |

There are two identical tuples for employee "Praveen".

Page 18: Database Systems - Relational Data Model (Chapter 2)

2.1 Relational Data Model Structure No Significance to Order of Attributes

There is no significance to the order of attributes in a relation. The database language does not depend on the order of attributes. The following three alternatives describe the same relation:

Employees(Employee Name, Designation, Department Name)

Employees(Designation, Employee Name, Department Name)

Employees(Department Name, Employee Name, Designation)

Page 19: Database Systems - Relational Data Model (Chapter 2)

2.1 Relational Data Model Structure No Significance to Order of Tuples

The order of tuples does not affect the results of operations on the tables. For example, the following two tables are equivalent:

| Employee Name | Designation | Department Name |
|---|---|---|
| Niranjan | Software Engineer | Development |
| Praveen | Software Engineer | Development |
| Dinesh | Director | Marketing |
| Harish | Director | Administration |

| Employee Name | Designation | Department Name |
|---|---|---|
| Harish | Director | Administration |
| Niranjan | Software Engineer | Development |
| Dinesh | Director | Marketing |
| Praveen | Software Engineer | Development |
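
Both the "no duplicate tuples" and the "no significance to order" properties fall out naturally if a relation body is modelled as a set. The following Python sketch assumes that representation; the variable names are chosen only for this illustration.

```python
# Two listings of the same tuples in a different order.
first = {
    ("Niranjan", "Software Engineer", "Development"),
    ("Praveen", "Software Engineer", "Development"),
    ("Dinesh", "Director", "Marketing"),
    ("Harish", "Director", "Administration"),
}
second = {
    ("Harish", "Director", "Administration"),
    ("Niranjan", "Software Engineer", "Development"),
    ("Dinesh", "Director", "Marketing"),
    ("Praveen", "Software Engineer", "Development"),
}

# Sets compare by content, so tuple order is irrelevant.
same_relation = first == second

# A list that repeats one tuple collapses to three distinct tuples as a set.
with_duplicate = [
    ("Niranjan", "Software Engineer", "Development"),
    ("Praveen", "Software Engineer", "Development"),
    ("Dinesh", "Director", "Marketing"),
    ("Praveen", "Software Engineer", "Development"),
]
deduplicated = set(with_duplicate)

print(same_relation, len(with_duplicate), len(deduplicated))
```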

Page 20: Database Systems - Relational Data Model (Chapter 2)

2.1 Relational Data Model Structure Relation Keys

One property of a relation is that its tuples are distinct, i.e., no two tuples have the same values for every attribute. To make tuples distinct, the values of one or more attributes taken together must be different in each tuple. A set of attributes that uniquely identifies a tuple is called a key. There are various kinds of keys.

Page 21: Database Systems - Relational Data Model (Chapter 2)

2.1 Relational Data Model Structure Relation Keys - Super Key

A Super Key is a set of one or more attributes that uniquely identifies a tuple within a relation.

| Eno | Ename | Designation | PAN | Dno |
|---|---|---|---|---|
| 1001 | Niranjan | Software Engineer | NIR01ABC01 | 10 |
| 1002 | Praveen | Trainee | PRA02ABC02 | 10 |
| 1003 | Prashanth | Admin | PRA03ABC03 | 20 |
| 1004 | Srilatha | Software Engineer | SRI04ABC04 | 10 |
| 1005 | Sagar | Manager | SAG05ABC05 | 10 |

There can be multiple super keys for a relation. For example, {Eno} and {Eno, Ename} are both super keys of this relation.

A key that has more than one attribute is called a composite key.

Page 22: Database Systems - Relational Data Model (Chapter 2)

2.1 Relational Data Model Structure Relation Keys - Candidate Key

A Candidate Key is a super key such that no proper subset is a super key.

| Eno | Ename | Designation | PAN | Dno |
|---|---|---|---|---|
| 1001 | Niranjan | Software Engineer | NIR01ABC01 | 10 |
| 1002 | Praveen | Trainee | PRA02ABC02 | 10 |
| 1003 | Prashanth | Admin | PRA03ABC03 | 20 |
| 1004 | Srilatha | Software Engineer | SRI04ABC04 | 10 |
| 1005 | Sagar | Manager | SAG05ABC05 | 10 |

A candidate key is a minimal super key: it has only the attributes required to uniquely identify tuples, and removing any one of them breaks uniqueness.
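
The definitions of super key and candidate key can be tested mechanically against sample data. The Python sketch below does this for the Employees rows shown on the key slides; the function names `is_superkey` and `candidate_keys` are hypothetical. An important caveat: a key is a property of the relation schema, so a check against one set of rows can only show which attribute sets are unique in that data, not which ones the designer intends to be keys.

```python
from itertools import combinations

ATTRS = ("Eno", "Ename", "Designation", "PAN", "Dno")
ROWS = [
    (1001, "Niranjan", "Software Engineer", "NIR01ABC01", 10),
    (1002, "Praveen", "Trainee", "PRA02ABC02", 10),
    (1003, "Prashanth", "Admin", "PRA03ABC03", 20),
    (1004, "Srilatha", "Software Engineer", "SRI04ABC04", 10),
    (1005, "Sagar", "Manager", "SAG05ABC05", 10),
]


def is_superkey(attrs):
    """True if the values of attrs are different in every row of ROWS."""
    positions = [ATTRS.index(a) for a in attrs]
    projected = {tuple(row[i] for i in positions) for row in ROWS}
    return len(projected) == len(ROWS)


def candidate_keys():
    """Super keys of ROWS that contain no smaller super key."""
    keys = []
    # Smaller attribute sets are examined first, so any super key that
    # contains an already accepted key is not minimal and is skipped.
    for size in range(1, len(ATTRS) + 1):
        for combo in combinations(ATTRS, size):
            minimal = not any(set(k) <= set(combo) for k in keys)
            if minimal and is_superkey(combo):
                keys.append(combo)
    return keys


print(is_superkey(("Eno", "Ename")))
print(is_superkey(("Designation", "Dno")))
print(candidate_keys())
```

On these five rows the check also reports Ename as unique, simply because no two sample employees share a name. That is an accident of the data, which is why the choice of keys is made by the database designer rather than inferred from rows.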

Page 23: Database Systems - Relational Data Model (Chapter 2)

2.1 Relational Data Model Structure Relation Keys - Primary Key

A Primary Key is a candidate key that is chosen by the database designer as the principal means of identifying tuples uniquely within a relation.

| Eno | Ename | Designation | PAN | Dno |
|---|---|---|---|---|
| 1001 | Niranjan | Software Engineer | NIR01ABC01 | 10 |
| 1002 | Praveen | Trainee | PRA02ABC02 | 10 |
| 1003 | Prashanth | Admin | PRA03ABC03 | 20 |
| 1004 | Srilatha | Software Engineer | SRI04ABC04 | 10 |
| 1005 | Sagar | Manager | SAG05ABC05 | 10 |

Primary key: Eno

A relation, by definition, has only one primary key.

Candidate keys other than the primary key are called alternate keys.

Page 24: Database Systems - Relational Data Model (Chapter 2)

2.1 Relational Data Model Structure Relation Keys - Foreign Key

A Foreign Key is a set of one or more attributes within one relation that matches a candidate key of another relation, or possibly of the same relation.

The relation that has the foreign key is called the referencing relation, whereas the relation that contains the matching candidate key is called the referenced relation.

SUBJECTS(subj_id, subj_title)

BOOKS(book_id, book_title, subj_id)

CHAPTERS(book_id, ch_no, ch_title)

Primary keys: subj_id in SUBJECTS, book_id in BOOKS, and (book_id, ch_no) in CHAPTERS.

Foreign keys: subj_id in BOOKS references SUBJECTS, and book_id in CHAPTERS references BOOKS.
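
A foreign key implies that every referencing value should exist in the referenced relation. The Python sketch below checks this for BOOKS.subj_id against SUBJECTS.subj_id. The rows are invented for the illustration (the slide gives only the schemas), and `dangling_references` is a hypothetical helper name.

```python
# Invented sample rows; only the attribute layout comes from the slide.
subjects = {
    ("S1", "Databases"),
    ("S2", "Networks"),
}
books = {
    ("B1", "Intro to SQL", "S1"),
    ("B2", "TCP/IP Basics", "S2"),
    ("B3", "Unfiled Title", "S9"),  # S9 does not exist in subjects
}


def dangling_references(referencing, fk_index, referenced, key_index):
    """Return referencing rows whose foreign key value has no match."""
    known_keys = {row[key_index] for row in referenced}
    return {row for row in referencing if row[fk_index] not in known_keys}


# BOOKS is the referencing relation, SUBJECTS the referenced relation.
violations = dangling_references(books, 2, subjects, 0)
print(violations)
```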

Page 25: Database Systems - Relational Data Model (Chapter 2)

2.2 Relational Data Model Operations

Page 26: Database Systems - Relational Data Model (Chapter 2)

2.2 Relational Data Model Operations Description Language

We will use Relational Algebra to describe operations on the relational data model. Relational algebra was introduced by E.F. Codd in 1971. Another language that can be used to specify the operations is Relational Calculus (domain relational calculus and tuple relational calculus), but we will not cover it.

Page 27: Database Systems - Relational Data Model (Chapter 2)

2.2 Relational Data Model Operations Relational Algebra

The Relational Algebra is a set language. All tuples from one or more relations are operated on by one statement of the language, without using any looping constructs.

Five fundamental operations: Selection, Projection, Cartesian Product, Union and Set Difference.

Other operations: Join, Intersection, Division and a few variations of joins.

Selection and Projection are unary operations, i.e., they operate on one relation. The others are binary operations, i.e., they operate on two relations.

Page 28: Database Systems - Relational Data Model (Chapter 2)

2.2 Relational Data Model Operations Relational Algebra Operations

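The Relational Algebra operations are as follows:

| SNO | Operation | Notation |
|---|---|---|
| 1 | Selection | σ_predicate(R) |
| 2 | Projection | Π_{a1,...,an}(R) |
| 3 | Union | R ∪ S |
| 4 | Set difference | R - S |
| 5 | Intersection | R ∩ S |
| 6 | Division | R ÷ S |
| 7 | Aggregate | G_{AL}(R) |
| 8 | Grouping | _{GA}G_{AL}(R) |
| 9 | Cartesian Product | R X S |
| 10 | Theta join | R ⋈_F S |
| 11 | Equijoin | R ⋈_F S, where F uses only = |
| 12 | Natural join | R ⋈ S |
| 13 | Outer join | R ⟕ S (left outer join) |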

Page 29: Database Systems - Relational Data Model (Chapter 2)

2.2 Relational Data Model Operations Relational Algebra – Selection

The Selection operation produces a relation that contains only those tuples that satisfy the specified predicate. A predicate is a condition over columns of the relation and constants that returns a boolean value (true or false).

The selection operation σ_{salary > 5000}(Employees) gives only those tuples that satisfy the specified predicate, which is salary > 5000.

Table: Employees

| Eno | Ename | Designation | Salary | Dno |
|---|---|---|---|---|
| 1001 | Niranjan | Software Engineer | 10000 | 10 |
| 1002 | Praveen | Trainee | 5000 | 10 |
| 1003 | Prashanth | Admin | 6000 | 20 |
| 1004 | Sugumar | Software Engineer | 8000 | 10 |
| 1005 | Manjunath | Software Engineer | 5000 | 10 |

Result of selection

| Eno | Ename | Designation | Salary | Dno |
|---|---|---|---|---|
| 1001 | Niranjan | Software Engineer | 10000 | 10 |
| 1003 | Prashanth | Admin | 6000 | 20 |
| 1004 | Sugumar | Software Engineer | 8000 | 10 |
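
Selection can be sketched in Python as a filter over rows. In this illustration each tuple is a dictionary keyed by attribute name, and `select` is a hypothetical helper; the algebra itself prescribes no particular implementation.

```python
employees = [
    {"Eno": 1001, "Ename": "Niranjan", "Designation": "Software Engineer", "Salary": 10000, "Dno": 10},
    {"Eno": 1002, "Ename": "Praveen", "Designation": "Trainee", "Salary": 5000, "Dno": 10},
    {"Eno": 1003, "Ename": "Prashanth", "Designation": "Admin", "Salary": 6000, "Dno": 20},
    {"Eno": 1004, "Ename": "Sugumar", "Designation": "Software Engineer", "Salary": 8000, "Dno": 10},
    {"Eno": 1005, "Ename": "Manjunath", "Designation": "Software Engineer", "Salary": 5000, "Dno": 10},
]


def select(relation, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in relation if predicate(row)]


# sigma_{salary > 5000}(Employees)
high_paid = select(employees, lambda row: row["Salary"] > 5000)
print([row["Eno"] for row in high_paid])
```

Rows with a salary of exactly 5000 are excluded because the predicate uses a strict comparison.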

Page 30: Database Systems - Relational Data Model (Chapter 2)

2.2 Relational Data Model Operations Relational Algebra – Projection

The Projection operation produces a relation that contains a vertical subset of the given relation, consisting of the specified attributes.

Given the following relation Employees, the operation Π_{Eno, Ename, Salary}(Employees) gives the result relation shown below:

Employees

| Eno | Ename | Designation | Salary | Dno |
|---|---|---|---|---|
| 1001 | Niranjan | Software Engineer | 10000 | 10 |
| 1002 | Praveen | Trainee | 5000 | 10 |
| 1003 | Prashanth | Admin | 6000 | 20 |
| 1004 | Sugumar | Software Engineer | 8000 | 10 |
| 1005 | Manjunath | Trainee | 5000 | 10 |

Result relation

| Eno | Ename | Salary |
|---|---|---|
| 1001 | Niranjan | 10000 |
| 1002 | Praveen | 5000 |
| 1003 | Prashanth | 6000 |
| 1004 | Sugumar | 8000 |
| 1005 | Manjunath | 5000 |
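
A minimal Python sketch of projection follows. The helper name `project` is hypothetical. Because the result of a projection is itself a relation, the sketch also removes any duplicate rows that appear once columns are dropped.

```python
employees = [
    {"Eno": 1001, "Ename": "Niranjan", "Designation": "Software Engineer", "Salary": 10000, "Dno": 10},
    {"Eno": 1002, "Ename": "Praveen", "Designation": "Trainee", "Salary": 5000, "Dno": 10},
    {"Eno": 1003, "Ename": "Prashanth", "Designation": "Admin", "Salary": 6000, "Dno": 20},
    {"Eno": 1004, "Ename": "Sugumar", "Designation": "Software Engineer", "Salary": 8000, "Dno": 10},
    {"Eno": 1005, "Ename": "Manjunath", "Designation": "Trainee", "Salary": 5000, "Dno": 10},
]


def project(relation, attrs):
    """Keep only the listed attributes and drop duplicate result rows."""
    seen = set()
    result = []
    for row in relation:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


# Pi_{Eno, Ename, Salary}(Employees)
narrow = project(employees, ["Eno", "Ename", "Salary"])
# Projecting on Dno alone leaves only the distinct department numbers.
departments = project(employees, ["Dno"])
print(narrow[0], departments)
```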

Page 31: Database Systems - Relational Data Model (Chapter 2)

2.2 Relational Data Model Operations Relational Algebra – Union

Given two relations, the Union operation produces a relation that contains all tuples of the two relations.

Given the following relations Employees and Contract_Employees, the operation Employees ∪ Contract_Employees gives the result relation shown below:

Employees

| Eno | Ename | Dno |
|---|---|---|
| 1001 | Niranjan | 10 |
| 1002 | Praveen | 10 |
| 1003 | Prashanth | 20 |
| 1004 | Sugumar | 10 |
| 1005 | Manjunath | 10 |

Contract_Employees

| Eno | Ename | Dno |
|---|---|---|
| 5001 | Ravi | 10 |
| 5002 | Akshay | 10 |

Result relation (Employees ∪ Contract_Employees)

| Eno | Ename | Dno |
|---|---|---|
| 1001 | Niranjan | 10 |
| 1002 | Praveen | 10 |
| 1003 | Prashanth | 20 |
| 1004 | Sugumar | 10 |
| 1005 | Manjunath | 10 |
| 5001 | Ravi | 10 |
| 5002 | Akshay | 10 |

Notes: (1) The two relations must be union-compatible. (2) The result will not contain any duplicate tuples.
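
With relations modelled as Python sets of tuples, union maps directly onto the built-in set union. The `union` wrapper below is a hypothetical helper; its degree check is only a rough stand-in for union compatibility, which also requires matching domains.

```python
employees = {
    (1001, "Niranjan", 10),
    (1002, "Praveen", 10),
    (1003, "Prashanth", 20),
    (1004, "Sugumar", 10),
    (1005, "Manjunath", 10),
}
contract_employees = {
    (5001, "Ravi", 10),
    (5002, "Akshay", 10),
}


def union(r, s):
    """Set union of two relations after a simple same-degree check."""
    degrees = {len(t) for t in r | s}
    if len(degrees) > 1:
        raise ValueError("relations are not union-compatible")
    return r | s


all_staff = union(employees, contract_employees)
print(len(all_staff))
```

Because the result is a set, a tuple present in both inputs would appear only once.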

Page 32: Database Systems - Relational Data Model (Chapter 2)

2.2 Relational Data Model Operations Relational Algebra – Set Difference

Given two relations, the Set Difference operation produces a relation that contains all tuples of the first relation that are not in the second relation.

Given the following relations Employees and Trainee_Employees, the operation Employees - Trainee_Employees gives the result relation shown below:

Employees

| Eno | Ename | Dno |
|---|---|---|
| 1001 | Niranjan | 10 |
| 1002 | Praveen | 10 |
| 1003 | Prashanth | 20 |
| 1004 | Sugumar | 10 |
| 1005 | Manjunath | 10 |

Trainee_Employees

| Eno | Ename | Dno |
|---|---|---|
| 5001 | Ravi | 10 |
| 5002 | Akshay | 10 |
| 1002 | Praveen | 10 |
| 1005 | Manjunath | 10 |

Result relation (Employees - Trainee_Employees)

| Eno | Ename | Dno |
|---|---|---|
| 1001 | Niranjan | 10 |
| 1003 | Prashanth | 20 |
| 1004 | Sugumar | 10 |

Note that the two relations must be union-compatible.

Page 33: Database Systems - Relational Data Model (Chapter 2)

2.2 Relational Data Model Operations Relational Algebra – Intersection

Given two relations, the Intersection operation produces a relation that contains the tuples that are common to both relations.

Given the following relations Employees and Trainee_Employees, the operation Employees ∩ Trainee_Employees gives the result relation shown below:

Employees

| Eno | Ename | Dno |
|---|---|---|
| 1001 | Niranjan | 10 |
| 1002 | Praveen | 10 |
| 1003 | Prashanth | 20 |
| 1004 | Sugumar | 10 |
| 1005 | Manjunath | 10 |

Trainee_Employees

| Eno | Ename | Dno |
|---|---|---|
| 5001 | Ravi | 10 |
| 5002 | Akshay | 10 |
| 1002 | Praveen | 10 |
| 1005 | Manjunath | 10 |

Result relation (Employees ∩ Trainee_Employees)

| Eno | Ename | Dno |
|---|---|---|
| 1002 | Praveen | 10 |
| 1005 | Manjunath | 10 |

Note that the two relations must be union-compatible.
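
Set difference and intersection on the Employees and Trainee_Employees relations can be sketched with Python's set operators, assuming each relation is held as a set of tuples. The variable names are chosen only for this example.

```python
employees = {
    (1001, "Niranjan", 10),
    (1002, "Praveen", 10),
    (1003, "Prashanth", 20),
    (1004, "Sugumar", 10),
    (1005, "Manjunath", 10),
}
trainee_employees = {
    (5001, "Ravi", 10),
    (5002, "Akshay", 10),
    (1002, "Praveen", 10),
    (1005, "Manjunath", 10),
}

# Employees - Trainee_Employees: tuples only in the first relation.
non_trainees = employees - trainee_employees

# Employees ∩ Trainee_Employees: tuples present in both relations.
employed_trainees = employees & trainee_employees

# Intersection is derivable from difference: R ∩ S = R - (R - S).
derived = employees - (employees - trainee_employees)

print(sorted(non_trainees))
print(sorted(employed_trainees))
```

The last expression shows why intersection is not counted among the five fundamental operations: it can be expressed using set difference alone.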

Page 34: Database Systems - Relational Data Model (Chapter 2)

2.2 Relational Data Model Operations Relational Algebra – Cartesian Product

Given two relations, the Cartesian Product operation produces a relation that is a concatenation of every tuple of the first relation with every tuple of the second relation.

The Cartesian Product of relations Employees and Departments, written as Employees X Departments, is shown below:

Employees (2 rows x 3 columns)

| Eno | Ename | Dno |
|---|---|---|
| 1001 | Niranjan | 10 |
| 1003 | Prashanth | 20 |

Departments (4 rows x 2 columns)

| Dno | Dname |
|---|---|
| 10 | Development |
| 20 | Admin |
| 30 | Marketing |
| 40 | Research |

Result: Employees X Departments (8 rows x 5 columns)

| Eno | Ename | Dno | Dno | Dname |
|---|---|---|---|---|
| 1001 | Niranjan | 10 | 10 | Development |
| 1001 | Niranjan | 10 | 20 | Admin |
| 1001 | Niranjan | 10 | 30 | Marketing |
| 1001 | Niranjan | 10 | 40 | Research |
| 1003 | Prashanth | 20 | 10 | Development |
| 1003 | Prashanth | 20 | 20 | Admin |
| 1003 | Prashanth | 20 | 30 | Marketing |
| 1003 | Prashanth | 20 | 40 | Research |
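
The Cartesian product corresponds to `itertools.product` in Python. The sketch below assumes relations are lists of tuples and concatenates each pair of tuples; the sizes it prints follow the rule that cardinalities multiply and degrees add.

```python
from itertools import product

employees = [
    (1001, "Niranjan", 10),
    (1003, "Prashanth", 20),
]
departments = [
    (10, "Development"),
    (20, "Admin"),
    (30, "Marketing"),
    (40, "Research"),
]

# Every employee tuple is concatenated with every department tuple.
cartesian = [e + d for e, d in product(employees, departments)]

print(len(cartesian), len(cartesian[0]))
for row in cartesian:
    print(row)
```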

Page 35: Database Systems - Relational Data Model (Chapter 2)

2.2 Relational Data Model Operations Relational Algebra – Theta Join

The Theta Join (θ join) operation produces a relation that contains the tuples from the Cartesian product of the two given relations that satisfy a predicate F.

The theta join of relations Employees and Departments, written as Employees ⋈_F Departments, is shown below, where F is (Employees.Dno < Departments.Dno):

Employees

| Eno | Ename | Dno |
|---|---|---|
| 1001 | Niranjan | 10 |
| 1003 | Prashanth | 20 |

Departments

| Dno | Dname |
|---|---|
| 10 | Development |
| 20 | Admin |
| 30 | Marketing |

Result relation (Employees ⋈_F Departments)

| Eno | Ename | Dno | Dno | Dname |
|---|---|---|---|---|
| 1001 | Niranjan | 10 | 20 | Admin |
| 1001 | Niranjan | 10 | 30 | Marketing |
| 1003 | Prashanth | 20 | 30 | Marketing |

In general, R ⋈_F S = σ_F(R X S).
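
The identity that a theta join is a selection over a Cartesian product translates almost literally into Python. In this sketch `theta_join` is a hypothetical helper and tuples are addressed by position (index 2 is Employees.Dno, index 0 is Departments.Dno).

```python
from itertools import product

employees = [
    (1001, "Niranjan", 10),
    (1003, "Prashanth", 20),
]
departments = [
    (10, "Development"),
    (20, "Admin"),
    (30, "Marketing"),
]


def theta_join(r, s, predicate):
    """Selection with the given predicate over the Cartesian product of r and s."""
    return [x + y for x, y in product(r, s) if predicate(x, y)]


# F is (Employees.Dno < Departments.Dno)
less_than = theta_join(employees, departments, lambda e, d: e[2] < d[0])

# With an equality predicate the same function yields an equijoin.
equal = theta_join(employees, departments, lambda e, d: e[2] == d[0])

print(less_than)
print(equal)
```

Real systems avoid materialising the full product, but the result is the same as this naive formulation.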

Page 36: Database Systems - Relational Data Model (Chapter 2)

2.2 Relational Data Model Operations Relational Algebra – Equijoin

The Equijoin operation produces a relation that contains the tuples from the Cartesian product of the two given relations that satisfy a predicate F containing only equality operators (=).

The equijoin of relations Employees and Departments, written as Employees ⋈_F Departments, is shown below, where F is (Employees.Dno = Departments.Dno):

Employees

| Eno | Ename | Dno |
|---|---|---|
| 1001 | Niranjan | 10 |
| 1003 | Prashanth | 20 |

Departments

| Dno | Dname |
|---|---|
| 10 | Development |
| 20 | Admin |
| 30 | Marketing |

Result relation (Employees ⋈_F Departments)

| Eno | Ename | Dno | Dno | Dname |
|---|---|---|---|---|
| 1001 | Niranjan | 10 | 10 | Development |
| 1003 | Prashanth | 20 | 20 | Admin |

Page 37: Database Systems - Relational Data Model (Chapter 2)

2.2 Relational Data Model Operations Relational Algebra – Natural Join

The Natural Join operation is an equijoin of the two given relations over all common attributes, with the result containing only one occurrence of each common attribute.

The natural join of relations Employees and Departments, written as Employees ⋈ Departments, is shown below:

Employees

| Eno | Ename | Dno |
|---|---|---|
| 1001 | Niranjan | 10 |
| 1003 | Prashanth | 20 |

Departments

| Dno | Dname |
|---|---|
| 10 | Development |
| 20 | Admin |
| 30 | Marketing |

Result relation (Employees ⋈ Departments)

| Eno | Ename | Dno | Dname |
|---|---|---|---|
| 1001 | Niranjan | 10 | Development |
| 1003 | Prashanth | 20 | Admin |

The common attribute is Dno.
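
A natural join can be sketched in Python by representing each tuple as a dictionary, so that common attributes are found by name. The helper `natural_join` is hypothetical and uses a simple nested loop for clarity.

```python
employees = [
    {"Eno": 1001, "Ename": "Niranjan", "Dno": 10},
    {"Eno": 1003, "Ename": "Prashanth", "Dno": 20},
]
departments = [
    {"Dno": 10, "Dname": "Development"},
    {"Dno": 20, "Dname": "Admin"},
    {"Dno": 30, "Dname": "Marketing"},
]


def natural_join(r, s):
    """Join rows that agree on every attribute name the two relations share."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    result = []
    for x in r:
        for y in s:
            if all(x[a] == y[a] for a in common):
                # Merging the dictionaries keeps one copy of each common attribute.
                result.append({**x, **y})
    return result


joined = natural_join(employees, departments)
for row in joined:
    print(row)
```

Department 30 (Marketing) has no matching employee, so it does not appear in the result.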

Page 38: Database Systems - Relational Data Model (Chapter 2)

2.2 Relational Data Model Operations Relational Algebra – Left Outer Join

The Left Outer Join of two given relations is a join in which tuples from the left table that do not have matching tuples in the right table for the specified predicate are also included in the result relation, with null values for the attributes of the right table.

The Left Outer Join of relations Courses and Students, written as Courses ⟕_P Students, is shown below, where the predicate P is (Courses.Cno = Students.Cno):

Courses

| Cno | Cname |
|---|---|
| C01 | Java |
| C02 | C++ |
| C03 | DBMS |
| C04 | Android |
| C05 | jQuery |

Students

| Sno | Sname | Cno |
|---|---|---|
| 101 | Akhil | C01 |
| 102 | Akhil | C03 |
| 103 | Prithivi | C03 |
| 104 | Chetan | C01 |

Result relation (Courses ⟕_P Students)

| Cno | Cname | Sno | Sname | Cno |
|---|---|---|---|---|
| C01 | Java | 101 | Akhil | C01 |
| C01 | Java | 104 | Chetan | C01 |
| C02 | C++ | NULL | NULL | NULL |
| C03 | DBMS | 102 | Akhil | C03 |
| C03 | DBMS | 103 | Prithivi | C03 |
| C04 | Android | NULL | NULL | NULL |
| C05 | jQuery | NULL | NULL | NULL |
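
The padding behaviour of a left outer join can be sketched in Python as follows. The helper `left_outer_join` is hypothetical, `None` stands in for NULL, and tuples are addressed by position (index 0 of a course and index 2 of a student are the Cno values).

```python
courses = [
    ("C01", "Java"),
    ("C02", "C++"),
    ("C03", "DBMS"),
    ("C04", "Android"),
    ("C05", "jQuery"),
]
students = [
    (101, "Akhil", "C01"),
    (102, "Akhil", "C03"),
    (103, "Prithivi", "C03"),
    (104, "Chetan", "C01"),
]


def left_outer_join(left, right, predicate, right_degree):
    """Join left and right, keeping unmatched left tuples padded with None."""
    result = []
    for l in left:
        matches = [r for r in right if predicate(l, r)]
        if matches:
            result.extend(l + r for r in matches)
        else:
            result.append(l + (None,) * right_degree)
    return result


# Predicate: Courses.Cno = Students.Cno
outer = left_outer_join(courses, students, lambda c, s: c[0] == s[2], 3)
for row in outer:
    print(row)
```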

Page 39: Database Systems - Relational Data Model (Chapter 2)

2.2 Relational Data Model Operations Relational Algebra – Semijoin

The (Left) Semijoin of two given relations is a join in which each tuple from the left table that has at least one matching tuple in the right table is included in the result relation.

The semijoin of relations Courses and Students, written as Courses ⋉_P Students, is shown below, where the predicate P is (Courses.Cno = Students.Cno):

Courses

| Cno | Cname |
|---|---|
| C01 | Java |
| C02 | C++ |
| C03 | DBMS |
| C04 | Android |
| C05 | jQuery |

Students

| Sno | Sname | Cno |
|---|---|---|
| 101 | Akhil | C01 |
| 102 | Akhil | C03 |
| 103 | Prithivi | C03 |
| 104 | Chetan | C01 |

Result relation (Courses ⋉_P Students)

| Cno | Cname |
|---|---|
| C01 | Java |
| C03 | DBMS |

Note: Columns from the right table will not be in the result.

Page 40: Database Systems - Relational Data Model (Chapter 2)

2.2 Relational Data Model Operations Relational Algebra – Anti-Semijoin

The (Left) Anti-Semijoin of two given relations is a join in which each tuple from the left table that does not have a matching tuple in the right table is included in the result relation.

The anti-semijoin of relations Courses and Students, written as Courses ▷_P Students, is shown below, where the predicate P is (Courses.Cno = Students.Cno):

Courses

| Cno | Cname |
|---|---|
| C01 | Java |
| C02 | C++ |
| C03 | DBMS |
| C04 | Android |
| C05 | jQuery |

Students

| Sno | Sname | Cno |
|---|---|---|
| 101 | Akhil | C01 |
| 102 | Akhil | C03 |
| 103 | Prithivi | C03 |
| 104 | Chetan | C01 |

Result relation (Courses ▷_P Students)

| Cno | Cname |
|---|---|
| C02 | C++ |
| C04 | Android |
| C05 | jQuery |

Note: Columns from the right table will not be in the result.
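
Semijoin and anti-semijoin differ only in whether a match must exist or must not exist, which the following Python sketch makes explicit. The helper names are hypothetical and tuples are addressed by position.

```python
courses = [
    ("C01", "Java"),
    ("C02", "C++"),
    ("C03", "DBMS"),
    ("C04", "Android"),
    ("C05", "jQuery"),
]
students = [
    (101, "Akhil", "C01"),
    (102, "Akhil", "C03"),
    (103, "Prithivi", "C03"),
    (104, "Chetan", "C01"),
]


def semijoin(left, right, predicate):
    """Left tuples that have at least one matching right tuple."""
    return [l for l in left if any(predicate(l, r) for r in right)]


def anti_semijoin(left, right, predicate):
    """Left tuples that have no matching right tuple."""
    return [l for l in left if not any(predicate(l, r) for r in right)]


def same_course(course, student):
    # Predicate: Courses.Cno = Students.Cno
    return course[0] == student[2]


taken = semijoin(courses, students, same_course)
not_taken = anti_semijoin(courses, students, same_course)
print(taken)
print(not_taken)
```

Each left tuple appears at most once in either result, no matter how many right tuples match it, and together the two results partition the left relation.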

Page 41: Database Systems - Relational Data Model (Chapter 2)

2.2 Relational Data Model Operations Relational Algebra – Division

The Division operation R ÷ S applies when the attributes of S are a subset of the attributes of R. It produces a relation over the attributes of R that are not in S, containing every tuple that appears in R in combination with every tuple of S.

Division of relation Students by relation Special-Courses is shown below:

Students

| Sno | Sname | Cno |
|---|---|---|
| 101 | Akhil | C01 |
| 101 | Akhil | C03 |
| 102 | Prithivi | C04 |
| 103 | Chetan | C01 |
| 104 | Pranoy | C01 |
| 104 | Pranoy | C03 |
| 102 | Prithivi | C02 |

Special-Courses

| Cno |
|---|
| C01 |
| C03 |

Result relation (Students ÷ Special-Courses)

| Sno | Sname |
|---|---|
| 101 | Akhil |
| 104 | Pranoy |

This answers the typical question: who took both courses C01 and C03?

Page 42: Database Systems - Relational Data Model (Chapter 2)

2.2 Relational Data Model Operations Relational Algebra – Division

Another example of the division operation: suppose you have a table of prospective candidates whom you want to recruit, provided they have all the required skills listed in another table. You can use the Division operator to get the result:

Candidates

| Candidate Name | Skill |
|---|---|
| Chetan | Objective-C |
| Chetan | C++ |
| Chetan | Java |
| Sparsh | C++ |
| Sparsh | JSP |
| Sparsh | Java |
| Sparsh | HTML |
| Pranoy | Java |
| Pranoy | C++ |
| Akhil | PHP |
| Akhil | C++ |
| Akhil | Objective-C |
| Akhil | Java |
| Akhil | JSP |

Skills

| Skill |
|---|
| Java |
| C++ |
| JSP |

Result relation (Candidates ÷ Skills)

| Candidate Name |
|---|
| Akhil |
| Sparsh |

Note that the table of Skills could be derived as the result of another operator on some other tables.
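
Division is the least intuitive of the operations, so a small Python sketch may help. It handles the common two-attribute case from the candidate example: the dividend is a set of (name, skill) pairs and the divisor is a set of skills. The function name `divide` is hypothetical.

```python
candidates = {
    ("Chetan", "Objective-C"), ("Chetan", "C++"), ("Chetan", "Java"),
    ("Sparsh", "C++"), ("Sparsh", "JSP"), ("Sparsh", "Java"), ("Sparsh", "HTML"),
    ("Pranoy", "Java"), ("Pranoy", "C++"),
    ("Akhil", "PHP"), ("Akhil", "C++"), ("Akhil", "Objective-C"),
    ("Akhil", "Java"), ("Akhil", "JSP"),
}
skills = {"Java", "C++", "JSP"}


def divide(r, s):
    """Return each a such that (a, b) is in r for every b in s."""
    names = {a for a, _ in r}
    return {a for a in names if all((a, b) in r for b in s)}


qualified = divide(candidates, skills)
print(sorted(qualified))
```

Extra skills such as PHP or HTML do not disqualify a candidate; division only requires that every skill in the divisor is present.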

Page 43: Database Systems - Relational Data Model (Chapter 2)

2.2 Relational Data Model Operations Relational Algebra – Aggregate

The Aggregate operator, G_{AL}(R), produces a relation by applying an aggregate function list to a given relation. An aggregate function returns one value that is computed from a collection of values. An aggregate function list, AL, consists of pairs of an aggregate function and an attribute. Typical aggregate functions are COUNT, SUM, AVG, MAX and MIN.

The aggregate operation G_{SUM(Salary), COUNT(Eno)}(Employees) is shown below:

Employees

| Eno | Ename | Salary | Dno |
|---|---|---|---|
| 1001 | Niranjan | 10000 | 10 |
| 1002 | Praveen | 5000 | 10 |
| 1003 | Prashanth | 6000 | 20 |
| 1004 | Sugumar | 8000 | 10 |
| 1005 | Manjunath | 5000 | 10 |
| 1006 | Dinesh | 25000 | 30 |
| 1007 | Harish | 15000 | 20 |

Result relation

| SUM(Salary) | COUNT(Eno) |
|---|---|
| 74000 | 7 |
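
The aggregate operation over the Employees relation can be sketched with Python's built-in `sum` and `len`. The function name `aggregate` is hypothetical, and the tuple positions follow the column order Eno, Ename, Salary, Dno.

```python
employees = [
    (1001, "Niranjan", 10000, 10),
    (1002, "Praveen", 5000, 10),
    (1003, "Prashanth", 6000, 20),
    (1004, "Sugumar", 8000, 10),
    (1005, "Manjunath", 5000, 10),
    (1006, "Dinesh", 25000, 30),
    (1007, "Harish", 15000, 20),
]


def aggregate(rows):
    """Return a one-tuple relation holding (SUM(Salary), COUNT(Eno))."""
    total_salary = sum(salary for _, _, salary, _ in rows)
    employee_count = len([eno for eno, _, _, _ in rows])
    return [(total_salary, employee_count)]


print(aggregate(employees))
```

The result is still a relation, here with a single tuple and two attributes.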

Page 44: Database Systems - Relational Data Model (Chapter 2)

2.2 Relational Data Model Operations Relational Algebra – Grouping

The Grouping operator, _{GA}G_{AL}(R), produces a relation that contains the grouping attributes, GA, and the result of each aggregate function in the aggregate function list, AL. It groups the tuples of the given relation by the grouping attributes and applies the aggregate operator to each group. Typical aggregate functions are COUNT, SUM, AVG, MAX and MIN.

The grouping operation _{Dno}G_{SUM(Salary), COUNT(Eno)}(Employees) is shown below:

Employees

| Eno | Ename | Salary | Dno |
|---|---|---|---|
| 1001 | Niranjan | 10000 | 10 |
| 1002 | Praveen | 5000 | 10 |
| 1003 | Prashanth | 6000 | 20 |
| 1004 | Sugumar | 8000 | 10 |
| 1005 | Manjunath | 5000 | 10 |
| 1006 | Dinesh | 25000 | 30 |
| 1007 | Harish | 15000 | 20 |

Result relation

| Dno | SUM(Salary) | COUNT(Eno) |
|---|---|---|
| 10 | 28000 | 4 |
| 20 | 21000 | 2 |
| 30 | 25000 | 1 |

Dno is the grouping column; SUM(Salary) and COUNT(Eno) are the aggregate columns, which hold the aggregate values.
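
Grouping can be sketched in Python by first partitioning the tuples by the grouping attribute and then aggregating each partition. The function name `group_sum_count` is hypothetical, and the tuple positions follow the column order Eno, Ename, Salary, Dno.

```python
from collections import defaultdict

employees = [
    (1001, "Niranjan", 10000, 10),
    (1002, "Praveen", 5000, 10),
    (1003, "Prashanth", 6000, 20),
    (1004, "Sugumar", 8000, 10),
    (1005, "Manjunath", 5000, 10),
    (1006, "Dinesh", 25000, 30),
    (1007, "Harish", 15000, 20),
]


def group_sum_count(rows):
    """Map each Dno to (SUM(Salary), COUNT(Eno)) for that department."""
    salaries_by_dno = defaultdict(list)
    for _eno, _ename, salary, dno in rows:
        salaries_by_dno[dno].append(salary)
    return {
        dno: (sum(salaries), len(salaries))
        for dno, salaries in sorted(salaries_by_dno.items())
    }


for dno, (total, count) in group_sum_count(employees).items():
    print(dno, total, count)
```

The department totals add up to the overall SUM(Salary) of 74000, which is a quick consistency check between grouping and plain aggregation.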