75
An introduction to relational database management system Fabrice Rossi CEREMADE Université Paris Dauphine 2019

An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

  • Upload
    others

  • View
    14

  • Download
    0

Embed Size (px)

Citation preview

Page 1: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

An introduction to relational databasemanagement system

Fabrice Rossi

CEREMADEUniversité Paris Dauphine

2019

Page 2: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Database Management Systems

Database (DB)An organized collection of data

Database Management System (DBMS)A software use to manage databases, namelyI creating a databaseI maintaining a databaseI interacting with a database

2

Page 3: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

DBMS versus Files

File-Based data managementI data in CSV files (or other format)I management with e.g. Python + Pandas (or R + tidyverse)

Advantages of DBMSI meta data managementI integrity constraintsI security (access control)I concurrency controlI etc.

3

Page 4: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

DBMS versus Files

File-Based data managementI data in CSV files (or other format)I management with e.g. Python + Pandas (or R + tidyverse)

Advantages of DBMSI meta data managementI integrity constraintsI security (access control)I concurrency controlI etc.

3

Page 5: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

DBMS components

DBMS architectureI storage manager

I handles saving data and meta data on the storage systemI manages part of the concurrency aspects

I query processorI core part of the DBMSI handles computation on the database (e.g. search, insertion,

deletion, modification, aggregation)I optimize queries for efficiency

I high level toolsI meta data managementI connection and security manager

4

Page 6: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

DBMS components

DBMS langaguesI data definition language (DDL)

I used to specify a data modelI data typesI links between data

I meta data managementI data manipulation language (DML)

I used to query a databaseI specifies computation on a database

5

Page 7: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Types of DBMS

Data model categorizationI old approaches: hierarchical DBMS and network DBMSI current standard: relational DBMS

I based on a relational data modelI typical based on SQL (Structured Query Language, sequel)

I niche approach: object-oriented DBMSI large scale or unstructured data: NoSQL

I not-only SQL (or more specifically not relational)I document storesI graph oriented databasesI column oriented databases

6

Page 8: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

In this course

An introduction to Relational DBMS

What’s missing?I A lot!I DBMS are a very rich and complex subject

I numerous theoretical aspectsI getting under the hood is yet another subjectI practical aspects are also very complex

7

Page 9: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

In this course

An introduction to Relational DBMS

What’s missing?I A lot!I DBMS are a very rich and complex subject

I numerous theoretical aspectsI getting under the hood is yet another subjectI practical aspects are also very complex

7

Page 10: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Outline

Introduction

Conceptual Data Modeling

The Relational Model

SQL

8

Page 11: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Outline

Introduction

Conceptual Data Modeling

The Relational Model

SQL

9

Page 12: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Data Management

Specifying the dataI ideally the first step of a data oriented projectI important questions

I what are the objects under study?I how are they described?I how are they related?

Two steps process

1. conceptual/formal model2. implementation in a meta data oriented language

10

Page 13: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Entity Relationship Model

Proposed by Peter Chen in 1976

ConceptsI Entity: uniquely identified object under study (e.g. a person)I Relationship: a way to relate entities (e.g. a has access to b)I Attribute: a property of an entity or of a relationship. An attribute

has a domain (the set of values it can take)I an ER model describes types, e.g. entity type, not values

11

Page 14: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Example

Loan application data setI https://relational.fit.cvut.cz/dataset/Financial

I 8 tables includingI client tableI account tableI credit card tableI loan tableI etc.

12

Page 15: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Example

(part of the) Client tableclient_id gender birth_date

1 F 1970-12-132 M 1945-02-043 F 1940-10-094 M 1956-12-015 F 1960-07-03

ER modelI entity type: clientI attributes

I gender with domain F andM

I birth_date with a datadomain

I client_id

key attribute

13

Page 16: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Example

(part of the) Client tableclient_id gender birth_date

1 F 1970-12-132 M 1945-02-043 F 1940-10-094 M 1956-12-015 F 1960-07-03

ER modelI entity type: clientI attributes

I gender with domain F andM

I birth_date with a datadomain

I client_id

key attribute

Client

gender birth_date client_id

13

Page 17: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Example

(part of the) Client tableclient_id gender birth_date

1 F 1970-12-132 M 1945-02-043 F 1940-10-094 M 1956-12-015 F 1960-07-03

ER modelI entity type: clientI attributes

I gender with domain F andM

I birth_date with a datadomain

I client_id key attribute

Client

gender birth_date client_id

13

Page 18: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Graphical representation

Entity typeEntity types are represented by rectangles link to attributes

Attribute type

I an ellipse per typeI key attribute type

I a unique identifier of thecorresponding entity

I underlined in therepresentation

I composite attribute typeI can be decomposed into

sub-attributesI linked ellipses

I derived attribute typeI can be computed from

another one (e.g. agefrom birth date)

I dashed borderI multi-valued attribute type

I several values areauthorized

I double border

14

Page 19: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Example

Person

gender birth_date age client_id

email

name

First name

Last name

15

Page 20: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Relationship

PrincipleI a relationship represents an association between at least two

entitiesI it can have attributesI it is characterized by cardinalities

I minimum and maximum number of relationships to which a givenentity can participate

I asymmetricI a relationship type is graphical represented by a rhombus

16

Page 21: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Example

Loan application data setI client entities and account entitiesI relationships

I a client can be the owner of an accountI a client can be allowed to use an account

I cardinalitiesI owner:

I each account has exactly one ownerI each client can own at most one account (in this database)

I user:I each account may have some usersI each client can be the user of some accounts

17

Page 22: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Example

Clientclient_id

Owns

Account account_id

0-1 1-1

Uses

0-n 0-m

18

Page 23: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Outline

Introduction

Conceptual Data Modeling

The Relational Model

SQL

19

Page 24: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Relational Model

HistoryI invented by Edgar F. Codd in 1969-1970I based on a mathematical model, the relational algebraI associated relational calculus

Main conceptsI a domain: a setI a relation: a (finite) subset of

∏ni=1(Di ∪ {NULL}) where each Di

is a domain and NULL a special value (for missing values)I a tuple: an element of a relationI a database: a collection of relations

20

Page 25: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Examples

ClientsI Domains:

I N+: positive integersI G = {F ,M}: gendersI D: dates

I A client relation:I a subset of N+ ×G × DI a set of tuples such as

(1,F , 1970−12−13)

AccountsI Domains:

I N+: positive integersI D: dates

I An account relation:I a subset of N+ × N+ × DI a set of tuples such as

(1, 8, 1995−03−24)

21

Page 26: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Relation type/variable

Specifying relationsI somewhat imprecise vocabularyI a relation (value) r ∈

∏ni=1(Di ∪ {NULL})

I a relation type/variable (relvar)I a named specification of the relationI domains and names: column types

NotationI a relation type/variable is denoted R(A1, . . . ,An):

I R: name of the relation typeI Ak : column type names

I e.g. client(client_id, gender, birth_date)I and account(account_id, district_id, date)

22

Page 27: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

ER and relational model

CorrespondenceI Entity type: relation type/variable, relvarI Entity: tupleI Attribute type: column type (frequently called attribute type)I Attribute: value in a tuple

More about ER→ relational later

23

Page 28: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Keys

Superkeys and keysI a relation is a set⇒ each tuple is uniqueI a superkey

I any subset of the attribute types such that no tuples of the relationshare exactly the same values for these attribute types

I default superkey: all the attribute types!I a key: a minimal superkey (removing attribute types removes it

superkey status)

France data setI departement(region_id,departement_id,name)I superkeys

I (region_id,departement_id,name)I (departement_id,name)I (departement_id): a key!I (name): another key!

24

Page 29: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Primary and foreign keys

Primary keyI several keys: candidate keysI one of the is the primary keyI other are alternative keys

Foreign keysI a set of attribute types FK in a relation type R1 is a foreign key of

R1 if1. a candidate key in a relation R2 has exactly the same domains as

the ones of the attribute types in FK2. values on FK in a tuple in r1 are either NULL or occur in a tuple in r2

I in simple terms: a foreign key links R1 to R2

25

Page 30: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Example

RelvarI client(client_id, gender, birth_date)I account(account_id, district_id, date)I disp(disp_id, client_id, account_id, type) with

type∈ {OWNER,DISPONENT}

Many keysI bold and underlined: primary keysI underlined: foreign keysI district_id is also a foreign key but for another relationI in this database, the relation between a client and an account is

unique and thus (client_id, account_id) is candidate key

26

Page 31: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Integrity Constraints

Relational constraintsWhen implemented the relational model must enforce someconstraintsI Domain integrity: the values in a tuple are unique values from the

associated domain or NULL (if allowed, i.e. if the attribute type isnullable)

I Key constraint: every relation has a primary keyI Entity integrity: values in the primary key cannot be NULLI Referential integrity: foreign keys can be either NULL or must refer

to an existing tuple

27

Page 32: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

ER to relational

MappingI ER models are abstractI must be mapped to a concrete database modelI the relational model is close enough to ER to enable a simple

mapping strategy

PrinciplesI Entity type→ relation (type)I Simple attribute type→ column typeI Key attribute type→ primary or alternative keyI All the rest (relationship types and complex attribute types)→

relation (type) and keys

28

Page 33: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Mapping Entity Types

With simple attribute typesI direct mappingI an entity type is mapped to a relation typeI each attribute type corresponds to a column typeI a key is mapped to a primary or alternative keyI composite attribute types are mapped to a set of column types

ER model

Client

gender birth_date client_id

Relation typeClient(gender, birth_date,client_id)

29

Page 34: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Multi-valued attribute types

MethodI a multi-valued attribute type cannot be mapped to a column type

because of the domain integrity constraintI representation via a relation type

I a relation type per multi-valued attribute typeI a column type for the attributeI a foreign key to map back the attribute to the entity

ER model

Client

gender email client_id

Relation typesI Client(gender, client_id)I ClientEmail(email, client_id)

30

Page 35: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Mapping relationship types

PrinciplesI the mapping depends on the cardinalities of the relationship typeI 0-1 cardinalities on a least one side foreign keyI other cases: foreign keys or specific relational type

One to one (1:1) relationship typeI each side is 0-1 or 1-1I mapped to a foreign key:

I in the 1-1 relation type if it exists with a non nullable column typeI nullable if both side are 0-1

31

Page 36: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Example

Clientclient_id Owns Account account_id0-1 1-1

Relation typesI Client(client_id)I Account(account_id,client_id)I client_id is non nullable in

Account

Inferior alternativeI Client(client_id,account_id)I account_id is nullable in ClientI Account(account_id)

CardinalitiesI Owns↔ Account is enforcedI but an account can be shared by

several clientsI Client↔ Owns is 0−m

CardinalitiesI Client↔ Owns is enforcedI but an account can be assigned

to any number of clientsI Owns↔ Account is 0−m

32

Page 37: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Example

Clientclient_id Owns Account account_id0-1 1-1

Relation typesI Client(client_id)I Account(account_id,client_id)I client_id is non nullable in

Account

Inferior alternativeI Client(client_id,account_id)I account_id is nullable in ClientI Account(account_id)

CardinalitiesI Owns↔ Account is enforcedI but an account can be shared by

several clientsI Client↔ Owns is 0−m

CardinalitiesI Client↔ Owns is enforcedI but an account can be assigned

to any number of clientsI Owns↔ Account is 0−m

32

Page 38: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Higher cardinalities

One to N (1:N) relationship typeI mapped to a foreign key in the 1 side entity relation typeI nullable if this side is 0− 1, non nullable if it is 1− 1I the minimal cardinality on the N side cannot be enforced

Example

Region

region_id

Contains Department

department_id

1-N 1-1

I Department(departement_id,region_id)I Region(region_id)

33

Page 39: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Higher cardinalities

M to N (M:N) relationship typeI mapped to a relation typeI two foreign keys, one for each tableI the primary key of the relation type is the combination of the

foreign keysI minimal cardinalities cannot be enforced

34

Page 40: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Example

Clientclient_id Uses Account account_id0-n 1-m

I Client(client_id)I Account(account_id)I Uses(client_id, account_id) both non nullable

35

Page 41: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Outline

Introduction

Conceptual Data Modeling

The Relational Model

SQL

36

Page 42: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

SQL

What is SQL?I SQL is relational data management languageI Structured Query Language pronounced sequelI developed initially by IBM as an "implementation" of the relational

modelI SQL is a standard since 1986 (numerous versions)

ImplementationsI SQL is “supported” by all relational database management

systemsI many open source solutions (MySQL/MariaDB, PostgreSQL,

SQLite, etc.)I but many variations in the support level (portability is not

guaranteed)

37

Page 43: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

SQL Components

Multiple aspectsI Data Definition/Description Language

I relational model descriptionI domain definition

I Data Manipulation Language: insertion, suppression andmodification

I Data Query LanguageI read only manipulationI selection, filtering, grouping, etc.

I Data Control LanguageI access control to the databasesI users, roles, permissions, etc.

38

Page 44: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

SQL Data Description Language

RelationI a relation (type) is created in SQL by

CREATE TABLE relation_name (column_name domain, ...);

I SQL supports numerous default domains (implementationdependent!):I INT: integerI DATETIME, DATE and TIME: date and timeI BOOLEAN: true or falseI CHAR(n) and VARCHAR(n): string with maximum size n

I constraints can be specified after the domain:I NOT NULL: non nullableI PRIMARY KEY: self explanatoryI UNIQUE: candidate keyI DEFAULT: used to specify a default value

39

Page 45: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Example

Relational model to SQLClient(client_id, first_name, last_name, birth_date)CREATE TABLE Client (

client_id INT PRIMARY KEY,first_name VARCHAR(30) NOT NULL,last_name VARCHAR(30) NOT NULL,birth_date DATETIME NOT NULL

);

40

Page 46: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Domains

SQL domainsI domains can be created in SQLI typical form

CREATE DOMAIN Gender AS VARCHAR(6)CHECK (VALUE IN ('Female','Male'));

I unsupported in many implementations (e.g. MySQL)

Constraints based versionI constraints can be added to the table creationI CHECK can be used to implemented domainsI less elegant (no centralized definition)

41

Page 47: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Example

Relational model to SQLClient(client_id, gender, first_name, last_name, birth_date)CREATE TABLE Client (

client_id INT PRIMARY KEY,gender AS VARCHAR(6),CONSTRAINT gc CHECK(gender IN ('Female','Male')),first_name VARCHAR(30) NOT NULL,last_name VARCHAR(30) NOT NULL,birth_date DATETIME NOT NULL

);

42

Page 48: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Keys

Primary keysI primary keys are not mandatory in SQLI but they should be specified!I UNIQUE is useful as a constraintI a primary key can be made with several columns using

PRIMARY KEY (COL1, COL2, ...)

in the table creation

Foreign keysI declared as FOREIGN KEY (column) during table creationI together with a REFERENCES table(column)I a foreign key can be a set of columns

43

Page 49: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Example

Relational model to SQLI Client(client_id)I Account(account_id,client_id)

CREATE TABLE Client(client_id int PRIMARY KEY);CREATE TABLE Account(account_id int PRIMARY KEY,

client_id int NOT NULL,FOREIGN KEY(client_id) REFERENCES Client(client_id));

44

Page 50: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Referential integrity

Foreign keysI must reference an existing primary keyI SQL allows one to handle consequences of tuple modifications

I what happens if the primary key of a tuple is modified?I what happens if a tuple is deleted?

I ON DELETE something and ON UPDATE somethingI with something being

I CASCADE: propagate the modification to referring tuplesI RESTRICT: forbid the modification if there are referring tuplesI SET NULL or SET DEFAULT: modify the foreign key in the referring

tuples as described

45

Page 51: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Example

Relational model to SQLI Client(client_id)I Account(account_id,client_id)

CREATE TABLE Client(client_id int PRIMARY KEY);CREATE TABLE Account(account_id int PRIMARY KEY,

client_id int NOT NULL,FOREIGN KEY(client_id) REFERENCES Client(client_id))ON DELETE RESTRICT ON UPDATE CASCADE;

46

Page 52: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Additional commands

Modifying the modelI DROP TABLE name: delete a tableI ALTER TABLE ...:

I data model modificationI long list of possibilities

I adding/removing a columnI changing the properties of a column (domain, constraints, etc.)I etc.

47

Page 53: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

SQL Data Query Language

The SELECT commandI the main query command in SQLI general form

SELECT something FROM somewhere[WHERE conditions] [GROUPBY grouping][HAVING group conditions] [ORDER BY something]

I provides all the manipulations available in R+dyplr orPython+pandas:I subsetting, filtering, transforming, orderingI summarizingI joining

48

Page 54: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Subsetting

SubsettingI simple SELECT queries can be used to subset a relation on

interesting columnsI general form SELECT col1, ..., col2 FROM table;

ExempleSELECT client_id, gender FROM Client;

Client relationclient_id gender birth_date district_id

1 F 1970-12-13 182 M 1945-02-04 13 F 1940-10-09 14 M 1956-12-01 55 F 1960-07-03 5

Result relationclient_id gender

1 F2 M3 F4 M5 F

49

Page 55: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Transformation

Expression and renamingI columns may be renamed using orig_name AS new_name in

the SELECT commandI simple calculations may also be performed on columns including

the results as new columns

ExampleSELECT A2 AS Name, A4 AS Population,

A5+A6+A7+A8 AS Municipalities FROM District;

District relationdistrict_id A2 A3 A4 A5 A6 A7 A8

1.00 Hl.m. Praha Prague 1204953.00 0.00 0.00 0.00 1.002.00 Benesov central Bohemia 88884.00 80.00 26.00 6.00 2.003.00 Beroun central Bohemia 75232.00 55.00 26.00 4.00 1.004.00 Kladno central Bohemia 149893.00 63.00 29.00 6.00 2.005.00 Kolin central Bohemia 95616.00 65.00 30.00 4.00 1.00

50

Page 56: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Transformation

Expression and renamingI columns may be renamed using orig_name AS new_name in

the SELECT commandI simple calculations may also be performed on columns including

the results as new columns

ExampleSELECT A2 AS Name, A4 AS Population,

A5+A6+A7+A8 AS Municipalities FROM District;

Result relationName Population MunicipalitiesHl.m. Praha 1204953.00 1.00Benesov 88884.00 114.00Beroun 75232.00 86.00Kladno 149893.00 100.00Kolin 95616.00 100.00

50

Page 57: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Filtering

Selecting tuplesI the WHERE clause can be used to select tuples fulfilling some

conditionsI general form

SELECT columns FROM table WHERE conditions;

ExempleSELECT client_id, birth_date FROM Client

WHERE Gender='F';

Client relationclient_id gender birth_date district_id

1 F 1970-12-13 182 M 1945-02-04 13 F 1940-10-09 14 M 1956-12-01 55 F 1960-07-03 5

Result relationclient_id birth_date

1 1970-12-133 1940-10-095 1960-07-038 1938-02-21

11 1950-08-22

51

Page 58: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Aggregation

Global summariesI aggregation functions can be used in the result part of the

SELECT commandI they operate at the column levelI some examples:

I COUNT and COUNT(DISTINCT(.))I MAX, MIN, SUMI AVG, STD, VARIANCE

Financial databaseSELECT COUNT(birth_date) FROM Client WHERE Gender='F';

gives 2645, whileSELECT COUNT(DISTINCT(birth_date)) FROM Client

WHERE Gender='F';

gives 2461.

52

Page 59: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Conditional aggregation

Split apply combine in SQLI the GROUP BY clause of the SELECT command provides

conditional analysisI it splits the relation into groups of tuples on which it applies

chosen aggregation functionsI groups can be further selected based on global properties with the

HAVING clause

General formSELECT aggregates FROM relation

[WHERE conditions]GROUP BY columns[HAVING group conditions]

53

Page 60: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Split apply combine

X Y4 C2 B2 C3 C4 B1 C4 C4 A3 B3 A1 A1 B1 A3 B2 C

Split

Y XA 4A 3A 1A 1

Y XB 2B 4B 3B 1B 3

Y XC 4C 2C 3C 1C 4C 2

Apply

Y SumA 9

Y SumB 13

Y SumC 16

CombineY SumA 9B 13C 16

54

Page 61: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Examples

Count clients per genderSELECT gender, COUNT(gender) AS number FROM Client

GROUP BY gender;

gender numberF 2645M 2724

Count clients per district id and genderSELECT district_id, gender, COUNT(gender) AS number

FROM Client GROUP BY district_id, gender;

district_id gender number1 F 3241 M 3392 F 262 M 203 F 27

55

Page 62: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Example

Count clients per district id and genderand keep only groups with more than 50 persons;SELECT district_id, gender, COUNT(gender) AS number

FROM Client GROUP BY district_id, genderHAVING COUNT(*)>=50;

district_id gender number1 F 3241 M 339

54 F 7554 M 8064 F 5764 M 5270 F 8870 M 8172 F 5072 M 5474 F 8474 M 96

56

Page 63: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Joins

Joining relationsI a SELECT command can operate on multiple relations at once (in

the FROM clause)I default behavior: cartesian product (a.k.a. a cross join)I key based joins are explicitly constructed via WHERE or ON

conditionsI dotted notation: relation.columnI alias for relation in the FROM clauseI support for complex conditions (not only key equality)I inner joins by default if implicit

I implicit joins: WHEREI explicit joins: JOIN ... ON

57

Page 64: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Example

Financial dataI goal: gender distribution per regionI sources:

I genders in the Client tableI regions in the District tableI district_id is a foreign key in the Client table which references to the

District tableI first step (join)

SELECT gender, A3 AS region From Client, DistrictWHERE Client.district_id = District.district_id;

I full query (with the group by)SELECT C.gender, D.A3 AS region, count(*) as number

FROM Client C, District DWHERE C.district_id = D.district_idGROUP BY region, C.gender;

58

Page 65: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Example

Financial dataI goal: gender distribution per regionI sources:

I genders in the Client tableI regions in the District tableI district_id is a foreign key in the Client table which references to the

District tableI first step (join)

SELECT gender, A3 AS region From ClientINNER JOIN DistrictON Client.district_id = District.district_id;

I full query (with the group by)SELECT C.gender, D.A3 AS region, count(*) as number

FROM Client C INNER JOIN District DWHERE C.district_id = D.district_idGROUP BY region, C.gender;

59

Page 66: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Example

District relationdistrict_id A2 A3

1.00 Hl.m. Praha Prague2.00 Benesov central Bohemia3.00 Beroun central Bohemia4.00 Kladno central Bohemia5.00 Kolin central Bohemia

Joined relationsgender regionF south BohemiaM PragueF PragueM central BohemiaF central Bohemia

Client relationclient_id gender district_id

1 F 182 M 13 F 14 M 55 F 5

Resultsregion gender Numbercentral Bohemia F 336central Bohemia M 328east Bohemia F 318east Bohemia M 342north Bohemia F 281north Bohemia M 280north Moravia F 437north Moravia M 483Prague F 324Prague M 339south Bohemia F 231south Bohemia M 218south Moravia F 474south Moravia M 463west Bohemia F 244west Bohemia M 271

60

Page 67: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Example

Financial dataI goal: persons with an account in a bank another district than their

home districtI sources:

I home district in the Client tableI bank district in the Account tableI links between client and account in the Disposition table

I querySELECT C.client_id, C.district_id AS home,

A.district_id AS bankFROM Client C INNER JOIN Disposition D

INNER JOIN Account AWHERE C.client_id = D.client_id

AND D.account_id=A.account_idAND C.district_id != A.district_id;

61

Page 68: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Example

Resultsclient_id home bank

12 40 7020 46 7042 68 6744 38 147 76 3648 16 6477 1 4898 33 2699 33 26

102 74 54103 74 54114 74 68115 74 68138 54 68151 51 5153 13 2162 65 75193 1 23195 74 60196 74 60

62

Page 69: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Outer joins

Similar approachI replace INNER JOIN by

I LEFT OUTER JOIN (or LEFT JOIN)I RIGHT OUTER JOIN (or RIGHT JOIN)I FULL OUTER JOIN (or FULL JOIN)

63

Page 70: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Other possibilities

Not described hereI ORDER BY: results orderingI DISTINCT: unique results onlyI nested queries: SELECT queries in other SELECT queriesI set operations such as UNION and INTERSECT

64

Page 71: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

SQL Data Manipulation Language

INSERTI inserting a tuple into a relation:

INSERT INTO table VALUES (...);

I variants includeINSERT INTO table (columns...) VALUES (...); tospecify the column names (NULL is assigned to missing columns)

DELETEI deleting is done conditionally, using a WHERE clauseI general syntax

DELETE FROM table WHERE condition;

65

Page 72: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Updating

UPDATEI used to alter tuplesI general syntax

UPDATE tableSET column = value [,column = value...][WHERE condition];

66

Page 73: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Changelog

I December 2019: added a short introduction to SQLI November 2019: initial version

67

Page 74: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Licence

This work is licensed under a Creative CommonsAttribution-ShareAlike 4.0 International License.

http://creativecommons.org/licenses/by-sa/4.0/

68

Page 75: An introduction to relational database management system€¦ · Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational

Version

Last git commit: 2019-12-11By: Fabrice Rossi ([email protected])Git hash: af2cee4da140c15fb47c0ff45a9ff8b1028fcbd0

69