Database Systems 236363 Introduction

Database Systems 236363 Introduction. Bureaucratic Info (1/2) Lecturer: Prof. Roy Friedman – Office: Taub 605 – E-mail: roy@cs.technion.ac.il


Database Systems 236363

Introduction


Bureaucratic Info (1/2)

• Lecturer: Prof. Roy Friedman
  – Office: Taub 605
  – E-mail: roy@cs.technion.ac.il
• TA in Charge: Roni Licher
  – Office: Taub 225
  – E-mail: [email protected]
• TA: Hadar Levy
  – Office: Taub 315
  – E-mail: [email protected]


Bureaucratic Info (2/2)

• Web site: http://webcourse.cs.technion.ac.il/236363/

• Home assignments: 3 dry, 1 wet

• Grade structure:
  – 80% final exam, 20% home assignments (takef, i.e., binding)
  – 8% for the wet assignment, 4% for each dry one
  – Home assignments are mandatory: no final grade without a pass grade in each home assignment
    • Those who repeat the course must submit all home assignments
    • Exceptions to this rule according to post-“Zuk Eitan” regulations
      – Only applies to students who took the course in the Spring 2014 semester


Topics

• Introduction
• Entity Relationship Diagrams (ERD)
• Relational Algebra
• Relational Calculus
• Database Design Theory
  – Functional Dependencies
  – Schema Decomposition and Normal Forms
• XML and the Query Language XPath
• NoSQL and graph databases
• In the recitation: The SQL Query Language

In general, this course is about using DBs rather than implementation details


Let the Fun Begin…

• Database
  – A (persistent) collection of data
  – Often with some logical structure
  – Examples include bank accounts, students enrolled in courses and their grades, geographical data used by a map/navigation service, customers of an online web site, a dentist’s patients
• A Query Language
  – Enables querying and manipulating the database
  – Examples include Structured Query Language (SQL), Datalog, Cassandra Query Language (CQL), Cypher
• Database Management System (DBMS)
  – The system that manages the database and supports the execution of queries on the database


Why Do We Need a DBMS?

• After all, we have operating systems and file systems…
• A DBMS provides a data-oriented abstraction for manipulating the data
• Enables direct manipulation of the data without worrying about storage and execution issues
  – Frees the programmer from worrying about many low-level details such as:
    • Serializing and de-serializing the data to the storage
    • Organizing the data in the storage and masking storage latencies and inefficiencies
    • How to enable concurrent access to the database?
    • What to do when the database is larger than physical memory?
    • Split/cached database operation in combined mobile/cloud


DBMS Functionality

• Storage management
• Query processor
  – Optimizes query processing for efficient information retrieval and query execution
• Concurrency control and recovery
• Data integrity
  – E.g., that an ID number is a unique 9-digit number
• Security
  – Access control, authentication and encryption
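The data-integrity bullet can be illustrated with a minimal sketch. It assumes Python's built-in sqlite3 module as the DBMS (the slides do not prescribe one), and the table and column names are hypothetical. The point is that the rule "an ID is a unique 9-digit number" is declared once in the schema and then enforced by the DBMS rather than by application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the ID must be unique (PRIMARY KEY) and consist of
# exactly 9 characters, none of which is a non-digit (CHECK).
conn.execute("""
    CREATE TABLE people (
        id   TEXT PRIMARY KEY
             CHECK (length(id) = 9 AND id NOT GLOB '*[^0-9]*'),
        name TEXT NOT NULL
    )
""")

def try_insert(person_id, name):
    """Return True if the DBMS accepted the row, False if it rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO people (id, name) VALUES (?, ?)",
                (person_id, name),
            )
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert("123456789", "Jane Roe"),  # valid
    try_insert("123456789", "John Doe"),  # duplicate ID
    try_insert("12345", "Short Id"),      # too short
    try_insert("12345678x", "Bad Char"),  # contains a non-digit
]
```

Only the first insert should be accepted; the other three violate either the uniqueness or the format constraint.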


Data Model

• The data model defines the framework for how data is represented
  – For example, in the relational model, data is represented as tables (relations)
• The query language enables extracting data from the database according to the given data model
  – For example, SQL can express queries on tables and the results are tables as well


Data Representation Independence

• To obtain independence between the data model and the physical storage, we distinguish three levels
  – User view, logical level, and physical level

[Diagram: three stacked levels]
View
  (independence of logical layout beyond this level)
Logical: data is organized here according to the data model (relational)
  (independence of physical layout beyond this level)
Physical


Upper Database Layer

• In the end-user layer, each user is provided with a (potentially partial) view that may be different from the actual data layout
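A partial user view can be sketched with an SQL view. This is only an illustration under assumptions: it uses Python's built-in sqlite3 module, and the table, view, and sample rows (including the address 'Haifa') are hypothetical. The view exposes a subset of the columns and rows, so its user never sees the full logical table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Logical level: the full (hypothetical) students table.
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,
        name       TEXT,
        address    TEXT,
        department TEXT
    );
    INSERT INTO students VALUES
        (1, 'John Doe', 'Haifa', 'Computer Science'),
        (2, 'Jane Roe', 'Technion City', 'Electrical Engineering');

    -- View level: hides addresses and shows a single department only.
    CREATE VIEW cs_students AS
        SELECT student_id, name
        FROM students
        WHERE department = 'Computer Science';
""")

# The view is queried exactly like a table.
view_rows = conn.execute("SELECT * FROM cs_students").fetchall()
```

Because the view is defined by a query, the underlying table could later gain columns or be reorganized without changing what this user sees.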


Operations on a Database

• Database structure definition (following an analysis of the application’s needs): includes the logical structures for representing the data and their relations
  – Data Definition Language (DDL)
• Query execution to retrieve data from the database
• Data manipulation: adding, deleting, and updating
  – Data Manipulation Language (DML)
• Administrative operations:
  – Defining views, indexes, etc.
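The four kinds of operations can be seen side by side in a short sketch. It assumes Python's built-in sqlite3 module and a hypothetical `songs` table with made-up rows; the SQL statements themselves are standard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Structure definition (DDL): declare the table and its columns.
conn.execute(
    "CREATE TABLE songs (title TEXT PRIMARY KEY, format TEXT, seconds INTEGER)"
)

# Administrative operation: an index to speed up lookups by format.
conn.execute("CREATE INDEX songs_by_format ON songs (format)")

# Data manipulation (DML): adding, updating, and deleting records.
conn.execute("INSERT INTO songs VALUES ('Song A', 'mp3', 180)")
conn.execute("INSERT INTO songs VALUES ('Song B', 'wma', 240)")
conn.execute("UPDATE songs SET seconds = 200 WHERE title = 'Song A'")
conn.execute("DELETE FROM songs WHERE format = 'wma'")

# Query execution: retrieve what is left.
remaining = conn.execute("SELECT title, seconds FROM songs").fetchall()
```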


Database Administrator (DBA)

• Responsible for
  – Planning the logical database layout and adapting it to the physical layer
  – Security and access control
  – Recovery management (after failures)
  – Performance fine tuning


Data Models (1/2)

• Relational data model
  – Data is represented using tables
  – Correspondence between tables is obtained by using the same values in specific columns (keys) with the same name
  – The main focus of this course
• Entity Relationship
  – A tool for analyzing the requirements of a database and designing its schema
  – This model is an abstract one and has no actual direct implementation


Data Models (2/2)

• Object Oriented data model
  – A model in which the data is represented as objects, similarly to what is done in OOP
    • ERD can be mapped to OO
• Semi-structured data model
  – Data is represented as a graph (independent of the physical layout)
• Extensible Markup Language (XML)
  – A specific instance of the semi-structured model in which the graph is a tree


The Relational Data Model

• We wish to represent a collection of objects of a given type, each characterized by a fixed set of properties
  – For example, student’s name, date of birth, department
• How would we do this in C or Java?
• In the relational data model, we maintain these objects in a table
  – Each row is used to store one object
  – Each column represents one of the properties
• The table is a logical structure
  – Might be physically stored in a completely different manner
  – Each row must differ from every other row in at least one attribute
    • The table represents a set rather than a multi-set

Department              Date of Birth   Student Name
Computer Science        01/01/1990      John Doe
Electrical Engineering  07/07/1992      Jane Roe
…                       …               …
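One possible answer to the "how would we do this in a programming language" question, sketched here in Python rather than C or Java (an assumption made for brevity): each record becomes an immutable tuple with named fields, and the relation becomes a set of such tuples. The `Student` type is hypothetical; the two rows are the ones from the table above.

```python
from typing import NamedTuple

class Student(NamedTuple):
    """One row of the table: a fixed set of properties."""
    name: str
    date_of_birth: str
    department: str

# The relation is a set, so identical rows cannot appear twice.
students = {
    Student("John Doe", "01/01/1990", "Computer Science"),
    Student("Jane Roe", "07/07/1992", "Electrical Engineering"),
}

# Adding a row that equals an existing one in every attribute is a no-op.
students.add(Student("John Doe", "01/01/1990", "Computer Science"))
row_count = len(students)
```

Using a set mirrors the set (rather than multi-set) semantics of the relational model; a plain list would silently allow duplicates.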


Relational Model: Terminology

Department              Date of Birth   Student Name
Computer Science        01/01/1990      John Doe
Electrical Engineering  07/07/1992      Jane Roe
…                       …               …

• A relation (the entire table)
• A record (an entire row)
• Schema (title of table)
• An attribute (column name)


A Formal Definition

• For a given set of attributes A1,…,An and a set of corresponding domains D1,…,Dn (each domain is a set of values)

• Denote by R(A1,…,An) the relational schema that contains the attributes A1,…,An

• A relation r over R is a subset of the domain product: r ⊆ D1 × D2 × … × Dn
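The definition can be checked mechanically on a toy case. The sketch below assumes two tiny, hypothetical domains (real domains such as "all names" are far larger or infinite) and verifies that a relation is a subset of their Cartesian product.

```python
from itertools import product

# Toy domains D1 (names) and D2 (birth dates); purely illustrative.
names = {"John Doe", "Jane Roe"}
birth_dates = {"01/01/1990", "07/07/1992"}

# D1 x D2: every possible record over the schema R(Name, BirthDate).
all_possible_records = set(product(names, birth_dates))

# A relation r over R is any subset of that product.
r = {("John Doe", "01/01/1990"), ("Jane Roe", "07/07/1992")}
is_relation = r <= all_possible_records
```

Here the product contains 2 × 2 = 4 possible records, of which the relation selects two.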


The Formal Definition - Visualized

• Instead of viewing each record as a line in a table, it is viewed as a point in the space of all possible value assignments. The relation is a finite subset of “all possible records”.

[Figure: a plane with axes Name and Birth Date; the records (John Doe, 01/01/1990) and (Jane Roe, 07/07/1992) appear as points]


Keys

• A superkey of a relation r is a subset of the attributes of r’s schema such that specific values of these attributes identify a single record in r. In other words, no two records in r agree on all attributes of the superkey.

• A relation may have several superkeys. A superkey is called minimal if none of its proper subsets is a superkey. Such a key is also called a candidate key.

• One of the candidate keys can be selected as the primary key. The primary key is used to identify a row in the implementation of the database.

Yet, in PostgreSQL, when no key is defined, a table can include multiple records with the same values in all attributes.
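The key definitions translate directly into code. The following is a minimal sketch with hypothetical function names and sample records; note that it only tests one given instance of a relation, whereas a key in a schema is meant to hold for every legal instance.

```python
from itertools import combinations

def is_superkey(records, attrs):
    """True if no two records agree on all attributes in attrs."""
    seen = set()
    for rec in records:
        key = tuple(rec[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(records, attributes):
    """Minimal superkeys of this instance, found smallest-first."""
    found = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # Skip supersets of an already-found key: they are not minimal.
            if any(set(k) <= set(attrs) for k in found):
                continue
            if is_superkey(records, attrs):
                found.append(attrs)
    return found

# Hypothetical sample instance.
records = [
    {"id": 1, "name": "John Doe", "department": "Computer Science"},
    {"id": 2, "name": "Jane Roe", "department": "Computer Science"},
    {"id": 3, "name": "John Doe", "department": "Electrical Engineering"},
]
keys = candidate_keys(records, ["id", "name", "department"])
```

For this instance, `id` alone is a candidate key and so is the pair (`name`, `department`); either could be chosen as the primary key, although a designer would normally pick `id` because the pair is unique only by coincidence of the sample data.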


Simple Databases

• In the simplest case, all objects of interest are of the same “kind”, meaning that they all have the same attribute list – they are only distinguished by their specific attribute values

• For example, the list of songs on my computer – each such object is characterized by the name of the song, the format (mp3/wma/…), playing time, and size

• In these cases, all objects are organized in a single table


More Involved Databases

• Suppose we wish to design a database for the faculty’s administrative assistants. Here, we can identify at least two types of entities:
  – Students: student name, id, address
  – Courses: course name, catalogue number, lecturer

• If a student is registered for a given course, we should be able to know about it and be able to retrieve the student’s final grade
  – Hence, each student’s participation in a given course should be recorded somewhere

• The question is how to organize the database for this?


Possible Organization

• A simple option: Have a single large table – for each student’s registration, we will hold the student’s name, id, address, course name, catalogue number, lecturer, and final grade

• Drawbacks:
  – Redundancy: Why should the student’s address be stored for each course the student takes?
  – Inadequacy: How can we maintain the details of a student who does not take any course?
  – Difficult to update: If a student changes address, we will need to update all records of all courses the student is registered for. This is both expensive and a source of inconsistency
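The update problem with the single large table can be reproduced in a few lines. This sketch assumes Python's built-in sqlite3 module; the table name, the second course (catalogue number 111111), and the new address are hypothetical. An update that misses one of the student's rows leaves the database with two different addresses for the same student.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The "single large table" design: student and course details are
    -- repeated in every registration row.
    CREATE TABLE registrations_flat (
        student_id INTEGER, student_name TEXT, address TEXT,
        catalogue_number INTEGER, course_name TEXT, lecturer TEXT,
        final_grade INTEGER
    );
    INSERT INTO registrations_flat VALUES
        (12345678, 'Jane Roe', 'Technion City',
         236363, 'Database Systems', 'Rony Superstar', 95),
        (12345678, 'Jane Roe', 'Technion City',
         111111, 'Hypothetical Course', 'Some Lecturer', 88);
""")

# A careless update that touches only one of the student's rows.
conn.execute("""
    UPDATE registrations_flat
    SET address = 'Haifa'
    WHERE student_id = 12345678 AND catalogue_number = 236363
""")

# The same student now has two different addresses on record.
addresses = conn.execute("""
    SELECT DISTINCT address
    FROM registrations_flat
    WHERE student_id = 12345678
    ORDER BY address
""").fetchall()
```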


Another Option

• We can maintain one table for students, one for courses, and one for registration

• The registration table schema can include a primary key for the students (e.g., student id) and a primary key for the courses (e.g., catalogue number) as well as the final grade

• Now each student’s data is independent of each course’s data and vice versa


Another Option

Students
Address        Student ID  Student Name
Technion City  12345678    Jane Roe

Courses
Lecturer        Catalogue Number  Course Name
Rony Superstar  236363            Database Systems

Registrations
Final Grade  Student ID  Catalogue Number
95           12345678    236363
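The three-table organization could be declared as follows. This is a sketch under assumptions: it uses Python's built-in sqlite3 module, the column names and types are hypothetical choices, and the extra student (John Doe, ID 87654321) and the new address are made up. The registrations table refers to the other two tables through their primary keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces references only if asked
conn.executescript("""
    CREATE TABLE students (
        student_id   INTEGER PRIMARY KEY,
        student_name TEXT,
        address      TEXT
    );
    CREATE TABLE courses (
        catalogue_number INTEGER PRIMARY KEY,
        course_name      TEXT,
        lecturer         TEXT
    );
    CREATE TABLE registrations (
        student_id       INTEGER REFERENCES students (student_id),
        catalogue_number INTEGER REFERENCES courses (catalogue_number),
        final_grade      INTEGER,
        PRIMARY KEY (student_id, catalogue_number)
    );
    INSERT INTO students VALUES (12345678, 'Jane Roe', 'Technion City');
    INSERT INTO students VALUES (87654321, 'John Doe', 'Technion City');
    INSERT INTO courses VALUES (236363, 'Database Systems', 'Rony Superstar');
    INSERT INTO registrations VALUES (12345678, 236363, 95);
""")

# The address is stored once, so a change of address is a single-row update.
cursor = conn.execute(
    "UPDATE students SET address = 'Haifa' WHERE student_id = 12345678"
)
rows_updated = cursor.rowcount

# A registration for a student that does not exist is rejected.
try:
    conn.execute("INSERT INTO registrations VALUES (99999999, 236363, 70)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

student_count = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
```

John Doe is stored even though he takes no course, which addresses the inadequacy drawback of the single-table design, and the address lives in exactly one row, which addresses redundancy and update difficulty.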


Life is Full of Difficult Choices

• What if we wish to retrieve the names of all lecturers who taught a given student?

• If in the registrations table we only maintain the course’s catalogue number, this query will require a long time to compute

• Further, if the lecturers change over time, the reply will not be accurate
  – How do we fix this?
    • Should we add a lecturer attribute to the registrations table?
    • Should we define a new “lecturers” table and corresponding relations?

• The database organizational design choices are not trivial – this is the subject of much of this course
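The trade-off can be made concrete with a sketch of the lecturer query under the three-table design. It assumes Python's built-in sqlite3 module and hypothetical table and column names; 'New Lecturer' is a made-up value. The query has to join registrations with courses, and its answer changes once the course's lecturer is replaced, even though the student was taught by the original one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE courses (
        catalogue_number INTEGER PRIMARY KEY,
        course_name TEXT,
        lecturer TEXT
    );
    CREATE TABLE registrations (
        student_id INTEGER,
        catalogue_number INTEGER,
        final_grade INTEGER
    );
    INSERT INTO courses VALUES (236363, 'Database Systems', 'Rony Superstar');
    INSERT INTO registrations VALUES (12345678, 236363, 95);
""")

def lecturers_of(student_id):
    """Lecturers of all courses the student is registered for (via a join)."""
    rows = conn.execute("""
        SELECT DISTINCT c.lecturer
        FROM registrations AS r
        JOIN courses AS c ON c.catalogue_number = r.catalogue_number
        WHERE r.student_id = ?
        ORDER BY c.lecturer
    """, (student_id,)).fetchall()
    return [lecturer for (lecturer,) in rows]

before = lecturers_of(12345678)

# The course gets a new lecturer in a later semester.
conn.execute(
    "UPDATE courses SET lecturer = 'New Lecturer' WHERE catalogue_number = 236363"
)
after = lecturers_of(12345678)
```

After the update the query reports only the new lecturer, which is the inaccuracy the slide warns about; storing the lecturer per registration or per course offering are two of the possible fixes, each with its own redundancy cost.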