17
atabase Management Systems Introduction to Database Systems Instructor: Xintao Wu Ramakrishnan & Gehrke

Database Management Systems 1 Introduction to Database Systems Instructor: Xintao Wu Ramakrishnan & Gehrke

Embed Size (px)

Citation preview

Page 1: Database Management Systems 1 Introduction to Database Systems Instructor: Xintao Wu Ramakrishnan & Gehrke

Database Management Systems 1

Introduction to Database Systems

Instructor: Xintao Wu

Ramakrishnan & Gehrke

Page 2: Database Management Systems 1 Introduction to Database Systems Instructor: Xintao Wu Ramakrishnan & Gehrke

Database Management Systems 2 Ramakrishnan & Gehrke

http://www.sigmod.org/record/issues/0606/index.html

Page 3: Database Management Systems 1 Introduction to Database Systems Instructor: Xintao Wu Ramakrishnan & Gehrke

Database Management Systems 3 Ramakrishnan & Gehrke

History

60s C. Bachman GE network data model Late 60s IBM IMS hierarchical data model 70 E. Codd relational model 80s SQL IBM R transaction J. Gray Late 80s-90s DB2, Oracle, Informix, Sybase 90s- Data warehouse, internet, 2009- NoSQL, NewSQL Turing award and Turing test?

Turing award list Turing website

Page 4: Database Management Systems 1 Introduction to Database Systems Instructor: Xintao Wu Ramakrishnan & Gehrke

Database Management Systems 4 Ramakrishnan & Gehrke

What Is a DBMS?

A very large, integrated collection of data.

Models real-world enterprise.– Entities (e.g., students, courses)– Relationships (e.g., Madonna is taking

ITCS6160) A Database Management System

(DBMS) is a software package designed to maintain and utilize databases.

Page 5: Database Management Systems 1 Introduction to Database Systems Instructor: Xintao Wu Ramakrishnan & Gehrke

Database Management Systems 5 Ramakrishnan & Gehrke

Why Use a DBMS?

Data independence and efficient access. Reduced application development time. Data integrity and security. Uniform data administration. Concurrent access, recovery from

crashes.

Page 6: Database Management Systems 1 Introduction to Database Systems Instructor: Xintao Wu Ramakrishnan & Gehrke

Database Management Systems 66

Page 7: Database Management Systems 1 Introduction to Database Systems Instructor: Xintao Wu Ramakrishnan & Gehrke

Database Management Systems 7 Ramakrishnan & Gehrke

Data Models A data model is a collection of concepts for

describing data. A schema is a description of a particular

collection of data, using the given data model. The relational model of data is the most widely

used model today.– Main concept: relation, basically a table with rows

and columns.– Every relation has a schema, which describes the

columns, or fields.

Page 8: Database Management Systems 1 Introduction to Database Systems Instructor: Xintao Wu Ramakrishnan & Gehrke

Database Management Systems 8 Ramakrishnan & Gehrke

Levels of Abstraction

Many views, single conceptual (logical) schema and physical schema.– Views describe how

users see the data.

– Conceptual schema defines logical structure

– Physical schema describes the files and indexes used. Schemas are defined using DDL; data is modified/queried using DML.

Physical Schema

Conceptual Schema

View 1 View 2 View 3

Page 9: Database Management Systems 1 Introduction to Database Systems Instructor: Xintao Wu Ramakrishnan & Gehrke

Database Management Systems 9 Ramakrishnan & Gehrke

Example: University Database

Conceptual schema: – Students(sid: string, name: string, login: string,

age: integer, gpa:real)– Courses(cid: string, cname:string, credits:integer) – Enrolled(sid:string, cid:string, grade:string)

Physical schema:– Relations stored as unordered files. – Index on first column of Students.

External Schema (View): – Course_info(cid:string,enrollment:integer)

Page 10: Database Management Systems 1 Introduction to Database Systems Instructor: Xintao Wu Ramakrishnan & Gehrke

Database Management Systems 10 Ramakrishnan & Gehrke

Data Independence

Applications insulated from how data is structured and stored.

Logical data independence: Protection from changes in logical structure of data.

Physical data independence: Protection from changes in physical structure of data.

One of the most important benefits of using a DBMS!

Page 11: Database Management Systems 1 Introduction to Database Systems Instructor: Xintao Wu Ramakrishnan & Gehrke

Database Management Systems 11 Ramakrishnan & Gehrke

Structure of a DBMS

A typical DBMS has a layered architecture.

The figure does not show the concurrency control and recovery components.

This is one of several possible architectures; each system has its own variations.

Query Optimizationand Execution

Relational Operators

Files and Access Methods

Buffer Management

Disk Space Management

DB

These layersmust considerconcurrencycontrol andrecovery

Page 12: Database Management Systems 1 Introduction to Database Systems Instructor: Xintao Wu Ramakrishnan & Gehrke

Database Management Systems 12 Ramakrishnan & Gehrke

Transaction Management: ACID properties

AA tomicity: All actions in the Xact happen, or none happen.

CC onsistency: If each Xact is consistent, and the DB starts consistent, it ends up consistent.

II solation: Execution of one Xact is isolated from that of other Xacts.

D D urability: If a Xact commits, its effects persist.

The Recovery Manager guarantees Atomicity & Durability whereas the Concurrency Control guarantees Consistency & Isolation.

Page 13: Database Management Systems 1 Introduction to Database Systems Instructor: Xintao Wu Ramakrishnan & Gehrke

Database Management Systems 13 Ramakrishnan & Gehrke

Motivation of concurrency control

Consistency Isolation Example

– Two parallel transactions T1 and T2– Serial execution– Execution with interleaving actions– Example shown on board

Page 14: Database Management Systems 1 Introduction to Database Systems Instructor: Xintao Wu Ramakrishnan & Gehrke

Database Management Systems 14

Example

Consider two transactions (Xacts):

T1: BEGIN A=A+100, B=B-100 ENDT2: BEGIN A=1.06*A, B=1.06*B END

Intuitively, the first transaction is transferring $100 from B’s account to A’s account. The second is crediting both accounts with a 6% interest payment.

There is no guarantee that T1 will execute before T2 or vice-versa, if both are submitted together. However, the net effect must be equivalent to these two transactions running serially in some order.

Page 15: Database Management Systems 1 Introduction to Database Systems Instructor: Xintao Wu Ramakrishnan & Gehrke

Database Management Systems 15

Example (Contd.)

Consider a possible interleaving (schedule):

T1: A=A+100, B=B-100 T2: A=1.06*A, B=1.06*B

This is OK. But what about:T1: A=A+100, B=B-100 T2: A=1.06*A, B=1.06*B

The DBMS’s view of the second schedule:

T1: R(A), W(A), R(B), W(B)T2: R(A), W(A), R(B), W(B)

Page 16: Database Management Systems 1 Introduction to Database Systems Instructor: Xintao Wu Ramakrishnan & Gehrke

Database Management Systems 16 Ramakrishnan & Gehrke

Motivation of recovery management

Atomicity: – Transactions may abort (“Rollback”).

Durability:– What if DBMS stops running? (Causes?)

crash! Desired Behavior after

system restarts:– T1, T2 & T3 should be

durable.– T4 & T5 should be

aborted (effects not seen).

T1T2T3T4T5

Page 17: Database Management Systems 1 Introduction to Database Systems Instructor: Xintao Wu Ramakrishnan & Gehrke

Database Management Systems 17 Ramakrishnan & Gehrke

Summary DBMS used to maintain, query large datasets. Benefits include recovery from system

crashes, concurrent access, quick application development, data integrity and security.

Levels of abstraction give data independence. A DBMS typically has a layered architecture. DBAs hold responsible jobs and are

well-paid! DBMS R&D is one of the broadest,

most exciting areas in CS.