31
Modified from: Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See www.db-book.com for conditions on re-use CS425 – Summer 2016 Jason Arnold Chapter 1: Introduction

Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

  • Upload
    others

  • View
    10

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

Modified from:

Database System Concepts, 6th Ed.

©Silberschatz, Korth and Sudarshan

See www.db-book.com for conditions on re-use

CS425 – Summer 2016

Jason Arnold

Chapter 1: Introduction

Page 2: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

©Silberschatz, Korth and Sudarshan1.2CS425 – Fall 2014 – Boris Glavic

Textbook: Chapter 1

Page 3: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

©Silberschatz, Korth and Sudarshan1.3CS425 – Fall 2014 – Boris Glavic

Database Management System (DBMS)

n DBMS contains information about a particular domain

l Collection of interrelated data

l Set of programs to access the data

l An environment that is both convenient and efficient to use

n Database Applications:

l Banking: transactions

l Airlines: reservations, schedules

l Universities: registration, grades

l Sales: customers, products, purchases

l Online retailers: order tracking, customized recommendations

l Manufacturing: production, inventory, orders, supply chain

l Human resources: employee records, salaries, tax deductions

n Databases can be very large.

n Databases touch all aspects of our lives

Page 4: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

©Silberschatz, Korth and Sudarshan1.4CS425 – Fall 2014 – Boris Glavic

University Database Example

n Application program examples

l Add new students, instructors, and courses

l Register students for courses, and generate class rosters

l Assign grades to students, compute grade point averages (GPA)

and generate transcripts

n In the early days, database applications were built directly on top of

file systems

Page 5: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

©Silberschatz, Korth and Sudarshan1.5CS425 – Fall 2014 – Boris Glavic

Drawbacks of using file systems to store data

l Data redundancy and inconsistency

Multiple file formats, duplication of information in different files

l Difficulty in accessing data

Need to write a new program to carry out each new task

l Data isolation — multiple files and formats

l Integrity problems

Integrity constraints (e.g., account balance > 0) become

“buried” in program code rather than being stated explicitly

Hard to add new constraints or change existing ones

Page 6: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

©Silberschatz, Korth and Sudarshan1.6CS425 – Fall 2014 – Boris Glavic

Drawbacks of using file systems to store data (Cont.)

l Atomicity of updates

Failures may leave database in an inconsistent state with partial updates

carried out

Example: Transfer of funds from one account to another should either

complete or not happen at all

l Concurrent access by multiple users

Concurrent access needed for performance

Uncontrolled concurrent accesses can lead to inconsistencies

– Example: Two people reading a balance (say 100) and updating it by

withdrawing money (say 50 each) at the same time

l Security problems

Hard to provide user access to some, but not all, data

Database systems offer solutions to all the above problems!

Page 7: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

©Silberschatz, Korth and Sudarshan1.7CS425 – Fall 2014 – Boris Glavic

Levels of Abstraction

n Physical level: describes how a record (e.g., customer) is stored.

n Logical level: describes data stored in database, and the relationships

among the data.

type instructor = record

ID : string;

name : string;

dept_name : string;

salary : integer;

end;

n View level: application programs hide details of data types. Views can

also hide information (such as an employee’s salary) for security

purposes.

Page 8: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

©Silberschatz, Korth and Sudarshan1.8CS425 – Fall 2014 – Boris Glavic

View of Data

An architecture for a database system

Page 9: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

©Silberschatz, Korth and Sudarshan1.9CS425 – Fall 2014 – Boris Glavic

Instances and Schemas

n Similar to types and variables in programming languages

n Schema – the logical structure of the database

l Example: The database consists of information about a set of customers and

accounts and the relationship between them

l Analogous to type information of a variable in a program

l Physical schema: database design at the physical level

l Logical schema: database design at the logical level

n Instance – the actual content of the database at a particular point in time

l Analogous to the value of a variable

n Physical Data Independence – the ability to modify the physical schema without

changing the logical schema

l Applications depend on the logical schema

l In general, the interfaces between the various levels and components should

be well defined so that changes in some parts do not seriously influence others.

n Logical Data Independence – the ability to modify the logical schema without

changing the applications

l For example, add new information to each employee

Page 10: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

©Silberschatz, Korth and Sudarshan1.10CS425 – Fall 2014 – Boris Glavic

Data Models

n A collection of tools for describing

l Data

l Data relationships

l Data semantics

l Data constraints

n Relational model

n Entity-Relationship data model (mainly for database design)

n Object-based data models (Object-oriented and Object-relational)

n Semistructured data model (XML)

n Other older models:

l Network model l Hierarchical model

n Other newer (or revived) models:

l Key-value

Page 11: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

©Silberschatz, Korth and Sudarshan1.11CS425 – Fall 2014 – Boris Glavic

Relational Model

n Relational model (Chapter 2)

n Example of tabular data in the relational modelColumns (attributes)

Rows (tuples)

Page 12: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

©Silberschatz, Korth and Sudarshan1.12CS425 – Fall 2014 – Boris Glavic

A Sample Relational Database

Page 13: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

©Silberschatz, Korth and Sudarshan1.13CS425 – Fall 2014 – Boris Glavic

Data Manipulation Language (DML)

n Language for accessing and manipulating the data organized by the

appropriate data model

l DML also known as query language

n Two classes of languages

l Procedural – user specifies what data is required and how to get

those data

l Declarative (nonprocedural) – user specifies what data is

required without specifying how to get those data

n SQL is the most widely used query language

Page 14: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

©Silberschatz, Korth and Sudarshan1.14CS425 – Fall 2014 – Boris Glavic

Data Definition Language (DDL)

n Specification notation for defining the database schema

Example: create table instructor (

ID char(5),

name varchar(20),

dept_name varchar(20),

salary numeric(8,2))

n DDL compiler generates a set of table templates stored in a data dictionary

n Data dictionary contains metadata (i.e., data about data)

l Database schema

l Integrity constraints

Primary key (ID uniquely identifies instructors)

Referential integrity (references constraint in SQL)

– e.g. dept_name value in any instructor tuple must appear in

department relation

l Authorization

Page 15: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

©Silberschatz, Korth and Sudarshan1.15CS425 – Fall 2014 – Boris Glavic

SQL

n SQL: widely used declarative (non-procedural) language

l Example: Find the name of the instructor with ID 22222select namefrom instructorwhere instructor.ID = ‘22222’

l Example: Find the ID and building of instructors in the Physics dept.

select instructor.ID, department.buildingfrom instructor, departmentwhere instructor.dept_name = department.dept_name and

department.dept_name = ‘Physics’

n Application programs generally access databases through one of

l Language extensions to allow embedded SQL

l Application program interface (e.g., ODBC/JDBC) which allow SQL queries to be sent to a database

n Chapters 3, 4 and 5

Page 16: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

©Silberschatz, Korth and Sudarshan1.16CS425 – Fall 2014 – Boris Glavic

Database Design

The process of designing the general structure of a database:

n Logical Design – Deciding on the database schema. Database design

requires that we find a “good” representation of the information from an

application domain (e.g., banking) as a collection of relation schemas.

l Business decision – What information should we record in the

database?

l Computer Science decision – What relation schemas should we

have and how should the attributes be distributed among the various

relation schemas?

n Physical Design – Deciding on the physical layout of the database

Page 17: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

©Silberschatz, Korth and Sudarshan1.17CS425 – Fall 2014 – Boris Glavic

Database Design?

n Is there any problem with this design?

Page 18: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

©Silberschatz, Korth and Sudarshan1.18CS425 – Fall 2014 – Boris Glavic

Database Design?

n Example: Changing the budget of the ‘Physics’ department

l Updates to many rows!

l Easy to break integrity

If we forget to update a row, then we have multiple budget

values for the physics department!

Page 19: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

©Silberschatz, Korth and Sudarshan1.19CS425 – Fall 2014 – Boris Glavic

Design Approaches

n Normalization Theory (Chapter 8)

l Formalize what designs are bad, and test for them

n Entity Relationship Model (Chapter 7)

l Models an enterprise as a collection of entities and relationships

Entity: a “thing” or “object” in the enterprise that is

distinguishable from other objects

– Described by a set of attributes

Relationship: an association among several entities

l Represented diagrammatically by an entity-relationship diagram:

Page 20: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

©Silberschatz, Korth and Sudarshan1.20CS425 – Fall 2014 – Boris Glavic

The Entity-Relationship Model

n Models an enterprise as a collection of entities and relationships

l Entity: a “thing” or “object” in the enterprise that is distinguishable

from other objects

Described by a set of attributes

l Relationship: an association among several entities

n Represented diagrammatically by an entity-relationship diagram:

What happened to dept_name of instructor and student?

Page 21: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

©Silberschatz, Korth and Sudarshan1.23CS425 – Fall 2014 – Boris Glavic

Query Processing

1. Parsing and translation

2. Optimization

3. Evaluation

Page 22: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

©Silberschatz, Korth and Sudarshan1.24CS425 – Fall 2014 – Boris Glavic

Query Processing (Cont.)

n Alternative ways of evaluating a given query

l Equivalent expressions

l Different algorithms for each operation

n Cost difference between a good and a bad way of evaluating a query can

be enormous

n Need to estimate the cost of operations

l Depends critically on statistical information about relations which the

database must maintain

l Need to estimate statistics for intermediate results to compute cost of

complex expressions

n Need to search for a good plan (low costs)

l Traversing the search space of alternative ways (plans) to compute

the query result

l This is called query optimization

Page 23: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

©Silberschatz, Korth and Sudarshan1.25CS425 – Fall 2014 – Boris Glavic

Transaction Management

n What if the system fails?

n What if more than one user is concurrently updating the same data?

n A transaction is a collection of operations that performs a single

logical function in a database application

n Transaction-management component ensures that the database

remains in a consistent (correct) state despite system failures (e.g.,

power failures and operating system crashes) and transaction failures.

n Concurrency-control manager controls the interaction among the

concurrent transactions, to ensure the consistency of the database.

Page 24: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

©Silberschatz, Korth and Sudarshan1.26CS425 – Fall 2014 – Boris Glavic

Database Users and Administrators

Database

Page 25: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

©Silberschatz, Korth and Sudarshan1.27CS425 – Fall 2014 – Boris Glavic

Database System Internals

Page 26: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

©Silberschatz, Korth and Sudarshan1.28CS425 – Fall 2014 – Boris Glavic

Database Architecture

The architecture of a database systems is greatly influenced by

the underlying computer system on which the database is running:

n Centralized

n Client-server

n Parallel (multi-processor)

n Distributed

Page 27: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

©Silberschatz, Korth and Sudarshan1.29CS425 – Fall 2014 – Boris Glavic

Build a Complete Database System in

your free time?n How much time do you need?

n To get a rough idea:

l Postgres (about 800,000 lines of code)

Hundreds of man-years of work

l Oracle (about 8,000,000 lines of code)

Probably thousands of man-years of work?

n Hmm, … probably not!

n Maybe a limited research prototype or new feature ;-)

Page 28: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

©Silberschatz, Korth and Sudarshan1.30CS425 – Fall 2014 – Boris Glavic

History of Database Systems

n 1950s and early 1960s:

l Data processing using magnetic tapes for storage

Tapes provided only sequential access

l Punched cards for input

n Late 1960s and 1970s:

l Hard disks allowed direct access to data

l Network and hierarchical data models in widespread use

l Ted Codd defines the relational data model

Would win the ACM Turing Award for this work

IBM Research begins System R prototype

UC Berkeley begins Ingres prototype

l High-performance (for the era) transaction processing

Page 29: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

©Silberschatz, Korth and Sudarshan1.31CS425 – Fall 2014 – Boris Glavic

History (cont.)

n 1980s:

l Research relational prototypes evolve into commercial systems

SQL becomes industrial standard

l Parallel and distributed database systems

l Object-oriented database systems

n 1990s:

l Large decision support and data-mining applications

l Large multi-terabyte data warehouses

l Emergence of Web commerce

n Early 2000s:

l XML and XQuery standards

l Automated database administration

n Later 2000s:

l Giant data storage systems

Google BigTable, Yahoo PNuts, Amazon, ..

Page 30: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

©Silberschatz, Korth and Sudarshan1.32CS425 – Fall 2014 – Boris Glavic

Recap

n Why databases?

n What do databases do?

n Data independence

l Physical and Logical

n Database design

n Data models

l Relational, object, XML, network, hierarchical

n Query languages

l DML

l DDL

n Architecture and systems aspects of database systems

l Recovery

l Concurrency control

l Query processing (optimization)

l File organization and indexing

n History of databases

Page 31: Chapter 1: Introductioncs.iit.edu/~cs425/previous/16summer/slides/ch01-intro.pdf · CS425 –Fall 2014 –Boris Glavic 1.15 ©Silberschatz, Korth and Sudarshan SQL n SQL: widely used

©Silberschatz, Korth and Sudarshan1.33CS425 – Fall 2014 – Boris Glavic

End of Chapter 1