Upload
others
View
10
Download
0
Embed Size (px)
Citation preview
Modified from:
Database System Concepts, 6th Ed.
©Silberschatz, Korth and Sudarshan
See www.db-book.com for conditions on re-use
CS425 – Summer 2016
Jason Arnold
Chapter 1: Introduction
©Silberschatz, Korth and Sudarshan1.2CS425 – Fall 2014 – Boris Glavic
Textbook: Chapter 1
©Silberschatz, Korth and Sudarshan1.3CS425 – Fall 2014 – Boris Glavic
Database Management System (DBMS)
n DBMS contains information about a particular domain
l Collection of interrelated data
l Set of programs to access the data
l An environment that is both convenient and efficient to use
n Database Applications:
l Banking: transactions
l Airlines: reservations, schedules
l Universities: registration, grades
l Sales: customers, products, purchases
l Online retailers: order tracking, customized recommendations
l Manufacturing: production, inventory, orders, supply chain
l Human resources: employee records, salaries, tax deductions
n Databases can be very large.
n Databases touch all aspects of our lives
©Silberschatz, Korth and Sudarshan1.4CS425 – Fall 2014 – Boris Glavic
University Database Example
n Application program examples
l Add new students, instructors, and courses
l Register students for courses, and generate class rosters
l Assign grades to students, compute grade point averages (GPA)
and generate transcripts
n In the early days, database applications were built directly on top of
file systems
©Silberschatz, Korth and Sudarshan1.5CS425 – Fall 2014 – Boris Glavic
Drawbacks of using file systems to store data
l Data redundancy and inconsistency
Multiple file formats, duplication of information in different files
l Difficulty in accessing data
Need to write a new program to carry out each new task
l Data isolation — multiple files and formats
l Integrity problems
Integrity constraints (e.g., account balance > 0) become
“buried” in program code rather than being stated explicitly
Hard to add new constraints or change existing ones
©Silberschatz, Korth and Sudarshan1.6CS425 – Fall 2014 – Boris Glavic
Drawbacks of using file systems to store data (Cont.)
l Atomicity of updates
Failures may leave database in an inconsistent state with partial updates
carried out
Example: Transfer of funds from one account to another should either
complete or not happen at all
l Concurrent access by multiple users
Concurrent access needed for performance
Uncontrolled concurrent accesses can lead to inconsistencies
– Example: Two people reading a balance (say 100) and updating it by
withdrawing money (say 50 each) at the same time
l Security problems
Hard to provide user access to some, but not all, data
Database systems offer solutions to all the above problems!
©Silberschatz, Korth and Sudarshan1.7CS425 – Fall 2014 – Boris Glavic
Levels of Abstraction
n Physical level: describes how a record (e.g., customer) is stored.
n Logical level: describes data stored in database, and the relationships
among the data.
type instructor = record
ID : string;
name : string;
dept_name : string;
salary : integer;
end;
n View level: application programs hide details of data types. Views can
also hide information (such as an employee’s salary) for security
purposes.
©Silberschatz, Korth and Sudarshan1.8CS425 – Fall 2014 – Boris Glavic
View of Data
An architecture for a database system
©Silberschatz, Korth and Sudarshan1.9CS425 – Fall 2014 – Boris Glavic
Instances and Schemas
n Similar to types and variables in programming languages
n Schema – the logical structure of the database
l Example: The database consists of information about a set of customers and
accounts and the relationship between them
l Analogous to type information of a variable in a program
l Physical schema: database design at the physical level
l Logical schema: database design at the logical level
n Instance – the actual content of the database at a particular point in time
l Analogous to the value of a variable
n Physical Data Independence – the ability to modify the physical schema without
changing the logical schema
l Applications depend on the logical schema
l In general, the interfaces between the various levels and components should
be well defined so that changes in some parts do not seriously influence others.
n Logical Data Independence – the ability to modify the logical schema without
changing the applications
l For example, add new information to each employee
©Silberschatz, Korth and Sudarshan1.10CS425 – Fall 2014 – Boris Glavic
Data Models
n A collection of tools for describing
l Data
l Data relationships
l Data semantics
l Data constraints
n Relational model
n Entity-Relationship data model (mainly for database design)
n Object-based data models (Object-oriented and Object-relational)
n Semistructured data model (XML)
n Other older models:
l Network model l Hierarchical model
n Other newer (or revived) models:
l Key-value
©Silberschatz, Korth and Sudarshan1.11CS425 – Fall 2014 – Boris Glavic
Relational Model
n Relational model (Chapter 2)
n Example of tabular data in the relational modelColumns (attributes)
Rows (tuples)
©Silberschatz, Korth and Sudarshan1.12CS425 – Fall 2014 – Boris Glavic
A Sample Relational Database
©Silberschatz, Korth and Sudarshan1.13CS425 – Fall 2014 – Boris Glavic
Data Manipulation Language (DML)
n Language for accessing and manipulating the data organized by the
appropriate data model
l DML also known as query language
n Two classes of languages
l Procedural – user specifies what data is required and how to get
those data
l Declarative (nonprocedural) – user specifies what data is
required without specifying how to get those data
n SQL is the most widely used query language
©Silberschatz, Korth and Sudarshan1.14CS425 – Fall 2014 – Boris Glavic
Data Definition Language (DDL)
n Specification notation for defining the database schema
Example: create table instructor (
ID char(5),
name varchar(20),
dept_name varchar(20),
salary numeric(8,2))
n DDL compiler generates a set of table templates stored in a data dictionary
n Data dictionary contains metadata (i.e., data about data)
l Database schema
l Integrity constraints
Primary key (ID uniquely identifies instructors)
Referential integrity (references constraint in SQL)
– e.g. dept_name value in any instructor tuple must appear in
department relation
l Authorization
©Silberschatz, Korth and Sudarshan1.15CS425 – Fall 2014 – Boris Glavic
SQL
n SQL: widely used declarative (non-procedural) language
l Example: Find the name of the instructor with ID 22222select namefrom instructorwhere instructor.ID = ‘22222’
l Example: Find the ID and building of instructors in the Physics dept.
select instructor.ID, department.buildingfrom instructor, departmentwhere instructor.dept_name = department.dept_name and
department.dept_name = ‘Physics’
n Application programs generally access databases through one of
l Language extensions to allow embedded SQL
l Application program interface (e.g., ODBC/JDBC) which allow SQL queries to be sent to a database
n Chapters 3, 4 and 5
©Silberschatz, Korth and Sudarshan1.16CS425 – Fall 2014 – Boris Glavic
Database Design
The process of designing the general structure of a database:
n Logical Design – Deciding on the database schema. Database design
requires that we find a “good” representation of the information from an
application domain (e.g., banking) as a collection of relation schemas.
l Business decision – What information should we record in the
database?
l Computer Science decision – What relation schemas should we
have and how should the attributes be distributed among the various
relation schemas?
n Physical Design – Deciding on the physical layout of the database
©Silberschatz, Korth and Sudarshan1.17CS425 – Fall 2014 – Boris Glavic
Database Design?
n Is there any problem with this design?
©Silberschatz, Korth and Sudarshan1.18CS425 – Fall 2014 – Boris Glavic
Database Design?
n Example: Changing the budget of the ‘Physics’ department
l Updates to many rows!
l Easy to break integrity
If we forget to update a row, then we have multiple budget
values for the physics department!
©Silberschatz, Korth and Sudarshan1.19CS425 – Fall 2014 – Boris Glavic
Design Approaches
n Normalization Theory (Chapter 8)
l Formalize what designs are bad, and test for them
n Entity Relationship Model (Chapter 7)
l Models an enterprise as a collection of entities and relationships
Entity: a “thing” or “object” in the enterprise that is
distinguishable from other objects
– Described by a set of attributes
Relationship: an association among several entities
l Represented diagrammatically by an entity-relationship diagram:
©Silberschatz, Korth and Sudarshan1.20CS425 – Fall 2014 – Boris Glavic
The Entity-Relationship Model
n Models an enterprise as a collection of entities and relationships
l Entity: a “thing” or “object” in the enterprise that is distinguishable
from other objects
Described by a set of attributes
l Relationship: an association among several entities
n Represented diagrammatically by an entity-relationship diagram:
What happened to dept_name of instructor and student?
©Silberschatz, Korth and Sudarshan1.23CS425 – Fall 2014 – Boris Glavic
Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
©Silberschatz, Korth and Sudarshan1.24CS425 – Fall 2014 – Boris Glavic
Query Processing (Cont.)
n Alternative ways of evaluating a given query
l Equivalent expressions
l Different algorithms for each operation
n Cost difference between a good and a bad way of evaluating a query can
be enormous
n Need to estimate the cost of operations
l Depends critically on statistical information about relations which the
database must maintain
l Need to estimate statistics for intermediate results to compute cost of
complex expressions
n Need to search for a good plan (low costs)
l Traversing the search space of alternative ways (plans) to compute
the query result
l This is called query optimization
©Silberschatz, Korth and Sudarshan1.25CS425 – Fall 2014 – Boris Glavic
Transaction Management
n What if the system fails?
n What if more than one user is concurrently updating the same data?
n A transaction is a collection of operations that performs a single
logical function in a database application
n Transaction-management component ensures that the database
remains in a consistent (correct) state despite system failures (e.g.,
power failures and operating system crashes) and transaction failures.
n Concurrency-control manager controls the interaction among the
concurrent transactions, to ensure the consistency of the database.
©Silberschatz, Korth and Sudarshan1.26CS425 – Fall 2014 – Boris Glavic
Database Users and Administrators
Database
©Silberschatz, Korth and Sudarshan1.27CS425 – Fall 2014 – Boris Glavic
Database System Internals
©Silberschatz, Korth and Sudarshan1.28CS425 – Fall 2014 – Boris Glavic
Database Architecture
The architecture of a database systems is greatly influenced by
the underlying computer system on which the database is running:
n Centralized
n Client-server
n Parallel (multi-processor)
n Distributed
©Silberschatz, Korth and Sudarshan1.29CS425 – Fall 2014 – Boris Glavic
Build a Complete Database System in
your free time?n How much time do you need?
n To get a rough idea:
l Postgres (about 800,000 lines of code)
Hundreds of man-years of work
l Oracle (about 8,000,000 lines of code)
Probably thousands of man-years of work?
n Hmm, … probably not!
n Maybe a limited research prototype or new feature ;-)
©Silberschatz, Korth and Sudarshan1.30CS425 – Fall 2014 – Boris Glavic
History of Database Systems
n 1950s and early 1960s:
l Data processing using magnetic tapes for storage
Tapes provided only sequential access
l Punched cards for input
n Late 1960s and 1970s:
l Hard disks allowed direct access to data
l Network and hierarchical data models in widespread use
l Ted Codd defines the relational data model
Would win the ACM Turing Award for this work
IBM Research begins System R prototype
UC Berkeley begins Ingres prototype
l High-performance (for the era) transaction processing
©Silberschatz, Korth and Sudarshan1.31CS425 – Fall 2014 – Boris Glavic
History (cont.)
n 1980s:
l Research relational prototypes evolve into commercial systems
SQL becomes industrial standard
l Parallel and distributed database systems
l Object-oriented database systems
n 1990s:
l Large decision support and data-mining applications
l Large multi-terabyte data warehouses
l Emergence of Web commerce
n Early 2000s:
l XML and XQuery standards
l Automated database administration
n Later 2000s:
l Giant data storage systems
Google BigTable, Yahoo PNuts, Amazon, ..
©Silberschatz, Korth and Sudarshan1.32CS425 – Fall 2014 – Boris Glavic
Recap
n Why databases?
n What do databases do?
n Data independence
l Physical and Logical
n Database design
n Data models
l Relational, object, XML, network, hierarchical
n Query languages
l DML
l DDL
n Architecture and systems aspects of database systems
l Recovery
l Concurrency control
l Query processing (optimization)
l File organization and indexing
n History of databases
©Silberschatz, Korth and Sudarshan1.33CS425 – Fall 2014 – Boris Glavic
End of Chapter 1