CS 435 Database Systems. Chapter 1 An Overview of Database Management


CS 435 Database Systems

Chapter 1

An Overview of Database Management

What is a database?

“An electronic filing cabinet”

“A repository for a collection of computerized data files”

A collection of interrelated data.

“Descriptions”--not definitions

So, how is that different from a file?

File processing systems

• Independent systems

• Each has its own definition of data

• Each has its own data formats

[Figure: three independent file processing systems. The Faculty Data File is used by the Payroll System, the Class Data File by the Class Scheduling System, and the Student Data File by the Grade Posting System; each system produces its own reports.]


Problems of inconsistency.

May need faculty member name in each file. May be recorded differently in each.
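The inconsistency problem can be made concrete with a small Python sketch. Everything in it is hypothetical and for illustration only: the two "files," their field names, and the faculty name are not taken from the slides. It shows one way the same person might be recorded differently by two independent systems, so that a straightforward match between the files fails until the formats are reconciled.

```python
# Hypothetical records: each system keeps its own file in its own format.
payroll_file = [{"emp_name": "SMITH, JANE"}]
class_file = [{"instructor": "Jane Smith", "course": "CS 435"}]


def courses_for(name):
    """Find courses by comparing the name exactly as given."""
    return [c["course"] for c in class_file if c["instructor"] == name]


def normalize(name):
    """Reduce 'LAST, FIRST' or 'First Last' to lowercase 'first last'."""
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
        name = f"{first} {last}"
    return " ".join(name.lower().split())


payroll_name = payroll_file[0]["emp_name"]

# The payroll spelling does not match the class file's spelling.
naive_match = courses_for(payroll_name)

# Extra reconciliation code is needed before the two files agree.
reconciled = [
    c["course"]
    for c in class_file
    if normalize(c["instructor"]) == normalize(payroll_name)
]

print(naive_match)
print(reconciled)
```

With a single shared definition of the faculty data, this kind of reconciliation step would not be needed.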

Database systems

• A single data definition

• All data (potentially) accessible from each application

• Less paperwork exchange between applications

[Figure: Faculty Data, Class Data, and Student Data, described by a single data definition, are reached through the database management system by the Payroll System, the Class Scheduling System, and the Grade Posting System, each producing its own reports.]

So, a database is a collection of "files," or at least a collection of data that would otherwise usually exist in multiple files.

What is a database management system?

The software that makes it possible for multiple applications and multiple users to access the same (single) set of data.

The software that enables users to access and share the single set of integrated data without concern about files and file structure.

What is a database system?

• The data

• The database software (database management system)

• The other software (applications)

• The hardware where the data and software reside (and execute)

• The users who use the system

[Figure: a database system made up of data, software, computer hardware, and users.]

A database system is the collection of data, the software that provides access to that data, and the hardware on which the data and software reside and execute.

To that we can also add the users. They are also part of the "system."

+-----+----------------+---------------+------+---------+-------+
| bin | wine           | producer      | year | bottles | ready |
+-----+----------------+---------------+------+---------+-------+
|   2 | Chardonnay     | Buena Vista   | 2001 |       1 |  2003 |
|   3 | Chardonnay     | Geyser Peak   | 2001 |       5 |  2003 |
|   6 | Chardonnay     | Simi          | 2000 |       4 |  2002 |
|  12 | Joh. Riesling  | Jekel         | 2002 |       1 |  2003 |
|  21 | Fume Blanc     | Ch. St. Jean  | 2001 |       4 |  2003 |
|  22 | Fume Blanc     | Robt. Mondavi | 2000 |       2 |  2002 |
|  30 | Gewurztraminer | Ch. St. Jean  | 2002 |       3 |  2003 |
|  43 | Cab. Sauvignon | Windsor       | 1995 |      12 |  2004 |
|  45 | Cab. Sauvignon | Geyser Peak   | 1998 |      12 |  2006 |
|  48 | Cab. Sauvignon | Robt. Mondavi | 1997 |      12 |  2008 |
|  50 | Pinot Noir     | Gary Farrell  | 2000 |       3 |  2003 |
|  51 | Pinot Noir     | Fetzer        | 1997 |       3 |  2004 |
|  52 | Pinot Noir     | Dehlinger     | 1999 |       2 |  2002 |
|  58 | Merlot         | Clos du Bois  | 1998 |       9 |  2004 |
|  64 | Zinfandel      | Cline         | 1998 |       9 |  2007 |
|  72 | Zinfandel      | Rafanelli     | 1999 |       2 |  2007 |
+-----+----------------+---------------+------+---------+-------+

Date’s “CELLAR” Example

Date’s “CELLAR” Example

Retrieval:

select wine, bin, producer
from Cellar
where ready = 2004;

Result:

+----------------+-----+--------------+
| wine           | bin | producer     |
+----------------+-----+--------------+
| Cab. Sauvignon |  43 | Windsor      |
| Pinot Noir     |  51 | Fetzer       |
| Merlot         |  58 | Clos du Bois |
+----------------+-----+--------------+

3 rows in set (0.00 sec)
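The retrieval can be tried out without a MySQL server. The following is a minimal sketch that uses Python's built-in sqlite3 module in place of MySQL (an assumption made so the example is self-contained). The column types are also assumptions, and the first column is named bin to match the table heading. The rows are copied from the CELLAR table, and the query filters on ready = 2004, the value that yields the three rows in the result shown above.

```python
import sqlite3

# Rows copied from the CELLAR table: (bin, wine, producer, year, bottles, ready).
CELLAR_ROWS = [
    (2, "Chardonnay", "Buena Vista", 2001, 1, 2003),
    (3, "Chardonnay", "Geyser Peak", 2001, 5, 2003),
    (6, "Chardonnay", "Simi", 2000, 4, 2002),
    (12, "Joh. Riesling", "Jekel", 2002, 1, 2003),
    (21, "Fume Blanc", "Ch. St. Jean", 2001, 4, 2003),
    (22, "Fume Blanc", "Robt. Mondavi", 2000, 2, 2002),
    (30, "Gewurztraminer", "Ch. St. Jean", 2002, 3, 2003),
    (43, "Cab. Sauvignon", "Windsor", 1995, 12, 2004),
    (45, "Cab. Sauvignon", "Geyser Peak", 1998, 12, 2006),
    (48, "Cab. Sauvignon", "Robt. Mondavi", 1997, 12, 2008),
    (50, "Pinot Noir", "Gary Farrell", 2000, 3, 2003),
    (51, "Pinot Noir", "Fetzer", 1997, 3, 2004),
    (52, "Pinot Noir", "Dehlinger", 1999, 2, 2002),
    (58, "Merlot", "Clos du Bois", 1998, 9, 2004),
    (64, "Zinfandel", "Cline", 1998, 9, 2007),
    (72, "Zinfandel", "Rafanelli", 1999, 2, 2007),
]

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "create table Cellar (bin integer primary key, wine text, producer text,"
    " year integer, bottles integer, ready integer)"
)
conn.executemany("insert into Cellar values (?, ?, ?, ?, ?, ?)", CELLAR_ROWS)

# The retrieval: three columns of the rows whose ready year is 2004.
ready_2004 = conn.execute(
    "select wine, bin, producer from Cellar where ready = 2004 order by bin"
).fetchall()

for row in ready_2004:
    print(row)
```

The order by clause is added only to make the row order predictable; the slide's query does not need it.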

Date’s “CELLAR” Example

Inserting new data:

insert
into Cellar
values (53, 'Pinot Noir', 'Saintsbury', 2003, 6, 2008);

Date’s “CELLAR” Example

Changing existing data:

update Cellar
set bottles = 4
where bin = 3;

Deleting existing data:

delete
from Cellar
where bin = 2;

+-----+----------------+---------------+------+---------+-------+
| bin | wine           | producer      | year | bottles | ready |
+-----+----------------+---------------+------+---------+-------+
|   3 | Chardonnay     | Geyser Peak   | 2001 |       4 |  2003 |
|   6 | Chardonnay     | Simi          | 2000 |       4 |  2002 |
|  12 | Joh. Riesling  | Jekel         | 2002 |       1 |  2003 |
|  21 | Fume Blanc     | Ch. St. Jean  | 2001 |       4 |  2003 |
|  22 | Fume Blanc     | Robt. Mondavi | 2000 |       2 |  2002 |
|  30 | Gewurztraminer | Ch. St. Jean  | 2002 |       3 |  2003 |
|  43 | Cab. Sauvignon | Windsor       | 1995 |      12 |  2004 |
|  45 | Cab. Sauvignon | Geyser Peak   | 1998 |      12 |  2006 |
|  48 | Cab. Sauvignon | Robt. Mondavi | 1997 |      12 |  2008 |
|  50 | Pinot Noir     | Gary Farrell  | 2000 |       3 |  2003 |
|  51 | Pinot Noir     | Fetzer        | 1997 |       3 |  2004 |
|  52 | Pinot Noir     | Dehlinger     | 1999 |       2 |  2002 |
|  58 | Merlot         | Clos du Bois  | 1998 |       9 |  2004 |
|  64 | Zinfandel      | Cline         | 1998 |       9 |  2007 |
|  72 | Zinfandel      | Rafanelli     | 1999 |       2 |  2007 |
|  53 | Pinot Noir     | Saintsbury    | 2003 |       6 |  2008 |
+-----+----------------+---------------+------+---------+-------+

bottles for bin 3 changed from 5 to 4

row for bin 2 deleted

new row (bin 53) inserted
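The three changes can be replayed in a short sketch, again using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, as are the column types and the column name bin). Only the first three rows of CELLAR are loaded, to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Cellar (bin integer primary key, wine text, producer text,"
    " year integer, bottles integer, ready integer)"
)
# A subset of the CELLAR rows, enough to show each kind of change.
conn.executemany(
    "insert into Cellar values (?, ?, ?, ?, ?, ?)",
    [
        (2, "Chardonnay", "Buena Vista", 2001, 1, 2003),
        (3, "Chardonnay", "Geyser Peak", 2001, 5, 2003),
        (6, "Chardonnay", "Simi", 2000, 4, 2002),
    ],
)

# Inserting new data.
conn.execute(
    "insert into Cellar values (53, 'Pinot Noir', 'Saintsbury', 2003, 6, 2008)"
)
# Changing existing data.
conn.execute("update Cellar set bottles = 4 where bin = 3")
# Deleting existing data.
conn.execute("delete from Cellar where bin = 2")
conn.commit()

after = conn.execute("select bin, bottles from Cellar order by bin").fetchall()
print(after)
```

After the three statements, bin 2 is gone, bin 3 holds 4 bottles, and bin 53 is present.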

Note that the "CELLAR database" looks like a "table," and in fact, that is what it is.

In particular it is a relational table, or just a "relation."

Aside regarding tables…

and “looks like a table.”

“Tables” have:

• rows

• column “headings”

• columns

Columns are aligned, i.e., strings left justified, numbers right justified.

Separating lines provided by MySQL:

+-----+----------------+---------------+------+---------+-------+
| bin | wine           | producer      | year | bottles | ready |
+-----+----------------+---------------+------+---------+-------+
|   2 | Chardonnay     | Buena Vista   | 2001 |       1 |  2003 |
|   3 | Chardonnay     | Geyser Peak   | 2001 |       5 |  2003 |
|   6 | Chardonnay     | Simi          | 2000 |       4 |  2002 |
|  12 | Joh. Riesling  | Jekel         | 2002 |       1 |  2003 |
|  21 | Fume Blanc     | Ch. St. Jean  | 2001 |       4 |  2003 |
|  22 | Fume Blanc     | Robt. Mondavi | 2000 |       2 |  2002 |
|  30 | Gewurztraminer | Ch. St. Jean  | 2002 |       3 |  2003 |
|  43 | Cab. Sauvignon | Windsor       | 1995 |      12 |  2004 |
|  45 | Cab. Sauvignon | Geyser Peak   | 1998 |      12 |  2006 |
|  48 | Cab. Sauvignon | Robt. Mondavi | 1997 |      12 |  2008 |
|  50 | Pinot Noir     | Gary Farrell  | 2000 |       3 |  2003 |
|  51 | Pinot Noir     | Fetzer        | 1997 |       3 |  2004 |
|  52 | Pinot Noir     | Dehlinger     | 1999 |       2 |  2002 |
|  58 | Merlot         | Clos du Bois  | 1998 |       9 |  2004 |
|  64 | Zinfandel      | Cline         | 1998 |       9 |  2007 |
|  72 | Zinfandel      | Rafanelli     | 1999 |       2 |  2007 |
+-----+----------------+---------------+------+---------+-------+

Separating lines provided by textbook publisher:

[Figure: the same table as typeset in the textbook.]

No separating lines:

bin  wine            producer       year  bottles  ready
  2  Chardonnay      Buena Vista    2001        1   2003
  3  Chardonnay      Geyser Peak    2001        5   2003
  6  Chardonnay      Simi           2000        4   2002
 12  Joh. Riesling   Jekel          2002        1   2003
 21  Fume Blanc      Ch. St. Jean   2001        4   2003
 22  Fume Blanc      Robt. Mondavi  2000        2   2002
 30  Gewurztraminer  Ch. St. Jean   2002        3   2003
 43  Cab. Sauvignon  Windsor        1995       12   2004
 45  Cab. Sauvignon  Geyser Peak    1998       12   2006
 48  Cab. Sauvignon  Robt. Mondavi  1997       12   2008
 50  Pinot Noir      Gary Farrell   2000        3   2003
 51  Pinot Noir      Fetzer         1997        3   2004
 52  Pinot Noir      Dehlinger      1999        2   2002
 58  Merlot          Clos du Bois   1998        9   2004
 64  Zinfandel       Cline          1998        9   2007
 72  Zinfandel       Rafanelli      1999        2   2007

So, a database is usually said to consist of tables rather than files.

The rows of the tables would be the "records" of a file.

The columns of the table are the "fields" of those records.

Note that the "database," the collection of tables, is a logical concept, a data structure.

The database software (the database manager, the DBMS) provides the mapping of the logical database into one or more logical files, and ultimately into a physical representation on disk.

Thus, there are stored files, stored records, and stored fields.

(Sometimes the DBMS shares this mapping with the operating system.)

Parts

+------+-------+-------+--------+--------+
| pnum | pname | color | weight | city   |
+------+-------+-------+--------+--------+
| P1   | Nut   | Red   |   12.0 | London |
| P2   | Bolt  | Green |   17.0 | Paris  |
| P3   | Screw | Blue  |   17.0 | Rome   |
| P4   | Screw | Red   |   14.0 | London |
| P5   | Cam   | Blue  |   12.0 | Paris  |
| P6   | Cog   | Red   |   19.0 | London |
+------+-------+-------+--------+--------+

Say this parts table is stored in the database as a file, and the table rows become records in the file, the columns fields within each record.

[Figure: the stored database holds the “Parts” stored file alongside other stored files. The “Parts” stored file shows two occurrences of the “part” stored record type, P1 Nut Red 12.0 and P2 Bolt Green 17.0; the individual values are stored field occurrences.]

But, for example, the data for a part (a table row):

P1 Nut Red 12.0

might be stored as two records:

P1 Nut Red

and

P1 12.0

[Figure: the Parts table, the DBMS, and the “Parts” stored file within the stored database.]

Data independence--the immunity of applications to change in physical representation.

The DBMS relieves the user of any concern about how the data is represented physically.
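Data independence can be illustrated with a small Python sketch. It is not how any particular DBMS works; the class and function names are hypothetical. Two storage layouts are shown, one keeping each part as a single stored record and one splitting it into two stored records as in the P1 example. The application function sees only the logical row, so it gives the same answer with either layout.

```python
class SingleRecordStore:
    """Keeps each part as one stored record: (pname, color, weight)."""

    def __init__(self):
        self._records = {
            "P1": ("Nut", "Red", 12.0),
            "P2": ("Bolt", "Green", 17.0),
        }

    def get_part(self, pnum):
        pname, color, weight = self._records[pnum]
        return {"pnum": pnum, "pname": pname, "color": color, "weight": weight}


class SplitRecordStore:
    """Keeps each part as two stored records: (pname, color) and weight."""

    def __init__(self):
        self._descriptions = {"P1": ("Nut", "Red"), "P2": ("Bolt", "Green")}
        self._weights = {"P1": 12.0, "P2": 17.0}

    def get_part(self, pnum):
        pname, color = self._descriptions[pnum]
        weight = self._weights[pnum]
        return {"pnum": pnum, "pname": pname, "color": color, "weight": weight}


def describe(store, pnum):
    """Application code: works with the logical row, not the stored layout."""
    part = store.get_part(pnum)
    return f"{part['pnum']} {part['pname']} {part['color']} {part['weight']}"


single = describe(SingleRecordStore(), "P1")
split = describe(SplitRecordStore(), "P1")
print(single)
print(split)
```

Swapping one store for the other changes the physical representation but requires no change to describe, which is the immunity the definition refers to.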


Suppose we want to add some information about each wine, for example:

+-----+----------------+-------+---------------+------+-----+-------+
| bin | wine           | type  | producer      | year | qty | ready |
+-----+----------------+-------+---------------+------+-----+-------+
|   2 | Chardonnay     | white | Buena Vista   | 2001 |   1 |  2003 |
|   3 | Chardonnay     | white | Geyser Peak   | 2001 |   5 |  2003 |
|   6 | Chardonnay     | white | Simi          | 2000 |   4 |  2002 |
|  12 | Joh. Riesling  | white | Jekel         | 2002 |   1 |  2003 |
|  21 | Fume Blanc     | white | Ch. St. Jean  | 2001 |   4 |  2003 |
|  22 | Fume Blanc     | white | Robt. Mondavi | 2000 |   2 |  2002 |
|  30 | Gewurztraminer | white | Ch. St. Jean  | 2002 |   3 |  2003 |
|  43 | Cab. Sauvignon | red   | Windsor       | 1995 |  12 |  2004 |
|  45 | Cab. Sauvignon | red   | Geyser Peak   | 1998 |  12 |  2006 |
|  48 | Cab. Sauvignon | red   | Robt. Mondavi | 1997 |  12 |  2008 |
|  50 | Pinot Noir     | red   | Gary Farrell  | 2000 |   3 |  2003 |
|  51 | Pinot Noir     | red   | Fetzer        | 1997 |   3 |  2004 |
|  52 | Pinot Noir     | red   | Dehlinger     | 1999 |   2 |  2002 |
|  58 | Merlot         | red   | Clos du Bois  | 1998 |   9 |  2004 |
|  64 | Zinfandel      | red   | Cline         | 1998 |   9 |  2007 |
|  72 | Zinfandel      | red   | Rafanelli     | 1999 |   2 |  2007 |
+-----+----------------+-------+---------------+------+-----+-------+

“Redundancies”:

Chardonnay is white--3 times

Pinot Noir is red--3 times

So, more tables--for example:

• Wine: Name, Type, Description, Characteristic

• Producer: Name, Area, Appellation

Wines

+----------------+-----------+------------------+---------------------+
| wine_name      | wine_type | wine_description | wine_characteristic |
+----------------+-----------+------------------+---------------------+
| Chardonnay     | white     | dry              | buttery             |
| Joh. Riesling  | white     | semi-sweet       | fruity              |
| Fume Blanc     | white     | dry              | smoky               |
| Gewurztraminer | white     | semi-sweet       | spicy               |
| Cab. Sauvignon | red       | dry              | oaky                |
| Pinot Noir     | red       | dry              | fruity              |
| Merlot         | red       | dry              | plummy              |
| Zinfandel      | red       | dry              | spicy               |
+----------------+-----------+------------------+---------------------+

Note that this gives us the ability to describe a wine as "Red" in one place, rather than adding it to the CELLAR table and repeating it each time that wine appears.

This eliminates "redundancy."
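A short sketch shows how the wine type, recorded once in Wines, can still be combined with each CELLAR row on request. It uses Python's built-in sqlite3 module in place of MySQL (an assumption), loads only a few rows and columns from the tables above, and assumes the column types and the join on wine = wine_name; the slides do not show this query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table Wines (wine_name text primary key, wine_type text);
    create table Cellar (
        bin integer primary key,
        wine text references Wines(wine_name),
        producer text
    );
    """
)
# Each wine's type is recorded exactly once.
conn.executemany(
    "insert into Wines values (?, ?)",
    [("Chardonnay", "white"), ("Pinot Noir", "red")],
)
# A subset of CELLAR rows; no type column is needed here.
conn.executemany(
    "insert into Cellar values (?, ?, ?)",
    [
        (2, "Chardonnay", "Buena Vista"),
        (3, "Chardonnay", "Geyser Peak"),
        (50, "Pinot Noir", "Gary Farrell"),
    ],
)

# Combine the two tables to see the type next to each bin.
typed = conn.execute(
    "select c.bin, c.wine, w.wine_type"
    " from Cellar c join Wines w on w.wine_name = c.wine"
    " order by c.bin"
).fetchall()

# "white" is stored once, although two bins hold Chardonnay.
white_rows = conn.execute(
    "select count(*) from Wines where wine_type = 'white'"
).fetchone()[0]

print(typed)
print(white_rows)
```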

Similarly,

Producers

+--------------+-----------------+-------------+
| name         | area            | appellation |
+--------------+-----------------+-------------+
| Fetzer       | Hopland         | Mendocino   |
| Gary Farrell | Russian River V | Sonoma      |
| Geyser Peak  | Alexander Valle | Sonoma      |
| Jekel        | Arroyo Seco     | Monterey    |
| .            | .               | .           |
| .            | .               | .           |
| .            | .               | .           |
| .            | etc.            | .           |
+--------------+-----------------+-------------+

Thus the database (the collection of tables) is "integrated," i.e., the entirety of the data is formed by use of all of the tables.

[Figure: the Cellar, Wines, and Producers tables shown together as one database.]

The database: a collection of "files," or at least a collection of data that would otherwise usually exist in multiple files.

And we can add:

• Maps to the wineries

• Photographs of wineries, wine bottles

• Recordings of our own “tasting notes”

• Etc.

The data can also be "shared," by programs or by users.

(Single-user vs. multi-user systems.)

Persistent vs. Transient data

Persistent data: once put into the database, it stays there until explicitly removed.

[Figure: Faculty Data, Class Data, and Student Data behind the data definition and the database management system, used by the Payroll System, the Class Scheduling System, and the Grade Posting System and their reports.]

Transient or ephemeral data: input, output, intermediate results.

Another definition:

A database is a collection of persistent data that is used by the application systems of some given enterprise.

What are applications? Application programs? Application systems?

What is an “enterprise”?

The Benefits of a Database

• Makes possible, supports, enhances
  – Rapid availability of current data
  – Reduced redundancy
  – Less inconsistency
  – Sharing of data among users, applications
  – Enforcement of standards
  – Enforcement of security
  – Maintenance of integrity

Entity Relationship Modeling

Entities: The “things” that we need to record data about.

Relationships: How these things are related to one another.

Entities: The “things” that we need to record data about.

• People
• Products
• Places
• Processes
• Policies
• Paper (documents)

Relationships: How these things are related to one another--connections between and among the “things” and their data:

Which people make which products

Which products are stored in which places

What places use what processes

What processes require what policies

[Figure: two entities, People and Products, joined by a relationship.]

People make products.

Products are made by people.

Relationships are bi-directional.

For example:

Given a person, find which products that person makes.

Given a product, find which people make that product.
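The two lookups can be sketched with three small tables, one per entity and one for the relationship. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module, and the table names, column names, people, and products are all hypothetical rather than taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table People (person text primary key);
    create table Products (product text primary key);
    -- One row per (person, product) pair in the "makes" relationship.
    create table Makes (
        person text references People(person),
        product text references Products(product),
        primary key (person, product)
    );
    """
)
conn.executemany("insert into People values (?)", [("Ann",), ("Bob",)])
conn.executemany("insert into Products values (?)", [("Nut",), ("Bolt",)])
conn.executemany(
    "insert into Makes values (?, ?)",
    [("Ann", "Nut"), ("Ann", "Bolt"), ("Bob", "Bolt")],
)


def products_made_by(person):
    """Given a person, find which products that person makes."""
    rows = conn.execute(
        "select product from Makes where person = ? order by product", (person,)
    )
    return [product for (product,) in rows]


def people_who_make(product):
    """Given a product, find which people make that product."""
    rows = conn.execute(
        "select person from Makes where product = ? order by person", (product,)
    )
    return [person for (person,) in rows]


print(products_made_by("Ann"))
print(people_who_make("Bolt"))
```

The same Makes table answers both questions, which is one way to see that the relationship is bi-directional.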

binary relationship

ternary relationship

recursive relationship

A “horizontal,” or row, subset of the table CELLAR

A “vertical,” or column, subset

Operations on tables produce only tables.
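A minimal sketch of the two kinds of subset, using Python's built-in sqlite3 module in place of MySQL (an assumption) and three rows from CELLAR. The condition year > 2000 and the table name Recent are hypothetical choices made for illustration. The last step stores a query result as a table and queries it again, which is one way to see that operations on tables produce tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Cellar (bin integer primary key, wine text, producer text,"
    " year integer, bottles integer, ready integer)"
)
conn.executemany(
    "insert into Cellar values (?, ?, ?, ?, ?, ?)",
    [
        (2, "Chardonnay", "Buena Vista", 2001, 1, 2003),
        (6, "Chardonnay", "Simi", 2000, 4, 2002),
        (12, "Joh. Riesling", "Jekel", 2002, 1, 2003),
    ],
)

# "Horizontal" (row) subset: every column, only rows with year after 2000.
row_subset = conn.execute(
    "select * from Cellar where year > 2000 order by bin"
).fetchall()

# "Vertical" (column) subset: every row, only two of the columns.
column_subset = conn.execute(
    "select wine, producer from Cellar order by bin"
).fetchall()

# A result is itself a table, so it can be stored and queried again.
conn.execute("create table Recent as select * from Cellar where year > 2000")
recent_names = conn.execute(
    "select wine, producer from Recent order by bin"
).fetchall()

print(row_subset)
print(column_subset)
print(recent_names)
```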

Relational Database Management Systems

DB2

Ingres

Informix

Microsoft SQL Server

Oracle

Sybase

Other Database Management System “Models”

(pre-relational)

Hierarchical (tree structure)

Network (graph structure)

Inverted list

Other (new) Approaches

(post-relational)

Deductive

Expert

Extendable

Object Oriented

Semantic

Universal Relation

DBMSs

People

Data Administrator

High level position

Responsible for defining the data to be maintained

Makes policy (regarding security, etc.)

Non-“technical”

Database Administrator

Creates the database

Implements the policies

IT professional

People

Application Programmers

Write the programs to maintain the database and provide access to it

Need to know only the external view of the DB

End Users

Interact with the programs to enter data, change data, generate reports

May not need to know anything about the DB

Application Programs

4GL Systems

Interfaces

Query Language Processor

Command Driven

Menu or Forms Driven

1.1 Define the following terms:

binary relationship
command-driven interface
concurrent access
data administration
database
database system
data independence
DBA
DBMS
entity
entity/relationship diagram
forms-driven interface
integration
integrity
menu-driven interface
multi-user system
online application
persistent data
property
query language
redundancy
relationship
security
sharing
stored field
stored file
stored record
transaction
