View
232
Download
0
Tags:
Embed Size (px)
Citation preview
File and Databases
CS208
File Organization
The three principal file organizations are Sequential Direct Indexed Sequential
Sequential File Organization
Records physically stored one after another in order determined by key field
Advantages: Very efficient when many or all records in a
file need to be accessed Cost of tape and drives very low
Disadvantages: Major drawback is very slow access time for a
particular record. Must rewrite all records following record
insertion.
Direct File Organization
Records stored at a specific address, determined by their key field A mathematical technique called hashing converts
the key field value into a corresponding address Advantages:
Record can be quickly accessed by going directly to its address
Disadvantages: Must be done using random access storage
(disk/optical) which have higher cost than sequential
Can only use one key
Indexed SequentialFile Organization
Compromise between direct and sequential methods Records stored sequentially Index created that records the address of each
record Advantages:
Good compromise between previous two methods Can have multiple index tables to use multiple
keys Disadvantages:
Slower than direct-access
What is a “database”?
A “database” is just a collection of related data.
Databases can exist in many forms. Examples: Sheets of paper in folders in a file cabinet A book (think of it as a collection of sentences
and illustrations) Books in a collection (e.g., a library) Sets of 3"x5" cards containing notes Maps and other geographic information systems Blood samples in a medical laboratory
Common Elements Sets of data and information composed
of, and/or represented by: bits alphanumeric symbols lines and shapes in drawings, pictures, and
maps audio and video recordings actual substances
A means by which the sets of data and information are organized in order to facilitate access to individual desired sets
Examples of Access Methods
Phone book - Collection of several independent databases, each consisting of names and corresponding phone numbers: Blue-pages governmental listings:
primary arrangement alphabetical by type of government (city, county, state, federal),
secondary arrangement alphabetical by agency within type of government,
tertiary arrangement alphabetical by office within agency
Examples of Access Methods (continued)
White-pages personal listings: arranged alphabetically by surname within surname by first names
Yellow-pages listings: primary arrangement by type of business, secondary arrangement alphabetically by
company within type of business, plus various special groupings (e.g.,
restaurants by ethnic type)
Flat File Databases (DBs)
Flat-file DBs are like the DBs you can construct in a single spreadsheet page All the information in the DB is in one
file consisting of one array of rows and columns.
For example:SSN Surname First Name(s) Telephone Number123-45-6789 Doe J ohn X. 303-555-1234987-65-4321 Smith Martin 720-111-2222567-89-0123 J ohnson Billy Bob 303-444-5555
Flat File Advantages/Disadvantages
Advantages Simple Good for few records with few fields
Disadvantages Unnecessary duplication of data or data
redundancy Inconsistent, incomplete or inaccurate data,
lacking data integrity Changes in data are difficult to implement Separate and isolated data with limited data
sharing
Data Heirarchcy in Computerized Databases
Character - single letter, number or special character
Field - a set of related characters Record - collection of related fields File - collection of related records Database - collection of related files
Types of Databases
Individual - often on a PC used by one person
Company or shared - usually on a mainframe and managed by a database administrator Example: Common operational
databases contain information about company operations
Distributed databases - have data stored in multiple locations, but the data is accessible through communications networks
Proprietary databases - created by an organization, and stored information is offered to others for a fee Examples include Dialog Information Services
and Dow Jones Interactive Publishing
Types of Databases
Database Terminology
Database Management System (DBMS) –Allows a user to deal with data in logical terms, without having to understand the computer's physical view.
Logical data view – How humans see things
Physical data view – How things are stored in a computer
DBMS – provides storage, retrieval, analysis, sorting, and printing of information in a database.
DataBase Management System (DBMS)
DBMS is:
• A collection of program independent, interrelated data
• A set of programs to access the data
• Information about a particular enterprise
• An environment that is both convenient and efficient to use.
Current DBMS Systems Mainframe database vendors:
Oracle IBM DB2 Microsoft (SQL Server) Sybase Informix
Desktop: MS Access Borland Paradox
Some free database systems (Unix) :
Postgres MySQL, mSQL Predator
DBMS Uses
The processing power of a DBMS allows it to: Sort Match Link Aggregate Skip fields Calculate Arrange
Database Administrator (DBA)
Coordinates all the activities of the database system.
Must have a good understanding of the enterprise’s information resources and needs.
Database Administrator (DBA)
Database administrator's duties include: Database definition Database modification Granting user authority to access the
database for security and privacy Specifying integrity constraints Acting as liaison with users Monitoring DB performance and
responding to changes in requirements Decides strategy for backup and
recovery
Types of DBMS Users End-User:
Non-specialist accessing data via a query language Naïve user accessing data via a special-purpose
interface Performs data retrieval and update (extend/modify)
Applications Programmer: Writes programs that use the DB by embedding
queries to the DB in a HLL Develops interfaces for the naïve user
DBMS Organization
The four principal DBMS organizations are Hierarchical Network Relational Object-Oriented
Hierarchical Databases Viewed as branches of an upside-down tree
Each item is subordinate to its parent item Only one parent per item
Any element (node) in the database is linked only to the elements directly above it and directly below it. If parent node is deleted, all the child nodes
are as well New parent node must be created before
adding a new child node No direct relationships between child nodes
Hierarchical Databases Limited by rigid structure Typically require custom programming Example:
Original computer-based databases were designed for banking.
Hierarchical databases were appropriate for such purposes, e.g.:
individual accounts can be grouped by family or business;
sets of accounts, grouped by branch; accounts in different branches, grouped by
city; accounts in different cities, grouped by state.
Hierarchical Model
- Must always begin at top to search for data- One parent per child, no other relationships allowed
CourseNum Name Section
DeptNum Name
IDNum Name
IDNum Name Course
Department
Courses Students
Professors
Network Databases Permits links among all components
(i.e. elements can be linked to other elements anywhere in the database, not just those directly above and below)
The interconnected design allows for access via multiple pathways
Can be extremely difficult to manage
The World-Wide Web is a very large example of a network database.
Network Model
- Added data paths across the tree (instead of just up and down)- Reduced time required to access data, but increased overhead space requirements
CourseNum Name Section
DeptNum Name
IDNum Name
IDNum Name Course
Department
Courses
Students
Professors
Relational Databases
A relational database is a set of one or more tables that together embody information about a set of related concepts and entities.
The tables are connected (related) via fields within the table that are shared by a pair of tables.
Relational Model
Based on tables of objects (the data), rather than specific paths (ways to access the data)
Department
CoursesStudents
Professors
DeptNum Name
… …… …… …
IDNum Name DeptNum Course
… … … …… … … …… … … …
Num ProfID Section
… … …… … …… … …
IDNum Name Course
… … …… … …… … …
Relational Database Rules
Each row is unique (distinct) Each column name is unique within a
table It is permissible to have the same name
for a column in two different tables in the same database. To distinguish between them we use
a qualified name: Pet.Name vs Family.Name
Relational Databases In a Relational DB, the information
content of a table does not depend on either The order of the rows; or The order of the columns
In other words, the rows and columns of a table can be rearranged at will without affecting the table's information content
File (Table) = RelationRows (Tuples) = RecordsColumns (Attributes) = Fields
Relationships between objects are defined by common attributes
Num Name Credits
CS208 CS Fundamentals 3
MT360 Calculus 4
CS320 C Programming 3
Courses
Relational DB Terminology
Records(Rows, Tuples)
Fields(Columns, Attributes)
Primary Keys
In a Relational DB, each table Must have a primary key (unique identifier) Must have no duplicate rows
A primary key is A data attribute (column), or a combination
of attributes, that uniquely identifies each record in the table.
A simple key consists of a single attribute A composite key consists of two or more
attributes
Primary Keys (continued)
Primary Key Provides unique way to identify each
record Can be obvious from the structure of the
table. If there is no easy natural choice, you can add a column containing a unique identifier.
May consist of the entire record (especially with two-column tables, which occur often in the development of RDBs)
Other Types of Keys
Secondary Key A column that is used to aid in the
retrieval of information from a table. A secondary key is not required to have
unique values in each of its rows,
Foreign Key A column used to retrieve information
from one table (i.e., is a secondary key) that is also the primary key in another table. Foreign keys are a major tool in Relational DBs.
Relational Database Design
Design Goals:
Avoid redundant data
Ensure that relationships among attributes are represented
Facilitate checking updates for violation of database integrity constraints
Creating a Student Database
Student ID First Name Middle Initial Last Name Home Address School Address Street Address City State Zip
Home Phone Work Phone Cell Phone Course ID Course Name Course Instructor
Database NormalizationOptimize the Tables by:
Storing each piece of data once and only once (i.e. Eliminate redundant data).
Ensuring data dependencies make sense (only storing related data within a particular table).
Maintaining data integrity.
First Normal Form
Eliminate repeating groups in individual tables.
Create a separate table for each set of related data.
Identify each set of related data with a primary key.
1st Normal Student DatabaseStudentStudent IDFirst NameMiddle InitialLast Name Address
Student IDAddress TypeStreet AddressCityStateZip
PhoneStudent IDPhone TypePhone Number
StudentCourseStudent IDCourse IDCourse NameCourse Instructor
Second Normal Form
Create separate tables for sets of values that apply to multiple records
i.e. Remove partial data dependencies.
Relate these tables using a foreign key.
2nd Normal Student DatabaseStudentStudent IDFirst NameMiddle InitialLast Name
AddressStudent IDAddress TypeStreet AddressZip
PhoneStudent IDPhone TypePhone Number
StudentCourseStudent IDCourse IDCourse NameCourse Instructor
ZipCodeZipCityState
Third Normal Form
Eliminate any fields that do not depend on the key.
3rd Normal Student DatabaseStudentStudent IDFirst NameMiddle InitialLast Name
AddressStudent IDAddress TypeStreet AddressZip
PhoneStudent IDPhone TypePhone Number
StudentCourseStudent IDCourse ID
ZipCodeZipCityState
CourseCourse IDCourse NameCourse Instructor
Relational Design Summary
Data is stored in records, inside of tables
Primary keys uniquely identify a record
Foreign keys link data in one table to the primary key in another table
Designs should maintain data integrity
Normalization concepts should be used
Dis/Advantages of Relational Datebases
Disadvantages Require more overhead
Advantages Cut down on needless repetition of
information Ensure more accuracy Facilitate updating and deletion of
information. Design avoids errors that occur when
adding/deleting information from flat files
Query Languages
Query languages allow non-programmers to question the DBMS
Structured Query Language (SQL) –the only standard structured query language
Structured Query Language (SQL)
Pronounced either "S, Q, L" or "sequel"
Widely used standard set of commands and syntax for doing things with Relational DBMSs
Used especially for query and retrieval
Includes commands for defining Relational DBs, conducting transactions, storing data, etc.
Each Relational DBMS also has additional features unique to it, because SQL does not handle all the practical details involved in using a Relational DB
Select
The select clause directs the DBMS to choose a subset of fields from one or more tables.
For example,
SELECT last_name, first_name
chooses all last names and first names and will place them in this order in the result set.
From clause The from clause directs the query toward
one or more tables. from Student (directs the query to use
the student table) Used with the SELECT clause:
SELECT last_name, first_name FROM Student
chooses the last and first name of all rows in Student table.
Note: As long as the column names within the tables used in the from clause are unique, qualified names are not required.
Where clause
A where clause specifies a selection criterion that we will use to limit our choice of records.
A where clause follows the name of the table.
Format:
SELECT field1,…, fieldN
FROM tablename
WHERE (boolean expression)
Boolean Expressions
Boolean expressions used in where clauses typically involve a field compared to another field or a field compared to a value.
For example:
WHERE GPA > 3.50
WHERE last_name > “M”
WHERE amount_owed >= amount_paid
Comparison Operators
= is equal to
> is greater than
< is less than
>= is greater than or equal to
<= is less than or equal to
not is not equal to
Simple SQL QueryTitle Price Category Publisher
Java Intro 109.99 ComputersPrentice
Hall
Calculus 129.99 Math Kaufman
Advanced C
115.99 Computers Tech Inc
Philosophy 83.99 Lib ArtsPrentice
HallSELECT *FROM TextbooksWHERE category=‘Computers’
SELECT *FROM TextbooksWHERE category=‘Computers’
Textbooks
Title Price Category Publisher
Java Intro 109.99 ComputersPrentice
Hall
Advanced C
115.99 Computers Tech Inc
“selection”
Simple SQL QueryTitle Price Category Publisher
Java Intro 109.99 ComputersPrentice
Hall
Calculus 129.99 Math Kaufman
Advanced C
115.99 Computers Tech Inc
Philosophy 83.99 Lib ArtsPrentice
HallSELECT Title, Price, PublisherFROM TextbooksWHERE Price < 110
SELECT Title, Price, PublisherFROM TextbooksWHERE Price < 110
Textbooks
“selection” and“projection”
Title Price Publisher
Java Intro 109.99Prentice
Hall
Philosophy 83.99Prentice
Hall
Joins
You can use more than one table in a query.
The join clause tells the DBMS that you are using two tables.
You must also specify the fields on which you are linking the files, using the ON clause.
Example
SELECT Textbooks.Title, Price, RequiredFROM Textbooks, CourseTextsON Textbooks.Title = CourseTexts.TitleWHERE Category = ‘Computers’
The dot (.) notation specifies the table then the field within the table to reconcile the ambiguity with the two names.
Join SQL Query
Title Price Category Publisher
Java Intro 109.99 ComputersPrentice
Hall
Calculus 129.99 Math Kaufman
Adv C 115.99 Computers Tech Inc
Philosophy 83.99 Lib ArtsPrentice
Hall
Textbooks
Title Course Required
Java Intro CS434 Yes
Adv C CS422 No
CourseTexts
Title Price Required
Java Intro 109.99 Yes
Adv C 115.99 No
SELECT Textbooks.Title, Price, Required
FROM Textbooks, CourseTexts
ON Textbooks.Title = CourseTexts.Title
WHERE Category = ‘Computers’
SELECT Textbooks.Title, Price, Required
FROM Textbooks, CourseTexts
ON Textbooks.Title = CourseTexts.Title
WHERE Category = ‘Computers’
Database Concerns
Privacy/Unauthorized Access to Data
Data is easier to gather and exploit using a computer, so precautions must be taken to guard the data.
Accuracy/Completeness
Owner of database must ensure accuracy of the data.
Users must take data with grain of salt -
must verify