Lecture 8: Logical Database Design
SFDV2002 - Principles of Information Systems

Levels of Information Design

[Figure: abstraction spectrum running from high to low. Conceptual (Business), Logical (System), Physical (Technology).]

1: Specification
(Employee, Salary)   (Employee, Project, Role)   (Project, Budget)

2: Implementation
CREATE TABLE department
( dept_code   CHAR(4),
  name        VARCHAR2(30) NOT NULL,
  PRIMARY KEY (dept_code),
  UNIQUE (name)
);

CREATE TABLE employee
( emp_id      NUMBER(7),
  firstnames  VARCHAR2(50) NOT NULL,
  surname     VARCHAR2(50) NOT NULL,
  phone       VARCHAR2(15),
  sex         CHAR(1) DEFAULT 'F',
  dept_code   CHAR(4) NOT NULL,
  PRIMARY KEY (emp_id),
  FOREIGN KEY (dept_code) REFERENCES department
);
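
To see the implementation level in action, the sketch below runs the slide's two CREATE TABLE statements and then tries a few inserts. It uses SQLite through Python's standard sqlite3 module, which is an assumption made only because it needs no server: the slide's DDL is Oracle-flavoured, and SQLite accepts the Oracle type names as hints without enforcing their lengths. The sample rows ('INFO', 'Ada Smith', 'Bob Jones') are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default

# DDL taken from the slide; SQLite treats the Oracle type names as hints only.
conn.executescript("""
CREATE TABLE department (
    dept_code CHAR(4),
    name      VARCHAR2(30) NOT NULL,
    PRIMARY KEY (dept_code),
    UNIQUE (name)
);
CREATE TABLE employee (
    emp_id     NUMBER(7),
    firstnames VARCHAR2(50) NOT NULL,
    surname    VARCHAR2(50) NOT NULL,
    phone      VARCHAR2(15),
    sex        CHAR(1) DEFAULT 'F',
    dept_code  CHAR(4) NOT NULL,
    PRIMARY KEY (emp_id),
    FOREIGN KEY (dept_code) REFERENCES department
);
""")

# Invented sample data.
conn.execute("INSERT INTO department VALUES ('INFO', 'Information Science')")
conn.execute(
    "INSERT INTO employee (emp_id, firstnames, surname, dept_code) "
    "VALUES (1, 'Ada', 'Smith', 'INFO')"
)

# The DEFAULT clause fills in the column that the INSERT omitted.
default_sex = conn.execute(
    "SELECT sex FROM employee WHERE emp_id = 1"
).fetchone()[0]


def rejected(sql):
    """Return True if the statement violates a declared constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# UNIQUE (name): a second department with the same name is refused.
duplicate_name = rejected(
    "INSERT INTO department VALUES ('INFX', 'Information Science')"
)
# FOREIGN KEY: an employee cannot point at a department that does not exist.
unknown_dept = rejected(
    "INSERT INTO employee (emp_id, firstnames, surname, dept_code) "
    "VALUES (2, 'Bob', 'Jones', 'NONE')"
)
```

Both rejected inserts are refused by the DBMS itself from the declared constraints, with no checking code in the application.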

Overview
• Databases
  - Choosing databases
  - Features
• Relational Model
  - Relations, keys, etc.
  - Integrity constraints
  - Referential integrity
• Transformation (ERD to Database)

Databases
“… is a collection of persistent data that is used by the application systems of some given enterprise.”
• Logical organisation of data
• Requires a DBMS to be of any use

What are they good for? Advantages over paper:
• Compact: no need for tonnes of paper files
• Speed: computers can retrieve and update information faster than humans
• Drudgery: the tedium of maintaining files is removed
• Currency: accurate, up-to-date information is available at any time
• Protection: information is better protected against unintentional loss
[Date, 2004]

Database Models
• Hierarchical
• Network
• Relational
• Object: fits well with OOP; hybrid
• XML

Hierarchical:
• Tree structure: data is organised top-down
• Suited to one-to-many relationships
• Advantage: fast access
• Disadvantages: non-hierarchical data retrieval is difficult; other relationships are difficult to represent; the data structure is hard to change (modify)

Network:
• Can represent many-to-many relationships
• Advantage: fast access
• Disadvantage: inflexible

Which DBMS to use?
• Database size
• Number of concurrent users: scalability
• Performance: how fast
• Integration: ability to export and import data between applications
• Features: security, etc.
• Vendor: reputation and financial stability of the vendor
• Cost

DBMS Features
• Data storage management
• Data dictionary
• Data independence
• Security management
• Multi-user access control
• Backup and recovery management
• Data integrity management
• Performance monitoring and optimisation
• Standardisation of data access

1 - Data storage management (plus creation)
• Controls the storing, retrieval, and updating of data.
• Data independence: how the data is modelled is independent of the actual physical storage.
• Accessed (and defined) by the user through a query language, SQL (covered in the next lecture).

2 - Data dictionary management
• Definitions of the data elements and their relationships (metadata).
• Data types and structure (what is modelled by entities, attributes, and relationships).
• Provides a standard definition of terms and data elements.

3 - Data independence
• Automatic and invisible transformation and presentation (physical storage).
• Data independence is both logical and physical; the DBMS needs to control these transformations and make them as transparent as possible.

4 - Security management
• Enforces user security and data privacy within a database.
• Rules determine which users can access the database, which data items they can access, and which operations (read, add, delete, or modify) they may perform.
• More important in multi-user databases.

5 - Multi-user access control
• More than one person may try to access the database at the same time.
• Procedures and processes are required to maintain data integrity and data consistency (e.g. what happens when two people try to change the same record at the same time?).

6 - Backup and recovery management
• The DBMS provides backup, both onsite (other computers) and offsite.
• Recovery plans for when data is lost.

Recall Quality Information?
• Accurate
• Complete
• Economical
• Current
• Relevant

Poor Database Design
• Inconsistent data
• Incorrect data
• Missing data
• Lost data
• Data redundancy

Employee  Salary  Project Name  Budget (M)  Role
Brown     20      Alpha         2           Technician
Green     35      Gamma         15          Designer
Green     35      Epsilon       9           Designer
Hoskins   55      Epsilon       9           Manager
Hoskins   55      Gamma         15          Consultant
Moore     48      Gamma         15          Manager
Moore     48      Epsilon       9           Designer
• Inconsistent data: contradictory facts are stored in the database. It is not always easy to identify which fact is correct and which should be changed or removed. Example: two different dates of birth stored for one person.
• Incorrect data: facts do not reflect the real world. Errors can result from poor data entry or from data corruption after entry.
• Missing data: a desired fact was never captured. Usually indicated with a NULL.
• Lost data: a previously stored fact has been deleted, either deliberately or accidentally.
• Data redundancy: the same fact is stored more than once, which leads to inconsistencies (anomalies). Example: the relation above associates employees with projects (assume no nulls are allowed). Note that this does not violate the rule that relations cannot have duplicate rows.

Anomaly 1: Update
Action: update salary.

Employee  Salary    Project  Budget  Role
Brown     20        Alpha    2       Technician
Green     35 → 37   Gamma    15      Designer
Green     35 → 37   Epsilon  9       Designer
Hoskins   55        Epsilon  9       Manager
Hoskins   55        Gamma    15      Consultant
Moore     48 → 50   Gamma    15      Manager
Moore     48        Epsilon  9       Designer

Green: both values updated: OK.
Moore: only one value updated: ANOMALY!

Each person's salary is repeated for each project they are involved with. What does this imply when we need to increase someone's salary?
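
The update anomaly can be reproduced with a short sketch. This assumes SQLite via Python's standard sqlite3 module and a hypothetical table named assignment holding the slide's rows; the column names are illustrative, not part of the lecture.

```python
import sqlite3

# The single-table design from the slide (table and column names are hypothetical).
ROWS = [
    ("Brown", 20, "Alpha", 2, "Technician"),
    ("Green", 35, "Gamma", 15, "Designer"),
    ("Green", 35, "Epsilon", 9, "Designer"),
    ("Hoskins", 55, "Epsilon", 9, "Manager"),
    ("Hoskins", 55, "Gamma", 15, "Consultant"),
    ("Moore", 48, "Gamma", 15, "Manager"),
    ("Moore", 48, "Epsilon", 9, "Designer"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assignment ("
    " employee TEXT NOT NULL, salary INTEGER NOT NULL,"
    " project TEXT NOT NULL, budget INTEGER NOT NULL, role TEXT NOT NULL,"
    " PRIMARY KEY (employee, project))"
)
conn.executemany("INSERT INTO assignment VALUES (?, ?, ?, ?, ?)", ROWS)


def salaries(employee):
    """Return the set of distinct salaries stored for one employee."""
    cur = conn.execute(
        "SELECT DISTINCT salary FROM assignment WHERE employee = ?", (employee,)
    )
    return {row[0] for row in cur}


# Correct update: every row for Green is changed.
conn.execute("UPDATE assignment SET salary = 37 WHERE employee = 'Green'")

# Faulty update: only one of Moore's two rows is changed.
conn.execute(
    "UPDATE assignment SET salary = 50 "
    "WHERE employee = 'Moore' AND project = 'Gamma'"
)

green_salaries = salaries("Green")  # a single value: consistent
moore_salaries = salaries("Moore")  # two values: the update anomaly
```

After the faulty update the database holds two salaries for Moore, and nothing in the table design says which one is right.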

Anomaly 2: Deletion
Action: delete project Alpha.

Employee  Salary  Project  Budget  Role
Brown     20      Alpha    2       Technician   (row deleted)
Green     35      Gamma    15      Designer
Green     35      Epsilon  9       Designer
Hoskins   55      Epsilon  9       Manager
Hoskins   55      Gamma    15      Consultant
Moore     48      Gamma    15      Manager
Moore     48      Epsilon  9       Designer

What happens to (Brown, 20)? ANOMALY!

If a project ends (i.e., is deleted), what happens to the data for employees on that project? Project Alpha ends and the corresponding row for it is deleted. We now can't store any data about employee Brown, because they are no longer assigned to any projects.
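
A minimal sketch of the deletion anomaly, again assuming SQLite via Python's sqlite3 module and a hypothetical assignment table that mirrors the slide:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assignment ("
    " employee TEXT NOT NULL, salary INTEGER NOT NULL,"
    " project TEXT NOT NULL, budget INTEGER NOT NULL, role TEXT NOT NULL,"
    " PRIMARY KEY (employee, project))"
)
conn.executemany(
    "INSERT INTO assignment VALUES (?, ?, ?, ?, ?)",
    [
        ("Brown", 20, "Alpha", 2, "Technician"),
        ("Green", 35, "Gamma", 15, "Designer"),
        ("Green", 35, "Epsilon", 9, "Designer"),
        ("Hoskins", 55, "Epsilon", 9, "Manager"),
        ("Hoskins", 55, "Gamma", 15, "Consultant"),
        ("Moore", 48, "Gamma", 15, "Manager"),
        ("Moore", 48, "Epsilon", 9, "Designer"),
    ],
)

# Project Alpha ends, so its row is deleted.
conn.execute("DELETE FROM assignment WHERE project = 'Alpha'")

# Brown's salary was stored only in that row, so it is gone as well.
brown_rows = conn.execute(
    "SELECT salary FROM assignment WHERE employee = 'Brown'"
).fetchall()
employees_left = {
    row[0] for row in conn.execute("SELECT DISTINCT employee FROM assignment")
}
```

Deleting one fact (the project) has silently deleted another (Brown's salary).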

Anomaly 3: Insertion
Action: hire Johnson on a salary of 36, but they haven't been assigned to any project yet.

Employee  Salary  Project  Budget  Role
Brown     20      Alpha    2       Technician
Green     35      Gamma    15      Designer
Green     35      Epsilon  9       Designer
Hoskins   55      Epsilon  9       Manager
Hoskins   55      Gamma    15      Consultant
Moore     48      Gamma    15      Manager
Moore     48      Epsilon  9       Designer
Johnson   36      ???      ???     ???

ANOMALY! Where do we store (Johnson, 36) until then?

We aren't allowed to store nulls, which means we can't add Johnson until they've been assigned to a project. This is effectively the inverse of the problem on the previous slide.
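
The insertion anomaly can be sketched the same way. The example assumes SQLite via Python's sqlite3 module, with the "no nulls" rule expressed as NOT NULL on every column of a hypothetical assignment table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assignment ("
    " employee TEXT NOT NULL, salary INTEGER NOT NULL,"
    " project TEXT NOT NULL, budget INTEGER NOT NULL, role TEXT NOT NULL,"
    " PRIMARY KEY (employee, project))"
)
conn.executemany(
    "INSERT INTO assignment VALUES (?, ?, ?, ?, ?)",
    [
        ("Brown", 20, "Alpha", 2, "Technician"),
        ("Green", 35, "Gamma", 15, "Designer"),
        ("Green", 35, "Epsilon", 9, "Designer"),
        ("Hoskins", 55, "Epsilon", 9, "Manager"),
        ("Hoskins", 55, "Gamma", 15, "Consultant"),
        ("Moore", 48, "Gamma", 15, "Manager"),
        ("Moore", 48, "Epsilon", 9, "Designer"),
    ],
)

# Johnson has a salary but no project yet, so three columns would be NULL.
try:
    conn.execute(
        "INSERT INTO assignment VALUES (?, ?, ?, ?, ?)",
        ("Johnson", 36, None, None, None),
    )
    insert_rejected = False
except sqlite3.IntegrityError:
    # The NOT NULL constraints refuse the incomplete row.
    insert_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM assignment").fetchone()[0]
```

The row is refused, so the fact (Johnson, 36) has nowhere to live until a project is assigned.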

Reduce Redundancy

Employee  Salary
Brown     20
Green     35
Hoskins   55
Moore     48

Employee  Project  Role
Brown     Alpha    Technician
Green     Gamma    Designer
Green     Epsilon  Designer
Hoskins   Epsilon  Manager
Hoskins   Gamma    Consultant
Moore     Gamma    Manager
Moore     Epsilon  Designer

Project  Budget
Alpha    2
Gamma    15
Epsilon  9

Breaking up the relation eliminates the worst of the redundancy. Normalisation is a process that groups logically related data into a structure that has minimal redundancy and no update anomalies (covered in a later course, SFDV3003).
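
As a rough check that the decomposition loses nothing, the sketch below builds the three relations in SQLite (via Python's sqlite3 module; table and column names are hypothetical), joins them back into the original seven-row view, and then repeats the three actions that caused anomalies in the single-table design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE employee (
    name   TEXT PRIMARY KEY,
    salary INTEGER NOT NULL
);
CREATE TABLE project (
    name   TEXT PRIMARY KEY,
    budget INTEGER NOT NULL
);
CREATE TABLE assignment (
    employee TEXT NOT NULL REFERENCES employee (name),
    project  TEXT NOT NULL REFERENCES project (name),
    role     TEXT NOT NULL,
    PRIMARY KEY (employee, project)
);
""")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Brown", 20), ("Green", 35), ("Hoskins", 55), ("Moore", 48)],
)
conn.executemany(
    "INSERT INTO project VALUES (?, ?)",
    [("Alpha", 2), ("Gamma", 15), ("Epsilon", 9)],
)
conn.executemany(
    "INSERT INTO assignment VALUES (?, ?, ?)",
    [
        ("Brown", "Alpha", "Technician"),
        ("Green", "Gamma", "Designer"),
        ("Green", "Epsilon", "Designer"),
        ("Hoskins", "Epsilon", "Manager"),
        ("Hoskins", "Gamma", "Consultant"),
        ("Moore", "Gamma", "Manager"),
        ("Moore", "Epsilon", "Designer"),
    ],
)

# Joining the three relations reproduces the original wide table.
JOIN_SQL = """
SELECT e.name, e.salary, p.name, p.budget, a.role
FROM assignment AS a
JOIN employee AS e ON e.name = a.employee
JOIN project  AS p ON p.name = a.project
ORDER BY e.name, p.name
"""
original_view = conn.execute(JOIN_SQL).fetchall()

# Update: the salary is stored once, so one UPDATE changes it everywhere.
conn.execute("UPDATE employee SET salary = 50 WHERE name = 'Moore'")
moore_salaries = {row[1] for row in conn.execute(JOIN_SQL) if row[0] == "Moore"}

# Insertion: Johnson can be stored before being assigned to any project.
conn.execute("INSERT INTO employee VALUES ('Johnson', 36)")

# Deletion: ending project Alpha no longer removes Brown's salary.
conn.execute("DELETE FROM assignment WHERE project = 'Alpha'")
conn.execute("DELETE FROM project WHERE name = 'Alpha'")
remaining = dict(conn.execute("SELECT name, salary FROM employee"))
```

Each fact now lives in exactly one row, which is why none of the three actions produces an anomaly.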

Relational Databases
• Devised in 1969 by Edgar Codd
• Three aspects:
  1. Structural
  2. Integrity
  3. Manipulation

[Figure: a table labelled with its parts: Relation, Attribute, Tuples]

Table / Relation:
• Entities are transformed into relations (physically, tables in the database).
• The data model is independent of how the data is physically stored.
• "Table" is often used as a synonym for "relation".
• At the physical level: record type or file.

Rows / Tuples:
• Entity occurrences = tuples (rows) = MS Access records.
• A tuple contains all the attribute values for a particular occurrence of a relation.
• In the relational model the order of tuples is not important (i.e. ordering is irrelevant).
• Tuples must be unique (i.e. no duplicates allowed).

Attributes:
• Attributes are referred to by name, not position (order is not significant).
• Each attribute value (the intersection of a row and a column) can contain just one, atomic, value.
• Attribute types (domains) refer to the set or pool of values that the attribute can contain (often represented as a data type).

Aspects:
1. Structural: data in the DB is perceived by the user as tables, and nothing but tables.
2. Integrity: tables satisfy certain integrity constraints (later in the lecture).
3. Manipulative: operators are available to users for manipulating (creating, reading, updating, deleting) tables.

Relational Keys

Name    Birth_date  Paper_code
Mickey  3/4/1963    COMP102
Pluto   3/4/1963    PSYC101
Mickey  6/11/1975   COMP102
Composite PK: (Name, Birth_date). FK: Paper_code.

Paper    Title               Description
COMP102  Software Enginee …  …
PSYC101  …                   …
Non-composite PK: Paper.

Types: candidate key, primary key, alternate key, composite key, surrogate key, foreign key

• Candidate key: any attribute, or set of attributes, that is unique, stable, and minimal. A relation can have more than one.
• Primary key: just one, chosen from the candidate keys.
• Alternate key: a candidate key that does not become the primary key.
• Composite (compound) key: a key (any of the above types, plus foreign keys) that has more than one attribute.
• Surrogate (artificial) key: an "invented" key (e.g. a customer ID that is just a number).
• Foreign key: used to form relationships between entities (by referring to primary keys). Not necessarily unique.
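
The key types can be illustrated with a small schema. This is a sketch only, assuming SQLite via Python's sqlite3 module; the table names student and student_surrogate and the column student_id are hypothetical, and the dates are stored as plain text exactly as written on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
-- Non-composite primary key.
CREATE TABLE paper (
    paper_code  TEXT PRIMARY KEY,
    title       TEXT,
    description TEXT
);
-- Composite primary key (name, birth_date) plus a foreign key to paper.
CREATE TABLE student (
    name       TEXT NOT NULL,
    birth_date TEXT NOT NULL,
    paper_code TEXT REFERENCES paper (paper_code),
    PRIMARY KEY (name, birth_date)
);
-- Alternative design: a surrogate primary key. The candidate key
-- (name, birth_date) was not chosen, so it becomes an alternate key (UNIQUE).
CREATE TABLE student_surrogate (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    birth_date TEXT NOT NULL,
    UNIQUE (name, birth_date)
);
""")
conn.executemany(
    "INSERT INTO paper (paper_code) VALUES (?)", [("COMP102",), ("PSYC101",)]
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [
        ("Mickey", "3/4/1963", "COMP102"),
        ("Pluto", "3/4/1963", "PSYC101"),
        ("Mickey", "6/11/1975", "COMP102"),
    ],
)


def rejected(sql, params):
    """Return True if the statement violates a key constraint."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False


# Foreign key values need not be unique: two students reference COMP102.
comp102_count = conn.execute(
    "SELECT COUNT(*) FROM student WHERE paper_code = 'COMP102'"
).fetchone()[0]

# The composite primary key must be unique as a pair.
duplicate_pk = rejected(
    "INSERT INTO student VALUES (?, ?, ?)", ("Pluto", "3/4/1963", "COMP102")
)

# The surrogate key is generated by the DBMS; the alternate key is still enforced.
conn.execute(
    "INSERT INTO student_surrogate (name, birth_date) VALUES ('Mickey', '3/4/1963')"
)
surrogate_id = conn.execute(
    "SELECT student_id FROM student_surrogate"
).fetchone()[0]
duplicate_alternate = rejected(
    "INSERT INTO student_surrogate (name, birth_date) VALUES (?, ?)",
    ("Mickey", "3/4/1963"),
)
```

Neither Name nor Birth_date is unique on its own in the sample rows, which is why the pair is needed as the key.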

Referential Integrity Example

STUDENT (FK: CourseID)
StudentNo  Name       …  CourseID
5467346    Jenny      …  BBW
1676349    Mun Chan   …  DRC
9437316    Alexander  …  DFA
4346786    Richard    …  BBW
7643465    Monique    …  <null>
134675     Sarah      …  DJK       ? Violation of referential integrity
…          …          …  …

COURSE (PK: CourseID)
CourseID  Title                         Length
BKE       Bachelor of Kite Engineering  36
DRC       Diploma in Rock Climbing      12
BBW       Bachelor of Bird Watching     24
DFA       Diploma in Flower Arranging   18

[Source: D’Orazio and Happel, 1996]
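
The STUDENT and COURSE example can be tried out with a declared foreign key. The sketch assumes SQLite via Python's sqlite3 module; the lower-case table and column names are adaptations of the slide's names, and the columns elided on the slide ("…") are simply left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE course (
    course_id TEXT PRIMARY KEY,
    title     TEXT NOT NULL,
    length    INTEGER NOT NULL
);
CREATE TABLE student (
    student_no INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    course_id  TEXT REFERENCES course (course_id)  -- nullable foreign key
);
""")
conn.executemany(
    "INSERT INTO course VALUES (?, ?, ?)",
    [
        ("BKE", "Bachelor of Kite Engineering", 36),
        ("DRC", "Diploma in Rock Climbing", 12),
        ("BBW", "Bachelor of Bird Watching", 24),
        ("DFA", "Diploma in Flower Arranging", 18),
    ],
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [
        (5467346, "Jenny", "BBW"),
        (1676349, "Mun Chan", "DRC"),
        (9437316, "Alexander", "DFA"),
        (4346786, "Richard", "BBW"),
        (7643465, "Monique", None),  # a null FK is allowed: no course yet
    ],
)

# Sarah's course code DJK matches no row in course.
try:
    conn.execute("INSERT INTO student VALUES (?, ?, ?)", (134675, "Sarah", "DJK"))
    sarah_rejected = False
except sqlite3.IntegrityError:
    sarah_rejected = True

student_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
monique_course = conn.execute(
    "SELECT course_id FROM student WHERE name = 'Monique'"
).fetchone()[0]
```

With the constraint declared, the DBMS refuses the row for Sarah, while the null course for Monique is accepted.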

Review referential integrity
• Ticking the "Enforce Referential Integrity" check box makes the DBMS check that related records in the two tables match, so that no anomalous (unmatched) records can be stored.
• When the "Cascade Update Related Fields" check box is selected, changing a primary key value in the primary (main) table automatically updates the matching values in all related records.
• When the "Cascade Delete Related Records" check box is selected, deleting a record in the primary table deletes any related records in the related table.
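
In SQL the two cascade options correspond to ON UPDATE CASCADE and ON DELETE CASCADE on the foreign key. The sketch below assumes SQLite via Python's sqlite3 module rather than the check-box interface described above, and the replacement course code BBX is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE course (
    course_id TEXT PRIMARY KEY,
    title     TEXT NOT NULL
);
CREATE TABLE student (
    student_no INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    course_id  TEXT REFERENCES course (course_id)
                   ON UPDATE CASCADE
                   ON DELETE CASCADE
);
""")
conn.executemany(
    "INSERT INTO course VALUES (?, ?)",
    [("BBW", "Bachelor of Bird Watching"), ("DRC", "Diploma in Rock Climbing")],
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [
        (5467346, "Jenny", "BBW"),
        (4346786, "Richard", "BBW"),
        (1676349, "Mun Chan", "DRC"),
    ],
)

# Cascade update: changing the primary key value rewrites the matching FKs.
conn.execute("UPDATE course SET course_id = 'BBX' WHERE course_id = 'BBW'")
bbx_students = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM student WHERE course_id = 'BBX' ORDER BY name"
    )
]

# Cascade delete: deleting a course deletes the students that reference it.
conn.execute("DELETE FROM course WHERE course_id = 'DRC'")
remaining_students = [
    row[0] for row in conn.execute("SELECT name FROM student ORDER BY name")
]
```

Cascade delete is convenient but destructive: deleting one course row here silently removes a student record as well.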

The Transformation Process

[Figure: Conceptual ERD → Candidate relations → Database Tables]

General rules:
1. Each entity becomes a relation.
2. Each attribute becomes an attribute in the corresponding relation.
3. Unique identifiers become primary keys (PK) in the corresponding relation.
4. Implement relationships through foreign key (FK) placements.

[Source: D’Orazio and Happel, 1996]

Relationship Transformation Rules
• 1:1: place the PK of the first relation into the second relation as a foreign key (or vice versa).
• 1:M: place the PK of the ‘1’ end relation into the ‘M’ end relation as a FK.
• M:M: create a new ‘all key relation’ to represent the M:M relationship, then follow the 1:M transformation rules.
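
M:M Transformation Example

Before (M:M relationship between Genre and CD):
Genre (G#, desc, …)     CD (CD#, title, …)

After (two 1:M relationships):
Genre (G#, desc, …)     Classification (G#, CD#)     CD (CD#, title, …)

• Classification is the intersecting relation.
• The ‘1’ end of each new relationship is always one and mandatory.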
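
A minimal sketch of the Genre / Classification / CD transformation, assuming SQLite via Python's sqlite3 module. The identifiers genre_id and cd_id stand in for G# and CD#, and the sample genres and album titles are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE genre (
    genre_id    INTEGER PRIMARY KEY,   -- G# on the slide
    description TEXT NOT NULL
);
CREATE TABLE cd (
    cd_id INTEGER PRIMARY KEY,         -- CD# on the slide
    title TEXT NOT NULL
);
-- Intersecting "all key" relation: both columns are foreign keys and
-- together they form the composite primary key.
CREATE TABLE classification (
    genre_id INTEGER NOT NULL REFERENCES genre (genre_id),
    cd_id    INTEGER NOT NULL REFERENCES cd (cd_id),
    PRIMARY KEY (genre_id, cd_id)
);
""")
conn.executemany("INSERT INTO genre VALUES (?, ?)", [(1, "Jazz"), (2, "Blues")])
conn.executemany("INSERT INTO cd VALUES (?, ?)", [(10, "Album A"), (11, "Album B")])
conn.executemany(
    "INSERT INTO classification VALUES (?, ?)",
    [(1, 10), (2, 10), (1, 11)],  # CD 10 has two genres; Jazz has two CDs
)

genres_of_cd_10 = [
    row[0]
    for row in conn.execute(
        "SELECT g.description FROM classification AS c "
        "JOIN genre AS g ON g.genre_id = c.genre_id "
        "WHERE c.cd_id = 10 ORDER BY g.description"
    )
]
cds_in_jazz = [
    row[0]
    for row in conn.execute(
        "SELECT cd.title FROM classification AS c "
        "JOIN cd ON cd.cd_id = c.cd_id "
        "WHERE c.genre_id = 1 ORDER BY cd.title"
    )
]

# The '1' ends are mandatory: a classification row must name a genre and a CD.
try:
    conn.execute("INSERT INTO classification VALUES (NULL, 11)")
    missing_genre_rejected = False
except sqlite3.IntegrityError:
    missing_genre_rejected = True
```

The many-to-many relationship is recovered by joining through classification in either direction.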
References
Date, An Introduction to Database Systems, 8th Edition, Addison Wesley, 2004
Rob and Coronel, Database Systems: Design, Implementation, and Management, 7th Edition, Thomson, 2007
-------------------------------------------------------
Note: Start Practical Sessions 4