6.1
Lecture 6
Data Design
IMS2805 - Systems Design and Implementation
6.2
References
HOFFER, J.A., GEORGE, J.F. and VALACICH, J.S. (2002) Modern Systems Analysis and Design, 3rd ed., Prentice-Hall, New Jersey, Chap. 12
HOFFER, J.A., GEORGE, J.F. and VALACICH, J.S. (2005) Modern Systems Analysis and Design, 4th ed., Prentice-Hall, New Jersey, Chap. 10
WHITTEN, J.L., BENTLEY, L.D. and DITTMAN, K.C. (2001) Systems Analysis and Design Methods, 5th ed., Irwin/McGraw-Hill, New York, NY, Chap. 12
6.3
Data Design
Data design covers:
logical data modelling
physical file and database design
6.4
Overview of logical data modelling
Logical data modelling involves:
collecting detailed attributes for each entity and relationship identified
converting ER models to relations
normalising the relations
merging relations from each user viewpoint
converting the normalised and merged relations to create a data structure diagram
6.5
The data structures developed in the logical system design phase need to be converted to physical data structures for the physical design stage.
A knowledge of the relevant data base management system (DBMS) is important for this phase.
The aim is to develop a compromise between the “ideal” (most flexible) structure developed during normalisation and the most efficient (fastest, cheapest) structure preferred for implementation.
Data Design
6.6
Information required:
Normalised relations, including volume estimates
Definitions of each attribute
Descriptions of where and when data are used (CRUD)
Expectations/requirements for response time and data integrity
Descriptions of technologies to be used for implementing the files and databases
Physical Data Design
6.7
Key decisions:
The storage format (data type) for each attribute, e.g. length, coding scheme, decimal places, range of values
Grouping of attributes into physical records (data structure)
Arranging records in secondary memory for storage, retrieval, update (file organisation)
Selection of media and structures for efficient access
DBMS technologies require selection of options for retrieval and for implementation of relationships between entities
Physical Data Design
6.8
The relational database model
The relational database model represents data in the form of tables or relations
Important concepts are:
relation
primary key
foreign key
functional dependency
normalisation
6.9
Normalisation
Normalisation is a process for converting complex data structures into simple, stable data structures in the form of relations.
Data models consisting of normalised relations:
are robust and stable
have minimum redundancy
are flexible
are technology-independent (logical)
6.10
Normalisation ensures that each attribute is attached to the appropriate relation:
i.e. each attribute is contained in the relation which represents the real world system object or concept that the attribute describes or is a property of, e.g. the attribute Student-name should be in the relation STUDENT, which represents the real world object “student” of interest to a student records system
Normalisation
6.11
Normalisation
Normalisation was originally developed as part of relational database theory by E.F. Codd (1970)
Normalisation is accomplished in stages, each of which corresponds to a “normal form”
Originally, Codd defined first, second and third normal forms. Third normal form is adequate for most business applications.
Later extensions include Boyce-Codd, 4th, 5th and domain-key normal forms.
6.12
Relation
A relation is a named, two-dimensional table of data
Each relation consists of a set of named columns and an arbitrary number of rows
Each column corresponds to an attribute of the relation
Each row corresponds to an instance (or record) of the relation and contains one value for each attribute
6.13
Example relation
A relation generally corresponds to some real world object or concept of interest to the system (similar to an entity) e.g.:
Employee
Emp#   Name    Salary   Dept
1247   Adams   24000    Finance
1982   Smith   27000    MIS
9314   Jones   33000    Finance

Employee (Emp#, Name, Salary, Dept)
6.14
Properties of relations
Relational tables are tables in which:
data values are atomic (single-valued)
data values in columns are from the same domain
each row in the relation is unique
the sequence of columns is insignificant
the sequence of rows is insignificant
6.15
Primary key
An attribute or group of attributes which uniquely identifies a row of a relation
Entity integrity (relational database theory) requires that each relation has a non-null primary key
Where several possible keys are identified, they are known as candidate keys - choose one to be the primary key
E.g.
Employee (Emp#, Name, Salary, Dept)
Order-item (Order#, Item#, Qty-ordered)
6.16
Foreign key
A foreign key is an attribute in one relation that is the primary key of another relation
The referential integrity constraint (relational database theory) specifies that every foreign key value must either be null or match an existing primary key value in the relation it references
A foreign key must satisfy referential integrity
6.17
Foreign key example
In the example below, if a given Dept# exists in the Employee relation then that Dept# must also exist in the Department relation
Employee (Emp#, Name, Salary, Dept#)   <- Dept# is the foreign key
Department (Dept#, Dname, Budget)
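The constraint above can be demonstrated with SQLite via Python's built-in sqlite3 module. This is a minimal sketch; the snake_case table and column names mirror the example but are otherwise illustrative. Note that SQLite leaves foreign key enforcement off by default and it must be switched on per connection:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs without this

con.execute("CREATE TABLE department (dept_no TEXT PRIMARY KEY, dname TEXT, budget INTEGER)")
con.execute("""CREATE TABLE employee (
    emp_no  INTEGER PRIMARY KEY,
    name    TEXT,
    salary  INTEGER,
    dept_no TEXT REFERENCES department (dept_no))""")

con.execute("INSERT INTO department VALUES ('D5', 'MIS', 100000)")
con.execute("INSERT INTO employee VALUES (1247, 'Adams', 24000, 'D5')")  # D5 exists: accepted
try:
    con.execute("INSERT INTO employee VALUES (1982, 'Smith', 27000, 'D9')")
    rejected = False
except sqlite3.IntegrityError:  # D9 has no row in department: insert rejected
    rejected = True
```

The second insert violates referential integrity because 'D9' does not appear as a primary key value in department, so the DBMS refuses it.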
6.18
Functional dependency A functional dependency is a particular
relationship between attributes in a relation For any relation R, attribute B is functionally
dependent on attribute A if each value of A has only ONE value of B associated with it,i.e. if for every valid instance of A, that value of A uniquely determines the value of B
A identifies B
A B
Emp# Emp-name
Emp# Salary
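The definition can be checked mechanically over a set of rows. The sketch below tests whether A → B holds in a given extension of a relation (it can only confirm a dependency for that sample of data, not for the schema in general); the attribute names are the slide's employee example in snake_case:

```python
def functionally_determines(rows, a, b):
    """Return True if a -> b holds in these rows, i.e. every value
    of attribute a is paired with exactly one value of attribute b."""
    seen = {}
    for row in rows:
        if row[a] in seen and seen[row[a]] != row[b]:
            return False  # same a-value, two different b-values
        seen[row[a]] = row[b]
    return True

employees = [
    {"emp_no": 1247, "name": "Adams", "dept": "Finance"},
    {"emp_no": 1982, "name": "Smith", "dept": "MIS"},
    {"emp_no": 9314, "name": "Jones", "dept": "Finance"},
]
# emp_no -> dept holds; dept -> emp_no does not ("Finance" maps to two employees)
```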
6.19
Normalisation to third normal form is accomplished in three steps, each corresponding to a basic normal form
A normal form is a state of a relation that can be determined by applying simple rules concerning dependencies within that relation
Each step of the normalisation process is applied to a single relation in sequence so that the relation is converted to third normal form
Steps in normalisation
6.20
Steps in normalisation
Unnormalised table
→ remove repeating groups → First Normal Form
→ remove partial dependencies → Second Normal Form
→ remove transitive dependencies → Third Normal Form
6.21
Well Structured Relations
A well structured relation contains a minimum amount of redundancy and allows users to insert, modify, and delete rows in a table without errors or inconsistencies (known as “anomalies”)
Three types of anomaly are possible: insertion, deletion and modification
Third normal form relations are considered to be well structured relations
6.22
First normal form (1 NF)
A relation is in first normal form if it contains no repeating data: the value of the data at the intersection of each row and column must be single-valued
Remove any repeating groups of attributes to convert a relation to 1 NF (the key of each removed group is a composite key)
Order (Order#, Customer#, (Item#, Desc, Qty))
becomes
Order (Order#, Customer#)
Order-Item (Order#, Item#, Desc, Qty)
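The 1 NF conversion can be sketched in code: each nested order record carries a repeating (Item#, Desc, Qty) group, and the split produces two flat relations with the composite key (Order#, Item#). The sample data below is hypothetical:

```python
# Unnormalised orders: the items list is a repeating group.
orders = [
    {"order_no": 27, "customer_no": "C1",
     "items": [{"item_no": 873, "desc": "nut", "qty": 2}]},
    {"order_no": 28, "customer_no": "C2",
     "items": [{"item_no": 402, "desc": "bolt", "qty": 1},
               {"item_no": 873, "desc": "nut", "qty": 10}]},
]

def to_1nf(orders):
    """Split the repeating group into its own relation (composite key Order#, Item#)."""
    order_rel = [(o["order_no"], o["customer_no"]) for o in orders]
    order_item_rel = [(o["order_no"], i["item_no"], i["desc"], i["qty"])
                      for o in orders for i in o["items"]]
    return order_rel, order_item_rel
```

After the split every row-column intersection in both relations holds a single value.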
6.23
Second normal form (2 NF)
A relation is in 2 NF if it is in 1 NF and every non-key attribute is fully functionally dependent on the primary key
A partial dependency exists if one or more non-key attributes are dependent on only part of a composite primary key
Remove any partial dependencies to convert a 1 NF relation to 2 NF
If the primary key consists of only one attribute, or there are no non-key attributes, then a 1 NF relation is automatically in 2 NF
6.24
Second normal form (2 NF)
Remove partial dependencies: a non-key attribute cannot be identified by part of a composite key
Order (Order#, Item#, Desc, Qty-ordered)
becomes
Order-Item (Order#, Item#, Qty-ordered)
Item (Item#, Desc)
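A sketch of this decomposition: Desc depends on Item# alone (only part of the composite key), so it is moved into an Item relation. The rows are the slide's sample data:

```python
# Order (Order#, Item#, Desc, Qty-ordered) with a partial dependency Item# -> Desc.
rows = [
    (27, 873, "nut",    2),
    (28, 402, "bolt",   1),
    (28, 873, "nut",   10),
    (30, 495, "washer", 50),
]

def to_2nf(rows):
    """Remove the partial dependency: Desc moves to its own Item relation."""
    order_item = [(o, i, q) for o, i, _, q in rows]   # full key needed for Qty
    item = sorted({(i, d) for _, i, d, _ in rows})    # one row per item
    return order_item, item
```

Each item description now appears exactly once, which eliminates the update anomaly shown on the next slide.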
6.25
Partial dependency anomalies
Order-Item
Order#   Item#   Item-desc   Qty
27       873     nut         2
28       402     bolt        1
28       873     nut         10
30       495     washer      50

UPDATE: change Item-desc in many places
DELETE: data for the last item lost when the last order for that item is deleted
CREATE: cannot add a new item until it is ordered
6.26
The solution for these anomalies: 2 NF

Order-Item
Order#   Item#   Qty
27       873     2
28       402     1
28       873     10
30       495     50

Item
Item#   Desc
873     nut
402     bolt
495     washer

change item description in one place only
delete the last order for an item, but the item remains
add a new item at any time
6.27
If a relation has a composite key, it must be checked for dependencies within the key to determine whether they should be retained, e.g.
DEPT (Dept#, Dept-name, (Emp#, Emp-name))
becomes DEPT (Dept#, Dept-name) and DEPT-EMP (Dept#, Emp#, Emp-name)
But in DEPT-EMP, Emp# alone identifies Emp-name, so remove Dept# from the key:
EMP (Emp#, Emp-name, Dept#)
Dependencies within the primary key
6.28
Third normal form (3 NF)
A relation is in 3 NF if it is in 2 NF and no transitive dependencies exist
A transitive dependency is a functional dependency between two or more non-key attributes
Remove any transitive dependencies to convert a 2 NF relation to 3 NF
6.29
Third normal form
Remove transitive dependencies: a non-key attribute cannot be identified by another non-key attribute
Employee (Emp#, Ename, Dept#, Dname)
becomes
Employee (Emp#, Ename, Dept#)
Department (Dept#, Dname)
(look for foreign keys and their attributes)
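The 3 NF decomposition can be sketched the same way as the 2 NF one: Dname depends on the non-key attribute Dept#, so the pair (Dept#, Dname) is split into a Department relation. The rows are the slide's sample data:

```python
# Employee (Emp#, Ename, Dept#, Dname) with transitive dependency Dept# -> Dname.
employees = [
    (10, "Smith", "D5", "MIS"),
    (20, "Jones", "D7", "Finance"),
    (25, "Smith", "D7", "Finance"),
    (30, "Black", "D8", "Sales"),
]

def to_3nf(rows):
    """Split off the transitive dependency; Dept# stays as a foreign key."""
    employee = [(e, n, d) for e, n, d, _ in rows]
    department = sorted({(d, dn) for _, _, d, dn in rows})  # one row per dept
    return employee, department
```

Each department name is now stored once, removing the anomalies shown on the next slide.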
6.30
Transitive Dependency Anomalies
Employee
Emp#   Emp-name   Dept#   Dname
10     Smith      D5      MIS
20     Jones      D7      Finance
25     Smith      D7      Finance
30     Black      D8      Sales

UPDATE: change dept name in many places
DELETE: data for a dept lost when the last employee for that dept is deleted
CREATE: cannot add a new dept until an employee is allocated to it
6.31
The solution for these anomalies: 3 NF
Employee
Emp#   Ename   Dept#
10     Smith   D5
20     Jones   D7
25     Smith   D7
30     Black   D8

Department
Dept#   Dname
D5      MIS
D7      Finance
D8      Sales

change dept name in one place
delete the last emp in a dept, but the dept remains
add a new dept at any time
6.32
Normalisation to 3 NF
A relation is normalised if all attributes are fully functionally dependent on the primary key:
Remove repeating groups
Remove partial dependencies
Remove transitive dependencies
6.33
Transforming ER Diagrams into Relations
Transforming an ER diagram into normalised relations, and then merging all the relations into one final, consolidated set of relations can be accomplished in four steps:
1. Represent entities as relations
2. Represent relationships as relations
3. Normalise each relation
4. Merge the relations
6.34
Representing Entities as Relations
Each entity in the ER model is transformed into a relation e.g.
becomes
CUSTOMER
Customer (Customer-no, Name, Address, City, State, Postcode, Discount)
6.35
Representing Relationships as Relations
1. Binary Relationships (1:N, 1:1)
CUSTOMER -places-> ORDER (1:N)
Customer (Customer-no, Name, Address, City, State, Postcode, Discount)
Order (Order-no, Order-date, Promised-date, Customer-no)
6.36
Representing Relationships as Relations
1. Binary Relationships (1:N, 1:1)
For 1:N, add the primary key of the entity on the one side of the relationship as a foreign key in the relation that is on the many side
For a 1:1 relationship involving entities A and B, choose from:
add the primary key of A as a foreign key of B
add the primary key of B as a foreign key of A
both of the above
6.37
2. Binary and Higher Degree Relationships (M:N)
Representing Relationships as Relations
PRODUCT -requests-> ORDER (M:N)
Where we wish to know the quantity of each product on each order, i.e. this attribute is an attribute of the relationship requests/requested by:
Order (Order-no, Order-date, Promised-date)
Order-Line (Order-no, Product-no, Quantity-ordered)
Product (Product-no, Description, (other attributes))
6.38
Representing Relationships as Relations
2. Binary and Higher Degree Relationships (M:N)
For M:N, first create a relation for each of the entity types, and then create a relation for the relationship, with a composite primary key formed from the primary keys of the participating entity types.
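This mapping can be sketched in SQLite (Python's built-in sqlite3 module). The table names are illustrative (customer_order is used because ORDER is a reserved word); the composite primary key on the relationship table enforces one Order-Line row per order/product pair:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer_order (
    order_no      INTEGER PRIMARY KEY,
    order_date    TEXT,
    promised_date TEXT);
CREATE TABLE product (
    product_no  INTEGER PRIMARY KEY,
    description TEXT);
CREATE TABLE order_line (
    order_no         INTEGER REFERENCES customer_order (order_no),
    product_no       INTEGER REFERENCES product (product_no),
    quantity_ordered INTEGER,
    PRIMARY KEY (order_no, product_no));  -- composite key from both entity keys
""")

con.execute("INSERT INTO order_line VALUES (1, 873, 2)")
try:
    con.execute("INSERT INTO order_line VALUES (1, 873, 5)")  # same composite key
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
```

The second insert is rejected: the pair (order_no, product_no) must be unique, which is exactly the M:N relationship semantics.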
6.39
3. Unary Relationships
Representing Relationships as Relations
EMPLOYEE -manages / reports to-> EMPLOYEE (1:N)
Employee (Emp-id, Name, Birthdate, Manager-id)

ITEM -has component / is a component of-> ITEM (M:N)
Item (Item-no, Name, Cost)
Item-Bill (Item-no, Component-no, Quantity)
6.40
Representing Relationships as Relations
4. IS-A Relationship (Class-Subclass or generalisation)
PROPERTY, with subclasses BEACH PROPERTY and MOUNTAIN PROPERTY
Property (Street-address, City-state-postcode, No-rooms, Typical-rent)
Beach (Street-address, City-state-postcode, Distance-to-beach)
Mountain (Street-address, City-state-postcode, Skiing)
6.41
During the normalisation process two or more relations with the same primary key may appear
The set of 3NF relations must not contain any duplicate data
Relations with the same primary key should be merged
Normalisation of relations: merging relations
6.42
Normalisation of relations
Synonyms
Two or more attributes may have different names but the same meaning
Either adopt one of the names as a standard or choose a third name, e.g.
STUDENT1 (Student-id, Name, Phone-no)
STUDENT2 (VCE-no, Name, Address)
merge to give STUDENT (Student-id, Name, Address, Phone-no)
6.43
Homonyms
Two or more attributes may have the same name but different meanings
To resolve the conflict, new attribute names need to be created, e.g.
STUDENT1 (Student-id, Name, Address)
STUDENT2 (Student-id, Name, Phone-no, Address)
merge to give STUDENT (Student-id, Name, Phone-no, Campus-address, Permanent-address)
Normalisation of relations
6.44
Normalisation of relations
Transitive dependencies
When two 3NF relations are merged, transitive dependencies may result, e.g.
STUDENT1 (Student-id, Major)
STUDENT2 (Student-id, Advisor)
merge to give STUDENT (Student-id, Major, Advisor)
BUT Major → Advisor, so decompose into:
STUDENT (Student-id, Major)
MAJOR (Major, Advisor)
6.45
Data Structure Diagrams
A set of 3 NF relations may be converted to a simple diagrammatic form to begin physical database design
The conversion is simple:
1. Draw a named rectangle for each relation
2. Draw a relationship line between rectangles linked by foreign keys, with a “many” cardinality at the foreign key end of the relationship
6.46
Data Structure Diagram example
CUSTOMER (Cust#, Cname, Phone-number)
SALES ORDER (Sord#, Sord-date, Cust#)
SALES ORDER-ITEM (Sord#, Item#, Qty)
ITEM (Item#, Item-desc)

CUSTOMER -< SALES ORDER -< SALES ORDER-ITEM >- ITEM
6.47
Data Structure Diagram example
Eliminate redundant relationships, e.g.
TOUR (Tourcode, ...)
DEPARTURE (Tourcode, Depdate, ...)
BOOKING (Booking#, ..., Tourcode, Depdate)
The direct relationship from TOUR to BOOKING is redundant, because BOOKING is already linked to TOUR through DEPARTURE.
6.48
Logical Data Model + Designer Added Data + Volume & Sizing Info + Constraints → Logical Database Design
Logical Process Model + Designer Added Procedures + Frequency & Response Requirements + Constraints → Logical Transaction Model
Logical Database Design + Logical Transaction Model → Target Platform Specific Design
Relational Database Design
6.49
Structure (Stability / Adaptability)
Preserve correspondence to the normalised data model
Preserve data integrity
Performance (Speed and Resource Utilisation)
Transaction response requirements
Minimise secondary storage requirements
Minimise transfers between primary and secondary storage
Database Design Objectives
6.50
Logical Design:
Designer added entity types and attribute types
Review primary and foreign keys
Review derived and aggregate data
Define table and column structure
Establish row and column ratios
Define transaction access maps
Review transaction access maps considering performance
Develop composite load map
Evaluate design in terms of corporate standards, response requirements and resource utilisation
Database Design - Logical Design
6.51
Selection of table access paths, including indexes and file organisation
Selection of data types and sizes
Selection of other DBMS-dependent variables: page size, free space management options, data and index clustering, compression, concurrency control issues, database sizing (% fill for tables), buffer sizes, access control constraints
Database Design - Physical Design
6.52
It is often necessary to extend the logical data model with extra entity types, attribute types and / or relationship types to reflect design and implementation decisions.
Additional data may be required to support:
archiving
back-up
access control
restart and recovery
audit and integrity requirements
system housekeeping functions and the storing of historical information
Designer Added Data
6.53
Adding entities for the purpose of archiving:
PROJECT -Has Assigned-> EMPLOYEE
extended with designer added entities: ARCHIVED PROJECT, ARCHIVED EMPLOYEE, LAST USED EMPNO
Example: Designer Added Data
6.54
Relations become tables; attributes become columns
A number of decisions must be made when mapping attributes to columns
Domains must be defined and linked to the appropriate table columns (if domains are supported by the DBMS)
Convert Logical Data Model
6.55
Some guidelines regarding primary keys:
Primary keys identified in the logical data model may not always be suitable for implementation
The length of each attribute forming the primary key should be as short as possible; avoid long character fields
Keep the use of composite keys to a minimum
Surrogate keys may be used to reduce these problems; sometimes these keys may be hidden from the users
Compact unique identifiers are particularly suitable as primary (and therefore foreign) keys
No component of a primary key can accept null values, and the uniqueness property must be enforced
Primary Keys
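A minimal sketch of the surrogate key guideline, using SQLite (Python's built-in sqlite3 module) and the property example from the IS-A slide; the table and column names are illustrative. A compact integer surrogate replaces the long composite natural key, while the natural identifier keeps a UNIQUE constraint so its uniqueness is still enforced:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE property (
    property_id      INTEGER PRIMARY KEY,        -- compact surrogate key
    street_address   TEXT NOT NULL,
    city_state_post  TEXT NOT NULL,
    typical_rent     INTEGER,
    UNIQUE (street_address, city_state_post))    -- natural identifier still unique
""")
cur = con.execute(
    "INSERT INTO property (street_address, city_state_post, typical_rent) "
    "VALUES ('1 Beach Rd', 'Lorne VIC 3232', 350)")
row_id = cur.lastrowid  # the DBMS assigns the surrogate value automatically
```

Foreign keys in related tables can now reference the short property_id instead of repeating the two long character fields.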
6.56
A technique to analyse the data access loads generated by each transaction type.
This allows alternative approaches and their impact on data access loads to be considered.
The loads generated by each transaction type may be aggregated to calculate the total access loads on the database, and this information is useful in identifying heavily loaded structures of the database.
Transaction Analysis
6.57
To properly evaluate a design it may be necessary to calculate several sets of performance statistics.
Transaction loads may vary widely from peak to average.
Database size may vary widely during application execution particularly if cumulative data is held.
It is much safer to perform design calculations using peak loads, rather than 'average' loads.
Planned equipment utilisation should never exceed 70%.
Statistics Must be Representative
6.58
Define the cardinality of each table, and the ratios between table rows.
Develop transaction access maps for each transaction type showing the sequence in which each table is accessed and the basis of each access.
Use table cardinality and ratio information to determine the number of rows of each table that each transaction type will access.
Transaction Analysis Technique
6.59
Aggregate these details to produce total access loads for each table in the database.
The individual loads for each transaction type and the composite load map should be carefully examined for opportunities to improve the efficiency of the system.
Great care needs to be taken to balance flexibility against performance, when considering amendments to transaction access paths.
Transaction Analysis Technique
6.60
The following data model fragment will be used for the examples. The numbers give table cardinalities (e.g. Order, 900 rows) and table ratios (e.g. Order to Order-line, 1:10):

NEXT ORDER# (1 row)
ORDER (900 rows) -1:10-> ORDER-LINE (9000 rows) <-1:9- ITEM (1000 rows)
Transaction Analysis Technique
6.61
Item (Item#, Description, Price, Quantity-on-hand): 1000 records with 5% growth p.a.
Order-line (Order#, Item#, Qty-ordered, Qty-shipped, Unit-price): 9000 records with 7% growth p.a.
Order (Order#, Ord-date, etc.): 900 records with 7% growth p.a.
Next Order (Order#): 1 record with no growth
Ratios: each Item has on average 9 Order-lines (1 Item : 9 Order-lines); each Order has on average 10 Order-lines (1 Order : 10 Order-lines)
Table Cardinality & Row Ratios
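The ratios above drive the access counts in the transaction access path maps that follow. A sketch of the arithmetic for the Enter New Order transaction (the access sequence is taken from the example map; the function name is illustrative):

```python
LINES_PER_ORDER = 10  # ratio 1 Order : 10 Order-lines

def enter_new_order_accesses(lines=LINES_PER_ORDER):
    """Logical table accesses generated by one Enter New Order transaction."""
    return (
        1          # read Next Order
        + 1        # update Next Order
        + 1        # insert Order
        + lines    # read Item, once per order line
        + lines    # update Item quantity-on-hand, once per order line
        + lines    # insert Order-line, once per order line
    )

per_tran = enter_new_order_accesses()   # 33 logical accesses
per_period = per_tran * 100             # 3300 at the peak of 100 transactions/hour
```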
6.62
Developed from action diagrams for elementary processes. For each transaction, or at least the most critical, it is necessary to identify:
each table access
the sequence of the access within the transaction
the type of access: C, R, U, D (create, read, update, delete)
the columns and predicate of the search condition for the CRUD access
whether the predicate values come from other tables accessed by the same transaction
the columns required as a result of this access
the number of accesses per transaction
the number of accesses per period
Transaction Access Path Maps
6.63
Transaction Access Path Map
Transaction Number: 1            Frequency: 100 per hour peak
Transaction Name: Enter New Order            30 per hour average

No  Type    Table       Predicate    Columns required  Source  Per tran  Per period
1   read    Next Order               Order#                    1         100
2   update  Next Order  current row  Order#                    1         100
3   insert  Order                    all               screen  1         100
4   read    Item        Item# =      all               screen  10        1000
5   update  Item        current row  Quantity-OH       screen  10        1000
6   insert  Order-line               all               screen  10        1000
Total logical accesses: 33 per transaction, 3300 per period

(On the data model fragment: accesses 1 & 2 hit NEXT ORDER#, 3 hits ORDER, 4 & 5 hit ITEM, 6 hits ORDER-LINE)
Example: Transaction Access Path Map
6.64
Transaction Access Path Map
Transaction Number: 33            Frequency: 300 per day peak
Transaction Name: Item Status Enquiry            100 per day average

No  Type  Table  Predicate  Columns required   Source  Per tran  Per period
1   read  Item   Item# =    Item#, Desc, QOH   screen  0.7       210
2   read  Item   Desc =     Item#, Desc, QOH   screen  0.3       90
Total logical accesses: 1 per transaction, 300 per period

Assumption: 70% of enquiries are by Item#, 30% by description (Desc). Both accesses hit the ITEM table (1000 rows).
Example: Transaction Access Path Map
6.65
Transaction Access Path Map
Transaction Number: 66            Frequency: 20 per hour peak
Transaction Name: Check Order Details            15 per hour average

No  Type  Table       Predicate      Columns required  Source      Per tran  Per period
1   read  Order                      all                           1         20
2   read  Order-line  Order# =       all               Order       10        200
3   read  Item        Item# =        all               Order-line  10        200
Total logical accesses: 21 per transaction, 420 per period

(The accesses run from ORDER (900) through ORDER-LINE (9000) to ITEM (1000))
Example: Transaction Access Path Map
6.66
Ad hoc queries complicate the situation, because they are difficult to quantify and predict in advance.
Some controls on when ad hoc queries are permitted may be necessary.
Some DBMS allow the cost of a query to be calculated and the user may be forewarned or prevented from executing an expensive query.
Data may be extracted from the production database during quiet times and loaded onto another database on a different machine to minimise the impact of ad hoc query.
Ad Hoc Queries
6.67
The access path for each transaction should be reviewed for efficiency.
A number of techniques may be used to reduce the cost of transactions:
alteration of aggregation time
alteration of column derivation time
data replication
combining and splitting tables
introducing codes to simplify identification and compress data
Analysis of the Transaction Access Path Map
6.68
Summarise the information from each transaction access map to show the total number of accesses by table, access type, table columns used for access and the access predicate.
This will highlight possible keys and indexes, row sequencing requirements and the columns used to join tables.
Ideally the production of the composite load map should be automated to allow for automatic re-calculation of totals following changes to transaction access maps.
Developing the Composite Load Map
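The aggregation step can be sketched as a simple roll-up of the per-transaction access maps, keyed by table, access type and predicate column. The rows below are a subset of the figures from the example maps; the function name is illustrative:

```python
from collections import defaultdict

# (transaction number, table, access type, predicate column, accesses per hour)
access_maps = [
    (1,  "Item",       "read",   "Item#",  1000),
    (33, "Item",       "read",   "Item#",  27),
    (66, "Item",       "read",   "Item#",  200),
    (33, "Item",       "read",   "Desc",   12),
    (1,  "Order-line", "insert", None,     1000),
    (66, "Order-line", "read",   "Order#", 200),
]

def composite_load(maps):
    """Total accesses per hour by (table, access type, predicate column)."""
    totals = defaultdict(int)
    for _tran, table, access, column, per_hour in maps:
        totals[(table, access, column)] += per_hour
    return dict(totals)

load = composite_load(access_maps)
# e.g. all reads of Item by Item# across transactions 1, 33 and 66 sum to 1227/hour
```

Automating this makes it cheap to recalculate the totals whenever a transaction access map changes, as the slide recommends.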
6.69
Composite Load Map
Table       Access  Column(s)    Predicate  Per hour  Source table  Source column  Tran#  Access#
Item        read    Item#        "="        200       Order-line    Item#          66     3
                                 "="        1000      screen                       1      4
                                 "="        27        screen                       33     1
                                 Total      1227
            read    Description  "="        12        screen                       33     2
                                 Total      12
            update  Item#        "="        1000      Item          Item#          1      5
                                 Total      1000
Next Order  read    Order#                  100                                    1      1
                                 Total      100
            update  Order#                  100                                    1      2
                                 Total      100
Order       insert  all                     100       screen                       1      3
                                 Total      100
            read    Order#       "="        20        screen                       66     1
                                 Total      20
Order-line  insert  all                     1000      screen                       1      6
                                 Total      1000
            read    Order#       "="        200       Order         Order#         66     2
                                 Total      200
Example: Composite Load Map
6.70
The composite load map is designed to highlight table access types that have a very high frequency.
One logical input-output operation does not necessarily result in one physical input-output operation. Buffering techniques, indexes, and logging complicate the actual number of physical I/O operations.
The transactions contributing to these high levels of access should be closely scrutinised to determine whether the number of I/O accesses could be reduced by altering their data access strategy.
The composite load map also provides a base for determining where entry point access methods such as primary and secondary indexes may provide substantial benefit.
Interpreting the Composite Load Map
6.71
Where the composite load map shows high join frequencies between tables, control of table row placement, or the use of table access methods such as hashing techniques or indexes may be used to optimise join operations.
Where a table has a high frequency of sequential access, physical ordering of table rows, or the creation of an index to support the access could be useful.
Tables with a small number of rows could be held in main storage, if this option is supported by the DBMS.
The actual access path taken will depend on the DBMS query optimiser.
Interpreting the Composite Load Map
6.72
Given the limitations of some DBMS it may be necessary to compromise the logical data model for performance reasons.
In general this should be hidden from application logic by the use of special access routines that are responsible for mapping the normalised application view to the de-normalised storage view.
Some DBMS specific design techniques may allow a normalised application view to be maintained.
Compromising the Logical Model
6.73
Some transactions may implement designer added procedures that are part of the designed business system rather than part of the process model developed during analysis.
The starting point for transaction design for process implementing procedures is the logic defined via an action diagram for the elementary process(es) which the transaction (procedure) is to implement.
If action diagrams have not been developed during analysis then these should be developed in close association with the data model as part of design.
Transaction Design
6.74
Operational, audit and access control issues
Impact of designer added data on procedure logic
Concurrency and locking issues
Failure, fallback, restart and recovery issues
Standardisation of transaction resource requirements
Frequency of use and transaction response requirements
The use of derived and redundant data to minimise access to secondary storage
The most appropriate time to derive aggregate data
Physical Issues in Transaction Design