MIS- Database Management Systems

Database Management System Conceptual View

11/25/2008

Institute of Management Sciences

Muhammad Atif Nasim

Table of Contents

Database
Database Management Systems
Uses of databases
Types of Databases
    Delimited text files
    Comma-separated values (CSV) files
    Locking
    Complex data
    Efficiency
    Hierarchical Database Definition
    Network model
    Relational Database
    Object-Oriented Database Definition
Tables and relationships
Entity-Relationship Diagrams (ERD)
Data Flow Diagram (DFD)
    Guidelines
    Decomposition
    Symbols
        Data process
        Data store
        Actor
        Anchor
        Data flow
        Control flow
        Update flow
        Flow names and inheritance
        Data Flow Diagram Layers
        Context Diagrams
        DFD levels
Key
    Primary key
    Secondary/Foreign key
Database Normalization
    1. Eliminate Repeating Groups
    2. Eliminate Redundant Data
    3. Eliminate Columns Not Dependent On Key
        BCNF. Boyce-Codd Normal Form
    4. Isolate Independent Multiple Relationships
    5. Isolate Semantically Related Multiple Relationships
    6. Optimal Normal Form
    7. Domain-Key Normal Form
Components of DBMS
    Data dictionary/directory
    Data languages
    Teleprocessing monitors
    Application development system
    Security software
    Archiving and recovery system
    Report writers
    SQL and other Query languages
Data Redundancy
Data Integrity
    Cascade Updates and Deletes
    Business Rules and Levels of Enforcement
    Field Level Integrity
    Table Level Integrity
    Validation Tables


Database

A database is a collection of related information stored in an organized manner. The data stored in a database is persistent.

Database Management Systems

A database management system (DBMS) is software or a collection of software which can be used to create, maintain and work with databases.

A client/server database system is one in which the database is stored and managed by a database server, and client software is used to request information from the server or to send commands to the server.

Uses of databases

Databases are commonly used to store bodies of data which are too large to be managed on paper or through simple spreadsheets.

Most businesses use databases for accounts, inventory, personnel, and other record keeping.

Databases are also becoming more widely used by home users for address books, CD collections, recipe archives, etc.

There are very few fields in which databases cannot be used.

Types of Databases

• Flat-file text databases

• Hierarchical databases such as LDAP

• Network databases

• Relational databases

• Object Oriented databases

Delimited text files

A delimited text file is one in which each line of text is a record, and the fields are separated by a known character. The character used to delimit the data varies according to the type of data. Common delimiters include the tab character (\t in Perl) or various punctuation characters. The delimiter should always be one which does not appear in the data.

Delimited text files are easily produced by most desktop spreadsheet and database applications (e.g. Microsoft Excel, Microsoft Access). You can usually choose "File" then "Save As" or "Export", then select the type of file you would like to save as.

Imagine a file which contains people's given names, surnames, and ages, delimited by the pipe (|) symbol:

Fred|Flintstone|40

Wilma|Flintstone|36

Barney|Rubble|38

Betty|Rubble|34

Homer|Simpson|45

Marge|Simpson|39

Bart|Simpson|11

Lisa|Simpson|9

The file above is available in your exercises directory as delimited.txt.
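As a minimal sketch of how such a file can be read, the following Python fragment splits each line on the pipe character. Python is an assumption here (the text refers to Perl), the function name parse_delimited is hypothetical, and a few sample lines are inlined so the sketch runs without the exercise file.

```python
# Parse pipe-delimited records like those in delimited.txt.
# A few sample lines are inlined so the sketch runs without the exercise file.
SAMPLE = """\
Fred|Flintstone|40
Wilma|Flintstone|36
Barney|Rubble|38
"""


def parse_delimited(text, delimiter="|"):
    """Return one dict per non-empty line of delimited text."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        given, surname, age = line.split(delimiter)
        records.append({"given": given, "surname": surname, "age": int(age)})
    return records


people = parse_delimited(SAMPLE)
for person in people:
    print(person["given"], person["surname"], person["age"])
```

Note that a plain split only works because the delimiter never appears inside a field, which is the rule stated above.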

Comma-separated values (CSV) files

Comma-separated values files are another format commonly produced by spreadsheet and database programs. CSV files delimit their fields with commas, and wrap textual data in quotation marks, allowing the textual data to contain commas if required:

"Fred","Flintstone",40

"Wilma","Flintstone",36

"Barney","Rubble",38

"Betty","Rubble",34

"Homer","Simpson",45

"Marge","Simpson",39

"Bart","Simpson",11

"Lisa","Simpson",9

CSV files are harder to parse than ordinary delimited text files, because a comma inside quotation marks is data rather than a delimiter. In Perl, the best way to parse them is to use the Text::ParseWords module.
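The same idea can be sketched with Python's standard csv module, which handles the quoting rules for you. Python is an assumption here (the text refers to Perl), the function name parse_csv is hypothetical, and the second sample row is invented to show a comma inside a quoted field.

```python
import csv
import io

# The second row is a made-up record with a comma inside a quoted field.
SAMPLE = '"Fred","Flintstone",40\n"Barney","Rubble, Jr.",38\n'


def parse_csv(text):
    """Return (given, surname, age) tuples from CSV text."""
    reader = csv.reader(io.StringIO(text))
    return [(given, surname, int(age)) for given, surname, age in reader]


rows = parse_csv(SAMPLE)
for given, surname, age in rows:
    print(given, surname, age)
```

Splitting each line on commas would break the second record into four fields; the csv reader keeps the quoted surname intact.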

Problems with flat file databases

Locking

When using flat file databases without locking, problems can occur if two or more people open the files at the same time. This can cause data to be lost or corrupted.

If you are implementing a flat file database, you will need to handle file locking using Perl's flock function.
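A minimal sketch of the locking pattern, written in Python rather than Perl (an assumption of this example). It uses fcntl.flock, which is available on Unix-like systems only; the function name append_record and the temporary file path are hypothetical.

```python
import fcntl  # Unix-only standard library module
import os
import tempfile


def append_record(path, record):
    """Append one line to a flat file while holding an exclusive lock."""
    with open(path, "a") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # wait until no one else holds the lock
        try:
            handle.write(record + "\n")
            handle.flush()  # make sure the data is written before unlocking
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


path = os.path.join(tempfile.mkdtemp(), "delimited.txt")
append_record(path, "Fred|Flintstone|40")
append_record(path, "Wilma|Flintstone|36")

with open(path) as handle:
    lines = handle.read().splitlines()
print(lines)
```

Locks of this kind are advisory: they protect the file only if every program that writes to it requests the lock first.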

Complex data

If your data is more complex than a single table of scalar items, managing your flat file database can become extremely tedious and difficult.

Efficiency

Flat file databases are very inefficient for large quantities of data. Searching, sorting, and other simple activities can take a very long time and use a great deal of memory and other system resources.


Hierarchical Database Definition

A kind of database management system that links records together like a family tree such that each record type has only one owner, e.g. an order is owned by only one customer. Hierarchical structures were widely used in the first mainframe database management systems. However, due to their restrictions, they often cannot be used to relate structures that exist in the real world.

Network model

The network model is a database model conceived as a flexible way of representing objects and their relationships. Its original inventor was Charles Bachman, and it was developed into a standard specification published in 1969 by the CODASYL Consortium. Where the hierarchical model structures data as a tree of records, with each record having one parent record and many children, the network model allows each record to have multiple parent and child records, forming a lattice structure.

The chief argument in favour of the network model, in comparison to the hierarchical model, was that it allowed a more natural modelling of relationships between entities. Although the model was widely implemented and used, it failed to become dominant for two main reasons. Firstly, IBM chose to stick to the hierarchical model with semi-network extensions in their established products such as IMS and DL/I. Secondly, it was eventually displaced by the relational model, which offered a higher-level, more declarative interface. Until the early 1980s the performance benefits of the low-level navigational interfaces offered by hierarchical and network databases were persuasive for many large-scale applications, but as hardware became faster, the extra productivity and flexibility of the relational model led to the gradual obsolescence of the network model in corporate enterprise usage.

Relational Database

• A relational database is a collection of data items organized as a set of formally-described tables from which data can be accessed or reassembled in many different ways without having to reorganize the database tables. The relational database was invented by E. F. Codd at IBM in 1970.

• The standard user and application program interface to a relational database is the structured query language (SQL). SQL statements are used both for interactive queries for information from a relational database and for gathering data for reports.

• In addition to being relatively easy to create and access, a relational database has the important advantage of being easy to extend. After the original database creation, a new data category can be added without requiring that all existing applications be modified.

• A relational database is a set of tables containing data fitted into predefined categories. Each table (which is sometimes called a relation) contains one or more data categories in columns. Each row contains a unique instance of data for the categories defined by the columns. For example, a typical business order entry database would include a table that described a customer with columns for name, address, phone number, and so forth. Another table would describe an order: product, customer, date, sales price, and so forth. A user of the database could obtain a view of the database that fitted the user's needs. For example, a branch office manager might like a view or report on all customers that had bought products after a certain date. A financial services manager in the same company could, from the same tables, obtain a report on accounts that needed to be paid.

• When creating a relational database, you can define the domain of possible values in a data column and further constraints that may apply to that data value. For example, a domain of possible customers could allow up to ten possible customer names, while one table is constrained to accept only three of those names.

• The definition of a relational database results in a table of metadata or formal descriptions of the tables, columns, domains, and constraints.
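The points above can be sketched with Python's built-in sqlite3 module. Everything specific here is an assumption made for illustration: the choice of SQLite, the table and column names, the sample rows, and the helper customers_who_bought_after. The sketch shows a customer table and an order table, a report on customers who bought after a certain date, a column constraint, and a new column added without changing the existing query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    phone TEXT
);
CREATE TABLE customer_order (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    product     TEXT NOT NULL,
    order_date  TEXT NOT NULL,                       -- ISO date string
    sales_price REAL NOT NULL CHECK (sales_price >= 0)
);
INSERT INTO customer VALUES
    (1, 'Fred Flintstone', '555-0101'),
    (2, 'Barney Rubble', '555-0102');
INSERT INTO customer_order VALUES
    (1, 1, 'widget', '2008-10-01', 9.95),
    (2, 2, 'gadget', '2008-11-20', 3.27);
""")


def customers_who_bought_after(conn, date):
    """Names of customers with at least one order after the given date."""
    rows = conn.execute(
        """SELECT DISTINCT c.name
           FROM customer AS c
           JOIN customer_order AS o ON o.customer_id = c.id
           WHERE o.order_date > ?
           ORDER BY c.name""",
        (date,),
    ).fetchall()
    return [name for (name,) in rows]


recent = customers_who_bought_after(conn, "2008-11-01")

# The CHECK constraint restricts the values the column will accept.
try:
    conn.execute(
        "INSERT INTO customer_order VALUES (3, 1, 'widget', '2008-11-21', -1)"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# A new data category can be added; the existing query still works unchanged.
conn.execute("ALTER TABLE customer ADD COLUMN email TEXT")
still_works = customers_who_bought_after(conn, "2008-11-01")
print(recent, rejected, still_works)
```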

Object-Oriented Database Definition

(OODB) A system offering DBMS facilities in an object-oriented programming environment. Data is stored as objects and can be interpreted only using the methods specified by its class. The relationship between similar objects is preserved (inheritance) as are references between objects. Queries can be faster because joins are often not needed (as in a relational database). This is because an object can be retrieved directly without a search, by following its object ID. The same programming language can be used for both data definition and data manipulation. The full power of the database programming language's type system can be used to model data structures and the relationship between the different data items. Multimedia applications are facilitated because the class methods associated with the data are responsible for its correct interpretation. OODBs typically provide better support for versioning. An object can be viewed as the set of all its versions. Also, object versions can be treated as fully fledged objects. OODBs also provide systematic support for triggers and constraints which are the basis of active databases. Most, if not all, object-oriented application programs that have database needs will benefit from using an OODB. Ode is an example of an OODB built on C++.

Tables and relationships

In a relational database, data is stored in tables. Each table contains data about a particular type of entity (either physical or conceptual).

For instance, our sample database is the inventory and sales system for Acme Widget Co. It has tables containing data for the following entities:

Table 4-1. Acme Widget Co Tables

Table         Description

stock_item    Inventory items

customer      Customer account details

salesperson   Sales people working for Acme Widget Co.

sales         Sales events which occur


Tables in a database contain fields and records. Each record describes one entity. Each field describes a single item of data for that entity. You can think of it like a spreadsheet, with the rows being the records and the columns being the fields, thus:

Table 4-2. Sample table

ID number   Description   Price   Quantity in stock

1           widget        $9.95   12

2           gadget        $3.27   20

Every table must have a primary key, which is a field which uniquely identifies the record. In the example above, the Stock ID number is the primary key.

The following figures show the tables used in our database, along with their field names and primary keys (in bold type).

Table 4-3. the stock_item table

stock_item

Id

Description

Price

Quantity

Table 4-4. the customer table

Customer

Id

Name

Address

Suburb

State

Postcode

Table 4-5. the salesperson table

salesperson

Id

Name

Table 4-6. the sales table

Sales


Id

sale_date

salesperson_id

customer_id

stock_item_id

quantity

Price

• A database table contains fields and records of data about one entity

• SQL (Structured Query Language) can be used to manipulate and retrieve data in a database

• A SELECT query may be used to retrieve records which match certain criteria

• An INSERT query may be used to add new records to the database

• A DELETE query may be used to delete records from the database

• An UPDATE query may be used to modify records in the database

• A CREATE query may be used to create new tables in the database

• A DROP query may be used to remove tables from the database
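The six kinds of query can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite stands in for whatever database server is used, and the column types for stock_item are guesses based on the sample table shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE: make a new table (column types are assumed).
cur.execute(
    """CREATE TABLE stock_item (
           id          INTEGER PRIMARY KEY,
           description TEXT,
           price       REAL,
           quantity    INTEGER
       )"""
)

# INSERT: add new records.
cur.executemany(
    "INSERT INTO stock_item VALUES (?, ?, ?, ?)",
    [(1, "widget", 9.95, 12), (2, "gadget", 3.27, 20)],
)

# SELECT: retrieve records which match certain criteria.
cheap = cur.execute(
    "SELECT description FROM stock_item WHERE price < ?", (5,)
).fetchall()

# UPDATE: modify existing records.
cur.execute("UPDATE stock_item SET quantity = quantity - 1 WHERE id = ?", (1,))
widget_qty = cur.execute(
    "SELECT quantity FROM stock_item WHERE id = ?", (1,)
).fetchone()[0]

# DELETE: remove records.
cur.execute("DELETE FROM stock_item WHERE id = ?", (2,))
remaining = cur.execute("SELECT COUNT(*) FROM stock_item").fetchone()[0]

# DROP: remove the table itself.
cur.execute("DROP TABLE stock_item")
tables = cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(cheap, widget_qty, remaining, tables)
```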


Entity-Relationship Diagrams (ERD)

Data models are tools used in analysis to describe the data requirements and assumptions in the system from a top-down perspective. They also set the stage for the design of databases later on in the SDLC.

There are three basic elements in ER models:

Entities are the "things" about which we seek information.

Attributes are the data we collect about the entities.

Relationships provide the structure needed to draw information from multiple entities.

Generally, ERDs look like this:

Developing an ERD

Developing an ERD requires an understanding of the system and its components. Before discussing the procedure, let's look at a narrative created by Professor Harman.

Consider a hospital: Patients are treated in a single ward by the doctors assigned to them. Usually each patient will be assigned a single doctor, but in rare cases they will have two. Healthcare assistants also attend to the patients; a number of these are associated with each ward. Initially the system will be concerned solely with drug treatment. Each patient is required to take a variety of drugs a certain number of times per day and for varying lengths of time. The system must record details concerning patient treatment and staff payment. Some staff are paid part time and doctors and care assistants work varying amounts of overtime at varying rates (subject to grade). The system will also need to track what treatments are required for which patients and when, and it should be capable of calculating the cost of treatment per week for each patient (though it is currently unclear to what use this information will be put).

How do we start an ERD?

1. Define Entities: these are usually nouns used in descriptions of the system, in the discussion of business rules, or in documentation; identified in the narrative (see highlighted items above).

2. Define Relationships: these are usually verbs used in descriptions of the system or in discussion of the business rules (entity ______ entity); identified in the narrative (see highlighted items above).


Fully attributed ERD with keys

3. Add attributes to the relations; these are determined by the queries, and may also suggest new entities, e.g. grade; or they may suggest the need for keys or identifiers.

What questions can we ask?

a. Which doctors work in which wards?

b. How much will be spent in a ward in a given week?

c. How much will a patient cost to treat?

d. How much does a doctor cost per week?

e. Which assistants can a patient expect to see?

f. Which drugs are being used?

4. Add cardinality to the relations

• Many-to-Many must be resolved to two one-to-manys with an additional entity

• Usually automatically happens

• Sometimes involves introduction of a link entity (which will be all foreign key)

• Examples: Patient-Drug

5. This flexibility allows us to consider a variety of questions such as:

a. Which beds are free?

b. Which assistants work for Dr. X?

c. What is the least expensive prescription?

d. How many doctors are there in the hospital?

e. Which patients are family related?


6. Represent that information with symbols. Generally E-R Diagrams require the use of the following symbols:

Reading an ERD

It takes some practice reading an ERD, but they can be used with clients to discuss business rules.

These allow us to represent the information from above such as the E-R Diagram below:


ERD brings out issues:

• Many-to-Manys

• Ambiguities

• Entities and their relationships

• What data needs to be stored

• The Degree of a relationship

Now, think about a university in terms of an ERD. What entities, relationships and attributes might you consider? Look at this simplified view. There is also an example of a simplified view of an airline on that page.
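The Patient-Drug example from step 4 can be sketched as tables using Python's built-in sqlite3 module. The table name prescription, the column names, and the sample patients and drugs are all assumptions made for illustration. The many-to-many relationship between patient and drug becomes two one-to-many relationships through a link entity whose columns are all foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE drug    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Link entity: every column is a foreign key, and the pair is the key.
CREATE TABLE prescription (
    patient_id INTEGER NOT NULL REFERENCES patient(id),
    drug_id    INTEGER NOT NULL REFERENCES drug(id),
    PRIMARY KEY (patient_id, drug_id)
);

INSERT INTO patient VALUES (1, 'Patient A'), (2, 'Patient B');
INSERT INTO drug VALUES (1, 'aspirin'), (2, 'ibuprofen'), (3, 'codeine');
INSERT INTO prescription VALUES (1, 1), (1, 2), (2, 1);
""")

# "Which drugs are being used?" is answered through the link entity.
drugs_in_use = [
    name
    for (name,) in conn.execute(
        """SELECT DISTINCT d.name
           FROM drug AS d
           JOIN prescription AS p ON p.drug_id = d.id
           ORDER BY d.name"""
    )
]

# One patient can have many drugs, and one drug can have many patients.
patients_on_aspirin = conn.execute(
    "SELECT COUNT(*) FROM prescription WHERE drug_id = ?", (1,)
).fetchone()[0]

print(drugs_in_use, patients_on_aspirin)
```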


Data Flow Diagram (DFD)

The DFDs show the flow of data values from their sources in objects through the processes that transform them to their destination in other objects. Values can include input values, output values, and internal data stores. Control information is shown only in the form of control flows.

The following table lists the important elements of DFDs.

Symbol         Stands For

Data process   Data processing

Data flow      Data flow or the exchange of data between processes

Data store     Data storage

Actor          Object producing and consuming data

Guidelines

You can follow certain guidelines to draw meaningful DFDs.

• Optional input flows do not exist. A process can perform its function only if all its input flows are always available.

• You cannot assign the same data to two output flows from the same process. If a process produces more than one data flow, these flows are mutually exclusive.

• You can split a flow, and you can merge two flows into one.

Decomposition

To specify what a high-level process does, break it down into smaller units in more DFDs. A high-level process is an entire DFD. Each high-level process is decomposed into other processes with data flows and data stores. Each decomposition is a DFD in itself. You can continue to break down processes until you reach a level on which further decomposition seems impossible or meaningless.

The data flows of the opened process are connected in the new diagram to the process related to the opened process. Vertices, and the flows and objects connected to them, are transferred with the flows that are connected to the decomposed process.


Example DFD

The following illustration shows a sample DFD.


Symbols

Data process

A data process transforms data values.

You can make a distinction between the following types of processes:

Process Type               Indicates

High-level                 Process containing nonfunctional components such as data stores or external objects that cause side effects

Low-level                  Pure function without side effects, such as the sum of two numbers

Leaf or atomic processes   Process that is not further decomposed

The name of a process is usually a description of the transformation it performs.

There are three sorts of transformation:

• Transformation of the structure, for example, reformatting

• Transformation of information contained in data

• Generation of new information

If you open a process, you can either create a new DFD or open an existing DFD in which the process is specified.

The data flows of the opened process are connected in the new diagram to the process with the name of the opened process. Vertices, and the flows and objects connected to them, are transferred with the flows that are connected to the decomposed process.

If a data process has a decomposition at a lower level, an asterisk is placed inside the ellipse. The data process can be opened only if it has a name.

Data store

A data store stores data passively for later access. A data store responds to requests to store and access data. It does not generate any operations. A data store allows values to be accessed in an order different from the order in which they were generated.

Input flows indicate information or operations that modify the stored data such as adding or deleting elements or changing values. Output flows indicate information retrieved from the store; this information can be an entire value or a component of a value.


Actor

An actor produces and consumes data, driving the DFD. Actors lie on the boundary of the diagram; they terminate the flow of data as sources and sinks of data. They are also known as terminators. Data flows between an actor and a diagram are inputs to and outputs of the diagram. The system interacts with people through the actor.

Anchor

A DFD anchor provides a start or end point. In decomposition diagrams, anchors represent the nodes connected to the decomposed process in the higher level diagram.

Data flow

A data flow moves data between processes or between processes and data stores. As such, it represents a data value at some point within a computation and an intermediate value within a computation if the flow is internal to the diagram. This value is not changed.

The names of input and output flows can indicate their roles in the computation or the type of the value they move. Data names are preferably nouns. The name of a typical piece of data, the data aspect, is written alongside the arrow.

Control flow

A control flow is a signal that carries out a command or indicates that something has occurred. A control flow occurs at a discrete point in time. The arrow indicates the direction of the control flow. The name of the event is written beside the arrow.

Control flows can correspond to messages in CCDs or events in STDs; however, because they duplicate information in the DFD, use them sparingly.

Update flow

Update (or bidirectional) flows are used to indicate an update of a data store, that is, a read, change, and store operation on a data flow.


Flow names and inheritance

Flows in DFDs must be named. However, flows can inherit the names of the objects they are connected to. The table below shows the rules for inheritance of names. These rules are applied in the order shown, until nothing more can be inherited. In some situations, the flow's inherited name causes an error when a Check command is carried out. The result of the inheritance is confusing in the diagram.

[Figure columns: Original Situation; Situation After Inheritance]

Explanation

1. Diverging flows without names inherit the name of an incoming flow with a name. If the incoming flow has several names, each diverging flow inherits all of them.

2. Converging flows without names inherit the name of an outgoing flow with a name. If the outgoing flow has several names, each converging flow inherits all of them.

3. Flows connected to a data store, control store, message queue, message box, event queue, or event flag inherit the name of that node.

A forked (converging or diverging) data flow is either a split or merging data flow, or a composite data flow. A composite data flow has one name for each branch. A composite flow can split into the original flows again. A split or a merging data flow has only one name.

The name of the flow is taken as the type name if no data type is specified.

Process Notations

[Figures: Yourdon and Coad process notation; Gane and Sarson process notation]

Datastore Notations

[Figures: Yourdon and Coad datastore notation; Gane and Sarson datastore notation]

Dataflow Notations

External Entity Notations

Data Flow Diagram Layers

Draw data flow diagrams in several nested layers. A single process node on a high level diagram can be expanded to show a more detailed data flow diagram. Draw the context diagram first, followed by various layers of data flow diagrams.

The nesting of data flow layers


Context Diagrams

A context diagram is a top level (also known as Level 0) data flow diagram. It only contains one process node (process 0) that generalizes the function of the entire system in relationship to external entities.

DFD levels

The first level DFD shows the main processes within the system. Each of these processes can be broken into further processes until you reach pseudocode.

An example first-level data flow diagram


Key

Primary key

Most DBMSs require a table to be defined as having a single unique key, rather than a number of possible unique keys. A primary key is a key which the database designer has designated for this purpose. The primary key identifies the whole record.

Secondary/Foreign key

A secondary (foreign) key is a key which references the primary key of another table. It is necessary for making relationships between tables.

Data Redundancy

Data redundancy is the unnecessary duplication of data within the database. To change redundant data, you have to make the same change in multiple fields of the database. While this is normal behaviour for spreadsheet and flat file database structures, it defeats the purpose of a relational database structure.

The data connections should allow you to maintain each data field in just one location, so that a change made there applies across the whole database. A redundant database uses a lot of space unnecessarily and also creates problems for the maintenance of the database.

The database software removes data redundancy by centralizing the data into one database, so that all applications can access the same data.
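The maintenance cost of redundant data can be seen in a small sketch using Python's built-in sqlite3 module. The table names, columns and addresses are invented for illustration. In the flat layout the customer's address is repeated on every sale, so one change of address touches several rows; in the relational layout it is stored once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Redundant layout: the address is repeated on every sale.
CREATE TABLE sale_flat (customer_name TEXT, customer_address TEXT, item TEXT);
INSERT INTO sale_flat VALUES
    ('Fred Flintstone', '301 Cobblestone Way', 'widget'),
    ('Fred Flintstone', '301 Cobblestone Way', 'gadget');

-- Relational layout: the address is stored in one place only.
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE sale (customer_id INTEGER REFERENCES customer(id), item TEXT);
INSERT INTO customer VALUES (1, 'Fred Flintstone', '301 Cobblestone Way');
INSERT INTO sale VALUES (1, 'widget'), (1, 'gadget');
""")

# The same change of address, applied to each layout.
flat_changes = conn.execute(
    "UPDATE sale_flat SET customer_address = ? WHERE customer_name = ?",
    ("1 Quarry Road", "Fred Flintstone"),
).rowcount
relational_changes = conn.execute(
    "UPDATE customer SET address = ? WHERE id = ?",
    ("1 Quarry Road", 1),
).rowcount

print(flat_changes, relational_changes)
```

If one of the duplicated rows in the flat layout were missed, the database would hold two different addresses for the same customer.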

Data Integrity

The database designer is responsible for incorporating elements to promote the accuracy and reliability of stored data within the database. There are many different techniques that can be used to encourage data integrity, with some of these dependent on what database technology is being used. There are different types of data integrity techniques available whilst working with Microsoft Access:

1. Referential Integrity

2. Cascade Updates & Deletes

3. Table Level Integrity

   1. Field Comparisons

   2. Validation Tables

Referential Integrity - part of the definition of a true relational database product is that it supports referential integrity. Referential Integrity principles may be stated by: "Every non-null foreign key value must match an existing primary key value"

If a value exists in the foreign key field of a table, then there must be a matching value in the primary key field of the table to which it is related. Referential Integrity is all about preserving the validity of the foreign key values.
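The rule above can be sketched with Python's built-in sqlite3 module (used here instead of Microsoft Access purely for illustration). The Department and Employee tables and their rows are assumptions, not part of the original notes.

```python
import sqlite3

def demo_referential_integrity():
    """Return True if SQLite rejects a foreign key value with no matching primary key."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE Department (DepartmentCode TEXT PRIMARY KEY, Name TEXT)")
    conn.execute(
        "CREATE TABLE Employee ("
        " EmployeeID INTEGER PRIMARY KEY,"
        " Name TEXT,"
        " DepartmentCode TEXT REFERENCES Department(DepartmentCode))"
    )
    conn.execute("INSERT INTO Department VALUES ('FIN', 'Finance')")
    conn.execute("INSERT INTO Employee VALUES (1, 'Ayesha', 'FIN')")  # matches a primary key
    conn.execute("INSERT INTO Employee VALUES (2, 'Bilal', NULL)")    # a null foreign key is allowed
    try:
        # 'XYZ' has no matching DepartmentCode, so this row should be refused.
        conn.execute("INSERT INTO Employee VALUES (3, 'Chen', 'XYZ')")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
    conn.close()
    return rejected
```

Note that the null foreign key is accepted: the rule only constrains non-null values.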

Cascade Updates and Deletes

As with anything in the real world, things can alter and you will need to ensure that the database can cope with this. Code names such as DepartmentCode will get revised, and departments can close or merge, so we need to be able to maintain the data when a required change would otherwise violate referential integrity rules.

RDBMS products generally handle these changes through cascading updates and deletes (different products may handle this differently, and have different names and techniques for this). In some database products you may need to create rules or triggers or use an operator.
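As one possible illustration, SQLite expresses cascading through ON UPDATE CASCADE and ON DELETE CASCADE clauses on the foreign key. The sketch below uses Python's sqlite3 module; the table layout and the codes 'FIN', 'HR' and 'FINANCE' are assumed sample data.

```python
import sqlite3

def demo_cascade():
    """Return the Employee rows left after a department code is revised and another is closed."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce the rules
    conn.execute("CREATE TABLE Department (DepartmentCode TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE Employee ("
        " EmployeeID INTEGER PRIMARY KEY,"
        " DepartmentCode TEXT REFERENCES Department(DepartmentCode)"
        "   ON UPDATE CASCADE ON DELETE CASCADE)"
    )
    conn.executemany("INSERT INTO Department VALUES (?)", [("FIN",), ("HR",)])
    conn.executemany("INSERT INTO Employee VALUES (?, ?)", [(1, "FIN"), (2, "HR")])
    # Revising a code updates the matching foreign key values automatically.
    conn.execute("UPDATE Department SET DepartmentCode = 'FINANCE' WHERE DepartmentCode = 'FIN'")
    # Closing a department deletes the rows that depend on it.
    conn.execute("DELETE FROM Department WHERE DepartmentCode = 'HR'")
    rows = conn.execute(
        "SELECT EmployeeID, DepartmentCode FROM Employee ORDER BY EmployeeID"
    ).fetchall()
    conn.close()
    return rows
```

Whether deleting dependent rows is the right behaviour is a business decision; other products and other settings (for example refusing the delete) handle the same situation differently.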

Business Rules and Levels of Enforcement

Referential Integrity is enforced at the database level, in that it controls the integrity of the data between tables. As the database designer, you can also do things at both field and table levels to help ensure data integrity. Business rules should be implemented to ensure that the data entered meets the requirements of a particular setting for the database.

Business rules should be documented as they are implemented. This should detail each rule, where and how it is implemented and enforced within the database design. Over time these rules may change, and having each and every rule documented will make it much easier to find and modify the design.

As you implement a rule, it is important that each one is tested. Does the rule give the intended result? What happens when the rule is violated?

Good application design will also give the user feedback (messages) when a rule is broken, and allow them to rectify any changes they were making.

Field Level Integrity Using Field Properties - Each of the fields that are contained in the database has properties associated with it. These properties may be referred to as elements or attributes of the field. These enable you, as the database designer, to place constraints on the values that may be entered into that field.

Data Types - the most obvious constraint that can be placed on the fields in your database will be done with the selection of a data type for the field. Data types may vary by RDBMS; however, in general they will be pretty much the same, and usually you will also be able to create custom data types through code.

As you begin to collect information regarding the design of the database, you will be defining what types of data can, or should be entered into the fields that you define.

• A number or numeric data type will only allow the entry of numbers and should be used for most fields on which calculations will be performed; it will however drop leading zeros and may occasionally encounter rounding errors.

• A currency data type can eliminate rounding errors, but may not hold as many digits as a number data type can.

• A text field can contain basically anything, but may be limited to a certain number of characters. It can be used for numeric data in fields where no calculations will be required, or where the data needs to retain leading zeros.

• Memo data types, if available, will allow for a much larger number of characters.

• Date/Time fields are restricted to only allowing valid dates and times.

• A Boolean (Yes/No data type in Microsoft Access) will permit the entry of only one of two values - yes/no, true/false or on/off.

Most of these data types can also be restricted further by setting allowable sizes (some may already have default values that cannot be changed). Some of the data types may also allow you to define a format, for example the number of decimal places.
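The trade-offs between number, currency and text types can be seen in a few lines of Python. This is only an analogy: Python's int, float and Decimal stand in for database field types, and the sample values are assumptions.

```python
from decimal import Decimal

# A numeric type drops leading zeros; a text type keeps them.
as_number = int("00420")
as_text = "00420"

# Binary floating point numbers can show small rounding errors.
float_sum = 0.1 + 0.2
float_is_exact = (float_sum == 0.3)

# A fixed-point decimal type (similar in spirit to a currency type) avoids this one.
decimal_sum = Decimal("0.10") + Decimal("0.20")
decimal_is_exact = (decimal_sum == Decimal("0.30"))
```

This is why codes such as postal codes or phone numbers are usually stored as text, and money values in a currency or decimal type.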

Table Level Integrity Field Comparisons - Database tables also have properties that you can use to set a validation rule on records in the table. By doing this, you can set a rule that compares the value of one field in the record with the value of another field in the same record. This rule is run before the record is saved.

An example of this would be to compare dates, as part of your business rules. Your business may have a rule in place that an OrderDespatchDate must be no more than 3 days after the OrderPlacedDate. The rule would look something like:

OrderDespatchDate <= OrderPlacedDate + 3

If the rule is violated, an error message can be displayed, and the data must be amended before the record can be saved.
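In SQL-based products a rule like this is commonly written as a CHECK constraint. The sketch below uses SQLite through Python's sqlite3 module; the Orders table, the text date format (YYYY-MM-DD) and the helper name try_order are assumptions made for illustration.

```python
import sqlite3

def try_order(placed, despatched):
    """Return True if the order row is accepted, False if the table rule rejects it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Orders ("
        " OrderID INTEGER PRIMARY KEY,"
        " OrderPlacedDate TEXT NOT NULL,"
        " OrderDespatchDate TEXT NOT NULL,"
        # julianday() converts an ISO date to a day count, so '+ 3' means three days.
        " CHECK (julianday(OrderDespatchDate) <= julianday(OrderPlacedDate) + 3))"
    )
    try:
        conn.execute("INSERT INTO Orders VALUES (1, ?, ?)", (placed, despatched))
        return True
    except sqlite3.IntegrityError:
        # The rule ran before the record was saved and refused it.
        return False
    finally:
        conn.close()
```

For example, try_order("2008-11-20", "2008-11-22") is accepted, while try_order("2008-11-20", "2008-11-25") is refused. An application would catch the error and show the user a message.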

Validation Tables

A validation table is created to promote data integrity. Normally, a validation table will consist of two fields; one is the primary key, and the other holds the values used by some other field in the database. The validation table normally will hold a static set of values, enabling you to store a master set of values in one location and, by referencing those values instead of entering values directly into a field, you can ensure consistent values are used.

Database Normalization

Database normalization, sometimes referred to as canonical synthesis, is a technique for designing relational database tables to minimize duplication of information and, in so doing, to safeguard the database against certain types of logical or structural problems, namely data anomalies. For example, when multiple instances of a given piece of information occur in a table, the possibility exists that these instances will not be kept consistent when the data within the table is updated, leading to a loss of data integrity. A table that is sufficiently normalized is less vulnerable to problems of this kind, because its structure reflects the basic assumptions for when multiple instances of the same information should be represented by a single instance only.


Higher degrees of normalization typically involve more tables and create the need for a larger number of joins, which can reduce performance. Accordingly, more highly normalized tables are typically used in database applications involving many isolated transactions (e.g. an Automated teller machine), while less normalized tables tend to be used in database applications that need to map complex relationships between data entities and data attributes (e.g. a reporting application, or a full-text search application).

Database theory describes a table's degree of normalization in terms of normal forms of successively higher degrees of strictness. A table in third normal form (3NF), for example, is consequently in second normal form (2NF) as well; but the reverse is not always the case.

Although the normal forms are often defined informally in terms of the characteristics of tables, rigorous definitions of the normal forms are concerned with the characteristics of mathematical constructs known as relations. Whenever information is represented relationally, it is meaningful to consider the extent to which the representation is normalized.

1NF Eliminate Repeating Groups - Make a separate table for each set of related attributes, and give each table a primary key.

2NF Eliminate Redundant Data - If an attribute depends on only part of a multi-valued (composite) key, remove it to a separate table.

3NF Eliminate Columns Not Dependent On Key - If attributes do not contribute to a description of the key, remove them to a separate table.

BCNF Boyce-Codd Normal Form - If there are non-trivial dependencies between candidate key attributes, separate them out into distinct tables.

4NF Isolate Independent Multiple Relationships - No table may contain two or more 1:n or n:m relationships that are not directly related.

5NF Isolate Semantically Related Multiple Relationships - There may be practical constraints on information that justify separating logically related many-to-many relationships.

ONF Optimal Normal Form - a model limited to only simple (elemental) facts, as expressed in Object Role Model notation.

DKNF Domain-Key Normal Form - a model free from all modification anomalies.

1. Eliminate Repeating Groups

In the original member list, each member name is followed by any databases that the member has experience with. Some might know many, and others might not know any. To answer the question, "Who knows DB2?" we need to perform an awkward scan of the list looking for references to DB2. This is inefficient and an extremely untidy way to store information.

Moving the known databases into a separate table helps a lot. Separating the repeating groups of databases from the member information results in first normal form. The MemberID in the database table matches the primary key in the member table, providing a foreign key for relating the two tables with a join operation. Now we can answer the question by looking in the database table for "DB2" and getting the list of members.
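A minimal sketch of this first normal form design, using Python's sqlite3 module. The table name MemberDatabase, the column names and the sample rows are assumptions; the original member list is not reproduced in these notes.

```python
import sqlite3

def members_who_know(db_name):
    """Answer "Who knows <db_name>?" with a join instead of scanning a free-form list."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Member (MemberID INTEGER PRIMARY KEY, Name TEXT)")
    # One row per member/database pair replaces the repeating group.
    conn.execute(
        "CREATE TABLE MemberDatabase ("
        " MemberID INTEGER REFERENCES Member(MemberID),"
        " DatabaseName TEXT,"
        " PRIMARY KEY (MemberID, DatabaseName))"
    )
    conn.executemany(
        "INSERT INTO Member VALUES (?, ?)", [(1, "Bill"), (2, "Mary"), (3, "Steve")]
    )
    conn.executemany(
        "INSERT INTO MemberDatabase VALUES (?, ?)",
        [(1, "DB2"), (1, "Oracle"), (2, "DB2")],
    )
    rows = conn.execute(
        "SELECT m.Name FROM Member m"
        " JOIN MemberDatabase d ON d.MemberID = m.MemberID"
        " WHERE d.DatabaseName = ? ORDER BY m.Name",
        (db_name,),
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]
```

A member who knows no databases (Steve in this sample) simply has no rows in the second table.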

2. Eliminate Redundant Data

In the Database Table, the primary key is made up of the MemberID and the DatabaseID. This makes sense for attributes like "Where Learned" and "Skill Level", since they will be different for every member/database combination. But the database name depends only on the DatabaseID. The same database name will appear redundantly every time its associated ID appears in the Database Table.

Suppose you want to reclassify a database - give it a different DatabaseID. The change has to be made for every member that lists that database! If you miss some, you'll have several members with the same database under different IDs. This is an update anomaly.

Or suppose the last member listing a particular database leaves the group. His records will be removed from the system, and the database will not be stored anywhere! This is a delete anomaly. To avoid these problems, we need second normal form.

To achieve this, separate the attributes depending on both parts of the key from those depending only on the DatabaseID. This results in two tables: "Database" which gives the name for each DatabaseID, and "MemberDatabase" which lists the databases for each member.

Now we can reclassify a database in a single operation: look up the DatabaseID in the "Database" table and change its name. The result will instantly be available throughout the application.
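The single-operation change can be sketched as follows with Python's sqlite3 module. The column SkillLevel comes from the text; the sample rows and the new name 'IBM DB2' are assumptions.

```python
import sqlite3

def reclassify_database():
    """Rename one database once and return every member's listing afterwards."""
    conn = sqlite3.connect(":memory:")
    # "Database" is quoted because DATABASE is also a keyword in SQLite.
    conn.execute('CREATE TABLE "Database" (DatabaseID INTEGER PRIMARY KEY, Name TEXT)')
    conn.execute(
        'CREATE TABLE MemberDatabase ('
        ' MemberID INTEGER,'
        ' DatabaseID INTEGER REFERENCES "Database"(DatabaseID),'
        ' SkillLevel TEXT,'
        ' PRIMARY KEY (MemberID, DatabaseID))'
    )
    conn.executemany('INSERT INTO "Database" VALUES (?, ?)', [(1, "DB2"), (2, "Oracle")])
    conn.executemany(
        "INSERT INTO MemberDatabase VALUES (?, ?, ?)",
        [(1, 1, "Expert"), (2, 1, "Novice"), (2, 2, "Expert")],
    )
    # One UPDATE on one row changes the name everywhere it is used.
    conn.execute('UPDATE "Database" SET Name = ? WHERE DatabaseID = ?', ("IBM DB2", 1))
    rows = conn.execute(
        'SELECT md.MemberID, d.Name FROM MemberDatabase md'
        ' JOIN "Database" d ON d.DatabaseID = md.DatabaseID'
        ' ORDER BY md.MemberID, md.DatabaseID'
    ).fetchall()
    conn.close()
    return rows
```

Because the name is stored only once, no member row can be missed by the update.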


3. Eliminate Columns Not Dependent On Key

The Member table satisfies first normal form - it contains no repeating groups. It satisfies second normal form - since it doesn't have a multi-valued (composite) key. But the key is MemberID, and the company name and location describe only a company, not a member. To achieve third normal form, they must be moved into a separate table. Since they describe a company, CompanyCode becomes the key of the new "Company" table.

The motivation for this is the same as for second normal form: we want to avoid update and delete anomalies. For example, suppose no members from IBM were currently stored in the database. With the previous design, there would be no record of its existence, even though 20 past members were from IBM!
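A short sketch of the third normal form design, again with Python's sqlite3 module. The column names and the sample row values (including the placeholder location) are assumptions.

```python
import sqlite3

def companies_after_members_leave():
    """Show that a company record survives when its last member is removed."""
    conn = sqlite3.connect(":memory:")
    # Company facts live in their own table, keyed by CompanyCode.
    conn.execute(
        "CREATE TABLE Company (CompanyCode TEXT PRIMARY KEY, Name TEXT, Location TEXT)"
    )
    conn.execute(
        "CREATE TABLE Member ("
        " MemberID INTEGER PRIMARY KEY,"
        " Name TEXT,"
        " CompanyCode TEXT REFERENCES Company(CompanyCode))"
    )
    conn.execute("INSERT INTO Company VALUES ('IBM', 'IBM', 'Sample City')")
    conn.execute("INSERT INTO Member VALUES (1, 'Bill', 'IBM')")
    # The last IBM member leaves the group.
    conn.execute("DELETE FROM Member WHERE CompanyCode = 'IBM'")
    rows = conn.execute("SELECT CompanyCode, Name FROM Company").fetchall()
    conn.close()
    return rows
```

With the earlier single-table design the same DELETE would have erased the only record of the company.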

BCNF. Boyce-Codd Normal Form

Boyce-Codd Normal Form states mathematically that: A relation R is said to be in BCNF if, whenever X -> A holds in R and A is not in X, X is a candidate key for R (more precisely, X must contain a candidate key, that is, X is a superkey).

BCNF covers very specific situations where 3NF misses inter-dependencies between non-key (but candidate key) attributes. Typically, any relation that is in 3NF is also in BCNF. However, a 3NF relation may fail to be in BCNF when (a) there are multiple candidate keys, (b) the keys are composed of multiple attributes, and (c) there are common attributes between the keys.


Basically, a humorous way to remember BCNF is that all functional dependencies are: "The key, the whole key, and nothing but the key, so help me Codd."
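The BCNF definition can be checked mechanically by computing attribute closures. The sketch below is one common textbook approach, not something taken from these notes; the relation (Student, Course, Teacher) and its two dependencies are a hypothetical example with overlapping composite candidate keys.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(relation, fds):
    """List the non-trivial dependencies X -> A whose left side X is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        is_superkey = closure(lhs, fds) >= set(relation)
        if not trivial and not is_superkey:
            violations.append((lhs, rhs))
    return violations

# Hypothetical example: each teacher teaches one course, and a student
# has one teacher per course.
relation = {"Student", "Course", "Teacher"}
fds = [
    (("Student", "Course"), ("Teacher",)),
    (("Teacher",), ("Course",)),
]
violations = bcnf_violations(relation, fds)
```

Here Teacher -> Course is reported, because Teacher alone does not determine Student and so is not a key. The relation is in 3NF (Course is part of a candidate key) but not in BCNF.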

4. Isolate Independent Multiple Relationships

This applies primarily to key-only associative tables: the table appears to model a ternary relationship, but has incorrectly merged 2 distinct, independent relationships.

The way this situation starts is by a business request like the one shown below. This could be any 2 M:M relationships from a single entity. For instance, a member could know many software tools, and a software tool may be used by many members. Also, a member could have recommended many books, and a book could be recommended by many members.

Initial business request

So, to resolve the two M:M relationships, we know that we should resolve them separately, and that would give us 4th normal form. But, if we were to combine them into a single table, it might look right (it is in 3rd normal form) at first. This is shown below, and violates 4th normal form.

Incorrect solution

To get a picture of what is wrong, look at some sample data, shown below. The first few records look right, where Bill knows ERWin and recommends the ERWin Bible for everyone to read. But something is wrong with Mary and Steve. Mary didn't recommend a book, and Steve doesn't know any software tools. Our solution has forced us to do strange things like create dummy records in both Book and Software to allow the record in the association, since it is a key-only table.


Sample data from incorrect solution

The correct solution, to cause the model to be in 4th normal form, is to ensure that all M:M relationships are resolved independently if they are indeed independent, as shown below.

Correct 4th normal form

NOTE! This is not to say that ALL ternary associations are invalid. The above situation made it obvious that Books and Software were independently linked to Members. If, however, there were distinct links between all three, such that we would be stating that "Bill recommends the ERWin Bible as a reference for ERWin", then separating the relationship into two separate associations would be incorrect. In that case, we would lose the distinct information about the 3-way relationship.
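The problem with the merged table, and the 4th normal form fix, can be sketched with Python's sqlite3 module. The table and column names are assumptions, as are the specific tool and book attached to Mary and Steve (the notes only say that Mary recommended no book and Steve knows no tool).

```python
import sqlite3

def demo_4nf():
    """Return (merged table accepts Mary?, MemberSoftware rows, MemberBook rows)."""
    conn = sqlite3.connect(":memory:")
    # Key-only ternary table: every column is part of the key, so none may be empty.
    conn.execute(
        "CREATE TABLE MemberSoftwareBook ("
        " Member TEXT NOT NULL, Software TEXT NOT NULL, Book TEXT NOT NULL,"
        " PRIMARY KEY (Member, Software, Book))"
    )
    try:
        # Mary knows a tool but has recommended no book.
        conn.execute("INSERT INTO MemberSoftwareBook VALUES ('Mary', 'ERWin', NULL)")
        merged_accepts_mary = True
    except sqlite3.IntegrityError:
        merged_accepts_mary = False

    # 4NF: each independent M:M relationship gets its own association table.
    conn.execute(
        "CREATE TABLE MemberSoftware ("
        " Member TEXT NOT NULL, Software TEXT NOT NULL,"
        " PRIMARY KEY (Member, Software))"
    )
    conn.execute(
        "CREATE TABLE MemberBook ("
        " Member TEXT NOT NULL, Book TEXT NOT NULL,"
        " PRIMARY KEY (Member, Book))"
    )
    conn.execute("INSERT INTO MemberSoftware VALUES ('Mary', 'ERWin')")
    conn.execute("INSERT INTO MemberBook VALUES ('Steve', 'ERWin Bible')")
    software = conn.execute("SELECT Member, Software FROM MemberSoftware").fetchall()
    books = conn.execute("SELECT Member, Book FROM MemberBook").fetchall()
    conn.close()
    return merged_accepts_mary, software, books
```

The merged table can only hold Mary with a dummy book; the two separate tables hold Mary and Steve without any dummy records.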

5. Isolate Semantically Related Multiple Relationships

OK, now let's modify the original business diagram and add a link between the books and the software tools, indicating which books deal with which software tools, as shown below.

Initial business request

This makes sense after the discussion on Rule 4, and again we may be tempted to resolve the multiple M:M relationships into a single association, which would now violate 5th normal form. The ternary association looks identical to the one shown in the 4th normal form example, and is also going to have trouble displaying the information correctly. This time we would have even more trouble because we can't show the relationships between books and software unless we have a member to link to, or we have to add our favorite dummy member record to allow the record in the association table.

Incorrect solution

The solution, as before, is to ensure that all M:M relationships that are independent are resolved independently, resulting in the model shown below. Now information about members and books, members and software, and books and software are all stored independently, even though they are all very much semantically related. It is very tempting in many situations to combine the multiple M:M relationships because they are so similar. Within complex business discussions, the lines can become blurred and the correct solution not so obvious.

Correct 5th normal form
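As a rough sketch of the 5th normal form model, the three associations can be held as three independent sets of pairs and combined only when a question needs all of them. Plain Python sets are used here in place of tables; the pairs involving Mary and Steve and the helper name are assumptions.

```python
# Three independent M:M associations, each stored on its own.
member_software = {("Bill", "ERWin"), ("Mary", "ERWin")}
member_book = {("Bill", "ERWin Bible"), ("Steve", "ERWin Bible")}
# A book/software link can exist without any member being involved.
book_software = {("ERWin Bible", "ERWin")}

def recommended_books_for_known_tools(member):
    """Books the member recommends that deal with a tool the member knows."""
    return sorted(
        book
        for (m1, book) in member_book
        for (m2, tool) in member_software
        if m1 == member and m2 == member and (book, tool) in book_software
    )
```

Joining the three sets answers the combined question for Bill, while Mary and Steve produce no result and need no dummy records.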

6. Optimal Normal Form

At this point, we have done all we can with Entity-Relationship Diagrams (ERD). Most people will stop here because this is usually pretty good. However, another modeling style called Object Role Modeling (ORM) can display relationships that cannot be expressed in ERD. Therefore there are more normal forms beyond 5th. Optimal Normal Form (ONF) is defined as a model limited to only simple (elemental) facts, as expressed in ORM.

7. Domain-Key Normal Form

This level of normalization is simply a model taken to the point where there are no opportunities for modification anomalies.


• A relation is in DK/NF "if every constraint on the relation is a logical consequence of the definition of keys and domains"

• Constraint: "a rule governing static values of attributes"

• Key: "unique identifier of a tuple"

• Domain: "description of an attribute’s allowed values"

1. A relation in DK/NF has no modification anomalies, and conversely.

2. DK/NF is the ultimate normal form; there is no higher normal form related to modification anomalies.

3. Definition: A relation is in DK/NF if every constraint on the relation is a logical consequence of the definition of keys and domains.

4. A constraint is any rule governing static values of attributes that is precise enough to be ascertained whether or not it is true.

5. E.g. edit rules, intra-relation and inter-relation constraints, functional and multi-valued dependencies.

6. Not including constraints on changes in data values or time-dependent constraints.

7. Key: the unique identifier of a tuple.

8. Domain: physical and logical description of an attribute's allowed values.

9. Physical description is the format of an attribute.

10. Logical description is a further restriction of the values the domain is allowed.

11. Logical consequence: find a constraint on keys and/or domains which, if it is enforced, means that the desired constraint is also enforced.

12. Bottom line on DK/NF: If every table has a single theme, then all functional dependencies will be logical consequences of keys. All data value constraints can then be expressed as domain constraints.

13. Practical consequence: Since keys are enforced by the DBMS and domains are enforced by edit checks on data input, all modification anomalies can be avoided by just these two simple measures.

Components of DBMS

Data dictionary/directory

In database management systems, a data dictionary is a file that defines the basic organization of a database. A data dictionary contains a list of all files in the database, the number of records in each file, and the names and types of each field. Most database management systems keep the data dictionary hidden from users to prevent them from accidentally destroying its contents.

Data dictionaries do not contain any actual data from the database, only bookkeeping information for managing it. Without a data dictionary, however, a database management system cannot access data from the database.
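As an illustration, SQLite keeps its data dictionary in a built-in catalog table called sqlite_master and exposes column details through PRAGMA table_info. The sketch below reads both from Python; the Member table and the helper name describe_schema are assumptions.

```python
import sqlite3

def describe_schema():
    """Return table names, column names/types and record counts from SQLite's catalog."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Member (MemberID INTEGER PRIMARY KEY, Name TEXT)")
    conn.execute("INSERT INTO Member VALUES (1, 'Bill')")
    catalog = {}
    tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields one row per column:
        # (cid, name, type, notnull, dflt_value, pk)
        columns = [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]
        (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
        catalog[table] = {"columns": columns, "records": count}
    conn.close()
    return catalog
```

The catalog describes the Member table (its fields and how many records it holds) without containing the member data itself.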

Data languages

To define the entries in the data dictionary, a special language is used, known as DDL (Data Definition Language or Data Description Language). This language is mainly the concern of database administrators.


Teleprocessing monitors

A teleprocessing monitor is a communication software package that manages communications between the database and remote terminals. An example is a transaction sent from a remote terminal to the database. The teleprocessing monitor is normally a part of the DBMS.

Application development system

An application development system is a set of programs designed to help programmers in developing the applications that use the database. An application development system may or may not be a component of a DBMS. For example, “Oracle Forms” is an application development package shipped with the Oracle DBMS.

Security software

A security software package provides a variety of tools to protect the database from unauthorized access. Security software also protects data from becoming corrupted or damaged. Oracle Corporation claims to have absolute security in its DBMS.

Archiving and recovery system

Archiving or backup provides a way to make copies of the database, which can be used in case the original database records are damaged. The recovery system restores the damaged data from its copy.
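A small sketch of archiving and recovery using the backup method of Python's sqlite3 connections (available from Python 3.7). In-memory databases stand in for real files here, and the Member table and its row are assumptions.

```python
import sqlite3

def backup_and_restore():
    """Archive a database, damage the original, then recover the data from the copy."""
    original = sqlite3.connect(":memory:")
    original.execute("CREATE TABLE Member (MemberID INTEGER PRIMARY KEY, Name TEXT)")
    original.execute("INSERT INTO Member VALUES (1, 'Bill')")
    original.commit()

    # Archiving: copy the whole database into a second database.
    archive = sqlite3.connect(":memory:")
    original.backup(archive)

    # Simulate damage to the original records.
    original.execute("DELETE FROM Member")
    original.commit()
    damaged_rows = original.execute("SELECT * FROM Member").fetchall()

    # Recovery: restore the data from the archived copy into a fresh database.
    restored = sqlite3.connect(":memory:")
    archive.backup(restored)
    restored_rows = restored.execute("SELECT MemberID, Name FROM Member").fetchall()

    for conn in (original, archive, restored):
        conn.close()
    return damaged_rows, restored_rows
```

A production DBMS would normally combine such copies with transaction logs so that changes made after the last backup can also be recovered.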

Report writers

A report writer allows programmers, managers and other users to design output reports without writing much code in any programming language.

SQL and other Query languages

A query language consists of a set of commands used for retrieving, updating, inserting, and deleting records in a database. SQL (Structured Query Language) is a query language that has become the standard for almost all database management systems.
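The four basic kinds of SQL command can be sketched as follows, run against SQLite from Python's sqlite3 module. The Member table and the sample names are assumptions.

```python
import sqlite3

def run_basic_queries():
    """Insert, update, delete and then retrieve records with plain SQL statements."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Member (MemberID INTEGER PRIMARY KEY, Name TEXT)")
    # Inserting records
    conn.execute("INSERT INTO Member (MemberID, Name) VALUES (1, 'Bill')")
    conn.execute("INSERT INTO Member (MemberID, Name) VALUES (2, 'Mary')")
    # Updating a record
    conn.execute("UPDATE Member SET Name = 'William' WHERE MemberID = 1")
    # Deleting a record
    conn.execute("DELETE FROM Member WHERE MemberID = 2")
    # Retrieving records
    rows = conn.execute("SELECT MemberID, Name FROM Member ORDER BY MemberID").fetchall()
    conn.close()
    return rows
```

The same INSERT, UPDATE, DELETE and SELECT statements work with little or no change on most relational database products.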
