
DATABASE MANAGEMENT - NIILM University



Subject: DATABASE MANAGEMENT Credits: 4

SYLLABUS

Introduction to data base management system – Data versus information, record, file; data dictionary, database

administrator, functions and responsibilities; file-oriented system versus database system

Database system architecture – Introduction, schemas, sub schemas and instances; data base architecture, data

independence, mapping, data models, types of database systems

Data base security – Threats and security issues, firewalls and database recovery; techniques of data base

security; distributed data base

Data warehousing and data mining – Emerging data base technologies, internet, database, digital libraries,

multimedia data base, mobile data base, spatial data base

Lab: Working over Microsoft Access

Suggested Readings:

1. A. Silberschatz, H. Korth, S. Sudarshan, "Database System Concepts", Fifth Edition, McGraw-Hill

2. Rob, Coronel, "Database Systems", Seventh Edition, Cengage Learning


DATABASE MANAGEMENT

DATABASE MANAGEMENT SYSTEM

COURSE OVERVIEW

This course provides immediately usable tools and techniques in the methods of database management systems: requirements analysis, definition, specification, design, etc. It provides participants with details of the tools, techniques and methods needed to lead or participate in the front-end phases.

Database systems are designed to manage large bodies of information. Management of data involves both defining structures for the storage of information and providing ways for the manipulation of data. In addition, the database system must ensure the safety of data.

A DBMS is a collection of programs that enables you to store, modify, and extract important information from a database. There are many different types of DBMS, ranging from small systems that run on personal computers to huge systems that run on mainframes.

Objectives

• To help you learn DBMS and design techniques: what they are and how one goes about doing them.

• The primary goal of a DBMS is to provide an environment that is both convenient and efficient for people to use in retrieving and storing information.

• Database systems are designed to store large bodies of information.

By the end of this material, you will be equipped with good knowledge of technical information that will help you develop and understand DBMS.

On completion of the course, students shall develop the following skills and competencies:

• Database
• DBMS
• Database System Application
• File System
• Data Inconsistency


Lesson No. Topic Page No.

Lesson 1 Introduction to Database I 1
Lesson 2 Introduction to Database II 5
Lesson 3 Tutorial 8
Lesson 4 Database Concepts I 9
Lesson 5 Database Concepts II 13
Lesson 6 Data Models 17
Lesson 7 Relational Database Management System I 21
Lesson 8 Relational Database Management System II 27
Lesson 9 E-R Model I 31
Lesson 10 E-R Model II 36
Lesson 11 Structured Query Language (SQL) I 40
Lesson 12 LAB 45
Lesson 13 LAB 46
Lesson 14 SQL II 47
Lesson 15 LAB 55
Lesson 16 LAB 56
Lesson 17 SQL III 57
Lesson 18 LAB 62
Lesson 19 LAB 63
Lesson 20 SQL IV 64
Lesson 21 LAB 67
Lesson 22 LAB 68
Lesson 23 Integrity and Security 69
Lesson 24 LAB 75
Lesson 25 LAB 76
Lesson 26 PL/SQL 77
Lesson 27 LAB 82
Lesson 28 LAB 83
Lesson 29 Database Triggers 84
Lesson 30 LAB 89

CONTENT

DATABASE MANAGEMENT

RELATIONAL DATABASE MANAGEMENT SYSTEM

STRUCTURED QUERY LANGUAGES

INTRODUCTION TO DBMS

DATA MODELS IN DATABASES


Lesson No. Topic Page No.

Lesson 31 LAB 90
Lesson 32 Database Cursors 91
Lesson 33 LAB 100
Lesson 34 LAB 101
Lesson 35 Normalisation I 102
Lesson 36 Normalisation II 107
Lesson 37 Normalisation III 112
Lesson 38 File Organization Method I 118
Lesson 39 File Organization Method II 123
Lesson 40 Transactions Management 130
Lesson 41 Concurrency Control I 136
Lesson 42 Concurrency Control II 141
Lesson 43 Concurrency Control III 146
Lesson 44 Database Recovery 152

CONTENT

NORMALIZATION

FILE ORGANIZATION METHODS

DATABASE OPERATIONAL MAINTENANCE


LESSON 1:

INTRODUCTION TO DATABASE I

Lesson Objectives

• Database
• Database management system
• Essentials of data
• Benefits of DBMS
• Database system application
• Purpose of database system

1.1 What is a Database Management System (DBMS)?

A database can be termed a repository of data. A collection of actual data that constitutes the information regarding an organisation is stored in a database. For example, if there are 1000 students in a college and we have to store their personal details, marks details, etc., these details will be recorded in a database. A collection of programs that enables you to store, modify, and extract information from a database is known as a DBMS. The primary goal of a DBMS is to provide a way to store and retrieve database information that is both convenient and efficient.

Database systems are designed to manage large bodies of information. Management of data involves both defining structures for the storage of information and providing ways for the manipulation of data. In addition, the database system must ensure the safety of data. A DBMS is a collection of programs that enables you to store, modify, and extract important information from a database. There are many different types of DBMS, ranging from small systems that run on personal computers to huge systems that run on mainframes. Good data management is an essential prerequisite to corporate success:

Data → Information → Knowledge → Judgment → Decision → Success

provided that data is:
• complete
• accurate
• timely
• easily available

1.2 Database System Applications

There are many different types of DBMSs, ranging from small systems that run on personal computers to huge systems that run on mainframes. Databases are applied in a wide number of applications. Following are some examples:

• Banking: for customer information, accounts, loans and other banking transactions
• Airlines: for reservation and schedule information
• Universities: for student information, course registration, grades, etc.
• Credit card transactions: for purchases on credit cards and generation of monthly statements
• Telecommunication: for keeping records of calls made, generating monthly bills, etc.
• Finance: for storing information about holdings, sales and purchases of financial instruments
• Sales: for customer, product and purchase information
• Manufacturing: for management of the supply chain
• Human Resources: for recording information about employees, salaries, tax, benefits, etc.

We can say that whenever we need a computerised system, we need a database system.

1.3 Purpose of Database Systems

A file system is one in which we keep information in operating system files. Before the evolution of the DBMS, organisations used to store information in file systems. A typical file-processing system is supported by a conventional operating system. The system stores permanent records in various files, and it needs application programs to extract records, or to add or delete records. We will compare both systems with the help of an example. Consider a savings bank enterprise that keeps information about all customers and savings accounts. The following manipulations have to be done with the system:

• A program to debit or credit an account
• A program to add a new account
• A program to find the balance of an account
• A program to generate monthly statements

As the need arises, new applications can be added at a particular point of time, as checking accounts can be added to a savings account system. Using a file system for storing data has the following disadvantages:

1. Data Redundancy and Inconsistency
Different programmers work on a single project, so various files are created by different programmers at intervals of time. These files end up in different formats, and different programs are written in different programming languages.

[Figure: actual data storage]


The same information is repeated. For example, a name and address may appear in the savings account file as well as in the checking account file. This redundancy results in higher storage space and access cost. It also leads to data inconsistency, which means that if we change some record in one place, the change will not be reflected in all the places. For example, a changed customer address may be reflected in the savings record but not anywhere else.

2. Difficulty in Accessing Data
Accessing data from a list is also a difficulty in a file system. Suppose we want to see the records of all customers who have a balance less than $10,000; we can either check the list and find the names manually, or write an application program. If we write an application program, and at some later time we need to see the records of customers who have a balance of less than $20,000, then again a new program has to be written. This means that file-processing systems do not allow data to be accessed in a convenient manner.
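With a DBMS, each new request is a short declarative query rather than a new application program. A minimal sketch using Python's built-in sqlite3 module (the customer table and its columns are illustrative, not from the text):

```python
import sqlite3

# In-memory database standing in for the bank's customer file (illustrative schema)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, balance REAL)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [("Ann", 5000), ("Bob", 15000), ("Eve", 25000)])

# Each new request is just a new query -- no new application program
under_10k = conn.execute(
    "SELECT name FROM customer WHERE balance < 10000").fetchall()
under_20k = conn.execute(
    "SELECT name FROM customer WHERE balance < 20000").fetchall()
print(under_10k)  # [('Ann',)]
print(under_20k)  # [('Ann',), ('Bob',)]
```

Changing the threshold from $10,000 to $20,000 changes one constant in the query, not a program.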

3. Data Isolation
As the data is stored in various files, and the various files may be stored in different formats, writing application programs to retrieve the data is difficult.

4. Integrity Problems
Sometimes we need the stored data to satisfy certain constraints; for example, in a bank the minimum deposit may have to be $100. Developers enforce these constraints by writing appropriate programs, but if some new constraint has to be added later, it is difficult to change the programs to enforce it.
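A DBMS lets such a rule be declared once, in the schema, instead of being re-implemented in every program. A sketch of the $100 minimum as a CHECK constraint, using Python's sqlite3 (table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The DBMS enforces the rule declaratively; no application code needed
# (the $100 minimum-balance rule and table name are illustrative)
conn.execute(
    "CREATE TABLE account (acct_no TEXT, balance REAL CHECK (balance >= 100))")
conn.execute("INSERT INTO account VALUES ('A-101', 500)")      # allowed
try:
    conn.execute("INSERT INTO account VALUES ('A-102', 50)")   # rejected
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Adding a new constraint later means altering the schema, not hunting through every application program.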

5. Atomicity Problems
Any mechanical or electrical device is subject to failure, and so is the computer system. In that case we have to ensure that the data is restored to a consistent state. For example, suppose an amount of $50 has to be transferred from Account A to Account B. If the amount has been debited from Account A but has not yet been credited to Account B when some failure occurs, the data is left in an inconsistent state. So we have to adopt a mechanism which ensures that either the full transaction is executed or no part of it is executed, i.e. the fund transfer must be atomic.
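A DBMS provides this mechanism through transactions: either both updates are committed together or neither is. A sketch of the $50 transfer with a simulated failure, using Python's sqlite3 (account names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT, balance REAL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()

try:
    conn.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
    raise RuntimeError("power failure")  # simulated crash mid-transfer
    conn.execute("UPDATE account SET balance = balance + 50 WHERE name = 'B'")
    conn.commit()
except RuntimeError:
    conn.rollback()  # undo the partial debit; accounts stay consistent

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'A': 100.0, 'B': 0.0} -- the half-done transfer was undone
```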

6. Concurrent Access Problems
Many systems allow multiple users to update the data simultaneously. This can also leave the data in an inconsistent state. Suppose a bank account contains a balance of $500 and two customers want to withdraw $100 and $50 simultaneously. If both transactions read the old balance and subtract their withdrawal from it, the results will be $400 and $450, either of which is incorrect (the correct final balance is $350).
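The arithmetic of this lost update can be written out by hand; a toy Python sketch (no real DBMS or threads; the interleaving is hard-coded for illustration):

```python
# Toy simulation of the lost-update anomaly: both transactions read the
# balance before either writes its result back
balance = 500

read_by_t1 = balance   # T1 wants to withdraw 100
read_by_t2 = balance   # T2 wants to withdraw 50

balance = read_by_t1 - 100   # T1 writes 400
balance = read_by_t2 - 50    # T2 overwrites with 450 -- T1's update is lost

print(balance)               # 450, but the correct result is 350
```

A DBMS avoids this with concurrency control, e.g. by locking the row so the second transaction must read the balance written by the first.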

7. Security Problems
Not every user of the database should be able to access all of the data. For example, payroll personnel need to access only the part of the data that holds information about employees; they should not be able to access information about customer accounts.

Points to Ponder

• A DBMS contains a collection of inter-related data and a collection of programs to access that data.

• The primary goal of a DBMS is to provide an environment that is both convenient and efficient for people to use in retrieving and storing information.

• DBMS systems are ubiquitous today, and most people interact either directly or indirectly with a database many times every day.

• Database systems are designed to store large bodies of information.

• A major purpose of a DBMS is to provide users with an abstract view of the data, i.e. the system hides how the data is stored and maintained.

Review Terms

• Database
• DBMS
• Database System Application
• File System
• Data Inconsistency
• Consistency constraints
• Atomicity
• Redundancy
• Data isolation
• Data Security

Student Activity

1. What is a database? Explain with an example.

2. What is a DBMS? Explain with an example.

3. List four significant differences between a file system and a DBMS.


4. What are the advantages of a DBMS?

5. Explain various applications of databases.

6. Explain data inconsistency with an example.

7. Explain data security. Why is it needed? Explain with an example.

8. Explain the isolation and atomicity properties of a database.

9. Explain why redundancy should be avoided in a database.

10. Explain consistency constraints in a database.


Student Notes


LESSON 2:

INTRODUCTION TO DATABASE II

Lesson Objectives

• Data abstraction
• View of data
• Levels of data
• Physical level
• Logical level
• View level
• Database language
• DDL, DML

View of Data
A database contains a number of files and certain programs to access and modify these files. But the actual data is not shown to the user; the system hides the actual details of how data is stored and maintained.

Data Abstraction
Data abstraction is the process of distilling data down to its essentials. The data, when needed, should be retrieved efficiently. As not all details are of use to all users, we hide the actual (complex) details from users. Various levels of abstraction of data are provided, listed below:

• Physical level
It is the lowest level of abstraction and specifies how the data is actually stored. It describes the complex data structures in detail.

• Logical level
It is the next level of abstraction and describes what data are stored in the database and what relationships exist between the various data. It is less complex than the physical level and specifies simple structures. Though the complexity of the physical level underlies the logical level, users of the logical level need not know these complexities.

• View level
This level contains the actual data which is shown to the users. It is the highest level of abstraction, and users at this level need not know the actual details (complexity) of data storage.

Database Language
Just as a language is required to understand anything, to create or manipulate a database we need to learn a language. Database languages are divided into two main parts:
1. DDL (Data Definition Language)
2. DML (Data Manipulation Language)

Data Definition Language (DDL)
A database scheme is specified as a set of definitions expressed in a DDL.
1. DDL statements are compiled, resulting in a set of tables stored in a special file called a data dictionary or data directory.
2. The data directory contains metadata (data about data).
3. The storage structure and access methods used by the database system are specified by a set of definitions in a special type of DDL called a data storage and definition language.
4. The basic idea is to hide implementation details of the database schemes from the users.
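The DDL steps above can be sketched with Python's built-in sqlite3 module; SQLite's sqlite_master catalog stands in for the data dictionary described in point 1 (the customer table is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the scheme (illustrative table)
conn.execute("""
    CREATE TABLE customer (
        cust_id  INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        balance  REAL
    )
""")

# The DBMS records the definition as metadata; in SQLite the
# sqlite_master table plays the role of the data dictionary
row = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE name = 'customer'").fetchone()
print(row)  # ('table', 'customer')
```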

Data Manipulation Language (DML)

1. Data manipulation is:
• retrieval of information from the database
• insertion of new information into the database
• deletion of information in the database
• modification of information in the database

2. A DML is a language which enables users to access and manipulate data. The goal is to provide efficient human interaction with the system.

3. There are two types of DML:
• procedural: the user specifies what data is needed and how to get it
• nonprocedural: the user only specifies what data is needed; this is easier for the user, but may not generate code as efficient as that produced by procedural languages

4. A query language is a portion of a DML involving information retrieval only. The terms DML and query language are often used synonymously.
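The four kinds of data manipulation listed above can be sketched in one short session with Python's sqlite3 module (the student table is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll INTEGER, name TEXT, marks INTEGER)")

# Insertion of new information
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Asha", 72), (2, "Ravi", 65)])

# Modification of existing information
conn.execute("UPDATE student SET marks = 70 WHERE roll = 2")

# Deletion of information
conn.execute("DELETE FROM student WHERE roll = 1")

# Retrieval of information (the query-language portion of the DML)
rows = conn.execute("SELECT roll, name, marks FROM student").fetchall()
print(rows)  # [(2, 'Ravi', 70)]
```

SQL here is nonprocedural: each statement says what data is wanted, not how to fetch it.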

Points to Ponder

• DBMS systems are ubiquitous today, and most people interact either directly or indirectly with a database many times every day.

• Database systems are designed to store large bodies of information.

• A major purpose of a DBMS is to provide users with an abstract view of the data, i.e. the system hides how the data is stored and maintained.

• The structure of a database is defined through DDL and manipulated through DML.

• DDL statements are compiled, resulting in a set of tables stored in a special file called a data dictionary or data directory.

• A query language is a portion of a DML involving information retrieval only. The terms DML and query language are often used synonymously.

Review Terms

• Data Security
• Data Views
• Data Abstraction
• Physical level
• Logical level
• View level
• Database language
• DDL
• DML
• Query language

Student Activity

1. Define data abstraction.

2. How many levels of data abstraction are there? Explain in detail.

3. Explain database languages. Differentiate between DDL and DML.


Student Notes


LESSON 3:

TUTORIAL


LESSON 4:

DATABASE CONCEPTS I

Lesson Objectives

• Data dictionary
• Meta-data
• Database schema
• Database instance
• Data independence

Data Dictionary
English language dictionaries define data in terms such as "known facts or things used as a basis for inference or reckoning, typically (in modern usage) operated upon or manipulated by computers", or "factual information used as a basis for discussion, reasoning, or calculation". A data dictionary may cover the whole organisation, a part of the organisation, or a database. In its simplest form, the data dictionary is only a collection of data element definitions. A more advanced data dictionary contains the database schema with reference keys; a still more advanced data dictionary contains an entity-relationship model of the data elements or objects.

Parts of a Data Dictionary

1. Data Element Definitions
Data element definitions may be independent of table definitions or a part of each table definition.

• Data element number
The data element number is used in the technical documents.

• Data element name (caption)
A commonly agreed, unique data element name from the application domain. This is the real-life name of the data element.

• Short description
Description of the element in the application domain.

• Security classification of the data element
Organisation-specific security classification level or possible restrictions on use. This may contain technical links to security systems.

• Related data elements
List of closely related data element names when the relation is important.

• Field name(s)
Field names are the names used for this element in computer programs and database schemas. These are the technical names, often limited by the programming languages and systems.

• Code format
Data type (characters, numeric, etc.), size and, if needed, special representation. Common programming language notation, input masks, etc. can be used.

• Null value allowed
A null or non-existing data value may or may not be allowed for an element. An element with possible null values needs special consideration in reports and may cause problems if used as a key.

• Default value
A data element may have a default value. The default value may be a variable, like the current date and time of day.

• Element coding (allowed values) and intra-element validation details, or reference to other documents
Explanation of coding (code tables, etc.) and validation rules when validating this element alone in the application domain.

• Inter-element validation details, or reference to other documents
Validation rules between this element and other elements in the data dictionary.

• Database table references
Reference to the tables in which the element is used and the role of the element in each table, with special indication when the data element is the key for the table or a part of the key.

• Definitions and references needed to understand the meaning of the element
Short application domain definitions and references to other documents needed to understand the meaning and use of the data element.

• Source of the data in the element
Short description, in application domain terms, of where the data comes from. Rules used in calculations producing the element values are usually written here.

• Validity dates for the data element definition
Validity dates, start and possible end dates, when the element is or was used. There may be several time periods in which the element has been used.

• History references
Date when the element was defined in its present form, references to superseded elements, etc.

• External references
References to books, other documents, laws, etc.

• Version of the data element document
Version number or other indicator. This may include formal version control or configuration management references, but such references may be hidden, depending on the system used.

• Date of the data element document
Writing date of this version of the data element document.


• Quality control references
Organisation-specific quality control endorsements, dates, etc.

• Data element notes
Short notes not included in the above parts.

Table Definitions
A table definition is usually available with the SQL command HELP TABLE tablename.

• Table name
• Table owner or database name
• List of data element (column) names and details
• Key order for all the elements which are possible keys
• Possible information on indexes
• Possible information on table organisation
Technical table organisation, like hash, heap, B+-tree, AVL-tree, ISAM, etc., may be in the table definition.
• Duplicate rows allowed or not allowed
• Possible detailed data element list with complete data element definitions
• Possible data on the current contents of the table
The size of the table and similar site-specific information may be kept with the table definition.
• Security classification of the table
The security classification of a table is usually the same as or higher than that of its elements. However, there may be views accessing parts of the table with lower security.

Database Schema
The overall structure of the database is called the database schema. A database schema is usually a graphical presentation of the whole database. Tables are connected with external keys and key columns. When accessing data from several tables, the database schema will be needed in order to find joining data elements and, in complex cases, to find proper intermediate tables. Some database products use the schema to join the tables automatically. A database system has several schemas according to the level of abstraction: the physical schema describes the database design at the physical level, and the logical schema describes the database design at the logical level. A database can also have sub-schemas (view level) that describe different views of the database.
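A small sketch of what the schema is used for: two illustrative tables connected by a key column, joined through that column, using Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two tables connected by a key column, as a schema diagram would show
conn.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dname TEXT)")
conn.execute("""CREATE TABLE emp (
    emp_id  INTEGER PRIMARY KEY,
    ename   TEXT,
    dept_id INTEGER REFERENCES dept(dept_id))""")
conn.execute("INSERT INTO dept VALUES (10, 'Accounts')")
conn.execute("INSERT INTO emp VALUES (1, 'Asha', 10)")

# The schema tells us dept_id is the joining data element
rows = conn.execute("""
    SELECT e.ename, d.dname
    FROM emp e JOIN dept d ON e.dept_id = d.dept_id""").fetchall()
print(rows)  # [('Asha', 'Accounts')]
```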

Database Instance

1. Databases change over time.

2. The information in a database at a particular point in time is called an instance of the database.

3. Analogy with programming languages:
• Data type definition - scheme
• Value of a variable - instance

Meta-Data
Meta-data is definitional data that provides information about, or documentation of, other data managed within an application or environment.

For example, meta-data would document data about data elements or attributes (name, size, data type, etc.), data about records or data structures (length, fields, columns, etc.), and data about data (where it is located, how it is associated, ownership, etc.). Meta-data may include descriptive information about the context, quality and condition, or characteristics of the data.

Data Independence

1. The ability to modify a scheme definition in one level without affecting a scheme definition in a higher level is called data independence.

2. There are two kinds:
• Physical data independence
  • The ability to modify the physical scheme without causing application programs to be rewritten
  • Modifications at this level are usually made to improve performance
• Logical data independence
  • The ability to modify the conceptual scheme without causing application programs to be rewritten
  • Usually done when the logical structure of the database is altered

3. Logical data independence is harder to achieve, as application programs are usually heavily dependent on the logical structure of the data. An analogy is made to abstract data types in programming languages.
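One common way to approximate logical data independence is to have application programs query a view rather than a base table; when the logical structure changes, only the view definition is rewritten. A hedged sketch with Python's sqlite3 (all names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Original logical structure (illustrative)
conn.execute("CREATE TABLE customer_v1 (name TEXT, balance REAL)")
conn.execute("INSERT INTO customer_v1 VALUES ('Ann', 500)")

# Applications query the view, not the table
conn.execute("CREATE VIEW customer AS SELECT name, balance FROM customer_v1")

# Later the logical structure changes (balance now stored in paise);
# redefining the view keeps application queries working unchanged
conn.execute("CREATE TABLE customer_v2 (name TEXT, bal_paise INTEGER)")
conn.execute("INSERT INTO customer_v2 SELECT name, balance * 100 FROM customer_v1")
conn.execute("DROP VIEW customer")
conn.execute("CREATE VIEW customer AS "
             "SELECT name, bal_paise / 100.0 AS balance FROM customer_v2")

print(conn.execute("SELECT name, balance FROM customer").fetchall())
# [('Ann', 500.0)] -- the application's query did not change
```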

Points to Ponder

• A data dictionary is a collection of data elements and their definitions.

• The database schema is the overall structure of a database.

• A database instance is the information in a database at a particular point in time.

• Meta-data is data about data.

• The ability to modify a scheme definition in one level without affecting a scheme definition in a higher level is called data independence.

Review Terms

• Database Instance
• Schema
  • Database schema
  • Physical schema
  • Logical schema
• Physical data independence
• Database Language
  • DDL
  • DML
  • Query language
• Data dictionary
• Metadata


Student Activity

1. What is the difference between a database schema and a database instance?

2. What do you understand by the structure of a database?

3. Define physical schema and logical schema.

4. Define data independence. Explain the types of data independence.

5. Define data dictionary and meta-data.

6. Define the various elements of a data dictionary.


Student Notes


Lesson Objective

• Database manager• Database user• Database administrator• Role of Database administrator• Role of Database user• Database architecture

Database ManagerThe database manager is a program module which provides theinterface between the low-level data stored in the database andthe application programs and queries submitted to the system.1. Databases typically require lots of storage space (gigabytes).

This must be stored on disks. Data is moved between diskand main memory (MM) as needed.

2. The goal of the database system is to simplify and facilitateaccess to data. Performance is important. Views providesimplification.

3. So the database manager module is responsible for• Interaction with the file manager: Storing raw data

on disk using the file system usually provided by aconventional operating system. The database managermust translate DML statements into low-level filesystem commands (for storing, retrieving andupdating data in the database).

• Integrity enforcement: Checking that updates in thedatabase do not violate consistency constraints (e.g. nobank account balance below $25)

• Security enforcement: Ensuring that users only haveaccess to information they are permitted to see

• Backup and recovery: Detecting failures due topower failure, disk crash, software errors, etc., andrestoring the database to its state before the failure

• Concurrency control: Preserving data consistencywhen there are concurrent users.

4. Some small database systems may miss some of thesefeatures, resulting in simpler database managers. (Forexample, no concurrency is required on a PC running MS-DOS.) These features are necessary on larger systems

Database AdministratorThe database administrator is a person having central controlover data and programs accessing that data. Duties of thedatabase administrator include:• Scheme definition: the creation of the original database

scheme. This involves writing a set of definitions in a DDL(data storage and definition language), compiled by theDDL compiler into a set of tables stored in the datadictionary.

• Storage structure and access method definition: writinga set of definitions translated by the data storage anddefinition language compiler

• Scheme and physical organization modification: writinga set of definitions used by the DDL compiler to generatemodifications to appropriate internal system tables (e.g. datadictionary). This is done rarely, but sometimes the databasescheme or physical organization must be modified.

• Granting of authorization for data access: grantingdifferent types of authorization for data access to varioususers

• Integrity constraint specification: generating integrityconstraints. These are consulted by the database managermodule whenever updates occur.

Database Users
The database users fall into several categories:
• Application programmers are computer professionals interacting with the system through DML calls embedded in a program written in a host language (e.g. C, PL/1, Pascal).
• These programs are called application programs.
• The DML precompiler converts DML calls (prefaced by a special character like $, #, etc.) to normal procedure calls in a host language.
• The host language compiler then generates the object code.
• Some special types of programming languages combine Pascal-like control structures with control structures for the manipulation of a database.
• These are sometimes called fourth-generation languages.
• They often include features to help generate forms and display data.
• Sophisticated users interact with the system without writing programs. They form requests by writing queries in a database query language. These are submitted to a query processor that breaks a DML statement down into instructions for the database manager module.
• Specialized users are sophisticated users writing special database application programs. These may be CADD systems, knowledge-based and expert systems, complex data systems (audio/video), etc.
• Naive users are unsophisticated users who interact with the system by using permanent application programs (e.g. an automated teller machine).


LESSON 5:

DATABASE CONCEPTS II



Database System Architecture
Database systems are partitioned into modules for different functions. Some functions (e.g. file systems) may be provided by the operating system.

Components Include

• File manager: manages allocation of disk space and data structures used to represent information on disk.
• Database manager: the interface between low-level data and application programs and queries.
• Query processor: translates statements in a query language into low-level instructions the database manager understands. (It may also attempt to find an equivalent but more efficient form.)
• DML precompiler: converts DML statements embedded in an application program to normal procedure calls in a host language. The precompiler interacts with the query processor.
• DDL compiler: converts DDL statements to a set of tables containing metadata stored in a data dictionary.

In addition, several data structures are required for physical system implementation:
• Data files: store the database itself.
• Data dictionary: stores information about the structure of the database. It is used heavily, so great emphasis should be placed on developing a good design and an efficient implementation of the dictionary.
• Indices: provide fast access to data items holding particular values.

[Figure: Database system architecture. Naive users, application programmers, sophisticated users, and the database administrator interact through application interfaces, application programs, queries, and the database schema; the DML precompiler, query processor, and DDL compiler feed the database manager, which uses the file manager to access the data files and data dictionary in storage.]

Points to Ponder

• Database manager is a program module which provides the interface between the low-level data stored in the database and the application programs.
• Database administrator is a person having central control over data.
• Database user is a person who accesses the database at various levels.
• Data files: store the database itself.
• Data dictionary: stores information about the structure of the database.
• DML precompiler converts DML statements embedded in an application program to normal procedure calls in a host language.
• File manager manages allocation of disk space and data structures used to represent information on disk.

Review Terms

• Database Instance
• Schema
  Database Schema
  Physical schema
  Logical schema
• Physical data independence


• Database Language
  DDL
  DML
  Query Language

• Data dictionary
• Metadata
• Database Administrator
• Database User

Student Activity

1. What are the various kinds of database users?

2. What do you understand by the structure of a database?

3. Define physical schema and logical schema?

4. Define file manager, DML precompiler, and data files?


Student Notes


LESSON 6:

DATA MODELS

Lesson Objective

• Understanding data models
• Different types of data models
• Hierarchical data model
• Network data model
• Relational model

Data models are a collection of conceptual tools for describing data, data relationships, data semantics and data constraints. A data model is a “description” of both a container for data and a methodology for storing and retrieving data from that container. Actually, there isn’t really a data model “thing”. Data models are abstractions, oftentimes mathematical algorithms and concepts. You cannot really touch a data model, but nevertheless they are very useful. The analysis and design of data models has been the cornerstone of the evolution of databases. As models have advanced, so has database efficiency.
There are various kinds of data models, i.e. in a database, records can be arranged in various ways. The various ways in which data can be represented are:
1. Hierarchical data model
2. Network data model
3. Relational model
4. E-R model

The Hierarchical Model
Organization of the records is as a collection of trees. As its name implies, the Hierarchical Database Model defines hierarchically-arranged data.
Perhaps the most intuitive way to visualize this type of relationship is by visualizing an upside-down tree of data. In this tree, a single table acts as the “root” of the database from which other tables “branch” out.
You will be instantly familiar with this relationship because that is how all Windows-based directory management systems (like Windows Explorer) work these days.
Relationships in such a system are thought of in terms of children and parents, such that a child may only have one parent but a parent can have multiple children. Parents and children are tied together by links called “pointers” (perhaps physical addresses inside the file system). A parent will have a list of pointers to each of its children.

For example, we might create a structure in which a course contains various students, and each of these students is given certain marks in an assignment.

However, as you can imagine, the hierarchical database model has some serious problems. For one, you cannot add a record to a child table until it has already been incorporated into the parent table. This might be troublesome if, for example, you wanted to add a student who had not yet signed up for any courses.
Worse yet, the hierarchical database model still creates repetition of data within the database. You might imagine that in the database system shown above, there may be a higher level that includes multiple courses. In this case, there could be redundancy because students would be enrolled in several courses and thus each “course tree” would have redundant student information. Redundancy would occur because hierarchical databases handle one-to-many relationships well but do not handle many-to-many relationships well. This is because a child may only have one parent. However, in many cases you will want to have the child be related to more than one parent. For instance, the relationship between student and class is “many-to-many”. Not only can a student take many subjects but a subject may also be taken by many students. How would you model this relationship simply and efficiently using a hierarchical database? The answer is that you wouldn’t.

Though this problem can be solved with multiple databases creating logical links between children, the fix is very kludgy and awkward.
Faced with these serious problems, the computer brains of the world got together and came up with the network model.
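The parent-child structure described above can be sketched as nested records. This is only an illustration, not a real hierarchical DBMS; the assignment marks are hypothetical values:

```python
# Hypothetical sketch of a hierarchical database: each parent record holds
# pointers (here, list references) to its children, and each child record
# lives under exactly one parent.
course = {
    "name": "CP302 Database Management",
    "students": [                      # one-to-many: course -> students
        {"name": "John Smith", "assignment_marks": 78},   # marks are made up
        {"name": "Reena Rani", "assignment_marks": 85},
    ],
}

# Navigation always starts at the root and walks down the tree.
marks = {s["name"]: s["assignment_marks"] for s in course["students"]}
print(marks["Reena Rani"])
```

Note that a student enrolled in two courses would have to be stored once under each course tree, which is exactly the redundancy the text complains about.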

Network Databases
In many ways, the Network Database Model was designed to solve some of the more serious problems with the Hierarchical Database Model. Specifically, the Network Model solves the problem of data redundancy by representing relationships in terms of sets rather than a hierarchy. The model had its origins in


the Conference on Data Systems Languages (CODASYL), which had created the Data Base Task Group to explore and design a method to replace the hierarchical model.
The network model is actually very similar to the hierarchical model. In fact, the hierarchical model is a subset of the network model. However, instead of using a single-parent tree hierarchy, the network model uses set theory to provide a tree-like hierarchy, with the exception that child tables were allowed to have more than one parent. This allowed the network model to support many-to-many relationships.
Visually, a Network Database looks like a Hierarchical Database in that you can see it as a type of tree. However, in the case of a Network Database, the look is more like several trees which share branches. Thus, children can have multiple parents and parents can have multiple children.

Nevertheless, though it was a dramatic improvement, the network model was far from perfect. Most profoundly, the model was difficult to implement and maintain. Most implementations of the network model were used by computer programmers rather than real users. What was needed was a simple model which could be used by real end users to solve real problems.

Relational Model
The relational model was formally introduced by Dr. E. F. Codd in 1970 and has evolved since then through a series of writings. The model provides a simple, yet rigorously defined, concept of how users perceive data. A relational database is a collection of two-dimensional tables. The organization of data into relational tables is known as the logical view of the database; that is, the form in which a relational database presents data to the user and the programmer. The way the database software physically stores the data on a computer disk system is called the internal view. The internal view differs from product to product and does not concern us here.
A relational database allows the definition of data structures, storage and retrieval operations, and integrity constraints. In such a database the data and relations between them are organised in tables. A table is a collection of records, and each record in a table contains the same fields.

Properties of Relational Tables

• Values Are Atomic
• Each Row is Unique
• Column Values Are of the Same Kind
• The Sequence of Columns is Insignificant
• The Sequence of Rows is Insignificant
• Each Column Has a Unique Name

Certain fields may be designated as keys, which means that searches for specific values of that field will use indexing to speed them up. Where fields in two different tables take values from the same set, a join operation can be performed to select related records in the two tables by matching values in those fields. Often, but not always, the fields will have the same name in both tables. For example, an “orders” table might contain (customer-ID, product-code) pairs and a “products” table might contain (product-code, price) pairs, so to calculate a given customer’s bill you would sum the prices of all products ordered by that customer by joining on the product-code fields of the two tables. This can be extended to joining multiple tables on multiple fields. Because these relationships are only specified at retrieval time, relational databases are classed as dynamic database management systems. The relational database model is based on relational algebra.
A basic understanding of the relational model is necessary to effectively use relational database software such as Oracle, Microsoft SQL Server, or even personal database systems such as Access or Fox, which are based on the relational model.
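The orders/products join just described can be sketched in Python. The rows below are made-up illustrative data (the text does not give concrete rows); joining on product-code and summing prices gives a customer's bill:

```python
# Hypothetical rows for the "orders" and "products" tables described above.
orders = [                      # (customer-ID, product-code) pairs
    ("C1", "P10"), ("C1", "P20"), ("C2", "P10"),
]
products = [                    # (product-code, price) pairs
    ("P10", 5.0), ("P20", 7.5),
]

price = dict(products)          # index products by product-code

def customer_bill(customer_id):
    """Join orders with products on product-code and sum the prices."""
    return sum(price[code] for cust, code in orders if cust == customer_id)

print(customer_bill("C1"))      # 5.0 + 7.5 = 12.5
```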

Points to Ponder

• Data models are a collection of conceptual tools for describing data, data relationships, data semantics and data constraints.

• Types of data models are:
  1. Hierarchical data model
  2. Network data model
  3. Relational model
  4. E-R model

• The Hierarchical Database Model defines hierarchically-arranged data.

• Network model solves the problem of data redundancy by representing relationships in terms of sets.


Review Terms

• Data models
• Hierarchical data model
• Network data model
• Relational data model


Student Activity

1. Define data models?

2. Define hierarchical data model?

3. Define relational data model?

4. Define network data model?


Student Notes


LESSON 7:

RELATIONAL DATABASE MANAGEMENT SYSTEM I

Lesson Objective

• Understanding RDBMS
• Understanding data structures
• Understanding data manipulation
• Understanding various relational algebra operations
• Understanding data integrity

The relational model was proposed by E. F. Codd in 1970. It deals with database management from an abstract point of view. The model provides specifications of an abstract database management system. To use database management systems based on the relational model, however, users do not need to master the theoretical foundations. Codd defined the model as consisting of the following three components:
1. Data Structure - a collection of data structure types for building the database.
2. Data Manipulation - a collection of operators that may be used to retrieve, derive or modify data stored in the data structures.
3. Data Integrity - a collection of rules that implicitly or explicitly define a consistent database state or changes of state.

Data Structure
Often the information that an organisation wishes to store in a computer and process is complex and unstructured. For example, we may know that a department in a university has 200 students, most are full-time with an average age of 22 years, and most are female. Since natural language is not a good language for machine processing, the information must be structured for efficient processing. In the relational model the information is structured in a very simple way.
We consider the following database to illustrate the basic concepts of the relational data model.

The above database could be mapped into the following relational schema, which consists of three relation schemes. Each relation scheme presents the structure of a relation by specifying its name and the names of its attributes enclosed in parentheses. Often the primary key of a relation is marked by underlining.
student(student_id, student_name, address)
enrolment(student_id, subject_id)
subject(subject_id, subject_name, department)
An example of a database based on the above relational model is:

The relations student, enrolment and subject are shown below.

We list a number of properties of relations:
1. Each relation contains only one record type.
2. Each relation has a fixed number of columns that are explicitly named. Each attribute name within a relation is unique.
3. No two rows in a relation are the same.
4. Each item or element in the relation is atomic; that is, in each row, every attribute has only one value that cannot be decomposed, and therefore no repeating groups are allowed.
5. Rows have no ordering associated with them.
6. Columns have no ordering associated with them (although most commercially available systems do).
The above properties are simple and based on practical considerations. The first property ensures that only one type of information is stored in each relation. The second property involves naming each column uniquely. This has several benefits. The names can be chosen to convey what each column is, and the names enable one to distinguish between the column and its domain. Furthermore, the names are much easier to remember than the position of each column if the number of columns is large.
The third property of not having duplicate rows appears obvious but is not always accepted by all users and designers of DBMS. The property is essential since no sensible context-free

The relation student:

student_id   student_name     address
8656789      Peta Williams    9, Davis Hall
8700074      John Smith       9, Davis Hall
8900020      Arun Kumar       90, Second Hall
8801234      Peter Chew       88, Long Hall
8654321      Reena Rani       88, Long Hall
8712374      Kathy Garcia     88, Long Hall
8612345      Chris Watanabe   11, Main Street

The relation enrolment:

student_id   subject_id
8700074      CP302
8900020      CP302
8900020      CP304
8700074      MA111
8801234      CP302
8801234      CH001

The relation subject:

subject_id   subject_name                department
CP302        Database Management         Comp. Science
CP304        Software Engineering        Comp. Science
CH001        Introduction to Chemistry   Chemistry
PH101        Physics                     Physics
MA111        Pure Mathematics            Mathematics


meaning can be assigned to a number of rows that are exactly the same.
The next property requires that each element in each relation be atomic, that is, it cannot be decomposed into smaller pieces. In the relational model, the only composite or compound type (data that can be decomposed into smaller pieces) is a relation. This simplicity of structure leads to relatively simple query and manipulation languages.
A relation is a set of tuples and is closely related to the concept of relation in mathematics. Each row in a relation may be viewed as an assertion. For example, the relation student asserts that a student by the name of Reena Rani has student_id 8654321 and lives at 88, Long Hall. Similarly, the relation subject asserts that one of the subjects offered by the Department of Computer Science is CP302 Database Management.
In the relational model, a relation is the only compound data structure, since relations do not allow repeating groups or pointers.
We now define the relational terminology:
Relation - essentially a table
Tuple - a row in the relation
Attribute - a column in the relation
Degree of a relation - number of attributes in the relation
Cardinality of a relation - number of tuples in the relation
Domain - a set of values that an attribute is permitted to take. The same domain may be used by a number of different attributes.
Primary key - as discussed in the last chapter, each relation must have an attribute (or a set of attributes) that uniquely identifies each tuple.
Each such attribute (or set of attributes) is called a candidate key of the relation if it satisfies the following properties:
(a) the attribute or the set of attributes uniquely identifies each tuple in the relation (called uniqueness), and
(b) if the key is a set of attributes then no subset of these attributes has property (a) (called minimality).
There may be several distinct sets of attributes that may serve as candidate keys. One of the candidate keys is arbitrarily chosen as the primary key of the relation.
The three relations above (student, enrolment and subject) have degree 3, 2 and 3 respectively and cardinality 7, 6 and 5 respectively. The primary key of the relation student is student_id, of relation enrolment is (student_id, subject_id), and finally the primary key of relation subject is subject_id. The relation student probably has another candidate key. If we can assume the names to be unique then student_name is a candidate key. If the names are not unique but name and address together are unique, then the two attributes (student_name, address) form a candidate key. Note that student_name and (student_name, address) cannot both be candidate keys; only one can. Similarly, for the relation subject, subject_name would be a candidate key if the subject names are unique.
The relational model is the most popular data model for commercial data processing applications. It is very simple, which reduces the programmer’s work.
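The uniqueness property of a candidate key can be checked mechanically. A small sketch in Python, using the student relation from above (the helper `is_unique` is our own, not part of any DBMS):

```python
# The student relation from the text, as a list of
# (student_id, student_name, address) tuples.
student = [
    (8656789, "Peta Williams", "9, Davis Hall"),
    (8700074, "John Smith", "9, Davis Hall"),
    (8900020, "Arun Kumar", "90, Second Hall"),
    (8801234, "Peter Chew", "88, Long Hall"),
    (8654321, "Reena Rani", "88, Long Hall"),
    (8712374, "Kathy Garcia", "88, Long Hall"),
    (8612345, "Chris Watanabe", "11, Main Street"),
]

def is_unique(relation, attr_positions):
    """True if the given attribute positions uniquely identify each tuple."""
    keys = [tuple(t[i] for i in attr_positions) for t in relation]
    return len(keys) == len(set(keys))

print(is_unique(student, [0]))   # student_id alone -> True (a candidate key)
print(is_unique(student, [2]))   # address alone -> False (halls are shared)
```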

Data Manipulation
The manipulative part of the relational model makes set processing (or relational processing) facilities available to the user. Since relational operators are able to manipulate relations, the user does not need to use loops in the application programs. Avoiding loops can result in a significant increase in the productivity of application programmers.
The primary purpose of a database in an enterprise is to be able to provide information to the various users in the enterprise. The process of querying a relational database is in essence a way of manipulating the relations that are the database. For example, one may wish to know
1. the names of all students enrolled in CP302, or
2. the names of all subjects taken by John Smith.

The Relational Algebra
The relational algebra is a procedural query language. It consists of a set of operations that take one or two relations as input and produce a new relation as their result. The fundamental operations in the relational algebra are select, project, union, set difference, Cartesian product, and rename. In addition to the fundamental operations, there are several other operations, namely set intersection, natural join, division, and assignment. We will define these operations in terms of the fundamental operations.

Fundamental Operations
The select, project, and rename operations are called unary operations, because they operate on one relation. The other three operations operate on pairs of relations and are, therefore, called binary operations.
The various operations are shown below:

The Select Operation

The select operation selects tuples that satisfy a given predicate. The argument relation is given in parentheses after the σ. Thus, to select those tuples of the loan relation where the branch is “Perryridge,” we write

σ branch-name = “Perryridge” (loan)
If the loan relation is as shown, then the relation that results from the preceding query is a new relation containing only the matching tuples. We can find all tuples in which the amount lent is more than $1200 by writing

σ amount>1200 (loan)
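Selection can be mimicked on rows represented as dictionaries. A sketch only: the rows below follow the loan-number/amount table shown later in this lesson, but the branch-name values are assumed for the Perryridge example:

```python
# Loan rows based on the loan table in this lesson; the branch-name
# column is an assumption added for illustration.
loan = [
    {"branch-name": "Perryridge", "loan-number": "L-15", "amount": 1500},
    {"branch-name": "Downtown",   "loan-number": "L-17", "amount": 1000},
    {"branch-name": "Perryridge", "loan-number": "L-16", "amount": 1300},
]

def select(relation, predicate):
    """sigma_predicate(relation): keep the tuples satisfying the predicate."""
    return [t for t in relation if predicate(t)]

perryridge = select(loan, lambda t: t["branch-name"] == "Perryridge")
big = select(loan, lambda t: t["amount"] > 1200)
print(len(perryridge))   # 2
print(len(big))          # 2
```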

Operation            Symbol
Projection           Π
Selection            σ
Renaming             ρ
Union                ∪
Intersection         ∩
Assignment           ←
Cartesian product    ×
Join                 ⋈
Left outer join      ⟕
Right outer join     ⟖
Full outer join      ⟗
Semijoin             ⋉


In general, we allow comparisons using =, ≠, <, ≤, >, ≥ in the selection predicate. Furthermore, we can combine several predicates into a larger predicate by using the connectives and (∧), or (∨), and not (¬). Thus, to find those tuples pertaining to loans of more than $1200 made by the Perryridge branch, we write

σ branch-name = “Perryridge” ∧ amount>1200 (loan)

The selection predicate may include comparisons between two attributes. To illustrate, consider the relation loan-officer that consists of three attributes: customer-name, banker-name, and loan-number, which specifies that a particular banker is the loan officer for a loan that belongs to some customer. To find all customers who have the same name as their loan officer, we can write

σ customer-name = banker-name (loan-officer)

Projection
Projection is the operation of selecting certain attributes from a relation R to form a new relation S. For example, one may only be interested in the list of names from a relation that has a number of other attributes. The projection operator, written Π, may then be used. Like selection, projection is a unary operator.

Π loan-number, amount (loan)
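Projection keeps only the named columns and, because a relation is a set, drops duplicate rows. A sketch (the `project` helper is our own illustration):

```python
# Projection over dictionary-shaped tuples; duplicates are removed
# because a relation is a set.
loan = [
    {"loan-number": "L-14", "amount": 1500},
    {"loan-number": "L-15", "amount": 1500},
]

def project(relation, attributes):
    """Pi_attributes(relation): keep the named columns, eliminating duplicates."""
    seen = set()
    result = []
    for t in relation:
        row = tuple((a, t[a]) for a in attributes)
        if row not in seen:
            seen.add(row)
            result.append(dict(row))
    return result

print(project(loan, ["amount"]))   # one row, not two: both amounts are 1500
```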

Composition of Relational Operations
The fact that the result of a relational operation is itself a relation is important. Consider the more complicated query “Find those customers who live in Harrison.” We write:

Π customer-name (σ customer-city = “Harrison” (customer))
Notice that, instead of giving the name of a relation as the argument of the projection operation, we give an expression that evaluates to a relation.
In general, since the result of a relational-algebra operation is of the same type (relation) as its inputs, relational-algebra operations can be composed together into a relational-algebra expression. Composing relational-algebra operations into relational-algebra expressions is just like composing arithmetic operations (such as +, -, and *) into arithmetic expressions.
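Composition simply feeds the relation produced by one operator into another. A sketch of the Harrison query; the customer rows are hypothetical:

```python
# Hypothetical customer relation for the Harrison query.
customer = [
    {"customer-name": "Hayes", "customer-city": "Harrison"},
    {"customer-name": "Jones", "customer-city": "Brooklyn"},
]

# Pi customer-name (sigma customer-city = "Harrison" (customer)):
# the inner selection yields a relation, which the projection consumes.
inner = [t for t in customer if t["customer-city"] == "Harrison"]
result = [t["customer-name"] for t in inner]
print(result)   # ['Hayes']
```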

Cartesian Product
The Cartesian product of two tables combines each row in one table with each row in the other table.

The loan-number and the amount of the loan:

loan-number   amount
L-11          900
L-14          1500
L-15          1500
L-16          1300
L-17          1000
L-23          2000
L-93          500

Example: The table E (for Employee)

Example: The table D (for Department)

• Seldom useful in practice.
• Can give a huge result.

The Union Operation
Consider a query to find the names of all bank customers who have either an account or a loan or both. Note that the customer relation does not contain this information, since a customer does not need to have either an account or a loan at the bank. To answer this query, we need the information in the depositor relation and in the borrower relation. We know how to find the names of all customers with a loan in the bank:

Π customer-name (borrower)
We also know how to find the names of all customers with an account in the bank:

Π customer-name (depositor)
To answer the query, we need the union of these two sets; that is, we need all customer names that appear in either or both of the two relations. We find these data by the binary operation union, denoted, as in set theory, by ∪. So the expression needed is
Π customer-name (borrower) ∪ Π customer-name (depositor)

There are 10 tuples in the result, even though there are seven distinct borrowers and six depositors. This apparent discrepancy occurs because Smith, Jones, and Hayes are borrowers as well as depositors. Since relations are sets, duplicate values are eliminated.

The table E:

enr   ename   dept
1     Bill    A
2     Sarah   C
3     John    A

Names of all customers who have either a loan or an account:

customer-name
Adams
Curry
Hayes
Jackson
Jones
Smith
Williams
Lindsay
Johnson
Turner

The table D:

dnr   dname
A     Marketing
B     Sales
C     Legal

Result of E × D:

enr   ename   dept   dnr   dname
1     Bill    A      A     Marketing
1     Bill    A      B     Sales
1     Bill    A      C     Legal
2     Sarah   C      A     Marketing
2     Sarah   C      B     Sales
2     Sarah   C      C     Legal
3     John    A      A     Marketing
3     John    A      B     Sales
3     John    A      C     Legal
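The E × D result can be reproduced with nested iteration, which is all a Cartesian product is:

```python
# E (Employee) and D (Department) exactly as in the example tables.
E = [(1, "Bill", "A"), (2, "Sarah", "C"), (3, "John", "A")]
D = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

# Cartesian product: every E row paired with every D row.
product = [e + d for e in E for d in D]

print(len(product))   # 3 x 3 = 9 rows
print(product[0])     # (1, 'Bill', 'A', 'A', 'Marketing')
```

This also illustrates why the bullets above warn that the result can be huge: the row count is the product of the two cardinalities.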


For a union operation r ∪ s to be valid, we require that two conditions hold:
1. The relations r and s must have the same number of attributes.
2. The domains of the ith attribute of r and the ith attribute of s must be the same, for all i.
Note that r and s can be, in general, temporary relations that are the result of relational-algebra expressions.
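Because relations are sets, the union of the seven borrower names and six depositor names yields only ten distinct names. A sketch using Python sets; the exact membership of each side is an assumption consistent with the text (Smith, Jones, and Hayes appear on both sides):

```python
# Name sets consistent with the union example in the text.
borrower_names = {"Adams", "Curry", "Hayes", "Jackson",
                  "Jones", "Smith", "Williams"}          # 7 borrowers
depositor_names = {"Hayes", "Jones", "Smith",
                   "Lindsay", "Johnson", "Turner"}       # 6 depositors

# Union eliminates duplicates, as set theory requires.
union = borrower_names | depositor_names
print(len(borrower_names), len(depositor_names), len(union))   # 7 6 10
```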

The Set-Intersection Operation
The first additional relational-algebra operation that we shall define is set intersection (∩). Suppose that we wish to find all customers who have both a loan and an account. Using set intersection, we can write

Π customer-name (borrower) ∩ Π customer-name (depositor)
Note that we can rewrite any relational-algebra expression that uses set intersection by replacing the intersection operation with a pair of set-difference operations as:
r ∩ s = r - (r - s)

Thus, set intersection is not a fundamental operation and does not add any power to the relational algebra. It is simply more convenient to write r ∩ s than to write r - (r - s).
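The identity r ∩ s = r - (r - s) can be checked on small sets (the names here are hypothetical):

```python
# Two small relations of customer names (hypothetical values).
r = {"Hayes", "Jones", "Smith"}      # customers with a loan
s = {"Smith", "Lindsay", "Hayes"}    # customers with an account

# Intersection directly, and via the pair of set differences.
direct = r & s
rewritten = r - (r - s)
print(direct == rewritten)   # True
print(sorted(direct))        # ['Hayes', 'Smith']
```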

The Set-Difference Operation
The set-difference operation, denoted by -, allows us to find tuples that are in one relation but are not in another. The expression r - s produces a relation containing those tuples in r but not in s.
We can find all customers of the bank who have an account but not a loan by writing
Π customer-name (depositor) - Π customer-name (borrower)

As with the union operation, we must ensure that set differences are taken between compatible relations. Therefore, for a set-difference operation r - s to be valid, we require that the relations r and s be of the same arity, and that the domains of the ith attribute of r and the ith attribute of s be the same.
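The account-but-not-a-loan query maps directly onto set difference; the name sets below are hypothetical:

```python
# Hypothetical projections of customer-name from depositor and borrower.
depositor_names = {"Smith", "Jones", "Lindsay", "Turner"}
borrower_names = {"Smith", "Jones", "Hayes"}

# Customers with an account but not a loan.
account_only = depositor_names - borrower_names
print(sorted(account_only))   # ['Lindsay', 'Turner']
```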

The Assignment Operation
It is convenient at times to write a relational-algebra expression by assigning parts of it to temporary relation variables. The assignment operation, denoted by ←, works like assignment in a programming language.

temp1 ← σ amount>1200 (loan)
temp2 ← Π loan-number, amount (loan)

result ← temp1 - temp2
The evaluation of an assignment does not result in any relation being displayed to the user. Rather, the result of the expression to the right of the ← is assigned to the relation variable on the left of the ←. This relation variable may be used in subsequent expressions. With the assignment operation, a query can be written as a sequential program consisting of a series of assignments followed by an expression whose value is displayed as the result

of the query. For relational-algebra queries, assignment must always be made to a temporary relation variable. Note that the assignment operation does not provide any additional power to the algebra. It is, however, a convenient way to express complex queries.
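The assignment sequence above maps naturally onto ordinary program variables. A sketch using the loan-number/amount table from this lesson:

```python
# The loan-number/amount rows from the earlier table.
loan = [("L-11", 900), ("L-14", 1500), ("L-15", 1500),
        ("L-16", 1300), ("L-17", 1000), ("L-23", 2000), ("L-93", 500)]

# temp1 <- sigma amount>1200 (loan)
temp1 = {t for t in loan if t[1] > 1200}
# temp2 <- Pi loan-number, amount (loan)  (here loan already has just
# those two columns, so the projection keeps every tuple)
temp2 = set(loan)
# result <- temp1 - temp2
result = temp1 - temp2
print(result)   # set() -- temp1 is a subset of temp2
```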

Points to Ponder

• The relational model was proposed by E. F. Codd in 1970.
• It provides specifications of an abstract database management system.
• It consists of the following three components:
1. Data Structure - a collection of data structure types for building the database.
2. Data Manipulation - a collection of operators that may be used to retrieve, derive or modify data stored in the data structures.
3. Data Integrity - a collection of rules that implicitly or explicitly define a consistent database state or changes of state.
• Relational algebra describes a set of algebraic operations that operate on tables and output a table as a result.

Review Terms

• Table/Relation
• Tuple
• Domain
• Database schema
• Database instance
• Keys
  Primary key
  Foreign key

• Relational algebra

Student Activity

1. Why do we use RDBMS?

2. Define relation, tuple, domain, keys?


3. What is the difference between Intersection, Union & Cartesian product?


Student Notes


LESSON 8:

RELATIONAL DATABASE MANAGEMENT SYSTEM II

Lesson Objectives

• Elaborating various other features of relational algebra
• Understanding aggregate functions
• Understanding joins
• Understanding natural, outer, and inner joins

Aggregate Functions

Aggregate functions take a collection of values and return a single value as a result. For example, the aggregate function sum takes a collection of values and returns the sum of the values. Thus, the function sum applied on the collection

{1, 1, 3, 4, 4, 11}

returns the value 24. The aggregate function avg returns the average of the values. When applied to the preceding collection, it returns the value 4. The aggregate function count returns the number of elements in the collection, and returns 6 on the preceding collection. Other common aggregate functions include min and max, which return the minimum and maximum values in a collection; they return 1 and 11, respectively, on the preceding collection.
The collections on which aggregate functions operate can have multiple occurrences of a value; the order in which the values appear is not relevant. Such collections are called multisets. Sets are a special case of multisets where there is only one copy of each element.
To illustrate the concept of aggregation, we take the following example:

G sum(salary) (pt-works)
The symbol G is the letter G in calligraphic font; read it as "calligraphic G." The relational-algebra operation G signifies that aggregation is to be applied, and its subscript specifies the aggregate operation to be applied. The result of the expression above is a relation with a single attribute, containing a single row with a numerical value corresponding to the sum of the salaries of all employees working part-time in the bank. The pt-works relation is:

employee-name   branch-name   salary
Adams           Perryridge    1500
Brown           Perryridge    1300
Gopal           Perryridge    5300
Johnson         Downtown      1500
Loreena         Downtown      1300
Peterson        Downtown      2500
Rao             Austin        1500
Sato            Austin        1600

There are cases where we must eliminate multiple occurrences of a value before computing an aggregate function. If we do want to eliminate duplicates, we use the same function names as before, with the hyphenated string "distinct" appended to the end of the function name (for example, count-distinct). An example arises in the query "Find the number of branches appearing in the pt-works relation." In this case, a branch name counts only once, regardless of the number of employees working at that branch. We write this query as follows:

G count-distinct(branch-name) (pt-works)

On the relation above, the result of this query is a single row containing the value 3.
Suppose we want to find the total salary sum of all part-time employees at each branch of the bank separately, rather than the sum for the entire bank. To do so, we need to partition the relation pt-works into groups based on the branch, and to apply the aggregate function on each group. The following expression using the aggregation operator G achieves the desired result:

branch-name G sum(salary) (pt-works)

In the expression, the attribute branch-name in the left-hand subscript of G indicates that the input relation pt-works must be divided into groups based on the value of branch-name. The following figure shows the resulting groups.

The expression sum(salary) in the right-hand subscript of G indicates that for each group of tuples (that is, each branch), the aggregation function sum must be applied on the collection of values of the salary attribute. The output relation consists of tuples with the branch name and the sum of the salaries for the branch, as shown in the figure below.

The general form of the aggregation operation G is as follows:

G1, G2, …, Gn G F1(A1), F2(A2), …, Fm(Am) (E)

where E is any relational-algebra expression; G1, G2, …, Gn constitute a list of attributes on which to group; each Fi is an aggregate function; and each Ai is an attribute name.
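In SQL, the grouping form of G corresponds to GROUP BY with an aggregate in the select list, and count-distinct corresponds to COUNT(DISTINCT ...). A minimal sketch using Python's sqlite3 module and the pt-works data (identifiers adapted to SQL naming):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE pt_works (employee_name TEXT, branch_name TEXT, salary INTEGER)")
con.executemany("INSERT INTO pt_works VALUES (?, ?, ?)", [
    ("Adams", "Perryridge", 1500), ("Brown", "Perryridge", 1300),
    ("Gopal", "Perryridge", 5300), ("Johnson", "Downtown", 1500),
    ("Loreena", "Downtown", 1300), ("Peterson", "Downtown", 2500),
    ("Rao", "Austin", 1500), ("Sato", "Austin", 1600),
])

# branch-name G sum(salary) (pt-works)  ~  GROUP BY branch_name
rows = con.execute(
    "SELECT branch_name, SUM(salary) FROM pt_works "
    "GROUP BY branch_name ORDER BY branch_name").fetchall()
print(rows)  # [('Austin', 3100), ('Downtown', 5300), ('Perryridge', 8100)]

# G count-distinct(branch-name) (pt-works)  ~  COUNT(DISTINCT ...)
branches = con.execute(
    "SELECT COUNT(DISTINCT branch_name) FROM pt_works").fetchone()[0]
print(branches)  # 3
```

The per-branch sums match the result figure in the text (3100, 5300, 8100), and the distinct branch count is 3.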

Join

The Natural-Join Operation
It is often desirable to simplify certain queries that require a Cartesian product. Usually, a query that involves a Cartesian

Groups of pt-works by branch-name:

employee-name   branch-name   salary
Rao             Austin        1500
Sato            Austin        1600
Johnson         Downtown      1500
Loreena         Downtown      1300
Peterson        Downtown      2500
Adams           Perryridge    1500
Brown           Perryridge    1300
Gopal           Perryridge    5300

Result of branch-name G sum(salary) (pt-works):

branch-name   sum of salary
Austin        3100
Downtown      5300
Perryridge    8100


product includes a selection operation on the result of the Cartesian product. Consider the query "Find the names of all customers who have a loan at the bank, along with the loan number and the loan amount." We first form the Cartesian product of the borrower and loan relations. Then, we select those tuples that pertain to only the same loan-number, followed by the projection of the resulting customer-name, loan-number, and amount:

Π customer-name, loan.loan-number, amount (σ borrower.loan-number = loan.loan-number (borrower × loan))

The natural join is a binary operation that allows us to combine certain selections and a Cartesian product into one operation. It is denoted by the "join" symbol ⋈. The natural-join operation forms a Cartesian product of its two arguments, performs a selection forcing equality on those attributes that appear in both relation schemas, and finally removes duplicate attributes.
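SQL exposes this operation directly as NATURAL JOIN. As a sketch, the customers-with-loans query can be run with Python's sqlite3 module; the borrower and loan rows below are illustrative assumptions, not data from the text.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE borrower (customer_name TEXT, loan_number TEXT)")
con.execute("CREATE TABLE loan (loan_number TEXT, amount INTEGER)")
con.executemany("INSERT INTO borrower VALUES (?, ?)",
                [("Jones", "L-17"), ("Smith", "L-23")])
con.executemany("INSERT INTO loan VALUES (?, ?)",
                [("L-17", 1000), ("L-23", 2000), ("L-31", 1500)])

# NATURAL JOIN matches on the common attribute loan_number and
# keeps a single copy of it, like the relational-algebra natural join.
rows = con.execute(
    "SELECT customer_name, loan_number, amount "
    "FROM borrower NATURAL JOIN loan ORDER BY customer_name").fetchall()
print(rows)  # [('Jones', 'L-17', 1000), ('Smith', 'L-23', 2000)]
```

Note that loan L-31, which has no borrower, is dropped; that loss of unmatched tuples is what motivates the outer join below.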

Outer Join
The outer-join operation is an extension of the join operation to deal with missing information. Suppose that we have the relations with the following schemas, which contain data on full-time employees:

employee (employee-name, street, city)ft-works (employee-name, branch-name, salary)

Suppose that we want to generate a single relation with all the information (street, city, branch name, and salary) about full-time employees. A possible approach would be to use the natural join operation as follows:

employee ⋈ ft-works

However, the natural join drops any employee who has no matching ft-works tuple, and any ft-works tuple with no matching employee. We can use the outer-join operation to avoid this loss of information. There are actually three forms of the operation: left outer join, denoted ⟕; right outer join, denoted ⟖; and full outer join, denoted ⟗. All three forms of outer join compute the join, and add extra tuples to the result of the join.

The result of employee ⋈ ft-works.

Result of employee ⟕ ft-works.
The left outer join (⟕) takes all tuples in the left relation that did not match with any tuple in the right relation, pads the tuples with null values for all other attributes from the right relation, and adds them to the result of the natural join. The tuple (Smith, Revolver, Death Valley, null, null) is such a tuple. All information from the left relation is present in the result of the left outer join.
The right outer join (⟖) is symmetric with the left outer join: it pads tuples from the right relation that did not match any from the left relation with nulls and adds them to the result of the natural join. The tuple (Gates, null, null, Redmond, 5300) is such a tuple. Thus, all information from the right relation is present in the result of the right outer join.
The full outer join (⟗) does both of those operations, padding tuples from the left relation that did not match any from the right relation, as well as tuples from the right relation that did not match any from the left relation, and adding them to the result of the join.

Result of employee ⟖ ft-works.

Result of employee ⟗ ft-works.
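The left outer join can be tried out in SQL, which writes ⟕ as LEFT JOIN. The sketch below reuses the Smith and Gates tuples mentioned in the text and adds one hypothetical matching employee (Coyote) so the join also has a regular row. (SQLite added RIGHT and FULL OUTER JOIN only in version 3.39, so the sketch sticks to LEFT JOIN.)

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (employee_name TEXT, street TEXT, city TEXT)")
con.execute("CREATE TABLE ft_works (employee_name TEXT, branch_name TEXT, salary INTEGER)")
con.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    ("Coyote", "Toon", "Hollywood"),          # hypothetical matching row
    ("Smith", "Revolver", "Death Valley"),     # no ft_works row
])
con.executemany("INSERT INTO ft_works VALUES (?, ?, ?)", [
    ("Coyote", "Mesa", 1500),                  # hypothetical matching row
    ("Gates", "Redmond", 5300),                # no employee row
])

# Left outer join: Smith is kept, padded with NULLs for branch and salary.
rows = con.execute(
    "SELECT e.employee_name, e.street, e.city, f.branch_name, f.salary "
    "FROM employee e LEFT JOIN ft_works f ON e.employee_name = f.employee_name "
    "ORDER BY e.employee_name").fetchall()
print(rows)
# [('Coyote', 'Toon', 'Hollywood', 'Mesa', 1500),
#  ('Smith', 'Revolver', 'Death Valley', None, None)]
```

The unmatched ft_works tuple (Gates) disappears from a left join; a right or full outer join would preserve it as (Gates, null, null, Redmond, 5300).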

Renaming Tables and Columns
Example: the table E (for Employee):

nr   name    dept
1    Bill    A
2    Sarah   C
3    John    A

Example: the table D (for Department):

nr   name
A    Marketing
B    Sales
C    Legal

We want to join these tables, but:
• Several columns in the result will have the same name (nr and name).
• How do we express the join condition, when there are two columns called nr?

Solutions

• Rename the attributes, using the rename operator.
• Keep the names, and prefix them with the table name, as is done in SQL. (This is somewhat unorthodox.)

You can use another variant of the renaming operator to change the name of a table, for example to change the name of E to R. This is necessary when joining a table with itself (see below).

ρ R(E)

A third variant lets you rename both the table and the columns:

ρ R(enr, ename, dept)(E)

Result of the relational-algebra expression

(ρ (enr, ename, dept)(E)) ⋈ dept = dnr (ρ (dnr, dname)(D))

enr   ename   dept   dnr   dname
1     Bill    A      A     Marketing
2     Sarah   C      C     Legal
3     John    A      A     Marketing
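The same renaming is done in SQL with AS aliases. A sketch using the E and D tables, with row values taken from the result table above, via Python's sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE E (nr INTEGER, name TEXT, dept TEXT)")
con.execute("CREATE TABLE D (nr TEXT, name TEXT)")
con.executemany("INSERT INTO E VALUES (?, ?, ?)",
                [(1, "Bill", "A"), (2, "Sarah", "C"), (3, "John", "A")])
con.executemany("INSERT INTO D VALUES (?, ?)",
                [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")])

# AS renames the clashing nr/name columns, like the rho operator.
rows = con.execute(
    "SELECT E.nr AS enr, E.name AS ename, E.dept, D.nr AS dnr, D.name AS dname "
    "FROM E JOIN D ON E.dept = D.nr ORDER BY enr").fetchall()
print(rows)
# [(1, 'Bill', 'A', 'A', 'Marketing'),
#  (2, 'Sarah', 'C', 'C', 'Legal'),
#  (3, 'John', 'A', 'A', 'Marketing')]
```

For a self-join, the table-renaming variant corresponds to two aliases of the same table, e.g. FROM E AS e1 JOIN E AS e2.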


Points to Ponder

• Aggregate functions take a collection of values and return a single value as a result.

• Usually, a query that involves a Cartesian product includes a selection operation on the result of the Cartesian product.

Review Terms

• Aggregate functions
• Joins
• Natural join
• Outer join
• Right outer join
• Left outer join

Student Activity

1. Define aggregate functions with example?

2. Define joins? What is a natural join?

3. Differentiate between inner join & outer join?

4. Differentiate between left outer join & right outer join with the help of an example?

5. Define rename operators?


Student Notes


LESSON 9:

E-R MODEL - I

Lesson Objective

• Understanding entity
• Understanding relationship
• Understanding attribute, domain, entity set
• Understanding simple & composite attributes
• Understanding derived attributes
• Understanding relationship sets
• Components of E-R diagrams
• Designing E-R diagrams

ER considers the real world to consist of entities and relationships among them. An Entity is a 'thing' which can be distinctly identified, for example a person, a car, a subroutine, a wire, an event.
A Relationship is an association among entities, e.g. person OWNS car is an association between a person and a car; person EATS dish IN place is an association among a person, a dish and a place.

Attribute, Value, Domain, Entity Set
The information about one entity is expressed by a set of (attribute, value) pairs, e.g. a car model could be:
Name = R1222
Power = 7.3
Nseats = 5
Values of attributes belong to different value-sets or domains; for example, for a car, Nseats is an integer between 1 and 12. Entities defined by the same set of attributes can be grouped into an Entity Set (abbreviated as ESet), as shown below.

An Entity Set
A given set of attributes may be referred to as an entity type. All entities in a given ESet are of the same type, but sometimes there can be more than one set of the same type. The set of all persons who are customers at a given bank can be defined as an entity set customer. The individual entities that constitute a set are said to be the extension of the entity set. So all the individual bank customers are the extension of the entity set customer.

|-------------------------------|
| ESet : CarModel               |
|-------------------------------|
| Name     | Power | Nseats     |
|----------|-------|------------|
| R1222    | 7.3   | 5          |
| HZ893    | 6.8   | 5          |
| R1293    | 5.4   | 4          |
|-------------------------------|

Each entity has a value for each of its attributes. For each attribute, there is a set of permitted values called the domain or value set.

Simple and Composite Attributes
A simple attribute has a single, indivisible value, while a composite attribute is one which can be divided into sub-parts. For example, an attribute name can be divided into first name, middle name & last name.

Single and Multivalued Attributes
An attribute which has only one value is known as a single-valued attribute. For example, the loan_no attribute will have only one loan number. There may be cases when an attribute has a set of values for a specific entity. For example, an attribute phone_no may have zero, one or several values. This is known as a multivalued attribute.

Derived Attribute
Its value is derived from the values of other related attributes or entities. For example, an attribute age can be calculated from another attribute date_of_birth.
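As a small illustration, a derived attribute such as age is computed from the stored date_of_birth at query time rather than stored itself. A sketch in Python (the function name and dates are illustrative):

```python
from datetime import date

def age(date_of_birth: date, today: date) -> int:
    """Derived attribute: computed from date_of_birth, never stored."""
    years = today.year - date_of_birth.year
    # One year less if the birthday has not yet occurred this year.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years

print(age(date(1990, 6, 15), date(2024, 6, 14)))  # 33
print(age(date(1990, 6, 15), date(2024, 6, 15)))  # 34
```

Storing age directly would make it stale; deriving it keeps the database consistent as time passes.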

Relationship, Relationship Set
A relationship is an association among several entities. For example, a customer A is associated with loan_no L1. A relationship set is a subset of the Cartesian product of entity sets. For example, a relationship set (abbreviated as RSet) on the relationship 'Person HAS_EATEN Dish IN Place' could be as shown below.

Notice that an RSet is an ESet having ESets as attributes.

Components of ER-D
The overall logical structure of a database can be expressed graphically by an E-R diagram. The various components of an E-R diagram are as follows:
1. Rectangles, which represent entity sets.
2. Ellipses, which represent attributes.
3. Diamonds, which represent relationship sets.
4. Lines, which link attributes to entity sets and entity sets to relationship sets.
5. Double ellipses, which represent multi-valued attributes.
6. Dashed ellipses, which represent derived attributes.
7. Double lines, which indicate total participation of an entity in a relationship set.

|---------------------------------------|
| RSet 'Person HAS_EATEN Dish IN Place' |
|---------------------------------------|
| Person  | Dish    | Place             |
|---------|---------|-------------------|
| Steve   | Duck    | Kuala Lumpur      |
| Weiren  | Duck    | Beijing           |
| Paolo   | Noodles | Naples            |
| Mike    | Fondue  | Geneva            |
| Paolo   | Duck    | Beijing           |
|---------------------------------------|


ENTITY RELATIONSHIP DIAGRAM NOTATIONS

Peter Chen developed ERDs in 1976. Since then, Charles Bachman and James Martin have added some slight refinements to the basic ERD principles.

Entity An entity is an object or concept about which you want to store information.

Weak Entity A weak entity is dependent on another entity to exist.

Attributes Attributes are the properties or characteristics of an entity.

Key attribute A key attribute is the unique, distinguishing characteristic of the entity. For example, an employee's social security number might be the employee's key attribute.

Multivalued attribute A multivalued attribute can have more than one value. For example, an employee entity can have multiple skill values.

Derived attribute A derived attribute is based on another attribute. For example, an employee's monthly salary is based on the employee's annual salary.

Relationships Relationships illustrate how two entities share information in the database structure.


Mapping Cardinalities
Mapping cardinalities express the number of entities to which another entity can be associated via a relationship set. They can be of the following types:

1. One to one: An entity in A is associated with at most one entity in B, & an entity in B is associated with at most one entity in A.

2. One to many: An entity in A is associated with any number (zero or more) of entities in B. An entity in B, however, can be associated with at most one entity in A.

3. Many to one: An entity in A is associated with at most one entity in B. An entity in B, however, can be associated with any number (zero or more) of entities in A.

4. Many to many: An entity in A is associated with any number (zero or more) of entities in B, & an entity in B is associated with any number (zero or more) of entities in A.
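In a relational schema, a one-to-many mapping is typically enforced with a foreign key on the "many" side. A sketch with hypothetical branch and account tables (names are illustrative), using Python's sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# One branch (the "one" side) can hold many accounts (the "many" side);
# each account refers to at most one branch via its foreign key.
con.execute("CREATE TABLE branch (branch_name TEXT PRIMARY KEY)")
con.execute("""CREATE TABLE account (
    account_number TEXT PRIMARY KEY,
    branch_name    TEXT REFERENCES branch(branch_name))""")

con.execute("INSERT INTO branch VALUES ('Downtown')")
con.execute("INSERT INTO account VALUES ('A-101', 'Downtown')")
con.execute("INSERT INTO account VALUES ('A-102', 'Downtown')")

n = con.execute(
    "SELECT COUNT(*) FROM account WHERE branch_name = 'Downtown'").fetchone()[0]
print(n)  # 2 -- two accounts map to the single Downtown branch
```

A many-to-many mapping, by contrast, needs a separate linking table whose rows pair the keys of both entity sets.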

Weak relationship To connect a weak entity with others, you should use a weak relationship notation.

Cardinality Cardinality specifies how many instances of an entity relate to one instance of another entity. Ordinality is also closely linked to cardinality. While cardinality specifies the occurrences of a relationship, ordinality describes the relationship as either mandatory or optional. In other words, cardinality specifies the maximum number of relationships and ordinality specifies the absolute minimum number of relationships.

Recursive relationship In some cases, entities can be self-linked. For example, employees can supervise other employees.


One to many relationship

Many to one relationship

One to one relationship

Points to Ponder

• An Entity is a 'thing' which can be distinctly identified, for example a person, a car, a subroutine, a wire, an event.

• A Relationship is an association among entities, e.g. person OWNS car.

• A given set of attributes may be referred to as an entity type.
• A simple attribute has a single value, while a composite attribute is one which can be divided into sub-parts.

• An attribute whose value is derived from the values of other related attributes or entities is known as a derived attribute.

• A relationship is an association among several entities.
• A relationship set is a subset of the Cartesian product of entity sets.

Review Terms

• Entity
• Entity set
• Attribute
• Domain
• Value
• Relationship
• Relationship set
• Cardinality
• Association

Student Activity

1. Define entity, domain, value?

2. Define relationship, relationship set?

3. Differentiate between simple & composite attributes?

4. Define derived attribute?

5. Differentiate between single & multi-valued attribute?

6. Define cardinality? Explain various kinds of cardinality?

7. Define various components of E-R-Diagram?


Student Notes


LESSON 10:

E-R MODEL - II

Lesson Objective

• Designing E-R diagrams
• Understanding keys
• Super key, primary key, composite key
• Entity relationship diagram methodology

Developing Entity Relationship Diagrams(ERDs)

Why
Entity Relationship Diagrams are a major data modelling tool and will help organize the data in your project into entities and define the relationships between the entities. This process has proved to enable the analyst to produce a good database structure, so that the data can be stored and retrieved in the most efficient manner.

Information

Entity
A data entity is anything real or abstract about which we want to store data. Entity types fall into five classes: roles, events, locations, tangible things or concepts, e.g. employee, payment, campus, book. Specific examples of an entity are called instances, e.g. the employee John Jones, Mary Smith's payment, etc.

Relationship
A data relationship is a natural association that exists between one or more entities, e.g. employees process payments. Cardinality defines the number of occurrences of one entity for a single occurrence of the related entity, e.g. an employee may process many payments but might not process any payments, depending on the nature of her job.

Attribute
A data attribute is a characteristic common to all or most instances of a particular entity. Synonyms include property, data element and field, e.g. name, address, employee number and pay rate are all attributes of the entity employee. An attribute or combination of attributes that uniquely identifies one and only one instance of an entity is called a primary key or identifier, e.g. Employee Number is a primary key for Employee.

Keys
Differences between entities must be expressed in terms of attributes.
• A superkey is a set of one or more attributes which, taken collectively, allow us to identify uniquely an entity in the entity set.

• For example, in the entity set customer, the combination of customer-name and S.I.N. is a superkey.

• Note that customer-name alone is not, as two customerscould have the same name.

• A superkey may contain extraneous attributes, and we are often interested in the smallest superkey. A superkey for which no proper subset is a superkey is called a candidate key.

• In the example above, S.I.N. is a candidate key, as it isminimal, and uniquely identifies a customer entity.

• A primary key is a candidate key (there may be more than one) chosen by the DB designer to identify entities in an entity set.

An entity set that does not possess sufficient attributes to form a primary key is called a weak entity set. One that does have a primary key is called a strong entity set.

For example,

• The entity set transaction has attributes transaction-number, date and amount.

• Different transactions on different accounts could share thesame number.

• These are not sufficient to form a primary key (uniquely identify a transaction).

• Thus transaction is a weak entity set.
For a weak entity set to be meaningful, it must be part of a one-to-many relationship set. This relationship set should have no descriptive attributes. (Why?)
The idea of strong and weak entity sets is related to the existence dependencies seen earlier.
• A member of a strong entity set is a dominant entity.
• A member of a weak entity set is a subordinate entity.
A weak entity set does not have a primary key, but we need a means of distinguishing among the entities.
The discriminator of a weak entity set is a set of attributes that allows this distinction to be made.
The primary key of a weak entity set is formed by taking the primary key of the strong entity set on which its existence depends (see Mapping Constraints) plus its discriminator.

To illustrate

• transaction is a weak entity. It is existence-dependent on account.
• The primary key of account is account-number.
• transaction-number distinguishes transaction entities within the same account (and is thus the discriminator).
• So the primary key for transaction would be (account-number, transaction-number).

Just Remember: The primary key of a weak entity is found by taking the primary key of the strong entity on which it is existence-dependent, plus the discriminator of the weak entity set.
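In SQL terms, the weak entity's table gets a composite primary key: the strong entity's key plus the discriminator. A sketch via sqlite3 (the table is named txn because TRANSACTION is an SQL keyword; columns and values are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY)")

# Weak entity: primary key = strong entity's key + discriminator.
con.execute("""CREATE TABLE txn (
    account_number     TEXT REFERENCES account(account_number),
    transaction_number INTEGER,      -- discriminator
    tx_date            TEXT,
    amount             INTEGER,
    PRIMARY KEY (account_number, transaction_number))""")

con.execute("INSERT INTO account VALUES ('A-101')")
con.execute("INSERT INTO account VALUES ('A-102')")
# The same transaction number may recur under different accounts...
con.execute("INSERT INTO txn VALUES ('A-101', 1, '2024-01-05', 500)")
con.execute("INSERT INTO txn VALUES ('A-102', 1, '2024-01-06', 750)")
# ...but not twice under the same account.
try:
    con.execute("INSERT INTO txn VALUES ('A-101', 1, '2024-01-07', 900)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```

The composite key accepts ('A-101', 1) and ('A-102', 1), but rejects a second ('A-101', 1), mirroring how the discriminator alone cannot identify a transaction.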


An Entity Relationship Diagram Methodology

A Simple Example
A company has several departments. Each department has a supervisor and at least one employee. Employees must be assigned to at least one, but possibly more, departments. At least one employee is assigned to a project, but an employee may be on vacation and not assigned to any projects. The important data fields are the names of the departments, projects, supervisors and employees, as well as the supervisor and employee numbers and a unique project number.

1. Identify Entities
The entities in this system are Department, Employee, Supervisor and Project. One is tempted to make Company an entity, but it is a false entity because it has only one instance in this problem. True entities must have more than one instance.

2. Find Relationships
We construct the following Entity Relationship Matrix:

3. Draw Rough ERD
We connect the entities whenever a relationship is shown in the Entity Relationship Matrix.

Tips for Effective ER Diagrams

1. Make sure that each entity only appears once per diagram.
2. Name every entity, relationship, and attribute on your diagram.

1. Identify Entities Identify the roles, events, locations, tangible things or concepts about which the end-users want to store data.

2. Find Relationships Find the natural associations between pairs of entities using a relationship matrix.

3. Draw Rough ERD Put entities in rectangles and relationships on line segments connecting the entities.

4. Fill in Cardinality Determine the number of occurrences of one entity for a single occurrence of the related entity.

5. Define Primary Keys Identify the data attribute(s) that uniquely identify one and only one occurrence of each entity.

6. Draw Key-Based ERD Eliminate Many-to-Many relationships and include primary and foreign keys in each entity.

7. Identify Attributes Name the information details (fields) which are essential to the system under development.

8. Map Attributes For each attribute, match it with exactly one entity that it describes.

9. Draw fully attributed ERD Adjust the ERD from step 6 to account for entities or relationships discovered in step 8.

10. Check Results Does the final Entity Relationship Diagram accurately depict the system data?

             | Department | Employee    | Supervisor | Project
Department   |            | is assigned | run by     |
Employee     | belongs to |             |            | works on
Supervisor   | runs       |             |            |
Project      |            | uses        |            |

3. Examine relationships between entities closely. Are they necessary? Are there any relationships missing? Eliminate any redundant relationships. Don't connect relationships to each other.

4. Use colors to highlight important portions of your diagram.

E-R diagram with an attribute attached to a relationship set.

If a relationship set also has some attributes associated with it, then we link these attributes to that relationship set. For example, we have the access-date descriptive attribute attached to the relationship set depositor to specify the most recent date on which a customer accessed that account.


E-R diagram with composite, multivalued, and derived attributes

It shows how composite attributes can be represented in the E-R notation. Here, a composite attribute name, with component attributes first-name, middle-initial, and last-name, replaces the simple attribute customer-name of customer. Also, a composite attribute address, whose component attributes are street, city, state, and zip-code, replaces the attributes customer-street and customer-city of customer. The attribute street is itself a composite attribute whose component attributes are street-number, street-name, and apartment-number.
It also illustrates a multivalued attribute phone-number, depicted by a double ellipse, and a derived attribute age, depicted by a dashed ellipse.

Points to Ponder

• ER considers the real world to consist of entities and relationships among them.

• An Entity is a 'thing' which can be distinctly identified.
• A Relationship is an association among entities.
• The information about one entity is expressed by a set of (attribute, value) pairs.
• A given set of attributes may be referred to as an entity type.
• Attributes are of various types:
  Single & multi-valued
  Simple & composite
  Derived
• E-R diagrams are expressed through various symbols.
• Cardinality expresses the number of entities to which another entity can be associated via a relationship set.

Review Terms

• Entity
• Relationship
• Attribute
• Domain
• E-R-D symbols
• Various kinds of keys
• Types of attributes
• Types of cardinality

Student Activity

1. Explain the difference between primary key, candidate key & super key?

2. Why is redundancy a bad practice?

3. Construct an E-R diagram for a car insurance company whose customers own one or more cars each. Each car has associated with it zero to any number of recorded accidents.

4. Design an E-R diagram for keeping track of the exploits of your favourite sports team. You should store the matches played, the scores in each match, the players in each match & individual player statistics for each match. Summary statistics should be modelled as derived attributes.


Student Notes


LESSON 11:

STRUCTURED QUERY LANGUAGE (SQL) I

Lesson Objective

• Understanding SQL
• Understanding DDL, DML
• Creating tables
• Selecting data
• Constraints
• Drop
• Insert
• Select with conditions

SQL (pronounced "ess-que-el") stands for Structured Query Language. SQL is used to communicate with a database. According to ANSI (American National Standards Institute), it is the standard language for relational database management systems. SQL statements are used to perform tasks such as updating data in a database, or retrieving data from a database. Some common relational database management systems that use SQL are: Oracle, Sybase, Microsoft SQL Server, Access, Ingres, etc. Although most database systems use SQL, most of them also have their own additional proprietary extensions that are usually only used on their system. However, the standard SQL commands such as "Select", "Insert", "Update", "Delete", "Create", and "Drop" can be used to accomplish almost everything that one needs to do with a database.
The SQL language has several parts:
• Data-definition language (DDL). The SQL DDL provides commands for defining relation schemas, deleting relations, and modifying relation schemas.

• Interactive data-manipulation language (DML). The SQL DML includes a query language based on both the relational algebra and the tuple relational calculus. It also includes commands to insert tuples into, delete tuples from, and modify tuples in the database.

• View definition. The SQL DDL includes commands for defining views.

• Transaction control. SQL includes commands for specifying the beginning and ending of transactions.

• Embedded SQL and dynamic SQL. Embedded and dynamic SQL define how SQL statements can be embedded within general-purpose programming languages, such as C, C++, Java, PL/I, Cobol, Pascal, and Fortran.

• Integrity. The SQL DDL includes commands for specifying integrity constraints that the data stored in the database must satisfy. Updates that violate integrity constraints are disallowed.

• Authorization. The SQL DDL includes commands for specifying access rights to relations and views.

A relational database system contains one or more objects called tables. The data or information for the database is stored in these tables. Tables are uniquely identified by their names and are comprised of columns and rows. Columns contain the column name, data type, and any other attributes for the column. Rows contain the records or data for the columns. Here is a sample table called "weather"; city, state, high, and low are the columns, and the rows contain the data for this table:

Data-Definition Language
The set of relations in a database must be specified to the system by means of a data-definition language (DDL). The SQL DDL allows specification of not only a set of relations, but also information about each relation, including:
• The schema for each relation
• The domain of values associated with each attribute
• The integrity constraints
• The set of indices to be maintained for each relation
• The security and authorization information for each relation
• The physical storage structure of each relation on disk

Creating TablesThe create table statement is used to create a new table. Here isthe format of a simple create table statement:create table “tablename”(“column1” “data type”, “column2” “data type”, “column3” “data type”);Format of create table if you were to use optional constraints:create table “tablename”(“column1” “data type” [constraint],“column2” “data type” [constraint],“column3” “data type” [constraint]);[ ] = optionalNote: You may have as many columns as you’d like, and theconstraints are optional.

Weather

city state high low

Phoenix Arizona 105 90

Tucson Arizona 101 92

Flagstaff Arizona 88 69

San Diego California 77 60

Albuquerque New Mexico 80 72


Example

create table employee
(first varchar(15),
last varchar(20),
age number(3),
address varchar(30),
city varchar(20),
state varchar(20));

To create a new table, enter the keywords create table followed by the table name, an open parenthesis, the first column name, the data type for that column, any optional constraints, and a closing parenthesis. It is important to use an open parenthesis before the first column definition and a closing parenthesis after the end of the last column definition. Make sure you separate each column definition with a comma. All SQL statements should end with a “;”.
The table and column names must start with a letter and can be followed by letters, numbers, or underscores, not to exceed a total of 30 characters in length. Do not use any SQL reserved keywords as names for tables or columns (such as “select”, “create”, “insert”, etc.).
Data types specify what type of data a particular column can hold. If a column called “Last_Name” is to be used to hold names, then that particular column should have a “varchar” (variable-length character) data type.
Here are the most common data types:

Constraints
When tables are created, it is common for one or more columns to have constraints associated with them. A constraint is basically a rule associated with a column that the data entered into that column must follow. For example, a “unique” constraint specifies that no two records can have the same value in a particular column; they must all be unique. The other two most popular constraints are “not null”, which specifies that a column can’t be left blank, and “primary key”. A “primary key” constraint defines a unique identification of each record (or row) in a table.
It’s now time for you to design and create your own table. If you decide to change or redesign the table, you can either drop it and recreate it, or you can create a completely different one.
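The three constraint types above can be exercised end to end. Below is a minimal sketch using Python’s built-in sqlite3 module as a stand-in SQL engine; the table and column names are illustrative, not taken from the lesson:

```python
# Sketch: how unique, not null, and primary key constraints behave,
# using Python's bundled sqlite3 module as a stand-in SQL engine.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""create table employee (
    id    integer primary key,   -- unique identifier for each row
    last  varchar(20) not null,  -- must not be left blank
    badge varchar(10) unique     -- no two rows may share a value
)""")

cur.execute("insert into employee (id, last, badge) values (1, 'Weber', 'B-100')")

# A duplicate badge violates the unique constraint and is rejected.
try:
    cur.execute("insert into employee (id, last, badge) values (2, 'Smith', 'B-100')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

# Omitting the last name leaves it null, violating not null.
try:
    cur.execute("insert into employee (id, badge) values (3, 'B-101')")
    null_allowed = True
except sqlite3.IntegrityError:
    null_allowed = False
```

Both offending inserts raise an integrity error, so neither bad row ever reaches the table.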

Students Activity
You have just started a new company. It is time to hire some employees. You will need to create a table that will contain the

char(size) Fixed-length character string. Size is specified in parenthesis. Max 255 bytes.

varchar(size) Variable-length character string. Max size is specified in parenthesis.

number(size) Number value with a max number of column digits specified in parenthesis.

date Date value

number(size,d) Number value with a maximum number of digits of "size" total, with a maximum number of "d" digits to the right of the decimal.

following information about your new employees: firstname, lastname, title, age, and salary.

Drop a Table
The drop table command is used to delete a table and all rows in the table.
To delete an entire table including all of its rows, issue the drop table command followed by the table name. drop table is different from deleting all of the records in the table: deleting all of the records leaves the table itself, including its column and constraint information, while dropping the table removes the table definition as well as all of its rows.

drop table "tablename";

Example

drop table myemployees;

Inserting into a Table
The insert statement is used to insert or add a row of data into the table.
To insert records into a table, enter the keywords insert into followed by the table name, an open parenthesis, a list of column names separated by commas, a closing parenthesis, the keyword values, and the list of values enclosed in parentheses. The values that you enter will be held in the rows, and they will match up with the column names that you specify. Strings should be enclosed in single quotes; numbers should not.

insert into "tablename"
(first_column, ... last_column)
values (first_value, ... last_value);

In the example below, the column name first will match up with the value ‘Luke’, and the column name state will match up with the value ‘Georgia’.

Example

insert into employee
(first, last, age, address, city, state)
values ('Luke', 'Duke', 45, '2130 Boars Nest', 'Hazard Co', 'Georgia');

Note: All strings should be enclosed between single quotes: ‘string’
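As a quick check, the lesson’s create table and insert statements run as written against an in-memory database; the sketch below uses Python’s sqlite3 module purely as a convenient stand-in engine:

```python
# Sketch: run the lesson's create table and insert statements,
# then read the row back to confirm strings and numbers landed
# in the right columns. sqlite3 is only a stand-in engine here.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""create table employee
  (first varchar(15), last varchar(20), age number(3),
   address varchar(30), city varchar(20), state varchar(20))""")

# Strings quoted, numbers bare, exactly as the lesson describes.
cur.execute("""insert into employee (first, last, age, address, city, state)
  values ('Luke', 'Duke', 45, '2130 Boars Nest', 'Hazard Co', 'Georgia')""")

row = cur.execute("select first, last, age from employee").fetchone()
```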

Students Activity

1. Insert data into your new employee table.


2. Your first three employees are the following:
Jonie Weber, Secretary, 28, 19500.00
Potsy Weber, Programmer, 32, 45300.00
Dirk Smith, Programmer II, 45, 75020.00

3. Enter these employees into your table first, and then insert at least 5 more of your own list of employees in the table.

Selecting Data
The select statement is used to query the database and retrieve selected data that match the criteria that you specify. Here is the format of a simple select statement:

select "column1" [, "column2", etc.]
from "tablename"
[where "condition"];

[ ] = optional

The column names that follow the select keyword determine which columns will be returned in the results. You can select as many column names as you’d like, or you can use a “*” to select all columns.
The table name that follows the keyword from specifies the table that will be queried to retrieve the desired results.
The where clause (optional) specifies which data values or rows will be returned or displayed, based on the criteria described after the keyword where.
Conditional selections used in the where clause:

= Equal
> Greater than
< Less than
>= Greater than or equal
<= Less than or equal
<> Not equal to
LIKE *See note below

The LIKE pattern-matching operator can also be used in the conditional selection of the where clause. LIKE is a very powerful operator that allows you to select only rows that are “like” what you specify. The percent sign “%” can be used as a wild card to match any possible characters that might appear before or after the characters specified. For example:

select first, last, city
from empinfo
where first LIKE 'Er%';

This SQL statement will match any first names that start with ‘Er’. Strings must be in single quotes.
Or you can specify,

select first, last
from empinfo
where last LIKE '%s';

This statement will match any last names that end in an ‘s’.

select * from empinfo
where first = 'Eric';

This will only select rows where the first name equals ‘Eric’ exactly.

Some more examples

select first, last, city from empinfo;
select last, city, age from empinfo where age > 30;
select first, last, city, state from empinfo where first LIKE 'J%';
select * from empinfo;
select first, last from empinfo where last LIKE '%s';
select first, last, age from empinfo where last LIKE '%illia%';
select * from empinfo where first = 'Eric';
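The wildcard behaviour of LIKE can be verified with a small runnable sketch; Python’s sqlite3 module serves as a stand-in engine, and the rows below are a subset of the empinfo sample table:

```python
# Sketch: LIKE with a leading pattern ('Er%') and a trailing
# pattern ('%s'), run against a few empinfo sample rows.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table empinfo (first text, last text)")
cur.executemany("insert into empinfo values (?, ?)",
                [("Eric", "Edwards"),
                 ("Erica", "Williams"),
                 ("John", "Jones")])

# 'Er%' matches any first name beginning with the letters Er.
starts_er = [r[0] for r in
             cur.execute("select first from empinfo where first like 'Er%'")]

# '%s' matches any last name ending in s.
ends_s = [r[0] for r in
          cur.execute("select last from empinfo where last like '%s'")]
```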

Sample Table: empinfo

first last id age city state

John Jones 99980 45 Payson Arizona

Mary Jones 99982 25 Payson Arizona

Eric Edwards 88232 32 San Diego California

Mary Ann Edwards 88233 32 Phoenix Arizona

Ginger Howell 98002 42 Cottonwood Arizona

Sebastian Smith 92001 23 Gila Bend Arizona

Gus Gray 22322 35 Bagdad Arizona

Mary Ann May 32326 52 Tucson Arizona

Erica Williams 32327 60 Show Low Arizona

Leroy Brown 32380 22 Pinetop Arizona

Elroy Cleaver 32382 22 Globe Arizona


Student Activity

1. Display the first name and age for everyone that’s in the table.

2. Display the first name, last name, and city for everyone that’s not from Payson.

3. Display all columns for everyone that is over 40 years old.

4. Display the first and last names for everyone whose last name ends in an “ay”.

5. Display all columns for everyone whose first name equals “Mary”.

6. Display all columns for everyone whose first name contains “Mary”.

7. Select all columns for everyone whose last name ends in “ith”.

Points to Ponder

• SQL is used to communicate with a database.
• The set of relations in a database must be specified to the system by means of a data definition language (DDL).
• The create table statement is used to create a new table.
• A constraint is basically a rule associated with a column that the data entered into that column must follow.
• The drop table command is used to delete a table and all rows in the table.
• The select statement is used to query the database and retrieve selected data that match the criteria that you specify.
• The insert statement is used to insert or add a row of data into the table.
• The LIKE pattern-matching operator can also be used in the conditional selection of the where clause.

Review Terms

• SQL
• DDL
• DML
• Constraints
• Insert
• Create
• Select
• Drop
• Like


Student Notes


LESSON 12:

LAB


LESSON 13:

LAB


LESSON 14:

SQL-II

Lesson Objective

• Elaborating Select statement
• Rename operator
• Aggregate function
• Having
• Order-by
• Group-by
• IN & Between

The SELECT statement is the core of SQL, and it is likely that the vast majority of your SQL commands will be SELECT statements. There are an enormous number of options available for the SELECT statement. When constructing SQL queries (with the SELECT statement), it is very useful to know all of the possible options and the best or most efficient way to do things.

The Rename Operation
SQL provides a mechanism for renaming both relations and attributes. It uses the as clause, taking the form:

old-name as new-name

Example

select first as name, last, city as emp_city from empinfo;

Select Statement
The SELECT statement is used to query the database and retrieve selected data that match the criteria that you specify. The SELECT statement has five main clauses to choose from, although FROM is the only required clause. Each of the clauses has a vast selection of options, parameters, etc. Here is the format of the SELECT statement:

SELECT [ALL | DISTINCT] column1 [, column2]
FROM table1 [, table2]
[WHERE "conditions"]
[GROUP BY "column-list"]
[HAVING "conditions"]
[ORDER BY "column-list" [ASC | DESC]];

Select and From Clause Review

SELECT first_column_name, second_column_name
FROM table_name
WHERE first_column_name > 1000;

The column names that follow the SELECT keyword determine which columns will be returned in the results. You can select as many column names as you’d like, or you can use a * to select all columns. The order they are specified in will be the order in which they are returned in your query results.
The table name that follows the keyword FROM specifies the table that will be queried to retrieve the desired results.
The WHERE clause (optional) specifies which data values or rows will be returned or displayed, based on the criteria described after the keyword where.

Example

SELECT name, age, salary
FROM employee
WHERE age > 50;

The above statement will select all of the values in the name, age, and salary columns from the employee table whose age is greater than 50.

Comparison Operators

= Equal
> Greater than
< Less than
>= Greater than or equal to
<= Less than or equal to
<> or != Not equal to
LIKE String comparison test (*see note below)

*Note about LIKE

Example

Select name, title, dept
From employee
Where title Like 'Pro%';

The above statement will select all of the rows/values in the name, title, and dept columns from the employee table whose title starts with ‘Pro’. This may return job titles including Programmer or Pro-wrestler.
ALL and DISTINCT are keywords used to select either ALL (the default) or the “distinct” or unique records in your query results. If you would like to retrieve just the unique records in specified columns, you can use the DISTINCT keyword. DISTINCT will discard the duplicate records for the columns you specified after the SELECT statement. For example:

Select Distinct age FROM employee_info;

This statement will return all of the unique ages in the employee_info table.
ALL will display “all” of the specified columns including all of the duplicates. The ALL keyword is the default if nothing is specified.
Note: The following two tables will be used in the examples and activities below.
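The ALL versus DISTINCT behaviour can be checked with a small runnable sketch; Python’s sqlite3 module stands in for the engine, and the rows are illustrative:

```python
# Sketch: SELECT ALL keeps duplicates, SELECT DISTINCT discards them.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table employee_info (name text, age integer)")
cur.executemany("insert into employee_info values (?, ?)",
                [("Jonie", 28), ("Potsy", 32), ("Mary Ann", 32)])

# ALL (the default) returns every row, including the repeated age 32.
all_ages = [r[0] for r in cur.execute("select all age from employee_info")]

# DISTINCT collapses the duplicate ages down to one row each.
distinct_ages = [r[0] for r in
                 cur.execute("select distinct age from employee_info")]
```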


Items_ordered

customerid order_date item quantity price
10330 30-Jun-1999 Pogo stick 1 28.00
10101 30-Jun-1999 Raft 1 58.00
10298 01-Jul-1999 Skateboard 1 33.00
10101 01-Jul-1999 Life Vest 4 125.00
10299 06-Jul-1999 Parachute 1 1250.00
10339 27-Jul-1999 Umbrella 1 4.50
10449 13-Aug-1999 Unicycle 1 180.79
10439 14-Aug-1999 Ski Poles 2 25.50
10101 18-Aug-1999 Rain Coat 1 18.30
10449 01-Sep-1999 Snow Shoes 1 45.00
10439 18-Sep-1999 Tent 1 88.00
10298 19-Sep-1999 Lantern 2 29.00
10410 28-Oct-1999 Sleeping Bag 1 89.22
10438 01-Nov-1999 Umbrella 1 6.75
10438 02-Nov-1999 Pillow 1 8.50
10298 01-Dec-1999 Helmet 1 22.00
10449 15-Dec-1999 Bicycle 1 380.50
10449 22-Dec-1999 Canoe 1 280.00
10101 30-Dec-1999 Hoola Hoop 3 14.75
10330 01-Jan-2000 Flashlight 4 28.00
10101 02-Jan-2000 Lantern 1 16.00
10299 18-Jan-2000 Inflatable Mattress 1 38.00
10438 18-Jan-2000 Tent 1 79.99
10413 19-Jan-2000 Lawnchair 4 32.00
10410 30-Jan-2000 Unicycle 1 192.50

Customers

customerid firstname lastname city state
10101 John Gray Lynden Washington
10298 Leroy Brown Pinetop Arizona
10299 Elroy Keller Snoqualmie Washington
10315 Lisa Jones Oshkosh Wisconsin
10325 Ginger Schultz Pocatello Idaho
10329 Kelly Mendoza Kailua Hawaii
10330 Shawn Dalton Cannon Beach Oregon
10338 Michael Howell Tillamook Oregon
10339 Anthony Sanchez Winslow Arizona
10408 Elroy Cleaver Globe Arizona
10410 Mary Ann Howell Charleston South Carolina
10413 Donald Davids Gila Bend Arizona
10419 Linda Sakahara Nogales Arizona
10429 Sarah Graham Greensboro North Carolina
10438 Kevin Smith Durango Colorado
10439 Conrad Giles Telluride Colorado
10449 Isabela Moore Yuma Arizona

Students Activity

1. From the items_ordered table, select a list of all items purchased for customerid 10449. Display the customerid, item, and price for this customer.

2. Select all columns from the items_ordered table for whoever purchased a Tent.

3. Select the customerid, order_date, and item values from the items_ordered table for any items in the item column that start with the letter “S”.

4. Select the distinct items in the items_ordered table. In other words, display a listing of each of the unique items from the items_ordered table.

Aggregate Functions

MIN returns the smallest value in a given column
MAX returns the largest value in a given column
SUM returns the sum of the numeric values in a given column
AVG returns the average value of a given column
COUNT returns the total number of values in a given column
COUNT(*) returns the number of rows in a table

Aggregate functions are used to compute against a “returned column of numeric data” from your SELECT statement. They basically summarize the results of a particular column of selected data. We are covering these here since they are required by the next topic, “GROUP BY”. Although they are required for the “GROUP BY” clause, these functions can be used without the “GROUP BY” clause. For example:

SELECT AVG(salary)
FROM employee;

This statement will return a single result which contains the average value of everything returned in the salary column from the employee table.

Another Example

SELECT AVG(salary)
FROM employee
WHERE title = 'Programmer';

This statement will return the average salary for all employees whose title is equal to ‘Programmer’.


Example

SELECT COUNT(*)
FROM employees;

This particular statement is slightly different from the other aggregate functions since there isn’t a column supplied to the COUNT function. This statement will return the number of rows in the employees table.
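Both COUNT(*) and AVG can be checked with a runnable sketch; Python’s sqlite3 stands in for the engine, and the salaries are taken from the employee activity earlier in the lesson:

```python
# Sketch: COUNT(*) counts rows, AVG averages a numeric column.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table employee (name text, title text, salary real)")
cur.executemany("insert into employee values (?, ?, ?)",
                [("Jonie Weber", "Secretary", 19500.00),
                 ("Potsy Weber", "Programmer", 45300.00),
                 ("Dirk Smith", "Programmer II", 75020.00)])

row_count = cur.execute("select count(*) from employee").fetchone()[0]
avg_salary = cur.execute("select avg(salary) from employee").fetchone()[0]
```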

Students Activity

1. Select the maximum price of any item ordered in the items_ordered table. Hint: Select the maximum price only.

2. Select the average price of all of the items ordered that were purchased in the month of Dec.

3. What is the total number of rows in the items_ordered table?

4. For all of the tents that were ordered in the items_ordered table, what is the price of the lowest tent? Hint: Your query should return the price only.

Group By Clause
The GROUP BY clause will gather all of the rows together that contain data in the specified column(s) and will allow aggregate functions to be performed on the one or more columns. This can best be explained by an example.
GROUP BY clause syntax:

SELECT column1, SUM(column2)
FROM "list-of-tables"
GROUP BY "column-list";

Let’s say you would like to retrieve a list of the highest paid salaries in each dept:

SELECT max(salary), dept
FROM employee
GROUP BY dept;

This statement will select the maximum salary for the people in each unique department. Basically, the salary for the person who makes the most in each department will be displayed. Their salary and their department will be returned.
For example, take a look at the items_ordered table. Let’s say you want to group everything of quantity 1 together, everything of quantity 2 together, everything of quantity 3 together, etc. If you would like to determine what the largest cost item is for each grouped quantity (all quantity 1’s, all quantity 2’s, all quantity 3’s, etc.), you would enter:

SELECT quantity, max(price)
FROM items_ordered
GROUP BY quantity;

Enter the statement above, and take a look at the results to see if it returned what you were expecting. Verify that the maximum price in each quantity group is really the maximum price.

GROUP BY - Multiple Grouping Columns - What if?
What if you ALSO want to display the lastname for the query below:

SELECT max(salary), dept
FROM employee
GROUP BY dept;

What you’ll need to do is:

SELECT lastname, max(salary), dept
FROM employee
GROUP BY dept, lastname;

This is called “multiple grouping columns”.
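The quantity grouping described above can be verified with a small sketch, using four rows from the items_ordered sample and Python’s sqlite3 as a stand-in engine:

```python
# Sketch: GROUP BY quantity, with max(price) computed per group.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table items_ordered (item text, quantity integer, price real)")
cur.executemany("insert into items_ordered values (?, ?, ?)",
                [("Pogo stick", 1, 28.00),
                 ("Raft", 1, 58.00),
                 ("Life Vest", 4, 125.00),
                 ("Flashlight", 4, 28.00)])

# One output row per distinct quantity, carrying that group's max price.
rows = cur.execute("""select quantity, max(price)
                      from items_ordered
                      group by quantity""").fetchall()
```

The quantity-1 group’s maximum is the 58.00 raft, and the quantity-4 group’s maximum is the 125.00 life vest.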

Students Activity

1. How many people are in each unique state in the customers table? Select the state and display the number of people in each. Hint: count is used to count rows in a column; sum works on numeric data only.

2. From the items_ordered table, select the item, maximum price, and minimum price for each specific item in the table. Hint: The items will need to be broken up into separate groups.


3. How many orders did each customer make? Use the items_ordered table. Select the customerid, the number of orders they made, and the sum of their orders.

Having Clause
The HAVING clause allows you to specify conditions on the rows for each group; in other words, which rows should be selected will be based on the conditions you specify. The HAVING clause should follow the GROUP BY clause if you are going to use it.
HAVING clause syntax:

SELECT column1, SUM(column2)
FROM "list-of-tables"
GROUP BY "column-list"
HAVING "condition";

HAVING can best be described by example. Let’s say you have an employee table containing the employee’s name, department, salary, and age. If you would like to select the average salary for each department, you could enter:

SELECT dept, avg(salary)
FROM employee
GROUP BY dept;

But, let’s say that you want to ONLY calculate and display the average if the average salary is over 20000:

SELECT dept, avg(salary)
FROM employee
GROUP BY dept
HAVING avg(salary) > 20000;
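A runnable sketch of the HAVING filter above, with Python’s sqlite3 as a stand-in engine; the department names and salary figures are illustrative:

```python
# Sketch: HAVING filters whole groups after aggregation,
# unlike WHERE, which filters individual rows before grouping.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table employee (name text, dept text, salary real)")
cur.executemany("insert into employee values (?, ?, ?)",
                [("A", "Sales", 18000.0),
                 ("B", "Sales", 19000.0),
                 ("C", "IT", 40000.0),
                 ("D", "IT", 50000.0)])

# Only groups whose average salary exceeds 20000 survive the HAVING test.
rows = cur.execute("""select dept, avg(salary)
                      from employee
                      group by dept
                      having avg(salary) > 20000""").fetchall()
```

Sales averages 18500 and is dropped; IT averages 45000 and is kept.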

Students Activity

1. How many people are in each unique state in the customers table that have more than one person in the state? Select the state and display the number of how many people are in each if it’s greater than 1.

2. From the items_ordered table, select the item, maximum price, and minimum price for each specific item in the table. Only display the results if the maximum price for one of the items is greater than 190.00.

3. How many orders did each customer make? Use the items_ordered table. Select the customerid, the number of orders they made, and the sum of their orders if they purchased more than 1 item.

Order By Clause
ORDER BY is an optional clause which will allow you to display the results of your query in a sorted order (either ascending or descending) based on the columns that you specify to order by.
ORDER BY clause syntax:

SELECT column1, SUM(column2)
FROM "list-of-tables"
ORDER BY "column-list" [ASC | DESC];

[ ] = optional
ASC = Ascending order (the default)
DESC = Descending order

The statement in the example below will select the employee_id, dept, name, age, and salary from the employee_info table where the dept equals ‘Sales’, and will list the results in ascending (default) order based on salary.

For example

SELECT employee_id, dept, name, age, salary
FROM employee_info
WHERE dept = 'Sales'
ORDER BY salary;

If you would like to order based on multiple columns, you must separate the columns with commas. For example:

SELECT employee_id, dept, name, age, salary
FROM employee_info
WHERE dept = 'Sales'


ORDER BY salary, age DESC;
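One subtlety worth verifying: in standard SQL, ORDER BY salary, age DESC applies DESC only to age; salary is still sorted ascending. A sketch with Python’s sqlite3 and illustrative rows:

```python
# Sketch: multi-column ORDER BY. salary sorts ascending (default);
# DESC binds only to age, which breaks ties among equal salaries.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table employee_info (name text, age integer, salary real)")
cur.executemany("insert into employee_info values (?, ?, ?)",
                [("A", 40, 30000.0),
                 ("B", 25, 30000.0),
                 ("C", 30, 20000.0)])

rows = cur.execute("""select name, salary, age
                      from employee_info
                      order by salary, age desc""").fetchall()
```

C comes first on the lower salary; A precedes B because their salaries tie and age sorts descending.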

Students Activity

1. Select the lastname, firstname, and city for all customers in the customers table. Display the results in ascending order based on the lastname.

2. Same thing as exercise #1, but display the results in descending order.

3. Select the item and price for all of the items in the items_ordered table where the price is greater than 10.00. Display the results in ascending order based on the price.

Combining Conditions and Boolean Operators
The AND operator can be used to join two or more conditions in the WHERE clause. Both sides of the AND condition must be true in order for the condition to be met and for those rows to be displayed.

SELECT column1, SUM(column2)
FROM "list-of-tables"
WHERE "condition1" AND "condition2";

The OR operator can also be used to join two or more conditions in the WHERE clause. However, either side of the OR operator can be true and the condition will be met; hence, the rows will be displayed. With the OR operator, either side can be true or both sides can be true.

For Example

SELECT employeeid, firstname, lastname, title, salary
FROM employee_info
WHERE salary >= 50000.00 AND title = 'Programmer';

This statement will select the employeeid, firstname, lastname, title, and salary from the employee_info table where the salary is greater than or equal to 50000.00 AND the title is equal to ‘Programmer’. Both of these conditions must be true in order for the rows to be returned in the query. If either is false, then it will not be displayed.
Although they are not required, you can use parentheses around your conditional expressions to make them easier to read:

SELECT employeeid, firstname, lastname, title, salary
FROM employee_info
WHERE (salary >= 50000.00) AND (title = 'Programmer');

Another Example

SELECT firstname, lastname, title, salary
FROM employee_info
WHERE (title = 'Sales') OR (title = 'Programmer');

This statement will select the firstname, lastname, title, and salary from the employee_info table where the title is either equal to ‘Sales’ OR the title is equal to ‘Programmer’.

Students Activity

1. Select the customerid, order_date, and item from the items_ordered table for all items unless they are ‘Snow Shoes’ or ‘Ear Muffs’. Display the rows as long as they are not either of these two items.

2. Select the item and price of all items that start with the letters ‘S’, ‘P’, or ‘F’.

In and Between Conditional Operators

SELECT col1, SUM(col2)
FROM "list-of-tables"
WHERE col3 IN (list-of-values);

SELECT col1, SUM(col2)
FROM "list-of-tables"
WHERE col3 BETWEEN value1 AND value2;

The IN conditional operator is really a set membership test operator. That is, it is used to test whether or not a value (stated before the keyword IN) is “in” the list of values provided after the keyword IN.

For Example

SELECT employeeid, lastname, salary


FROM employee_info
WHERE lastname IN ('Hernandez', 'Jones', 'Roberts', 'Ruiz');

This statement will select the employeeid, lastname, and salary from the employee_info table where the lastname is equal to either Hernandez, Jones, Roberts, or Ruiz. It will return the rows if it is ANY of these values.
The IN conditional operator can be rewritten by using compound conditions, using the equals operator and combining it with OR, with exactly the same output results:

SELECT employeeid, lastname, salary
FROM employee_info
WHERE lastname = 'Hernandez' OR lastname = 'Jones' OR lastname = 'Roberts' OR lastname = 'Ruiz';

As you can see, the IN operator is much shorter and easier to read when you are testing for more than two or three values.
You can also use NOT IN to exclude the rows in your list.
The BETWEEN conditional operator is used to test whether or not a value (stated before the keyword BETWEEN) is “between” the two values stated after the keyword BETWEEN.

For example

SELECT employeeid, age, lastname, salary
FROM employee_info
WHERE age BETWEEN 30 AND 40;

This statement will select the employeeid, age, lastname, and salary from the employee_info table where the age is between 30 and 40 (including 30 and 40).
This statement can also be rewritten without the BETWEEN operator:

SELECT employeeid, age, lastname, salary
FROM employee_info
WHERE age >= 30 AND age <= 40;

You can also use NOT BETWEEN to exclude the values between your range.
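Both equivalences above, IN versus chained OR, and BETWEEN’s inclusive endpoints, can be demonstrated in one sketch (Python’s sqlite3, illustrative rows):

```python
# Sketch: IN is equivalent to chained OR equality tests,
# and BETWEEN includes both endpoint values.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table employee_info (lastname text, age integer)")
cur.executemany("insert into employee_info values (?, ?)",
                [("Jones", 45), ("Ruiz", 29), ("Edwards", 52)])

in_rows = [r[0] for r in cur.execute(
    "select lastname from employee_info where lastname in ('Jones', 'Ruiz')")]
or_rows = [r[0] for r in cur.execute(
    "select lastname from employee_info "
    "where lastname = 'Jones' or lastname = 'Ruiz'")]

# BETWEEN is inclusive: ages 29 and 45 both qualify; 52 does not.
between_rows = [r[0] for r in cur.execute(
    "select lastname from employee_info where age between 29 and 45")]
```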

Students Activity

1. Select the date, item, and price from the items_ordered table for all of the rows that have a price value ranging from 10.00 to 80.00.

2. Select the firstname, city, and state from the customers table for all of the rows where the state value is either: Arizona, Washington, Oklahoma, Colorado, or Hawaii.

Mathematical Operators
Standard ANSI SQL-92 supports the first four of the following basic arithmetic operators:

+ addition
- subtraction
* multiplication
/ division
% modulo

The modulo operator determines the integer remainder of the division. This operator is not ANSI SQL supported; however, most databases support it. The following are some more useful mathematical functions to be aware of, since you might need them. These functions are not standard in the ANSI SQL-92 specs; therefore, they may or may not be available on the specific RDBMS that you are using, although they are available on several major database systems.

For example

SELECT round(salary), firstname
FROM employee_info;

This statement will select the salary rounded to the nearest whole value and the firstname from the employee_info table.

Students Activity
Select the item and per-unit price for each item in the items_ordered table. Hint: Divide the price by the quantity.

ABS(x) returns the absolute value of x

SIGN(x) returns the sign of input x as -1, 0, or 1 (negative, zero, or positive respectively)

MOD(x,y) modulo - returns the integer remainder of x divided by y (same as x%y)

FLOOR(x) returns the largest integer value that is less than or equal to x

CEILING(x) or CEIL(x) returns the smallest integer value that is greater than or equal to x

POWER(x,y) returns the value of x raised to the power of y

ROUND(x) returns the value of x rounded to the nearest whole integer

ROUND(x,d) returns the value of x rounded to the number of decimal places specified by the value d

SQRT(x) returns the square-root value of x
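Availability of these functions varies by engine; SQLite, for example, ships round() and abs() out of the box, but not necessarily all of the others. A sketch of the per-unit-price activity plus round(), using Python’s sqlite3 and one row from the sample data:

```python
# Sketch: arithmetic in the select list (price / quantity) plus the
# round() function, run against one items_ordered sample row.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table items_ordered (item text, quantity integer, price real)")
cur.execute("insert into items_ordered values ('Ski Poles', 2, 25.50)")

# Per-unit price is price divided by quantity; round() rounds the
# total price to the nearest whole value.
row = cur.execute("""select item, price / quantity, round(price)
                     from items_ordered""").fetchone()
```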


Points to Ponder

• SQL provides a rename operator for both relations and attributes.
• Aggregate functions are used to compute against a “returned column of numeric data” from your SELECT statement.
• The GROUP BY clause will gather all of the rows together that contain data in the specified column(s) and will allow aggregate functions to be performed on the one or more columns.
• The HAVING clause allows you to specify conditions on the rows for each group.
• ORDER BY is an optional clause which will allow you to display the results of your query in a sorted order.

Review Terms

• Rename operator
• Aggregate function
• Having
• Order-by
• Group-by


Student Notes


LESSON 15:

LAB


LESSON 16:

LAB


Lesson Objective

• Table joins• Tuple variable• String operators• Set operators• Views• Update• Delete

Table JoinsAll of the queries up until this point have been useful with theexception of one major limitation - that is, you’ve beenselecting from only one table at a time with your SELECTstatement. It is time to introduce you to one of the mostbeneficial features of SQL & relational database systems - the“Join”. To put it simply, the “Join” makes relational databasesystems “relational”.Joins allow you to link data from two or more tables togetherinto a single query result—from one single SELECT statement.A “Join” can be recognized in a SQL SELECT statement if ithas more than one table after the FROM keyword.

For example:

SELECT "list-of-columns"
FROM table1, table2
WHERE "search-condition(s)"

Joins are easiest to explain by demonstrating what would happen if you worked with one table only and did not have the ability to use joins. Such a single-table database is sometimes referred to as a "flat table". Say you have a one-table database that is used to keep track of all of your customers and what they purchase from your store. Every time a new row is inserted into the table, all columns must be filled in, resulting in redundant data. For example, every time Wolfgang Schultz purchases something, the following rows would be inserted into the table:

An ideal database would have two tables:
1. One for keeping track of your customers
2. And the other to keep track of what they purchase
"Customer_info" table:

customer_number firstname lastname address city state zip

“Purchases” table:

customer_number date item price

LESSON 17:

SQL-III

id     first     last     address         city   state   zip     date     item       price
10982  Wolfgang  Schultz  300 N. 1st Ave  Yuma   AZ      85002   032299   snowboard  45.00
10982  Wolfgang  Schultz  300 N. 1st Ave  Yuma   AZ      85002   091199   gloves     15.00
10982  Wolfgang  Schultz  300 N. 1st Ave  Yuma   AZ      85002   100999   lantern    35.00
10982  Wolfgang  Schultz  300 N. 1st Ave  Yuma   AZ      85002   022900   tent       85.00

Now, whenever a purchase is made by a repeat customer, only the second table, "Purchases", needs to be updated. We have just eliminated the redundant data; that is, we have just normalized this database. Notice how each of the tables has a common "customer_number" column. This column, which contains the unique customer number, will be used to JOIN the two tables. Using the two new tables, say you would like to select each customer's name and the items they have purchased. Here is an example of a join statement to accomplish this:

SELECT customer_info.firstname, customer_info.lastname, purchases.item
FROM customer_info, purchases
WHERE customer_info.customer_number = purchases.customer_number;

This particular join is known as an "inner join" or "equijoin", and it is the most common type of join you will see or use. Notice that each of the columns is preceded by its table name and a period. This is not always required, but it is good practice so that you will not confuse which columns go with which tables, and it is required when column names are the same in the two tables. It is recommended to precede all of your columns with the table names when using joins.

Note: The syntax described above will work with most database systems. If it does not work with yours, please check your database's documentation. Here is the ANSI SQL-92 syntax for an inner join equivalent to the preceding statement, which you might want to try:

SELECT customer_info.firstname, customer_info.lastname, purchases.item
FROM customer_info INNER JOIN purchases
ON customer_info.customer_number = purchases.customer_number;
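Both forms of the inner join described above can be verified against a live database. The following sketch uses Python's sqlite3 module with the lesson's customer_info/purchases design; the sample rows are invented, and the two queries are asserted to return the same result.

```python
import sqlite3

# The lesson's two-table design; the data values here are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_info (customer_number INTEGER, firstname TEXT, lastname TEXT);
    CREATE TABLE purchases (customer_number INTEGER, item TEXT, price REAL);
    INSERT INTO customer_info VALUES (10982, 'Wolfgang', 'Schultz');
    INSERT INTO purchases VALUES (10982, 'snowboard', 45.00), (10982, 'gloves', 15.00);
""")

# The WHERE-clause join from the lesson...
old_style = conn.execute("""
    SELECT customer_info.firstname, customer_info.lastname, purchases.item
    FROM customer_info, purchases
    WHERE customer_info.customer_number = purchases.customer_number
""").fetchall()

# ...and the equivalent ANSI SQL-92 INNER JOIN syntax.
ansi_style = conn.execute("""
    SELECT customer_info.firstname, customer_info.lastname, purchases.item
    FROM customer_info INNER JOIN purchases
    ON customer_info.customer_number = purchases.customer_number
""").fetchall()

# Same semantics, same rows (sorted, since row order is not guaranteed).
assert sorted(old_style) == sorted(ansi_style)
print(sorted(old_style))
# [('Wolfgang', 'Schultz', 'gloves'), ('Wolfgang', 'Schultz', 'snowboard')]
```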

Another Example

SELECT employee_info.employeeid, employee_info.lastname, employee_sales.comission
FROM employee_info, employee_sales
WHERE employee_info.employeeid = employee_sales.employeeid;

This statement selects the employeeid and lastname (from the employee_info table) and the comission value (from the employee_sales table) for all of the rows where the employeeid in the employee_info table matches the employeeid in the employee_sales table.


Students Activity

1. Write a query using a join to determine which items were ordered by each of the customers in the customers table. Select the customerid, firstname, lastname, order_date, item, and price for everything each customer purchased in the items_ordered table.

2. Repeat exercise 1, but display the results sorted by state in descending order.

Tuple Variables
The as clause is particularly useful in defining the notion of tuple variables, as is done in the tuple relational calculus. A tuple variable in SQL must be associated with a particular relation. Tuple variables are defined in the from clause by way of the as clause. To illustrate, we rewrite the query "For all customers who have a loan from the bank, find their names, loan numbers, and loan amounts" as

select customer-name, T.loan-number, S.amount
from borrower as T, loan as S
where T.loan-number = S.loan-number

Note that we define a tuple variable in the from clause by placing it after the name of the relation with which it is associated, with the keyword as in between (the keyword as is optional). When we write expressions of the form relation-name.attribute-name, the relation name is, in effect, an implicitly defined tuple variable.

Tuple variables are most useful for comparing two tuples in the same relation. Recall that, in such cases, we could use the rename operation in the relational algebra. Suppose that we want the query "Find the names of all branches that have assets greater than those of at least one branch located in Brooklyn." We can write the SQL expression

select distinct T.branch-name
from branch as T, branch as S
where T.assets > S.assets and S.branch-city = 'Brooklyn'

Observe that we could not use the notation branch.assets, since it would not be clear which reference to branch is intended.
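The self-comparison query above can be run directly. This sketch uses SQLite with underscores in place of the text's hyphenated attribute names (SQLite does not allow bare hyphens in identifiers), and the branch data is invented.

```python
import sqlite3

# Hypothetical branch data; attribute names use underscores instead of
# the text's hyphens, which SQLite would reject in bare identifiers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE branch (branch_name TEXT, branch_city TEXT, assets INTEGER)")
conn.executemany("INSERT INTO branch VALUES (?, ?, ?)",
                 [("Downtown", "Brooklyn", 900000),
                  ("Perryridge", "Horseneck", 1700000),
                  ("Brighton", "Brooklyn", 7100000)])

# T and S range over the same relation, so a bare branch.assets
# would be ambiguous; the tuple variables disambiguate.
rows = conn.execute("""
    SELECT DISTINCT T.branch_name
    FROM branch AS T, branch AS S
    WHERE T.assets > S.assets AND S.branch_city = 'Brooklyn'
""").fetchall()
print(sorted(rows))   # [('Brighton',), ('Perryridge',)]
```

Both Perryridge and Brighton have assets greater than Downtown's, the smallest Brooklyn branch, so both appear exactly once thanks to DISTINCT.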

String Operations
SQL specifies strings by enclosing them in single quotes, for example 'Perryridge', as we saw earlier. A single quote character that is part of a string can be specified by using two single quote characters; for example, the string "It's right" can be specified by 'It''s right'.

The most commonly used operation on strings is pattern matching using the operator like. We describe patterns by using two special characters:

• Percent (%): The % character matches any substring.
• Underscore (_): The _ character matches any single character.

Patterns are case sensitive; that is, uppercase characters do not match lowercase characters, or vice versa. To illustrate pattern matching, consider the following examples:

• 'Perry%' matches any string beginning with "Perry".
• '%idge%' matches any string containing "idge" as a substring, for example 'Perryridge', 'Rock Ridge', 'Mianus Bridge', and 'Ridgeway'.
• '___' matches any string of exactly three characters.
• '___%' matches any string of at least three characters.

SQL expresses patterns by using the like comparison operator. Consider the query "Find the names of all customers whose street address includes the substring 'Main'." This query can be written as

select customer-name
from customer
where customer-street like '%Main%'
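The '%Main%' query can be checked against a small sample table. One caveat worth knowing: SQLite's LIKE is case-insensitive for ASCII letters by default, unlike the case-sensitive behaviour the standard (and the text above) describes. The customer data here is invented.

```python
import sqlite3

# Invented customer data for the '%Main%' pattern-matching query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_name TEXT, customer_street TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [("Jones", "12 Main St"),
                  ("Smith", "3 North Ave"),
                  ("Hayes", "Main Rd 9")])

# % matches any substring, so 'Main' may appear anywhere in the street.
rows = conn.execute(
    "SELECT customer_name FROM customer WHERE customer_street LIKE '%Main%'"
).fetchall()
print(sorted(rows))   # [('Hayes',), ('Jones',)]
```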

Joined Relations
SQL provides not only the basic Cartesian-product mechanism for joining tuples of relations found in its earlier versions; it also provides various other mechanisms for joining relations, including condition joins and natural joins, as well as various forms of outer joins. These additional operations are typically used as subquery expressions in the from clause. The examples that follow use the loan and borrower relations shown below.

Examples

We illustrate the various join operations by using the relations loan and borrower. We start with a simple example of inner joins:

loan inner join borrower on loan.loan-number = borrower.loan-number

The expression computes the theta join of the loan and borrower relations, with the join condition being loan.loan-number = borrower.loan-number. The attributes of the result

customer-name   loan-number
Jones           L-170
Smith           L-230
Hayes           L-155

loan-number   branch-name   amount
L-170         Downtown      3000
L-230         Redwood       4000
L-260         Perryridge    1700


consist of the attributes of the left-hand-side relation followed by the attributes of the right-hand-side relation. Note that the attribute loan-number appears twice in the result: the first occurrence is from loan, and the second is from borrower. The SQL standard does not require attribute names in such results to be unique. An as clause should be used to assign unique names to attributes in query and subquery results. We rename the result relation of a join, and the attributes of the result relation, by using an as clause, as illustrated here:

loan inner join borrower on loan.loan-number = borrower.loan-number
as lb(loan-number, branch, amount, cust, cust-loan-num)

We rename the second occurrence of loan-number to cust-loan-num. The ordering of the attributes in the result of the join is important for the renaming. Next, we consider an example of the left outer join operation:

loan left outer join borrower on loan.loan-number = borrower.loan-number

loan-number   branch-name   amount   customer-name   loan-number
L-170         Downtown      3000     Jones           L-170
L-230         Redwood       4000     Smith           L-230

The result of loan inner join borrower on loan.loan-number = borrower.loan-number.

loan-number   branch-name   amount   customer-name   loan-number
L-170         Downtown      3000     Jones           L-170
L-230         Redwood       4000     Smith           L-230
L-260         Perryridge    1700     null            null

The result of loan left outer join borrower on loan.loan-number = borrower.loan-number.

We can compute the left outer join operation logically as follows. First, compute the result of the inner join as before. Then, for every tuple t in the left-hand-side relation loan that does not match any tuple in the right-hand-side relation borrower in the inner join, add a tuple r to the result of the join: the attributes of tuple r that are derived from the left-hand-side relation are filled in with the values from tuple t, and the remaining attributes of r are filled with null values. The tuples (L-170, Downtown, 3000) and (L-230, Redwood, 4000) join with tuples from borrower and appear in the result of the inner join, and hence in the result of the left outer join. On the other hand, the tuple (L-260, Perryridge, 1700) did not match any tuple from borrower in the inner join, and hence the tuple (L-260, Perryridge, 1700, null, null) is present in the result of the left outer join.

Finally, we consider an example of the natural join operation:

loan natural inner join borrower

This expression computes the natural join of the two relations. The only attribute name common to loan and borrower is loan-number. However, the attribute loan-number appears only once in the result of the natural join, whereas it appears twice in the result of the join with the on condition.
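The left outer join computation described above can be reproduced directly; SQL nulls surface in Python as None. This sketch builds the text's loan and borrower tables in SQLite and runs the same join.

```python
import sqlite3

# The loan/borrower tables from the text, with underscore identifiers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER);
    CREATE TABLE borrower (customer_name TEXT, loan_number TEXT);
    INSERT INTO loan VALUES ('L-170', 'Downtown', 3000),
                            ('L-230', 'Redwood', 4000),
                            ('L-260', 'Perryridge', 1700);
    INSERT INTO borrower VALUES ('Jones', 'L-170'),
                                ('Smith', 'L-230'),
                                ('Hayes', 'L-155');
""")

# L-260 has no matching borrower, so its borrower attribute is padded
# with null (None in Python), exactly as in the table above.
rows = conn.execute("""
    SELECT loan.loan_number, loan.branch_name, loan.amount, borrower.customer_name
    FROM loan LEFT OUTER JOIN borrower
    ON loan.loan_number = borrower.loan_number
    ORDER BY loan.loan_number
""").fetchall()
print(rows)
# [('L-170', 'Downtown', 3000, 'Jones'),
#  ('L-230', 'Redwood', 4000, 'Smith'),
#  ('L-260', 'Perryridge', 1700, None)]
```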

Join Types and Conditions
We saw examples of the join operations permitted in SQL. Join operations take two relations and return another relation as the result. Although outer-join expressions are typically used in the from clause, they can be used anywhere that a relation can be used. Each of the variants of the join operations in SQL consists of a join type and a join condition. The join condition defines which tuples in the two relations match and what attributes are present in the result of the join. The join type defines how tuples in each relation that do not match any tuple in the other relation are treated.

loan-number   branch-name   amount   customer-name
L-170         Downtown      3000     Jones
L-230         Redwood       4000     Smith

The result of loan natural inner join borrower.

Join types: inner join, left outer join, right outer join, full outer join.

Join Types and Join Conditions
The first join type is the inner join, and the other three are the outer joins. Of the three join conditions, we have seen the natural join and the on condition before; we discuss the using condition later in this section.

The use of a join condition is mandatory for outer joins, but is optional for inner joins (if it is omitted, a Cartesian product results). Syntactically, the keyword natural appears before the join type, as illustrated earlier, whereas the on and using conditions appear at the end of the join expression. The keywords inner and outer are optional, since the rest of the join type enables us to deduce whether the join is an inner join or an outer join.

The meaning of the join condition natural, in terms of which tuples from the two relations match, is straightforward. The ordering of the attributes in the result of a natural join is as follows. The join attributes (that is, the attributes common to both relations) appear first, in the order in which they appear in the left-hand-side relation. Next come all nonjoin attributes of the left-hand-side relation, and finally all nonjoin attributes of the right-hand-side relation.

The right outer join is symmetric to the left outer join. Tuples from the right-hand-side relation that do not match any tuple in the left-hand-side relation are padded with nulls and are added to the result of the right outer join. Here is an example of combining the natural join condition with the right outer join type:

loan natural right outer join borrower

The attributes of the result are defined by the join type, which is a natural join; hence, loan-number appears only once. The first two tuples in the result are from the inner natural join of loan and borrower. The tuple (Hayes, L-155) from the right-hand-side relation does not match any tuple from the left-hand-side relation loan in the natural inner join. Hence, the tuple (L-155, null, null, Hayes) appears in the join result.


The join condition using (A1, A2, ..., An) is similar to the natural join condition, except that the join attributes are the attributes A1, A2, ..., An, rather than all attributes that are common to both relations. The attributes A1, A2, ..., An must consist of only attributes that are common to both relations, and they appear only once in the result of the join.

The full outer join is a combination of the left and right outer-join types. After the operation computes the result of the inner join, it extends with nulls those tuples from

loan-number   branch-name   amount   customer-name
L-170         Downtown      3000     Jones
L-230         Redwood       4000     Smith
L-155         null          null     Hayes

The result of loan natural right outer join borrower.

the left-hand-side relation that did not match with any tuple from the right-hand-side relation, and adds them to the result. Similarly, it extends with nulls those tuples from the right-hand-side relation that did not match with any tuple from the left-hand-side relation and adds them to the result. For example:

loan full outer join borrower using (loan-number)

loan-number   branch-name   amount   customer-name
L-170         Downtown      3000     Jones
L-230         Redwood       4000     Smith
L-260         Perryridge    1700     null
L-155         null          null     Hayes

The result of loan full outer join borrower using(loan-number).

Points to Ponder

• Joins allow you to link data from two or more tables into a single query result, from one single SELECT statement.

• Tuple variables are defined in the from clause by way of the as clause.

• We describe patterns by using two special characters:
  • Percent (%): The % character matches any substring.
  • Underscore (_): The _ character matches any single character.

• The kinds of joins are as follows: inner join, left outer join, right outer join, and full outer join.

Review Terms

• Table joins
• Tuple variables
• String operators
• Set operators
• Views
• Update
• Delete


Student Notes


LESSON 18:

LAB


LESSON 19:

LAB


Lesson Objectives

• Set operators
• Union
• Intersect
• Except
• Views
• Update
• Delete

Set Operations
The SQL operations union, intersect, and except operate on relations and correspond to the relational-algebra operations ∪, ∩, and −. Like union, intersection, and set difference in relational algebra, the relations participating in the operations must be compatible; that is, they must have the same set of attributes. We shall construct queries involving two sets: the set of all customers who have an account at the bank, which can be derived by

select customer-name
from depositor

and the set of customers who have a loan at the bank, whichcan be derived by

select customer-name
from borrower

We shall refer to the relations obtained as the results of the preceding queries as d and b, respectively.

The Union Operation
To find all customers having a loan, an account, or both at the bank, we write

(select customer-name
from depositor)
union
(select customer-name
from borrower)

The union operation automatically eliminates duplicates, unlike the select clause. Thus, in the preceding query, if a customer (say, Jones) has several accounts or loans (or both) at the bank, then Jones will appear only once in the result. If we want to retain all duplicates, we must write union all in place of union:

(select customer-name
from depositor)
union all
(select customer-name
from borrower)

The number of duplicate tuples in the result is equal to the total number of duplicates that appear in both d and b. Thus, if Jones has three accounts and two loans at the bank, then there will be five tuples with the name Jones in the result.

The Intersect Operation
To find all customers who have both a loan and an account at the bank, we write

(select distinct customer-name
from depositor)
intersect
(select distinct customer-name
from borrower)

The intersect operation automatically eliminates duplicates. Thus, in the preceding query, if a customer (say, Jones) has several accounts and loans at the bank, then Jones will appear only once in the result. If we want to retain all duplicates, we must write intersect all in place of intersect:

(select customer-name
from depositor)
intersect all
(select customer-name
from borrower)

The number of duplicate tuples that appear in the result is equal to the minimum number of duplicates in both d and b. Thus, if Jones has three accounts and two loans at the bank, then there will be two tuples with the name Jones in the result.

The Except Operation
To find all customers who have an account but no loan at the bank, we write

(select distinct customer-name
from depositor)
except
(select customer-name
from borrower)

The except operation automatically eliminates duplicates. Thus, in the preceding query, a tuple with customer name Jones will appear (exactly once) in the result only if Jones has an account at the bank but no loan at the bank. If we want to retain all duplicates, we must write except all in place of except:

(select customer-name
from depositor)
except all
(select customer-name
from borrower)

The number of duplicate copies of a tuple in the result is equal to the number of duplicate copies of the tuple in d minus the number of duplicate copies of the tuple in b, provided that the difference is positive. Thus, if Jones has three accounts and one loan at the bank, then there will be two tuples with the name Jones in the result. If, instead, this customer has two accounts and three loans at the bank, there will be no tuple with the name Jones in the result.
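The duplicate-counting rules above can be checked concretely. This sketch gives Jones three deposits and two loans, as in the text's example, then runs the set operations in SQLite. One caveat: SQLite supports union, union all, intersect, and except, but not the intersect all and except all variants, so only the former are demonstrated here.

```python
import sqlite3

# Invented data: Jones has three accounts (d) and two loans (b).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE depositor (customer_name TEXT);
    CREATE TABLE borrower (customer_name TEXT);
    INSERT INTO depositor VALUES ('Jones'), ('Jones'), ('Jones'), ('Smith');
    INSERT INTO borrower VALUES ('Jones'), ('Jones'), ('Hayes');
""")

def q(sql):
    return conn.execute(sql).fetchall()

union = q("SELECT customer_name FROM depositor UNION "
          "SELECT customer_name FROM borrower")
union_all = q("SELECT customer_name FROM depositor UNION ALL "
              "SELECT customer_name FROM borrower")
intersect = q("SELECT customer_name FROM depositor INTERSECT "
              "SELECT customer_name FROM borrower")
except_ = q("SELECT customer_name FROM depositor EXCEPT "
            "SELECT customer_name FROM borrower")

# UNION keeps one Jones; UNION ALL keeps all 3 + 2 = 5 copies,
# exactly as the text predicts.
print(sorted(union))                                  # one row per distinct name
print([r[0] for r in union_all].count('Jones'))       # 5
print(intersect, except_)
```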

LESSON 20:

SQL-IV


Views
We define a view in SQL by using the create view command. To define a view, we must give the view a name and must state the query that computes the view. The form of the create view command is

create view v as <query expression>

where <query expression> is any legal query expression. The view name is represented by v. As an example, consider the view consisting of branch names and the names of customers who have either an account or a loan at that branch. Assume that we want this view to be called all-customer. We define this view as follows:

create view all-customer as
(select branch-name, customer-name
from depositor, account
where depositor.account-number = account.account-number)
union
(select branch-name, customer-name
from borrower, loan
where borrower.loan-number = loan.loan-number)

The attribute names of a view can be specified explicitly as follows:

create view branch-total-loan(branch-name, total-loan) as
select branch-name, sum(amount)
from loan
group by branch-name

The preceding view gives, for each branch, the sum of the amounts of all the loans at the branch. Since the expression sum(amount) does not have a name, the attribute name is specified explicitly in the view definition. View names may appear in any place that a relation name may appear. Using the view all-customer, we can find all customers of the Perryridge branch by writing

select customer-name
from all-customer
where branch-name = 'Perryridge'
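The branch-total-loan view, including its explicit attribute list, can be tried out directly. This is a minimal SQLite sketch with invented loan rows; note that after the create view, the view name is queried exactly like a table name.

```python
import sqlite3

# Invented loan data; two Redwood loans so the GROUP BY has work to do.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER);
    INSERT INTO loan VALUES ('L-170', 'Downtown', 3000),
                            ('L-230', 'Redwood', 4000),
                            ('L-260', 'Redwood', 1700);

    -- Explicit attribute names, since SUM(amount) has no name of its own.
    CREATE VIEW branch_total_loan (branch_name, total_loan) AS
        SELECT branch_name, SUM(amount) FROM loan GROUP BY branch_name;
""")

# A view name may appear wherever a relation name may appear.
rows = conn.execute(
    "SELECT * FROM branch_total_loan ORDER BY branch_name"
).fetchall()
print(rows)   # [('Downtown', 3000), ('Redwood', 5700)]
```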

Modification of the Database

Update
The update statement is used to update or change records that match specified criteria. This is accomplished by carefully constructing a where clause.

update "tablename"
set "columnname" = "newvalue"
[, "nextcolumn" = "newvalue2" ...]
where "columnname" OPERATOR "value"
[and|or "column" OPERATOR "value"];

[ ] = optional

Examples

update phone_book
set area_code = 623
where prefix = 979;

update phone_book
set last_name = 'Smith', prefix = 555, suffix = 9292
where last_name = 'Jones';

update employee
set age = age + 1
where first_name = 'Mary' and last_name = 'Williams';
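The first of these updates can be run against a small sample table to confirm that only rows matching the where clause change. The phone_book data here is invented.

```python
import sqlite3

# Invented phone_book rows; only Jones has prefix 979.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phone_book (last_name TEXT, area_code INTEGER, prefix INTEGER)")
conn.executemany("INSERT INTO phone_book VALUES (?, ?, ?)",
                 [("Jones", 520, 979), ("May", 520, 555)])

# Only the row whose prefix matches the WHERE clause is changed.
conn.execute("UPDATE phone_book SET area_code = 623 WHERE prefix = 979")

rows = conn.execute(
    "SELECT last_name, area_code FROM phone_book ORDER BY last_name"
).fetchall()
print(rows)   # [('Jones', 623), ('May', 520)]
```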

Students Activity

1. Jonie Weber just got married to Bob Williams. She hasrequested that her last name be updated to Weber-Williams.

2. Dirk Smith’s birthday is today, add 1 to his age.

3. All secretaries are now called “Administrative Assistant”.Update all titles accordingly.

4. Everyone that’s making under 30000 are to receive a 3500 ayear raise.


5. Everyone that’s making over 33500 are to receive a 4500 ayear raise.

6. All “Programmer II” titles are now promoted to“Programmer III”.

7. All “Programmer” titles are now promoted to“Programmer II”.

Deleting Records
The delete statement is used to delete records or rows from a table.

delete from "tablename"
where "columnname" OPERATOR "value"
[and|or "column" OPERATOR "value"];

[ ] = optional

Examples

delete from employee;

Note: if you leave off the where clause, all records will be deleted!

delete from employee
where lastname = 'May';

delete from employee
where firstname = 'Mike' or firstname = 'Eric';

To delete an entire record/row from a table, enter "delete from" followed by the table name, followed by the where clause, which contains the conditions for deletion. If you leave off the where clause, all records will be deleted.
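Both behaviours of delete, filtered and unfiltered, can be demonstrated in one short sketch. The employee rows are invented; note how the final delete without a where clause empties the table entirely.

```python
import sqlite3

# Invented employee rows for the delete examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (firstname TEXT, lastname TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [("Mike", "May"), ("Eric", "Stone"), ("Jonie", "Weber")])

# A WHERE clause removes only the matching rows.
conn.execute("DELETE FROM employee WHERE firstname = 'Mike' OR firstname = 'Eric'")
after_where = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(after_where)   # 1

# No WHERE clause: every remaining row is deleted.
conn.execute("DELETE FROM employee")
after_all = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(after_all)     # 0
```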

Students Activity

1. Jonie Weber-Williams just quit, remove her record from thetable.

2. It’s time for budget cuts. Remove all employees who aremaking over 70000 dollars.

Points to Ponder

• The SQL operations union, intersect, and except operate onrelations

• The union operation automatically eliminates duplicates, unlike the select clause.
• The intersect operation automatically eliminates duplicates.
• To define a view, we must give the view a name and must state the query that computes the view.
• The update statement is used to update or change records that match specified criteria.
• The delete statement is used to delete records or rows from the table.

Review Terms

• Set operators
• Union
• Intersect
• Except
• View
• Update
• Delete

Students Activity


LESSON 21:

LAB


LESSON 22:

LAB


Lesson Objective

• Data constraints
• Column level constraints
• Table level constraints
• NULL values
• Primary key
• Unique key
• Default value
• Foreign key
• NOT NULL constraints
• Check constraints

Data Constraints
Integrity constraints ensure that changes made to the database by authorized users do not result in a loss of data consistency. Thus, integrity constraints guard against accidental damage to the database. Besides the cell name, cell length, and cell data type, there are other parameters, i.e. other data constraints, that can be passed to the DBA at cell creation time. These data constraints are attached to a cell by the DBA as flags. Whenever a user attempts to load a cell with data, the DBA checks the data being loaded into the cell against the data constraints defined at the time the cell was created. If the data being loaded fails any of the data constraint checks fired by the DBA, the DBA will not load the data into the cell, will reject the entered record, and will flash an error message to the user. These constraints are given a constraint name, and the DBA stores each constraint with its name and instructions internally, along with the cell itself. A constraint can be placed either at the column level or at the table level.

Column Level Constraints
If a constraint is defined along with the column definition, it is called a column level constraint. A column level constraint applies to only one column at a time, i.e. it is local to a specific column. If a constraint spans multiple columns, the user will have to use table level constraints.

Table Level Constraints
If the data constraint attached to a specific cell in a table references the contents of another cell in the table, then the user will have to use table level constraints. Table level constraints are stored as part of the global table definition. Examples of the different constraints that can be applied to a table are as follows:

Null Value Concepts
While creating tables, if a row lacks a data value for a particular column, that value is said to be null. Columns of any data type may contain null values unless the column was defined as not null when the table was created.

Principles of Null values

• Setting a null value is appropriate when the actual value is unknown, or when a value would not be meaningful.
• A null value is not equivalent to a value of zero.
• A null value will evaluate to null in any expression; e.g. null multiplied by 10 is null.
• When a column is defined as not null, that column becomes a mandatory column. It implies that the user is forced to enter data into that column.

Example: Create table client_master with a not null constraint on columns client_no, name, address1, and address2.

NOT NULL as a column constraint:

CREATE TABLE client_master
(client_no varchar2(6) NOT NULL,
name varchar2(20) NOT NULL,
address1 varchar2(30) NOT NULL,
address2 varchar2(30) NOT NULL,
city varchar2(15), state varchar2(15), pincode number(6),
remarks varchar2(60), bal_due number(10,2));
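The effect of a NOT NULL constraint can be observed directly: inserting a null into a mandatory column is rejected. This sketch uses SQLite, whose type names differ from the varchar2/number syntax above, so the column types are simplified; the data is invented.

```python
import sqlite3

# Simplified client_master with SQLite types; two mandatory columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE client_master (client_no TEXT NOT NULL, name TEXT NOT NULL)")

conn.execute("INSERT INTO client_master VALUES ('C00001', 'Ivan')")  # accepted

# A NULL in a NOT NULL column is rejected with an integrity error.
try:
    conn.execute("INSERT INTO client_master VALUES (NULL, 'Anonymous')")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True

count = conn.execute("SELECT COUNT(*) FROM client_master").fetchone()[0]
print(null_rejected, count)   # True 1
```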

Primary Key Concepts
A primary key is one or more columns in a table used to uniquely identify each row in the table. Primary key values must not be null and must be unique across the column. A multicolumn primary key is called a composite primary key. The only function of a primary key is to uniquely identify a row; thus, if one column is used, it is just as good as if multiple columns are used. Multiple columns (i.e. composite keys) are used only when the system design requires a primary key that cannot be contained in a single column.

Examples

Primary Key as a column constraint:
Create client_master where client_no is the primary key.

CREATE TABLE client_master
(client_no varchar2(6) PRIMARY KEY,
name varchar2(20), address1 varchar2(30), address2 varchar2(30),
city varchar2(15), state varchar2(15), pincode number(6),
remarks varchar2(60), bal_due number(10,2));

Primary Key as a table constraint: Create a sales_order_details table where

LESSON 23:

INTEGRITY AND SECURITY


Column Name    Data Type   Size   Attributes
s_order_no     varchar2    6      Primary Key
product_no     varchar2    6      Primary Key
qty_ordered    number      8
qty_disp       number      8
product_rate   number      8,2

CREATE TABLE sales_order_details
(s_order_no varchar2(6), product_no varchar2(6),
qty_ordered number(8), qty_disp number(8),
product_rate number(8,2),
PRIMARY KEY (s_order_no, product_no));

Unique Key Concepts
A unique key is similar to a primary key, except that the purpose of a unique key is to ensure that the information in the column for each record is unique, as with telephone or driver's license numbers. A table may have many unique keys.

Example: Create table client_master with a unique constraint on column client_no.

UNIQUE as a column constraint:

CREATE TABLE client_master
(client_no varchar2(6) CONSTRAINT cnmn_ukey UNIQUE,
name varchar2(20), address1 varchar2(30), address2 varchar2(30),
city varchar2(15), state varchar2(15), pincode number(6),
remarks varchar2(60), bal_due number(10,2), partpay_yn char(1));

UNIQUE as a table constraint:

CREATE TABLE client_master
(client_no varchar2(6), name varchar2(20),
address1 varchar2(30), address2 varchar2(30),
city varchar2(15), state varchar2(15), pincode number(6),
remarks varchar2(60), bal_due number(10,2),
CONSTRAINT cnmn_ukey UNIQUE (client_no));
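Both single-column and composite key enforcement can be demonstrated together. This sketch uses SQLite with simplified column types and invented data; a duplicate client_no is rejected, while a composite key accepts rows as long as the column pair differs.

```python
import sqlite3

# Simplified client_master: client_no must be unique (here via PRIMARY KEY).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE client_master (client_no TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO client_master VALUES ('C00001', 'Ivan')")

try:
    conn.execute("INSERT INTO client_master VALUES ('C00001', 'Duplicate')")
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False
print(duplicate_accepted)   # False

# Composite primary key: the pair of columns together must be unique.
conn.execute("""CREATE TABLE sales_order_details (
    s_order_no TEXT, product_no TEXT,
    PRIMARY KEY (s_order_no, product_no))""")
conn.execute("INSERT INTO sales_order_details VALUES ('O1', 'P1')")
conn.execute("INSERT INTO sales_order_details VALUES ('O1', 'P2')")  # OK: pair differs
detail_count = conn.execute("SELECT COUNT(*) FROM sales_order_details").fetchone()[0]
print(detail_count)   # 2
```

One SQLite quirk worth noting: unlike standard SQL, SQLite permits NULLs in a non-INTEGER PRIMARY KEY column, so a UNIQUE NOT NULL combination is the stricter choice there.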

Default Value Concepts
At the time of cell creation, a 'default value' can be assigned to a cell. When the user loads a record with values and leaves this cell empty, the DBA will automatically load the cell with the default value specified. The data type of the default value should match the data type of the column. You can use the default clause to specify any default value you want.

Create the sales_order table where:

CREATE TABLE sales_order
(s_order_no varchar2(6) PRIMARY KEY,
s_order_date date, client_no varchar2(6),
dely_addr varchar2(25), salesman_no varchar2(6),
dely_type char(1) DEFAULT 'F',
billed_yn char(1), dely_date date,
order_status varchar2(10));
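The DEFAULT 'F' clause can be seen in action: an insert that omits dely_type gets the default, while an explicit value overrides it. This SQLite sketch simplifies the column types and invents the order numbers.

```python
import sqlite3

# Trimmed-down sales_order: just the key and the defaulted column.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE sales_order (
    s_order_no TEXT PRIMARY KEY,
    dely_type  TEXT DEFAULT 'F')""")

# O1 omits dely_type, so the default 'F' is filled in.
conn.execute("INSERT INTO sales_order (s_order_no) VALUES ('O1')")
# O2 supplies an explicit value, which overrides the default.
conn.execute("INSERT INTO sales_order VALUES ('O2', 'P')")

rows = conn.execute("SELECT * FROM sales_order ORDER BY s_order_no").fetchall()
print(rows)   # [('O1', 'F'), ('O2', 'P')]
```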

Foreign Key Concepts
Foreign keys represent relationships between tables. A foreign key is a column (or a group of columns) whose values are derived from the primary key of the same or some other table. The existence of a foreign key implies that the table with the foreign key is related to the primary key table from which the foreign key is derived. A foreign key must have a corresponding primary key value in the primary key table to have a meaning. For example, the s_order_no column is the primary key of table sales_order. In table sales_order_details, s_order_no is a foreign key that references the s_order_no values in table sales_order.

The Foreign Key References Constraint
• rejects an INSERT or UPDATE of a value if a corresponding value does not currently exist in the primary key table;
• rejects a DELETE if it would invalidate a REFERENCES constraint;
• must reference a PRIMARY KEY or UNIQUE column(s) in the primary key table;
• will reference the PRIMARY KEY of the primary key table if no column or group of columns is specified in the constraint;
• must reference a table, not a view or cluster;
• requires that you own the primary key table, have REFERENCES privilege on it, or have column-level REFERENCES privilege on the referenced columns in the primary key table;
• doesn't restrict how other constraints may reference the same tables;
• requires that the FOREIGN KEY column(s) and the referenced column(s) have matching data types;
• may reference the same table named in the CREATE TABLE statement;
• must not reference the same column more than once (in a single constraint).
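The first two rules in the list above (rejecting an orphan INSERT and a DELETE that would invalidate a REFERENCES constraint) can be sketched with SQLite through Python's sqlite3 module; note that SQLite, unlike Oracle, enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK checks
conn.execute("CREATE TABLE sales_order (s_order_no TEXT PRIMARY KEY)")
conn.execute("""
    CREATE TABLE sales_order_details (
        s_order_no TEXT REFERENCES sales_order,  -- references the parent's PK
        product_no TEXT
    )
""")
conn.execute("INSERT INTO sales_order VALUES ('O1')")
conn.execute("INSERT INTO sales_order_details VALUES ('O1', 'P1')")  # parent exists

insert_rejected = delete_rejected = False
try:
    # No 'O9' row in sales_order: the INSERT is rejected.
    conn.execute("INSERT INTO sales_order_details VALUES ('O9', 'P2')")
except sqlite3.IntegrityError:
    insert_rejected = True
try:
    # Deleting the parent would orphan the detail row: rejected.
    conn.execute("DELETE FROM sales_order WHERE s_order_no = 'O1'")
except sqlite3.IntegrityError:
    delete_rejected = True
```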

Example: Create table sales_order_details with primary key as s_order_no and product_no, and foreign key as s_order_no referencing column s_order_no in the sales_order table.

Foreign Key as a Column Constraint

CREATE TABLE sales_order_details
(s_order_no varchar2(6) REFERENCES sales_order,
product_no varchar2(6),
qty_ordered number(8), qty_disp number(8),
product_rate number(8,2),
PRIMARY KEY (s_order_no, product_no));

Column Name    Data Type   Size   Attributes
s_order_no     varchar2    6      Primary Key
s_order_date   date
client_no      varchar2    6
dely_addr      varchar2    25
salesman_no    varchar2    6
dely_type      char        1      Delivery: part (P) / full (F); Default 'F'
billed_yn      char        1
dely_date      date
order_status   varchar2    10


FOREIGN KEY as a Table Constraint

Create Table sales_order_details
(s_order_no varchar2(6), product_no varchar2(6),
qty_ordered number(8), qty_disp number(8),
product_rate number(8,2),
Primary Key (s_order_no, product_no),
Foreign Key (s_order_no) References sales_order);

Check Integrity Constraints

Use the CHECK constraint when you need to enforce integrity rules that can be evaluated based on a logical expression. Never use CHECK constraints if the constraint can be defined using the NOT NULL, PRIMARY KEY or FOREIGN KEY constraint. Following are a few examples of appropriate CHECK constraints:
• a CHECK constraint on the client_no column of client_master so that every client_no value starts with 'C';
• a CHECK constraint on the name column of client_master so that the name is entered in upper case;
• a CHECK constraint on the city column of client_master so that only the cities "Bombay", "New Delhi", "Madras" and "Calcutta" are allowed.
Create Table client_master
(client_no varchar2(6) Constraint ck_clientno
Check (client_no like 'C%'),
name varchar2(20) Constraint ck_cname
Check (name = upper(name)),
address1 varchar2(30), address2 varchar2(30),
city varchar2(15) Constraint ck_city
Check (city In ('NewDelhi', 'Bombay', 'Calcutta', 'Madras')),
state varchar2(15), pincode number(6),
remarks varchar2(60), bal_due number(10,2));
When using CHECK constraints, consider the ANSI/ISO standard, which states that a CHECK constraint is violated only if the condition evaluates to FALSE; TRUE and unknown values do not violate a check condition. Therefore, make sure that a CHECK constraint you define actually enforces the rule you need to enforce.
For example, consider the following CHECK constraint for the emp table:
CHECK (sal > 0 or comm >= 0)
At first glance, this rule may be interpreted as "do not allow a row in the emp table unless the employee's salary is greater than 0 or the employee's commission is greater than or equal to 0". However, note that if a row is inserted with a null salary and a negative commission, the row does not violate the CHECK constraint, because the entire check condition is evaluated as unknown. In this particular case, you can account for such violations by placing a NOT NULL integrity constraint on both the sal and comm columns.
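The FALSE-versus-unknown point above is easy to reproduce; SQLite (driven here through Python's sqlite3, as a stand-in for Oracle) applies the same ANSI rule, so a NULL salary slips past the check while a plainly false condition is rejected:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE emp (
        sal  NUMERIC,
        comm NUMERIC,
        CHECK (sal > 0 OR comm >= 0)
    )
""")
false_rejected = False
try:
    # 0 > 0 is false and -5 >= 0 is false: the condition is FALSE -> rejected.
    conn.execute("INSERT INTO emp VALUES (0, -5)")
except sqlite3.IntegrityError:
    false_rejected = True

# NULL > 0 is unknown, so the whole condition is unknown -> the row is accepted,
# even though the commission is negative.
conn.execute("INSERT INTO emp VALUES (NULL, -5)")
rows = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
```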

Check with not Null Integrity Constraints

According to the ANSI/ISO standard, a NOT NULL integrity constraint is an example of a CHECK integrity constraint, where the condition is CHECK (column_name IS NOT NULL). Therefore, the NOT NULL integrity constraint for a single column can, in practice, be written in two forms: by using the NOT NULL constraint or by using a CHECK constraint. For ease of use, you should always choose to define the NOT NULL integrity constraint instead of a CHECK constraint with the IS NOT NULL condition.
Here we shall look at the method by which data constraints can be attached to a column so that data validation can be done at table level itself using the power of the database engine.
A constraint clause restricts the range of valid values for one column (a column constraint) or for a group of columns (a table constraint). Any INSERT, UPDATE or DELETE statement evaluates a relevant constraint; the constraint must be satisfied for the statement to succeed.
Constraints can be connected to a table by the CREATE TABLE or ALTER TABLE command. Use ALTER TABLE to add or drop a constraint from a table. Constraints are recorded in the data dictionary. If you don't name a constraint, it is assigned the name SYS_Cn, where n is an integer that makes the name unique in the database.

Restrictions on Check Constraints

A CHECK integrity constraint requires that a condition be true or unknown for every row of the table. If a statement causes the condition to evaluate to false, the statement is rolled back. The condition of a CHECK constraint has the following limitations:
• The condition must be a Boolean expression that can be evaluated using the values in the row being inserted or updated.
• The condition cannot contain subqueries or sequences.
• The condition cannot include the SYSDATE, UID, USER or USERENV SQL functions.

Defining Different Constraints on the Table

Create a sales_order_details table where:

Create Table sales_order_details
(s_order_no varchar2(6) Constraint order_fkey
References sales_order,
product_no varchar2(6) Constraint product_fkey
References product_master,
qty_ordered number(8) Not Null,

Column Name    Data Type   Size   Attributes
s_order_no     varchar2    6      Primary Key; Foreign Key references s_order_no of the sales_order table
product_no     varchar2    6      Primary Key; Foreign Key references product_no of the product_master table
qty_ordered    number      8      Not Null
qty_disp       number      8
product_rate   number      8,2    Not Null


qty_disp number(8),
product_rate number(8,2) Not Null,
Primary Key (s_order_no, product_no));
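The composite primary key declared above can be exercised with a short SQLite sketch (via Python's sqlite3, standing in for Oracle): the same s_order_no may repeat with different products, but a repeated (s_order_no, product_no) pair is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales_order_details (
        s_order_no  TEXT,
        product_no  TEXT,
        qty_ordered INTEGER NOT NULL,
        PRIMARY KEY (s_order_no, product_no)
    )
""")
conn.execute("INSERT INTO sales_order_details VALUES ('O1', 'P1', 5)")
conn.execute("INSERT INTO sales_order_details VALUES ('O1', 'P2', 3)")  # new pair: fine

pair_rejected = False
try:
    # ('O1', 'P1') already exists: the composite key rejects the row.
    conn.execute("INSERT INTO sales_order_details VALUES ('O1', 'P1', 9)")
except sqlite3.IntegrityError:
    pair_rejected = True
```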

Defining Integrity Constraints in the Alter Table Command

You can also define integrity constraints using the constraint clause in the Alter Table command. The following examples show the definitions of several integrity constraints:
1. Add a Primary Key constraint on column supplier_no in table supplier_master:
Alter Table supplier_master
Add Primary Key (supplier_no);
2. Add a Foreign Key constraint on column s_order_no in table sales_order_details referencing table sales_order; modify column qty_ordered to include a Not Null constraint:
Alter Table sales_order_details
Add Constraint order_fkey
Foreign Key (s_order_no) References sales_order
Modify (qty_ordered number(8) Not Null);

Dropping Integrity Constraints in the Alter Table Command

You can drop an integrity constraint if the rule that it enforces is no longer true or if the constraint is no longer needed. Drop the constraint using the Alter Table command with the Drop clause. The following examples illustrate the dropping of integrity constraints:
1. Drop the Primary Key constraint from supplier_master:
Alter Table supplier_master
Drop Primary Key;
2. Drop the Foreign Key constraint on column product_no in table sales_order_details:
Alter Table sales_order_details
Drop Constraint product_fkey;

Note: Dropping Unique and Primary Key constraints drops the associated indexes.

Points to Ponder

• Integrity constraints ensure that changes made to the database by authorized users do not result in a loss of data consistency.
• If a constraint is defined along with the column definition, it is called a column-level constraint.
• If the data constraint attached to a specific column in a table references the contents of another column in the table, then the user will have to use table-level constraints.
• While creating tables, if a row lacks a data value for a particular column, that value is said to be null.
• A primary key is one or more columns in a table used to uniquely identify each row.
• A unique key is similar to a primary key, except that the purpose of a unique key is to ensure that information in the column for each record is unique.
• When the user is loading a record with values and leaves a column empty, the database engine will automatically load that column with the default value specified.
• Foreign keys represent relationships between tables. A foreign key is a column (or a group of columns) whose values are derived from the primary key of the same or some other table.
• Use the CHECK constraint when you need to enforce integrity rules that can be evaluated based on a logical expression.
• PL/SQL is Oracle's procedural language extension to SQL. PL/SQL enables you to mix SQL statements with procedural constructs.
• Procedures, functions, and packages are all examples of PL/SQL program units.
• Database triggers are procedures that are stored in the database and are implicitly executed (fired) when the contents of a table are changed.

Review Terms

• Integrity constraints
• Table-level constraints
• Column-level constraints
• Check constraints
• NULL constraints
• Primary key
• Foreign key
• Unique key

Students Activity

1. What is a constraint? How many kinds of constraints are there?
2. Define primary key, unique key and foreign key.
3. Define check integrity constraints.


4. Differentiate between table-level and column-level constraints.
5. Define the Not Null constraint with the help of an example.


Student Notes


LESSON 24:

LAB


LESSON 25:

LAB


Lesson Objectives

• Difference between SQL and PL/SQL
• Stored procedures
• Functions
• Packages
PL/SQL is Oracle's procedural language extension to SQL. PL/SQL enables you to mix SQL statements with procedural constructs. With PL/SQL, you can define and execute PL/SQL program units such as procedures, functions, and packages. PL/SQL program units generally are categorized as anonymous blocks and stored procedures.
An anonymous block is a PL/SQL block that appears within your application; it is not named or stored in the database. In many applications, PL/SQL blocks can appear wherever SQL statements can appear.
A stored procedure is a PL/SQL block that Oracle stores in the database and that can be called by name from an application. When you create a stored procedure, Oracle parses the procedure and stores its parsed representation in the database. Oracle also allows you to create and store functions (which are similar to procedures) and packages (which are groups of procedures and functions).

An Introduction to Stored Procedures and Packages
Oracle allows you to access and manipulate database information using procedural schema objects called PL/SQL program units. Procedures, functions, and packages are all examples of PL/SQL program units.
PL/SQL is Oracle's procedural language extension to SQL. It extends SQL with flow control and other statements that make it possible to write complex programs in it. The PL/SQL engine is the tool you use to define, compile, and execute PL/SQL program units. This engine is a special component of many Oracle products, including Oracle Server.
While many Oracle products have PL/SQL components, this chapter specifically covers the procedures and packages that can be stored in an Oracle database and processed using the Oracle Server PL/SQL engine. The PL/SQL capabilities of each Oracle tool are described in the appropriate tool's documentation.

Stored Procedures and Functions
Procedures and functions are schema objects that logically group a set of SQL and other PL/SQL programming language statements together to perform a specific task. Procedures and functions are created in a user's schema and stored in a database for continued use. You can execute a procedure or function interactively using an Oracle tool, such as SQL*Plus, or call it explicitly in the code of a database application, such as an Oracle Forms or Precompiler application, or in the code of another procedure or trigger.

Packages
A package is a group of related procedures and functions, together with the cursors and variables they use, stored together in the database for continued use as a unit. Similar to standalone procedures and functions, packaged procedures and functions can be called explicitly by applications or users.

Procedures and Functions
A procedure or function is a schema object that consists of a set of SQL statements and other PL/SQL constructs, grouped together, stored in the database, and executed as a unit to solve a specific problem or perform a set of related tasks. Procedures and functions permit the caller to provide parameters that can be input only, output only, or input and output values. Procedures and functions allow you to combine the ease and flexibility of SQL with the procedural functionality of a structured programming language.
For example, the following statement creates the Credit_Account procedure, which credits money to a bank account:
Create Procedure credit_account
(acct Number, credit Number) AS
/* This procedure accepts two arguments: an account
number and an amount of money to credit to the specified
account. If the specified account does not exist, a
new account is created. */
old_balance Number;
new_balance Number;
Begin
Select balance Into old_balance From accounts
Where acct_id = acct
For Update Of balance;
new_balance := old_balance + credit;
Update accounts Set balance = new_balance
Where acct_id = acct;
Commit;
Exception
When No_Data_Found Then
Insert Into accounts (acct_id, balance)
Values (acct, credit);
When Others Then
Rollback;
End credit_account;
Notice that the Credit_Account procedure includes both SQL and PL/SQL statements.
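The control flow of Credit_Account (select the balance, update it, and fall back to creating the account when no row is found) can be re-sketched in client code. The following is a hypothetical Python/sqlite3 rendering of that same logic, not the stored procedure itself, which in Oracle runs inside the server:

```python
import sqlite3

def credit_account(conn, acct, credit):
    """Hypothetical client-side rendering of the Credit_Account logic."""
    row = conn.execute(
        "SELECT balance FROM accounts WHERE acct_id = ?", (acct,)
    ).fetchone()
    if row is None:
        # Mirrors the WHEN NO_DATA_FOUND handler: create the account.
        conn.execute("INSERT INTO accounts (acct_id, balance) VALUES (?, ?)",
                     (acct, credit))
    else:
        conn.execute("UPDATE accounts SET balance = ? WHERE acct_id = ?",
                     (row[0] + credit, acct))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (acct_id INTEGER PRIMARY KEY, balance NUMERIC)")
credit_account(conn, 7715, 100)   # account absent: created with balance 100
credit_account(conn, 7715, 50)    # account present: balance becomes 150
balance = conn.execute(
    "SELECT balance FROM accounts WHERE acct_id = 7715"
).fetchone()[0]
```

The point of the stored-procedure version is precisely that this logic, including the row lock taken by FOR UPDATE OF, executes inside the database rather than in every client.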

LESSON 26:

PL/SQL


Procedure Guidelines
Use the following guidelines to design and use all stored procedures:
• Define procedures to complete a single, focused task. Do not define long procedures with several distinct subtasks, because subtasks common to many procedures might be duplicated unnecessarily in the code of several procedures.
• Do not define procedures that duplicate the functionality already provided by other features of Oracle. For example, do not define procedures to enforce simple data integrity rules that you could easily enforce using declarative integrity constraints.

Benefits of Procedures
Procedures provide advantages in the following areas.

Security
Stored procedures can help enforce data security. You can restrict the database operations that users can perform by allowing them to access data only through procedures and functions. For example, you can grant users access to a procedure that updates a table, but not grant them access to the table itself. When a user invokes the procedure, the procedure executes with the privileges of the procedure's owner. Users who have only the privilege to execute the procedure (but not the privileges to query, update, or delete from the underlying tables) can invoke the procedure, but they cannot manipulate table data in any other way.

Performance
Stored procedures can improve database performance in several ways:
• The amount of information that must be sent over a network is small compared with issuing individual SQL statements or sending the text of an entire PL/SQL block to Oracle, because the information is sent only once and thereafter invoked when it is used.
• A procedure's compiled form is readily available in the database, so no compilation is required at execution time.
• If the procedure is already present in the shared pool of the SGA, retrieval from disk is not required, and execution can begin immediately.

Memory Allocation
Because stored procedures take advantage of the shared memory capabilities of Oracle, only a single copy of the procedure needs to be loaded into memory for execution by multiple users. Sharing the same code among many users results in a substantial reduction in Oracle memory requirements for applications.

Productivity
Stored procedures increase development productivity. By designing applications around a common set of procedures, you can avoid redundant coding and increase your productivity. For example, procedures can be written to insert, update, or delete rows from the EMP table. These procedures can then be called by any application without rewriting the SQL statements necessary to accomplish these tasks. If the methods of data management change, only the procedures need to be modified, not all of the applications that use the procedures.

Integrity
Stored procedures improve the integrity and consistency of your applications. By developing all of your applications around a common group of procedures, you can reduce the likelihood of committing coding errors.
For example, you can test a procedure or function to guarantee that it returns an accurate result and, once it is verified, reuse it in any number of applications without testing it again. If the data structures referenced by the procedure are altered in any way, only the procedure needs to be recompiled; applications that call the procedure do not necessarily require any modifications.

Anonymous PL/SQL Blocks vs. Stored Procedures
A stored procedure is created and stored in the database as a schema object. Once created and compiled, it is a named object that can be executed without recompiling. Additionally, dependency information is stored in the data dictionary to guarantee the validity of each stored procedure.
As an alternative to a stored procedure, you can create an anonymous PL/SQL block by sending an unnamed PL/SQL block to the Oracle Server from an Oracle tool or an application. Oracle compiles the PL/SQL block and places the compiled version in the shared pool of the SGA, but does not store the source code or compiled version in the database for reuse beyond the current instance. Shared SQL allows anonymous PL/SQL blocks in the shared pool to be reused and shared until they are flushed out of the shared pool.
In either case, by moving PL/SQL blocks out of a database application and into database procedures stored either in the database or in memory, you avoid unnecessary procedure recompilations by Oracle at runtime, improving the overall performance of the application and Oracle.

External Procedures
A PL/SQL procedure executing on an Oracle8 Server can call an external procedure or function that is written in the C programming language and stored in a shared library. The C routine executes in a separate address space from that of the Oracle Server.

Packages
Packages encapsulate related procedures, functions, and associated cursors and variables together as a unit in the database.
You create a package in two parts: the specification and the body. A package's specification declares all public constructs of the package, and the body defines all constructs (public and private) of the package. This separation of the two parts provides the following advantages:
• The developer has more flexibility in the development cycle. You can create specifications and reference public procedures without actually creating the package body.
• You can alter procedure bodies contained within the package body separately from their publicly declared specifications in the package specification. As long as the procedure specification does not change, objects that reference the


altered procedures of the package are never marked invalid; that is, they are never marked as needing recompilation.

The following example creates the specification and body for a package that contains several procedures and functions that process banking transactions.
Create Package bank_transactions AS
minimum_balance Constant Number := 100.00;
Procedure apply_transactions;
Procedure enter_transaction (acct Number, kind Char, amount Number);
END bank_transactions;

Create Package Body bank_transactions AS
/* Package to input bank transactions */
new_status CHAR(20); /* Global variable to record status of
transaction being applied. Used for update in
Apply_Transactions. */

Procedure do_journal_entry (acct Number, kind CHAR) IS
/* Records a journal entry for each bank transaction applied
by the Apply_Transactions procedure. */
Begin
Insert Into journal
Values (acct, kind, sysdate);
IF kind = 'D' THEN
new_status := 'Debit applied';
ELSIF kind = 'C' THEN
new_status := 'Credit applied';
ELSE
new_status := 'New account';
END IF;
END do_journal_entry;

Procedure credit_account (acct Number, credit Number) IS
/* Credits a bank account the specified amount. If the account
does not exist, the procedure creates a new account first. */
old_balance NUMBER;
new_balance NUMBER;
BEGIN
SELECT balance INTO old_balance FROM accounts
WHERE acct_id = acct
FOR UPDATE OF balance; /* Locks account for credit update */
new_balance := old_balance + credit;
UPDATE accounts SET balance = new_balance
WHERE acct_id = acct;
do_journal_entry(acct, 'C');
EXCEPTION
WHEN NO_DATA_FOUND THEN /* Create new account if not found */
INSERT INTO accounts (acct_id, balance)
VALUES (acct, credit);
do_journal_entry(acct, 'N');
When Others Then /* Return other errors to application */
new_status := 'Error: ' || SQLERRM(SQLCODE);
END credit_account;

PROCEDURE debit_account (acct NUMBER, debit NUMBER) IS
/* Debits an existing account if result is greater than the
allowed minimum balance. */
old_balance NUMBER;
new_balance NUMBER;
insufficient_funds EXCEPTION;
BEGIN
SELECT balance INTO old_balance FROM accounts
WHERE acct_id = acct
FOR UPDATE OF balance;
new_balance := old_balance - debit;
IF new_balance >= minimum_balance THEN
UPDATE accounts SET balance = new_balance
WHERE acct_id = acct;
do_journal_entry(acct, 'D');
ELSE
RAISE insufficient_funds;
END IF;
EXCEPTION
WHEN NO_DATA_FOUND THEN
new_status := 'Nonexistent account';
WHEN insufficient_funds THEN
new_status := 'Insufficient funds';
WHEN OTHERS THEN /* Returns other errors to application */
new_status := 'Error: ' || SQLERRM(SQLCODE);
END debit_account;

PROCEDURE apply_transactions IS
/* Applies pending transactions in the table TRANSACTIONS to the
ACCOUNTS table. Used at regular intervals to update bank
accounts without interfering with input of new transactions. */
/* Cursor fetches and locks all rows from the TRANSACTIONS
table with a status of 'Pending'. Locks released after all pending
transactions have been applied. */
CURSOR trans_cursor IS
SELECT acct_id, kind, amount FROM transactions
WHERE status = 'Pending'
ORDER BY time_tag
FOR UPDATE OF status;
BEGIN
FOR trans IN trans_cursor LOOP /* implicit open and fetch */
IF trans.kind = 'D' THEN
debit_account(trans.acct_id, trans.amount);
ELSIF trans.kind = 'C' THEN
credit_account(trans.acct_id, trans.amount);
ELSE
new_status := 'Rejected';
END IF;
/* Update TRANSACTIONS table to return result of applying
this transaction. */
UPDATE transactions SET status = new_status
WHERE CURRENT OF trans_cursor;
END LOOP;
COMMIT; /* Release row locks in TRANSACTIONS table. */
END apply_transactions;

PROCEDURE enter_transaction (acct NUMBER, kind CHAR, amount NUMBER) IS
/* Enters a bank transaction into the TRANSACTIONS table.
A new transaction is always put into this 'queue' before being


applied to the specified account by the APPLY_TRANSACTIONS
procedure. Therefore, many transactions can be simultaneously
input without interference. */
BEGIN
INSERT INTO transactions
VALUES (acct, kind, amount, 'Pending', sysdate);
COMMIT;
END enter_transaction;
END bank_transactions;
Packages allow the database administrator or application developer to organize similar routines. They also offer increased functionality and database performance.

Benefits of Packages
Packages are used to define related procedures, variables, and cursors, and are often implemented to provide advantages in the following areas:
• encapsulation of related procedures and variables
• declaration of public and private procedures, variables, constants, and cursors
• better performance

Encapsulation
Stored packages allow you to encapsulate (group) related stored procedures, variables, datatypes, and so forth in a single named, stored unit in the database. This provides for better organization during the development process.
Encapsulation of procedural constructs in a package also makes privilege management easier. Granting the privilege to use a package makes all constructs of the package accessible to the grantee.

Public and Private Data and Procedures
The methods of package definition allow you to specify which variables, cursors, and procedures are:
public: directly accessible to the user of a package.
private: hidden from the user of a package.
For example, a package might contain ten procedures. You can define the package so that only three procedures are public and therefore available for execution by a user of the package; the remainder of the procedures are private and can only be accessed by the procedures within the package.

Performance Improvement
An entire package is loaded into memory when a procedure within the package is called for the first time. This load is completed in one operation, as opposed to the separate loads required for standalone procedures. Therefore, when calls to related packaged procedures occur, no disk I/O is necessary to execute the compiled code already in memory.
A package body can be replaced and recompiled without affecting the specification. As a result, objects that reference a package's constructs (always via the specification) need not be recompiled unless the package specification is also replaced. By using packages, unnecessary recompilations can be minimized, resulting in less impact on overall database performance.

How Oracle Stores Procedures and Packages
When you create a procedure or package, Oracle:
• compiles the procedure or package
• stores the compiled code in memory
• stores the procedure or package in the database

Points to Ponder

• PL/SQL enables you to mix SQL statements with procedural constructs.
• Procedures and functions are schema objects that logically group a set of SQL and other PL/SQL programming language statements together to perform a specific task.
• A package is a group of related procedures and functions.

Review Terms

• Difference between SQL and PL/SQL
• Stored procedures
• Functions
• Packages

Students Activity

1. Differentiate between SQL and PL/SQL.
2. Define stored procedures and functions.
3. Define packages.


Student Notes


LESSON 27:

LAB


LESSON 28:

LAB


Lesson Objectives

• Database triggers
• Parts of a trigger
• Types of triggers
• Executing triggers
Database triggers are procedures that are stored in the database and are implicitly executed (fired) when the contents of a table are changed.

Introduction
Oracle allows the user to define procedures that are implicitly executed (i.e., executed by Oracle itself) when an insert, update or delete is issued against a table from SQL*Plus or through an application. These procedures are called database triggers. The major point that makes these triggers stand apart is that they are fired implicitly (i.e., internally) by Oracle itself and not explicitly called by the user, as is done with normal procedures. A trigger can include SQL and PL/SQL statements to execute as a unit and can invoke stored procedures. However, procedures and triggers differ in the way that they are invoked. A procedure is explicitly executed by a user, application, or trigger. Triggers (one or more) are implicitly fired (executed) by Oracle when a triggering INSERT, UPDATE, or DELETE statement is issued, no matter which user is connected or which application is being used.

Use of Database Triggers
Triggers can supplement the standard capabilities of Oracle to provide a highly customized database management system. For example, a trigger can restrict DML operations against a table to those issued during regular business hours. A trigger could also restrict DML operations to occur only at certain times during weekdays. Other uses for triggers are to:
• automatically generate derived column values
• prevent invalid transactions
• enforce complex security authorizations
• enforce referential integrity across nodes in a distributed database
• enforce complex business rules
• provide transparent event logging
• provide sophisticated auditing
• maintain synchronous table replicates
• gather statistics on table access

Parts of a Trigger
A trigger has three basic parts:
• a triggering event or statement
• a trigger restriction
• a trigger action

Triggering Event or Statement
A triggering event or statement is the SQL statement that causes a trigger to be fired. A triggering event can be an Insert, Update, or Delete statement on a table.
. . . Update Of parts_on_hand On inventory . . .
which means: when the parts_on_hand column of a row in the inventory table is updated, fire the trigger. Note that when the triggering event is an Update statement, you can include a column list to identify which columns must be updated to fire the trigger. You cannot specify a column list for Insert and Delete statements, because they affect entire rows of information.
A triggering event can specify multiple DML statements, as in
. . . Insert Or Update Or Delete On inventory . . .
which means: when an Insert, Update, or Delete statement is issued against the inventory table, fire the trigger. When multiple types of DML statements can fire a trigger, you can use conditional predicates to detect the type of triggering statement. In this way, you can create a single trigger that executes different code based on the type of statement that fires the trigger.

Trigger Restriction
A trigger restriction specifies a Boolean (logical) expression that must be TRUE for the trigger to fire. The trigger action is not executed if the trigger restriction evaluates to FALSE or UNKNOWN. In the example, the trigger restriction is
new.parts_on_hand < new.reorder_point

Trigger Action
A trigger action is the procedure (PL/SQL block) that contains the SQL statements and PL/SQL code to be executed when a triggering statement is issued and the trigger restriction evaluates to TRUE.
Like stored procedures, a trigger action can contain SQL and PL/SQL statements, define PL/SQL language constructs (variables, constants, cursors, exceptions, and so on), and call stored procedures. Additionally, for row triggers (described in the next section), the statements in a trigger action have access to the column values (new and old) of the current row being processed by the trigger. Two correlation names provide access to the old and new values for each column.

Types of Triggers
This section describes the different types of triggers: row and statement triggers; and Before, After, and Instead-of triggers.

Row vs. Statement Triggers
When you define a trigger, you can specify the number of times the trigger action is to be executed: once for every row affected by the triggering statement (such as might be fired by an UPDATE statement that updates many rows), or once for the triggering statement, no matter how many rows it affects.
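Row-level firing can be observed directly in any engine that supports triggers. The sketch below uses SQLite through Python as a stand-in, since an Oracle instance is not assumed to be available; SQLite triggers are always row-level, so an UPDATE that touches three rows fires the trigger three times, and an UPDATE that matches no rows never fires it. The table and trigger names are invented for the illustration.

```python
import sqlite3

# SQLite triggers are row-level (FOR EACH ROW is the only mode),
# which makes it a convenient stand-in to observe "once per affected row".
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory(part TEXT, qty INTEGER);
CREATE TABLE fired(log TEXT);
CREATE TRIGGER row_trg AFTER UPDATE ON inventory BEGIN
  INSERT INTO fired VALUES ('row trigger fired');
END;
""")
conn.executemany("INSERT INTO inventory VALUES (?, ?)",
                 [("bolt", 10), ("nut", 20), ("washer", 30)])

conn.execute("UPDATE inventory SET qty = qty - 1")                 # affects 3 rows
conn.execute("UPDATE inventory SET qty = 0 WHERE part = 'gear'")   # affects 0 rows

count = conn.execute("SELECT COUNT(*) FROM fired").fetchone()[0]
print(count)  # → 3: fired once per affected row, never for the no-op update
```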

LESSON 29:

DATABASE TRIGGERS

Page 90: DATABSE MANAGEMENT - NIILM University DATABASE MANAGEMENT Lesson No. Topic Page No. Lesson 31 LAB 90 Lesson 32 Database Cursors 91 Lessom 33 LAB 100 Lesson 34 LAB 101 Lesson 35 Normalisation


DATABASE MANAGEMENT

Row Triggers
A row trigger is fired each time a row of the table is affected by the triggering statement. For example, if an UPDATE statement updates multiple rows of a table, a row trigger is fired once for each row affected by the UPDATE statement. If a triggering statement affects no rows, a row trigger is not executed at all. Row triggers are useful if the code in the trigger action depends on data provided by the triggering statement or on the rows that are affected.

Statement Triggers
A statement trigger is fired once on behalf of the triggering statement, regardless of the number of rows in the table that the triggering statement affects (even if no rows are affected). For example, if a DELETE statement deletes several rows from a table, a statement-level DELETE trigger is fired only once, regardless of how many rows are deleted from the table. Statement triggers are useful if the code in the trigger action does not depend on the data provided by the triggering statement or the rows affected. For example, if a trigger makes a complex security check on the current time or user, or if a trigger generates a single audit record based on the type of triggering statement, a statement trigger is used.

Before vs. After Triggers
When defining a trigger, you can specify the trigger timing: whether the trigger action is to be executed before or after the triggering statement. Before and After apply to both statement and row triggers.

Before Triggers
BEFORE triggers execute the trigger action before the triggering statement is executed. This type of trigger is commonly used in the following situations:
• When the trigger action should determine whether the triggering statement should be allowed to complete. Using a BEFORE trigger for this purpose, you can eliminate unnecessary processing of the triggering statement and its eventual rollback in cases where an exception is raised in the trigger action.
• To derive specific column values before completing a triggering INSERT or UPDATE statement.

After Triggers
AFTER triggers execute the trigger action after the triggering statement is executed. AFTER triggers are used in the following situations:
• When you want the triggering statement to complete before executing the trigger action.
• If a BEFORE trigger is already present, an AFTER trigger can perform different actions on the same triggering statement.

Combinations
Using the options listed above, you can create four types of triggers:
• BEFORE statement trigger: Before executing the triggering statement, the trigger action is executed.
• BEFORE row trigger: Before modifying each row affected by the triggering statement and before checking appropriate integrity constraints, the trigger action is executed, provided that the trigger restriction was not violated.
• AFTER statement trigger: After executing the triggering statement and applying any deferred integrity constraints, the trigger action is executed.
• AFTER row trigger: After modifying each row affected by the triggering statement and possibly applying appropriate integrity constraints, the trigger action is executed for the current row, provided the trigger restriction was not violated. Unlike BEFORE row triggers, AFTER row triggers lock rows.

You can have multiple triggers of the same type for the same statement on any given table. For example, you may have two BEFORE statement triggers for Update statements on the EMP table. Multiple triggers of the same type permit modular installation of applications that have triggers on the same tables. Also, Oracle snapshot logs use AFTER row triggers, so you can design your own AFTER row trigger in addition to the Oracle-defined AFTER row trigger.
You can create as many triggers of the preceding different types as you need for each type of DML statement (Insert, Update, or Delete). For example, suppose you have a table, SAL, and you want to know when the table is being accessed and the types of queries being issued. The example below contains a sample package and trigger that tracks this information by hour and type of action (for example, Update, Delete, or Insert) on table SAL. A global session variable, Stat.Rowcnt, is initialized to zero by a BEFORE statement trigger. Then it is increased each time the row trigger is executed. Finally, the statistical information is saved in the table Stat_Tab by the AFTER statement trigger.

Sample Package and Trigger for SAL Table

Drop Table stat_tab;
Create Table stat_tab (
  utype  CHAR(8),
  rowcnt Integer,
  uhour  Integer
);
Create or Replace Package stat is
  rowcnt Integer;
END;
/
Create Trigger bt Before Update or Delete or Insert on sal
Begin
  stat.rowcnt := 0;
END;
/
Create Trigger rt Before Update or Delete or Insert on sal
For Each Row
Begin
  stat.rowcnt := stat.rowcnt + 1;
END;
/
Create Trigger at After Update or Delete or Insert on sal
Declare
  typ  CHAR(8);
  hour Number;
Begin
  IF updating


  THEN typ := 'update'; END IF;
  IF deleting THEN typ := 'delete'; END IF;
  IF inserting THEN typ := 'insert'; END IF;
  hour := TRUNC((SYSDATE - TRUNC(SYSDATE)) * 24);
  UPDATE stat_tab
     SET rowcnt = rowcnt + stat.rowcnt
   WHERE utype = typ
     AND uhour = hour;
  IF SQL%ROWCOUNT = 0 THEN
    INSERT INTO stat_tab VALUES (typ, stat.rowcnt, hour);
  END IF;
EXCEPTION
  WHEN dup_val_on_index THEN
    UPDATE stat_tab
       SET rowcnt = rowcnt + stat.rowcnt
     WHERE utype = typ
       AND uhour = hour;
END;
/
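The SAL statistics example is Oracle-specific: it relies on a package variable and statement-level triggers. As a rough sketch of the same idea in a testable environment, the Python/SQLite version below keeps per-statement-type counters in a stat_tab table, using one row-level AFTER trigger per DML type. The names mirror the book's example, but the mechanics are SQLite's, not Oracle's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sal(empno INTEGER, amount INTEGER);
CREATE TABLE stat_tab(utype TEXT, rowcnt INTEGER);
-- One row-level AFTER trigger per DML type keeps a running count,
-- loosely mirroring the book's SAL statistics example.
CREATE TRIGGER sal_ins AFTER INSERT ON sal BEGIN
  UPDATE stat_tab SET rowcnt = rowcnt + 1 WHERE utype = 'insert';
END;
CREATE TRIGGER sal_upd AFTER UPDATE ON sal BEGIN
  UPDATE stat_tab SET rowcnt = rowcnt + 1 WHERE utype = 'update';
END;
CREATE TRIGGER sal_del AFTER DELETE ON sal BEGIN
  UPDATE stat_tab SET rowcnt = rowcnt + 1 WHERE utype = 'delete';
END;
INSERT INTO stat_tab VALUES ('insert', 0), ('update', 0), ('delete', 0);
""")

conn.executemany("INSERT INTO sal VALUES (?, ?)", [(1, 100), (2, 200), (3, 300)])
conn.execute("UPDATE sal SET amount = amount + 10")   # fires the update trigger 3 times
conn.execute("DELETE FROM sal WHERE empno = 1")

print(dict(conn.execute("SELECT utype, rowcnt FROM stat_tab")))
# → {'insert': 3, 'update': 3, 'delete': 1}
```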

Instead of Triggers
INSTEAD OF triggers provide a transparent way of modifying views that cannot be modified directly through SQL DML statements (Insert, Update, and Delete). These triggers are called INSTEAD OF triggers because, unlike other types of triggers, Oracle fires the trigger instead of executing the triggering statement. The trigger performs update, insert, or delete operations directly on the underlying tables.
Users write normal INSERT, DELETE, and UPDATE statements against the view, and the INSTEAD OF trigger works invisibly in the background to make the right actions take place. By default, INSTEAD OF triggers are activated for each row.

Example of an Instead of Trigger

The following example shows an INSTEAD OF trigger for inserting rows into the Manager_Info view.

Create View manager_info AS
SELECT e.name, e.empno, d.dept_type, d.deptno, p.level, p.projno
  FROM emp e, dept d, project p
 WHERE e.empno = d.mgr_no
   AND d.deptno = p.resp_dept;

Create Trigger manager_info_insert
Instead of Insert on manager_info
Referencing New As n   -- new manager information
For Each Row
Begin
  If Not Exists (Select * From emp
                 Where emp.empno = :n.empno)
  Then
    Insert Into emp Values (:n.empno, :n.name);
  Else
    Update emp SET emp.name = :n.name
     Where emp.empno = :n.empno;
  End If;
  If Not Exists (Select * From dept
                 Where dept.deptno = :n.deptno)

  Then
    Insert Into dept Values (:n.deptno, :n.dept_type);
  Else
    Update dept SET dept.dept_type = :n.dept_type
     Where dept.deptno = :n.deptno;
  End If;
  If Not Exists (Select * From project
                 Where project.projno = :n.projno)
  Then
    Insert Into project Values (:n.projno, :n.project_level);
  Else
    Update project SET project.level = :n.level
     Where project.projno = :n.projno;
  End If;
END;

The actions shown for rows being inserted into the Manager_Info view first test whether appropriate rows already exist in the base tables from which Manager_Info is derived. The actions then insert new rows or update existing rows, as appropriate. Similar triggers can specify appropriate actions for Update and Delete.
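SQLite also supports INSTEAD OF triggers on views, so the routing idea above can be demonstrated end to end. This is a simplified sketch, not the book's Oracle code: the view joins only two hypothetical base tables, and INSERT OR REPLACE stands in for the explicit exists/insert/update logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp(empno INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dept(deptno INTEGER PRIMARY KEY, mgr_no INTEGER);
CREATE VIEW manager_info AS
  SELECT e.empno, e.name, d.deptno
    FROM emp e JOIN dept d ON e.empno = d.mgr_no;
-- The join view is not directly updatable; the INSTEAD OF trigger
-- routes an INSERT on the view into the two base tables.
CREATE TRIGGER manager_info_insert
INSTEAD OF INSERT ON manager_info
BEGIN
  INSERT OR REPLACE INTO emp VALUES (NEW.empno, NEW.name);
  INSERT OR REPLACE INTO dept VALUES (NEW.deptno, NEW.empno);
END;
""")

conn.execute("INSERT INTO manager_info VALUES (7566, 'JONES', 20)")
print(conn.execute("SELECT * FROM manager_info").fetchall())
# → [(7566, 'JONES', 20)]
```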

Trigger Execution
A trigger can be in either of two distinct modes:

enabled: An enabled trigger executes its trigger action if a triggering statement is issued and the trigger restriction (if any) evaluates to TRUE.

disabled: A disabled trigger does not execute its trigger action, even if a triggering statement is issued and the trigger restriction (if any) would evaluate to TRUE.

For enabled triggers, Oracle automatically:
• executes triggers of each type in a planned firing sequence when more than one trigger is fired by a single SQL statement
• performs integrity constraint checking at a set point in time with respect to the different types of triggers and guarantees that triggers cannot compromise integrity constraints
• provides read-consistent views for queries and constraints
• manages the dependencies among triggers and objects referenced in the code of the trigger action
• uses two-phase commit if a trigger updates remote tables in a distributed database
• fires multiple triggers in an unspecified order, if more than one trigger of the same type exists for a given statement

Points to Ponder

• Procedures that are implicitly executed by Oracle are known as triggers.

• Triggers can supplement the standard capabilities of Oracle to provide a highly customized database management system.

• A trigger restriction specifies a Boolean (logical) expression that must be TRUE for the trigger to fire.


• A trigger has three basic parts:
1. a triggering event or statement
2. a trigger restriction
3. a trigger action

Review Terms

• Database triggers
• Parts of triggers
• Types of triggers
• Executing triggers

Students Activity

1. Explain the use of database triggers. Why are they needed?

2. What are the various parts of a database trigger?

3. Differentiate between BEFORE and AFTER triggers.

4. Differentiate between row and statement triggers.

5. Define the INSTEAD OF trigger with the help of an example.

6. Describe the various restrictions on triggers.


Student Notes


LESSON 30:

LAB


LESSON 31:

LAB


Lesson Objectives

• Defining a cursor
• Use of cursors
• Explicit cursors
• Implicit cursors
• Parameterized cursors

Whenever an SQL statement is executed, Oracle performs the following tasks:
• Reserves an area in memory called the private SQL area.
• Populates this area with the appropriate data.
• Processes the data in the memory area.
• Frees the memory area when the execution is complete.

What is a Cursor?
Oracle uses a work area for its internal processing. This work area is private to SQL's operation and is called a cursor. The data stored in it is called the Active Data Set. The size of the cursor in memory is the size required to hold the number of rows in the Active Data Set.

Example
When a user fires a select statement such as
Select empno, job, salary from Employee where dept_no = 20
the resultant data set is as follows:

Active Data Set

3456  IVAN     MANAGER  10000
3459  PRADEEP  ANALYST   7000
3446  MITA     PROGRMR   4000
3463  VIJAY    CLERK     2000
3450  ALDRIN   ACCTANT   3000

Contents of a Cursor
When a query returns multiple rows, in addition to the data held in the cursor, Oracle will also open and maintain a row pointer. Depending on user requests to view data, the row pointer will be relocated within the cursor's Active Data Set. Additionally, Oracle maintains cursor variables loaded with the value of the total number of rows fetched from the active data set.
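The row-pointer behaviour described above maps directly onto the Python DB-API cursor, which can serve as a testable stand-in for Oracle's cursor: each fetch returns the row under the pointer and advances it, and exhaustion is signalled by None. The table and data below are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee(empno INTEGER, job TEXT, salary INTEGER, dept_no INTEGER);
INSERT INTO employee VALUES
  (3456, 'MANAGER', 10000, 20),
  (3459, 'ANALYST', 7000, 20),
  (3446, 'PROGRMR', 4000, 10);
""")

cur = conn.cursor()
cur.execute("SELECT empno, job, salary FROM employee WHERE dept_no = 20")

# Each fetchone() returns the row under the pointer and advances it;
# None signals that the active set is exhausted.
print(cur.fetchone())  # → (3456, 'MANAGER', 10000)
print(cur.fetchone())  # → (3459, 'ANALYST', 7000)
print(cur.fetchone())  # → None
```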

Use of Cursors in PL/SQL
While SQL is the natural language of the DBA, it does not have any procedural capabilities such as condition checking, looping and branching. For this, Oracle provides PL/SQL. Programmers can use it to create programs for validation and manipulation of table data. PL/SQL adds to the power of SQL and provides the user with all the functionality of a programming environment. A PL/SQL block of code includes the procedural code for looping and branching along with the SQL statements. If

LESSON 32:

DATABASE CURSORS

records from a record set created using a select statement are to be evaluated and processed one at a time, then the only method available is by using explicit cursors.
Cursors can also be used to evaluate the success of updates and deletes and the number of rows affected (implicit cursors).

Explicit Cursor
You can explicitly declare a cursor to process the rows individually. A cursor declared by the user is called an explicit cursor. For queries that return more than one row, you must declare a cursor explicitly.

Why Use an Explicit Cursor?
Cursors can be used when the user wants to process data one row at a time.

Example

Update an acctmast table and set a value in its balance amount column depending upon whether the account has an amount debited or credited in the accttran table. The records from the accttran table will be fetched one at a time and updated in the acctmast table depending upon whether the account is debited or credited.
PL/SQL raises an error if an embedded select statement retrieves more than one row. Such an error forces an abnormal termination of the PL/SQL block. Such an error can be eliminated by using a cursor.

Explicit Cursor Management

The steps involved in declaring a cursor and manipulating data in the active set are:
• Declare a cursor that specifies the SQL select statement that you want to process.
• Open the cursor.
• Fetch data from the cursor one row at a time.
• Close the cursor.

A cursor is defined in the declarative part of a PL/SQL block by naming it and specifying a query. Then three commands are used to control the cursor: open, fetch and close.
First, initialize the cursor with the open statement; this defines a private SQL area, executes the query associated with the cursor, populates the Active Data Set, and sets the Active Data Set's row pointer to the first record.
The fetch statement retrieves the current row and advances the cursor to the next row. You can execute fetch repeatedly until all rows have been retrieved.
When the last row has been processed, close the cursor with the close statement. This releases the memory occupied by the cursor and its data set.
Focus: The HRD manager has decided to raise the salary of all the employees in department no. 20 by 0.05. Whenever any such raise is given to the employees, a record for the same is


maintained in the emp_raise table. It includes the employee number, the date when the raise was given and the actual raise. Write a PL/SQL block to update the salary of each employee and insert a record in the emp_raise table. The table definition is as follows:
Table name: employee

Emp_code and raise_date together form a composite primary key.

Declaring a Cursor
To do the above via a PL/SQL block, it is necessary to declare a cursor and associate it with a query before referencing it in any statement within the PL/SQL block. This is because forward references to objects are not allowed in PL/SQL.

Syntax:
CURSOR cursorname IS
  SQL statement;

Example:
DECLARE
  /* Declaration of the cursor named c_emp.
     The active data set will include the employee codes
     and salaries of all the employees belonging
     to department 20 */
  cursor c_emp is
    select emp_code, salary from employee
     where deptno = 20;

The cursor name is not a PL/SQL variable; it is used only to reference the query. It cannot be assigned any values or be used in an expression.

Opening a Cursor
Opening the cursor executes the query and identifies the active set, which contains all the rows that meet the query search criteria.

Syntax:
OPEN cursorname;

Example:
DECLARE
  cursor c_emp is
    select emp_code, salary from employee
     where deptno = 20;
BEGIN
  /* Opening cursor c_emp */
  open c_emp;
END;

The open statement retrieves the records from the database and places them in the cursor (private SQL area).

Column name  Data Type  Size  Attributes
emp_code     varchar    10    Primary key, via which we shall seek data in the table.
ename        varchar    20    The first name of the candidate.
deptno       number     5     The department number.
job          varchar    20    Employee job details.
sal          number     8,2   The current salary of the employee.

Table name: emp_raise

Column name  Data Type  Size  Attributes
emp_code     varchar    10    Part of a composite key via which we shall seek data in the table.
raise_date   date             The date on which the raise was given.
raise_amt    number     8,2   The raise given to the employee.

Fetching a Record from the Cursor
The fetch statement retrieves the rows from the active set into variables, one at a time. Each time a fetch is executed, the cursor advances to the next row in the active set. One can use any loop structure (Loop-End Loop along with While, For, If-End If) to fetch the records from the cursor into variables one row at a time.

Syntax:
FETCH cursorname INTO variable1, variable2, ...;

For each column value returned by the query associated with the cursor, there must be a corresponding variable in the INTO list, and their datatypes must match. These variables are declared in the DECLARE section of the PL/SQL block.

Example:
DECLARE
  cursor c_emp is
    select emp_code, salary from employee
     where deptno = 20;
  /* Declaration of memory variables that hold data
     fetched from the cursor */
  str_emp_code employee.emp_code%type;
  num_salary   employee.salary%type;
BEGIN
  open c_emp;
  /* infinite loop to fetch data from cursor c_emp one
     row at a time */
  loop
    fetch c_emp into str_emp_code, num_salary;
    /* Updating the salary in the employee table as
       current salary + raise */
    update employee set salary = num_salary + (num_salary * .05)
     where emp_code = str_emp_code;
    /* Insert a record in the emp_raise table */
    insert into emp_raise values
      (str_emp_code, sysdate, num_salary * 0.05);
  end loop;
  commit;
END;

Note that the current program will result in an infinite loop, as there is no exit provided from the loop. The exit from the loop can be provided using cursor variables, as explained in the section Explicit Cursor Attributes below. Also note that if you execute a fetch and there are no more rows left in the active data set, the values of the explicit cursor variables are indeterminate.

Closing a Cursor
The close statement disables the cursor, and the active set becomes undefined. This releases the memory occupied by the cursor and its data set. Once a cursor is closed, the user can reopen it using the open statement.

Syntax:
CLOSE cursorname;


Example:
DECLARE
  cursor c_emp is
    select emp_code, salary from employee
     where deptno = 20;
  str_emp_code employee.emp_code%type;
  num_salary   employee.salary%type;
BEGIN
  open c_emp;
  loop
    fetch c_emp into str_emp_code, num_salary;
    update employee set salary = num_salary + (num_salary * .05)
     where emp_code = str_emp_code;
    insert into emp_raise values
      (str_emp_code, sysdate, num_salary * 0.05);
  end loop;
  commit;
  /* Close cursor c_emp */
  close c_emp;
END;

Explicit Cursor Attributes
Oracle provides certain attributes / cursor variables to control the execution of the cursor. Whenever any cursor (explicit or implicit) is opened and used, Oracle creates a set of four system variables via which it keeps track of the current status of the cursor. You can access these cursor variables. They are described below.

• %NOTFOUND: evaluates to TRUE if the last fetch failed because no more rows were available, or to FALSE if the last fetch returned a row.

Syntax: cursorname%NOTFOUND

Example:
DECLARE
  cursor c_emp is
    select emp_code, salary from employee
     where deptno = 20;
  str_emp_code employee.emp_code%type;
  num_salary   employee.salary%type;
BEGIN
  open c_emp;
  loop
    fetch c_emp into str_emp_code, num_salary;
    /* If the number of records retrieved is 0, or if all the
       records have been fetched, then exit the loop. */
    exit when c_emp%notfound;
    update employee set salary = num_salary + (num_salary * .05)
     where emp_code = str_emp_code;
    insert into emp_raise values
      (str_emp_code, sysdate, num_salary * 0.05);
  end loop;
  commit;
  close c_emp;
END;

• %FOUND: is the logical opposite of %NOTFOUND. It evaluates to TRUE if the last fetch succeeded because a row was available, or to FALSE if the last fetch failed because no more rows were available.

Syntax: cursorname%FOUND

Example: The PL/SQL block will be as follows:
DECLARE
  cursor c_emp is
    select emp_code, salary from employee
     where deptno = 20;
  str_emp_code employee.emp_code%type;
  num_salary   employee.salary%type;
BEGIN
  open c_emp;
  loop
    fetch c_emp into str_emp_code, num_salary;
    /* If the number of records retrieved > 0 then
       process the data, else exit the loop. */
    if c_emp%found then
      update employee set salary = num_salary + (num_salary * .05)
       where emp_code = str_emp_code;
      insert into emp_raise values
        (str_emp_code, sysdate, num_salary * 0.05);
    else
      exit;
    end if;
  end loop;
  commit;
  close c_emp;
END;

• %ISOPEN: evaluates to TRUE if an explicit cursor is open, or to FALSE if it is closed.

Syntax: cursorname%ISOPEN

Example:
DECLARE
  cursor c_emp is
    select emp_code, salary from employee
     where deptno = 20;
  str_emp_code employee.emp_code%type;
  num_salary   employee.salary%type;
BEGIN
  open c_emp;
  /* If the cursor is open,
     continue with the data processing;
     else display an appropriate error message */


  if c_emp%isopen then
    loop
      fetch c_emp into str_emp_code, num_salary;
      exit when c_emp%notfound;
      update employee set salary = num_salary + (num_salary * .05)
       where emp_code = str_emp_code;
      insert into emp_raise values
        (str_emp_code, sysdate, num_salary * 0.05);
    end loop;
    commit;
    close c_emp;
  else
    dbms_output.put_line('Unable to open Cursor');
  end if;
END;

• %ROWCOUNT: returns the number of rows fetched from the active set. It is set to zero when the cursor is opened.

Syntax: cursorname%ROWCOUNT

Example: Display the names, department numbers and salaries of the first 10 employees getting the highest salary.
DECLARE
  cursor c_emp is
    select ename, deptno, salary from employee, deptmaster
     where deptmaster.deptno = employee.deptno
     order by salary desc;
  str_ename  employee.ename%type;
  num_deptno employee.deptno%type;
  num_salary employee.salary%type;
BEGIN
  open c_emp;
  dbms_output.put_line('Name   Department   Salary');
  dbms_output.put_line('----   ----------   ------');
  loop
    fetch c_emp into str_ename, num_deptno, num_salary;
    dbms_output.put_line(str_ename || '   ' || num_deptno || '   ' || num_salary);
    exit when c_emp%rowcount = 10;
  end loop;
END;

Cursor for Loops

In most situations that require an explicit cursor, you can simplify coding by using a cursor FOR loop instead of the open, fetch and close statements.
A cursor FOR loop implicitly declares its loop index as a %rowtype record, opens the cursor, repeatedly fetches rows of values from the active set into fields of the record, and closes the cursor when all rows have been processed.

Example: The PL/SQL block will be rewritten as follows:
DECLARE
  cursor c_emp is
    select emp_code, salary from employee
     where deptno = 20;
BEGIN
  for emp_rec in c_emp
  loop
    update employee
       set salary = emp_rec.salary + (emp_rec.salary * .05)
     where emp_code = emp_rec.emp_code;
    insert into emp_raise values
      (emp_rec.emp_code, sysdate, emp_rec.salary * .05);
  end loop;
  commit;
END;

When you use a cursor FOR loop, the cursor FOR loop:
• implicitly declares emp_rec as belonging to type c_emp%rowtype and retrieves the records as declared in the cursor c_emp.
• executes the sequence of statements inside the loop once for every row that satisfies the query associated with the cursor. With each iteration, a record is fetched from c_emp into emp_rec. Dot notation should be used to reference individual items, for example emp_rec.emp_code, where emp_rec is a row-type variable and emp_code is the name of the field.
• closes the cursor automatically when you leave the loop. This is true even if you use an exit or goto statement to leave the loop prematurely, or if an exception is raised inside the loop. Thus, when you exit the loop, it closes the cursor c_emp.
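The cursor FOR loop's open/fetch/close bookkeeping corresponds to plain iteration in the Python DB-API. The sketch below reproduces the department-20 raise example against SQLite (a stand-in; the original targets Oracle), fetching the result set and then applying the update and the emp_raise insert per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee(emp_code TEXT, salary REAL, deptno INTEGER);
CREATE TABLE emp_raise(emp_code TEXT, raise_date TEXT, raise_amt REAL);
INSERT INTO employee VALUES ('E1', 1000, 20), ('E2', 2000, 20), ('E3', 3000, 10);
""")

# Iterating the fetched rows handles open/fetch/close implicitly,
# much as a PL/SQL cursor FOR loop does.
rows = conn.execute(
    "SELECT emp_code, salary FROM employee WHERE deptno = 20").fetchall()
for emp_code, salary in rows:
    conn.execute("UPDATE employee SET salary = ? WHERE emp_code = ?",
                 (salary + salary * 0.05, emp_code))
    conn.execute("INSERT INTO emp_raise VALUES (?, date('now'), ?)",
                 (emp_code, salary * 0.05))
conn.commit()

print(conn.execute(
    "SELECT emp_code, salary FROM employee ORDER BY emp_code").fetchall())
# → [('E1', 1050.0), ('E2', 2100.0), ('E3', 3000.0)]
```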

Note: The record is defined only inside the loop. You cannot refer to its fields outside the loop. The reference in the following example is illegal:

BEGIN
  for c1rec in c1
  loop
    ...
  end loop;
  result := c1rec.n2 + 3;  /* referencing c1rec outside the for loop is illegal */
END;

Focus: A bank has an ACCTMAST table where it holds the current status of a client's bank account (i.e. what the client currently has in the savings bank account). Another table, the ACCTTRAN table, holds each transaction as it occurs at the bank, i.e. deposits and withdrawals of clients. A client can deposit money, which must then be added to the amount held against that specific client's name in the ACCTMAST table. This is referred to as a 'CREDIT' type transaction.


A client may withdraw money from his account. This must be subtracted from the amount held against that specific client's name in the ACCTMAST table. This is referred to as a 'DEBIT' type transaction.
The ACCTTRAN table must therefore hold a flag that indicates whether the transaction type was 'CREDIT' or 'DEBIT'.
Based on this flag, define a cursor which will update the contents of the ACCTMAST 'Balance' field.
Write a PL/SQL block that updates the acctmast table and sets the balance depending upon whether the account is debited or credited. The updation should be done only for those records that are not processed, i.e. the processed flag is 'N' in the accttran table.
1. Create the following tables

Table name: acctmast

Column name  Data Type  Size  Attributes
acctno       varchar2   4     Primary key
name         varchar2   20    Account name
balance      number     8     The balance in the account.

Table name: accttran

Column name  Data Type  Size  Attributes
acctno       varchar2   4     Foreign key which references table acctmast.
trn_date     date             The transaction date.
deb_crd      char       1     The Dr / Cr flag.
amount       number     7,2
processed    char       1     A flag indicating whether the record is processed or not.

The following PL/SQL code updates the acctmast table depending upon the daily transactions entered in the accttran table.

DECLARE
  cursor acc_updt is
    select acctno, deb_crd, amount from accttran
     where processed = 'N';
  acctnum char(4);
  db_cd   char(1);
  amt     number(7,2);
BEGIN
  open acc_updt;
  /* perform the updation for all the records retrieved by the cursor */
  loop
    fetch acc_updt into acctnum, db_cd, amt;
    exit when acc_updt%notfound;
    /* if the account is debited then update the
       acctmast table as balance = balance - amt */
    if db_cd = 'd' then
      update acctmast
         set balance = (balance - amt)
       where acctno = acctnum;
    else
      /* if the account is credited then update the
         acctmast table as balance = balance + amt */
      update acctmast
         set balance = (balance + amt)
       where acctno = acctnum;
    end if;
    update accttran set processed = 'Y'
     where acctno = acctnum;
  end loop;
  close acc_updt;
  commit;
END;
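For a runnable check of the posting logic, here is the same debit/credit algorithm in Python over SQLite, as a stand-in for the Oracle PL/SQL block; the table names follow the Focus exercise, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE acctmast(acctno TEXT PRIMARY KEY, name TEXT, balance REAL);
CREATE TABLE accttran(acctno TEXT, trn_date TEXT, deb_crd TEXT,
                      amount REAL, processed TEXT);
INSERT INTO acctmast VALUES ('A001', 'IVAN', 500), ('A002', 'MITA', 800);
INSERT INTO accttran VALUES
  ('A001', '2024-01-01', 'd', 200, 'N'),
  ('A002', '2024-01-01', 'c', 300, 'N');
""")

# Post each unprocessed transaction: debit subtracts, credit adds,
# then mark the transaction as processed.
for acctno, deb_crd, amount in conn.execute(
        "SELECT acctno, deb_crd, amount FROM accttran WHERE processed = 'N'"
).fetchall():
    sign = -1 if deb_crd == 'd' else 1
    conn.execute("UPDATE acctmast SET balance = balance + ? WHERE acctno = ?",
                 (sign * amount, acctno))
    conn.execute("UPDATE accttran SET processed = 'Y' WHERE acctno = ?",
                 (acctno,))
conn.commit()

print(conn.execute(
    "SELECT acctno, balance FROM acctmast ORDER BY acctno").fetchall())
# → [('A001', 300.0), ('A002', 1100.0)]
```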

Implicit Cursor
Oracle implicitly opens a cursor to process each SQL statement not associated with an explicitly declared cursor. PL/SQL lets you refer to the most recent implicit cursor as the SQL cursor. So, although you cannot use the open, fetch and close statements to control an implicit cursor, you can still use cursor attributes to access information about the most recently executed SQL statement.

Implicit Cursor Attributes
The SQL cursor has four attributes, as described below. When appended to the cursor name (i.e. SQL), these attributes let you access information about the execution of insert, update, delete and single-row select statements. Implicit cursor attributes return the Boolean value NULL until they are set by a cursor operation.
The values of the cursor attributes always refer to the most recently executed SQL statement, wherever the statement appears. It might be in a different scope (in a sub-block). So, if you want to save an attribute value for later use, assign it to a Boolean variable immediately.

• %NOTFOUND: evaluates to TRUE if an insert, update or delete affected no rows, or a single-row select returns no rows. Otherwise, it evaluates to FALSE.

Syntax: SQL%NOTFOUND

Example: The HRD manager has decided to raise the salary of employees by 0.15. Write a PL/SQL block to accept the employee number and update the salary of that employee. Display an appropriate message based on the existence of the record in the employee table.

BEGIN
  update employee set salary = salary * 0.15
   where emp_code = &emp_code;
  if sql%notfound then
    dbms_output.put_line('Employee No. Does not Exist');
  else
    dbms_output.put_line('Employee Record Modified Successfully');


  end if;
END;

• %FOUND: is the logical opposite of %NOTFOUND. Note, however, that both attributes evaluate to NULL until they are set by an implicit or explicit cursor operation. %FOUND evaluates to TRUE if an insert, update or delete affected one or more rows, or a single-row select returned one or more rows. Otherwise, it evaluates to FALSE.

Syntax: SQL%FOUND

Example: The example in SQL%NOTFOUND will be written as follows:
BEGIN
  update employee set salary = salary * 0.15
   where emp_code = &emp_code;
  if sql%found then
    dbms_output.put_line('Employee Record Modified Successfully');
  else
    dbms_output.put_line('Employee No. Does not Exist');
  end if;
END;

• %ROWCOUNT: returns the number of rows affected by an insert, update, delete or select into statement.

Example: The HRD manager has decided to raise the salary of employees working as 'Programmers' by 15%. Write a PL/SQL block to update the salaries of those employees. Display an appropriate message based on the existence of such records in the employee table.

Declare
  rows_affected char(4);
Begin
  update employee set salary = salary * 1.15
  where job = 'Programmers';
  rows_affected := to_char(sql%rowcount);
  if sql%rowcount > 0 then
    dbms_output.put_line(rows_affected || ' Employee Records Modified Successfully');
  else
    dbms_output.put_line('There are no Employees working as Programmers');
  end if;
END;
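The %ROWCOUNT pattern above has a close analogue in Python's DB-API, where Cursor.rowcount reports the rows affected by the last statement. A minimal sketch using sqlite3 (illustrative only; the table and values are made up, and this is not Oracle PL/SQL):

```python
import sqlite3

# In-memory table standing in for the employee table of the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_code INTEGER, job TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Programmers", 1000.0), (2, "Programmers", 2000.0), (3, "Manager", 5000.0)],
)

# Raise Programmer salaries by 15%; cursor.rowcount plays the role of SQL%ROWCOUNT.
cur = conn.execute(
    "UPDATE employee SET salary = salary * 1.15 WHERE job = ?", ("Programmers",)
)
if cur.rowcount > 0:
    print(cur.rowcount, "Employee Records Modified Successfully")
else:
    print("There are no Employees working as Programmers")
```

As in PL/SQL, the count refers only to the most recently executed statement, so it should be read (or saved) immediately after the update.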

• %ISOPEN: Oracle automatically closes the SQL cursor after executing its associated SQL statement. As a result, SQL%ISOPEN always evaluates to FALSE.

Parameterized Cursors

So far we have used cursors that query all the records from a table. Sometimes records must be brought into memory selectively. While declaring such a cursor, the select statement must include a where clause to retrieve data conditionally.

We should be able to pass a value to the cursor when it is being opened. For that, the cursor must be declared in such a way that it recognizes that it will receive the requested value(s) at the time of opening the cursor. Such a cursor is known as a parameterized cursor.

Syntax: CURSOR cursor_name (variable_name datatype) is

select statement...

The scope of cursor parameters is local to that cursor, which means that they can be referenced only within the query declared in the cursor declaration. The values of cursor parameters are used by the associated query when the cursor is opened. For example:

cursor c_emp (num_deptno number) is
  select job, ename from emp where deptno > num_deptno;

The parameters to a cursor can be passed in the open statement. They can either be constant values or the contents of a memory variable. For example:

OPEN c_emp (30);
OPEN c_emp (num_deptno);

Note: The memory variable should be declared in the DECLARE section and the value should be assigned to that memory variable. Each parameter in the declaration must have a corresponding value in the open statement. Remember that the parameters of a cursor cannot return values.

Example: Allow insert, update and delete for the table itemmast on the basis of the table itemtran.

1. Create the following tables.

Table name: itemmast

Column name    Data Type   Size   Attributes
itemid         Number      4      Primary key
description    Varchar     20     The item description
bal_stock      Number      3      The balance stock for an item

Table name: itemtran

Column name    Data Type   Size   Attributes
itemid         Number      4      Foreign key via which we shall seek data in the table
description    Varchar     30     Item description
operation      Char        1      The kind of operation on the itemmast table, i.e. Insert, Update, Delete (I, U, D)
qty            Number      3      The qty sold
status         Varchar     30     The status of the operation

Based on the value in the operation column of the itemtran table, records in the itemmast table are either inserted, updated or deleted. On the basis of the success or failure of the insert, update or delete operation, the status column in the itemtran table is updated with appropriate text indicating success or the reason for failure. The following three cases are to be taken care of:

1. If operation = 'I', then the itemid, along with the description and qty, is inserted into the required columns of the itemmast table. If the insert is successful, the status field of the itemtran table is updated to 'Successful'; else it is updated to 'Item Already Exists'.

2. If operation = 'U', then the qty against this operation is added to the bal_stock column of the itemmast table where the itemid of itemmast is the same as that of itemtran. If the update is successful, the status column of the itemtran table is updated to 'Successful'; else it is updated to 'Item Does Not Exist'.

3. If operation = 'D', then a row is deleted from itemmast whose itemid is equal to the itemid in the itemtran table with the operation column having the value 'D'. If the delete is successful, the status column of the itemtran table is updated to 'Successful'; else it is updated to 'Item Does Not Exist'.

The following PL/SQL code takes care of the above three cases.

Declare
  /* Cursor scantable retrieves all the records of table itemtran */
  cursor scantable is
    select itemid, operation, qty, description from itemtran;
  /* Cursor itemchk accepts the value of itemid from the current row of cursor scantable */
  cursor itemchk(mastitemid number) is
    select itemid from itemmast
    where itemid = mastitemid;
  /* variables that hold data from the cursor scantable */
  itemidno number(4);
  descrip varchar2(30);
  oper char(1);
  quantity number(3);
  /* variable that holds data from the cursor itemchk */
  dummyitem number(4);
Begin
  /* open the scantable cursor */
  open scantable;
  loop
    /* fetch the records from the scantable cursor */
    fetch scantable into itemidno, oper, quantity, descrip;
    exit when scantable%notfound;
    /* open the itemchk cursor; the value passed is the itemid
       in the current row of cursor scantable */
    open itemchk(itemidno);
    fetch itemchk into dummyitem;
    if itemchk%notfound then
      /* record not found and the operation is insert:
         insert the new record and set the status to 'Successful' */
      if oper = 'I' then
        insert into itemmast(itemid, bal_stock, description)
        values(itemidno, quantity, descrip);
        update itemtran
        set status = 'Successful'
        where itemid = itemidno;
      /* record not found and the operation is update/delete:
         set the status to 'Item Not Present' */
      elsif oper = 'U' or oper = 'D' then
        update itemtran
        set status = 'Item Not Present'
        where itemid = itemidno;
      end if;
    else
      /* record found and the operation is insert:
         set the status to 'Item Already Exists' */
      if oper = 'I' then
        update itemtran
        set status = 'Item Already Exists'
        where itemid = itemidno;
      /* record found and the operation is update/delete:
         perform the operation and set the status to 'Successful' */
      elsif oper = 'D' then
        delete from itemmast where itemid = itemidno;
        update itemtran
        set status = 'Successful'
        where itemid = itemidno;
      elsif oper = 'U' then
        update itemmast
        set bal_stock = bal_stock + quantity
        where itemid = itemidno;
        update itemtran
        set status = 'Successful'
        where itemid = itemidno;
      end if;
    end if;
    close itemchk;
  end loop;
  close scantable;
  commit;
END;

Points to Ponder

• Oracle uses a work area for its internal SQL processing.

• You can explicitly declare a cursor to process rows individually. A cursor declared by the user is called an explicit cursor.

• Opening the cursor executes the query and identifies the active set, which contains all the rows that meet the query's search criteria.


• The fetch statement retrieves the rows from the active set into variables one at a time.

• The close statement disables the cursor, and the active set becomes undefined.

• Oracle implicitly opens a cursor to process each SQL statement not associated with an explicitly declared cursor.

Review Terms

• Defining a cursor
• Use of a cursor
• Explicit cursor
• Implicit cursor
• Parameterized cursor

Students Activity

1. Define cursors and explain their usage.

2. Differentiate between explicit and implicit cursors with the help of an example.


Student Notes


LESSON 33:

LAB


LESSON 34:

LAB


Lesson Objectives

• Normalisation
• First Normal Form
• Functional dependencies
• Closure of a set of functional dependencies

When deciding upon the structure of data to be stored in a file (or files) or a database, the two main issues to be considered are:
1. removing data duplication from the files/database
2. avoiding data confusion, where different versions of the same information are located in different places in the same file

Data normalisation is the process of determining the correct structure for data in files or databases so that the problems mentioned cannot occur. Data is structured by following a series of steps. Each step removes the potential for a particular problem to occur in the data, e.g. duplication, and each step builds upon the previous steps.

First Normal Form

The first of the normal forms that we study, first normal form, imposes a very basic requirement on relations. A domain is atomic if elements of the domain are considered to be indivisible units. We say that a relation schema R is in first normal form (1NF) if the domains of all attributes of R are atomic.

A set of names is an example of a non-atomic value. For example, if the schema of a relation employee included an attribute children whose domain elements are sets of names, the schema would not be in first normal form. Composite attributes, such as an attribute address with component attributes street and city, also have non-atomic domains.

Integers are assumed to be atomic, so the set of integers is an atomic domain; the set of all sets of integers is a non-atomic domain. The distinction is that we do not normally consider integers to have subparts, but we consider sets of integers to have subparts, namely the integers making up the set. The important issue, however, is not what the domain itself is, but rather how we use domain elements in our database. The domain of all integers would be non-atomic if we considered each integer to be an ordered list of digits.

As a practical illustration of the above point, consider an organization that assigns employees identification numbers of the following form: the first two letters specify the department and the remaining four digits are a unique number within the department for the employee. Examples of such numbers would be CS0012 and EE1127. Such identification numbers can be divided into smaller units, and are therefore non-atomic. If a relation schema had an attribute whose domain consists of identification numbers encoded as above, the schema would not be in first normal form.

First normal form (1NF) sets the very basic rules for an organized database:
• Eliminate duplicative columns from the same table.
• Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
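As a small illustration of the second rule, a set-valued children attribute can be pulled out into its own table so that every stored value is atomic. A minimal Python sketch (the data and names are hypothetical):

```python
# Hypothetical data: the `children` attribute is non-atomic (a set of names),
# so the relation is not in 1NF.
employees = [
    {"emp_id": 1, "name": "Jones", "children": ["Amy", "Ben"]},
    {"emp_id": 2, "name": "Smith", "children": []},
]

# 1NF design: one table of atomic employee facts, plus a separate table
# with one row per (employee, child) pair, keyed by emp_id.
employee_table = [{"emp_id": e["emp_id"], "name": e["name"]} for e in employees]
child_table = [
    {"emp_id": e["emp_id"], "child_name": c}
    for e in employees
    for c in e["children"]
]

print(employee_table)
print(child_table)
```

Each row of child_table now holds a single indivisible value, and the original set can be recovered by grouping on emp_id.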

Functional Dependencies

Functional dependencies play a key role in designing a good database. A functional dependency is a type of constraint that is a generalization of the notion of a key. Functional dependencies are constraints on the set of legal relations. They allow us to express facts about the enterprise that we are modeling with our database.

We define the notion of a superkey as follows. Let R be a relation schema. A subset K of R is a superkey of R if, in any legal relation r(R), for all pairs t1 and t2 of tuples in r such that t1 ≠ t2, t1[K] ≠ t2[K]. That is, no two tuples in any legal relation r(R) may have the same value on attribute set K.

The notion of functional dependency generalizes the notion of superkey. Consider a relation schema R, and let α ⊆ R and β ⊆ R. The functional dependency

α → β

holds on schema R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], it is also the case that t1[β] = t2[β].

Using the functional-dependency notation, we say that K is a superkey of R if K → R. That is, K is a superkey if, whenever t1[K] = t2[K], it is also the case that t1[R] = t2[R] (that is, t1 = t2).

Functional dependencies allow us to express constraints that we cannot express with superkeys. Consider the schema

Loan-info-schema = (loan-number, branch-name, customer-name, amount)

which is a simplification of the Lending-schema that we saw earlier. The set of functional dependencies that we expect to hold on this relation schema is

loan-number → amount
loan-number → branch-name

We would not, however, expect the functional dependency

loan-number → customer-name

to hold, since, in general, a given loan can be made to more than one customer (for example, to both members of a husband-wife pair).

LESSON 35:

NORMALISATION - I


We shall use functional dependencies in two ways:

1. To test relations to see whether they are legal under a given set of functional dependencies. If a relation r is legal under a set F of functional dependencies, we say that r satisfies F.

2. To specify constraints on the set of legal relations. We shall thus concern ourselves with only those relations that satisfy a given set of functional dependencies. If we wish to constrain ourselves to relations on schema R that satisfy a set F of functional dependencies, we say that F holds on R.

To see which functional dependencies are satisfied by the sample relation r below, observe that A → C is satisfied. There are two tuples that have an A value of a1. These tuples have the same C value, namely c1. Similarly, the two tuples with an A value of a2 have the same C value, c2. There are no other pairs of distinct tuples that have the same A value. The functional dependency C → A is not satisfied, however. To see that it is not, consider the tuples t1 = (a2, b3, c2, d3) and t2 = (a3, b3, c2, d4).

A    B    C    D
a1   b1   c1   d1
a1   b2   c1   d2
a2   b2   c2   d2
a2   b3   c2   d3
a3   b3   c2   d4

Sample relation r.

These two tuples have the same C value, c2, but they have different A values, a2 and a3, respectively. Thus, we have found a pair of tuples t1 and t2 such that t1[C] = t2[C], but t1[A] ≠ t2[A].

Many other functional dependencies are satisfied by r, including, for example, the functional dependency AB → D. Note that we use AB as a shorthand for {A, B}, to conform with standard practice. Observe that there is no pair of distinct tuples t1 and t2 such that t1[AB] = t2[AB]. Therefore, if t1[AB] = t2[AB], it must be that t1 = t2 and, thus, t1[D] = t2[D]. So, r satisfies AB → D.

Some functional dependencies are said to be trivial because they are satisfied by all relations. For example, A → A is satisfied by all relations involving attribute A. Reading the definition of functional dependency literally, we see that, for all tuples t1 and t2 such that t1[A] = t2[A], it is the case that t1[A] = t2[A]. Similarly, AB → A is satisfied by all relations involving attribute A. In general, a functional dependency of the form α → β is trivial if β ⊆ α.

To distinguish between the concepts of a relation satisfying a dependency and a dependency holding on a schema, we return to the banking example. If we consider the customer relation shown below, we see that customer-street → customer-city is satisfied. However, we believe that, in the real world, two cities can have streets with the same name. Thus, it is possible, at some time, to have an instance of the customer relation in which customer-street → customer-city is not satisfied. So, we would not include customer-street → customer-city in the set of functional dependencies that hold on Customer-schema.

Customer-name   Customer-street   Customer-city
Jones           Main              Harrison
Smith           North             Rye
Hayes           Main              Harrison
Curry           North             Rye
Lindsay         Park              Pittsfield
Turner          Putnam            Stamford
Williams        Nassau            Princeton
Adams           Spring            Pittsfield
Johnson         Alma              Palo Alto
Glenn           Sand Hill         Woodside
Brooks          Senator           Brooklyn
Green           Walnut            Stamford

The customer relation.

In the loan relation (on Loan-schema), we see that the dependency loan-number → amount is satisfied. In contrast to the case of customer-city and customer-street in Customer-schema, we do believe that the real-world enterprise that we are modeling requires each loan to have only one amount. Therefore, we want to require that loan-number → amount be satisfied by the loan relation at all times. In other words, we require that the constraint loan-number → amount hold on Loan-schema.

Loan-number   Branch-name   Amount
L-17          Downtown      1000
L-23          Redwood       2000
L-15          Perryridge    1500
L-14          Downtown      1500
L-93          Mianus        500
L-11          Round Hill    900
L-29          Pownal        1200
L-16          North Town    1300
L-18          Downtown      2000
L-25          Perryridge    2500
L-10          Brighton      2200

The loan relation.

In the branch relation, we see that branch-name → assets is satisfied, as is assets → branch-name. We want to require that branch-name → assets hold on Branch-schema. However, we do not wish to require that assets → branch-name hold, since it is possible to have several branches that have the same asset value.

Branch-name   Branch-city   Assets
Downtown      Brooklyn      9000000
Redwood       Palo Alto     2100000
Perryridge    Horseneck     1700000
Mianus        Horseneck     400000
Round Hill    Horseneck     8000000
Pownal        Bennington    300000
North Town    Rye           3700000
Brighton      Brooklyn      7100000

The branch relation.

In what follows, we assume that, when we design a relational database, we first list those functional dependencies that must always hold. In the banking example, our list of dependencies includes the following:
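The satisfies-test described above can be made mechanical: a relation instance satisfies α → β exactly when no two tuples agree on α but disagree on β. A Python sketch of this test, run against the sample relation r from earlier in the lesson (the helper name is my own):

```python
from itertools import combinations

def satisfies(rows, lhs, rhs):
    """Return True if the relation `rows` (a list of dicts) satisfies the
    functional dependency lhs -> rhs (lhs, rhs: lists of attribute names)."""
    for t1, t2 in combinations(rows, 2):
        # A violation is a pair agreeing on lhs but disagreeing on rhs.
        if all(t1[a] == t2[a] for a in lhs) and any(t1[b] != t2[b] for b in rhs):
            return False
    return True

# The sample relation r from the text.
r = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1"},
    {"A": "a1", "B": "b2", "C": "c1", "D": "d2"},
    {"A": "a2", "B": "b2", "C": "c2", "D": "d2"},
    {"A": "a2", "B": "b3", "C": "c2", "D": "d3"},
    {"A": "a3", "B": "b3", "C": "c2", "D": "d4"},
]

print(satisfies(r, ["A"], ["C"]))       # A -> C is satisfied
print(satisfies(r, ["C"], ["A"]))       # C -> A is not (t1, t2 above violate it)
print(satisfies(r, ["A", "B"], ["D"]))  # AB -> D is satisfied
```

Note that this only tests one instance; a dependency holding on a schema is the stronger claim that every legal instance satisfies it.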


• On Branch-schema:
  branch-name → branch-city
  branch-name → assets

• On Customer-schema:
  customer-name → customer-city
  customer-name → customer-street

• On Loan-schema:
  loan-number → amount
  loan-number → branch-name

• On Borrower-schema:
  No functional dependencies

• On Account-schema:
  account-number → branch-name
  account-number → balance

• On Depositor-schema:
  No functional dependencies

Closure of a Set of Functional Dependencies

It is not sufficient to consider the given set of functional dependencies. Rather, we need to consider all functional dependencies that hold. We shall see that, given a set F of functional dependencies, we can prove that certain other functional dependencies hold. We say that such functional dependencies are "logically implied" by F.

More formally, given a relational schema R, a functional dependency f on R is logically implied by a set of functional dependencies F on R if every relation instance r(R) that satisfies F also satisfies f. Suppose we are given a relation schema R = (A, B, C, G, H, I) and the set of functional dependencies

A → B
A → C
CG → H
CG → I
B → H

The functional dependency

A → H

is logically implied. That is, we can show that, whenever our given set of functional dependencies holds on a relation, A → H must also hold on the relation. Suppose that t1 and t2 are tuples such that

t1[A] = t2[A]

Since we are given that A → B, it follows from the definition of functional dependency that

t1[B] = t2[B]

Then, since we are given that B → H, it follows from the definition of functional dependency that

t1[H] = t2[H]

Therefore, we have shown that, whenever t1 and t2 are tuples such that t1[A] = t2[A], it must be that t1[H] = t2[H]. But that is exactly the definition of A → H.

Let F be a set of functional dependencies. The closure of F, denoted by F+, is the set of all functional dependencies logically implied by F. Given F, we can compute F+ directly from the formal definition of functional dependency. If F were large, this process would be lengthy and difficult. Such a computation of F+ requires arguments of the type just used to show that A → H is in the closure of our example set of dependencies.

Axioms, or rules of inference, provide a simpler technique for reasoning about functional dependencies. In the rules that follow, we use Greek letters (α, β, γ, ...) for sets of attributes, and uppercase Roman letters from the beginning of the alphabet for individual attributes. We use αβ to denote α ∪ β.

We can use the following three rules to find logically implied functional dependencies. By applying these rules repeatedly, we can find all of F+ given F. This collection of rules is called Armstrong's axioms in honor of the person who first proposed it.

• Reflexivity rule. If α is a set of attributes and β ⊆ α, then α → β holds.

• Augmentation rule. If α → β holds and γ is a set of attributes, then γα → γβ holds.

• Transitivity rule. If α → β holds and β → γ holds, then α → γ holds.

Armstrong's axioms are sound, because they do not generate any incorrect functional dependencies. They are complete, because, for a given set F of functional dependencies, they allow us to generate all of F+. The bibliographical notes provide references for proofs of soundness and completeness.

Although Armstrong's axioms are complete, it is tiresome to use them directly for the computation of F+. To simplify matters further, we list additional rules. It is possible to use Armstrong's axioms to prove that these rules are correct.

• Union rule. If α → β holds and α → γ holds, then α → βγ holds.

• Decomposition rule. If α → βγ holds, then α → β holds and α → γ holds.

• Pseudotransitivity rule. If α → β holds and γβ → δ holds, then αγ → δ holds.

Let us apply our rules to the example of schema R = (A, B, C, G, H, I) and the set F of functional dependencies {A → B, A → C, CG → H, CG → I, B → H}. We list several members of F+ here:

• A → H. Since A → B and B → H hold, we apply the transitivity rule. Observe that it was much easier to use Armstrong's axioms to show that A → H holds than it was to argue directly from the definitions, as we did earlier in this section.

• CG → HI. Since CG → H and CG → I, the union rule implies that CG → HI.

• AG → I. Since A → C and CG → I, the pseudotransitivity rule implies that AG → I holds.

Another way of finding that AG → I holds is as follows. We use the augmentation rule on A → C to infer AG → CG.


Applying the transitivity rule to this dependency and CG → I, we infer AG → I.
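Memberships like these are usually checked mechanically with the attribute-closure algorithm: α → β is in F+ exactly when β is contained in the closure of α under F. A Python sketch of this standard technique (the helper names are my own):

```python
def attribute_closure(attrs, fds):
    """Compute the closure of the attribute set `attrs` under the
    functional dependencies `fds`, given as (lhs, rhs) frozenset pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the closure covers lhs, every attribute of rhs follows.
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

# F = {A -> B, A -> C, CG -> H, CG -> I, B -> H} from the text.
F = [
    (frozenset("A"), frozenset("B")),
    (frozenset("A"), frozenset("C")),
    (frozenset("CG"), frozenset("H")),
    (frozenset("CG"), frozenset("I")),
    (frozenset("B"), frozenset("H")),
]

# alpha -> beta is in F+ exactly when beta is a subset of alpha's closure.
print(sorted(attribute_closure({"A"}, F)))                  # H appears: A -> H
print(frozenset("HI") <= attribute_closure({"C", "G"}, F))  # CG -> HI
print(frozenset("I") <= attribute_closure({"A", "G"}, F))   # AG -> I
```

The same routine doubles as a superkey test: α is a superkey of R exactly when the closure of α is all of R.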

Points To Ponder

• Normalisation is the process of determining the correct structure for data in files or databases.

• A relation schema R is in first normal form (1NF) if the domains of all attributes of R are atomic.

• Functional dependencies play a key role in designing a good database. A functional dependency is a type of constraint that is a generalization of the notion of a key.

• A bad database design suggests that we should decompose a relation schema that has many attributes into several schemas with fewer attributes.

Review Terms

• Normalisation
• First Normal Form
• Functional dependencies
• Closure of a set of functional dependencies

Students Activity

1. Define normalisation.

2. How can normalisation reduce the redundancy of data?

3. Define 1NF.

4. Define functional dependency.

5. Define the closure of a set of functional dependencies.


Student Notes


Lesson Objectives

• Decomposition
• Properties of decomposition
• Lossless-join decomposition
• 2NF

Decomposition

A bad database design suggests that we should decompose a relation schema that has many attributes into several schemas with fewer attributes. Careless decomposition, however, may lead to another form of bad design. Consider an alternative design in which we decompose Lending-schema into the following two schemas:

Branch-customer-schema = (branch-name, branch-city, assets, customer-name)
Customer-loan-schema = (customer-name, loan-number, amount)

Using the lending relation of Figure 7.1, we construct our new relations branch-customer (on Branch-customer-schema) and customer-loan (on Customer-loan-schema):

branch-customer = Π branch-name, branch-city, assets, customer-name (lending)
customer-loan = Π customer-name, loan-number, amount (lending)

Of course, there are cases in which we need to reconstruct the loan relation. For example, suppose that we wish to find all branches that have loans with amounts less than $1000. No relation in our alternative database contains these data. We need to reconstruct the lending relation. It appears that we can do so by writing

branch-customer ⋈ customer-loan

Branch-name   Branch-city   Assets    Customer-name
Downtown      Brooklyn      9000000   Jones
Redwood       Palo Alto     2100000   Smith
Perryridge    Horseneck     1700000   Hayes
Downtown      Brooklyn      9000000   Jackson
Mianus        Horseneck     400000    Jones
Round Hill    Horseneck     8000000   Turner
Pownal        Bennington    300000    Williams
North Town    Rye           3700000   Hayes
Downtown      Brooklyn      9000000   Johnson
Perryridge    Horseneck     1700000   Glenn
Brighton      Brooklyn      7100000   Brooks

LESSON 36:

NORMALISATION - II

The branch-customer relation.

The customer-loan relation and the result of computing branch-customer ⋈ customer-loan are shown below. When we compare the joined relation and the lending relation with which we started, we notice a difference: although every tuple that appears in the lending relation appears in branch-customer ⋈ customer-loan, there are tuples in branch-customer ⋈ customer-loan that are not in lending. In our example, branch-customer ⋈ customer-loan has the following additional tuples:

(Downtown, Brooklyn, 9000000, Jones, L-93, 500)
(Perryridge, Horseneck, 1700000, Hayes, L-16, 1300)
(Mianus, Horseneck, 400000, Jones, L-17, 1000)
(North Town, Rye, 3700000, Hayes, L-15, 1500)

Consider the query "Find all bank branches that have made a loan in an amount less than $1000." We see that the only branches with loan amounts less than $1000 are Mianus and Round Hill. However, when we apply the expression

Π branch-name (σ amount < 1000 (branch-customer ⋈ customer-loan))

we obtain three branch names: Mianus, Round Hill, and Downtown.

A closer examination of this example shows why. If a customer happens to have several loans from different branches, we cannot tell which loan belongs to which branch. Thus, when we join branch-customer and customer-loan, we obtain not only the tuples we had originally in lending, but also several additional tuples. Although we have more tuples in branch-customer ⋈ customer-loan, we actually have less information. We are no longer able, in general, to represent in the database information about which customers are borrowers from which branch. Because of this loss of information, we call the decomposition of Lending-schema into Branch-customer-schema and Customer-loan-schema a lossy decomposition, or a lossy-join decomposition. A decomposition that is not a lossy-join decomposition is a lossless-join decomposition.

Customer-name   Loan-number   Amount
Jones           L-17          1000
Smith           L-23          2000
Hayes           L-15          1500
Jackson         L-14          1500
Jones           L-93          500
Turner          L-11          900
Williams        L-29          1200
Hayes           L-16          1300
Johnson         L-18          2000
Glenn           L-25          2500
Brooks          L-10          2200

The customer-loan relation.


It should be clear from our example that a lossy-join decomposition is, in general, a bad database design.

Why is the decomposition lossy? There is one attribute in common between Branch-customer-schema and Customer-loan-schema:

Branch-customer-schema ∩ Customer-loan-schema = {customer-name}

The only way that we can represent a relationship between, for example, loan-number and branch-name is through customer-name. This representation is not adequate because a customer may have several loans, yet these loans are not necessarily obtained from the same branch.

Let us consider another alternative design, in which we decompose Lending-schema into the following two schemas:

Branch-schema = (branch-name, branch-city, assets)
Loan-info-schema = (branch-name, customer-name, loan-number, amount)

There is one attribute in common between these two schemas:

Branch-schema ∩ Loan-info-schema = {branch-name}

Thus, the only way that we can represent a relationship between, for example, customer-name and assets is through branch-name. The difference between this example and the preceding one is that the assets of a branch are the same, regardless of the customer to which we are referring, whereas the lending branch associated with a certain loan amount does depend on the customer to which we are referring. For a given branch-name, there is exactly one assets value and exactly one branch-city, whereas a similar statement cannot be made for customer-name. That is, the functional dependency

branch-name → branch-city assets

holds, but customer-name does not functionally determine loan-number.

The notion of lossless joins is central to much of relational-database design. Therefore, we restate the preceding examples more concisely and more formally. Let R be a relation schema. A set of relation schemas {R1, R2, ..., Rn} is a decomposition of R if

R = R1 ∪ R2 ∪ ... ∪ Rn

Branch-name   Branch-city   Assets    Customer-name   Loan-number   Amount
Downtown      Brooklyn      9000000   Jones           L-17          1000
Downtown      Brooklyn      9000000   Jones           L-93          500
Redwood       Palo Alto     2100000   Smith           L-23          2000
Perryridge    Horseneck     1700000   Hayes           L-15          1500
Perryridge    Horseneck     1700000   Hayes           L-16          1300
Downtown      Brooklyn      9000000   Jackson         L-14          1500
Mianus        Horseneck     400000    Jones           L-17          1000
Mianus        Horseneck     400000    Jones           L-93          500
Round Hill    Horseneck     8000000   Turner          L-11          900
Pownal        Bennington    300000    Williams        L-29          1200
North Town    Rye           3700000   Hayes           L-15          1500
North Town    Rye           3700000   Hayes           L-16          1300
Downtown      Brooklyn      9000000   Johnson         L-18          2000
Perryridge    Horseneck     1700000   Glenn           L-25          2500
Brighton      Brooklyn      7100000   Brooks          L-10          2200

The relation branch-customer ⋈ customer-loan.

That is, {R1, R2, ..., Rn} is a decomposition of R if, for i = 1, 2, ..., n, each Ri is a subset of R, and every attribute in R appears in at least one Ri.

Let r be a relation on schema R, and let ri = Π Ri (r) for i = 1, 2, ..., n. That is, {r1, r2, ..., rn} is the database that results from decomposing R into {R1, R2, ..., Rn}. It is always the case that

r ⊆ r1 ⋈ r2 ⋈ ... ⋈ rn

To see that this assertion is true, consider a tuple t in relation r. When we compute the relations r1, r2, ..., rn, the tuple t gives rise to one tuple ti in each ri, i = 1, 2, ..., n. These n tuples combine to regenerate t when we compute r1 ⋈ r2 ⋈ ... ⋈ rn. The details are left for you to complete as an exercise. Therefore, every tuple in r appears in r1 ⋈ r2 ⋈ ... ⋈ rn.

In general, however, r ≠ r1 ⋈ r2 ⋈ ... ⋈ rn. As an illustration, consider our earlier example, in which:

• n = 2
• R = Lending-schema
• R1 = Branch-customer-schema
• R2 = Customer-loan-schema
• r = the lending relation shown in Figure 7.1
• r1 = the relation shown in Figure 7.2
• r2 = the relation shown in Figure 7.10
• r1 ⋈ r2 = the relation shown in Figure 7.11

To have a lossless-join decomposition, we need to impose constraints on the set of possible relations. We found that the decomposition of Lending-schema into Branch-schema and Loan-info-schema is lossless because the functional dependency

branch-name → branch-city assets

holds on Branch-schema.

We shall show how to test whether a decomposition is a lossless-join decomposition in the next few sections. A major part of this chapter deals with the questions of how to specify constraints on the database, and how to obtain lossless-join decompositions that avoid the pitfalls represented by the examples of bad database designs that we have seen in this section.
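The lossy-join behaviour described above can be reproduced in a few lines. A Python sketch using a two-tuple subset of the lending relation, with the projections and the natural join written by hand:

```python
# Small subset of the lending relation:
# (branch-name, branch-city, assets, customer-name, loan-number, amount)
lending = [
    ("Mianus", "Horseneck", 400000, "Jones", "L-93", 500),
    ("Downtown", "Brooklyn", 9000000, "Jones", "L-17", 1000),
]

# branch-customer = project lending onto (branch-name, branch-city, assets, customer-name)
branch_customer = {(b, c, a, cust) for (b, c, a, cust, _, _) in lending}
# customer-loan = project lending onto (customer-name, loan-number, amount)
customer_loan = {(cust, ln, amt) for (_, _, _, cust, ln, amt) in lending}

# Natural join on the single shared attribute, customer-name.
joined = {
    (b, c, a, cust, ln, amt)
    for (b, c, a, cust) in branch_customer
    for (cust2, ln, amt) in customer_loan
    if cust == cust2
}

print(len(lending), len(joined))      # 2 original tuples, 4 after the join
print(sorted(joined - set(lending)))  # the spurious tuples
```

Jones has loans at two branches, so the join pairs each of Jones's branch rows with each of Jones's loan rows, producing two spurious tuples; every original tuple survives (r ⊆ r1 ⋈ r2), but equality fails.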

Desirable Properties of Decomposition

We can use a given set of functional dependencies in designing a relational database in which most of the undesirable properties do not occur. When we design such systems, it may become necessary to decompose a relation into several smaller relations. In this section, we outline the desirable properties of a decomposition of a relational schema. In later sections, we outline specific ways of decomposing a relational schema to get the properties we desire. We illustrate our concepts with the Lending-schema:

Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount)

The set F of functional dependencies that we require to hold on Lending-schema is

branch-name → branch-city assets
loan-number → amount branch-name

Page 114: DATABSE MANAGEMENT - NIILM University DATABASE MANAGEMENT Lesson No. Topic Page No. Lesson 31 LAB 90 Lesson 32 Database Cursors 91 Lessom 33 LAB 100 Lesson 34 LAB 101 Lesson 35 Normalisation


Lending-schema is an example of a bad database design. Assume that we decompose it into the following three relations:

Branch-schema = (branch-name, branch-city, assets)
Loan-schema = (loan-number, branch-name, amount)
Borrower-schema = (customer-name, loan-number)

We claim that this decomposition has several desirable properties, which we discuss next.

Lossless-join Decomposition

When we decompose a relation into a number of smaller relations, it is crucial that the decomposition be lossless. We must first present a criterion for determining whether a decomposition is lossy. Let R be a relation schema, and let F be a set of functional dependencies on R. Let R1 and R2 form a decomposition of R. This decomposition is a lossless-join decomposition of R if at least one of the following functional dependencies is in F+:

R1 ∩ R2 → R1
R1 ∩ R2 → R2

In other words, if R1 ∩ R2 forms a superkey of either R1 or R2, the decomposition of R is a lossless-join decomposition. We can use attribute closure to efficiently test for superkeys, as we have seen earlier. We now demonstrate that our decomposition of Lending-schema is a lossless-join decomposition by showing a sequence of steps that generate the decomposition. We begin by decomposing Lending-schema into two schemas:

Branch-schema = (branch-name, branch-city, assets)
Loan-info-schema = (branch-name, customer-name, loan-number, amount)

Since branch-name → branch-city assets, the augmentation rule for functional dependencies implies that

branch-name → branch-name branch-city assets

Since Branch-schema ∩ Loan-info-schema = {branch-name}, it follows that our initial decomposition is a lossless-join decomposition. Next, we decompose Loan-info-schema into

Loan-schema = (loan-number, branch-name, amount)
Borrower-schema = (customer-name, loan-number)

This step results in a lossless-join decomposition, since loan-number is a common attribute and loan-number → amount branch-name. For the general case of decomposition of a relation into multiple parts at once, the test for lossless-join decomposition is more complicated. See the bibliographical notes for references on the topic. While the test for binary decomposition is clearly a sufficient condition for lossless join, it is a necessary condition only if all constraints are functional dependencies. We shall see other types of constraints later (in particular, a type of constraint called multivalued dependencies) that can ensure that a decomposition is lossless join even if no functional dependencies are present.
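The attribute-closure-based superkey test can be sketched in a few lines. The representation of FDs as (left, right) pairs of frozensets is an assumption chosen here for illustration:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under a list of (lhs, rhs) functional dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def lossless_binary(R1, R2, fds):
    """Binary lossless-join test: R1 ∩ R2 must be a superkey of R1 or of R2."""
    c = closure(R1 & R2, fds)
    return R1 <= c or R2 <= c

fds = [(frozenset({"branch-name"}), frozenset({"branch-city", "assets"}))]
Branch   = frozenset({"branch-name", "branch-city", "assets"})
LoanInfo = frozenset({"branch-name", "customer-name", "loan-number", "amount"})

# {branch-name} is the common attribute and a superkey of Branch-schema.
assert lossless_binary(Branch, LoanInfo, fds)
```

With no functional dependencies at all, the same test reports a lossy decomposition, since the shared attributes then determine nothing beyond themselves.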

Second Normal Form

The general requirements of 2NF are:

• Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
• Create relationships between these new tables and their predecessors through the use of foreign keys.

These rules can be summarized in a simple statement: 2NF attempts to reduce the amount of redundant data in a table by extracting it, placing it in new table(s) and creating relationships between those tables. Let's look at an example. Imagine an online store that maintains customer information in a database. Their Customers table might look something like this:

A brief look at this table reveals a small amount of redundant data. We're storing the "Sea Cliff, NY 11579" and "Miami, FL 33157" entries twice each. Now, that might not seem like too much added storage in our simple example, but imagine the wasted space if we had thousands of rows in our table. Additionally, if the ZIP code for Sea Cliff were to change, we'd need to make that change in many places throughout the database. In a 2NF-compliant database structure, this redundant information is extracted and stored in a separate table. Our new table (let's call it ZIPs) might look like this:

ZIP    City        State
11579  Sea Cliff   NY
33157  Miami       FL
46637  South Bend  IN

If we want to be super-efficient, we can even fill this table in advance: the post office provides a directory of all valid ZIP codes and their city/state relationships. Surely, you've encountered a situation where this type of database was utilized. Someone taking an order might have asked you for your ZIP code first and then knew the city and state you were calling from. This type of arrangement reduces operator error and increases efficiency. Now that we've removed the duplicative data from the Customers table, we've satisfied the first rule of second normal form. We still need to use a foreign key to tie the two tables together. We'll use the ZIP code (the primary key from the ZIPs table) to create that relationship. Here's our new Customers table:

CustNum  FirstName  LastName  Address          ZIP
1        John       Doe       12 Main Street   11579
2        Alan       Johnson   82 Evergreen Tr  11579
3        Beth       Thompson  1912 NE 1st St   33157
4        Jacob      Smith     142 Irish Way    46637
5        Sue        Ryan      412 NE 1st St    33157

For comparison, the original Customers table referred to above, before normalization:

CustNum  FirstName  LastName  Address          City        State  ZIP
1        John       Doe       12 Main Street   Sea Cliff   NY     11579
2        Alan       Johnson   82 Evergreen Tr  Sea Cliff   NY     11579
3        Beth       Thompson  1912 NE 1st St   Miami       FL     33157
4        Jacob      Smith     142 Irish Way    South Bend  IN     46637
5        Sue        Ryan      412 NE 1st St    Miami       FL     33157
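The foreign-key relationship can be sketched in a few lines. The rows below are a trimmed-down, purely illustrative version of the article's tables, not a real database API:

```python
# ZIPs table: primary key ZIP determines City and State.
zips = {
    "11579": ("Sea Cliff", "NY"),
    "33157": ("Miami", "FL"),
    "46637": ("South Bend", "IN"),
}

# The 2NF Customers table stores only the ZIP foreign key.
customers = [
    {"CustNum": 1, "FirstName": "John", "LastName": "Doe", "ZIP": "11579"},
    {"CustNum": 3, "FirstName": "Beth", "LastName": "Thompson", "ZIP": "33157"},
]

# A join re-derives City/State on demand instead of storing them per customer.
joined = [{**c, "City": zips[c["ZIP"]][0], "State": zips[c["ZIP"]][1]}
          for c in customers]
assert joined[0]["City"] == "Sea Cliff"

# Changing a ZIP's city now happens in exactly one place.
zips["11579"] = ("Sea Cliff Village", "NY")
```

Because city and state live only in the ZIPs table, every customer row automatically reflects the update the next time the join is computed.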


We've now minimized the amount of redundant information stored within the database, and our structure is in second normal form.

Points to Ponder

• The bad design of a database suggests that we should decompose a relation schema that has many attributes into several schemas with fewer attributes.

• When we decompose a relation into a number of smaller relations, it is crucial that the decomposition be lossless.

• 2NF attempts to reduce the amount of redundant data in a table by extracting it, placing it in new table(s) and creating relationships between those tables.

• We can use a given set of functional dependencies in designing a relational database in which most of the undesirable properties do not occur.

• 2NF removes subsets of data that apply to multiple rows of a table and places them in separate tables.

Review Terms

• Decomposition
• Properties of decomposition
• Lossless-join decomposition
• 2NF

Students Activity

1. Define decomposition. Why is it required?

2. Describe the desirable properties of decomposition.

3. Define lossless-join decomposition.

4. Define 2NF with the help of an example.


Student Notes


LESSON 37:
NORMALISATION - III

Lesson Objectives

• BCNF
• 3NF
• Comparison of BCNF & 3NF
• 4NF

Boyce-Codd Normal Form

Using functional dependencies, we can define several normal forms that represent "good" database designs.

Definition

One of the more desirable normal forms that we can obtain is Boyce-Codd normal form (BCNF). A relation schema R is in BCNF with respect to a set F of functional dependencies if, for all functional dependencies in F+ of the form α → β, where α ⊆ R and β ⊆ R, at least one of the following holds:

• α → β is a trivial functional dependency (that is, β ⊆ α).
• α is a superkey for schema R.

A database design is in BCNF if each member of the set of relation schemas that constitutes the design is in BCNF. As an illustration, consider the following relation schemas and their respective functional dependencies:

• Customer-schema = (customer-name, customer-street, customer-city)
  customer-name → customer-street customer-city

• Branch-schema = (branch-name, assets, branch-city)
  branch-name → assets branch-city

• Loan-info-schema = (branch-name, customer-name, loan-number, amount)
  loan-number → amount branch-name

We claim that Customer-schema is in BCNF. We note that a candidate key for the schema is customer-name. The only nontrivial functional dependencies that hold on Customer-schema have customer-name on the left side of the arrow. Since customer-name is a candidate key, functional dependencies with customer-name on the left side do not violate the definition of BCNF. Similarly, it can be shown easily that the relation schema Branch-schema is in BCNF. The schema Loan-info-schema, however, is not in BCNF. First, note that loan-number is not a superkey for Loan-info-schema, since we could have a pair of tuples representing a single loan made to two people, for example:

(Downtown, John Bell, L-44, 1000)
(Downtown, Jane Bell, L-44, 1000)

Because we did not list functional dependencies that rule out the preceding case, loan-number is not a candidate key. However, the functional dependency loan-number → amount is nontrivial. Therefore, Loan-info-schema does not satisfy the definition of BCNF.

We claim that Loan-info-schema is not in a desirable form, since it suffers from the problem of repetition of information. We observe that, if there are several customer names associated with a loan, in a relation on Loan-info-schema, then we are forced to repeat the branch name and the amount once for each customer. We can eliminate this redundancy by redesigning our database such that all schemas are in BCNF. One approach to this problem is to take the existing non-BCNF design as a starting point, and to decompose those schemas that are not in BCNF. Consider the decomposition of Loan-info-schema into two schemas:

Loan-schema = (loan-number, branch-name, amount)
Borrower-schema = (customer-name, loan-number)

This decomposition is a lossless-join decomposition. To determine whether these schemas are in BCNF, we need to determine what functional dependencies apply to them. In this example, it is easy to see that

loan-number → amount branch-name

applies to Loan-schema, and that only trivial functional dependencies apply to Borrower-schema. Although loan-number is not a superkey for Loan-info-schema, it is a candidate key for Loan-schema. Thus, both schemas of our decomposition are in BCNF. It is now possible to avoid redundancy in the case where there are several customers associated with a loan. There is exactly one tuple for each loan in the relation on Loan-schema, and one tuple for each customer of each loan in the relation on Borrower-schema. Thus, we do not have to repeat the branch name and the amount once for each customer associated with a loan. Often, testing of a relation to see if it satisfies BCNF can be simplified:

• To check if a nontrivial dependency α → β causes a violation of BCNF, compute α+ (the attribute closure of α), and verify that it includes all attributes of R; that is, that it is a superkey of R.
• To check if a relation schema R is in BCNF, it suffices to check only the dependencies in the given set F for violation of BCNF, rather than check all dependencies in F+.

We can show that if none of the dependencies in F causes a violation of BCNF, then none of the dependencies in F+ will cause a violation of BCNF either. Unfortunately, the latter procedure does not work when a relation is decomposed. That is, it does not suffice to use F when we test a relation Ri in a decomposition of R for violation of BCNF. For example, consider relation schema R(A, B, C, D, E), with functional dependencies F containing A → B and BC → D. Suppose this were decomposed into R1(A, B) and R2(A, C, D, E). Now, neither of the dependencies in F contains only attributes from (A, C, D, E), so we might be misled into thinking R2 satisfies BCNF. In fact, there is a dependency AC → D in F+ (which can be inferred using the pseudotransitivity rule from the two dependencies in F), which shows that R2 is not in BCNF. Thus, we may need a dependency that is in F+, but is not in F, to show that a decomposed relation is not in BCNF. An alternative BCNF test is sometimes easier than computing every dependency in F+. To check if a relation Ri in a decomposition of R is in BCNF, we apply this test: for every subset α of attributes in Ri, check that α+ (the attribute closure of α under F) either includes no attribute of Ri − α, or includes all attributes of Ri. If the condition is violated by some set of attributes α in Ri, consider the following functional dependency, which can be shown to be present in F+:

α → (α+ − α) ∩ Ri

The above dependency shows that Ri violates BCNF, and is a "witness" for the violation. The BCNF decomposition algorithm, which we shall see in Section 7.6.2, makes use of the witness.
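The witness test can be sketched by enumerating subsets α of Ri and computing their closures under F. The closure helper and the encoding of FDs as frozenset pairs are illustrative assumptions, not code from the text:

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure under a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_witness(Ri, fds):
    """Return a witness (α, (α+ − α) ∩ Ri) showing Ri violates BCNF, or None."""
    for size in range(1, len(Ri)):
        for combo in combinations(sorted(Ri), size):
            alpha = set(combo)
            plus = closure(alpha, fds)
            gained = (plus - alpha) & Ri
            if gained and not Ri <= plus:   # neither trivial nor a superkey
                return frozenset(alpha), frozenset(gained)
    return None

# The text's example: R2(A, C, D, E) with F = {A -> B, BC -> D}.
F = [(frozenset("A"), frozenset("B")), (frozenset("BC"), frozenset("D"))]
witness = bcnf_witness({"A", "C", "D", "E"}, F)
assert witness == (frozenset({"A", "C"}), frozenset({"D"}))   # AC -> D is the witness
```

The search correctly finds AC → D even though neither dependency in F mentions only attributes of R2, which is the point of the alternative test.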

Third Normal Form

As we saw earlier, there are relational schemas where a BCNF decomposition cannot be dependency preserving. For such schemas, we have two alternatives if we wish to check if an update violates any functional dependencies:

• Pay the extra cost of computing joins to test for violations.
• Use an alternative decomposition, third normal form (3NF), which we present below, and which makes testing of updates cheaper. Unlike BCNF, 3NF decompositions may contain some redundancy in the decomposed schema.

We shall see that it is always possible to find a lossless-join, dependency-preserving decomposition that is in 3NF. Which of the two alternatives to choose is a design decision to be made by the database designer on the basis of the application requirements.

Definition

BCNF requires that all nontrivial dependencies be of the form α → β, where α is a superkey. 3NF relaxes this constraint slightly by allowing nontrivial functional dependencies whose left side is not a superkey. A relation schema R is in third normal form (3NF) with respect to a set F of functional dependencies if, for all functional dependencies in F+ of the form α → β, where α ⊆ R and β ⊆ R, at least one of the following holds:

• α → β is a trivial functional dependency.
• α is a superkey for R.
• Each attribute A in β − α is contained in a candidate key for R.

Note that the third condition above does not say that a single candidate key should contain all the attributes in β − α; each attribute A in β − α may be contained in a different candidate key. The first two alternatives are the same as the two alternatives in the definition of BCNF. The third alternative of the 3NF definition seems rather unintuitive, and it is not obvious why it is useful. It represents, in some sense, a minimal relaxation of the BCNF conditions that helps ensure that every schema has a dependency-preserving decomposition into 3NF. Its purpose will become clearer later, when we study decomposition into 3NF.

Observe that any schema that satisfies BCNF also satisfies 3NF, since each of its functional dependencies would satisfy one of the first two alternatives. BCNF is therefore a more restrictive constraint than 3NF. The definition of 3NF allows certain functional dependencies that are not allowed in BCNF. A dependency α → β that satisfies only the third alternative of the 3NF definition is not allowed in BCNF, but is allowed in 3NF.

Let us return to our Banker-schema example (Section 7.6). We have shown that this relation schema does not have a dependency-preserving, lossless-join decomposition into BCNF. This schema, however, turns out to be in 3NF. To see that it is, we note that {customer-name, branch-name} is a candidate key for Banker-schema, so the only attribute not contained in a candidate key for Banker-schema is banker-name. The only nontrivial functional dependencies of the form

α → banker-name

include {customer-name, branch-name} as part of α. Since {customer-name, branch-name} is a candidate key, these dependencies do not violate the definition of 3NF.

As an optimization when testing for 3NF, we can consider only functional dependencies in the given set F, rather than in F+. Also, we can decompose the dependencies in F so that their right-hand side consists of only single attributes, and use the resultant set in place of F. Given a dependency α → β, we can use the same attribute-closure-based technique that we used for BCNF to check if α is a superkey. If α is not a superkey, we have to verify whether each attribute in β is contained in a candidate key of R; this test is rather more expensive, since it involves finding candidate keys. In fact, testing for 3NF has been shown to be NP-hard; thus, it is very unlikely that there is a polynomial-time algorithm for the task.
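Since the expensive part is finding candidate keys, a brute-force sketch (exponential, so workable only for tiny schemas) illustrates the test. It checks only the given set F, per the optimization above; the representation of schemas and FDs is the same illustrative one used earlier:

```python
from itertools import combinations

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(R, fds):
    """All minimal attribute sets whose closure is R (brute force)."""
    keys = []
    for size in range(1, len(R) + 1):
        for combo in combinations(sorted(R), size):
            s = set(combo)
            if R <= closure(s, fds) and not any(k <= s for k in keys):
                keys.append(frozenset(s))
    return keys

def is_3nf(R, fds):
    """Check the three 3NF alternatives for each FD in the given set."""
    prime = set().union(*candidate_keys(R, fds))  # attributes in some candidate key
    for lhs, rhs in fds:
        extra = rhs - lhs
        if not extra:
            continue                      # trivial dependency
        if R <= closure(lhs, fds):
            continue                      # lhs is a superkey
        if not extra <= prime:
            return False                  # some attribute of beta − alpha is non-prime
    return True

# Banker-schema: in 3NF (branch-name is prime) although not in BCNF.
R = {"customer-name", "branch-name", "banker-name"}
F = [(frozenset({"banker-name"}), frozenset({"branch-name"})),
     (frozenset({"customer-name", "branch-name"}), frozenset({"banker-name"}))]
assert is_3nf(R, F)
```

The dependency banker-name → branch-name passes only via the third alternative: branch-name belongs to the candidate key {customer-name, branch-name}.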

Comparison of BCNF and 3NF

Of the two normal forms for relational-database schemas, 3NF and BCNF, there are advantages to 3NF in that we know that it is always possible to obtain a 3NF design without sacrificing a lossless join or dependency preservation. Nevertheless, there are disadvantages to 3NF: if we do not eliminate all transitive dependencies, we may have to use null values to represent some of the possible meaningful relationships among data items, and there is the problem of repetition of information. As an illustration of the null value problem, consider again the Banker-schema and its associated functional dependencies. Since banker-name → branch-name, we may want to represent relationships between values for banker-name and values for branch-name in our database. If we are to do so, however, either there must be a corresponding value for customer-name, or we must use a null value for the attribute customer-name.


Customer-name  Banker-name  Branch-name
Jones          Johnson      Perryridge
Smith          Johnson      Perryridge
Hayes          Johnson      Perryridge
Jackson        Johnson      Perryridge
Curry          Johnson      Perryridge
Turner         Johnson      Perryridge

An instance of Banker-schema.

As an illustration of the repetition of information problem, consider the instance of Banker-schema above. Notice that the information indicating that Johnson is working at the Perryridge branch is repeated. Recall that our goals of database design with functional dependencies are:

1. BCNF
2. Lossless join
3. Dependency preservation

Since it is not always possible to satisfy all three, we may be forced to choose between BCNF and dependency preservation with 3NF. It is worth noting that SQL does not provide a way of specifying functional dependencies, except for the special case of declaring superkeys by using the primary key or unique constraints. It is possible, although a little complicated, to write assertions that enforce a functional dependency; unfortunately, testing the assertions would be very expensive in most database systems. Thus, even if we had a dependency-preserving decomposition, if we use standard SQL we would not be able to test efficiently a functional dependency whose left-hand side is not a key. Although testing functional dependencies may involve a join if the decomposition is not dependency preserving, we can reduce the cost by using materialized views, which many database systems support. Given a BCNF decomposition that is not dependency preserving, we consider each dependency in a minimum cover Fc that is not preserved in the decomposition. For each such dependency α → β, we define a materialized view that computes a join of all relations in the decomposition, and projects the result on αβ. The functional dependency can easily be tested on the materialized view, by means of a constraint unique(α). On the negative side, there is a space and time overhead due to the materialized view, but on the positive side, the application programmer need not worry about writing code to keep redundant data consistent on updates; it is the job of the database system to maintain the materialized view, that is, to keep it up to date when the database is updated. Thus, in case we are not able to get a dependency-preserving BCNF decomposition, it is generally preferable to opt for BCNF, and use techniques such as materialized views to reduce the cost of checking functional dependencies.
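Testing whether a functional dependency holds on a relation instance (for example, on such a joined view) is a single scan with a dictionary. The dict-based encoding of tuples is an illustrative assumption:

```python
def fd_holds(r, lhs, rhs):
    """Does the FD lhs -> rhs hold in relation instance r? (equality-generating check)"""
    seen = {}
    for t in r:
        key = tuple(t[a] for a in sorted(lhs))
        val = tuple(t[a] for a in sorted(rhs))
        if seen.setdefault(key, val) != val:
            return False                 # same left-side value, different right-side values
    return True

banker = [{"customer": "Jones", "banker": "Johnson", "branch": "Perryridge"},
          {"customer": "Smith", "banker": "Johnson", "branch": "Perryridge"}]

assert fd_holds(banker, {"banker"}, {"branch"})        # banker-name -> branch-name
assert not fd_holds(banker, {"banker"}, {"customer"})  # violated: two customers
```

This is exactly the check a unique(α) constraint on the materialized view performs for us inside the database.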

Fourth Normal Form

Some relation schemas, even though they are in BCNF, do not seem to be sufficiently normalized, in the sense that they still suffer from the problem of repetition of information.

Consider again our banking example. Assume that, in an alternative design for the bank database schema, we have the schema

BC-schema = (loan-number, customer-name, customer-street, customer-city)

The astute reader will recognize this schema as a non-BCNF schema because of the functional dependency

customer-name → customer-street customer-city

that we asserted earlier, and because customer-name is not a key for BC-schema. However, assume that our bank is attracting wealthy customers who have several addresses (say, a winter home and a summer home). Then, we no longer wish to enforce the functional dependency customer-name → customer-street customer-city. If we remove this functional dependency, we find BC-schema to be in BCNF with respect to our modified set of functional dependencies. Yet, even though BC-schema is now in BCNF, we still have the problem of repetition of information that we had earlier. To deal with this problem, we must define a new form of constraint, called a multivalued dependency. As we did for functional dependencies, we shall use multivalued dependencies to define a normal form for relation schemas. This normal form, called fourth normal form (4NF), is more restrictive than BCNF. We shall see that every 4NF schema is also in BCNF, but there are BCNF schemas that are not in 4NF.

Multivalued Dependencies

Functional dependencies rule out certain tuples from being in a relation. If A → B, then we cannot have two tuples with the same A value but different B values. Multivalued dependencies, on the other hand, do not rule out the existence of certain tuples. Instead, they require that other tuples of a certain form be present in the relation. For this reason, functional dependencies are sometimes referred to as equality-generating dependencies, and multivalued dependencies are referred to as tuple-generating dependencies. Let R be a relation schema and let α ⊆ R and β ⊆ R. The multivalued dependency

α →→ β

holds on R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], there exist tuples t3 and t4 in r such that

t1[α] = t2[α] = t3[α] = t4[α]
t3[β] = t1[β]
t3[R − β] = t2[R − β]
t4[β] = t2[β]
t4[R − β] = t1[R − β]

        α          β             R − α − β
t1   a1 … ai    ai+1 … aj     aj+1 … an
t2   a1 … ai    bi+1 … bj     bj+1 … bn
t3   a1 … ai    ai+1 … aj     bj+1 … bn
t4   a1 … ai    bi+1 … bj     aj+1 … an

Tabular representation of α →→ β.
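The defining condition (the existence of t3; t4 follows by symmetry when the pair is visited in both orders) can be checked directly on a small relation instance. Encoding relations as lists of dicts is an illustrative assumption:

```python
def satisfies_mvd(r, alpha, beta, R):
    """Check the MVD alpha ->-> beta on relation instance r over schema R."""
    rest = R - alpha - beta
    present = {frozenset(t.items()) for t in r}
    for t1 in r:
        for t2 in r:
            if all(t1[a] == t2[a] for a in alpha):
                # required tuple t3: alpha and beta values from t1, the rest from t2
                t3 = {**{a: t1[a] for a in alpha},
                      **{b: t1[b] for b in beta},
                      **{c: t2[c] for c in rest}}
                if frozenset(t3.items()) not in present:
                    return False
    return True

R = {"loan", "name", "street", "city"}
alpha, beta = {"name"}, {"street", "city"}

# The text's illegal bc relation: Smith's loans and addresses are not independent.
illegal = [{"loan": "L-23", "name": "Smith", "street": "North", "city": "Rye"},
           {"loan": "L-27", "name": "Smith", "street": "Main", "city": "Manchester"}]
assert not satisfies_mvd(illegal, alpha, beta, R)

# Adding the two tuples the text calls for repairs the relation.
legal = illegal + [
    {"loan": "L-27", "name": "Smith", "street": "North", "city": "Rye"},
    {"loan": "L-23", "name": "Smith", "street": "Main", "city": "Manchester"}]
assert satisfies_mvd(legal, alpha, beta, R)
```

The repaired relation pairs every loan with every address, which is exactly the tuple-generating behavior the definition demands.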


This definition is less complicated than it appears to be. Figure 7.16 gives a tabular picture of t1, t2, t3, and t4. Intuitively, the multivalued dependency α →→ β says that the relationship between α and β is independent of the relationship between α and R − β. If the multivalued dependency α →→ β is satisfied by all relations on schema R, then α →→ β is a trivial multivalued dependency on schema R. Thus, α →→ β is trivial if β ⊆ α or β ∪ α = R.

To illustrate the difference between functional and multivalued dependencies, we consider the BC-schema again, and the relation bc(BC-schema). We must repeat the loan number once for each address a customer has, and we must repeat the address for each loan a customer has. This repetition is unnecessary, since the relationship between a customer and his address is independent of the relationship between that customer and a loan. If a customer (say, Smith) has a loan (say, loan number L-23), we want that loan to be associated with all of Smith's addresses. Thus, the relation of Figure 7.18 is illegal. To make this relation legal, we need to add the tuples (L-23, Smith, Main, Manchester) and (L-27, Smith, North, Rye) to the bc relation. Comparing the preceding example with our definition of multivalued dependency, we see that we want the multivalued dependency

customer-name →→ customer-street customer-city

to hold. (The multivalued dependency customer-name →→ loan-number will do as well. We shall soon see that they are equivalent.) As with functional dependencies, we shall use multivalued dependencies in two ways:

1. To test relations to determine whether they are legal under a given set of functional and multivalued dependencies.
2. To specify constraints on the set of legal relations; we shall thus concern ourselves with only those relations that satisfy a given set of functional and multivalued dependencies.

Loan-number  Customer-name  Customer-street  Customer-city
L-23         Smith          North            Rye
L-23         Smith          Main             Manchester
L-93         Curry          Lake             Horseneck

Relation bc: an example of redundancy in a BCNF relation.

Loan-number  Customer-name  Customer-street  Customer-city
L-23         Smith          North            Rye
L-27         Smith          Main             Manchester

An illegal bc relation.

Note that, if a relation r fails to satisfy a given multivalued dependency, we can construct a relation r' that does satisfy the multivalued dependency by adding tuples to r. Let D denote a set of functional and multivalued dependencies. The closure D+ of D is the set of all functional and multivalued dependencies logically implied by D. As we did for functional dependencies, we can compute D+ from D, using the formal definitions of functional dependencies and multivalued dependencies. We can manage with such reasoning for very simple multivalued dependencies. Luckily, multivalued dependencies that occur in practice appear to be quite simple. For complex dependencies, it is better to reason about sets of dependencies by using a system of inference rules. From the definition of multivalued dependency, we can derive the following rule: if α → β, then α →→ β. In other words, every functional dependency is also a multivalued dependency.

Definition of Fourth Normal Form

Consider again our BC-schema example, in which the multivalued dependency customer-name →→ customer-street customer-city holds, but no nontrivial functional dependencies hold. We saw in the opening paragraphs of Section 7.8 that, although BC-schema is in BCNF, the design is not ideal, since we must repeat a customer's address information for each loan. We shall see that we can use the given multivalued dependency to improve the database design, by decomposing BC-schema into a fourth normal form decomposition. A relation schema R is in fourth normal form (4NF) with respect to a set D of functional and multivalued dependencies if, for all multivalued dependencies in D+ of the form α →→ β, where α ⊆ R and β ⊆ R, at least one of the following holds:

• α →→ β is a trivial multivalued dependency.
• α is a superkey for schema R.

A database design is in 4NF if each member of the set of relation schemas that constitutes the design is in 4NF. Note that the definition of 4NF differs from the definition of BCNF only in the use of multivalued dependencies instead of functional dependencies. Every 4NF schema is in BCNF. To see this fact, we note that, if a schema R is not in BCNF, then there is a nontrivial functional dependency α → β holding on R, where α is not a superkey. Since α → β implies α →→ β, R cannot be in 4NF.

The 4NF decomposition algorithm:

result := {R}; done := false;
compute D+; given schema Ri, let Di denote the restriction of D+ to Ri;
while (not done) do
    if (there is a schema Ri in result that is not in 4NF w.r.t. Di)
    then begin
        let α →→ β be a nontrivial multivalued dependency that holds
            on Ri such that α → Ri is not in Di, and α ∩ β = ∅;
        result := (result − Ri) ∪ (Ri − β) ∪ (α, β);
    end
    else done := true;

Let R be a relation schema, and let R1, R2, ..., Rn be a decomposition of R. To check if each relation schema Ri in the decomposition is in 4NF, we need to find what multivalued dependencies hold on each Ri. Recall that, for a set F of functional dependencies, the restriction Fi of F to Ri is all functional dependencies in F+ that include only attributes of Ri. Now consider a set D of both functional and multivalued


dependencies. The restriction of D to Ri is the set Di consisting of:

1. All functional dependencies in D+ that include only attributes of Ri.
2. All multivalued dependencies of the form α →→ β ∩ Ri, where α ⊆ Ri and α →→ β is in D+.
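A simplified sketch of the decomposition loop follows. It looks for violations only among the explicitly given MVDs (rather than in all of D+ restricted to each Ri, as the full algorithm requires) and tests superkeys using functional dependencies alone; these simplifications, and the set-based encoding, are assumptions made for illustration:

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def decompose_4nf(R, fds, mvds):
    """Repeatedly replace a schema violating 4NF by (Ri − β) and (α ∪ β)."""
    result = [frozenset(R)]
    changed = True
    while changed:
        changed = False
        for Ri in result:
            for alpha, beta in mvds:
                a = set(alpha)
                if not a <= Ri:
                    continue
                b = (set(beta) & Ri) - a        # restriction of beta to Ri
                if not b or b >= Ri - a:
                    continue                     # trivial on Ri
                if Ri <= closure(a, fds):
                    continue                     # alpha is a superkey (FD-based test only)
                result.remove(Ri)
                result += [frozenset(Ri - b), frozenset(a | b)]
                changed = True
                break
            if changed:
                break
    return result

# BC-schema with customer-name ->-> {customer-street, customer-city} and no FDs.
R = {"loan-number", "customer-name", "customer-street", "customer-city"}
mvds = [({"customer-name"}, {"customer-street", "customer-city"})]
parts = decompose_4nf(R, [], mvds)
assert sorted(map(sorted, parts)) == [
    ["customer-city", "customer-name", "customer-street"],
    ["customer-name", "loan-number"]]
```

On BC-schema this yields exactly the split the text motivates: one schema pairing customers with loans, another pairing customers with addresses, so neither fact is repeated for the other.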

Points to Ponder

• A relation schema R is in BCNF with respect to a set F of functional dependencies if, for every functional dependency α → β in F+, either α → β is trivial or α is a superkey for R.

• Third normal form (3NF) makes testing of updates cheaper than BCNF, at the cost of allowing some redundancy.

• Multivalued dependencies do not rule out the existence of certain tuples. Instead, they require that other tuples of a certain form be present in the relation.

• Every 4NF schema is also in BCNF, but there are BCNF schemas that are not in 4NF.

Review Terms

• BCNF
• 3NF
• Comparison of BCNF & 3NF
• 4NF

Students Activity

1. Define BCNF.

2. Define 3NF.

3. Differentiate between BCNF and 3NF.

4. Define multivalued dependencies.

5. Define 4NF.


Student Notes


Lesson objectives

• Physical Storage Media
• Performance Measures of Disks
• Optimization of Disk-Block Access
• Fixed-Length Records
• Variable-Length Records
• Sequential File Organization
• Clustering File Organization
• Data Dictionary Storage

Overview of Physical Storage Media

1. Several types of data storage exist in most computer systems. They vary in speed of access, cost per unit of data, and reliability.

   • Cache: the most costly and fastest form of storage. Usually very small, and managed by the computer system hardware.
   • Main memory (MM): the storage area for data available to be operated on. General-purpose machine instructions operate on main memory. Contents of main memory are usually lost in a power failure or "crash". Usually too small (even with megabytes) and too expensive to store the entire database.
   • Flash memory: EEPROM (electrically erasable programmable read-only memory). Data in flash memory survive power failure. Reading data from flash memory takes about 10 nanoseconds (roughly as fast as from main memory), but writing data into flash memory is more complicated: a write-once takes about 4-10 microseconds. To overwrite what has been written, one has to first erase the entire bank of the memory; flash may support only a limited number of erase cycles (10^4 to 10^6). It has found popularity as a replacement for disks for storing small volumes of data (5-10 megabytes).
   • Magnetic-disk storage: the primary medium for long-term storage. Typically the entire database is stored on disk. Data must be moved from disk to main memory in order for the data to be operated on; after operations are performed, data must be copied back to disk if any changes were made. Disk storage is called direct-access storage, as it is possible to read data on the disk in any order (unlike sequential access). Disk storage usually survives power failures and system crashes.
   • Optical storage: CD-ROM (compact-disk read-only memory), WORM (write-once read-many) disks (for archival storage of data), and jukeboxes (containing a few drives and numerous disks loaded on demand).
   • Tape storage: used primarily for backup and archival data. Cheaper, but much slower access, since tape must be read sequentially from the beginning. Used as protection from disk failures.

2. The storage-device hierarchy is presented in the figure, where the higher levels are expensive (cost per bit) and fast (access time), but the capacity is smaller.

   [Figure: storage-device hierarchy]

3. Another classification: primary, secondary, and tertiary storage.

   1. Primary storage: the fastest storage media, such as cache and main memory.
   2. Secondary (or on-line) storage: the next level of the hierarchy, e.g., magnetic disks.
   3. Tertiary (or off-line) storage: magnetic tapes and optical-disk jukeboxes.

4. Volatility of storage. Volatile storage loses its contents when the power is removed. Without power backup, data in volatile storage (the part of the hierarchy from main memory up) must be written to nonvolatile storage for safekeeping.

Performance Measures of Disks

The main measures of the quality of a disk are capacity, access time, data-transfer rate, and reliability.

LESSON 38: FILE ORGANIZATION METHOD - I


1. Access time: the time from when a read or write request is issued to when data transfer begins. To access data on a given sector of a disk, the arm first must move so that it is positioned over the correct track, and then must wait for the sector to appear under it as the disk rotates. The time for repositioning the arm is called seek time, and it increases with the distance the arm must move. Typical seek times range from 2 to 30 milliseconds.
Average seek time is the average of the seek times, measured over a sequence of (uniformly distributed) random requests, and it is about one third of the worst-case seek time.
Once the seek has occurred, the time spent waiting for the sector to be accessed to appear under the head is called rotational latency time. Average rotational latency is about half of the time for a full rotation of the disk. (Typical rotational speeds of disks range from 60 to 120 rotations per second.)
The access time is then the sum of the seek time and the latency, and ranges from 10 to 40 milliseconds.

2. Data-transfer rate: the rate at which data can be retrieved from or stored to the disk. Current disk systems support transfer rates from 1 to 5 megabytes per second.

3. Reliability: measured by the mean time to failure. The typical mean time to failure of disks today ranges from 30,000 to 800,000 hours (about 3.4 to 91 years).
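The access-time arithmetic above can be checked with a short sketch. The following Python snippet is illustrative only: the worst-case seek time and rotational speed are assumed figures taken from the ranges quoted above, not measurements of any real disk.

```python
# Estimate average disk access time from the rules of thumb above:
# average seek is about 1/3 of the worst-case seek, and average
# rotational latency is half of one full rotation.

def avg_access_time_ms(worst_seek_ms: float, rpm: float) -> float:
    avg_seek = worst_seek_ms / 3.0
    full_rotation_ms = 60_000.0 / rpm      # one rotation, in milliseconds
    avg_latency = full_rotation_ms / 2.0
    return avg_seek + avg_latency

# A disk with a 30 ms worst-case seek spinning at 120 rotations/second:
print(round(avg_access_time_ms(30.0, 7200.0), 2))  # 14.17
```

The result, about 14.2 ms, falls inside the 10-40 ms access-time range quoted above.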

Optimization of Disk-Block Access

1. Data is transferred between disk and main memory in units called blocks.

2. A block is a contiguous sequence of bytes from a single track of one platter.

3. Block sizes range from 512 bytes to several thousand.

4. The lower levels of the file-system manager convert block addresses into the hardware-level cylinder, surface, and sector numbers.

5. Access to data on disk is several orders of magnitude slower than access to data in main memory. Several optimization techniques exist besides buffering of blocks in main memory:
• Scheduling: If several blocks from a cylinder need to be transferred, we may save time by requesting them in the order in which they pass under the heads. A commonly used disk-arm scheduling algorithm is the elevator algorithm.
• File organization: Organize blocks on disk in a way that corresponds closely to the manner in which we expect data to be accessed. For example, store related information on the same track, on physically close tracks, or on adjacent cylinders, in order to minimize seek time. IBM mainframe OSs provide programmers fine control over the placement of files, but increase the programmer's burden; UNIX and PC OSs hide disk organization from users. Over time, a sequential file may become fragmented. To reduce fragmentation, the system can make a backup copy of the data on disk and restore the entire disk; the restore operation writes back the blocks of each file contiguously (or nearly so). Some systems, such as MS-DOS, have utilities that scan the disk and then move blocks to decrease the fragmentation.
• Nonvolatile write buffers: Use nonvolatile RAM (such as battery-backed-up RAM) to speed up disk writes drastically (first write to the nonvolatile RAM buffer, then inform the OS that the writes are complete).
• Log disk: Another approach to reducing write latency is to use a log disk, a disk devoted to writing a sequential log. All access to the log disk is sequential, essentially eliminating seek time, and several consecutive blocks can be written at once, making writes to the log disk several times faster than random writes.
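The elevator algorithm mentioned above can be sketched in a few lines. This is an illustrative model, not a real disk driver: the cylinder numbers are hypothetical, and the arm is assumed to sweep upward first.

```python
# Elevator (SCAN) disk-arm scheduling sketch: service pending cylinder
# requests in one sweep direction, then reverse, so the arm never
# zigzags back and forth across the disk.

def elevator_order(head: int, requests: list[int], ascending: bool = True) -> list[int]:
    """Return the order in which the requested cylinders are serviced."""
    up = sorted(r for r in requests if r >= head)
    down = sorted((r for r in requests if r < head), reverse=True)
    return up + down if ascending else down + up

# Arm at cylinder 50, five pending requests:
print(elevator_order(50, [10, 95, 52, 30, 70]))  # [52, 70, 95, 30, 10]
```

Note how 10 and 30, although closer to the head than 95, are served on the return sweep; this bounds total arm travel at the cost of individual request latency.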

File Organization

1. A file is organized logically as a sequence of records.
2. Records are mapped onto disk blocks.
3. Files are provided as a basic construct in operating systems, so we assume the existence of an underlying file system.
4. Blocks are of a fixed size determined by the operating system.
5. Record sizes vary.
6. In a relational database, tuples of distinct relations may be of different sizes.
7. One approach to mapping the database to files is to store records of one length in a given file.
8. An alternative is to structure files to accommodate variable-length records. (Fixed-length is easier to implement.)

Fixed-Length Records

1. Consider a file of deposit records of the form:

type deposit = record
   bname : char(22);
   account# : char(10);
   balance : real;
end

• If we assume that each character occupies one byte, an integer occupies 4 bytes, and a real 8 bytes, our deposit record is 40 bytes long.
• The simplest approach is to use the first 40 bytes for the first record, the next 40 bytes for the second, and so on.
• However, there are two problems with this approach:
• It is difficult to delete a record from this structure. The space occupied must somehow be freed, or we need to mark deleted records so that they can be ignored.
• Unless the block size is a multiple of 40, some records will cross block boundaries. It would then require two block accesses to read or write such a record.

2. When a record is deleted, we could move all successive records up one, which may require moving a lot of records.


• We could instead move the last record into the "hole" created by the deleted record.
• This changes the order the records are in.
• It turns out to be undesirable to move records to occupy freed space, as moving requires block accesses.
• Also, insertions tend to be more frequent than deletions, so it is acceptable to leave the space open and wait for a subsequent insertion.
• This leads to a need for additional structure in our file design.

3. So one solution is:
• At the beginning of the file, allocate some bytes as a file header.
• For now, this header need only store the address of the first record whose contents are deleted.
• That first deleted record can then store the address of the second available record, and so on. To insert a new record, we use the record pointed to by the header, and change the header pointer to the next available record.
• If no deleted records exist, we add our new record to the end of the file.

4. Note: Use of pointers requires careful programming. If a record pointed to is moved or deleted, and that pointer is not corrected, the pointer becomes a dangling pointer. Records pointed to are called pinned.

5. Fixed-length file insertions and deletions are relatively simple because "one size fits all". For variable-length records, this is not the case.
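The header-and-free-list scheme of point 3 can be sketched as follows. This is a toy model: an in-memory bytearray stands in for the disk file, the field widths follow the 40-byte deposit record above, and the layout (an 8-byte header holding the address of the first free slot) is an assumption for illustration.

```python
import struct

REC = struct.Struct("<22s10sd")   # bname, account#, balance: 40 bytes
HEADER = struct.Struct("<q")      # address of first deleted record, -1 if none

def new_file() -> bytearray:
    return bytearray(HEADER.pack(-1))

def insert(f: bytearray, bname: str, acct: str, balance: float) -> int:
    rec = REC.pack(bname.encode(), acct.encode(), balance)
    (free,) = HEADER.unpack_from(f, 0)
    if free == -1:                          # no hole: append at end of file
        off = len(f)
        f.extend(rec)
    else:                                   # reuse the first deleted slot
        (next_free,) = struct.unpack_from("<q", f, free)
        HEADER.pack_into(f, 0, next_free)   # unlink it from the free list
        off = free
        f[off:off + REC.size] = rec
    return off

def delete(f: bytearray, off: int) -> None:
    (free,) = HEADER.unpack_from(f, 0)
    # the deleted slot now stores the old list head, then becomes the head
    f[off:off + REC.size] = struct.pack("<q", free).ljust(REC.size, b"\0")
    HEADER.pack_into(f, 0, off)

f = new_file()
a = insert(f, "Perryridge", "A-102", 400.0)
b = insert(f, "Round Hill", "A-305", 350.0)
delete(f, a)
c = insert(f, "Downtown", "A-101", 500.0)
print(c == a)  # True: the new record reuses the deleted slot
```

The file never grows on an insert while a hole exists, which is exactly the point of keeping the free list in the header.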

Variable-Length Records

Variable-length records arise in a database in several ways:
• Storage of multiple record types in a file
• Record types that allow variable field sizes
• Record types that allow repeating fields

Organization of Records in Files

There are several ways of organizing records in files:
• Heap file organization: Any record can be placed anywhere in the file where there is space for the record. There is no ordering of records.
• Sequential file organization: Records are stored in sequential order, based on the value of the search key of each record.
• Hashing file organization: A hash function is computed on some attribute of each record. The result of the function specifies in which block of the file the record should be placed (discussed later, since it is closely related to the indexing structure).
• Clustering file organization: Records of several different relations can be stored in the same file. Related records of the different relations are stored on the same block, so that one I/O operation fetches related records from all the relations.

Sequential File Organization

1. A sequential file is designed for efficient processing of records in sorted order on some search key.
• Records are chained together by pointers to permit fast retrieval in search-key order.
• Each pointer points to the next record in order.
• Records are stored physically in search-key order (or as close to this as possible).
• This minimizes the number of block accesses.

2. It is difficult to maintain physical sequential order as records are inserted and deleted.
• Deletion can be managed with the pointer chains.
• Insertion poses problems if there is no space where the new record should go.
• If there is space, use it; otherwise put the new record in an overflow block.
• Adjust pointers accordingly.
• Problem: we now have some records out of physical sequential order.
• If very few records are in overflow blocks, this will work well.
• If order is lost, reorganize the file.
• Reorganizations are expensive, and are done when system load is low.

3. If insertions rarely occur, we could keep the file in physically sorted order and reorganize when an insertion occurs. In this case, the pointer fields are no longer required.
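The pointer-chained organization of point 2 can be sketched as below. This is an illustrative model: a Python list index stands in for a disk address, new records are simply appended at the end (as if into an overflow area), and only the pointer chain preserves search-key order; the branch names are invented.

```python
# Toy sequential file: records chained by "next" pointers in
# search-key (bname) order, regardless of physical placement.

records = []   # slot index stands in for a disk address
head = -1      # address of the first record in search-key order

def insert(bname: str) -> None:
    global head
    addr = len(records)                 # physically: wherever there is room
    records.append({"bname": bname, "next": -1})
    if head == -1 or records[head]["bname"] >= bname:
        records[addr]["next"] = head    # new record becomes chain head
        head = addr
        return
    p = head                            # walk the chain to the insert point
    while records[p]["next"] != -1 and records[records[p]["next"]]["bname"] < bname:
        p = records[p]["next"]
    records[addr]["next"] = records[p]["next"]
    records[p]["next"] = addr

def scan() -> list[str]:
    """Read the file in search-key order by following the pointers."""
    out, p = [], head
    while p != -1:
        out.append(records[p]["bname"])
        p = records[p]["next"]
    return out

for b in ["Perryridge", "Brighton", "Round Hill", "Downtown"]:
    insert(b)
print(scan())  # ['Brighton', 'Downtown', 'Perryridge', 'Round Hill']
```

Physically the records sit in arrival order; the scan still comes out sorted, which is why a periodic reorganization (rewriting records into physical key order) only improves locality, never correctness.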

Clustering File Organization

1. One relation per file, with fixed-length records, is good for small databases; it also reduces the code size.

2. Many large-scale database systems do not rely directly on the underlying operating system for file management. One large OS file is allocated to the database system, and all relations are stored in that one file.

3. To efficiently execute queries involving a join of the customer and depositor relations, one may store the depositor tuple for each cname near the customer tuple for the corresponding cname.

4. This structure mixes together tuples from two relations, but allows for efficient processing of the join.

5. If the customer has many accounts which cannot fit in one block, the remaining records appear on nearby blocks. This file structure, called clustering, allows us to read many of the required records using one block read.

6. Our use of clustering enhances the processing of a particular join, but may result in slow processing of other types of queries, such as a selection on customer. For example, the query

select *
from customer

now requires more block accesses, as our customer relation is interspersed with the deposit relation.


7. Thus it is a trade-off, depending on the types of query that the database designer believes to be most frequent. Careful use of clustering may produce significant performance gains.

Data Dictionary Storage

1. The database also needs to store information about the relations, known as the data dictionary. This includes:
• Names of relations.
• Names of the attributes of each relation.
• Domains and lengths of attributes.
• Names and definitions of views.
• Integrity constraints (e.g., key constraints).
plus data on the system users:
• Names of authorized users.
• Accounting information about users.
plus (possibly) statistical and descriptive data:
• Number of tuples in each relation.
• Method of storage used for each relation (e.g., clustered or non-clustered).

2. When we look at indices, we will also see a need to store information about each index on each relation:
• Name of the index.
• Name of the relation being indexed.
• Attributes the index is on.
• Type of index.

3. This information is, in itself, a miniature database. We can use the database to store data about itself, simplifying the overall structure of the system, and allowing the full power of the database to be used to permit fast access to system data.
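Point 3 (the dictionary is itself a miniature database) can be illustrated with a toy catalog held as ordinary relations. Every table name, column name, and statistic below is invented for the sketch:

```python
# The catalog is just two more relations; "querying the dictionary"
# is then an ordinary query over them.

sys_relations = [
    {"relname": "customer", "ntuples": 200, "storage": "clustered"},
    {"relname": "deposit",  "ntuples": 500, "storage": "clustered"},
]
sys_attributes = [
    {"relname": "customer", "attname": "cname",  "domain": "char(20)"},
    {"relname": "customer", "attname": "street", "domain": "char(30)"},
    {"relname": "deposit",  "attname": "bname",  "domain": "char(22)"},
]

def attributes_of(rel: str) -> list[str]:
    """List the attribute names of one relation, from the catalog."""
    return [a["attname"] for a in sys_attributes if a["relname"] == rel]

print(attributes_of("customer"))  # ['cname', 'street']
```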

Points to Ponder

• Several types of data storage exist in most computer systems: cache, main memory, flash memory, magnetic-disk storage, optical storage, and tape storage.

• The main measures of the quality of a disk are capacity, access time, data-transfer rate, and reliability.

• A sequential file is designed for efficient processing of records in sorted order on some search key.

• A clustering structure mixes together tuples from two relations, but allows for efficient processing of the join.

• The database also needs to store information about the relations, known as the data dictionary.

Review Terms

• Storage device
• Sequential file storage
• Clustering file storage
• Data dictionary storage
• Search key

Students Activity

1. Define various types of storage media?

2. Define qualities of a disk?

3. Define sequential file storage?

4. Define clustering file storage?

5. Define data dictionary storage?


Student Notes


Lesson objectives

• Indexing & Hashing
• Ordered Indices
• Dense and Sparse Indices
• Multi-Level Indices
• Index Update
• Secondary Indices
• Static Hashing
• Hash Functions
• Bucket Overflow
• Dynamic Hashing

Indexing and Hashing

1. Many queries reference only a small proportion of the records in a file. For example, finding all records at the Perryridge branch returns only records where bname = "Perryridge".

2. We should be able to locate these records directly, rather than having to read every record and check its branch name. We then need extra file structuring.

Basic Concepts

1. An index for a file works like a catalogue in a library. Cards in alphabetic order tell us where to find books by a particular author.

2. In real-world databases, indices like this might be too large to be efficient. We will look at more sophisticated indexing techniques.

3. There are two kinds of indices:
• Ordered indices: based on a sorted ordering of the values.
• Hash indices: based on the values being distributed uniformly across a range of buckets. The bucket to which a value is assigned is determined by a function, called a hash function.

4. We will consider several indexing techniques. No one technique is the best; each technique is best suited to particular database applications.

5. Methods will be evaluated on:
1. Access types: the types of access that are supported efficiently, e.g., value-based search or range search.
2. Access time: the time to find a particular data item or set of items.
3. Insertion time: the time taken to insert a new data item (including the time to find the right place to insert).
4. Deletion time: the time to delete an item (including the time taken to find the item, as well as to update the index structure).
5. Space overhead: the additional space occupied by an index structure.

6. We may have more than one index or hash function for a file. (A library may have card catalogues by author, by subject, and by title.)

7. The attribute or set of attributes used to look up records in a file is called the search key (not to be confused with primary key, etc.).

Ordered Indices

1. In order to allow fast random access, an index structure may be used.

2. A file may have several indices, on different search keys.

3. If the file containing the records is sequentially ordered, the index whose search key specifies the sequential order of the file is the primary index, or clustering index. Note: the search key of a primary index is usually the primary key, but it is not necessarily so.

4. Indices whose search key specifies an order different from the sequential order of the file are called secondary indices, or nonclustering indices.

Dense and Sparse Indices

1. There are two types of ordered indices:

Dense index:
• An index record appears for every search-key value in the file.
• This record contains the search-key value and a pointer to the actual record.

Sparse index:
• Index records are created only for some of the records.
• To locate a record, we find the index record with the largest search-key value less than or equal to the search-key value we are looking for.
• We start at the record pointed to by that index record, and proceed along the pointers in the file (that is, sequentially) until we find the desired record.

LESSON 39: FILE ORGANISATION METHOD - II


Dense index.

2. Notice how we would find records for the Perryridge branch using both methods. (Do it!)

Sparse index.

3. Dense indices are faster in general, but sparse indices require less space and impose less maintenance for insertions and deletions. (Why?)

4. A good compromise: a sparse index with one entry per block. Why is this good?
• The biggest cost is in bringing a block into main memory.
• We are guaranteed to have the correct block with this method, unless the record is on an overflow block (actually it could be several blocks).
• The index size is still small.
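The sparse-index lookup procedure, with the one-entry-per-block compromise of point 4, can be sketched as below. The block contents are invented: the index holds the first search-key value of each block, a lookup finds the last index entry not exceeding the key, and then scans that one block sequentially.

```python
import bisect

blocks = [                       # file blocks, sorted on bname
    ["Brighton", "Downtown"],
    ["Mianus", "Perryridge"],
    ["Redwood", "Round Hill"],
]
# Sparse index: one entry per block, the block's first search-key value.
index = [b[0] for b in blocks]

def lookup(key: str):
    i = bisect.bisect_right(index, key) - 1   # last index entry <= key
    if i < 0:
        return None                           # key sorts before every block
    for rec in blocks[i]:                     # sequential scan within block
        if rec == key:
            return (i, rec)
    return None

print(lookup("Perryridge"))  # (1, 'Perryridge')
```

Only one block is brought into memory per lookup, which is the point of the compromise: the index stays small while still pinpointing the correct block.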

Multi-Level Indices

1. Even with a sparse index, the index size may still grow too large. For 100,000 records, 10 per block, at one index record per block, that is 10,000 index records! Even if we can fit 100 index records per block, this is 100 blocks.

2. If the index is too large to be kept in main memory, a search results in several disk reads.
• If there are no overflow blocks in the index, we can use binary search.
• This will read as many as 1 + log2(b) blocks (as many as 7 for our 100 blocks).
• If the index has overflow blocks, then sequential search is typically used, reading all b index blocks.

3. Solution: construct a sparse index on the index.

4. This gives a two-level sparse index.

5. Use binary search on the outer index. Scan the index block found until the correct index record is found. Use the index record as before: scan the block pointed to for the desired record.

6. For very large files, additional levels of indexing may be required.

7. Indices must be updated at all levels when insertions or deletions require it.

8. Frequently, each level of index corresponds to a unit of physical storage (e.g., indices at the level of track, cylinder, and disk).

Index Update

Regardless of what form of index is used, every index must be updated whenever a record is either inserted into or deleted from the file.

1. Deletion
• Find (look up) the record.
• If it is the last record with its particular search-key value, delete that search-key value from the index.
• For dense indices, this is like deleting a record in a file.
• For sparse indices, delete a key value by replacing the key value's entry in the index with the next search-key value. If that value already has an index entry, delete the entry instead.

2. Insertion
• Find the place to insert.
• Dense index: insert the search-key value if it is not present.
• Sparse index: no change unless a new block is created. (In that case, the first search-key value appearing in the new block is inserted into the index.)

Secondary Indices

1. If the search key of a secondary index is not a candidate key, it is not enough to point to just the first record with each search-key value, because the remaining records with the same search-key value could be anywhere in the file. Therefore, a secondary index must contain pointers to all the records.


Sparse secondary index on cname.

2. We can use an extra level of indirection to implement secondary indices on search keys that are not candidate keys. A pointer does not point directly into the file, but to a bucket that contains pointers into the file.
• To perform a lookup on Peterson, we must read all three records pointed to by entries in bucket 2.
• Only one entry points to a Peterson record, but three records need to be read.
• As the file is not ordered physically by cname, this may take 3 block accesses.

3. Secondary indices must be dense, with an index entry for every search-key value, and a pointer to every record in the file.

4. Secondary indices improve the performance of queries on non-primary keys.

5. They also impose serious overhead on database modification: whenever a file is updated, every index must be updated.

The designer must decide whether or not to use secondary indices.

Static Hashing

Index schemes force us to traverse an index structure. Hashing avoids this.

Hash File Organization

1. Hashing involves computing the address of a data item by computing a function on the search-key value.

2. A hash function h is a function from the set of all search-key values K to the set of all bucket addresses B.
• We choose a number of buckets to correspond to the number of search-key values we will have stored in the database.
• To perform a lookup on a search-key value Ki, we compute h(Ki), and search the bucket with that address.
• If two search keys i and j map to the same address, because h(Ki) = h(Kj), then the bucket at the address obtained will contain records with both search-key values.
• In this case we will have to check the search-key value of every record in the bucket to get the ones we want.
• Insertion and deletion are simple.
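A minimal sketch of this hash file organization follows, with Python's built-in hash standing in for h (an illustrative choice, not part of the scheme itself). Note how the lookup rechecks the key: colliding keys share a bucket, exactly as described above.

```python
# Static hashing sketch: h maps each search-key value to one of
# NBUCKETS fixed bucket addresses; a lookup searches only that bucket.

NBUCKETS = 8
buckets = [[] for _ in range(NBUCKETS)]

def h(key: str) -> int:
    return hash(key) % NBUCKETS

def insert(key: str, rec: tuple) -> None:
    buckets[h(key)].append((key, rec))

def lookup(key: str) -> list:
    # colliding keys share a bucket, so filter on the key itself
    return [rec for k, rec in buckets[h(key)] if k == key]

insert("Perryridge", ("A-102", 400))
insert("Round Hill", ("A-305", 350))
print(lookup("Perryridge"))  # [('A-102', 400)]
```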

Hash Functions

1. A good hash function gives an average-case lookup that is a small constant, independent of the number of search keys.

2. We hope records are distributed uniformly among the buckets.

3. The worst hash function maps all keys to the same bucket.

4. The best hash function maps all keys to distinct addresses.

5. Ideally, the distribution of keys to addresses is uniform and random.

6. Suppose we have 26 buckets, and map names beginning with the i-th letter of the alphabet to the i-th bucket.
• Problem: this does not give a uniform distribution.
• Many more names will be mapped to "A" than to "X".
• Typical hash functions perform some operation on the internal binary machine representations of the characters in a key.
• For example, compute the sum, modulo the number of buckets, of the binary representations of the characters of the search key (using this method for 10 buckets, assuming the i-th character in the alphabet is represented by the integer i).
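The character-sum hash just described fits in one line. As an assumption for the sketch, ord() plays the role of "the integer representing the i-th character":

```python
# Sum the character codes of the search key, modulo the bucket count.

def char_sum_hash(key: str, nbuckets: int = 10) -> int:
    return sum(ord(c) for c in key) % nbuckets

# Unlike the first-letter scheme, names spread across the buckets:
print(char_sum_hash("Perryridge"))  # 3
print(char_sum_hash("Brighton"))    # 9
```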

Handling of Bucket Overflows

1. Open hashing occurs where records are stored in different buckets. Compute the hash function and search the corresponding bucket to find a record.

2. Closed hashing occurs where all records are stored in one bucket. The hash function computes addresses within that bucket. (Deletions are difficult.) It is not used much in database applications.

3. A drawback of our approach: the hash function must be chosen at implementation time.
• The number of buckets is fixed, but the database may grow.
• If the number is too large, we waste space.
• If the number is too small, we get too many "collisions", resulting in records of many search-key values being in the same bucket.
• Choosing the number of buckets to be twice the number of search-key values in the file gives a good space/performance trade-off.

Hash Indices

1. A hash index organizes the search keys, with their associated pointers, into a hash file structure.

2. We apply a hash function on a search key to identify a bucket, and store the key and its associated pointers in the bucket (or in overflow buckets).

3. Strictly speaking, hash indices are only secondary index structures, since if a file itself is organized using hashing, there is no need for a separate hash index structure on it.


Dynamic Hashing

1. As the database grows over time, we have three options:
• Choose a hash function based on the current file size, and get performance degradation as the file grows.
• Choose a hash function based on the anticipated file size. Space is wasted initially.
• Periodically reorganize the hash structure as the file grows. This requires selecting a new hash function, recomputing all addresses, and generating new bucket assignments. It is costly, and shuts down the database.

2. Some hashing techniques allow the hash function to be modified dynamically to accommodate the growth or shrinking of the database. These are called dynamic hash functions.
• Extendable hashing is one form of dynamic hashing.
• Extendable hashing splits and coalesces buckets as the database size changes.
• This imposes some performance overhead, but space efficiency is maintained.
• As reorganization works on one bucket at a time, the overhead is acceptably low.

3. How does it work?

General extendable hash structure.

• We choose a hash function that is uniform and random, and that generates values over a relatively large range.
• The range is b-bit binary integers (typically b = 32).
• 2^32 is over 4 billion, so we do not generate that many buckets!
• Instead we create buckets on demand, and do not use all b bits of the hash initially.
• At any point we use i bits, where 0 <= i <= b.
• The i bits are used as an offset into a table of bucket addresses.
• The value of i grows and shrinks with the database.
• Note that the i appearing over the bucket address table tells how many bits are required to determine the correct bucket.
• It may be the case that several entries point to the same bucket.
• All such entries will have a common hash prefix, but the length of this prefix may be less than i.
• So we give each bucket j an integer ij, the length of its common hash prefix.
• The number of bucket-address-table entries pointing to bucket j is then 2^(i - ij).

4. To find the bucket containing search-key value Ki:
• Compute h(Ki).
• Take the first i high-order bits of h(Ki).
• Look at the corresponding table entry for this i-bit string.
• Follow the bucket pointer in the table entry.

5. We now look at insertions in an extendable hashing scheme.
• Follow the same procedure as for lookup, ending up in some bucket j.
• If there is room in the bucket, insert the information and insert the record in the file.
• If the bucket is full, we must split the bucket and redistribute the records.
• If the bucket is split, we may need to increase the number of bits we use in the hash.

6. Two cases exist:

1. If i = ij, then only one entry in the bucket address table points to bucket j.
• Then we need to increase the size of the bucket address table so that we can include pointers to the two buckets that result from splitting bucket j.
• We increment i by one, thus considering more of the hash, and doubling the size of the bucket address table.
• Each entry is replaced by two entries, each containing the original value.
• Now two entries in the bucket address table point to bucket j.
• We allocate a new bucket z, and set the second pointer to point to z.
• Set ij and iz to i.
• Rehash all records in bucket j, which are put in either j or z.
• Now insert the new record.
• It is remotely possible, but unlikely, that the new hash will still put all of the records in one bucket.
• If so, split again and increment i again.

2. If i > ij, then more than one entry in the bucket address table points to bucket j.
• Then we can split bucket j without increasing the size of the bucket address table (why?).
• Note that all entries that point to bucket j correspond to hash prefixes that have the same value on the leftmost ij bits.


• We allocate a new bucket z, and set ij and iz to the original ij value plus 1.
• Now adjust the entries in the bucket address table that previously pointed to bucket j.
• Leave the first half pointing to bucket j, and make the rest point to bucket z.
• Rehash each record in bucket j as before.
• Reattempt the new insert.

7. Note that in both cases we only need to rehash records in bucket j.

8. Deletion of records is similar. Buckets may have to be coalesced, and the bucket address table may have to be halved.

9. Insertion is illustrated for the example deposit file.
• 32-bit hash values on bname are used, starting from an initially empty hash structure.
• We insert records one by one.
• We (unrealistically) assume that a bucket can hold only 2 records, in order to illustrate both situations described.
• As we insert the Perryridge and Round Hill records, the first bucket becomes full.
• When we insert the next record (Downtown), we must split the bucket.
• Since i = i0, we need to increase the number of bits we use from the hash.
• We now use 1 bit, allowing us 2^1 = 2 buckets.
• This makes us double the size of the bucket address table to two entries.
• We split the bucket, placing the records whose search-key hash begins with 1 in the new bucket, and those with a 0 in the old bucket.
• Next we attempt to insert the Redwood record, and find it hashes to 1.
• That bucket is full, and i = i1.
• So we must split that bucket, increasing the number of bits we must use to 2.
• This necessitates doubling the bucket address table again, to four entries.
• We rehash the entries in the old bucket.
• We continue on for the deposit records, obtaining the final extendable hash structure.

10. Advantages:
• Extendable hashing provides performance that does not degrade as the file grows.
• Minimal space overhead: no buckets need be reserved for future use, and the bucket address table contains only one pointer for each hash value of the current prefix length.

11. Disadvantages:
• An extra level of indirection in the bucket address table.
• Added complexity.
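The split logic above (the cases i = ij and i > ij) fits in a short sketch. Everything here is illustrative: buckets hold 2 records as in the walkthrough, b is shrunk from 32 to 8 bits, and the "hash" is simply the key's low 8 bits, so the demo keys are chosen with distinct high-order bits.

```python
BUCKET_CAP = 2                          # bucket capacity, as in the walkthrough
B = 8                                   # hash width in bits (the text uses b = 32)

def h(key: int) -> int:
    return key % (2 ** B)               # toy hash: identity on 8-bit keys

class Bucket:
    def __init__(self, depth: int):
        self.depth = depth              # local depth i_j
        self.items = []

class ExtendableHash:
    def __init__(self):
        self.i = 0                      # global depth: prefix bits in use
        self.dir = [Bucket(0)]          # bucket address table

    def _slot(self, key: int) -> int:
        return h(key) >> (B - self.i)   # first i high-order bits of the hash

    def lookup(self, key: int) -> bool:
        return key in self.dir[self._slot(key)].items

    def insert(self, key: int) -> None:
        b = self.dir[self._slot(key)]
        if len(b.items) < BUCKET_CAP:
            b.items.append(key)
            return
        if b.depth == self.i:           # case i = i_j: double the table
            self.dir = [p for p in self.dir for _ in (0, 1)]
            self.i += 1
        b.depth += 1                    # split bucket j on one more prefix bit
        z = Bucket(b.depth)
        for slot in range(len(self.dir)):
            # entries whose new prefix bit is 1 now point to z
            if self.dir[slot] is b and (slot >> (self.i - b.depth)) & 1:
                self.dir[slot] = z
        old, b.items = b.items, []
        for k in old:                   # rehash only the records of bucket j
            self.insert(k)
        self.insert(key)                # reattempt the new insert

t = ExtendableHash()
for k in [0x00, 0x80, 0x40, 0xC0, 0x20]:
    t.insert(k)
print(t.i, len(t.dir), t.lookup(0x40))  # 2 4 True
```

After the five inserts, the table has doubled twice (i = 2, four entries) while only the overflowing bucket was ever rehashed, illustrating why the per-insert overhead stays low.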

Points to Ponder

• An index for a file works like a catalogue in a library.

• In order to allow fast random access, an index structure may be used.

• Indices whose search key specifies an order different from the sequential order of the file are called secondary indices.

• Dense index: an index record appears for every search-key value in the file. The record contains the search-key value and a pointer to the actual record.

• Sparse index: index records are created only for some of the records.

• Regardless of what form of index is used, every index must be updated whenever a record is inserted into or deleted from the file.

• Hashing involves computing the address of a data item by computing a function on the search-key value.
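The dense/sparse distinction can be made concrete with a small sketch. The following Python is illustrative only (the record layout and names are invented): a sparse index holds an entry for only every fourth record of a sequentially ordered file, and a lookup locates the last index entry at or below the search key, then scans forward.

```python
# Illustrative sketch (invented data): a sparse index keeps an entry for
# only every fourth record of a sequentially ordered file; lookup finds
# the last index entry whose key <= the target, then scans forward.
import bisect

records = [(k, f"row-{k}") for k in range(0, 100, 3)]            # sorted file
sparse = [(records[i][0], i) for i in range(0, len(records), 4)] # sparse index
index_keys = [k for k, _ in sparse]

def lookup(key):
    j = bisect.bisect_right(index_keys, key) - 1  # last entry with key <= target
    if j < 0:
        return None
    i = sparse[j][1]                              # start the sequential scan here
    while i < len(records) and records[i][0] <= key:
        if records[i][0] == key:
            return records[i][1]
        i += 1
    return None
```

A dense index would instead keep one entry per record, trading space for a lookup that needs no forward scan.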

Review Terms

• Index
• Ordered index
• Dense index
• Sparse index
• Hashing
• Hash function
• Static hashing
• Dynamic hashing

Students Activity

1. Define index and hashing.

2. Define ordered indices.

3. Differentiate between a sparse index and a dense index.


4. Define secondary index and multilevel index.

5. Define hashing. What is a hash function?

6. Differentiate between static hashing and dynamic hashing.


Student Notes

LESSON 40:

TRANSACTIONS MANAGEMENT

Lesson Objectives

• Transactions
• ACID properties
• Transaction state
• Implementation of ACID properties

Collections of operations that form a single logical unit of work are called transactions. A database system must ensure proper execution of transactions despite failures: either the entire transaction executes, or none of it does. For example, a transfer of funds from a checking account to a savings account is a single operation from the customer's standpoint; within the database system, however, it consists of several operations. Clearly, it is essential that either all these operations occur or, in case of a failure, none occur. Furthermore, the system must manage concurrent execution of transactions in a way that avoids introducing inconsistency. In our funds-transfer example, a transaction computing the customer's total money might see the checking-account balance before it is debited by the funds-transfer transaction, but see the savings balance after it is credited. As a result, it would obtain an incorrect result.

Transaction
A transaction is a unit of program execution that accesses and possibly updates various data items. Usually, a transaction is initiated by a user program written in a high-level data-manipulation language or programming language (for example, SQL, COBOL, C, C++, or Java), where it is delimited by statements (or function calls) of the form begin transaction and end transaction. The transaction consists of all operations executed between the begin transaction and end transaction.
To ensure integrity of the data, we require that the database system maintain the following properties of the transactions. These properties are often called the ACID properties; the acronym is derived from the first letter of each of the four properties.

• Atomicity. Either all operations of the transaction are reflected properly in the database, or none are.

• Consistency. Execution of a transaction in isolation (that is, with no other transaction executing concurrently) preserves the consistency of the database.

• Isolation. Even though multiple transactions may execute concurrently, the system guarantees that, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished. Thus, each transaction is unaware of other transactions executing concurrently in the system.

• Durability. After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.

To gain a better understanding of the ACID properties and the need for them, consider a simplified banking system consisting of several accounts and a set of transactions that access and update those accounts. For the time being, we assume that the database permanently resides on disk, but that some portion of it is temporarily residing in main memory.
Transactions access data using two operations:

• read(X), which transfers the data item X from the database to a local buffer belonging to the transaction that executed the read operation.

• write(X), which transfers the data item X from the local buffer of the transaction that executed the write back to the database.

In a real database system, the write operation does not necessarily result in the immediate update of the data on the disk; the write operation may be temporarily stored in memory and executed on the disk later. For now, however, we shall assume that the write operation updates the database immediately.
Let Ti be a transaction that transfers $50 from account A to account B. This transaction can be defined as

Ti: read(A);
A := A - 50;
write(A);
read(B);
B := B + 50;
write(B).

Let us now consider each of the ACID requirements. (For ease of presentation, we consider them in an order different from the order A-C-I-D.)

• Consistency: The consistency requirement here is that the sum of A and B be unchanged by the execution of the transaction. Without the consistency requirement, money could be created or destroyed by the transaction. It can be verified easily that, if the database is consistent before an execution of the transaction, the database remains consistent after the execution of the transaction.
Ensuring consistency for an individual transaction is the responsibility of the application programmer who codes the transaction. This task may be facilitated by automatic testing of integrity constraints.

• Atomicity: Suppose that, just before the execution of transaction Ti, the values of accounts A and B are $1000 and $2000, respectively. Now suppose that, during the execution of transaction Ti, a failure occurs that prevents Ti from completing its execution successfully. Examples of such failures include power failures, hardware failures, and software errors. Further, suppose that the failure happened after the write(A) operation but before the write(B) operation. In this case, the values of accounts A and B reflected in the database are $950 and $2000. The system destroyed $50 as a result of this failure. In particular, we note that the sum A + B is no longer preserved.
Thus, because of the failure, the state of the system no longer reflects a real state of the world that the database is supposed to capture. We term such a state an inconsistent state. We must ensure that such inconsistencies are not visible in a database system. Note, however, that the system must at some point be in an inconsistent state. Even if transaction Ti is executed to completion, there exists a point at which the value of account A is $950 and the value of account B is $2000, which is clearly an inconsistent state. This state, however, is eventually replaced by the consistent state where the value of account A is $950 and the value of account B is $2050. Thus, if the transaction never started or was guaranteed to complete, such an inconsistent state would not be visible except during the execution of the transaction. That is the reason for the atomicity requirement: if the atomicity property is present, all actions of the transaction are reflected in the database, or none are.
The basic idea behind ensuring atomicity is this: the database system keeps track (on disk) of the old values of any data on which a transaction performs a write, and, if the transaction does not complete its execution, the database system restores the old values to make it appear as though the transaction never executed.

• Durability: Once the execution of the transaction completes successfully, and the user who initiated the transaction has been notified that the transfer of funds has taken place, it must be the case that no system failure will result in a loss of data corresponding to this transfer of funds.
The durability property guarantees that, once a transaction completes successfully, all the updates that it carried out on the database persist, even if there is a system failure after the transaction completes execution.
We assume for now that a failure of the computer system may result in loss of data in main memory, but data written to disk are never lost. We can guarantee durability by ensuring that either
1. The updates carried out by the transaction have been written to disk before the transaction completes, or
2. Information about the updates carried out by the transaction and written to disk is sufficient to enable the database to reconstruct the updates when the database system is restarted after the failure.
Ensuring durability is the responsibility of a component of the database system called the recovery-management component. The transaction-management component and the recovery-management component are closely related.

• Isolation: Even if the consistency and atomicity properties are ensured for each transaction, if several transactions are executed concurrently, their operations may interleave in some undesirable way, resulting in an inconsistent state.
For example, as we saw earlier, the database is temporarily inconsistent while the transaction to transfer funds from A to B is executing, with the deducted total written to A and the increased total yet to be written to B. If a second concurrently running transaction reads A and B at this intermediate point and computes A + B, it will observe an inconsistent value. Furthermore, if this second transaction then performs updates on A and B based on the inconsistent values that it read, the database may be left in an inconsistent state even after both transactions have completed.
A way to avoid the problem of concurrently executing transactions is to execute transactions serially, that is, one after the other. However, concurrent execution of transactions provides significant performance benefits. Other solutions have therefore been developed; they allow multiple transactions to execute concurrently.
The isolation property of a transaction ensures that the concurrent execution of transactions results in a system state that is equivalent to a state that could have been obtained had these transactions executed one at a time in some order. Ensuring the isolation property is the responsibility of a component of the database system called the concurrency-control component.
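The old-value bookkeeping described under atomicity can be sketched as follows. This is an illustrative Python sketch, not how a real recovery manager is built: an in-memory dict stands in for the database, and the `undo` dict stands in for the on-disk record of old values.

```python
# Illustrative sketch of undo-based atomicity (not a real recovery manager):
# remember each item's old value before the first write, restore on failure.
db = {"A": 1000, "B": 2000}

def run_transfer(fail_after_first_write=False):
    undo = {}                          # old values of items this txn wrote
    def write(x, v):
        undo.setdefault(x, db[x])      # record the pre-image once
        db[x] = v
    try:
        write("A", db["A"] - 50)
        if fail_after_first_write:     # simulated crash between the writes
            raise RuntimeError("failure between write(A) and write(B)")
        write("B", db["B"] + 50)
    except RuntimeError:
        for x, old in undo.items():    # roll back: restore the old values
            db[x] = old

run_transfer(fail_after_first_write=True)
print(db)   # {'A': 1000, 'B': 2000}: the failed transfer left no trace
```

A successful run, by contrast, leaves A at $950 and B at $2050, with the sum A + B preserved either way.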

Transaction State
In the absence of failures, all transactions complete successfully. However, as we noted earlier, a transaction may not always complete its execution successfully. Such a transaction is termed aborted. If we are to ensure the atomicity property, an aborted transaction must have no effect on the state of the database. Thus, any changes that the aborted transaction made to the database must be undone. Once the changes caused by an aborted transaction have been undone, we say that the transaction has been rolled back. It is part of the responsibility of the recovery scheme to manage transaction aborts.
A transaction that completes its execution successfully is said to be committed. A committed transaction that has performed updates transforms the database into a new consistent state, which must persist even if there is a system failure.
Once a transaction has committed, we cannot undo its effects by aborting it. The only way to undo the effects of a committed transaction is to execute a compensating transaction. For instance, if a transaction added $20 to an account, the compensating transaction would subtract $20 from the account. However, it is not always possible to create such a compensating transaction. Therefore, the responsibility of writing and executing a compensating transaction is left to the user, and is not handled by the database system.
We need to be more precise about what we mean by successful completion of a transaction. We therefore establish a simple abstract transaction model. A transaction must be in one of the following states:

• Active, the initial state; the transaction stays in this state while it is executing

• Partially committed, after the final statement has been executed

• Failed, after the discovery that normal execution can no longer proceed

• Aborted, after the transaction has been rolled back and the database has been restored to its state prior to the start of the transaction


• Committed, after successful completion

The state diagram corresponding to a transaction appears in Figure 23.1. We say that a transaction has committed only if it has entered the committed state. Similarly, we say that a transaction has aborted only if it has entered the aborted state. A transaction is said to have terminated if it has either committed or aborted.
A transaction starts in the active state. When it finishes its final statement, it enters the partially committed state. At this point, the transaction has completed its execution, but it is still possible that it may have to be aborted, since the actual output may still be temporarily residing in main memory, and thus a hardware failure may preclude its successful completion.
The database system then writes out enough information to disk that, even in the event of a failure, the updates performed by the transaction can be re-created when the system restarts after the failure. When the last of this information is written out, the transaction enters the committed state.
As mentioned earlier, we assume for now that failures do not result in loss of data on disk.
A transaction enters the failed state after the system determines that the transaction can no longer proceed with its normal execution (for example, because of hardware or logical errors). Such a transaction must be rolled back. Then, it enters the aborted state. At this point, the system has two options:

• It can restart the transaction, but only if the transaction was aborted as a result of some hardware or software error that was not created through the internal logic of the transaction. A restarted transaction is considered to be a new transaction.

• It can kill the transaction. It usually does so because of some internal logical error that can be corrected only by rewriting the application program, or because the input was bad, or because the desired data were not found in the database.

Figure 23.1 State diagram of a transaction

We must be cautious when dealing with observable external writes, such as writes to a terminal or printer. Once such a write has occurred, it cannot be erased, since it may have been seen external to the database system. Most systems allow such writes to take place only after the transaction has entered the committed state. One way to implement such a scheme is for the database system to store any value associated with such external writes temporarily in nonvolatile storage, and to perform the actual writes only after the transaction enters the committed state. If the system should fail after the transaction has entered the committed state, but before it could complete the external writes, the database system will carry out the external writes (using the data in nonvolatile storage) when the system is restarted.
Handling external writes can be more complicated in some situations. For example, suppose the external action is that of dispensing cash at an automated teller machine, and the system fails just before the cash is actually dispensed (we assume that cash can be dispensed atomically). It makes no sense to dispense cash when the system is restarted, since the user may have left the machine. In such a case a compensating transaction, such as depositing the cash back in the user's account, needs to be executed when the system is restarted.
For certain applications, it may be desirable to allow active transactions to display data to users, particularly for long-duration transactions that run for minutes or hours. Unfortunately, we cannot allow such output of observable data unless we are willing to compromise transaction atomicity. Most current transaction systems ensure atomicity and, therefore, forbid this form of interaction with users.

Implementation of Atomicity and Durability
The recovery-management component of a database system can support atomicity and durability by a variety of schemes. We first consider a simple, but extremely inefficient, scheme called the shadow copy scheme. This scheme, which is based on making copies of the database, called shadow copies, assumes that only one transaction is active at a time. The scheme also assumes that the database is simply a file on disk. A pointer called db-pointer is maintained on disk; it points to the current copy of the database.
In the shadow-copy scheme, a transaction that wants to update the database first creates a complete copy of the database. All updates are done on the new database copy, leaving the original copy, the shadow copy, untouched. If at any point the transaction has to be aborted, the system merely deletes the new copy. The old copy of the database has not been affected.
If the transaction completes, it is committed as follows. First, the operating system is asked to make sure that all pages of the new copy of the database have been written out to disk. (Unix systems use the fsync system call for this purpose.) After the operating system has written all the pages to disk, the database system updates the pointer db-pointer to point to the new copy of the database; the new copy then becomes the current copy of the database. The old copy of the database is then deleted. Figure 23.2 depicts the scheme, showing the database state before and after the update.

(Figure 23.1, referenced above, shows the states active, partially committed, failed, aborted, and committed.)

(Figure 23.2 shows db-pointer before the update pointing to the old copy of the database, and after the update pointing to the new copy, with the old copy to be deleted.)


Figure 23.2 Shadow-copy technique for atomicity and durability

The transaction is said to have been committed at the point where the updated db-pointer is written to disk.
We now consider how the technique handles transaction and system failures. First, consider transaction failure. If the transaction fails at any time before db-pointer is updated, the old contents of the database are not affected. We can abort the transaction by just deleting the new copy of the database. Once the transaction has been committed, all the updates that it performed are in the database pointed to by db-pointer. Thus, either all updates of the transaction are reflected, or none of the effects are reflected, regardless of transaction failure.
Now consider the issue of system failure. Suppose that the system fails at any time before the updated db-pointer is written to disk. Then, when the system restarts, it will read db-pointer and will thus see the original contents of the database, and none of the effects of the transaction will be visible on the database. Next, suppose that the system fails after db-pointer has been updated on disk. Before the pointer is updated, all updated pages of the new copy of the database were written to disk. Again, we assume that, once a file is written to disk, its contents will not be damaged even if there is a system failure. Therefore, when the system restarts, it will read db-pointer and will thus see the contents of the database after all the updates performed by the transaction.
The implementation actually depends on the write to db-pointer being atomic; that is, either all its bytes are written or none of its bytes are written. If some of the bytes of the pointer were updated by the write, but others were not, the pointer is meaningless, and neither old nor new versions of the database may be found when the system restarts. Luckily, disk systems provide atomic updates to entire blocks, or at least to a disk sector. In other words, the disk system guarantees that it will update db-pointer atomically, as long as we make sure that db-pointer lies entirely in a single sector, which we can ensure by storing db-pointer at the beginning of a block.
Thus, the atomicity and durability properties of transactions are ensured by the shadow-copy implementation of the recovery-management component.
As a simple example of a transaction outside the database domain, consider a text-editing session. An entire editing session can be modeled as a transaction. The actions executed by the transaction are reading and updating the file. Saving the file at the end of editing corresponds to a commit of the editing transaction; quitting the editing session without saving the file corresponds to an abort of the editing transaction.
Many text editors use essentially the implementation just described to ensure that an editing session is transactional. A new file is used to store the updated file. At the end of the editing session, if the updated file is to be saved, the text editor uses a file rename command to rename the new file to have the actual file name. The rename, assumed to be implemented as an atomic operation by the underlying file system, deletes the old file as well.
Unfortunately, this implementation is extremely inefficient in the context of large databases, since executing a single transaction requires copying the entire database. Furthermore, the implementation does not allow transactions to execute concurrently with one another. There are practical ways of implementing atomicity and durability that are much less expensive and more powerful.
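The commit step of the shadow-copy scheme, like the text-editor save just described, hinges on an atomic rename. Here is a minimal Python sketch; the function name `commit_update` is invented, and `os.replace` is the standard library's atomic-rename call.

```python
# Illustrative sketch: write the new copy beside the shadow copy, force it
# to disk, then atomically install it -- the analogue of updating db-pointer.
import os
import tempfile

def commit_update(path, new_contents):
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)   # the "new copy"
    with os.fdopen(fd, "w") as f:
        f.write(new_contents)
        f.flush()
        os.fsync(f.fileno())    # new copy must reach disk before the switch
    os.replace(tmp, path)       # atomic rename: the commit point
```

If the process fails before `os.replace`, the old file (the shadow copy) is untouched; after it, readers see only the new contents.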

Points to Ponder

• Collections of operations that form a single logical unit of work are called transactions.

• Usually, a transaction is initiated by a user program written in a high-level data-manipulation language or programming language.

• Transactions access data using two operations: read(X) and write(X).

• A transaction may not always complete its execution successfully. Such a transaction is termed aborted.

• A transaction must be in one of the following states: active, partially committed, failed, aborted, or committed.

• The recovery-management component of a database system can support atomicity and durability by a variety of schemes.

Review Terms

• Transactions
• ACID properties
• Transaction state
• Implementation of ACID properties

Students Activity

1. Define transaction with the help of an example.

2. Define the ACID properties of a database.

3. Define the various states of a transaction.


4. Describe the implementation of the atomicity property of a database.

5. Describe the implementation of durability.

6. Explain the importance of consistency in a database transaction.

7. Define isolation of a database transaction.


Student Notes


LESSON 41:

CONCURRENCY CONTROL - I

Lesson Objectives

• Concurrent executions
• Advantages of concurrency
• Problems with concurrency
• Serializability

Concurrent Executions
Transaction-processing systems usually allow multiple transactions to run concurrently. Allowing multiple transactions to update data concurrently causes several complications with consistency of the data, as we saw earlier. Ensuring consistency in spite of concurrent execution of transactions requires extra work; it is far easier to insist that transactions run serially, that is, one at a time, each starting only after the previous one has completed. However, there are two good reasons for allowing concurrency:

• Improved throughput and resource utilization. A transaction consists of many steps. Some involve I/O activity; others involve CPU activity. The CPU and the disks in a computer system can operate in parallel. Therefore, I/O activity can be done in parallel with processing at the CPU. The parallelism of the CPU and the I/O system can therefore be exploited to run multiple transactions in parallel. While a read or write on behalf of one transaction is in progress on one disk, another transaction can be running in the CPU, while another disk may be executing a read or write on behalf of a third transaction. All of this increases the throughput of the system, that is, the number of transactions executed in a given amount of time. Correspondingly, the processor and disk utilization also increase; in other words, the processor and disk spend less time idle, or not performing any useful work.

• Reduced waiting time. There may be a mix of transactions running on a system, some short and some long. If transactions run serially, a short transaction may have to wait for a preceding long transaction to complete, which can lead to unpredictable delays in running a transaction. If the transactions are operating on different parts of the database, it is better to let them run concurrently, sharing the CPU cycles and disk accesses among them. Concurrent execution reduces the unpredictable delays in running transactions. Moreover, it also reduces the average response time: the average time for a transaction to be completed after it has been submitted.

The motivation for using concurrent execution in a database is essentially the same as the motivation for using multiprogramming in an operating system.
When several transactions run concurrently, database consistency can be destroyed despite the correctness of each individual transaction. In this section, we present the concept of schedules to help identify those executions that are guaranteed to ensure consistency.
The database system must control the interaction among the concurrent transactions to prevent them from destroying the consistency of the database. It does so through a variety of mechanisms called concurrency-control schemes.
Consider again the simplified banking system of Section 23.1, which has several accounts, and a set of transactions that access and update those accounts. Let T1 and T2 be two transactions that transfer funds from one account to another. Transaction T1 transfers $50 from account A to account B. It is defined as

T1: read(A);
A := A - 50;
write(A);
read(B);
B := B + 50;
write(B).

Transaction T2 transfers 10 percent of the balance from account A to account B. It is defined as

T2: read(A);
temp := A * 0.1;
A := A - temp;
write(A);
read(B);
B := B + temp;
write(B).

Suppose the current values of accounts A and B are $1000 and $2000, respectively. Suppose also that the two transactions are executed one at a time in the order T1 followed by T2. This execution sequence appears in Figure 41.1. In the figure, the sequence of instruction steps is in chronological order from top to bottom, with instructions of T1 appearing in the left column and instructions of T2 appearing in the right column. The final values of accounts A and B, after the execution in Figure 41.1 takes place, are $855 and $2145, respectively.

T1                     T2
read(A)
A := A - 50
write(A)
read(B)
B := B + 50
write(B)
                       read(A)
                       temp := A * 0.1
                       A := A - temp
                       write(A)
                       read(B)
                       B := B + temp
                       write(B)

Figure 41.1 Schedule 1: a serial schedule in which T1 is followed by T2.

Thus, the total amount of money in


accounts A and B, that is, the sum A + B, is preserved after the execution of both transactions.
Similarly, if the transactions are executed one at a time in the order T2 followed by T1, then the corresponding execution sequence is that of Figure 41.2. Again, as expected, the sum A + B is preserved, and the final values of accounts A and B are $850 and $2150, respectively.
The execution sequences just described are called schedules. They represent the chronological order in which instructions are executed in the system. Clearly, a schedule for a set of transactions must consist of all instructions of those transactions, and must preserve the order in which the instructions appear in each individual transaction. For example, in transaction T1, the instruction write(A) must appear before the instruction read(B), in any valid schedule. In the following discussion, we shall refer to the first execution sequence (T1 followed by T2) as schedule 1, and to the second execution sequence (T2 followed by T1) as schedule 2.
These schedules are serial: each serial schedule consists of a sequence of instructions from various transactions, where the instructions belonging to one single transaction appear together in that schedule. Thus, for a set of n transactions, there exist n! different valid serial schedules.
When the database system executes several transactions concurrently, the corresponding schedule no longer needs to be serial. If two transactions are running concurrently, the operating system may execute one transaction for a little while, then perform a context switch, execute the second transaction for some time, and then switch back to the first transaction for some time, and so on. With multiple transactions, the CPU time is shared among all the transactions.
Several execution sequences are possible, since the various instructions from both transactions may now be interleaved. In general, it is not possible to predict exactly how many instructions of a transaction will be executed before the CPU switches to

T1                     T2
                       read(A)
                       temp := A * 0.1
                       A := A - temp
                       write(A)
                       read(B)
                       B := B + temp
                       write(B)
read(A)
A := A - 50
write(A)
read(B)
B := B + 50
write(B)

Figure 41.2 Schedule 2: a serial schedule in which T2 is followed by T1.

T1                     T2
read(A)
A := A - 50
write(A)
                       read(A)
                       temp := A * 0.1
                       A := A - temp
                       write(A)
read(B)
B := B + 50
write(B)
                       read(B)
                       B := B + temp
                       write(B)

Figure 41.3 Schedule 3: a concurrent schedule equivalent to schedule 1.

another transaction. Thus, the number of possible schedules for a set of n transactions is much larger than n!.
Returning to our previous example, suppose that the two transactions are executed concurrently. One possible schedule appears in Figure 41.3. After this execution takes place, we arrive at the same state as the one in which the transactions are executed serially in the order T1 followed by T2. The sum A + B is indeed preserved.
Not all concurrent executions result in a correct state. To illustrate, consider the schedule of Figure 41.4. After the execution of this schedule, we arrive at a state where the final values of accounts A and B are $950 and $2100, respectively. This final state is an inconsistent state, since we have gained $50 in the process of the concurrent execution. Indeed, the sum A + B is not preserved by the execution of the two transactions.
If control of concurrent execution is left entirely to the operating system, many possible schedules, including ones that leave the database in an inconsistent state, such as the one just described, are possible. It is the job of the database system to ensure that any schedule that gets executed will leave the database in a consistent state. The concurrency-control component of the database system carries out this task.
We can ensure consistency of the database under concurrent execution by making sure that any schedule that is executed has the same effect as a schedule that could have occurred without any concurrent execution. That is, the schedule should, in some sense, be equivalent to a serial schedule. We examine this idea in the next section.
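The inconsistent concurrent execution just described can be replayed directly. In this illustrative Python sketch, the dict `db` stands in for the database and the local variables for each transaction's buffer; T1's late write(A) clobbers T2's update, so the sum A + B grows by $50.

```python
# Replaying the bad interleaving: T1 reads A, then T2 runs its A-steps,
# and T1's later write(A) overwrites T2's update to A.
db = {"A": 1000, "B": 2000}

def read(x):
    return db[x]

def write(x, v):
    db[x] = v

a1 = read("A"); a1 = a1 - 50      # T1: read(A); A := A - 50
a2 = read("A"); temp = a2 * 0.1   # T2: read(A); temp := A * 0.1
a2 = a2 - temp                    # T2: A := A - temp
write("A", a2)                    # T2: write(A)
b2 = read("B")                    # T2: read(B)
write("A", a1)                    # T1: write(A)  (overwrites T2's write)
b1 = read("B"); b1 = b1 + 50      # T1: read(B); B := B + 50
write("B", b1)                    # T1: write(B)
b2 = b2 + temp                    # T2: B := B + temp
write("B", b2)                    # T2: write(B)

print(db["A"], db["B"])           # 950 2100.0: the sum grew by $50
```

Running either serial order instead (all of T1 then all of T2, or the reverse) leaves the sum at $3000.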

Serializability

The database system must control concurrent execution of transactions, to ensure that the database state remains consistent. Before we examine how the database system can carry out this task, we must first understand which schedules will ensure consistency, and which schedules will not.

T1: read(A); A := A - 50
T2: read(A); temp := A * 0.1; A := A - temp; write(A); read(B)
T1: write(A); read(B); B := B + 50; write(B)
T2: B := B + temp; write(B)

Figure 41.4 Schedule 4 - a concurrent schedule.

Since transactions are programs, it is computationally difficult to determine exactly what operations a transaction performs and how operations of various transactions interact. For this reason, we shall not interpret the type of operations that a transaction can perform on a data item. Instead, we consider only two operations: read and write. We thus assume that, between a read(Q) instruction and a write(Q) instruction on a data item Q, a transaction may perform an arbitrary sequence of operations on the copy of Q that is residing in the local buffer of the transaction. Thus, the only significant operations of a transaction, from a scheduling point of view, are its read and write instructions. We shall therefore usually show only read and write instructions in schedules, as we do in schedule 3 in Figure 41.5.

In this section, we discuss different forms of schedule equivalence; they lead to the notions of conflict serializability and view serializability.

T1: read(A); write(A)
T2: read(A); write(A)
T1: read(B); write(B)
T2: read(B); write(B)

Figure 41.5 Schedule 3 - showing only the read and write instructions.

Points to Ponder

• Transaction-processing systems usually allow multiple transactions to run concurrently.

• Allowing multiple transactions to update data concurrently causes several complications with consistency of the data.

• The parallelism of the CPU and the I/O system can therefore be exploited to run multiple transactions in parallel.

• Concurrent execution reduces the unpredictable delays in running transactions. Moreover, it also reduces the average response time.

• The database system must control concurrent execution of transactions, to ensure that the database state remains consistent.

• Since transactions are programs, it is computationally difficult to determine exactly what operations a transaction performs and how operations of various transactions interact.

Review Terms

• Concurrent executions
• Advantages of concurrency
• Problems with concurrency
• Serializability

Students Activity

1. Define concurrency.

2. What are the advantages and disadvantages of concurrency control?

3. Define serializability.

4. When can a database enter an inconsistent state?


5. How can we avoid the inconsistency problem?


Student Notes


Lesson objectives

• Types of serializability
• Conflict serializability
• View serializability
• Implementation of isolation
• Transaction definition in SQL

Conflict Serializability

Let us consider a schedule S in which there are two consecutive instructions Ii and Ij, of transactions Ti and Tj, respectively (i ≠ j). If Ii and Ij refer to different data items, then we can swap Ii and Ij without affecting the results of any instruction in the schedule. However, if Ii and Ij refer to the same data item Q, then the order of the two steps may matter. Since we are dealing with only read and write instructions, there are four cases that we need to consider:

1. Ii = read(Q), Ij = read(Q). The order of Ii and Ij does not matter, since the same value of Q is read by Ti and Tj, regardless of the order.

2. Ii = read(Q), Ij = write(Q). If Ii comes before Ij, then Ti does not read the value of Q that is written by Tj in instruction Ij. If Ij comes before Ii, then Ti reads the value of Q that is written by Tj. Thus, the order of Ii and Ij matters.

3. Ii = write(Q), Ij = read(Q). The order of Ii and Ij matters for reasons similar to those of the previous case.

4. Ii = write(Q), Ij = write(Q). Since both instructions are write operations, the order of these instructions does not affect either Ti or Tj. However, the value obtained by the next read(Q) instruction of S is affected, since the result of only the latter of the two write instructions is preserved in the database. If there is no other write(Q) instruction after Ii and Ij in S, then the order of Ii and Ij directly affects the final value of Q in the database state that results from schedule S.

Thus, only in the case where both Ii and Ij are read instructions does the relative order of their execution not matter.

We say that Ii and Ij conflict if they are operations by different transactions on the same data item, and at least one of these instructions is a write operation.

To illustrate the concept of conflicting instructions, we consider schedule 3, in Figure 41.5. The write(A) instruction of T1 conflicts with the read(A) instruction of T2. However, the write(A) instruction of T2 does not conflict with the read(B) instruction of T1, because the two instructions access different data items.

Let Ii and Ij be consecutive instructions of a schedule S. If Ii and Ij are instructions of different transactions and Ii and Ij do not conflict, then we can swap the order of Ii and Ij to produce a

new schedule S'. We expect S to be equivalent to S', since all instructions appear in the same order in both schedules except for Ii and Ij, whose order does not matter.

Since the write(A) instruction of T2 in schedule 3 of Figure 41.5 does not conflict with the read(B) instruction of T1, we can swap these instructions to generate an equivalent schedule, schedule 5, in Figure 42.1. Regardless of the initial system state, schedules 3 and 5 both produce the same final system state.

We continue to swap non-conflicting instructions:

• Swap the read(B) instruction of T1 with the read(A) instruction of T2.
• Swap the write(B) instruction of T1 with the write(A) instruction of T2.
• Swap the write(B) instruction of T1 with the read(A) instruction of T2.

T1: read(A); write(A)
T2: read(A)
T1: read(B)
T2: write(A)
T1: write(B)
T2: read(B); write(B)

Figure 42.1 Schedule 5 - schedule 3 after swapping of a pair of instructions.

The final result of these swaps, schedule 6 of Figure 42.2, is a serial schedule. Thus, we have shown that schedule 3 is equivalent to a serial schedule. This equivalence implies that, regardless of the initial system state, schedule 3 will produce the same final state as will some serial schedule.

If a schedule S can be transformed into a schedule S' by a series of swaps of non-conflicting instructions, we say that S and S' are conflict equivalent.

In our previous examples, schedule 1 is not conflict equivalent to schedule 2. However, schedule 1 is conflict equivalent to schedule 3, because the read(B) and write(B) instructions of T1 can be swapped with the read(A) and write(A) instructions of T2.

The concept of conflict equivalence leads to the concept of conflict serializability. We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule. Thus, schedule 3 is conflict serializable, since it is conflict equivalent to the serial schedule 1.

Finally, consider schedule 7 of Figure 42.3; it consists of only the significant operations (that is, the read and write) of transactions T3 and T4. This schedule is not conflict serializable,

LESSON 42:

CONCURRENCY CONTROL - II


since it is not equivalent to either the serial schedule <T3, T4> or the serial schedule <T4, T3>.

T1: read(A); write(A); read(B); write(B)
T2: read(A); write(A); read(B); write(B)

Figure 42.2 Schedule 6 - a serial schedule that is equivalent to schedule 3.

T3: read(Q)
T4: write(Q)
T3: write(Q)

Figure 42.3 Schedule 7.

It is possible to have two schedules that produce the same outcome, but that are not conflict equivalent. For example, consider transaction T5, which transfers $10 from account B to account A. Let schedule 8 be as defined in Figure 42.4. We claim that schedule 8 is not conflict equivalent to the serial schedule <T1, T5>, since, in schedule 8, the write(B) instruction of T5 conflicts with the read(B) instruction of T1. Thus, we cannot move all the instructions of T1 before those of T5 by swapping consecutive non-conflicting instructions. However, the final values of accounts A and B after the execution of either schedule 8 or the serial schedule <T1, T5> are the same: $960 and $2040, respectively.

We can see from this example that there are less stringent definitions of schedule equivalence than conflict equivalence. For the system to determine that schedule 8 produces the same outcome as the serial schedule <T1, T5>, it must analyze the computation performed by T1 and T5, rather than just the read and write operations. In general, such analysis is hard to implement and is computationally expensive. However, there are other definitions of schedule equivalence based purely on the read and write operations. We will consider one such definition in the next section.
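One standard way to test conflict serializability, although the text does not spell it out here, is to build a precedence graph: add an edge Ti -> Tj whenever an operation of Ti conflicts with a later operation of Tj, then check for cycles. The schedule is conflict serializable exactly when the graph is acyclic. A sketch, where the (transaction, op, item) triple encoding is an assumption made for this example:

```python
def conflict_serializable(schedule):
    """schedule: list of (txn, op, item) with op in {"r", "w"}."""
    # Edge Ti -> Tj whenever an op of Ti conflicts with a LATER op of Tj:
    # same item, different transactions, at least one write.
    txns = {t for t, _, _ in schedule}
    edges = set()
    for i, (ti, opi, qi) in enumerate(schedule):
        for tj, opj, qj in schedule[i + 1:]:
            if ti != tj and qi == qj and "w" in (opi, opj):
                edges.add((ti, tj))
    # Depth-first search for a cycle in the precedence graph.
    state = {t: 0 for t in txns}  # 0 = unvisited, 1 = on stack, 2 = done
    def dfs(t):
        state[t] = 1
        for a, b in edges:
            if a == t:
                if state[b] == 1 or (state[b] == 0 and dfs(b)):
                    return True  # cycle found
        state[t] = 2
        return False
    return not any(dfs(t) for t in txns if state[t] == 0)

# Schedule 3 (Figure 41.5) is conflict serializable...
s3 = [("T1", "r", "A"), ("T1", "w", "A"), ("T2", "r", "A"), ("T2", "w", "A"),
      ("T1", "r", "B"), ("T1", "w", "B"), ("T2", "r", "B"), ("T2", "w", "B")]
# ...while schedule 7 (Figure 42.3) is not: T3 -> T4 and T4 -> T3 form a cycle.
s7 = [("T3", "r", "Q"), ("T4", "w", "Q"), ("T3", "w", "Q")]
print(conflict_serializable(s3), conflict_serializable(s7))  # True False
```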

View Serializability

In this section, we consider a form of equivalence that is less stringent than conflict equivalence, but that, like conflict equivalence, is based on only the read and write operations of transactions.

T1: read(A); A := A - 50; write(A)
T5: read(B); B := B - 10; write(B)
T1: read(B); B := B + 50; write(B)
T5: read(A); A := A + 10; write(A)

Figure 42.4 Schedule 8.

Consider two schedules S and S', where the same set of transactions participates in both schedules. The schedules S and S' are said to be view equivalent if three conditions are met:

1. For each data item Q, if transaction Ti reads the initial value

of Q in schedule S, then transaction Ti must, in schedule S', also read the initial value of Q.

2. For each data item Q, if transaction Ti executes read(Q) in schedule S, and if that value was produced by a write(Q) operation executed by transaction Tj, then the read(Q) operation of transaction Ti must, in schedule S', also read the value of Q that was produced by the same write(Q) operation of transaction Tj.

3. For each data item Q, the transaction (if any) that performs the final write(Q) operation in schedule S must perform the final write(Q) operation in schedule S'.

Conditions 1 and 2 ensure that each transaction reads the same values in both schedules and, therefore, performs the same computation. Condition 3, coupled with conditions 1 and 2, ensures that both schedules result in the same final system state.

In our previous examples, schedule 1 is not view equivalent to schedule 2, since, in schedule 1, the value of account A read by transaction T2 was produced by T1, whereas this case does not hold in schedule 2. However, schedule 1 is view equivalent to schedule 3, because the values of accounts A and B read by transaction T2 were produced by T1 in both schedules.

The concept of view equivalence leads to the concept of view serializability. We say that a schedule S is view serializable if it is view equivalent to a serial schedule.

As an illustration, suppose that we augment schedule 7 with transaction T6, and obtain schedule 9 in Figure 42.5. Schedule 9 is view serializable. Indeed, it is view equivalent to the serial schedule <T3, T4, T6>, since the one read(Q) instruction reads the initial value of Q in both schedules, and T6 performs the final write of Q in both schedules.

Every conflict-serializable schedule is also view serializable, but there are view-serializable schedules that are not conflict serializable. Indeed, schedule 9 is not conflict serializable, since every pair of consecutive instructions conflicts, and, thus, no swapping of instructions is possible.

Observe that, in schedule 9, transactions T4 and T6 perform write(Q) operations without having performed a read(Q) operation. Writes of this sort are called blind writes. Blind writes appear in any view-serializable schedule that is not conflict serializable.
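The three conditions can be checked mechanically by recording, for each schedule, which write each read observes and which transaction writes each item last. A sketch, where the (transaction, op, item) list encoding is an assumption made for this example:

```python
def view_info(schedule):
    """Summarize a schedule of (txn, op, item) steps, op in {"r", "w"}."""
    last_writer = {}   # item -> txn that most recently wrote it
    seen = {}          # (txn, item) -> number of reads so far by txn
    reads, final = {}, {}
    for t, op, q in schedule:
        if op == "r":
            k = seen.get((t, q), 0)
            # Conditions 1 and 2: which write (None = initial value)
            # does the k-th read of q by t observe?
            reads[(t, q, k)] = last_writer.get(q)
            seen[(t, q)] = k + 1
        else:
            last_writer[q] = t
            final[q] = t   # condition 3: who performs the final write
    return reads, final

def view_equivalent(s1, s2):
    return view_info(s1) == view_info(s2)

# Schedule 9 (Figure 42.5) is view equivalent to the serial <T3, T4, T6>:
s9     = [("T3", "r", "Q"), ("T4", "w", "Q"), ("T3", "w", "Q"), ("T6", "w", "Q")]
serial = [("T3", "r", "Q"), ("T3", "w", "Q"), ("T4", "w", "Q"), ("T6", "w", "Q")]
print(view_equivalent(s9, serial))  # True
```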


T3: read(Q)
T4: write(Q)
T3: write(Q)
T6: write(Q)

Figure 42.5 Schedule 9 - a view-serializable schedule.

Implementation of Isolation

So far, we have seen what properties a schedule must have if it is to leave the database in a consistent state and allow transaction failures to be handled in a safe manner. Specifically, schedules that are conflict or view serializable and cascadeless satisfy these requirements.

There are various concurrency-control schemes that we can use to ensure that, even when multiple transactions are executed concurrently, only acceptable schedules are generated, regardless of how the operating system time-shares resources (such as CPU time) among the transactions.

As a trivial example of a concurrency-control scheme, consider this scheme: A transaction acquires a lock on the entire database before it starts and releases the lock after it has committed. While a transaction holds a lock, no other transaction is allowed to acquire the lock, and all must therefore wait for the lock to be released. As a result of the locking policy, only one transaction can execute at a time. Therefore, only serial schedules are generated. These are trivially serializable, and it is easy to verify that they are cascadeless as well.

A concurrency-control scheme such as this one leads to poor performance, since it forces transactions to wait for preceding transactions to finish before they can start. In other words, it provides a poor degree of concurrency. As explained earlier, concurrent execution has several performance benefits.

The goal of concurrency-control schemes is to provide a high degree of concurrency, while ensuring that all schedules that can be generated are conflict or view serializable, and are cascadeless. The schemes have different trade-offs in terms of the amount of concurrency they allow and the amount of overhead that they incur. Some of them allow only conflict-serializable schedules to be generated; others allow certain view-serializable schedules that are not conflict serializable to be generated.

Transaction Definition in SQL

A data-manipulation language must include a construct for specifying the set of actions that constitute a transaction.

The SQL standard specifies that a transaction begins implicitly. Transactions are ended by one of these SQL statements:

• Commit work commits the current transaction and begins a new one.
• Rollback work causes the current transaction to abort.

The keyword work is optional in both statements. If a program terminates without either of these commands, the updates are either committed or rolled back; which of the two happens is not specified by the standard and depends on the implementation.

The standard also specifies that the system must ensure both serializability and freedom from cascading rollback. The definition of serializability used by the standard is that a schedule must have the same effect as would some serial schedule. Thus, conflict and view serializability are both acceptable.

The SQL-92 standard also allows a transaction to specify that it may be executed in a manner that causes it to become nonserializable with respect to other transactions. We study such weaker levels of consistency in a later section.
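As a hedged illustration of these commit and rollback semantics, the following sketch uses Python's sqlite3 module; the account table and the amounts are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INT)")
conn.execute("INSERT INTO account VALUES ('A', 1000), ('B', 2000)")
conn.commit()

# A transfer that completes: COMMIT makes both updates durable.
conn.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
conn.execute("UPDATE account SET balance = balance + 50 WHERE name = 'B'")
conn.commit()

# A transfer that aborts midway: ROLLBACK undoes the partial update,
# so the invariant A + B is preserved.
conn.execute("UPDATE account SET balance = balance - 500 WHERE name = 'A'")
conn.rollback()

print(dict(conn.execute("SELECT name, balance FROM account")))
# {'A': 950, 'B': 2050}
```

Note that sqlite3 begins a transaction implicitly before the first data-changing statement, mirroring the SQL standard's implicit transaction start described above.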

Points to Ponder

• A transaction is a unit of program execution that accesses and possibly updates various data items.

• To ensure integrity of the data, we require that the database system maintain the following properties of the transactions. These properties are often called the ACID properties.

• Allowing multiple transactions to update data concurrently causes several complications with consistency.

• The database system must control concurrent execution of transactions, to ensure that the database state remains consistent.

Review Terms

• Inconsistent state
• Transaction state
• Active
• Partially committed
• Failed
• Aborted
• Committed
• Terminated
• Transaction
• Restart
• Kill
• Observable external writes
• Shadow copy scheme
• Concurrent executions
• Serial execution
• Schedules
• Conflict of operations
• Conflict equivalence
• Conflict serializability
• View equivalence
• View serializability

Students Activity

1. Define a transaction.



2. Define conflict serializability.

3. Define view serializability.

4. Define the commit and rollback statements.


Student Notes


Lesson objectives

• Locking mechanism
• Graph-based protocols
• Timestamps
• Timestamp-based protocols
• The timestamp-ordering protocol
• Validation-based protocols

When the lock manager receives an unlock message from a transaction, it deletes the record for that data item in the linked list corresponding to that transaction. It tests the record that follows, if any, to see if that request can now be granted. If it can, the lock manager grants that request, and processes the record following it, if any, similarly, and so on.

If a transaction aborts, the lock manager deletes any waiting request made by the transaction. Once the database system has taken appropriate actions to undo the transaction, it releases all locks held by the aborted transaction.

This algorithm guarantees freedom from starvation for lock requests, since a request can never be granted while a request received earlier is waiting to be granted.

Graph-based Protocols

But, if we wish to develop protocols that are not two phase, we need additional information on how each transaction will access the database. There are various models that can give us the additional information, each differing in the amount of information provided. The simplest model requires that we have prior knowledge about the order in which the database items will be accessed. Given such information, it is possible to construct locking protocols that are not two phase, but that, nevertheless, ensure conflict serializability.

To acquire such prior knowledge, we impose a partial ordering → on the set D = {d1, d2, ..., dh} of all data items. If di → dj, then any transaction accessing both di and dj must access di before accessing dj. This partial ordering may be the result of either the logical or the physical organization of the data, or it may be imposed solely for the purpose of concurrency control.

The partial ordering implies that the set D may now be viewed as a directed acyclic graph, called a database graph. In this section, for the sake of simplicity, we will restrict our attention to only those graphs that are rooted trees. We will present a simple protocol, called the tree protocol, which is restricted to employ only exclusive locks. References to other, more complex, graph-based locking protocols are in the bibliographical notes.

In the tree protocol, the only lock instruction allowed is lock-X. Each transaction Ti can lock a data item at most once, and must observe the following rules:

1. The first lock by Ti may be on any data item.

2. Subsequently, a data item Q can be locked by Ti only if the parent of Q is currently locked by Ti.

3. Data items may be unlocked at any time.

4. A data item that has been locked and unlocked by Ti cannot subsequently be relocked by Ti.

All schedules that are legal under the tree protocol are conflict serializable.

To illustrate this protocol, consider the database graph of Figure 43.1. The following four transactions follow the tree protocol on this graph. We show only the lock and unlock instructions:

T10: lock-X(B); lock-X(E); lock-X(D); unlock(B); unlock(E); lock-X(G); unlock(D); unlock(G).
T11: lock-X(D); lock-X(H); unlock(D); unlock(H).
T12: lock-X(B); lock-X(E); unlock(E); unlock(B).
T13: lock-X(D); lock-X(H); unlock(D); unlock(H).

One possible schedule in which these four transactions participated appears in Figure 43.2. Note that, during its execution, transaction T10 holds locks on two disjoint subtrees.

Observe that the schedule of Figure 43.2 is conflict serializable. It can be shown not only that the tree protocol ensures conflict serializability, but also that this protocol ensures freedom from deadlock.

The tree protocol in Figure 43.2 does not ensure recoverability and cascadelessness. To ensure recoverability and cascadelessness, the protocol can be modified to not permit release of exclusive locks until the end of the transaction. Holding exclusive locks until the end of the transaction reduces concurrency. Here is an alternative that improves concurrency, but ensures only recoverability: For each data item with an uncommitted write, we record which transaction performed the last write to the data item. Whenever a transaction Ti performs a read of an uncommitted data item, we record a commit dependency of Ti on the transaction that performed the last write to the data item. Transaction Ti is then not permitted to commit until the commit of all transactions on which it has a commit dependency. If any of these transactions aborts, Ti must also be aborted.
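The four rules can be checked mechanically. The sketch below is an illustration with stated assumptions: the tree of Figure 43.1 is not reproduced in this extract, so the parent map is a made-up fragment chosen to be consistent with the example transactions (A at the root, B and C its children, and so on):

```python
# Assumed tree fragment: parent[child] = parent; A is the root.
parent = {"B": "A", "C": "A", "D": "B", "E": "B", "F": "C",
          "G": "D", "H": "D", "I": "F", "J": "F"}

def follows_tree_protocol(requests):
    """requests: one transaction's ordered ("lock"/"unlock", item) steps."""
    held, ever_locked = set(), set()
    first = True
    for op, item in requests:
        if op == "lock":
            if item in ever_locked:
                return False            # rule 4: no relocking
            if not first and parent.get(item) not in held:
                return False            # rule 2: parent must be held
            held.add(item); ever_locked.add(item); first = False
        else:
            held.discard(item)          # rule 3: unlock at any time
    return True

# T10's sequence from the text obeys the protocol:
t10 = [("lock", "B"), ("lock", "E"), ("lock", "D"), ("unlock", "B"),
       ("unlock", "E"), ("lock", "G"), ("unlock", "D"), ("unlock", "G")]
print(follows_tree_protocol(t10))  # True
```

Locking H without holding its parent D would be rejected by rule 2, and relocking an item already released would be rejected by rule 4.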

LESSON 43:

CONCURRENCY CONTROL - III


Figure 43.1 Tree-structured database graph.

Figure 43.2 Serializable schedule under the tree protocol.

The tree-locking protocol has an advantage over the two-phase locking protocol in that, unlike two-phase locking, it is deadlock-free, so no rollbacks are required. The tree-locking protocol has another advantage over the two-phase locking protocol in that unlocking may occur earlier. Earlier unlocking may lead to shorter waiting times, and to an increase in concurrency.

However, the protocol has the disadvantage that, in some cases, a transaction may have to lock data items that it does not access. For example, a transaction that needs to access data items A and J in the database graph of Figure 43.1 must lock not only A and J, but also data items B, D, and H. This additional locking results in increased locking overhead, the possibility of additional waiting time, and a potential decrease in concurrency. Further, without prior knowledge of what data items will need to be locked, transactions will have to lock the root of the tree, and that can reduce concurrency greatly.

For a set of transactions, there may be conflict-serializable schedules that cannot be obtained through the tree protocol. Indeed, there are schedules possible under the two-phase locking protocol that are not possible under the tree protocol, and vice versa. Examples of such schedules are explored in the exercises.

Timestamp-based Protocols

The locking protocols that we have described thus far determine the order between every pair of conflicting transactions at execution time by the first lock that both members of the pair request that involves incompatible modes. Another method for determining the serializability order is to select an ordering among transactions in advance. The most common method for doing so is to use a timestamp-ordering scheme.

Timestamps

With each transaction Ti in the system, we associate a unique fixed timestamp, denoted by TS(Ti). This timestamp is assigned by the database system before the transaction Ti starts execution. If a transaction Ti has been assigned timestamp TS(Ti), and a new transaction Tj enters the system, then TS(Ti) < TS(Tj). There are two simple methods for implementing this scheme:

1. Use the value of the system clock as the timestamp; that is, a transaction's timestamp is equal to the value of the clock when the transaction enters the system.

2. Use a logical counter that is incremented after a new timestamp has been assigned; that is, a transaction's timestamp is equal to the value of the counter when the transaction enters the system.

The timestamps of the transactions determine the serializability order. Thus, if TS(Ti) < TS(Tj), then the system must ensure that the produced schedule is equivalent to a serial schedule in which transaction Ti appears before transaction Tj.

To implement this scheme, we associate with each data item Q two timestamp values:

• W-timestamp(Q) denotes the largest timestamp of any transaction that executed write(Q) successfully.
• R-timestamp(Q) denotes the largest timestamp of any transaction that executed read(Q) successfully.

These timestamps are updated whenever a new read(Q) or write(Q) instruction is executed.

The Timestamp-ordering Protocol

The timestamp-ordering protocol ensures that any conflicting read and write operations are executed in timestamp order. This protocol operates as follows:

1. Suppose that transaction Ti issues read(Q).

   a. If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten. Hence, the read operation is rejected, and Ti is rolled back.

   b. If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is set to the maximum of R-timestamp(Q) and TS(Ti).

2. Suppose that transaction Ti issues write(Q).

   a. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously, and the system assumed that that value would never be produced. Hence, the system rejects the write operation and rolls Ti back.

   b. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, the system rejects this write operation and rolls Ti back.

   c. Otherwise, the system executes the write operation and sets W-timestamp(Q) to TS(Ti).

If a transaction Ti is rolled back by the concurrency-control scheme as a result of issuance of either a read or a write operation, the system assigns it a new timestamp and restarts it.

To illustrate this protocol, we consider transactions T14 and T15. Transaction T14 displays the contents of accounts A and B:

T14: read(B); read(A); display(A + B).

[Figure 43.2 - a serializable schedule under the tree protocol, interleaving T10, T11, T12, and T13; each transaction's lock-X and unlock sequence is as listed in the text above.]


Transaction T15 transfers $50 from account A to account B, and then displays the contents of both:

T15: read(B); B := B - 50; write(B); read(A); A := A + 50; write(A); display(A + B).

In presenting schedules under the timestamp protocol, we shall assume that a transaction is assigned a timestamp immediately before its first instruction. Thus, in schedule 3 of Figure 43.3, TS(T14) < TS(T15), and the schedule is possible under the timestamp protocol.

Figure 43.3 Schedule 3 - an interleaving of T14 and T15 that is possible under the timestamp protocol.

We note that the preceding execution can also be produced by the two-phase locking protocol. There are, however, schedules that are possible under the two-phase locking protocol but are not possible under the timestamp protocol, and vice versa.

The timestamp-ordering protocol ensures conflict serializability. This is because conflicting operations are processed in timestamp order.

The protocol ensures freedom from deadlock, since no transaction ever waits. However, there is a possibility of starvation of long transactions if a sequence of conflicting short transactions causes repeated restarting of the long transaction. If a transaction is found to be getting restarted repeatedly, conflicting transactions need to be temporarily blocked to enable the transaction to finish.

The protocol can generate schedules that are not recoverable. However, it can be extended to make the schedules recoverable, in one of several ways:

• Recoverability and cascadelessness can be ensured by performing all writes together at the end of the transaction. The writes must be atomic in the following sense: while the writes are in progress, no transaction is permitted to access any of the data items that have been written.

• Recoverability and cascadelessness can also be guaranteed by using a limited form of locking, whereby reads of uncommitted items are postponed until the transaction that updated the item commits.

• Recoverability alone can be ensured by tracking uncommitted writes, and allowing a transaction Ti to commit only after the commit of any transaction that wrote a value that Ti read. Commit dependencies, described earlier, can be used for this purpose.
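The read and write rules above can be sketched as two small functions; the dictionaries and string return values are assumptions about representation, not the book's notation, and the interleaving shown is one possible interleaving of T14 (timestamp 1) and T15 (timestamp 2):

```python
W, R = {}, {}   # W-timestamp(Q) and R-timestamp(Q); missing item -> 0

def ts_read(ts, q):
    if ts < W.get(q, 0):
        return "rollback"            # rule 1a: value already overwritten
    R[q] = max(R.get(q, 0), ts)      # rule 1b: execute, advance R-timestamp
    return "ok"

def ts_write(ts, q):
    if ts < R.get(q, 0):
        return "rollback"            # rule 2a: a later reader needed old Q
    if ts < W.get(q, 0):
        return "rollback"            # rule 2b: write is obsolete
    W[q] = ts                        # rule 2c: execute, set W-timestamp
    return "ok"

# One interleaving of T14 (TS = 1) and T15 (TS = 2) that is accepted:
print(ts_read(1, "B"), ts_read(2, "B"), ts_write(2, "B"),
      ts_read(1, "A"), ts_read(2, "A"), ts_write(2, "A"))
# ok ok ok ok ok ok

# A second read of B by the older transaction is now rejected (rule 1a):
print(ts_read(1, "B"))  # rollback
```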

Validation-based Protocols

In cases where a majority of transactions are read-only transactions, the rate of conflicts among transactions may be low. Thus, many of these transactions, if executed without the supervision of a concurrency-control scheme, would nevertheless leave the system in a consistent state. A concurrency-control scheme imposes overhead of code execution and possible delay of transactions. It may be better to use an alternative scheme that imposes less overhead. A difficulty in reducing the overhead is that we do not know in advance which transactions will be involved in a conflict. To gain that knowledge, we need a scheme for monitoring the system.

We assume that each transaction Ti executes in two or three different phases in its lifetime, depending on whether it is a read-only or an update transaction. The phases are, in order:

1. Read phase. During this phase, the system executes transaction Ti. It reads the values of the various data items and stores them in variables local to Ti. It performs all write operations on temporary local variables, without updates of the actual database.

2. Validation phase. Transaction Ti performs a validation test to determine whether it can copy to the database the temporary local variables that hold the results of write operations without causing a violation of serializability.

3. Write phase. If transaction Ti succeeds in validation (step 2), then the system applies the actual updates to the database. Otherwise, the system rolls back Ti.

Each transaction must go through the three phases in the order shown. However, all three phases of concurrently executing transactions can be interleaved.

To perform the validation test, we need to know when the various phases of transaction Ti took place. We shall, therefore, associate three different timestamps with transaction Ti:

1. Start(Ti), the time when Ti started its execution.
2. Validation(Ti), the time when Ti finished its read phase and started its validation phase.
3. Finish(Ti), the time when Ti finished its write phase.

We determine the serializability order by the timestamp-ordering technique, using the value of the timestamp Validation(Ti). Thus, the value TS(Ti) = Validation(Ti) and, if TS(Tj) < TS(Tk), then any produced schedule must be equivalent to a serial schedule in which transaction Tj appears before transaction Tk. The reason we have chosen Validation(Ti), rather than Start(Ti), as the timestamp of transaction Ti is that we can expect faster response time, provided that conflict rates among transactions are indeed low.

The validation test for transaction Tj requires that, for all transactions Ti with TS(Ti) < TS(Tj), one of the following two conditions must hold:


1. Finish(Ti) < Start(Tj). Since Ti completes its execution before Tj starts, the serializability order is indeed maintained.

2. The set of data items written by Ti does not intersect with the set of data items read by Tj, and Ti completes its write phase before Tj starts its validation phase (Start(Tj) < Finish(Ti) < Validation(Tj)). This condition ensures that the writes of Ti and Tj do not overlap. Since the writes of Ti do not affect the read of Tj, and since Tj cannot affect the read of Ti, the serializability order is indeed maintained.

As an illustration, consider again transactions T14 and T15. Suppose that TS(T14) < TS(T15). Then the validation phase succeeds in schedule 5 in Figure 43.5, a schedule produced by using validation. Note that the writes to the actual variables are performed only after the validation phase of T15. Thus, T14 reads the old values of B and A, and this schedule is serializable.

The validation scheme automatically guards against cascading rollbacks, since the actual writes take place only after the transaction issuing the write has committed. However, there is a possibility of starvation of long transactions, due to a sequence of conflicting short transactions that cause repeated restarts of the long transaction. To avoid starvation, conflicting transactions must be temporarily blocked, to enable the long transaction to finish.

This validation scheme is called the optimistic concurrency-control scheme, since transactions execute optimistically, assuming they will be able to finish execution and validate at the end. In contrast, locking and timestamp ordering are pessimistic, in that they force a wait or a rollback whenever a conflict is detected, even though there is a chance that the schedule may be conflict serializable.
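The two-condition validation test above can be sketched as follows. This is a minimal illustrative sketch, not a real DBMS implementation; the Transaction class, its field names, and the validate function are all assumptions made for the example.

```python
# Hypothetical sketch of the optimistic validation test. A transaction
# records its read set, write set, and the three timestamps from the text.

class Transaction:
    def __init__(self, read_set, write_set):
        self.read_set = set(read_set)    # data items read
        self.write_set = set(write_set)  # data items written
        self.start = None                # Start(Ti)
        self.validation = None           # Validation(Ti); serves as TS(Ti)
        self.finish = None               # Finish(Ti)

def validate(tj, older):
    """Return True if Tj passes validation against every Ti with
    TS(Ti) < TS(Tj), using the two conditions from the text."""
    for ti in older:
        # Condition 1: Ti finished before Tj started.
        if ti.finish is not None and ti.finish < tj.start:
            continue
        # Condition 2: Ti's writes do not intersect Tj's reads, and Ti
        # finished its write phase before Tj started its validation.
        if (not (ti.write_set & tj.read_set)
                and ti.finish is not None
                and tj.start < ti.finish < tj.validation):
            continue
        return False  # neither condition holds: Tj must be rolled back
    return True
```

A transaction that fails validate is rolled back and restarted; one that passes proceeds to its write phase.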

Points to Ponder

• When the lock manager receives an unlock message from atransaction, it deletes the record for that data item in thelinked list corresponding to that transaction

• If we wish to develop protocols that are not two phase, weneed additional information on how each transaction willaccess the database

• One method for determining the serializability order is toselect an ordering among transactions in advance

Figure 43.5 Schedule 5, a schedule produced by using validation:

T14: read(B); read(A); (validate); display(A + B)
T15: read(B); B := B - 50; read(A); A := A + 50; (validate); write(B); write(A)

• With each transaction Ti in the system, we associate a unique fixed timestamp. This timestamp is assigned by the database system before transaction Ti starts execution

• The timestamp-ordering protocol ensures that any conflicting read and write operations are executed in timestamp order

Review Terms

• Locking mechanism
• Graph-based protocol
• Timestamps
• Timestamp-based protocol
• Timestamp-ordering protocols
• Validation protocol

Students Activity

1. Define locks in a database system. Why are they advisable?

2. Define graph-based protocols.

3. Define timestamps.

4. Define timestamp-based protocols.


5. Define the timestamp-ordering protocol.

6. Define validation-based protocols.


Student Notes


LESSON 44: DATABASE RECOVERY

Lesson objectives

• Failure
• Types of failures
• Storage structure
• Storage types
• Stable-storage implementation
• Data access
• Recovery and atomicity
• Log-based recovery
• Checkpoints

A computer system, like any other device, is subject to failure from a variety of causes: disk crash, power outage, software error, a fire in the machine room, even sabotage. In any failure, information may be lost. The database system must therefore take actions in advance to ensure that the atomicity and durability properties of transactions are preserved. An integral part of a database system is a recovery scheme that can restore the database to the consistent state that existed before the failure. The recovery scheme must also provide high availability; that is, it must minimize the time for which the database is not usable after a crash.

Failure Classification

There are various types of failure that may occur in a system, each of which needs to be dealt with in a different manner. The simplest type of failure is one that does not result in the loss of information in the system. The failures that are more difficult to deal with are those that result in a loss of information. The various types of failure are:

Transaction Failure

There are two types of errors that may cause a transaction to fail:

• Logical error. The transaction can no longer continue with its normal execution because of some internal condition, such as bad input, data not found, overflow, or a resource limit exceeded.

• System error. The system has entered an undesirable state (for example, deadlock), as a result of which a transaction cannot continue with its normal execution. The transaction, however, can be executed at a later time.

System Crash

There is a hardware malfunction, or a bug in the database software or the operating system, that causes the loss of the content of volatile storage and brings transaction processing to a halt. The content of nonvolatile storage remains intact and is not corrupted.

The assumption that hardware errors and bugs in the software bring the system to a halt, but do not corrupt the nonvolatile-storage contents, is known as the fail-stop assumption. Well-designed systems have numerous internal checks, at the hardware and the software level, that bring the system to a halt when there is an error. Hence, the fail-stop assumption is a reasonable one.

Disk Failure

A disk block loses its content as a result of either a head crash or a failure during a data-transfer operation. Copies of the data on other disks, or archival backups on tertiary media, such as tapes, are used to recover from the failure.

To determine how the system should recover from failures, we need to identify the failure modes of the devices used for storing data. Next, we must consider how these failure modes affect the contents of the database. We can then propose algorithms to ensure database consistency and transaction atomicity despite failures. These algorithms, known as recovery algorithms, have two parts:

1. Actions taken during normal transaction processing to ensure that enough information exists to allow recovery from failures.

2. Actions taken after a failure to recover the database contents to a state that ensures database consistency, transaction atomicity, and durability.

Storage Structure

The various data items in the database may be stored and accessed in a number of different storage media. To understand how to ensure the atomicity and durability properties of a transaction, we must gain a better understanding of these storage media and their access methods.

Storage Types

Storage media can be distinguished by their relative speed, capacity, and resilience to failure, and classified as volatile storage or nonvolatile storage. We review these terms, and introduce another class of storage, called stable storage.

• Volatile storage. Information residing in volatile storage does not usually survive system crashes. Examples of such storage are main memory and cache memory. Access to volatile storage is extremely fast, both because of the speed of the memory access itself, and because it is possible to access any data item in volatile storage directly.

• Nonvolatile storage. Information residing in nonvolatile storage survives system crashes. Examples of such storage are disks and magnetic tapes. Disks are used for online storage, whereas tapes are used for archival storage. Both, however, are subject to failure (for example, head crash), which may result in loss of information. At the current state of technology, nonvolatile storage is slower than volatile storage by several orders of magnitude, because disk and tape devices are electromechanical, rather than based entirely on chips, as is volatile storage. In database systems, disks are used for most nonvolatile storage. Other nonvolatile media are normally used only for backup data.



Flash storage, though nonvolatile, has insufficient capacity for most database systems.

• Stable storage. Information residing in stable storage is never lost ("never" should be taken with a grain of salt, since theoretically "never" cannot be guaranteed; for example, it is possible, although extremely unlikely, that a black hole may envelop the earth and permanently destroy all data!). Although stable storage is theoretically impossible to obtain, it can be closely approximated by techniques that make data loss extremely unlikely.

The distinctions among the various storage types are often less clear in practice than in our presentation. Certain systems provide battery backup, so that some main memory can survive system crashes and power failures. Alternative forms of nonvolatile storage, such as optical media, provide an even higher degree of reliability than do disks.

Stable-Storage Implementation

To implement stable storage, we need to replicate the needed information in several nonvolatile storage media (usually disks) with independent failure modes, and to update the information in a controlled manner to ensure that a failure during data transfer does not damage the needed information.

RAID systems guarantee that the failure of a single disk (even during data transfer) will not result in loss of data. The simplest and fastest form of RAID is the mirrored disk, which keeps two copies of each block, on separate disks. Other forms of RAID offer lower costs, but at the expense of lower performance.

RAID systems, however, cannot guard against data loss due to disasters such as fires or flooding. Many systems store archival backups of tapes off-site to guard against such disasters. However, since tapes cannot be carried off-site continually, updates since the most recent time that tapes were carried off-site could be lost in such a disaster. More secure systems keep a copy of each block of stable storage at a remote site, writing it out over a computer network, in addition to storing the block on a local disk system. Since the blocks are output to the remote system as and when they are output to local storage, once an output operation is complete, the output is not lost, even in the event of a disaster such as a fire or flood. We study such remote backup systems later.

In the remainder of this section, we discuss how storage media can be protected from failure during data transfer. Block transfer between memory and disk storage can result in:

• Successful completion. The transferred information arrived safely at its destination.

• Partial failure. A failure occurred in the midst of transfer, and the destination block has incorrect information.

• Total failure. The failure occurred sufficiently early during the transfer that the destination block remains intact.

We require that, if a data-transfer failure occurs, the system detects it and invokes a recovery procedure to restore the block to a consistent state. To do so, the system must maintain two physical blocks for each logical database block; in the case of mirrored disks, both blocks are at the same location; in the case of remote backup, one of the blocks is local, whereas the other is at a remote site. An output operation is executed as follows:

1. Write the information onto the first physical block.
2. When the first write completes successfully, write the same information onto the second physical block.
3. The output is completed only after the second write completes successfully.

During recovery, the system examines each pair of physical blocks. If both are the same and no detectable error exists, then no further actions are necessary. (Recall that errors in a disk block, such as a partial write to the block, are detected by storing a checksum with each block.) If the system detects an error in one block, then it replaces its content with the content of the other block. If both blocks contain no detectable error, but they differ in content, then the system replaces the content of the first block with the value of the second. This recovery procedure ensures that a write to stable storage either succeeds completely (that is, updates all copies) or results in no change.

The requirement of comparing every corresponding pair of blocks during recovery is expensive to meet. We can reduce the cost greatly by keeping track of block writes that are in progress, using a small amount of nonvolatile RAM. On recovery, only blocks for which writes were in progress need to be compared. The protocols for writing out a block to a remote site are similar to the protocols for writing blocks to a mirrored-disk system.

We can extend this procedure easily to allow the use of an arbitrarily large number of copies of each block of stable storage. Although a large number of copies reduces the probability of a failure to even lower than two copies do, it is usually reasonable to simulate stable storage with only two copies.
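The paired-block write and its recovery procedure can be sketched as follows. This is an illustrative toy, assuming a CRC checksum stands in for the per-block error detection mentioned above; all names are made up for the example.

```python
# Toy sketch of the two-block stable-storage write and recovery procedure.
import zlib

def make_block(data: bytes):
    # Store a checksum with each block so partial writes are detectable.
    return {"data": data, "sum": zlib.crc32(data)}

def is_corrupt(block):
    return zlib.crc32(block["data"]) != block["sum"]

def stable_write(pair, data: bytes):
    # 1. Write the first physical block; 2. only when that succeeds,
    # write the second; 3. output completes after the second write.
    pair[0] = make_block(data)
    pair[1] = make_block(data)

def stable_recover(pair):
    first, second = pair
    if is_corrupt(first) and not is_corrupt(second):
        pair[0] = dict(second)   # replace the bad copy with the good one
    elif is_corrupt(second) and not is_corrupt(first):
        pair[1] = dict(first)
    elif first["data"] != second["data"]:
        # Both readable but they differ: replace the content of the
        # first block with the value of the second, as in the text.
        pair[0] = dict(second)
    return pair
```

After recovery, either both copies hold the new value or both hold the old one: the write succeeds completely or results in no change.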

Data Access

The database system resides permanently on nonvolatile storage (usually disks), and is partitioned into fixed-length storage units called blocks. Blocks are the units of data transfer to and from disk, and may contain several data items. We shall assume that no data item spans two or more blocks. This assumption is realistic for most data-processing applications, such as our banking example.

Transactions input information from the disk to main memory, and then output the information back onto the disk. The input and output operations are done in block units. The blocks residing on the disk are referred to as physical blocks; the blocks residing temporarily in main memory are referred to as buffer blocks. The area of memory where blocks reside temporarily is called the disk buffer.

Block movements between disk and main memory are initiated through the following two operations:

1. input(B) transfers the physical block B to main memory.
2. output(B) transfers the buffer block B to the disk, and replaces the appropriate physical block there.

Each transaction Ti has a private work area in which copies of all the data items accessed and updated by Ti are kept. The system creates this work area when the transaction is initiated; the system removes it when the transaction either commits or aborts. Each data item X kept in the work area of transaction Ti is denoted by Xi. Transaction Ti interacts with the database system by transferring data between its work area and the system buffer. We transfer data by these two operations:

1. read(X) assigns the value of data item X to the local variable Xi. It executes this operation as follows:
   a. If block Bx, on which X resides, is not in main memory, it issues input(Bx).
   b. It assigns to Xi the value of X from the buffer block.

2. write(X) assigns the value of local variable Xi to data item X in the buffer block. It executes this operation as follows:
   a. If block Bx, on which X resides, is not in main memory, it issues input(Bx).
   b. It assigns the value of Xi to X in buffer block Bx.

Note that both operations may require the transfer of a block from disk to main memory. They do not, however, specifically require the transfer of a block from main memory to disk.

A buffer block is eventually written out to the disk either because the buffer manager needs the memory space for other purposes or because the database system wishes to reflect the change to B on the disk. We shall say that the database system performs a force-output of buffer B if it issues an output(B).

When a transaction needs to access a data item X for the first time, it must execute read(X). The system then performs all updates to X on Xi. After the transaction accesses X for the final time, it must execute write(X) to reflect the change to X in the database itself.

The output(Bx) operation for the buffer block Bx on which X resides does not need to take effect immediately after write(X) is executed, since the block Bx may contain other data items that are still being accessed. Thus, the actual output may take place later. Notice that, if the system crashes after the write(X) operation was executed but before output(Bx) was executed, the new value of X is never written to disk and, thus, is lost.
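The read(X)/write(X) operations above can be sketched in a few lines. This is an illustrative model, not a real DBMS API; the block-per-item mapping and all names are assumptions made for the example.

```python
# Sketch of read(X)/write(X) against buffer blocks, per the text.
disk = {"B1": {"A": 1000, "B": 2000}}   # physical blocks on disk
buffer = {}                              # buffer blocks in main memory
block_of = {"A": "B1", "B": "B1"}        # which block each item lives in

def input_block(b):
    buffer[b] = dict(disk[b])            # input(B): disk -> main memory

def output_block(b):
    disk[b] = dict(buffer[b])            # output(B): buffer -> disk

def read(x, local):                      # read(X): buffer block -> local Xi
    b = block_of[x]
    if b not in buffer:
        input_block(b)
    local[x] = buffer[b][x]

def write(x, local):                     # write(X): local Xi -> buffer block
    b = block_of[x]
    if b not in buffer:
        input_block(b)
    buffer[b][x] = local[x]

# Transfer $50 from A to B entirely in the transaction's work area:
local = {}
read("A", local); local["A"] -= 50; write("A", local)
read("B", local); local["B"] += 50; write("B", local)
# The disk still shows the old values until output(B1) is performed;
# if the system crashed here, the new values would be lost.
output_block("B1")
```

The deliberate gap between write(X) and output(Bx) is exactly what the recovery schemes in the rest of this lesson must bridge.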

Recovery and Atomicity

Consider again our simplified banking system and transaction Ti that transfers $50 from account A to account B, with initial values of A and B being $1000 and $2000, respectively. Suppose that a system crash has occurred during the execution of Ti, after output(BA) has taken place, but before output(BB) was executed, where BA and BB denote the buffer blocks on which A and B reside. Since the memory contents were lost, we do not know the fate of the transaction; thus, we could invoke one of two possible recovery procedures:

• Re-execute Ti. This procedure will result in the value of A becoming $900, rather than $950. Thus, the system enters an inconsistent state.

• Do not re-execute Ti. The current system state has values of $950 and $2000 for A and B, respectively. Thus, the system enters an inconsistent state.

In either case, the database is left in an inconsistent state, and thus this simple recovery scheme does not work. The reason for this difficulty is that we have modified the database without having assurance that the transaction will indeed commit. Our goal is to perform either all or no database modifications made by Ti. However, if Ti performed multiple database modifications, several output operations may be required, and a failure may occur after some of these modifications have been made, but before all of them are made.

To achieve our goal of atomicity, we must first output information describing the modifications to stable storage, without modifying the database itself. As we shall see, this procedure will allow us to output all the modifications made by a committed transaction, despite failures. There are two ways to perform such outputs; we study them in Sections 44.4 and 44.5. In these two sections, we shall assume that transactions are executed serially; in other words, only a single transaction is active at a time. We shall describe how to handle concurrently executing transactions later, in Section 44.6.

Log-Based Recovery

The most widely used structure for recording database modifications is the log. The log is a sequence of log records, recording all the update activities in the database. There are several types of log records. An update log record describes a single database write. It has these fields:

• Transaction identifier is the unique identifier of the transaction that performed the write operation.
• Data-item identifier is the unique identifier of the data item written. Typically, it is the location on disk of the data item.
• Old value is the value of the data item prior to the write.
• New value is the value that the data item will have after the write.

Other special log records exist to record significant events during transaction processing, such as the start of a transaction and the commit or abort of a transaction. We denote the various types of log records as:

• <Ti start>. Transaction Ti has started.
• <Ti, Xj, V1, V2>. Transaction Ti has performed a write on data item Xj. Xj had value V1 before the write, and will have value V2 after the write.
• <Ti commit>. Transaction Ti has committed.
• <Ti abort>. Transaction Ti has aborted.

Whenever a transaction performs a write, it is essential that the log record for that write be created before the database is modified. Once a log record exists, we can output the modification to the database if that is desirable. Also, we have the ability to undo a modification that has already been output to the database. We undo it by using the old-value field in log records.

For log records to be useful for recovery from system and disk failures, the log must reside in stable storage. For now, we assume that every log record is written to the end of the log on stable storage as soon as it is created. In Section 44.7, we shall see when it is safe to relax this requirement so as to reduce the overhead imposed by logging. We shall study two techniques for using the log to ensure transaction atomicity despite failures. Observe that the log contains a complete record of all database activity. As a result, the volume of data stored in the log may become unreasonably large.
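The rule that the log record must be created before the database is modified, and the use of the old-value field for undo, can be sketched as follows. This is a minimal illustration under assumed names (log, db, db_write, undo), not a real logging subsystem.

```python
# Sketch of write-ahead log records and undo via the old-value field.
log = []            # stands in for the log on stable storage
db = {"A": 1000}    # the database itself

def db_write(txn, item, new):
    old = db[item]
    log.append((txn, item, old, new))   # <Ti, Xj, V1, V2> created first...
    db[item] = new                      # ...only then is the item modified

def undo(txn):
    # Undo a transaction's writes using the old-value field,
    # newest record first.
    for t, item, old, new in reversed(log):
        if t == txn:
            db[item] = old
```

Because the record reaches the log before the item changes, a crash can never leave a modification in the database that the log knows nothing about.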

Deferred Database Modification

The deferred-modification technique ensures transaction atomicity by recording all database modifications in the log, but deferring the execution of all write operations of a transaction until the transaction partially commits. Recall that a transaction is said to be partially committed once the final action of the transaction has been executed. The version of the deferred-modification technique that we describe in this section assumes that transactions are executed serially.

When a transaction partially commits, the information on the log associated with the transaction is used in executing the deferred writes. If the system crashes before the transaction completes its execution, or if the transaction aborts, then the information on the log is simply ignored.

The execution of transaction Ti proceeds as follows. Before Ti starts its execution, a record <Ti start> is written to the log. A write(X) operation by Ti results in the writing of a new record to the log. Finally, when Ti partially commits, a record <Ti commit> is written to the log.

When transaction Ti partially commits, the records associated with it in the log are used in executing the deferred writes. Since a failure may occur while this updating is taking place, we must ensure that, before the start of these updates, all the log records are written out to stable storage. Once they have been written, the actual updating takes place, and the transaction enters the committed state.

Observe that only the new value of the data item is required by the deferred-modification technique. Thus, we can simplify the general update-log record structure that we saw in the previous section, by omitting the old-value field.

To illustrate, reconsider our simplified banking system. Let T0 be a transaction that transfers $50 from account A to account B:

T0: read(A);
    A := A - 50;
    write(A);
    read(B);
    B := B + 50;
    write(B).

Let T1 be a transaction that withdraws $100 from account C:

T1: read(C);
    C := C - 100;
    write(C).

Suppose that these transactions are executed serially, in the order T0 followed by T1, and that the values of accounts A, B, and C before the execution took place were $1000, $2000, and $700, respectively. There are various orders in which the actual outputs can take place to both the database system and the log as a result of the execution of T0 and T1. One such order appears in Figure 44.2:

<T0 start>
<T0, A, 950>
<T0, B, 2050>
<T0 commit>
<T1 start>
<T1, C, 600>
<T1 commit>

Figure 44.2 Portion of the database log corresponding to T0 and T1.

Note that the value of A is changed in the database only after the record <T0, A, 950> has been placed in the log.

Using the log, the system can handle any failure that results in the loss of information on volatile storage. The recovery scheme uses the following recovery procedure:

• redo(Ti) sets the value of all data items updated by transaction Ti to the new values. The set of data items updated by Ti, and their respective new values, can be found in the log.

The redo operation must be idempotent; that is, executing it several times must be equivalent to executing it once. This characteristic is required if we are to guarantee correct behavior even if a failure occurs during the recovery process.

After a failure, the recovery subsystem consults the log to determine which transactions need to be redone. Transaction Ti needs to be redone if and only if the log contains both the record <Ti start> and the record <Ti commit>. Thus, if the system crashes after the transaction completes its execution, the recovery scheme uses the information in the log to restore the system to a consistent state in which the transaction had completed.

As an illustration, let us return to our banking example with transactions T0 and T1 executed one after the other, in the order T0 followed by T1. Figure 44.3 shows the state of the log and the database resulting from the complete execution of T0 and T1:

Log                  Database
<T0 start>
<T0, A, 950>
<T0, B, 2050>
<T0 commit>
                     A = 950
                     B = 2050
<T1 start>
<T1, C, 600>
<T1 commit>
                     C = 600

Figure 44.3 State of the log and database corresponding to T0 and T1.

(a)                  (b)                  (c)
<T0 start>           <T0 start>           <T0 start>
<T0, A, 950>         <T0, A, 950>         <T0, A, 950>
<T0, B, 2050>        <T0, B, 2050>        <T0, B, 2050>
                     <T0 commit>          <T0 commit>
                     <T1 start>           <T1 start>
                     <T1, C, 600>         <T1, C, 600>
                                          <T1 commit>

Figure 44.4 The same log, shown at three different times.

Let us suppose that the system crashes before the completion of the transactions, so that we can see how the recovery technique restores the database to a consistent state. Assume that the crash occurs just after the log record for the step

write(B)

of transaction T0 has been written to stable storage. The log at the time of the crash appears in Figure 44.4(a). When the system comes back up, no redo actions need to be taken, since no commit record appears in the log. The values of accounts A and B remain $1000 and $2000, respectively. The log records of the incomplete transaction T0 can be deleted from the log.

Now, let us assume the crash comes just after the log record for the step

write(C)

of transaction T1 has been written to stable storage. In this case, the log at the time of the crash is as in Figure 44.4(b). When the system comes back up, the operation redo(T0) is performed, since the record

<T0 commit>

appears in the log on the disk. After this operation is executed, the values of accounts A and B are $950 and $2050, respectively. The value of account C remains $700. As before, the log records of the incomplete transaction T1 can be deleted from the log.

Finally, assume that a crash occurs just after the log record

<T1 commit>

is written to stable storage. The log at the time of this crash is as in Figure 44.4(c). When the system comes back up, two commit records are in the log: one for T0 and one for T1. Therefore, the system must perform operations redo(T0) and redo(T1), in the order in which their commit records appear in the log. After the system executes these operations, the values of accounts A, B, and C are $950, $2050, and $600, respectively.

Finally, let us consider a case in which a second system crash occurs during recovery from the first crash. Some changes may have been made to the database as a result of the redo operations, but not all changes may have been made. When the system comes up after the second crash, recovery proceeds exactly as in the preceding examples. For each commit record

<Ti commit>

found in the log, the system performs the operation redo(Ti). In other words, it restarts the recovery actions from the beginning. Since redo writes values to the database independent of the values currently in the database, the result of a successful second attempt at redo is the same as though redo had succeeded the first time.
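The redo pass over a deferred-modification log can be sketched as follows, including its idempotence: a transaction is redone iff both its start and commit records appear, and running recovery a second time (as after a crash during recovery) changes nothing further. The record shapes mirror the text; the recover function and its tuple encoding are assumptions for the example.

```python
# Sketch of redo-only recovery for the deferred-modification scheme.
def recover(log, db):
    started = {t for (tag, t, *rest) in log if tag == "start"}
    committed = [t for (tag, t, *rest) in log if tag == "commit"]
    for t in committed:                   # in commit-record order
        if t in started:
            for tag, txn, *rest in log:   # redo(Ti): apply the new values
                if tag == "write" and txn == t:
                    item, new = rest
                    db[item] = new        # independent of current value
    return db
```

Applied to the log of Figure 44.4(b), only T0 is redone: A and B get their new values, while C keeps its old value because T1 never committed.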

Checkpoints

When a system failure occurs, we must consult the log to determine those transactions that need to be redone and those that need to be undone. In principle, we need to search the entire log to determine this information. There are two major difficulties with this approach:

1. The search process is time consuming.
2. Most of the transactions that need to be redone have already written their updates into the database.

To reduce these types of overhead, we introduce checkpoints. During execution, the system maintains the log. In addition, the system periodically performs checkpoints, which require the following sequence of actions to take place:

1. Output onto stable storage all log records currently residing in main memory.
2. Output to the disk all modified buffer blocks.
3. Output onto stable storage a log record <checkpoint>.

Transactions are not allowed to perform any update action, such as writing to a buffer block or writing a log record, while a checkpoint is in progress.
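The three-step checkpoint sequence above can be sketched as follows. The split between in-memory and stable structures, and all names, are assumptions made for illustration.

```python
# Sketch of the checkpoint sequence: flush in-memory log records,
# flush modified buffer blocks, then append a <checkpoint> record.
stable_log = []        # log records already on stable storage
mem_log = []           # log records still in main memory
dirty_blocks = {}      # modified buffer blocks not yet written to disk
disk = {}

def checkpoint():
    # 1. Output all log records currently residing in main memory.
    stable_log.extend(mem_log)
    mem_log.clear()
    # 2. Output to the disk all modified buffer blocks.
    disk.update(dirty_blocks)
    dirty_blocks.clear()
    # 3. Output a <checkpoint> record onto stable storage.
    stable_log.append("<checkpoint>")
```

After a crash, the recovery pass need only examine the log from the most recent <checkpoint> record onward, rather than searching the entire log.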

Points to Ponder

• There are various types of failure that may occur in a system, each of which needs to be dealt with in a different manner: transaction failure, system crash, and disk failure

• Storage media can be distinguished by their relative speed, capacity, and resilience to failure, and classified as volatile storage or nonvolatile storage

• RAID systems guarantee that the failure of a single disk(even during data transfer) will not result in loss of data

• When a system failure occurs, we must consult the log to determine those transactions that need to be redone and those that need to be undone

Review Terms

• Failure
• Types of failures
• Storage structure
• Storage types
• Stable-storage implementation
• Data access
• Recovery and atomicity
• Log-based recovery
• Checkpoints


Students Activity

1. Define database recovery.

2. What are the various kinds of failures?

3. Define log-based recovery.

4. Define checkpoints.


Student Notes


9, Km Milestone, NH-65, Kaithal - 136027, HaryanaWebsite: www.niilmuniversity.in

“The lesson content has been compiled from various sources in public domain including but not limited to the internet for the convenience of the users. The university has no proprietary right on the same.”