
Prof- Neeta Bonde

DBMS (FYCS)

Unit - 1

DBMS: A database is a collection of related data, and data is a collection of facts and figures that can be processed to produce information. Mostly, data represents recordable facts.

Data aids in producing information, which is based on facts. For example, if we have data about the marks obtained by all students, we can then work out who the toppers are and what the average marks are.
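As a small illustration of turning data into information, the sketch below stores some marks in an in-memory SQLite database through Python's standard `sqlite3` module, then derives the topper and the class average. The table name `results`, the function name and the sample rows are hypothetical.

```python
import sqlite3


def summarize_marks(rows):
    """Return (topper_name, average_marks) for a list of (name, marks) rows."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE results (name TEXT, marks INTEGER)")  # hypothetical table
    conn.executemany("INSERT INTO results VALUES (?, ?)", rows)
    # Data -> information: the highest scorer and the mean score.
    # (If two students tie for the top mark, LIMIT 1 returns just one of them.)
    topper = conn.execute(
        "SELECT name FROM results ORDER BY marks DESC LIMIT 1"
    ).fetchone()[0]
    average = conn.execute("SELECT AVG(marks) FROM results").fetchone()[0]
    conn.close()
    return topper, average


topper, average = summarize_marks([("Asha", 78), ("Ravi", 91), ("Meena", 85)])
print(topper, round(average, 2))  # Ravi 84.67
```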

A database management system stores data in such a way that it becomes easier to retrieve, manipulate, and produce information.

Characteristics of DBMS

Traditionally, data was organized in file formats. DBMS was a new concept then, and the research behind it aimed to overcome the deficiencies of the traditional style of data management. A modern DBMS has the following characteristics:

Real-world entity: A modern DBMS is more realistic and uses real-world entities to design its architecture. It uses their behavior and attributes too. For example, a school database may use students as an entity and their age as an attribute.

Relation-based tables: DBMS allows entities and relations among them to form tables. A user can understand the architecture of a database just by looking at the table names.

Isolation of data and application: A database system is entirely different from its data. A database is an active entity, whereas data is said to be passive: it is what the database works on and organizes. DBMS also stores metadata, which is data about data, to ease its own process.

Less redundancy: DBMS follows the rules of normalization, which splits a relation when any of its attributes has redundancy in its values. Normalization is a mathematically rich and scientific process that reduces data redundancy.

Consistency: Consistency is a state where every relation in a database remains consistent. There exist methods and techniques that can detect an attempt to leave the database in an inconsistent state. A DBMS can provide greater consistency than earlier forms of data-storing applications such as file-processing systems.

Query Language: DBMS is equipped with a query language, which makes it more efficient to retrieve and manipulate data. A user can apply as many different filtering options as required to retrieve a set of data. Traditionally, this was not possible where a file-processing system was used.
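A minimal sketch of the filtering described above, using SQLite through Python's `sqlite3` module. The `students` table, its columns, the rows and the `find_students` helper are illustrative assumptions. Each condition in the `WHERE` clause is one filtering option, and any combination can be applied without writing a new program for each.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows, for illustration only.
conn.execute("CREATE TABLE students (rollno INTEGER, name TEXT, class TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?)",
    [(1, "Asha", "FYCS", "Amritsar"),
     (2, "Ravi", "FYCS", "Jalandhar"),
     (3, "Meena", "SYCS", "Amritsar")],
)


def find_students(conn, cls=None, city=None):
    """Build a WHERE clause from whichever filters are supplied."""
    conditions, params = [], []
    if cls is not None:
        conditions.append("class = ?")
        params.append(cls)
    if city is not None:
        conditions.append("city = ?")
        params.append(city)
    sql = "SELECT name FROM students"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    sql += " ORDER BY rollno"
    return [row[0] for row in conn.execute(sql, params)]


print(find_students(conn, cls="FYCS"))                   # ['Asha', 'Ravi']
print(find_students(conn, cls="FYCS", city="Amritsar"))  # ['Asha']
```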

Multiuser and Concurrent Access: DBMS supports a multi-user environment and allows users to access and manipulate data in parallel. There are restrictions on transactions when users attempt to handle the same data item, but users are generally unaware of them.
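To make these "restrictions on transactions" concrete, here is a sketch using SQLite, which allows only one connection to write at a time. The two connections stand in for two users, and the `seats` table is hypothetical. `timeout=0` is used only so that the conflict shows up at once as an error; with a non-zero timeout (the `sqlite3` default is 5 seconds) the second user would simply wait. That waiting is how such restrictions usually stay invisible to users.

```python
import os
import sqlite3
import tempfile


def two_users_same_item():
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "shared.db")
        # isolation_level=None: transactions are started explicitly with BEGIN.
        user_a = sqlite3.connect(path, timeout=0, isolation_level=None)
        user_b = sqlite3.connect(path, timeout=0, isolation_level=None)
        # Hypothetical table with one seat.
        user_a.execute("CREATE TABLE seats (seat_id INTEGER PRIMARY KEY, taken INTEGER)")
        user_a.execute("INSERT INTO seats VALUES (1, 0)")

        user_a.execute("BEGIN IMMEDIATE")  # user A takes the write lock
        user_a.execute("UPDATE seats SET taken = 1 WHERE seat_id = 1")
        try:
            user_b.execute("BEGIN IMMEDIATE")  # user B also wants to write
            b_blocked = False
        except sqlite3.OperationalError:  # "database is locked"
            b_blocked = True
        user_a.execute("COMMIT")  # A finishes and releases the lock

        # Now B can proceed, and it sees A's committed change.
        user_b.execute("BEGIN IMMEDIATE")
        taken = user_b.execute("SELECT taken FROM seats WHERE seat_id = 1").fetchone()[0]
        user_b.execute("COMMIT")
        user_a.close()
        user_b.close()
        return b_blocked, taken


print(two_users_same_item())  # (True, 1)
```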

Multiple views: DBMS offers multiple views for different users. A user in the Sales department will have a different view of the database than a person working in the Production department. This feature enables users to have a focused view of the database according to their requirements.
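One common way a relational DBMS provides such views is SQL's `CREATE VIEW`. The sketch below uses SQLite via Python's `sqlite3`. The `products` table, its columns and the view names are hypothetical. Each department's view exposes only the columns that department needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical base table shared by both departments.
conn.execute(
    "CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, "
    "price REAL, unit_cost REAL, stock INTEGER)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?, ?)",
    [(1, "Chair", 1500.0, 900.0, 40), (2, "Table", 4000.0, 2600.0, 12)],
)
# Each department gets a view containing only the columns it needs.
conn.execute("CREATE VIEW sales_view AS SELECT name, price FROM products")
conn.execute("CREATE VIEW production_view AS SELECT name, unit_cost, stock FROM products")


def columns_of(conn, view):
    # The view name comes from our own code; identifiers cannot be bound as ? parameters.
    cursor = conn.execute(f"SELECT * FROM {view}")
    return [col[0] for col in cursor.description]


print(columns_of(conn, "sales_view"))       # ['name', 'price']
print(columns_of(conn, "production_view"))  # ['name', 'unit_cost', 'stock']
```

SQLite has no user accounts, so this shows only the "different view" half of the idea. Server DBMSs typically combine views with `GRANT` privileges to control which users may query which view.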

Security: Features like multiple views offer security to some extent, since users are unable to access the data of other users and departments. DBMS offers methods to impose constraints while entering data into the database and retrieving the same at a later stage. DBMS offers many different levels of security features, which enable multiple users to have different views with different features. For example, a user in the Sales department cannot see the data that belongs to the Purchase department. Additionally, it can be controlled how much of the Sales department's data should be displayed to the user. Since a DBMS does not keep its data as ordinary files that users open directly, as traditional file systems do, it is much harder for miscreants to break in.
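The constraints imposed while entering data can be sketched with SQLite `CHECK` constraints. The `employees` table, the department list, the salary rule and the `try_insert` helper are made-up examples. The point is that the DBMS itself rejects a row that breaks a rule, no matter which application submits it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with rules enforced by the DBMS itself.
conn.execute(
    "CREATE TABLE employees ("
    " emp_id INTEGER PRIMARY KEY,"
    " dept TEXT NOT NULL CHECK (dept IN ('Sales', 'Purchase', 'Production')),"
    " salary INTEGER CHECK (salary > 0))"
)


def try_insert(conn, emp_id, dept, salary):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO employees VALUES (?, ?, ?)", (emp_id, dept, salary))
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(conn, 1, "Sales", 30000))    # True
print(try_insert(conn, 2, "Sales", -5))       # False: salary must be positive
print(try_insert(conn, 3, "Marketing", 100))  # False: unknown department
```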

Advantages of DBMS

1. Controlling Redundancy: In a file system, each application has its own private files, which cannot be shared between multiple applications. This can often lead to considerable redundancy in the stored data, which wastes storage space. With a centralized database, most of this can be avoided. Not all redundancy can or should be eliminated; sometimes there are sound business and technical reasons for maintaining multiple copies of the same data. In a database system, however, this redundancy can be controlled.

For example, in a college database there may be a number of applications such as General Office, Library, Account Office, Hostel, etc. Each of these applications may maintain the following information in its own private files:

It is clear from the above file systems that some common data about the student, such as Rollno, Name, Class, Phone_No, Address, etc., has to be recorded in each application. This causes the problem of redundancy, which wastes storage space and makes the data difficult to maintain. With a centralized database, however, data can be shared by a number of applications, and the whole college can maintain its computerized data with the following database:

It is clear in the above database that Rollno, Name, Class, Father_Name, Address, Phone_No and Date_of_birth, which are stored repeatedly in each application of the file system, need not be stored repeatedly in the database, because every other application can access this information by joining relations on the basis of a common column, i.e. Rollno. Suppose a user of the Library system needs the Name and Address of a particular student. By joining the Library and General Office relations on the column Rollno, he/she can easily retrieve this information. Thus, we can say that a centralized DBMS reduces data redundancy to a great extent but cannot eliminate it, because Rollno is still repeated in all the relations.
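The join described above can be sketched in SQLite through Python's `sqlite3` module. The table and column names follow the text (`General_Office`, `Library`, `Rollno`, `Name`, `Address`). The `Book_Issued` column, the rows and the helper function are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Student details are stored once, in General_Office.
conn.execute("CREATE TABLE General_Office (Rollno INTEGER PRIMARY KEY, Name TEXT, Address TEXT)")
# Library keeps only Rollno (the shared column) plus its own data (hypothetical column).
conn.execute("CREATE TABLE Library (Rollno INTEGER, Book_Issued TEXT)")
conn.executemany("INSERT INTO General_Office VALUES (?, ?, ?)",
                 [(5, "Asha", "Amritsar"), (6, "Ravi", "Ludhiana")])
conn.executemany("INSERT INTO Library VALUES (?, ?)",
                 [(5, "DBMS Concepts"), (6, "C Programming")])


def library_member_details(conn, rollno):
    """Name and address for a library member, fetched by joining on Rollno."""
    return conn.execute(
        "SELECT g.Name, g.Address, l.Book_Issued "
        "FROM Library AS l JOIN General_Office AS g ON l.Rollno = g.Rollno "
        "WHERE l.Rollno = ?",
        (rollno,),
    ).fetchall()


print(library_member_details(conn, 5))  # [('Asha', 'Amritsar', 'DBMS Concepts')]
```

Only `Rollno` appears in both tables, which matches the observation that redundancy is reduced but not eliminated.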

2. Inconsistency can be avoided: When the same data is duplicated and a change made at one site is not propagated to the other site, inconsistency arises: the two entries for the same data will not agree. At such times the data is said to be inconsistent. So, if the redundancy is removed, the chances of having inconsistent data are also removed.

Let us again consider the college system, and suppose that the General_Office file indicates that Roll Number 5 lives in Amritsar, but the Library file indicates that Roll Number 5 lives in Jalandhar. This is a state in which the two entries for the same object do not agree with each other (one has been updated and the other has not). At such a time the database is said to be inconsistent.

An inconsistent database is capable of supplying incorrect or conflicting information, so there should be no inconsistency in a database. Inconsistency can be avoided much better in a centralized system than in a file system.

Let us consider the college system again, and suppose that Roll Number 5 has shifted from Amritsar to Jalandhar. The address information of Roll Number 5 must then be updated wherever the roll number and address occur in the system. In a file system, the information must be updated separately in each application. If we update it in only three applications and forget the fourth, the whole system shows inconsistent results about Roll Number 5.

In a DBMS, the roll number and address occur together only once, in the General_Office table, so a single update is needed. Every other application retrieves the address from General_Office, so all applications get the current, latest information after that single update operation. Because this single update is propagated to the whole database and to all other applications automatically, this property is called Propagation of Update.
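A sketch of propagation of update in SQLite via Python's `sqlite3`. The table names follow the text. The `Hostel` and `Library` columns, the rows and the `address_seen_by` helper are hypothetical. Because the other applications reach the address only through a join on `Rollno`, one `UPDATE` on `General_Office` is immediately visible to all of them. Nothing has to be copied anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE General_Office (Rollno INTEGER PRIMARY KEY, Name TEXT, Address TEXT)")
conn.execute("CREATE TABLE Library (Rollno INTEGER, Book_Issued TEXT)")  # hypothetical columns
conn.execute("CREATE TABLE Hostel (Rollno INTEGER, Room_No INTEGER)")    # hypothetical columns
conn.execute("INSERT INTO General_Office VALUES (5, 'Asha', 'Amritsar')")
conn.execute("INSERT INTO Library VALUES (5, 'DBMS Concepts')")
conn.execute("INSERT INTO Hostel VALUES (5, 101)")


def address_seen_by(conn, app_table, rollno):
    """Address of a student as seen by another application through a join."""
    # app_table comes from our own code; table names cannot be bound as ? parameters.
    return conn.execute(
        f"SELECT g.Address FROM {app_table} AS a "
        "JOIN General_Office AS g ON a.Rollno = g.Rollno WHERE a.Rollno = ?",
        (rollno,),
    ).fetchone()[0]


# Roll Number 5 moves: a single update in one place...
conn.execute("UPDATE General_Office SET Address = 'Jalandhar' WHERE Rollno = 5")
conn.commit()

# ...and every application that joins on Rollno sees the new address.
print(address_seen_by(conn, "Library", 5))  # Jalandhar
print(address_seen_by(conn, "Hostel", 5))   # Jalandhar
```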

We can say that the redundancy of data greatly affects its consistency: the less redundancy there is, the easier it is to keep data consistent. Thus, a DBMS can avoid inconsistency to a great extent.

3. Data can be shared: As explained earlier, data such as Name, Class, Father_Name, etc. in General_Office is shared by multiple applications in a centralized DBMS, unlike in a file system. New applications can therefore be developed to operate against the same stored data, without having to create any new stored files.

4. Standards can be enforced: Since a DBMS is a central system, standards can be enforced easily, whether at the company, department, national or international level. Standardized data is very helpful during migration or interchange of data. A file system consists of independent applications, so standards cannot easily be enforced across them.

Advantages of DBMS over file system

There are several advantages of a database management system over a file system. A few of them are as follows:

Less redundant data – Redundancy is reduced by data normalization

Data Consistency and Integrity – data normalization takes care of it too

Secure – Each user has a different set of access rights

Privacy – Limited access

Easy access to data

Easy recovery

Flexible

Data Abstraction in DBMS

Database systems are made up of complex data structures. To ease user interaction with the database, developers hide internal, irrelevant details from users. This process of hiding irrelevant details from the user is called data abstraction.

We have three levels of abstraction:

Physical level: This is the lowest level of data abstraction. It describes how data is actually stored in the database. You can get the complex data structure details at this level.

Logical level: This is the middle level of the 3-level data abstraction architecture. It describes what data is stored in the database.

View level: This is the highest level of data abstraction. This level describes the user's interaction with the database system.

Example: Let’s say we are storing customer information in a customer table.

At the physical level, these records can be described as blocks of storage (bytes, gigabytes, terabytes, etc.) in memory. These details are often hidden from the programmers.

At the logical level, these records can be described as fields and attributes along with their data types, and their relationships with each other can be logically implemented. Programmers generally work at this level because they are aware of such details of database systems.

At the view level, users simply interact with the system through a GUI and enter details on the screen. They are not aware of how the data is stored or what data is stored; such details are hidden from them.
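The three levels can be glimpsed in SQLite, assuming a `customer` table like the one in the example. The columns, the row and the view name `customer_card` are hypothetical. `PRAGMA page_size` and `PRAGMA page_count` expose a physical-level detail: SQLite stores data in fixed-size pages of bytes. `PRAGMA table_info` shows the logical-level description, meaning the fields and their data types. A view is what a view-level user would query. This is only a sketch; other DBMSs expose these levels through different commands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical customer table and view.
conn.execute(
    "CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT, balance REAL)"
)
conn.execute("INSERT INTO customer VALUES (1, 'Asha', 'Pune', 2500.0)")
conn.execute("CREATE VIEW customer_card AS SELECT name, city FROM customer")

# Physical level: data lives in fixed-size pages of bytes, hidden from most users.
page_size = conn.execute("PRAGMA page_size").fetchone()[0]
page_count = conn.execute("PRAGMA page_count").fetchone()[0]

# Logical level: fields and their data types, as programmers see them.
# table_info rows are (cid, name, type, notnull, dflt_value, pk).
fields = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(customer)")]

# View level: only what the end user needs; storage and schema stay hidden.
card = conn.execute("SELECT * FROM customer_card").fetchall()

print(page_size, page_count)
print(fields)  # [('cust_id', 'INTEGER'), ('name', 'TEXT'), ('city', 'TEXT'), ('balance', 'REAL')]
print(card)    # [('Asha', 'Pune')]
```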

Architecture of Database

Database architecture can be 2-tier or 3-tier, depending on how users connect to the database to get their requests done. Users can either connect directly to the database, or their request is received by an intermediary layer, which processes the request and then sends it to the database.

2-tier architecture

In 2-tier architecture, the application program directly interacts with the database; there is no intermediate layer between the user's application and the database. Imagine a front-end application for a school, where we need to display reports of all the students who have opted for different subjects. In this case, the application will directly interact with the database and retrieve all the required data. Here, no inputs from the user are required. This is an example of 2-tier database architecture.

Let us consider another example of 2-tier architecture: a railway ticket reservation system. How does it work? Imagine a person is reserving a ticket from Delhi to Goa on a particular day. At the same time, another person somewhere else in Delhi is also reserving a ticket to Goa on the same day for the same train. Now there is a requirement for two tickets, but for different persons. What will the reservation system do? It takes the requests from both of them and queues them. Each request is entered in the application layer and sent to the database layer. Once the request is processed in the database, the result is sent back to the application layer for the user.
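The reservation example can be sketched as an application that talks directly to the database, with queued requests served one after another. This is a toy simulation under stated assumptions. The `train_seats` table, the two remaining seats, the passenger labels and the `reserve` function are all hypothetical, and a real reservation system is far more involved.

```python
import sqlite3
from collections import deque

# Tier 2: the database (an in-memory SQLite database stands in for it).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE train_seats (seat_no INTEGER PRIMARY KEY, passenger TEXT)")
conn.executemany("INSERT INTO train_seats VALUES (?, NULL)", [(1,), (2,)])  # two seats left
conn.commit()


def reserve(conn, passenger):
    """Tier 1: the application sends the request straight to the database."""
    with conn:  # one transaction per request, committed when the block exits
        row = conn.execute(
            "SELECT seat_no FROM train_seats WHERE passenger IS NULL "
            "ORDER BY seat_no LIMIT 1"
        ).fetchone()
        if row is None:
            return None  # no seats left
        conn.execute(
            "UPDATE train_seats SET passenger = ? WHERE seat_no = ?",
            (passenger, row[0]),
        )
        return row[0]


# Requests arriving at about the same time are queued and served in order.
requests = deque(["person_a", "person_b", "person_c"])
outcomes = []
while requests:
    outcomes.append(reserve(conn, requests.popleft()))
print(outcomes)  # [1, 2, None]
```

Serving the queue one request at a time is what keeps two people from receiving the same seat. It is also why, as the disadvantages below note, this approach slows down when there are many users.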

Advantages of 2-tier

Easy to understand, as it directly communicates with the database.

Requested data can be retrieved very quickly when there are fewer users.

Easy to modify – any required change can be sent directly to the database as a request.

Easy to maintain – when there are multiple requests, they are handled in a queue and there will not be any chaos.

Disadvantages of 2-tier

It is time-consuming when there is a huge number of users. All the requests are queued and handled one after another, so it cannot respond to multiple users at the same time.

This architecture can be less cost-effective.

3-tier architecture

3-tier architecture is the most widely used database architecture. It can be viewed as below.

Presentation layer / User layer is the layer where the user uses the database. The user does not have any knowledge of the underlying database and simply interacts with it as though all the data were in front of him. You can imagine this layer as a registration form where you input your details. Have you ever wondered where the data goes after you press the ‘Submit’ button? Probably not; you just know that your details are saved. This is the presentation layer, where all the details from the user are taken and sent to the next layer for processing.

Application layer is the underlying program that is responsible for saving the details you have entered and for retrieving your details to show on the page. This layer contains all the business logic, such as validation, calculations and manipulation of data, and it sends requests to the database to get the actual data. If this layer finds that a request is invalid, it sends a message back to the presentation layer and does not hit the database layer at all.

Data layer or Database layer is the layer where the actual database resides. In this layer, all the tables, their mappings and the actual data are present. When you save your details from the front end, they are inserted into the respective tables in the database layer by the programs in the application layer. When you want to view your details in the web browser, the application layer sends a request to the database layer. The database layer fires queries and gets the data. These data are then transferred to the browser (presentation layer) by the programs in the application layer.
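The three layers can be sketched as three separate pieces of Python code. Everything here is hypothetical: the `users` table, the validation rule, and the class and function names. The sketch follows the flow the text describes. The presentation layer only collects input. The application layer validates it and rejects invalid requests without touching the database. Only the data layer issues SQL.

```python
import sqlite3


# --- Data layer: the only code that talks to the database ---------------
class DataLayer:
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE users (name TEXT, email TEXT)")  # hypothetical table

    def insert_user(self, name, email):
        with self.conn:
            self.conn.execute("INSERT INTO users VALUES (?, ?)", (name, email))

    def count_users(self):
        return self.conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


# --- Application layer: business rules and validation --------------------
class ApplicationLayer:
    def __init__(self, data):
        self.data = data

    def register(self, form):
        name = form.get("name", "").strip()
        email = form.get("email", "").strip()
        if not name or "@" not in email:  # hypothetical validation rule
            return "Invalid details"  # sent back; the database is never hit
        self.data.insert_user(name, email)
        return "Saved"


# --- Presentation layer: collects input and shows the reply --------------
def submit_form(app, name, email):
    return app.register({"name": name, "email": email})


app = ApplicationLayer(DataLayer())
print(submit_form(app, "Asha", "asha@example.com"))  # Saved
print(submit_form(app, "Ravi", "not-an-email"))      # Invalid details
print(app.data.count_users())                        # 1
```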

DBMS Component Modules

Figure 2.3 illustrates, in a simplified form, the typical DBMS components. The figure is divided into two parts. The top part of the figure refers to the various users of the database environment and their interfaces. The lower part shows the internals of the DBMS responsible for storage of data and processing of transactions.

The database and the DBMS catalog are usually stored on disk. Access to the disk is controlled primarily by the operating system (OS), which schedules disk read/write. Many DBMSs have their own buffer management module to schedule disk read/write, because this has a considerable effect on performance: reducing disk read/write improves performance considerably. A higher-level stored data manager module of the DBMS controls access to DBMS information that is stored on disk, whether it is part of the database or the catalog.

Let us consider the top part of Figure 2.3 first. It shows interfaces for the DBA staff, casual users who work with interactive interfaces to formulate queries, application programmers who create programs using some host programming languages, and parametric users who do data entry work by supplying parameters to predefined transactions. The DBA staff works on defining the database and tuning it by making changes to its definition using the DDL and other privileged commands.

The DDL compiler processes schema definitions, specified in the DDL, and stores descriptions of the schemas (meta-data) in the DBMS catalog. The catalog includes information such as the names and sizes of files, names and data types of data items, storage details of each file, mapping information among schemas, and constraints. In addition, the catalog stores many other types of information that are needed by the DBMS modules, which can then look up the catalog information as needed. Casual users and persons with an occasional need for information from the database interact using some form of interface, which we call the interactive query interface in Figure 2.3.

We have not explicitly shown any menu-based or form-based interaction that may be used to generate the interactive query automatically. These queries are parsed and validated for correctness of the query syntax, the names of files and data elements, and so on by a query compiler that compiles them into an internal form. This internal query is subjected to query optimization.

Figure 2.3: Component modules of a DBMS and their interactions.
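SQLite offers a small-scale look at two of the modules described above. After a DDL statement runs, its schema description is recorded in the catalog table `sqlite_master`. `EXPLAIN QUERY PLAN` shows the access strategy the optimizer chose for a query. The table, index and queries below are hypothetical. The exact wording of the plan text varies between SQLite versions, so treat the printed plan as indicative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL: the schema definition is compiled and stored as metadata in the catalog.
conn.execute("CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("CREATE INDEX idx_student_city ON student(city)")

catalog = conn.execute("SELECT type, name FROM sqlite_master ORDER BY name").fetchall()
print(catalog)  # [('index', 'idx_student_city'), ('table', 'student')]

# Query: parsed, validated against the catalog, then optimized.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM student WHERE city = ?", ("Pune",)
).fetchall()
for row in plan:
    print(row[-1])  # the last column holds the human-readable plan step

# A query naming a column that is not in the catalog is rejected during validation.
try:
    conn.execute("SELECT phone FROM student")
except sqlite3.OperationalError as exc:
    print("Rejected:", exc)  # Rejected: no such column: phone
```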

Basic Client/Server Architectures

First, we discuss client/server architecture in general; then we see how it is applied to DBMSs. The client/server architecture was developed to deal with computing environments in which a large number of PCs, workstations, file servers, printers, database servers, Web servers, e-mail servers, and other software and equipment are connected via a network. The idea is to define specialized servers with specific functionalities. For example, it is possible to connect a number of PCs or small workstations as clients to a file server that maintains the files of the client machines. Another machine can be designated as a printer server by being connected to various printers; all print requests by the clients are forwarded to this machine. Web servers or e-mail servers also fall into the specialized server category. The resources provided by specialized servers can be accessed by many client machines.

The client machines provide the user with the appropriate interfaces to utilize these servers, as well as with local processing power to run local applications. This concept can be carried over to other software packages, with specialized programs, such as a CAD (computer-aided design) package, being stored on specific server machines and made accessible to multiple clients. Figure 2.5 illustrates client/server architecture at the logical level; Figure 2.6 is a simplified diagram that shows the physical architecture. Some machines would be client sites only (for example, diskless workstations or workstations/PCs with disks that have only client software installed).

Other machines would be dedicated servers, and others would have both client and server functionality. The concept of client/server architecture assumes an underlying framework that consists of many PCs and workstations as well as a smaller number of mainframe machines, connected via LANs and other types of computer networks.

A client in this framework is typically a user machine that provides user interface capabilities and local processing. When a client requires access to additional functionality, such as database access, that does not exist at that machine, it connects to a server that provides the needed functionality. A server is a system containing both hardware and software that can provide services to the client machines, such as file access, printing, archiving, or database access. In general, some machines install only client software, others only server software, and still others may include both client and server software, as illustrated in Figure 2.6. However, it is more common for client and server software to run on separate machines.

Two main types of basic DBMS architectures were created on this underlying client/server framework: two-tier and three-tier.

Two-Tier Client/Server Architectures for DBMS

In relational database management systems (RDBMSs), many of which started as centralized systems, the system components that were first moved to the client side were the user interface and application programs. Because SQL provided a standard language for RDBMSs, it created a logical dividing point between client and server. Hence, the query and transaction functionality related to SQL processing remained on the server side. In such an architecture, the server is often called a query server or transaction server because it provides these two functionalities. In an RDBMS, the server is also often called an SQL server. The user interface programs and application programs can run on the client side. When DBMS access is required, the program establishes a connection to the DBMS (which is on the server side); once the connection is created, the client program can communicate with the DBMS. A standard called Open Database Connectivity (ODBC) provides an application programming interface (API), which allows client-side programs to call the DBMS, as long as both client and server machines have the necessary software installed. Most DBMS vendors provide ODBC drivers for their systems. A client program can actually connect to several RDBMSs and send query and transaction requests using the ODBC API, which are then processed at the server sites. Any query results are sent back to the client program, which can process and display the results as needed. A related standard for the Java programming language, called JDBC, has also been defined. This allows Java client programs to access one or more DBMSs through a standard interface.
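ODBC and JDBC are not part of Python's standard library, but Python has an analogous standard client-side interface, the Python Database API Specification v2.0 (PEP 249). The standard `sqlite3` module implements it, and third-party packages such as `pyodbc` expose ODBC drivers through the same interface. The sketch below writes one client function, `run_query` (a hypothetical name), against only the PEP 249 operations `cursor()`, `execute()`, `fetchall()` and `close()`. In principle the same code could then be given a connection from a different driver. One caveat to check for each driver is the parameter placeholder style: `sqlite3` uses `?`, while some other drivers use `%s`. Only `sqlite3` is used here, so the example runs without a server, and the `account` table is hypothetical.

```python
import sqlite3


def run_query(connection, sql, params=()):
    """Client-side code written only against the DB-API 2.0 interface (PEP 249)."""
    cursor = connection.cursor()
    try:
        cursor.execute(sql, params)
        return cursor.fetchall()  # empty list for statements that return no rows
    finally:
        cursor.close()


# Any DB-API driver's connection could be passed in; sqlite3 ships with Python.
conn = sqlite3.connect(":memory:")
run_query(conn, "CREATE TABLE account (acc_no INTEGER, balance REAL)")  # hypothetical table
run_query(conn, "INSERT INTO account VALUES (?, ?)", (101, 5000.0))
conn.commit()
print(run_query(conn, "SELECT acc_no, balance FROM account WHERE acc_no = ?", (101,)))
# [(101, 5000.0)]
```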

A different approach to two-tier client/server architecture was taken by some object-oriented DBMSs, where the software modules of the DBMS were divided between client and server in a more integrated way. For example, the server level may include the part of the DBMS software responsible for handling data storage on disk pages, local concurrency control and recovery, buffering and caching of disk pages, and other such functions. Meanwhile, the client level may handle the user interface; data dictionary functions; DBMS interactions with programming language compilers; global query optimization, concurrency control, and recovery across multiple servers; structuring of complex objects from the data in the buffers; and other such functions. In this approach, the client/server interaction is more tightly coupled and is done internally by the DBMS modules (some of which reside on the client and some on the server) rather than by the users/programmers. The exact division of functionality can vary from system to system. In such a client/server architecture, the server has been called a data server because it provides data in disk pages to the client. This data can then be structured into objects for the client programs by the client-side DBMS software. The architectures described here are called two-tier architectures because the software components are distributed over two systems: client and server. The advantages of this architecture are its simplicity and seamless compatibility with existing systems. The emergence of the Web changed the roles of clients and servers, leading to the three-tier architecture.

Three-Tier and n-Tier Architectures

Many Web applications use an architecture called the three-tier architecture, which adds an intermediate layer between the client and the database server, as illustrated in Figure 2.7.

This intermediate layer or middle tier is called the application server or the Web server, depending on the application. This server plays an intermediary role by running application programs and storing business rules (procedures or constraints) that are used to access data from the database server. It can also improve database security by checking a client’s credentials before forwarding a request to the database server.

Clients contain GUI interfaces and some additional application-specific business rules. The intermediate server accepts requests from the client, processes the request and sends database queries and commands to the database server, and then acts as a conduit for passing (partially) processed data from the database server to the clients, where it may be processed further and filtered to be presented to users in GUI format. Thus, the user interface, application rules, and data access act as the three tiers. Figure 2.7(b) shows another architecture used by database and other application package vendors.

The presentation layer displays information to the user and allows data entry. The business logic layer handles intermediate rules and constraints before data is passed up to the user or down to the DBMS. The bottom layer includes all data management services. The middle layer can also act as a Web server, which retrieves query results from the database server and formats them into dynamic Web pages that are viewed by the Web browser at the client side. Other architectures have also been proposed. It is possible to divide the layers between the user and the stored data further into finer components, thereby giving rise to n-tier architectures, where n may be four or five tiers. Typically, the business logic layer is divided into multiple layers. Besides distributing programming and data throughout a network, n-tier applications