Prof- Neeta Bonde
DBMS (FYCS)
Unit - 1
DBMS: A database is a collection of related data, and data is a collection of facts
and figures that can be processed to produce information. Mostly, data represents
recordable facts.
Data aids in producing information, which is based on facts. For example, if we
have data about marks obtained by all students, we can then conclude about
toppers and average marks.
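The marks example can be sketched in a few lines of Python. The student names and marks below are made up for illustration; the point is only that the raw data (marks) is processed into information (average and toppers).

```python
# Hypothetical raw data: student name -> marks obtained (out of 100).
marks = {"Asha": 82, "Ravi": 91, "Meena": 91, "John": 67}

# Information derived from the data.
average = sum(marks.values()) / len(marks)
highest = max(marks.values())
# More than one student can share the top score, so collect all of them.
toppers = sorted(name for name, m in marks.items() if m == highest)

print(f"Average marks: {average:.2f}")
print(f"Toppers: {toppers}")
```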
A database management system stores data in such a way that it becomes easier to
retrieve, manipulate, and produce information.
Characteristics of DBMS :
Traditionally, data was organized in file formats. DBMS was a new concept then,
and all the research was done to make it overcome the deficiencies in traditional
style of data management. A modern DBMS has the following characteristics:
Real-world entity: A modern DBMS is more realistic and uses real-world
entities to design its architecture. It models their behavior and attributes too. For
example, a school database may use students as an entity and their age as an
attribute.
Relation-based tables: DBMS allows entities and relations among them to
form tables. A user can understand the architecture of a database just by
looking at the table names.
Isolation of data and application: A database system is entirely different
than its data. A database is an active entity, whereas data is said to be
passive, on which the database works and organizes. DBMS also stores
metadata, which is data about data, to ease its own process.
Less redundancy: DBMS follows the rules of normalization, which splits a
relation when any of its attributes has redundant values.
Normalization is a mathematically rich and scientific process that reduces
data redundancy.
Consistency: Consistency is a state where every relation in a database
remains consistent. There exist methods and techniques that can detect an
attempt to leave the database in an inconsistent state. A DBMS can provide
greater consistency than earlier forms of data-storing applications
such as file-processing systems.
Query Language: A DBMS is equipped with a query language, which makes it
more efficient to retrieve and manipulate data. A user can apply as many and
as varied filtering options as required to retrieve a set of data.
This was not possible with traditional file-processing systems.
Multiuser and Concurrent Access: A DBMS supports a multi-user
environment and allows users to access and manipulate data in parallel.
There are restrictions on transactions when users attempt to handle
the same data item, but users are generally unaware of them.
Multiple views: DBMS offers multiple views for different users. A user
who is in the Sales department will have a different view of database than a
person working in the Production department. This feature enables the users
to have a concentrated view of the database according to their requirements.
Security: Features like multiple views offer security to some extent where
users are unable to access data of other users and departments. DBMS offers
methods to impose constraints while entering data into the database and
retrieving the same at a later stage. DBMS offers many different levels of
security features, which enables multiple users to have different views with
different features. For example, a user in the Sales department cannot see the
data that belongs to the Purchase department. Additionally, it can also be
managed how much data of the Sales department should be displayed to the
user. Because a DBMS does not store its data on disk as plain files the way a
traditional file system does, it is harder for miscreants to read or tamper
with the data directly.
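Three of the characteristics above (query language, multiple views, and constraints on data entry) can be sketched with Python's built-in sqlite3 module. The table, column, and view names are assumptions made for this illustration. SQLite has no user accounts, so the view here only shows the idea of a restricted window on the data; a multi-user DBMS would combine such views with access privileges.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY,
    department TEXT NOT NULL,
    item       TEXT NOT NULL,
    amount     REAL NOT NULL CHECK (amount >= 0)  -- constraint on data entry
);
INSERT INTO orders (department, item, amount) VALUES
    ('Sales', 'Laptop', 55000),
    ('Sales', 'Mouse', 500),
    ('Purchase', 'Steel', 120000);
-- A view that exposes only the Sales department's rows.
CREATE VIEW sales_view AS
    SELECT order_id, item, amount FROM orders WHERE department = 'Sales';
""")

# Multiple views: a Sales user querying sales_view never sees Purchase rows.
sales_items = [row[0] for row in
               conn.execute("SELECT item FROM sales_view ORDER BY item")]

# Query language: filters are applied declaratively.
big_sales = conn.execute(
    "SELECT item FROM sales_view WHERE amount > ? ORDER BY item", (1000,)
).fetchall()

# Constraints: the DBMS itself rejects an invalid row.
try:
    conn.execute(
        "INSERT INTO orders (department, item, amount) VALUES ('Sales', 'Bad', -1)"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(sales_items, big_sales, rejected)
```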
Advantages of DBMS
1. Controlling Redundancy: In a file system, each application has its
own private files, which cannot be shared between multiple applications.
This can often lead to considerable redundancy in the stored data, which
results in wastage of storage space. With a centralized database, most of
this can be avoided. Not all redundancy can or should be eliminated:
sometimes there are sound business and technical reasons for
maintaining multiple copies of the same data. In a database system,
however, this redundancy can be controlled.
For example: In the case of a college database, there may be a number of
applications such as General Office, Library, Account Office, Hostel etc. Each
of these applications may maintain the following information in its own
private files:
It is clear from the above file systems that some common data about the
student has to be recorded in each application, like Rollno, Name,
Class, Phone_No, Address etc. This causes redundancy, which wastes
storage space and makes the data difficult to maintain. With a
centralized database, data can be shared by a number of applications,
and the whole college can maintain its computerized data with the following
database:
It is clear in the above database that Rollno, Name, Class, Father_Name, Address,
Phone_No and Date_of_birth, which are stored repeatedly in each application of the
file system, need not be stored repeatedly in the database, because every other
application can access this information by joining relations on the
common column, i.e. Rollno. Suppose a user of the Library system needs the Name
and Address of a particular student. By joining the Library and General Office
relations on the column Rollno, he/she can easily retrieve this information.
Thus, the centralized system of a DBMS reduces the redundancy of data
to a great extent but cannot eliminate it, because Rollno is still
repeated in all the relations.
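The join described above can be sketched with Python's built-in sqlite3 module. The table layouts are assumptions made for this illustration: the Book_Name column and the sample row are hypothetical. Name and Address are stored only in General_Office, and the Library application reaches them through the common column Rollno.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE General_Office (
    Rollno  INTEGER PRIMARY KEY,
    Name    TEXT,
    Class   TEXT,
    Address TEXT
);
-- Library keeps only Rollno plus its own data, not Name or Address.
CREATE TABLE Library (
    Rollno    INTEGER REFERENCES General_Office(Rollno),
    Book_Name TEXT
);
INSERT INTO General_Office VALUES (5, 'Amit', 'FYCS', 'Amritsar');
INSERT INTO Library VALUES (5, 'Database Systems');
""")

# Join the two relations on the common column Rollno.
rows = conn.execute(
    """
    SELECT g.Name, g.Address, l.Book_Name
    FROM Library AS l
    JOIN General_Office AS g ON g.Rollno = l.Rollno
    WHERE l.Rollno = ?
    """,
    (5,),
).fetchall()

print(rows)
```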
2. Inconsistency can be avoided: When the same data is duplicated and changes
made at one site are not propagated to the other site, the two entries for the
same data will not agree. At such
times the data is said to be inconsistent. So, if the redundancy is removed, the
chance of having inconsistent data is also removed.
Let us again consider the college system, and suppose that the
General_Office file indicates that Roll_Number 5 lives in Amritsar but the
Library file indicates that Roll_Number 5 lives in Jalandhar. This is a
state in which the two entries for the same object do not agree with each other (that
is, one is updated and the other is not). At such a time the database is said to be
inconsistent.
An inconsistent database is capable of supplying incorrect or conflicting
information, so there should be no inconsistency in a database. Inconsistency
can be avoided much more easily in a centralized system than in a
file system.
Let us consider again the example of the college system, and suppose that Roll
Number 5 has shifted from Amritsar to Jalandhar. The address information of Roll
Number 5 must then be updated wherever the roll number and address occur in the
system. In a file system, the information must be updated separately in each
application; if we make the update at only three places and forget to make it in a
fourth application, the whole system shows inconsistent results about Roll
Number 5.
In a DBMS, the roll number and address occur together only once, in the
General_Office table. So only a single update is needed, and every other application
retrieves the address from General_Office, which is up to date. All
applications therefore get the latest information from a single update
operation, which is propagated automatically to the whole database and to
all other applications. This property is called Propagation of
Update.
We can say that the redundancy of data greatly affects the consistency of data. If
redundancy is less, it is easier to keep the data consistent. Thus, a DBMS
can avoid inconsistency to a great extent.
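Propagation of update can be sketched as follows with Python's built-in sqlite3 module. The Library and Hostel table layouts and the helper address_seen_by are assumptions made for this illustration. Because the address lives only in General_Office, one UPDATE is enough for every application to see the new value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE General_Office (Rollno INTEGER PRIMARY KEY, Name TEXT, Address TEXT);
CREATE TABLE Library (Rollno INTEGER, Book_Name TEXT);
CREATE TABLE Hostel  (Rollno INTEGER, Room_No INTEGER);
INSERT INTO General_Office VALUES (5, 'Amit', 'Amritsar');
INSERT INTO Library VALUES (5, 'Database Systems');
INSERT INTO Hostel  VALUES (5, 12);
""")

APP_TABLES = ("Library", "Hostel")  # hypothetical application tables


def address_seen_by(app_table, rollno):
    """Return the address an application sees by joining on Rollno."""
    if app_table not in APP_TABLES:  # table names cannot be bound as parameters
        raise ValueError(f"unknown application table: {app_table}")
    row = conn.execute(
        f"SELECT g.Address FROM {app_table} AS a "
        "JOIN General_Office AS g ON g.Rollno = a.Rollno WHERE a.Rollno = ?",
        (rollno,),
    ).fetchone()
    return row[0]


before = [address_seen_by(t, 5) for t in APP_TABLES]

# One update in one place; no per-application copies to chase.
with conn:
    conn.execute(
        "UPDATE General_Office SET Address = ? WHERE Rollno = ?", ("Jalandhar", 5)
    )

after = [address_seen_by(t, 5) for t in APP_TABLES]
print(before, after)
```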
3. Data can be shared: As explained earlier, the data about Name, Class,
Father_name etc. of General_Office is shared by multiple applications in a centralized
DBMS, unlike in a file system, so new applications can be developed to operate
against the same stored data. The applications may be developed without having to
create any new stored files.
4. Standards can be enforced: Since a DBMS is a central system, standards can
be enforced easily, whether at Company level, Department level, National level or
International level. Standardized data is very helpful during migration or
interchange of data. A file system is an independent system, so standards cannot
be easily enforced on multiple independent applications.
Advantage of DBMS over file system
There are several advantages of a database management system over a file system.
A few of them are as follows:
Less redundant data – redundancy is reduced by data normalization
Data Consistency and Integrity – data normalization takes care of it too
Secure – Each user has a different set of access
Privacy – Limited access
Easy access to data
Easy recovery
Flexible
Data Abstraction in DBMS
Database systems are made up of complex data structures. To ease user
interaction with the database, the developers hide irrelevant internal details from users.
This process of hiding irrelevant details from the user is called data abstraction.
We have three levels of abstraction:
Physical level: This is the lowest level of data abstraction. It describes how data is
actually stored in database. You can get the complex data structure details at this
level.
Logical level: This is the middle level of 3-level data abstraction architecture. It
describes what data is stored in database.
View level: Highest level of data abstraction. This level describes the user
interaction with database system.
Example: Let’s say we are storing customer information in a customer table.
At the physical level these records can be described as blocks of storage (bytes,
gigabytes, terabytes etc.) in memory. These details are often hidden from the
programmers.
At the logical level these records can be described as fields and attributes along
with their data types, and the relationships among them can be logically
implemented. The programmers generally work at this level because they are
aware of such things about database systems.
At the view level, users just interact with the system through a GUI and enter
details on the screen; they are not aware of how the data is stored or what data is
stored. Such details are hidden from them.
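The three levels can be glimpsed in a small sketch using Python's built-in sqlite3 module. The customer columns and the view name customer_names are assumptions made for this illustration, and the PRAGMA statements are specific to SQLite; other DBMSs expose this information differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
)
conn.execute("INSERT INTO customer (name, city) VALUES ('Asha', 'Mumbai')")

# Logical level: what data is stored (fields and their data types).
# Each table_info row is (cid, name, type, notnull, dflt_value, pk).
logical = [(col[1], col[2]) for col in conn.execute("PRAGMA table_info(customer)")]

# Physical level: how data is stored (fixed-size pages); normally hidden.
page_size = conn.execute("PRAGMA page_size").fetchone()[0]
page_count = conn.execute("PRAGMA page_count").fetchone()[0]

# View level: the user sees only what the view exposes.
conn.execute("CREATE VIEW customer_names AS SELECT name FROM customer")
view_rows = conn.execute("SELECT name FROM customer_names").fetchall()

print("logical :", logical)
print("physical:", page_count, "page(s) of", page_size, "bytes")
print("view    :", view_rows)
```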
Architecture of Database
Database architecture can be 2-tier or 3-tier, based on how users are
connected to the database to get their requests done. They can either connect
directly to the database, or their request is received by an intermediary layer, which
processes the request and then sends it to the database.
2-tier architecture
In 2-tier architecture, the application program interacts directly with the database.
There is no user interface or user involved in the database interaction.
Imagine a front-end application of a school, where we need to display the reports of
all the students who have opted for different subjects. In this case, the application
will directly interact with the database and retrieve all required data. Here no
inputs from the user are required. This involves 2-tier architecture of the database.
Let us consider another example of two-tier architecture. Consider a railway ticket
reservation system. How does this work? Imagine a person is reserving a ticket
from Delhi to Goa on a particular day. At the same time another person in some
other part of Delhi is also reserving a ticket to Goa on the same day for the
same train. Now there is a requirement for two tickets, but for different persons.
What will the reservation system do? It takes the requests from both of them and
queues them. Here each request is entered at the
application layer and sent to the database layer. Once the request is
processed in the database, the result is sent back to the application layer for the user.
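The queuing behaviour in the reservation example can be sketched as below with Python's built-in sqlite3 module. The train_seats table, the function reserve, and the single remaining seat are assumptions chosen to show why the order of the queue matters; a real reservation system would be considerably more involved.

```python
import sqlite3
from collections import deque

# Database layer: one train (hypothetical number 1) with one seat left.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE train_seats (train_no INTEGER PRIMARY KEY, seats_left INTEGER NOT NULL)"
)
conn.execute("INSERT INTO train_seats VALUES (1, 1)")
conn.commit()


def reserve(passenger, train_no):
    """Application layer: send one reservation request to the database."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "UPDATE train_seats SET seats_left = seats_left - 1 "
            "WHERE train_no = ? AND seats_left > 0",
            (train_no,),
        )
    status = "confirmed" if cur.rowcount == 1 else "no seat available"
    return (passenger, status)


# Two requests arrive at almost the same time and are queued.
requests = deque([("Person A", 1), ("Person B", 1)])
results = []
while requests:
    passenger, train_no = requests.popleft()  # handled one after another
    results.append(reserve(passenger, train_no))

print(results)
```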
Advantages of 2-tier
Easy to understand as it directly communicates with the database.
Requested data can be retrieved very quickly when the number of users is
small.
Easy to modify – any required changes can be sent directly to the
database.
Easy to maintain – when there are multiple requests, they are handled in a
queue and there will not be any chaos.
Disadvantages of 2-tier
It is time consuming when there is a huge number of users. All the
requests are queued and handled one after another. Hence it cannot
respond to multiple users at the same time.
This architecture is less cost effective.
3-tier architecture
3-tier architecture is the most widely used database architecture. It can be
viewed as below.
Presentation layer / User layer is the layer where the user uses the database.
The user does not have any knowledge of the underlying database and simply
interacts with it as though all the data were in front of them. You
can imagine this layer as a registration form where you input
your details. Did you ever wonder where the data goes after you press the
‘submit’ button? Probably not. You just know that your details are saved. This is
the presentation layer, where all the details from the user are taken and sent to
the next layer for processing.
Application layer is the underlying program which is responsible for
saving the details that you have entered, and retrieving your details to
show up in the page. This layer has all the business logics like validation,
calculations and manipulations of data, and then sends the requests to
database to get the actual data. If this layer sees that the request is invalid,
it sends back the message to presentation layer. It will not hit the database
layer at all.
Data layer or Database layer is the layer where the actual database resides. In
this layer, all the tables, their mappings and the actual data are present. When
you save your details from the front end, they are inserted into the
respective tables in the database layer by the programs in the
application layer. When you want to view your details in the web browser,
a request is sent to the database layer by the application layer. The database layer
fires queries and gets the data. These data are then transferred to the
browser (presentation layer) by the programs in the application layer.
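The three layers described above can be sketched as three groups of Python functions. All function names, the registration table, and the validation rules are assumptions made for this illustration. In a real system each layer would normally run as a separate program or on a separate machine. The counter db_calls shows that an invalid request is turned back by the application layer and never hits the database layer.

```python
import sqlite3

# --- Data layer: the actual tables and data -------------------------------
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE registration (name TEXT NOT NULL, age INTEGER NOT NULL)")
db_calls = 0  # counts how often the database layer is hit


def data_layer_save(name, age):
    global db_calls
    db_calls += 1
    with conn:
        conn.execute("INSERT INTO registration (name, age) VALUES (?, ?)", (name, age))


def data_layer_fetch_all():
    return conn.execute("SELECT name, age FROM registration ORDER BY name").fetchall()


# --- Application layer: business logic such as validation -----------------
def application_layer_register(form):
    name = str(form.get("name", "")).strip()
    age = form.get("age")
    if not name:
        return "error: name is required"  # database is never contacted
    if not isinstance(age, int) or not 0 < age < 150:
        return "error: age must be a whole number between 1 and 149"
    data_layer_save(name, age)
    return "saved"


# --- Presentation layer: what the user sees after pressing 'submit' -------
def presentation_layer_submit(form):
    message = application_layer_register(form)
    return f"[screen] {message}"


ok_message = presentation_layer_submit({"name": "Asha", "age": 18})
bad_message = presentation_layer_submit({"name": "", "age": 18})
print(ok_message)
print(bad_message)
print("database calls:", db_calls)
```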
DBMS Component Modules
Figure 2.3 illustrates, in a simplified form, the typical DBMS
components. The figure is divided into two parts. The top part of the
figure refers to the various users of the database environment and their
interfaces. The lower part shows the internals of the DBMS responsible
for storage of data and processing of transactions.
The database and the DBMS catalog are usually stored on disk. Access
to the disk is controlled primarily by the operating system (OS), which
schedules disk read/write. Many DBMSs have their own buffer
management module to schedule disk read/write, because this has a
considerable effect on performance. Reducing disk read/write improves
performance considerably. A higher-level stored data manager module
of the DBMS controls access to DBMS information that is stored on
disk, whether it is part of the database or the catalog.
Let us consider the top part of Figure 2.3 first. It shows interfaces for the DBA
staff, casual users who work with interactive interfaces to formulate queries,
application programmers who create programs using some host programming
languages, and parametric users who do data entry work by supplying parameters
to predefined transactions. The DBA staff works on defining the database and
tuning it by making changes to its definition using the DDL and other privileged
commands.
The DDL compiler processes schema definitions, specified in the DDL, and stores
descriptions of the schemas (meta-data) in the DBMS catalog. The catalog includes
information such as the names and sizes of files, names and data types of data
items, storage details of each file, mapping information among schemas, and
constraints. In addition, the catalog stores many other types of information that are
needed by the DBMS modules, which can then look up the catalog information as
needed. Casual users and persons with occasional need for information from the
database interact using some form of interface, which we call the interactive
query interface in Figure 2.3.
We have not explicitly shown any menu-based or form-based interaction that may
be used to generate the interactive query automatically. These queries are parsed
and validated for correctness of the query syntax, the names of files and data
elements, and so on by a query compiler that compiles them into an internal form.
This internal query is subjected to query optimization.
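The catalog and the query compiler can be observed in a small way using SQLite through Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the student table and index are hypothetical, and sqlite_master and EXPLAIN QUERY PLAN are SQLite-specific features, not the modules of Figure 2.3. Other DBMSs keep their catalogs and expose query plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL statements: the schema descriptions end up in the catalog.
conn.execute(
    "CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT NOT NULL, class TEXT)"
)
conn.execute("CREATE INDEX idx_student_class ON student (class)")

# SQLite's catalog table holds the meta-data for every table and index.
catalog = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master ORDER BY name"
).fetchall()

# The query compiler validates names against the catalog before running.
try:
    conn.execute("SELECT nam FROM student")  # misspelled column name
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# The plan chosen for a valid query; the exact wording varies by version.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM student WHERE class = ?", ("FYCS",)
).fetchall()

print(catalog)
print("invalid query rejected:", rejected)
for step in plan:
    print(step)
```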
Figure 2.3
Component modules of a DBMS and their interactions.
Basic Client/Server Architectures
First, we discuss client/server architecture in general, then we see how it is applied
to DBMSs. The client/server architecture was developed to deal with computing
environments in which a large number of PCs, workstations, file servers, printers,
database servers, Web servers, e-mail servers, and other software and equipment are
connected via a network. The idea is to define specialized servers with specific
functionalities. For example, it is possible to connect a number of PCs or small
workstations as clients to a file server that maintains the files of the client
machines. Another machine can be designated as a printer server by being
connected to various printers; all print requests by the clients are forwarded to this
machine. Web servers or e-mail servers also fall into the specialized server
category. The resources provided by specialized servers can be accessed by many
client machines.
The client machines provide the user with the appropriate interfaces to utilize
these servers, as well as with local processing power to run local applications. This
concept can be carried over to other software packages, with specialized
programs—such as a CAD (computer-aided design) package—being stored on
specific server machines and being made accessible to multiple clients. Figure 2.5
illustrates client/server architecture at the logical level; Figure 2.6 is a simplified
diagram that shows the physical architecture. Some machines would be client sites
only (for example, diskless workstations or workstations/PCs with disks that have
only client software installed).
Other machines would be dedicated servers, and others would have both client and
server functionality. The concept of client/server architecture assumes an
underlying framework that consists of many PCs and workstations as well as a
smaller number of mainframe machines, connected via LANs and other types of
computer networks.
A client in this framework is typically a user machine that provides user interface
capabilities and local processing. When a client requires access to additional
functionality— such as database access—that does not exist at that machine, it
connects to a server that provides the needed functionality. A server is a system
containing both hardware and software that can provide services to the client
machines, such as file access, printing, archiving, or database access. In general,
some machines install only client software, others only server software, and still
others may include both client and server software, as illustrated in Figure 2.6.
However, it is more common for client and server software to run on
separate machines.
Two main types of basic DBMS architectures were created on this underlying
client/server framework:
Two-tier and three-tier
Two-Tier Client/Server Architectures for DBMS
In relational database management systems (RDBMSs), many of which started as
centralized systems, the system components that were first moved to the client side
were the user interface and application programs. Because SQL provided a standard
language for RDBMSs, this created a logical dividing point between client and server.
Hence, the query and transaction functionality related to SQL processing remained
on the server side. In such architecture, the server is often called a query server or
transaction server because it provides these two functionalities. In an RDBMS,
the server is also often called an SQL server. The user interface programs and
application programs can run on the client side. When DBMS access is required,
the program establishes a connection to the DBMS (which is on the server side);
once the connection is created, the client program can communicate with the
DBMS. A standard called Open Database Connectivity (ODBC) provides an
application programming interface (API), which allows client-side programs to
call the DBMS, as long as both client and server machines have the necessary
software installed. Most DBMS vendors provide ODBC drivers for their systems.
A client program can actually connect to several RDBMSs and send query and
transaction requests using the ODBC API, which are then processed at the server
sites. Any query results are sent back to the client program, which can process and
display the results as needed. A related standard for the Java programming
language, called JDBC, has also been defined. This allows Java client programs
to access one or more DBMSs through a standard interface.
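The connect, query, fetch cycle described above can be sketched in Python. This sketch does not use ODBC or JDBC. It uses Python's DB-API 2.0 (PEP 249), which plays a comparable role for Python programs, with the built-in sqlite3 module standing in for a vendor driver. SQLite is an embedded library rather than a separate server, so no network connection is involved here. The function fetch_rows, the train table, and the train number are hypothetical.

```python
import sqlite3


def fetch_rows(connection, query, params=()):
    """Client-side helper written only against the DB-API cursor interface.

    Parameter placeholder style ('?' here) differs between drivers.
    """
    cursor = connection.cursor()
    try:
        cursor.execute(query, params)  # request is processed by the DBMS
        return cursor.fetchall()       # results come back to the client
    finally:
        cursor.close()


# Step 1: establish a connection to the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE train (train_no INTEGER PRIMARY KEY, source TEXT, destination TEXT)"
)
conn.execute("INSERT INTO train VALUES (12779, 'Delhi', 'Goa')")
conn.commit()

# Step 2: send a query; step 3: process and display the results.
results = fetch_rows(
    conn, "SELECT train_no FROM train WHERE destination = ?", ("Goa",)
)
print(results)

conn.close()
```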
A different approach to two-tier client/server architecture was taken by some
object-oriented DBMSs, where the software modules of the DBMS were divided
between client and server in a more integrated way. For example, the server level
may include the part of the DBMS software responsible for handling data storage
on disk pages, local concurrency control and recovery, buffering and caching of
disk pages, and other such functions. Meanwhile, the client level may handle the
user interface; data dictionary functions; DBMS interactions with programming
language compilers; global query optimization, concurrency control, and recovery
across multiple servers; structuring of complex objects from the data in the buffers;
and other such functions. In this approach, the client/server interaction is more
tightly coupled and is done internally by the DBMS modules—some of which
reside on the client and some on the server—rather than by the users/programmers.
The exact division of functionality can vary from system to system. In such a
client/server architecture, the server has been called a data server because it
provides data in disk pages to the client.
This data can then be structured into objects for the client programs by the
client-side DBMS software. The architectures described here are called two-tier
architectures because the software components are distributed over two systems:
client and server. The advantages of this architecture are its simplicity and
seamless compatibility with existing systems. The emergence of the Web changed
the roles of clients and servers, leading to the three-tier architecture.
Three-Tier and n-Tier Architectures
Many Web applications use an architecture called the three-tier architecture, which
adds an intermediate layer between the client and the database server, as illustrated
in Figure 2.7.
This intermediate layer or middle tier is called the application server or the Web
server, depending on the application. This server plays an intermediary role by
running application programs and storing business rules (procedures or constraints)
that are used to access data from the database server. It can also improve database
security by checking a client’s credentials before forwarding a request to the
database server.
Clients contain GUI interfaces and some additional application-specific
business rules. The intermediate server accepts requests from the client, processes
the request and sends database queries and commands to the database server, and
then acts as a conduit for passing (partially) processed data from the database
server to the clients, where it may be processed further and filtered to be presented
to users in GUI format. Thus, the user interface, application rules, and data access
act as the three tiers. Figure 2.7(b) shows another architecture used by database and
other application package vendors.
The presentation layer displays information to the user and allows data
entry. The business logic layer handles intermediate rules and constraints before
data is passed up to the user or down to the DBMS. The bottom layer includes all
data management services. The middle layer can also act as a Web server, which
retrieves query results from the database server and formats them into dynamic
Web pages that are viewed by the Web browser at the client side. Other
architectures have also been proposed. It is possible to divide the layers between
the user and the stored data further into finer components, thereby giving rise to
n-tier architectures, where n may be four or five. Typically, the business logic
layer is divided into multiple layers. Besides distributing programming and data
throughout a network, n-tier applications