
Database Management Systems (Mcom Ecommerce)


These notes cover the subject DBMS (Database Management Systems) for M.Com E-commerce Part 2.

Page 1: Database Management Systems (Mcom Ecommerce)


Meaning

A database is an organized collection of data. The data are typically organized to model aspects of reality in a way that

supports processes requiring information. For example, modeling the availability of rooms in hotels in a way that

supports finding a hotel with vacancies.

Database management systems (DBMSs) are specially designed software applications that interact with the user, other

applications, and the database itself to capture and analyze data. A general-purpose DBMS is a software system

designed to allow the definition, creation, querying, update, and administration of databases. Well-known DBMSs

include MySQL, PostgreSQL, Microsoft SQL Server, Oracle, SAP and IBM DB2. A database is not generally portable across

different DBMSs, but different DBMSs can interoperate by using standards such as SQL and ODBC or JDBC to allow a

single application to work with more than one DBMS. Database management systems are often classified according to

the database that they support; the most popular database systems since the 1980s have all supported the relational

model as represented by the SQL language.

A database can also be described as a systematically organized or structured repository of indexed information (usually a group of linked data files) that

allows easy retrieval, updating, analysis, and output of data. Stored usually in a computer, this data could be in

the form of graphics, reports, scripts, tables, text, etc., representing almost every kind of information.

Most computer applications (including spreadsheets and word processors) are databases at their core. See

also flat database and relational database.

A database is a collection of information that is organized so that it can easily be accessed, managed, and updated. In

one view, databases can be classified according to types of content: bibliographic, full-text, numeric, and images.

In computing, databases are sometimes classified according to their organizational approach. The most prevalent

approach is the relational database, a tabular database in which data is defined so that it can be reorganized and

accessed in a number of different ways. A distributed database is one that can be dispersed or replicated among

different points in a network. An object-oriented programming database is one that is congruent with the data defined

in object classes and subclasses.

Computer databases typically contain aggregations of data records or files, such as sales transactions, product catalogs

and inventories, and customer profiles. Typically, a database manager provides users the capabilities of controlling

read/write access, specifying report generation, and analyzing usage. Databases and database managers are prevalent in

large mainframe systems, but are also present in smaller distributed workstation and mid-range systems such as the

AS/400 and on personal computers. SQL (Structured Query Language) is a standard language for making interactive

queries from and updating a database such as IBM's DB2, Microsoft's SQL Server, and database products

from Oracle, Sybase, and Computer Associates.

Features of a DBMS

The prime purpose of a relational database management system is to maintain data integrity. This means all the rules

and relationships between data are consistent at all times. But a good DBMS will have other features as well.

These include:

- A command language that allows you to create, delete and alter the database (data description language, or DDL)
- A way of documenting all the internal structures that make up the database (the data dictionary)
- A language to support the manipulation and processing of the data (data manipulation language)
- Support for viewing the database from different viewpoints according to the requirements of the user
- Some level of security and access control over the data

The simplest RDBMS may be designed with a single user in mind, e.g. the database is 'locked' until

that person has finished with it. Such an RDBMS will only cost a few hundred pounds at most and will have only a basic

capability. On the other hand an enterprise level DBMS can support a huge number of simultaneous users with

thousands of internal tables and complex 'roll back' capabilities should things go wrong.

Obviously this kind of system will cost thousands of pounds, along with the need for professional database administrators to look

after it and database specialists to create complex queries for management and staff.


1. Controlling Data Redundancy:

In non-database systems (traditional computer file processing), each application program has its own files. In this case,

the duplicated copies of the same data are created at many places. In DBMS, all the data of an organization is integrated

into a single database. The data is recorded at only one place in the database and it is not duplicated. For example, the

dean's faculty file and the faculty payroll file contain several items that are identical. When they are converted into

database, the data is integrated into a single database so that multiple copies of the same data are reduced to a single

copy. In DBMS, the data redundancy can be controlled or reduced but is not removed completely. Sometimes, it is

necessary to create duplicate copies of the same data items in order to relate tables with each other. By controlling

the data redundancy, you can save storage space. Similarly, it is useful for retrieving data from database using queries.

2. Data Consistency:

By controlling the data redundancy, the data consistency is obtained. If a data item appears only once, any update to its

value has to be performed only once and the updated value (new value of item) is immediately available to all users.

If the DBMS has reduced redundancy to a minimum level, the database system enforces consistency. It means that when

a data item appears more than once in the database and is updated, the DBMS automatically updates each occurrence

of a data item in the database.

3. Data Sharing:

In DBMS, data can be shared by authorized users of the organization. The DBA manages the data and gives rights to

users to access the data. Many users can be authorized to access the same set of information simultaneously. The

remote users can also share the same data. Similarly, the data of the same database can be shared between different

application programs.

4. Data Integration:

In a DBMS, data in the database is stored in tables. A single database contains multiple tables and relationships can be created

between tables (or associated data entities). This makes it easy to retrieve and update data.

5. Integrity Constraints:

Integrity constraints or consistency rules can be applied to database so that the correct data can be entered into

database. The constraints may be applied to data item within a single record or they may be applied to relationships

between records.

Examples:

The examples of integrity constraints are:

(i) 'Issue Date' in a library system cannot be later than the corresponding 'Return Date' of a book.

(ii) Maximum obtained marks in a subject cannot exceed 100.

(iii) The registration numbers of BCS and MCS students must start with 'BCS' and 'MCS' respectively.

There are also some standard constraints that are intrinsic in most DBMSs. These are:

PRIMARY KEY - Designates a column or combination of columns as the primary key; its values cannot be repeated or left blank.
FOREIGN KEY - Relates one table with another table.
UNIQUE - Specifies that values of a column or combination of columns cannot be repeated.
NOT NULL - Specifies that a column cannot contain empty values.
CHECK - Specifies a condition which each row of a table must satisfy.
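These standard constraints can be declared directly in SQL. Below is a minimal sketch using Python's built-in sqlite3 module; the table and column names are invented for illustration, and the final insert deliberately violates the CHECK constraint to show it being enforced.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FOREIGN KEY only when this is on

# Hypothetical tables demonstrating the standard constraints.
con.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY)")
con.execute("""
    CREATE TABLE student (
        reg_no  TEXT PRIMARY KEY,                        -- PRIMARY KEY: unique, not blank
        email   TEXT UNIQUE,                             -- UNIQUE: no repeated values
        name    TEXT NOT NULL,                           -- NOT NULL: no empty values
        marks   INTEGER CHECK (marks <= 100),            -- CHECK: each row must satisfy this
        dept_id INTEGER REFERENCES department(dept_id)   -- FOREIGN KEY: relates the tables
    )""")

con.execute("INSERT INTO department VALUES (1)")
con.execute("INSERT INTO student VALUES ('BCS-01', 'a@x.com', 'Aisha', 95, 1)")

# Violating the CHECK constraint is rejected, exactly as described above.
try:
    con.execute("INSERT INTO student VALUES ('BCS-02', 'b@x.com', 'Bilal', 150, 1)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The invalid row never reaches the table; the DBMS reports the violation instead of storing inconsistent data.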

Most of the DBMSs provide the facility for applying the integrity constraints. The database designer (or DBA) identifies

integrity constraints during database design. The application programmer can also identify integrity constraints in the

program code while developing the application program. The integrity constraints are automatically checked at the

time of data entry or when the record is updated. If the data entry operator (end-user) violates an integrity constraint,


the data is not inserted or updated into the database and a message is displayed by the system. For example, when you

withdraw money from the bank through an ATM card, your account balance is compared with the amount you are

withdrawing. If your account balance is less than the amount you want to withdraw, then a message is displayed

on the screen to inform you about your account balance.

6. Data Security:

Data security is the protection of the database from unauthorized users. Only the authorized persons are allowed to

access the database. Some of the users may be allowed to access only a part of database i.e., the data that is related to

them or related to their department. Mostly, the DBA or head of a department can access all the data in the database.

Some users may be permitted only to retrieve data, whereas others are allowed to retrieve as well as to update data.

Database access is controlled by the DBA, who creates user accounts and grants rights to access the database.

Typically, users or group of users are given usernames protected by passwords.

Most of the DBMSs provide the security sub-system, which the DBA uses to create accounts of users and to specify

account restrictions. The user enters his/her account number (or username) and password to access the data from

database. For example, if you have an account of e-mail in the "hotmail.com" (a popular website), then you have to give

your correct username and password to access your account of e-mail. Similarly, when you insert your ATM card into the

Automated Teller Machine (ATM) at a bank, the machine reads the ID number printed on the card and then asks you to enter

your pin code (or password). In this way, you can access your account.

7. Data Atomicity:

A transaction in commercial databases is referred to as an atomic unit of work. For example, when you purchase something

from a point of sale (POS) terminal, a number of tasks are performed, such as:

- Company stock is updated.
- The amount is added to the company's account.
- The salesperson's commission increases.

All these tasks collectively are called an atomic unit of work, or transaction. These tasks must all be completed;

otherwise the partially completed tasks are rolled back. Thus the DBMS ensures that only consistent data exists

within the database.
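This roll-back behaviour can be sketched with Python's sqlite3 module. The POS tables and the `sell` helper below are invented for the example; the point is that both updates inside the transaction either commit together or are undone together.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE stock   (item TEXT PRIMARY KEY, qty INTEGER CHECK (qty >= 0));
    CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER);
    INSERT INTO stock   VALUES ('widget', 1);
    INSERT INTO account VALUES ('company', 0);
""")

def sell(con, item, price, quantity):
    """Perform all tasks of the sale as one atomic transaction."""
    try:
        with con:  # commits on success, rolls back on any exception
            con.execute("UPDATE stock SET qty = qty - ? WHERE item = ?",
                        (quantity, item))
            con.execute("UPDATE account SET balance = balance + ? "
                        "WHERE name = 'company'", (price,))
    except sqlite3.IntegrityError:
        print("sale rolled back")

sell(con, "widget", 10, 1)   # succeeds: stock 1 -> 0, balance 0 -> 10
sell(con, "widget", 10, 1)   # stock would go negative: the whole sale is undone
print(con.execute("SELECT qty FROM stock").fetchone()[0])        # 0
print(con.execute("SELECT balance FROM account").fetchone()[0])  # 10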

8. Database Access Language:

Most DBMSs provide SQL as a standard database access language. It is used to access data from multiple tables of a

database.
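For instance, a single SQL query can pull related data from multiple tables at once. A minimal sketch via sqlite3 (the `dept` and `emp` tables are invented for the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE emp  (emp_id  INTEGER PRIMARY KEY, ename TEXT, dept_id INTEGER);
    INSERT INTO dept VALUES (1, 'Sales'), (2, 'IT');
    INSERT INTO emp  VALUES (10, 'Asif', 1), (11, 'Nida', 2);
""")

# Join the two tables on their common dept_id column.
rows = con.execute("""
    SELECT emp.ename, dept.dname
    FROM emp JOIN dept ON emp.dept_id = dept.dept_id
    ORDER BY emp.emp_id
""").fetchall()
print(rows)  # [('Asif', 'Sales'), ('Nida', 'IT')]
```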

9. Development of Application:

The cost and time for developing new applications is also reduced. The DBMS provides tools that can be used to develop

application programs. For example, some wizards are available to generate Forms and Reports. Stored procedures

(stored on server side) also reduce the size of application programs.

10. Creating Forms:

A Form is a very important object of a DBMS. You can create Forms very easily and quickly in a DBMS. Once a Form is created, it

can be used many times and it can be modified very easily. The created Forms are also saved along with database and

behave like a software component. A Form provides very easy way (user-friendly interface) to enter data into database,

edit data, and display data from database. The non-technical users can also perform various operations on databases

through Forms without going into the technical details of a database.

11. Report Writers:

Most of the DBMSs provide the report writer tools used to create reports. The users can create reports very easily and

quickly. Once a report is created, it can be used many times and it can be modified very easily. The created reports are

also saved along with database and behave like a software component.

12. Control Over Concurrency:

In a computer file-based system, if two users are allowed to access data simultaneously, it is possible that they will

interfere with each other. For example, if both users attempt to perform update operation on the same record, then one

may overwrite the values recorded by the other. Most DBMSs have sub-systems to control the concurrency so that

transactions are always recorded with accuracy.
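The lost-update problem described above can be simulated in plain Python: two users read the same balance before either writes, and the second write silently overwrites the first. A lock then stands in (as a simplified model) for the DBMS's concurrency sub-system.

```python
import threading

# A record shared by two users, with no concurrency control.
record = {"balance": 100}

# Both users read the current value before either writes.
a_read = record["balance"]
b_read = record["balance"]
record["balance"] = a_read + 50   # user A deposits 50
record["balance"] = b_read + 20   # user B's write overwrites A's: update lost
print(record["balance"])          # 120, not the correct 170

# With a lock, each read-modify-write runs as an indivisible unit,
# which is the effect a DBMS's concurrency control achieves.
record["balance"] = 100
lock = threading.Lock()

def deposit(amount):
    with lock:
        record["balance"] = record["balance"] + amount

threads = [threading.Thread(target=deposit, args=(x,)) for x in (50, 20)]
for t in threads: t.start()
for t in threads: t.join()
print(record["balance"])          # 170: no update is lost
```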


13. Backup and Recovery Procedures:

In a computer file-based system, the user creates the backup of data regularly to protect the valuable data from

damage due to failures of the computer system or application program. It is a time-consuming method if the volume of

data is large. Most of the DBMSs provide the 'backup and recovery' sub-systems that automatically create the backup of

data and restore data if required. For example, if the computer system fails in the middle (or end) of an update

operation of the program, the recovery sub-system is responsible for making sure that the database is restored to the

state it was in before the program started executing.
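As one concrete example of such a sub-system, Python's sqlite3 module exposes SQLite's backup facility directly. This is only a sketch; the `accounts` table is invented, and an in-memory database stands in for a backup file on disk.

```python
import sqlite3

# 'Live' database holding some valuable data.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
live.execute("INSERT INTO accounts VALUES (1, 500)")
live.commit()

# The backup sub-system copies the whole database to another connection
# (in practice this would be a file on separate storage).
backup = sqlite3.connect(":memory:")
live.backup(backup)

# Simulate losing the live database, then read from the backup copy.
live.close()
restored = backup.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(restored)  # 500
```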

14. Data Independence:

The separation of data structure of database from the application program that is used to access data from database is

called data independence. In DBMS, database and application programs are separated from each other. The DBMS sits

in between them. You can easily change the structure of database without modifying the application program. For

example, you can modify the size or data type of a data item (a field of a database table). On the other hand, in a

computer file-based system, the structure of the data items is built into the individual application programs. Thus the

application program is dependent on the data file structure and vice versa.
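Data independence can be sketched as follows: the database structure changes, but the application's existing query keeps working untouched. The table and helper names here are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT)")
con.execute("INSERT INTO employee VALUES (1, 'Sana')")

# The "application program": a query that names only the columns it needs.
def lookup_name(con, emp_id):
    return con.execute(
        "SELECT name FROM employee WHERE emp_id = ?", (emp_id,)).fetchone()[0]

print(lookup_name(con, 1))  # Sana

# The DBA changes the structure of the database...
con.execute("ALTER TABLE employee ADD COLUMN salary INTEGER")

# ...and the unmodified application program still works.
print(lookup_name(con, 1))  # Sana
```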

15. Advanced Capabilities:

A DBMS also provides advanced capabilities for online access and reporting of data through the Internet. Today, most of the

database systems are online. The database technology is used in conjunction with Internet technology to access data on

the web servers.

Database Management Systems Architecture

Database management systems (DBMS) are very relevant in today’s world, where information matters. Most business

operations of large companies are dependent on their databases in some way or the other. Many companies use their

data analysis methods to leverage the data in their databases and provide better service to customers and compete

with their business rivals. Databases are collections of data that have been organized in a certain way. The term DBMS

is commonly used to refer to a computer program that can help you store, change and retrieve the data in your

database. Most DBMS software products use SQL as the main query language – the language that lets you interact

with and extract results from your database quickly. SQL is the language used to query popular database systems like

Oracle, SQL Server and MySQL. Learning SQL and DBMS can help you become a database administrator.

DBMS Architecture

DBMS architecture is the way in which the data in a database is viewed (or represented to) by users. It helps you

represent your data in an understandable way to the users, by hiding the complex bits that deal with the working of

the system. Remember, DBMS architecture is not about how the DBMS software operates or how it processes data.

We’re going to take a look at the ANSI-SPARC DBMS standard model. ANSI is the acronym for American National

Standards Institute. It sets standards for American goods so that they can be used anywhere in the world without

compatibility problems. In the case of DBMS software, ANSI has standardized SQL, so that most DBMS products use

SQL as the main query language. ANSI has also standardized a three-level DBMS architecture model followed by

most database systems, and it’s known as the abstract ANSI-SPARC design standard.

The ANSI-SPARC Database Architecture is set up into three tiers. Let’s take a closer look at them.

The Internal Level (Physical Representation of Data): The internal level is the lowest level in a three-tiered database.

This level deals with how the stored data on the database is represented to the user. This level shows exactly how the

data is stored and organized for access on your system. This is the most technical of the three levels. However, the

internal level view is still abstract; even if it shows how the data is stored physically, it will not show how the

database software operates on it. So how exactly is data stored on this level? There are several considerations to be

made when storing data. Some of them include figuring out the right space allocation techniques, data compression

techniques (if necessary), security and encryption and the access paths the software can take to retrieve the data.

Most DBMS software products make sure that data access is optimized and that data uses minimum storage space.

The OS you’re running is actually in charge of managing the physical storage space.


The Conceptual Level (Holistic Representation of Data): The conceptual level tells you how the database was

structured logically. This level tells you about the relationship between the data members of your database, exactly

what data is stored in it and what a user will need to use the database. This level does not concern itself with how this

logical structure will actually be implemented. It’s actually an overview of your database. The conceptual level acts as

a sort of buffer between the internal level and the external level. It helps hide the complexity of the database and

hides how the data is physically stored in it. The database administrator will have to be conversant with this layer,

because most of his operations are carried out on it. Only a database administrator is allowed to modify or structure

this level. It provides a global view of the database, as well as the hardware and software necessary for running it,

all of which is important information for a database administrator.

The External Level (User Representation of Data): This is the uppermost level in the database. It implements the

concept of abstraction as much as possible. This level is also known as the view level because it deals with how a user

views your database. The external level is what allows a user to access a customized version of the data in your

database. Multiple users can work on a database at the same time because of it. The external level also hides the

working of the database from your users. It maintains the security of the database by giving users access only to the

data which they need at a particular time. Any data that is not needed will not be displayed. Three “schemas”

(internal, conceptual and external) show how the database is internally and externally structured, and so this type of

database architecture is also known as the “three-schema” architecture.
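In SQL terms, the external level corresponds roughly to views: each user group sees a customized slice of the conceptual schema. A sketch using sqlite3 (the employee schema and the `sales_staff` view are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Conceptual level: the full logical schema.
con.execute("""CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)""")
con.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)",
                [(1, 'Ali', 'Sales', 40000), (2, 'Hina', 'IT', 55000)])

# External level: a view for the Sales department that hides salaries
# and the rows belonging to other departments.
con.execute("""CREATE VIEW sales_staff AS
    SELECT emp_id, name FROM employee WHERE dept = 'Sales'""")

print(con.execute("SELECT * FROM sales_staff").fetchall())  # [(1, 'Ali')]
```

Users querying `sales_staff` never see the data the view excludes, which is exactly the security-through-abstraction idea described above.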

Functional Dependency

A functional dependency occurs when one attribute in a relation uniquely determines another attribute. This can be

written A -> B which would be the same as stating "B is functionally dependent upon A."

Examples: In a table listing employee characteristics, including Social Security Number (SSN) and name, it can be said that

name is functionally dependent upon SSN (or SSN -> name) because an employee's name can be uniquely determined

from their SSN. However, the reverse statement (name -> SSN) is not true because more than one employee can have

the same name but different SSNs.

Definition - What does Functional Dependency mean?

Functional dependency is a relationship that exists when one attribute uniquely determines another attribute. If R is a

relation with attributes X and Y, a functional dependency between the attributes is represented as X -> Y, which specifies

Y is functionally dependent on X. Here X is a determinant set and Y is a dependent attribute. Each value of X is associated

precisely with one Y value. Functional dependency in a database serves as a constraint between two sets of attributes.

Defining functional dependency is an important part of relational database design and contributes to database

normalization.
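A functional dependency X -> Y can be checked mechanically: it holds in a table exactly when no two rows agree on X but differ on Y. A small sketch (the `holds` function and sample data are invented for the example):

```python
def holds(rows, x, y):
    """Return True if the functional dependency x -> y holds in rows
    (each row is a dict; x and y are column names)."""
    seen = {}
    for row in rows:
        if row[x] in seen and seen[row[x]] != row[y]:
            return False        # same determinant, different dependent value
        seen[row[x]] = row[y]
    return True

employees = [
    {"ssn": "111", "name": "Amna"},
    {"ssn": "222", "name": "Amna"},   # two people may share a name
    {"ssn": "111", "name": "Amna"},
]
print(holds(employees, "ssn", "name"))   # True:  ssn -> name
print(holds(employees, "name", "ssn"))   # False: name does not determine ssn
```

This mirrors the SSN example above: each SSN maps to exactly one name, but a name can map to several SSNs.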

What is Normalization?

Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization

process: eliminating redundant (unwanted) data (for example, storing the same data in more than one table) and

ensuring dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce

the amount of space a database consumes and ensure that data is logically stored.

Techopedia - Normalization is the process of reorganizing data in a database so that it meets two basic requirements: (1)

There is no redundancy of data (all data is stored in only one place), and (2) data dependencies are logical (all related

data items are stored together). Normalization is important for many reasons, but chiefly because it allows databases to

take up as little disk space as possible, resulting in increased performance. Normalization is also known as data

normalization.

The Normal Forms

The database community has developed a series of guidelines for ensuring that databases are normalized. These are

referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal

form or 1NF) through five (fifth normal form or 5NF). In practical applications, you'll often see 1NF, 2NF,


and 3NF, along with the occasional 4NF. Fifth normal form is very rarely seen and won't be discussed in this article.

Before we begin our discussion of the normal forms, it's important to point out that they are guidelines and guidelines

only. Occasionally, it becomes necessary to stray from them to meet practical business requirements. However, when

variations take place, it's extremely important to evaluate any possible ramifications they could have on your system

and account for possible inconsistencies. That said, let's explore the normal forms.

First Normal Form (1NF)

First normal form (1NF) sets the very basic rules for an organized database:

- Eliminate duplicative columns from the same table.
- Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

Second Normal Form (2NF)

Second normal form (2NF) further addresses the concept of removing duplicative data:

- Meet all the requirements of the first normal form.
- Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
- Create relationships between these new tables and their predecessors through the use of foreign keys.

Third Normal Form (3NF)

Third normal form (3NF) goes one large step further:

- Meet all the requirements of the second normal form.
- Remove columns that are not dependent upon the primary key.

Boyce-Codd Normal Form (BCNF or 3.5NF)

The Boyce-Codd Normal Form, also referred to as the "third and a half (3.5) normal form", adds one more requirement:

- Meet all the requirements of the third normal form.
- Every determinant must be a candidate key.

Fourth Normal Form (4NF)

Finally, fourth normal form (4NF) has one additional requirement:

- Meet all the requirements of the third normal form.
- A relation is in 4NF if it has no multi-valued dependencies.

Remember, these normalization guidelines are cumulative. For a database to be in 2NF, it must first fulfill all the criteria of a 1NF database.
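As a sketch of what normalization buys, consider a denormalized orders table that repeats the customer's city on every row. Decomposing it stores each fact exactly once, so one update fixes every order. The table and data names below are invented for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Unnormalized shape: ('ord1', 'Ravi', 'Mumbai'), ('ord2', 'Ravi', 'Mumbai').
# The city is repeated on every order row, inviting inconsistent updates.

# Normalized shape: city depends only on the customer, so it moves to a
# customer table and is stored exactly once.
con.executescript("""
    CREATE TABLE customer (name TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE orders   (order_id TEXT PRIMARY KEY,
                           customer TEXT REFERENCES customer(name));
""")
con.execute("INSERT INTO customer VALUES ('Ravi', 'Mumbai')")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [("ord1", "Ravi"), ("ord2", "Ravi")])

# One UPDATE now fixes the city for every order at once.
con.execute("UPDATE customer SET city = 'Pune' WHERE name = 'Ravi'")
rows = con.execute("""
    SELECT orders.order_id, customer.city
    FROM orders JOIN customer ON orders.customer = customer.name
    ORDER BY orders.order_id
""").fetchall()
print(rows)  # [('ord1', 'Pune'), ('ord2', 'Pune')]
```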

Data Models

E-R Model

The E-R model is a graphical representation of entities and their relationships to each other, typically used in computing in

regard to the organization of data within databases or information systems. An entity is a piece of data: an object or

concept about which data is stored.

A relationship is how the data is shared between entities. There are three types of relationships between entities:

1. One-to-One

One instance of an entity (A) is associated with one other instance of another entity (B). For example, in a database of

employees, each employee name (A) is associated with only one social security number (B).

2. One-to-Many

One instance of an entity (A) is associated with zero, one or many instances of another entity (B), but for one instance of

entity B there is only one instance of entity A. For example, for a company with all employees working in one building,

the building name (A) is associated with many different employees (B), but those employees all share the same singular

association with entity A.

3. Many-to-Many

One instance of an entity (A) is associated with one, zero or many instances of another entity (B), and one instance of

entity B is associated with one, zero or many instances of entity A. For example, for a company in which all of its


employees work on multiple projects, each instance of an employee (A) is associated with many instances of a project

(B), and at the same time, each instance of a project (B) has multiple employees (A) associated with it.
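These relationship types map directly onto table designs: a foreign key models one-to-many, and a junction table models many-to-many. A sketch in sqlite3 (the schema is invented for the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- One-to-many: each employee belongs to one building,
    -- while a building houses many employees.
    CREATE TABLE building (bname TEXT PRIMARY KEY);
    CREATE TABLE employee (ename TEXT PRIMARY KEY,
                           bname TEXT REFERENCES building(bname));

    -- Many-to-many: a junction table pairs employees with projects.
    CREATE TABLE project (pname TEXT PRIMARY KEY);
    CREATE TABLE works_on (ename TEXT REFERENCES employee(ename),
                           pname TEXT REFERENCES project(pname),
                           PRIMARY KEY (ename, pname));

    INSERT INTO building VALUES ('HQ');
    INSERT INTO employee VALUES ('Omar', 'HQ'), ('Zara', 'HQ');
    INSERT INTO project  VALUES ('Alpha'), ('Beta');
    INSERT INTO works_on VALUES ('Omar','Alpha'), ('Omar','Beta'), ('Zara','Alpha');
""")

# Each employee can appear on many projects and vice versa.
rows = con.execute(
    "SELECT ename, pname FROM works_on ORDER BY ename, pname").fetchall()
print(rows)  # [('Omar', 'Alpha'), ('Omar', 'Beta'), ('Zara', 'Alpha')]
```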

Relational Model

The relational model for database management is a database model based on first-order predicate logic, first formulated

and proposed in 1969 by Edgar F. Codd. In the relational model of a database, all data is represented in terms of tuples,

grouped into relations. A database organized in terms of the relational model is a relational database.

The purpose of the relational model is to provide a declarative method for specifying data and queries: users directly

state what information the database contains and what information they want from it, and let the database

management system software take care of describing data structures for storing the data and retrieval procedures for

answering queries.

Most relational databases use the SQL data definition and query language; these systems implement what can be

regarded as an engineering approximation to the relational model. A table in an SQL database schema corresponds to a

predicate variable; the contents of a table to a relation; key constraints, other constraints, and SQL queries correspond

to predicates. However, SQL databases deviate from the relational model in many details, and Codd fiercely argued

against deviations that compromise the original principles.

Diagram of an example database according to the Relational model.


In the relational model, related records are linked together with a "key".

Network Model

The network model replaces the hierarchical tree with a graph, thus allowing more general connections among the

nodes. The main difference between the network model and the hierarchical model is its ability to handle many-to-many

(N:N) relations. In other words, it allows a record to have more than one parent. Suppose an employee works for two

departments. The strict hierarchical arrangement is not possible here, and the tree becomes a more generalized graph:

a network. The network model evolved specifically to handle non-hierarchical relationships. Data

can belong to more than one parent. Note that there are lateral connections as well as top -down connections. A

network structure thus allows 1:1 (one: one), l: M (one: many), M: M (many: many) relationships among entities. In

network database terminology, a relationship is a set. Each set is made up of at least two types of records: an owner

record (equivalent to parent in the hierarchical model) and a me mber record (similar to the child record in the

hierarchical model). The database of Customer-Loan, which we discussed earlier for hierarchical model, is now

represented for Network model as shown. It can easily depict that now the information about the jo int loan L1 appears

single time, but in case of hierarchical model it appears for two times. Thus, it reduces the redundancy and is better as

compared to hierarchical model.
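The owner/member "set" construct, and the way it stores the joint loan L1 only once, can be sketched as follows. The record names (C1, C2, L1, L2) follow the Customer-Loan example above; the data structure itself is an illustrative assumption.

```python
# Sketch of network-model "sets": one owner record, many member records,
# and a member record may appear in sets of several owners.
from collections import defaultdict

db_sets = defaultdict(list)          # owner record -> member records

def connect(owner, member):
    """Add `member` to the set owned by `owner`."""
    db_sets[owner].append(member)

# Customers C1 and C2 jointly hold loan L1; L1 is stored only once.
connect("C1", "L1")
connect("C2", "L1")
connect("C1", "L2")

# Unlike a strict hierarchy, L1 now has two "parents":
parents_of_L1 = sorted(o for o, members in db_sets.items() if "L1" in members)
```

A strict tree could not represent this: it would have to copy L1 under each customer, which is exactly the redundancy the network model removes.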

Hierarchical Model
The hierarchical data model is a way of organizing a database with multiple one-to-many relationships. The structure is based on the rule that one parent can have many children, but each child is allowed only one parent. This structure allows information to be repeated through the parent-child relations. The model was created by IBM and was implemented mainly in their Information Management System (IMS), a precursor to modern DBMSs.

A hierarchical database model is a data model in which the data is organized into a tree-like structure. The data is stored

as records which are connected to one another through links. A record is a collection of fields, with each field containing

only one value. The entity type of a record defines which fields the record contains.

A record in the hierarchical database model corresponds to a row (or tuple) in the relational database model and an

entity type corresponds to a table (or relation). The hierarchical database model mandates that each child record has

only one parent, whereas each parent record can have one or more child records. In order to retrieve data from a


hierarchical database, the whole tree needs to be traversed starting from the root node. This model is recognized as the first database model; it was created by IBM in the 1960s.
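The root-first traversal that hierarchical retrieval requires can be sketched with a small tree of records. The record names below are hypothetical.

```python
# A toy hierarchical database: every record has exactly one parent,
# and retrieval walks the tree starting from the root.
tree = {
    "root":   ["dept_A", "dept_B"],
    "dept_A": ["emp_1", "emp_2"],
    "dept_B": ["emp_3"],
    "emp_1": [], "emp_2": [], "emp_3": [],
}

def traverse(node, out=None):
    """Depth-first walk from the given node, visiting parents before children."""
    if out is None:
        out = []
    out.append(node)
    for child in tree[node]:
        traverse(child, out)
    return out

visited = traverse("root")
```

Because every record is reachable only through its single parent, finding a leaf record always means descending from the root, which is the model's main retrieval cost.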

Distributed database
A distributed database is a database that is under the control of a central database management system (DBMS) in

which storage devices are not all attached to a common CPU. It may be stored in multiple computers located in the

same physical location, or may be dispersed over a network of interconnected computers. Collections of data (e.g. in a

database) can be distributed across multiple physical locations. A distributed database can reside on network servers on

the Internet, on corporate intranets or extranets, or on other company networks. The replication and distribution of

databases improves database performance at end-user worksites.

To ensure that distributed databases are up to date and current, there are two processes: replication and duplication. Replication involves using specialized software that looks for changes in the distributed database. Once the changes have been identified, the replication process makes all the databases look the same. The replication process can be very complex and time consuming, depending on the size and number of the distributed databases, and can also require a lot of time and computer resources. Duplication, on the other hand, is not as complicated. It basically identifies one database as a master and then duplicates that database. The duplication process is normally done at a set time, after hours, to ensure that each distributed location has the same data. In the duplication process, changes are allowed to the master database only, which ensures that local data will not be overwritten. Both processes can keep the data current in all distributed locations.
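The duplication process described above can be sketched as a wholesale copy from a designated master to each site. The site names and loan data are illustrative assumptions.

```python
# Sketch of "duplication": one master database, replicas overwritten
# wholesale at a scheduled time; writes are accepted only at the master.
master = {"L1": 50000, "L2": 75000}
replicas = {"site_delhi": {}, "site_pune": {}}

def duplicate(master_db, replica_dbs):
    """Overwrite every replica with a full, independent copy of the master."""
    for name in replica_dbs:
        replica_dbs[name] = dict(master_db)   # local edits are discarded
    return replica_dbs

replicas = duplicate(master, replicas)
```

Replication, by contrast, would detect and propagate individual changes rather than recopying everything, which is why it is the more complex of the two processes.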

Besides distributed database replication and fragmentation, there are many other distributed database design technologies: for example, local autonomy, and synchronous and asynchronous distributed database technologies. The implementation of these technologies depends on the needs of the business and the sensitivity/confidentiality of the data to be stored in the database, and hence on the price the business is willing to pay to ensure data security, consistency and integrity.

Object oriented database
An object database (also object-oriented database management system) is a database management system in which

information is represented in the form of objects as used in object-oriented programming. Object databases are

different from relational databases which are table-oriented. Object-relational databases are a hybrid of both

approaches. Object databases have been considered since the early 1980s.

Object-oriented database management systems (OODBMSs) combine database capabilities with object-oriented

programming language capabilities. OODBMSs allow object-oriented programmers to develop products, store them as objects, and replicate or modify existing objects to make new objects within the OODBMS. Because the database is


integrated with the programming language, the programmer can maintain consistency within one environment, in that

both the OODBMS and the programming language will use the same model of representation. Relational DBMS projects,

by way of contrast, maintain a clearer division between the database model and the application.

As the usage of web-based technology increases with the implementation of Intranets and extranets, companies have a

vested interest in OODBMSs to display their complex data. Using a DBMS that has been specifically designed to store

data as objects gives an advantage to those companies that are geared towards multimedia presentation or

organizations that utilize computer-aided design (CAD). Some object-oriented databases are designed to work well

with object-oriented programming languages such as Delphi, Ruby, Python, Perl, Java, C#, Visual Basic .NET, C++, Objective-C and Smalltalk; others have their own programming languages. OODBMSs use exactly the same

model as object-oriented programming languages.

Spatial database

A spatial database is a database that is optimized to store and query data that represents objects defined in a geometric

space. Most spatial databases allow representing simple geometric objects such as points, lines and polygons. Some spatial databases handle more complex structures such as 3D objects, topological coverages, linear networks, and TINs.

While typical databases are designed to manage various numeric and character types of data, additional functionality

needs to be added for databases to process spatial data types efficiently; these extra types are typically called geometry or feature types. The Open Geospatial Consortium created the Simple Features specification, which sets standards for adding spatial

functionality to database systems.
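A typical spatial operation is a range (bounding-box) query over stored points, the kind of search a spatial database accelerates with index structures such as R-trees. The point names and coordinates below are invented for illustration; this is a naive linear scan, not a production spatial index.

```python
# Toy spatial query: points as (x, y) pairs, searched with an
# axis-aligned bounding box.
points = {"hotel": (2.0, 3.0), "station": (8.0, 1.0), "park": (4.5, 4.5)}

def range_query(pts, xmin, ymin, xmax, ymax):
    """Return names of points that fall inside the bounding box."""
    return sorted(name for name, (x, y) in pts.items()
                  if xmin <= x <= xmax and ymin <= y <= ymax)

inside = range_query(points, 0, 0, 5, 5)
```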

Multimedia database
A multimedia database (MMDB) is a collection of related multimedia data. The multimedia data include one or more primary media data types such as text, images, graphic objects (including drawings, sketches and illustrations), animation sequences, audio and video.

A Multimedia Database Management System (MMDBMS) is a framework that manages different types of data potentially represented in a wide diversity of formats on a wide array of media sources. It provides support for multimedia data types, and facilitates the creation, storage, access, querying and control of a multimedia database.

Crash Recovery System
Though we live in a highly technologically advanced era, in which hundreds of satellites monitor the earth and billions of people are connected through information technology every second, failures are expected but not always acceptable.

A DBMS is a highly complex system with hundreds of transactions being executed every second. The availability of a DBMS depends on its complex architecture and on the underlying hardware and system software. If it fails or crashes while transactions are being executed, the system is expected to follow some sort of algorithm or technique to recover from the crash or failure.

Failure Classification

To see where a problem has occurred, we generalize failures into the following categories:

TRANSACTION FAILURE

When a transaction fails to execute, or reaches a point after which it cannot be completed successfully, it has to abort. This is called transaction failure, and it affects only a few transactions or processes.

Reasons for transaction failure include:

Logical errors: where a transaction cannot complete because it has a code error or some internal error condition.

System errors: where the database system itself terminates an active transaction because the DBMS is not able to execute it, or has to stop because of some system condition. For example, in the case of deadlock or resource unavailability, the system aborts an active transaction.


SYSTEM CRASH

There are problems external to the system that may cause the system to stop abruptly and crash; for example, an interruption in the power supply, or failure of the underlying hardware or software. Examples may include operating system errors.

DISK FAILURE:

In the early days of technology evolution, it was a common problem for hard disk drives or storage drives to fail frequently. Disk failures include the formation of bad sectors, unreachability of the disk, a disk head crash, or any other failure that destroys all or part of the disk storage.

Storage Structure

We have already described the storage system. In brief, the storage structure can be divided into the following categories:

Volatile storage: As the name suggests, this storage does not survive system crashes. It is mostly placed very close to the CPU, often embedded on the chipset itself; examples are main memory and cache memory. It is fast but can store only a small amount of information.

Nonvolatile storage: These memories are made to survive system crashes. They are huge in data storage capacity but slower to access. Examples include hard disks, magnetic tapes, flash memory, and non-volatile (battery backed-up) RAM.

Recovery and Atomicity

When a system crashes, it may have several transactions being executed and various files opened for them to modify data items. As we know, transactions are made of various operations, which are atomic in nature. But according to the ACID properties of a DBMS, the atomicity of a transaction as a whole must be maintained; that is, either all operations are executed or none.

When a DBMS recovers from a crash, it should ensure the following:

It should check the states of all transactions that were being executed.

A transaction may be in the middle of some operation; the DBMS must ensure the atomicity of the transaction in this case.

It should check whether the transaction can be completed now or needs to be rolled back.

No transaction should be allowed to leave the DBMS in an inconsistent state.

There are two types of techniques that can help a DBMS recover while maintaining the atomicity of transactions: maintaining a log of each transaction and writing it to stable storage before actually modifying the database; and maintaining shadow paging, where the changes are made in volatile memory and the actual database is updated later.

Log-Based Recovery

A log is a sequence of records that maintains a record of the actions performed by a transaction. It is important that the logs are written prior to the actual modification and stored on stable storage media, which is failsafe.

Log based recovery works as follows:

The log file is kept on stable storage media

When a transaction enters the system and starts execution, it writes a log record about it:

<Tn, Start>

When the transaction modifies an item X, it writes a log record as follows:

<Tn, X, V1, V2>

This reads: Tn has changed the value of X from V1 to V2.

When the transaction finishes, it logs:

<Tn, commit>
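The record formats above can be sketched as a minimal write-ahead log: the log entry goes to (what would be) stable storage *before* the database value is changed. The transaction and item names are illustrative.

```python
# Minimal write-ahead-logging sketch using the record formats above.
log = []          # stands in for the log file on stable storage
db = {"X": 100}   # the database proper

def wal_write(txn, item, new_value):
    """Append <Tn, X, V1, V2> to the log first, then modify the data."""
    old = db[item]
    log.append((txn, item, old, new_value))   # log record written first...
    db[item] = new_value                      # ...only then the modification

log.append(("T1", "Start"))    # <Tn, Start>
wal_write("T1", "X", 150)      # <Tn, X, V1, V2>
log.append(("T1", "Commit"))   # <Tn, commit>
```

Because the old value V1 is captured in the log before the write, the change can be undone after a crash; because V2 is also there, it can be redone.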

The database can be modified using two approaches:

Deferred database modification: All logs are written to stable storage, and the database is updated when the transaction commits.


Immediate database modification: Each log record is followed by the actual database modification; that is, the database is modified immediately after every operation.

Recovery with concurrent transactions

When more than one transaction is executed in parallel, the logs are interleaved. At recovery time, it would become hard for the recovery system to backtrack through all the logs and then start recovering. To ease this situation, most modern DBMSs use the concept of 'checkpoints'.

CHECKPOINT

Keeping and maintaining logs in real time and in a real environment may fill all the memory space available in the system. As time passes, the log file may become too big to be handled at all. A checkpoint is a mechanism by which all the previous logs are removed from the system and stored permanently on a storage disk. A checkpoint declares a point before which the DBMS was in a consistent state and all transactions were committed.

RECOVERY

When a system with concurrent transactions crashes and recovers, it behaves in the following manner:

[Image: Recovery with concurrent transactions]

The recovery system reads the logs backwards from the end to the last Checkpoint.

It maintains two lists, undo-list and redo-list.

If the recovery system sees a log with <Tn, Start> and <Tn, Commit> or just <Tn, Commit>, it puts the transaction in the redo-list.

If the recovery system sees a log with <Tn, Start> but no commit or abort log is found, it puts the transaction in the undo-list.

All transactions in the undo-list are then undone and their logs are removed. For all transactions in the redo-list, their previous logs are removed, the transactions are redone, and the logs are saved again.
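The classification rule above can be sketched as follows. For simplicity this sketch only builds the redo-list and undo-list from the log records after the last checkpoint; the backwards scan and the actual undo/redo actions are omitted, and the transaction names are illustrative.

```python
# Classify logged transactions into redo-list and undo-list, per the
# rule: Start + Commit seen -> redo; Start without Commit -> undo.
def classify(log):
    started, committed = set(), set()
    for txn, kind in log:
        if kind == "Start":
            started.add(txn)
        elif kind == "Commit":
            committed.add(txn)
    redo = sorted(started & committed)
    undo = sorted(started - committed)
    return redo, undo

crash_log = [("T1", "Start"), ("T1", "Commit"),
             ("T2", "Start"),                 # crashed before committing
             ("T3", "Start"), ("T3", "Commit")]
redo_list, undo_list = classify(crash_log)
```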

Database security / Authorization
Database security concerns the use of a broad range of information security controls to

protect databases (potentially including the data, the database applications or stored functions, the database systems,

the database servers and the associated network links) against compromises of their confidentiality, integrity and

availability. It involves various types or categories of controls, such as technical, procedural/administrative and

physical. Database security is a specialist topic within the broader realms of computer security, information

security and risk management. Security risks to database systems include, for example:

Unauthorized or unintended activity or misuse by authorized database users, database administrators, or

network/systems managers, or by unauthorized users or hackers (e.g. inappropriate access to sensitive data, metadata

or functions within databases, or inappropriate changes to the database programs, structures or security

configurations);

Malware infections causing incidents such as unauthorized access, leakage or disclosure of personal or proprietary data,

deletion of or damage to the data or programs, interruption or denial of authorized access to the database, attacks on

other systems and the unanticipated failure of database services;


Overloads, performance constraints and capacity issues resulting in the inability of authorized users to use databases as

intended;

Physical damage to database servers caused by computer room fires or floods, overheating, lightning, accidental liquid

spills, static discharge, electronic breakdowns/equipment failures and obsolescence;

Design flaws and programming bugs in databases and the associated programs and systems, creating various security

vulnerabilities (e.g. unauthorized privilege escalation), data loss/corruption, performance degradation etc.;

Data corruption and/or loss caused by the entry of invalid data or commands, mistakes in database or system

administration processes, sabotage/criminal damage etc.

Many layers and types of information security control are appropriate to databases, including:

Access control

Auditing

Authentication

Encryption

Integrity controls

Backups

Application security

Database Security applying Statistical Method

Traditionally databases have been largely secured against hackers through network security measures such as firewalls,

and network-based intrusion detection systems. While network security controls remain valuable in this regard, securing

the database systems themselves, and the programs/functions and data within them, has arguably become more critical

as networks are increasingly opened to wider access, in particular access from the Internet. Furthermore, system,

program, function and data access controls, along with the associated user identification, authentication and rights

management functions, have always been important to limit and in some cases log the activities of authorized users and

administrators. In other words, these are complementary approaches to database security, working from both the

outside-in and the inside-out as it were.

Many organizations develop their own "baseline" security standards and designs detailing basic security control

measures for their database systems. These may reflect general information security requirements or obligations

imposed by corporate information security policies and applicable laws and regulations (e.g. concerning privacy,

financial management and reporting systems), along with generally accepted good database security practices (such as

appropriate hardening of the underlying systems) and perhaps security recommendations from the relevant database

system and software vendors. The security designs for specific database systems typically specify further security

administration and management functions (such as administration and reporting of user access rights, log management

and analysis, database replication/synchronization and backups) along with various business-driven information security

controls within the database programs and functions (e.g. data entry validation and audit trails). Furthermore, various

security-related activities (manual controls) are normally incorporated into the procedures, guidelines etc. relating to

the design, development, configuration, use, management and maintenance of databases.

Data Warehouse Architecture
Different data warehousing systems have different structures. Some may have an ODS (operational data store), while

some may have multiple data marts. Some may have a small number of data sources, while some may have dozens of

data sources. In view of this, it is far more reasonable to present the different layers of a data warehouse architecture

rather than discussing the specifics of any one system. In general, all data warehouse systems have the following layers:

Data Source Layer

Data Extraction Layer

Staging Area

ETL Layer

Data Storage Layer


Data Logic Layer

Data Presentation Layer

Metadata Layer

System Operations Layer

Data Source Layer

This represents the different data sources that feed data into the data warehouse. The data source can be of any format

-- a plain text file, a relational database, another type of database, an Excel file, etc. can all act as a data source.

Many different types of data can be a data source:

Data from within the company -- such as sales data, HR data, product data, inventory data, marketing data, and systems data.

Third-party data, such as census data, demographics data, or survey data.

All these data sources together form the Data Source Layer.

Data Extraction Layer

Data gets pulled from the data source into the data warehouse system. There is likely to be some minimal data cleansing, but there is unlikely to be any major data transformation.

Staging Area

This is where data sits prior to being scrubbed and transformed into a data warehouse / data mart. Having one common

area makes it easier for subsequent data processing / integration.

ETL Layer

This is where data gains its "intelligence", as logic is applied to transform the data from a transactional nature to an

analytical nature. This layer is also where data cleansing happens. The ETL design phase is often the most time-consuming phase in a data warehousing project, and an ETL tool is often used in this layer.
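The extract-transform-load idea can be sketched in a few lines: raw transactional rows are cleansed and aggregated into an analytical shape before loading. The field names (region, amount) and the aggregation are illustrative assumptions, not from the text.

```python
# Minimal ETL sketch: extract raw rows, transform (cleanse + aggregate),
# load into a "warehouse" summary table.
raw_sales = [                           # extract: rows from a source system
    {"region": " north ", "amount": "120.50"},
    {"region": "NORTH",   "amount": "79.50"},
    {"region": "south",   "amount": "200.00"},
]

def etl(rows):
    warehouse = {}
    for row in rows:
        region = row["region"].strip().lower()   # cleanse inconsistent values
        amount = float(row["amount"])            # convert text to a number
        warehouse[region] = warehouse.get(region, 0.0) + amount  # aggregate
    return warehouse                             # load: the analytical table

facts = etl(raw_sales)
```

Note how " north " and "NORTH" are reconciled into one value: this small cleansing step is where the data "gains its intelligence" relative to the raw feed.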

Data Storage Layer

This is where the transformed and cleansed data sit. Based on scope and functionality, 3 types of entities can be found

here: data warehouse, data mart, and operational data store (ODS). In any given system, you may have just one of the

three, two of the three, or all three types.

Data Logic Layer

This is where business rules are stored. Business rules stored here do not affect the underlying data transformation

rules, but do affect what the report looks like.

Data Presentation Layer

This refers to the information that reaches the users. This can be in the form of a tabular/graphical report in a browser, an emailed report that gets automatically generated and sent every day, or an alert that warns users of exceptions, among others. Usually an OLAP tool and/or a reporting tool is used in this layer.

Metadata Layer

This is where information about the data stored in the data warehouse system is stored. A logical data model would be an example of something that's in the metadata layer. A metadata tool is often used to manage metadata.

System Operations Layer

This layer includes information on how the data warehouse system operates, such as ETL job status, system

performance, and user access history.

Evolution of data warehousing
In the 1990s, as organizations of scale began to need more timely data about their business, they found that traditional information systems technology was simply too cumbersome to provide relevant data efficiently and quickly.

Completing reporting requests could take days or weeks using antiquated reporting tools that were designed more or


less to 'execute' the business rather than 'run' the business.

From this idea, the data warehouse was born as a place where relevant data could be held for completing strategic reports for management. The key here is the word 'strategic', as most executives were less concerned with the day to day operations than they were with a more overall look at the model and business functions.

As with all technology, over the course of the latter half of the 20th century, we saw increased numbers and types of

databases. Many large businesses found themselves with data scattered across multiple platforms and variations of

technology, making it almost impossible for any one individual to use data from multiple sources. A key idea within data

warehousing is to take data from multiple platforms/technologies (as varied as spreadsheets, DB2 databases, IDMS

records, and VSAM files) and place them in a common location that uses a common querying tool. In this way

operational databases could be held on whatever system was most efficient for the operational business, while the

reporting / strategic information could be held in a common location using a common language. Data warehouses take this even a step further by giving the data itself commonality, defining what each term means and keeping it standard.

(An example of this would be gender which can be referred to in many ways, but should be standardized on a data

warehouse with one common way of referring to each sex).
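The gender-standardization example above amounts to a small conforming function: many source-system encodings are mapped to one warehouse-standard code. The mapping values below are illustrative assumptions.

```python
# Conform many source encodings of gender to one warehouse standard.
GENDER_MAP = {"m": "M", "male": "M", "1": "M",
              "f": "F", "female": "F", "0": "F"}

def conform_gender(value):
    """Map a source-system gender value to the warehouse-standard code."""
    return GENDER_MAP.get(str(value).strip().lower(), "U")  # U = unknown

codes = [conform_gender(v) for v in ["Male", "F", 1, "x"]]
```

Applying such functions during loading is what lets queries across formerly incompatible sources agree on what each term means.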

All of this was designed to make decision support more readily available and without affecting day to day operations.

One aspect of a data warehouse that should be stressed is that it is NOT a location for ALL of a business's data, but

rather a location for data that is 'interesting'. Data that is interesting will assist decision makers in making strategic

decisions relative to the organization's overall mission.

Benefits of Data Warehousing
The successful implementation of a data warehouse can bring major benefits to an organization, including:

• Potential high returns on investment - Implementation of data warehousing by an organization requires a huge investment, typically from Rs 10 lac to 50 lacs. However, a study by the International Data Corporation (IDC) in 1996 reported that average three-year returns on investment (ROI) in data warehousing reached 401%.

• Competitive advantage - The huge returns on investment for those companies that have successfully implemented a

data warehouse is evidence of the enormous competitive advantage that accompanies this technology. The competitive

advantage is gained by allowing decision-makers access to data that can reveal previously unavailable, unknown, and

untapped information on, for example, customers, trends, and demands.

• Increased productivity of corporate decision-makers - Data warehousing improves the productivity of corporate

decision-makers by creating an integrated database of consistent, subject-oriented, historical data. It integrates data

from multiple incompatible systems into a form that provides one consistent view of the organization. By transforming

data into meaningful information, a data warehouse allows business managers to perform more substantive, accurate,

and consistent analysis.

• More cost-effective decision-making - Data warehousing helps to reduce the overall cost of the product by reducing the number of channels.

• Better enterprise intelligence - It helps to provide better enterprise intelligence.

• Enhanced customer service - A data warehouse can be used to enhance customer service.

Problems of Data Warehousing
The problems associated with developing and managing a data warehouse are as follows:

Underestimation of resources for data loading - Sometimes we underestimate the time required to extract, clean, and load the data into the warehouse. This may take a significant proportion of the total development time, although some tools exist that reduce the time and effort spent on this process.

Hidden problems with source systems - Sometimes hidden problems associated with the source systems feeding the data warehouse are identified only after years of going undetected. For example, when entering the details of a new


property, certain fields may allow nulls, which may result in staff entering incomplete property data, even when the data is available and applicable.

Required data not captured - In some cases data that is very important for the data warehouse is simply not captured by the source systems. For example, the date of registration for a property may not be used in the source system, yet it may be very important for analysis purposes.

Increased end-user demands - After some end-user queries have been satisfied, requests for support from staff may increase rather than decrease. This is caused by users' increasing awareness of the capabilities and value of the data warehouse. Another reason for increasing demands is that once a data warehouse is online, the number of users and queries often increases, together with requests for answers to more and more complex queries.

Data homogenization - The concept of the data warehouse deals with the similarity of data formats between different data sources; this can result in the loss of some important value in the data.

High demand for resources - The data warehouse holds large amounts of data, and therefore demands substantial storage and processing resources.

Data ownership - Data warehousing may change the attitude of end-users toward the ownership of data. Sensitive data owned by one department has to be loaded into the data warehouse for decision-making purposes, but this sometimes results in reluctance from that department, which may hesitate to share its data with others.

High maintenance - Data warehouses are high-maintenance systems. Any reorganization of the business processes and the source systems may affect the data warehouse, resulting in high maintenance costs.

Long-duration projects - The building of a warehouse can take up to three years, which is why some organizations are reluctant to invest in a data warehouse. Sometimes only the historical data of a particular department is captured in the data warehouse, resulting in data marts. Data marts support only the requirements of a particular department and limit the functionality to that department or area only.

Complexity of integration - The most important area for the management of a data warehouse is its integration capabilities. An organization must spend a significant amount of time determining how well the various data warehousing tools can be integrated into the overall solution that is needed. This can be a very difficult task, as there are a number of tools for every operation of the data warehouse.

Data mining
A process used by companies to turn raw data into useful information. By using software to look for patterns in large

batches of data, businesses can learn more about their customers and develop more effective marketing strategies as

well as increase sales and decrease costs. Data mining depends on effective data collection and warehousing as well as

computer processing. Grocery stores are well-known users of data mining techniques. Many supermarkets offer free

loyalty cards to customers that give them access to reduced prices not available to non-members. The cards make it

easy for stores to track who is buying what, when they are buying it, and at what price. The stores can then use this

data, after analyzing it, for multiple purposes, such as offering customers coupons that are targeted to their buying

habits and deciding when to put items on sale and when to sell them at full price.
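The loyalty-card scenario above is a simple form of market-basket analysis: counting which pairs of items are most often bought together. The baskets below are invented for illustration.

```python
# Count co-purchased item pairs across shopping baskets.
from collections import Counter
from itertools import combinations

baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk"},
]

pair_counts = Counter()
for basket in baskets:
    # sorted() gives a canonical order so (bread, milk) == (milk, bread)
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

top_pair, top_count = pair_counts.most_common(1)[0]
```

A store would act on the high-count pairs, for example by targeting coupons or placing the items together, which is exactly the pattern-to-decision loop the paragraph describes.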

Data mining can be a cause for concern when only selected information, which is not representative of the overall

sample group, is used to prove a certain hypothesis.

Data mining process
The Cross-Industry Standard Process for Data Mining (CRISP-DM) consists of six phases, intended as a cyclical process, as shown in the following figure:


Cross-Industry Standard Process for Data Mining (CRISP-DM)

Business understanding

In the business understanding phase: First, it is required to understand the business objectives clearly and find out what the business's needs are.

Next, we have to assess the current situation by finding out about the resources, assumptions, constraints and other

important factors which should be considered. Then, from the business objectives and current situations, we need to

create data mining goals to achieve the business objectives within the current situation. Finally, a good data mining plan

has to be established to achieve both business and data mining goals. The plan should be as detailed as possible.

Data understanding

First, the data understanding phase starts with initial data collection from available data sources, to help us get familiar with the data. Some important activities must be performed, including data loading and data integration, to make the data collection successful. Next, the “gross” or “surface” properties of the acquired data need to be examined carefully and reported. Then, the data needs to be explored by tackling the data mining questions, which can be addressed using querying, reporting and visualization. Finally, the data quality must be examined by answering some important questions, such as “Is the acquired data complete?” and “Are there any missing values in the acquired data?”

Data preparation

The data preparation phase typically consumes about 90% of the time of the project. Its outcome is the final data set. Once available data sources are identified, they need to be selected, cleaned, constructed and formatted into the desired form. Data exploration at a greater depth may be carried out during this phase to spot patterns based on the business understanding.

Modeling

First, modeling techniques have to be selected to be used for the prepared dataset.

Next, the test scenario must be generated to validate the quality and validity of the model.

Then, one or more models are created by running the modeling tool on the prepared dataset.

Finally, models need to be assessed carefully, involving stakeholders, to make sure that the created models meet the business initiatives.

Evaluation


In the evaluation phase, the model results must be evaluated in the context of the business objectives established in the first phase. In this phase, new business requirements may be raised due to the new patterns that have been discovered in the model results or to other factors. Gaining business understanding is an iterative process in data mining. The go or no-go decision to move to the deployment phase must be made in this step.

Deployment

The knowledge or information gained through the data mining process needs to be presented in such a way that stakeholders can use it when they want it. Based on the business requirements, the deployment phase could be as simple as creating a report or as complex as a repeatable data mining process across the organization. In the deployment phase, plans for deployment, maintenance and monitoring have to be created for implementation and future support. From the project point of view, the final report needs to summarize the project experience and review the project to identify what needs to be improved and to capture the lessons learned.

Data mining techniques

Association

Association (or relation) is probably the best known, most familiar and most straightforward data mining technique. Here, you make a simple correlation between two or more items, often of the same type, to identify patterns. For example, when tracking people's buying habits, you might identify that a customer always buys cream when they buy strawberries, and therefore suggest that the next time they buy strawberries they might also want to buy cream.
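The co-occurrence idea behind association can be sketched in a few lines of Python. This is an illustrative sketch, not a production association-rule miner (real systems use algorithms such as Apriori), and the transaction data is invented to match the strawberries-and-cream example above.

```python
from itertools import combinations
from collections import Counter

def pair_counts(transactions):
    """Count how often each pair of items is bought together."""
    counts = Counter()
    for basket in transactions:
        # sorted() gives a canonical order, so (a, b) and (b, a) count as one pair
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

transactions = [
    ["strawberries", "cream", "bread"],
    ["strawberries", "cream"],
    ["bread", "milk"],
    ["strawberries", "cream", "milk"],
]
counts = pair_counts(transactions)
# ("cream", "strawberries") appears in 3 of the 4 baskets, so the store
# might suggest cream to anyone buying strawberries.
print(counts[("cream", "strawberries")])  # 3
```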

Classification

You can use classification to build up an idea of the type of customer, item, or object by describing multiple attributes that identify a particular class. For example, you can easily classify cars into different types (sedan, 4x4, convertible) by identifying different attributes (number of seats, car shape, driven wheels). Given a new car, you might assign it to a particular class by comparing its attributes with the known definitions. You can apply the same principles to customers, for example by classifying them by age and social group.
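The car example can be sketched as a rule-based classifier in Python. The attributes and class rules here are invented for illustration; a real data mining system would learn the class definitions from labeled examples rather than hard-coding them.

```python
def classify_car(seats, roof_retracts, driven_wheels):
    """Toy classifier: assign a car to a class from a few attributes.
    The rules are illustrative, not a real taxonomy."""
    if roof_retracts:
        return "convertible"
    if driven_wheels == 4:
        return "4x4"
    if seats >= 4:
        return "sedan"
    return "other"

# A new car is classified by comparing its attributes with the definitions.
print(classify_car(seats=5, roof_retracts=False, driven_wheels=2))  # sedan
print(classify_car(seats=2, roof_retracts=True, driven_wheels=2))   # convertible
```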

Clustering

By examining one or more attributes or classes, you can group individual pieces of data together to form a structured view. At a simple level, clustering uses one or more attributes as the basis for identifying a cluster of correlating results. Clustering is useful for identifying different information because it correlates with other examples, so you can see where the similarities and ranges agree. Clustering can work both ways: you can assume that there is a cluster at a certain point and then use your identification criteria to see whether you are correct. For example, a sample of sales data might compare the age of the customer to the size of the sale. It is not unreasonable to expect that people in their twenties (before marriage and kids), fifties, and sixties (when the children have left home) have more disposable income.
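Clustering on a single attribute can be sketched with a minimal k-means loop in Python. The customer ages and starting centers below are invented to match the disposable-income example; real projects would use a library implementation and more than one attribute.

```python
def kmeans_1d(values, centers, iterations=20):
    """Minimal 1-D k-means: assign each value to its nearest center,
    then move each center to the mean of its assigned values."""
    for _ in range(iterations):
        groups = {c: [] for c in centers}
        for v in values:
            nearest = min(centers, key=lambda c: abs(c - v))
            groups[nearest].append(v)
        centers = [sum(g) / len(g) for g in groups.values() if g]
    return sorted(centers)

# Invented customer ages: one group in their twenties, one in their fifties/sixties.
ages = [22, 24, 25, 27, 51, 54, 56, 61, 63, 65]
print(kmeans_1d(ages, centers=[20, 60]))
```

The printed centers land near the middle of each age group, confirming the assumed clusters.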

Prediction

Prediction is a wide topic and runs from predicting the failure of components or machinery, to identifying fraud, and even to the prediction of company profits. Used in combination with the other data mining techniques, prediction involves analyzing trends, classification, pattern matching, and relation. By analyzing past events or instances, you can make a prediction about an event. Using credit card authorization as an example, you might combine decision tree analysis of individual past transactions with classification and historical pattern matches to identify whether a transaction is fraudulent. If a match is made between the purchase of flights to the US and subsequent transactions in the US, it is likely that those transactions are valid.

Sequential patterns

Often used over longer-term data, sequential patterns are a useful method for identifying trends, or regular occurrences

of similar events. For example, with customer data you can identify that customers buy a particular collection of

products together at different times of the year. In a shopping basket application, you can use this information to

automatically suggest that certain items be added to a basket based on their frequency and past purchasing history.

Decision trees


Related to most of the other techniques (primarily classification and prediction), the decision tree can be used either as

a part of the selection criteria, or to support the use and selection of specific data within the overall structure. Within

the decision tree, you start with a simple question that has two (or sometimes more) answers. Each answer leads to a

further question to help classify or identify the data so that it can be categorized, or so that a prediction can be made

based on each answer.
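The question-and-answer structure of a decision tree maps naturally onto nested conditionals. The following Python sketch hand-codes a tiny tree for the credit card authorization example used earlier; the questions and thresholds are invented, and in practice the tree would be learned from historical data rather than written by hand.

```python
def credit_decision(amount, country_matches_travel, past_fraud):
    """A hand-written decision tree for a toy card-authorization check.
    The questions and the 5000 threshold are invented for illustration."""
    # Question 1: is the amount unusually large?
    if amount > 5000:
        # Question 2 (large-amount branch): prior fraud on this account?
        if past_fraud:
            return "decline"
        return "review"
    # Question 2 (small-amount branch): does the location fit recent travel?
    if country_matches_travel:
        return "approve"
    return "review"

print(credit_decision(120, country_matches_travel=True, past_fraud=False))   # approve
print(credit_decision(9000, country_matches_travel=True, past_fraud=True))   # decline
```

Each branch point is one "question", and each leaf is a classification or prediction, exactly as in the diagrammed tree.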


Combinations

In practice, it's very rare that you would use one of these techniques exclusively. Classification and clustering are similar techniques, and by using clustering to identify nearest neighbors you can further refine your classifications. Decision trees are often used to help build and identify classifications that can be tracked over a longer period to identify sequences and patterns.

Long-term (memory) processing

Within all of the core methods, there is often reason to record and learn from the information. In some techniques, this is entirely obvious: with sequential patterns and predictive learning, for example, you look back at data from multiple sources and instances of information to build a pattern. In others, the process is less obvious. Decision trees are rarely built once and never revisited. As new information, events, and data points are identified, it might be necessary to build more branches, or even entirely new trees, to cope with the additional information. You can automate some of this process. For example, building a predictive model for identifying credit card fraud is about building probabilities that you can use for the current transaction, and then updating that model with each new (approved) transaction. This information is recorded so that the decision can be made quickly the next time.

Ecommerce & Web Application Security Issues

1. Introduction

E-commerce is defined as the buying and selling of products or services over electronic systems such as the Internet and

to a lesser extent, other computer networks. It is generally regarded as the sales and commercial function of eBusiness.

There has been a massive increase in the level of trade conducted electronically since the widespread penetration of the

Internet. A wide variety of commerce is conducted via eCommerce, including electronic funds transfer, supply chain

management, Internet marketing, online transaction processing, electronic data interchange (EDI), inventory

management systems, and automated data collection systems. US online retail sales reached $175 billion in 2007 and

are projected to grow to $335 billion by 2012 (Mulpuru, 2008).

This massive increase in the uptake of eCommerce has led to a new generation of associated security threats, but any

eCommerce system must meet four integral requirements:

a) privacy – information exchanged must be kept from unauthorized parties


b) integrity – the exchanged information must not be altered or tampered with

c) authentication – both sender and recipient must prove their identities to each other and

d) non-repudiation – proof is required that the exchanged information was indeed received (Holcombe, 2007).

These basic maxims of eCommerce are fundamental to the conduct of secure business online. Further to the

fundamental maxims of eCommerce above, eCommerce providers must also protect against a number of different

external security threats, most notably Denial of Service (DoS). These are where an attempt is made to make a computer

resource unavailable to its intended users through a variety of mechanisms discussed below. The financial services

sector still bears the brunt of e-crime, accounting for 72% of all attacks. But the sector that experienced the greatest

increase in the number of attacks was eCommerce. Attacks in this sector have risen by 15% from 2006 to 2007

(Symantec, 2007).

2. Privacy

Privacy has become a major concern for consumers with the rise of identity theft and impersonation, and any concern

for consumers must be treated as a major concern for eCommerce providers. According to Consumer Reports Money

Adviser (Perrotta, 2008), the US Attorney General has announced multiple indictments relating to a massive

international security breach involving nine major retailers and more than 40 million credit- and debit-card numbers. US

attorneys think that this may be the largest hacking and identity-theft case ever prosecuted by the justice department.

Both EU and US legislation at both the federal and state levels mandates certain organizations to inform customers

about information uses and disclosures. Such disclosures are typically accomplished through privacy policies, both online

and offline (Vail et al., 2008).

In a study by Lauer and Deng (2008), a model is presented linking privacy policy, through trustworthiness, to online

trust, and then to customers’ loyalty and their willingness to provide truthful information. The model was tested using a

sample of 269 responses. The findings suggested that consumers’ trust in a company is closely linked with the

perception of the company’s respect for customer privacy (Lauer and Deng, 2007). Trust in turn is linked to increased

customer loyalty that can be manifested through increased purchases, openness to trying new products, and willingness

to participate in programs that use additional personal information. Privacy now forms an integral part of any e-commerce strategy, and investment in privacy protection has been shown to increase consumers’ spending, trust and loyalty.

The converse of this can be shown to be true when things go wrong. In March 2008, the Irish online jobs board, jobs.ie,

was compromised by criminals and users’ personal data (in the form of CVs) was taken (Ryan, 2008). Looking at the

real-time responses of users to this event on the popular Irish forum, Boards.ie, we can see that privacy is of major

concern to users and in the event of their privacy being compromised users become very agitated and there is an overall

negative effect on trust in e-commerce. User comments in the forum included: “I’m well p*ssed off about them keeping

my CV on the sly”; “I am just angry that this could have happened and to so many people”; “Mine was taken too. How

do I terminate my acc with jobs.ie”; “Grr, so annoyed, feel I should report it to the Gardai now” (Boards.ie, 2008).

3. Integrity, Authentication & Non-Repudiation

In any e-commerce system the factors of data integrity, customer and client authentication, and non-repudiation are

critical to the success of any online business. Data integrity is the assurance that data transmitted is consistent and

correct, that is, it has not been tampered or altered in any way during transmission. Authentication is a means by which

both parties in an online transaction can be confident that they are who they say they are and non-repudiation is the

idea that no party can dispute that an actual event online took place. Proof of data integrity is typically the easiest of

these factors to successfully accomplish. A data hash or checksum, such as MD5 or CRC, is usually sufficient to establish

that the likelihood of data being undetectably changed is extremely low (Schlaeger and Pernul, 2005). Notwithstanding

these security measures, it is still possible to compromise data in transit through techniques such as phishing or man-in-

the-middle attacks (Desmedt, 2005). These flaws have led to the need for the development of strong verification and

security measurements such as digital signatures and public key infrastructures (PKI).
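The digest-comparison idea behind data integrity checks can be demonstrated with Python's standard hashlib module. This is only a sketch, and note that while MD5 (cited above) is sufficient to detect accidental corruption, it is no longer considered secure against deliberate tampering; the message contents are invented for illustration.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the data as a hex string."""
    return hashlib.md5(data).hexdigest()

original = b"Pay 100.00 EUR to account 1234"
tampered = b"Pay 900.00 EUR to account 1234"

# The sender transmits the data together with its digest.
sent_digest = md5_hex(original)

# The recipient recomputes the digest and compares it with the one received.
print(md5_hex(original) == sent_digest)  # True  - data unchanged in transit
print(md5_hex(tampered) == sent_digest)  # False - alteration is detected
```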

One of the key developments in e-commerce security and one which has led to the widespread growth of e-commerce is

the introduction of digital signatures as a means of verification of data integrity and authentication. In 1995, Utah

became the first jurisdiction in the world to enact an electronic signature law. An electronic signature may be defined as


“any letters, characters, or symbols manifested by electronic or similar means and executed or adopted by a party with

the intent to authenticate a writing” (Blythe, 2006). In order for a digital signature to attain the same legal status as an

ink-on-paper signature, asymmetric key cryptology must have been employed in its production (Blythe, 2006). Such a

system employs double keys; one key is used to encrypt the message by the sender, and a different, albeit

mathematically related, key is used by the recipient to decrypt the message (Antoniou et al., 2008). This is a very good

system for electronic transactions, since two stranger-parties, perhaps living far apart, can confirm each other’s identity

and thereby reduce the likelihood of fraud in the transaction. Non-repudiation techniques prevent the sender of a

message from subsequently denying that they sent the message. Digital Signatures using public-key cryptography and

hash functions are the generally accepted means of providing non-repudiation of communications.

4. Technical Attacks

Technical attacks are one of the most challenging types of security compromise an e-commerce provider must face.

Perpetrators of technical attacks, and in particular Denial-of-Service attacks, typically target sites or services hosted on

high-profile web servers such as banks, credit card payment gateways, large online retailers and popular social

networking sites.

Denial of Service Attacks

Denial of Service (DoS) attacks consist of overwhelming a server, a network or a website in order to paralyze its normal

activity (Lejeune, 2002). Defending against DoS attacks is one of the most challenging security problems on the Internet

today. A major difficulty in thwarting these attacks is to trace the source of the attack, as they often use incorrect or

spoofed IP source addresses to disguise the true origin of the attack (Kim and Kim, 2006).

The United States Computer Emergency Readiness Team defines symptoms of denial-of-service attacks to include

(McDowell, 2007):

• Unusually slow network performance

• Unavailability of a particular web site

• Inability to access any web site

• Dramatic increase in the number of spam emails received

DoS attacks can be executed in a number of different ways including:

ICMP Flood (Smurf Attack) – where perpetrators will send large numbers of IP packets with the source address faked to

appear to be the address of the victim. The network’s bandwidth is quickly used up, preventing legitimate packets from getting through to their destination.

Teardrop Attack – A Teardrop attack involves sending mangled IP fragments with overlapping, over-sized, payloads to

the target machine. A bug in the TCP/IP fragmentation re-assembly code of various operating systems causes the

fragments to be improperly handled, crashing them as a result of this.

Phlashing – Also known as a Permanent denial-of-service (PDoS) is an attack that damages a system so badly that it

requires replacement or reinstallation of hardware. Perpetrators exploit security flaws in the remote management

interfaces of the victim’s hardware, be it routers, printers, or other networking hardware. These flaws leave the door

open for an attacker to remotely ‘update’ the device firmware to a modified, corrupt or defective firmware image,

therefore bricking the device and making it permanently unusable for its original purpose.

Distributed Denial-of-Service Attacks – Distributed Denial of Service (DDoS) attacks are one of the greatest security fears

for IT managers. In a matter of minutes, thousands of vulnerable computers can flood the victim website by choking

legitimate traffic (Tariq et al., 2006). A distributed denial of service attack (DDoS) occurs when multiple compromised

systems flood the bandwidth or resources of a targeted system, usually one or more web servers. The most famous

DDoS attacks occurred in February 2000 where websites including Yahoo, Buy.com, eBay, Amazon and CNN were

attacked and left unreachable for several hours each (Todd, 2000).

Brute Force Attacks – A brute force attack is a method of defeating a cryptographic scheme by trying a large number of

possibilities; for example, a large number of the possible keys in a key space in order to decrypt a message. Brute Force

Attacks, although perceived to be low-tech in nature are not a thing of the past. In May 2007 the internet infrastructure

in Estonia was crippled by multiple sustained brute force attacks against government and commercial institutions in the


country (Sausner, 2008). The attacks followed the relocation of a Soviet World War II memorial in Tallinn in late April, which made news around the world.

5. Non-Technical Attacks

Phishing Attacks

Phishing is the criminally fraudulent process of attempting to acquire sensitive information such as usernames,

passwords and credit card details, by masquerading as a trustworthy entity in an electronic communication. Phishing

scams generally are carried out by emailing the victim with a ‘fraudulent’ email from what purports to be a legitimate

organization requesting sensitive information. When the victim follows the link embedded within the email, they are brought to an elaborate and sophisticated duplicate of the legitimate organization’s website. Phishing attacks generally target bank customers, online auction sites (such as eBay), online retailers (such as Amazon) and service providers (such as PayPal). According to Community Banker (Swann, 2008), in more recent times cybercriminals have become more sophisticated in the timing of their attacks, posing as charities in times of natural disaster.

Social Engineering

Social engineering is the art of manipulating people into performing actions or divulging confidential information. Social

engineering techniques include pretexting (where the fraudster creates an invented scenario to get the victim to divulge information), interactive voice response (IVR) or phone phishing (where the fraudster gets the victim to divulge sensitive information over the phone), and baiting with Trojan horses (where the fraudster ‘baits’ the victim into loading malware onto a system). Social engineering has become a serious threat to e-commerce security since it is difficult to detect and to combat: it involves ‘human’ factors, which cannot be patched like hardware or software, although staff training and education can somewhat thwart such attacks (Hasle et al., 2005).

6. Conclusions

In conclusion the e-commerce industry faces a challenging future in terms of the security risks it must avert. With

increasing technical knowledge, and its widespread availability on the internet, criminals are becoming more and more

sophisticated in the deceptions and attacks they can perform. Novel attack strategies and vulnerabilities only really

become known once a perpetrator has uncovered and exploited them. That said, there are multiple security strategies which any e-commerce provider can put in place to reduce the risk of attack and compromise significantly. Awareness of the risks and the implementation of multi-layered security protocols, detailed and open privacy policies, and strong authentication and encryption measures will go a long way towards reassuring the consumer and ensuring that the risk of compromise is kept to a minimum.

What is MySQL?

MySQL is a database system used on the web
MySQL is a database system that runs on a server
MySQL is ideal for both small and large applications
MySQL is very fast, reliable, and easy to use
MySQL supports standard SQL
MySQL compiles on a number of platforms
MySQL is free to download and use
MySQL is developed, distributed, and supported by Oracle Corporation

The data in MySQL is stored in tables. A table is a collection of related data, and it consists of columns and rows. Databases are useful when storing information categorically. A company may have a database with the following tables:

Employees
Products
Customers
Orders
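The table idea above can be sketched with Python's standard sqlite3 module, used here as a lightweight stand-in for MySQL (the SQL shown is generic); the table and column names are invented for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE Orders (id INTEGER PRIMARY KEY, customer_id INTEGER, product TEXT)"
)
conn.execute("INSERT INTO Customers VALUES (1, 'Alice')")
conn.execute("INSERT INTO Orders VALUES (1, 1, 'Laptop')")

# Rows of related data in separate tables can be combined with a join.
row = conn.execute(
    "SELECT c.name, o.product FROM Customers c "
    "JOIN Orders o ON o.customer_id = c.id"
).fetchone()
print(row)  # ('Alice', 'Laptop')
```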

What is PHP?

PHP is an acronym for "PHP Hypertext Preprocessor"
PHP is a widely-used, open source scripting language


PHP scripts are executed on the server

PHP costs nothing; it is free to download and use

What is a PHP File?

PHP files can contain text, HTML, CSS, JavaScript, and PHP code
PHP code is executed on the server, and the result is returned to the browser as plain HTML
PHP files have the extension ".php"

What Can PHP Do?

PHP can generate dynamic page content
PHP can create, open, read, write, delete, and close files on the server
PHP can collect form data
PHP can send and receive cookies
PHP can add, delete, and modify data in your database
PHP can restrict users from accessing some pages on your website
PHP can encrypt data

With PHP you are not limited to outputting HTML. You can output images, PDF files, and even Flash movies. You can also output any text, such as XHTML and XML.

Connecting to and Disconnecting from the Server

To connect to the server, you will usually need to provide a MySQL user name when you invoke MySQL and, most likely,

a password. If the server runs on a machine other than the one where you log in, you will also need to specify a host

name. Contact your administrator to find out what connection parameters you should use to connect (that is, what host,

user name, and password to use). Once you know the proper parameters, you should be able to connect like this:

shell> mysql -h host -u user -p

Enter password: ********

host and user represent the host name where your MySQL server is running and the user name of your MySQL account.

Substitute appropriate values for your setup. The ******** represents your password; enter it when MySQL displays

the Enter password: prompt.

If that works, you should see some introductory information followed by a mysql> prompt:

shell> mysql -h host -u user -p

Enter password: ********

Welcome to the MySQL monitor. Commands end with ; or \g.

Your MySQL connection id is 25338 to server version: 5.0.96-standard

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql>

The mysql> prompt tells you that mysql is ready for you to enter commands.

If you are logging in on the same machine that MySQL is running on, you can omit the host, and simply use the

following:

shell> mysql -u user -p

If, when you attempt to log in, you get an error message such as ERROR 2002 (HY000): Can't connect to local MySQL

server through socket '/tmp/mysql.sock' (2), it means that the MySQL server daemon (Unix) or service (Windows) is not

running. Consult the administrator that is appropriate to your operating system.

Some MySQL installations permit users to connect as the anonymous (unnamed) user to the server running on the local

host. If this is the case on your machine, you should be able to connect to that server by invoking mysql without any

options:


shell> mysql

After you have connected successfully, you can disconnect any time by typing QUIT (or \q) at the mysql> prompt:

mysql> QUIT

Bye

On Unix, you can also disconnect by pressing Control+D.

Most examples in the following sections assume that you are connected to the server. They indicate this by

the mysql> prompt.

Data type

In computer science and computer programming, a data type (or simply type) is a classification identifying one of various types of data, such as real, integer or Boolean, that determines the possible values for that type; the operations that can be done on values of that type; the meaning of the data; and the way values of that type can be stored.

In MySQL there are three main data types: text, number, and date/time. Refer to the MySQL book (Aplus).

The Java programming language is statically-typed, which means that all variables must first be declared before they can

be used.

All programs involve storing and manipulating data. Fortunately, the computer only knows about a few types of data. These include numbers, true/false values, characters (a, b, c, 1, 2, 3, etc.), lists of data, and complex "structures" of data, which build up new data types by combining the other data types.
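A short Python session illustrates how the type of a value determines which operations are valid and what those operations mean:

```python
# The same operator can mean different things depending on the data type,
# and some operations are simply not defined for a given combination of types.
print(2 + 3)          # numeric addition:   5
print("2" + "3")      # string concatenation: 23
print([1, 2] + [3])   # list concatenation: [1, 2, 3]
print(type(True))     # a true/false value: <class 'bool'>

try:
    "2" + 3           # mixing a string and a number is not defined
except TypeError as e:
    print("TypeError:", e)
```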

Creating & using a database, and getting information about databases and tables – refer to the MySQL book.

Batch mode - To run your SQL batch file from the command line, enter the following:

In Windows:

mysql < c:\commands.sql

Don’t forget to enclose the file path in quotes if there are any spaces.

Running the Batch Job as a Scheduled Task

In Windows

Batch jobs can be even more automated by running them as a scheduled task. In Windows, batch files are

used to execute DOS commands. We can schedule our batch job by placing the command code that we entered earlier in a file, such as “runsql.bat”. This file will contain only one line:

mysql < c:\commands.sql

To schedule the batch job:

1. Open Scheduled Tasks.

Click Start, click All Programs, point to Accessories, point to System Tools, and then click Scheduled Tasks:


2. Double-click Add Scheduled Task to start the Scheduled Task Wizard, and then click Next in the first dialog box.

3. The next dialog box displays a list of programs that are installed on your computer, either as part of

the Windows operating system, or as a result of software installation. Click Browse and select your SQL file, and then click Open.

4. Type a name for the task, and then choose when and how often you want the task to run, from one of the following options:

Daily

Weekly

Monthly

One time only

When my computer starts (before a user logs on)
When I log on (only after the current user logs on)

5. Click Next, specify the information about the day and time to run the task, and then click Next.

6. OPTIONAL: Enter the name and password of the user who is associated with this task. Make sure that

you choose a user with sufficient permissions to run the program. By default, the wizard selects the name of the user who is currently logged on.

Scheduled Tasks in Windows

If at a later time you’d like to suspend this task, you can open it via the Scheduled Tasks dialog (pictured above) and deselect the Enabled checkbox on the “Task” tab:


The “Task” Tab Containing the “Enabled” Checkbox

Similarly, you can remove the task by deleting it like any file. In fact, the task is saved as a .job file in the WINNT\Tasks folder.

MySQL in the Cloud

A cloud database is a database accessible to clients from the cloud and delivered to users on demand via the Internet from a cloud database provider's servers. Also referred to as Database-as-a-Service (DBaaS), cloud databases can use cloud computing to achieve optimized scaling, high availability, multi-tenancy and effective resource allocation.

While a cloud database can be a traditional database, such as a MySQL or SQL Server database that has been adapted for cloud use, a native cloud database, such as Xeround's MySQL Cloud database, tends to be better equipped to optimally use cloud resources and to guarantee scalability as well as availability and stability.

Cloud databases can offer significant advantages over their traditional counterparts, including increased accessibility,

automatic failover and fast automated recovery from failures, automated on-the-go scaling, minimal investment and

maintenance of in-house hardware, and potentially better performance. At the same time, cloud databases have their

share of potential drawbacks, including security and privacy issues as well as the potential loss of or inability to access

critical data in the event of a disaster or bankruptcy of the cloud database service provider.