DISTRIBUTED DATABASES AND CLIENT-SERVER ARCHITECHURES

DISTRIBUTED DATABASESDISTRIBUTED DATABASESANDAND

CLIENT-SERVER ARCHITECHURESCLIENT-SERVER ARCHITECHURES

CONTENTS CONTENTS .. Distributed Database ConceptsDistributed Database Concepts

Parallel Vs Distributed TechnologyParallel Vs Distributed Technology Advantages Advantages Additional Functions Additional Functions

Distribution Database DesignDistribution Database Design Data FragmentationData Fragmentation Data ReplicationData Replication Data AllocationData Allocation ExampleExample

CONTENTS CONTENTS (cont..)(cont..)

Types Of Distributed Database SystemsTypes Of Distributed Database Systems Query Processing in Distributed DatabaseQuery Processing in Distributed Database

Data Transfer CostsData Transfer Costs SemijoinSemijoin Query & Update DecompositionQuery & Update Decomposition

Overview Of Concurrency Control & Recovery in Overview Of Concurrency Control & Recovery in Distributed DatabasesDistributed Databases

Concurrency Control Based on Distributed Copy of a Data Concurrency Control Based on Distributed Copy of a Data ItemItem

Concurrency Control Based on VotingConcurrency Control Based on Voting Distributed Recovery Distributed Recovery

CONTENTS CONTENTS (cont..)(cont..)

Overview Of 3-Tier Client-Server Overview Of 3-Tier Client-Server ArchitectureArchitecture

Interaction between Application Server & Client Interaction between Application Server & Client ServerServer

Distributed Database In ORACLEDistributed Database In ORACLE

DISTRIBUTED DATABASE DISTRIBUTED DATABASE CONCEPTSCONCEPTS

DISTRIBUTED DATABASE DISTRIBUTED DATABASE CONCEPTSCONCEPTS

Distributed Computing SystemDistributed Computing System Consists of a number of processing elements Consists of a number of processing elements

interconnected by a computer network that interconnected by a computer network that cooperate in processing certain taskscooperate in processing certain tasks

Distributed DatabaseDistributed Database Collection of logically interrelated databases over Collection of logically interrelated databases over

a computer networka computer network Distributed DBMSDistributed DBMS

Software system that manages a distributed DBSoftware system that manages a distributed DB

PARALLEL vs. DISTRIBUTED PARALLEL vs. DISTRIBUTED TECHNOLOGYTECHNOLOGY

Parallel system architectures:Parallel system architectures: Shared Memory ArchitectureShared Memory Architecture

Multiple processors that share both secondary Multiple processors that share both secondary disk storage and primary memorydisk storage and primary memory

Tightly coupled architectureTightly coupled architecture Shared everything architectureShared everything architecture

Shared Disk ArchitectureShared Disk Architecture Multiple processors that share secondary disk Multiple processors that share secondary disk

storage but have their own primary memorystorage but have their own primary memory Loosely coupled architectureLoosely coupled architecture

PARALLEL vs. DISTRIBUTED PARALLEL vs. DISTRIBUTED TECHNOLOGY (contd…)TECHNOLOGY (contd…)

Shared Nothing ArchitectureShared Nothing Architecture Multiple processors that have their own secondary Multiple processors that have their own secondary

disk storage and primary memorydisk storage and primary memory Processes communicate over a high speed Processes communicate over a high speed

interconnection networkinterconnection network Symmetry or homogeneity of nodesSymmetry or homogeneity of nodes

Distributed TechnologyDistributed Technology Heterogeneity of hardware and operating system Heterogeneity of hardware and operating system

at every nodeat every node

ADVANTAGE OF DISTRIBUTED ADVANTAGE OF DISTRIBUTED DATABASESDATABASES

Management of distributed data with different levels of Management of distributed data with different levels of transparency (transparency (This refers to the physical placement of data (files, relations, This refers to the physical placement of data (files, relations, etc.) which is not known to the user (distribution transparency).etc.) which is not known to the user (distribution transparency). Distribution or network transparency- Distribution or network transparency- Users do not have to worry about Users do not have to worry about

operational details of the network. operational details of the network. Location transparency (Location transparency (refers to freedom of issuing command from any refers to freedom of issuing command from any

location without affecting its working). location without affecting its working). Naming transparency (Naming transparency (allows access to any names object (files, relations, allows access to any names object (files, relations,

etc.) from any location).etc.) from any location). Replication transparency- Replication transparency- allows to store copies of a data at multiple allows to store copies of a data at multiple

sites. This is done to minimize access time to the required data.sites. This is done to minimize access time to the required data. User is unaware of the existence of multiple copiesUser is unaware of the existence of multiple copies

Fragmentation transparency-Fragmentation transparency-Allows to fragment a relation horizontally Allows to fragment a relation horizontally (create a subset of tuples of a relation) or vertically (create a subset of (create a subset of tuples of a relation) or vertically (create a subset of columns of a relation).columns of a relation).

Horizontal fragmentationHorizontal fragmentation Vertical fragmentationVertical fragmentation

ADVANTAGE OF DISTRIBUTED ADVANTAGE OF DISTRIBUTED DATABASES (contd…)DATABASES (contd…)

Increased Reliability and AvailabilityIncreased Reliability and Availability Reliability – Probability that a system is running at a given timeReliability – Probability that a system is running at a given time Availability – Probability that a system is continuously available Availability – Probability that a system is continuously available

during a time intervalduring a time interval When the data and the DBMS software are distributed Over several sites ,one site

may fail other sites continue to Operate. Only the data and the software that exist at the failed site cannot be accessed. This improves both reliability and availability

Improved PerformanceImproved Performance Data Localization – Data Localization – A Distributed database management system fragments the database

by keeping the data closer to where it is needed. Data Localization reduces the contention for CPU and I/O services and simultaneously reduces access delays involved in wide area networks.

Easier Expansion- Easier Expansion- In a Distributed environment , expansion of the system in terms of adding more data, increasing the database sizes or adding more processors is much more easier.

ADDITIONAL FUNCTIONS OF DDBsADDITIONAL FUNCTIONS OF DDBs

Keeping track of dataKeeping track of data Ability to keep track of data distributionAbility to keep track of data distribution

Distributed query processingDistributed query processing Ability to access remote sites and transmit queriesAbility to access remote sites and transmit queries

Distributed transaction managementDistributed transaction management Ability to devise execution strategies for queries and Ability to devise execution strategies for queries and

transactions that access data from more than one sitetransactions that access data from more than one site Synchronize access to distributed dataSynchronize access to distributed data Maintain integrity of the overall databaseMaintain integrity of the overall database

ADDITIONAL FUNCTIONS OF DDBs ADDITIONAL FUNCTIONS OF DDBs (contd…) (contd…)

Replicated data managementReplicated data management Ability to decide which copy of the replicated data item Ability to decide which copy of the replicated data item

to accessto access Maintain the consistency of copies of a replicated data Maintain the consistency of copies of a replicated data

itemitem

Distributed database recoveryDistributed database recovery Ability to recover from individual site crashes and Ability to recover from individual site crashes and

failure of communication linksfailure of communication links

ADDITIONAL FUNCTIONS OF DDBs ADDITIONAL FUNCTIONS OF DDBs (contd…) (contd…)

SecuritySecurity Proper management of security of the dataProper management of security of the data Proper authorization/access privileges of usersProper authorization/access privileges of users

Distributed directory (catalog) managementDistributed directory (catalog) management Directory contains information about data in the Directory contains information about data in the

databasedatabase Directory may be global for the entire DDB or local for Directory may be global for the entire DDB or local for

each siteeach site

DDBMS vs. CENTRALIZED SYSTEMDDBMS vs. CENTRALIZED SYSTEM

Multiple computers called sites and nodesMultiple computers called sites and nodes Sites connected by some type of Sites connected by some type of

communication network to transmit data and communication network to transmit data and commandscommands

Sites located in physical proximity connected Sites located in physical proximity connected via LANsvia LANs

Sites geographically distributed over large Sites geographically distributed over large distances connected via WANsdistances connected via WANs

Distribution Database DesignDistribution Database Design

DATA FRAGMENTATION, REPLICATION, AND ALLOCATION DATA FRAGMENTATION, REPLICATION, AND ALLOCATION TECHNIQUES FOR DISTRIBUTED DATABASE DESIGNTECHNIQUES FOR DISTRIBUTED DATABASE DESIGN

Fragmentation: Breaking up the database into logical units called Fragmentation: Breaking up the database into logical units called fragments fragments and assigned for storage at various sites.and assigned for storage at various sites.

Data replicationData replication:: The process of storing fragments in more than one site The process of storing fragments in more than one site

Data AllocationData Allocation:: The process of assigning a particular fragment to a The process of assigning a particular fragment to a particular site in a distributed system.particular site in a distributed system.

The information concerning the data fragmentation, allocation and The information concerning the data fragmentation, allocation and replication is stored in a global directory.replication is stored in a global directory.

DATA FRAGMENTATIONDATA FRAGMENTATION

Breaking up the database into logical units called Breaking up the database into logical units called fragments fragments and assigned for storage at various and assigned for storage at various sites.sites.

Types of FragmentationTypes of Fragmentation Horizontal FragmentationHorizontal Fragmentation Vertical FragmentationVertical Fragmentation Mixed (Hybrid) FragmentationMixed (Hybrid) Fragmentation

Fragmentation SchemaFragmentation Schema Definition of a set of fragments that include all attributes Definition of a set of fragments that include all attributes

and tuples in the databaseand tuples in the database The whole database can be reconstructed from the The whole database can be reconstructed from the

fragmentsfragments

Horizontal fragmentation:

It is a horizontal subset of a relation which contain those tuples which satisfy selection conditions.

Consider the Employee relation with selection condition (DNO = 5). All tuples satisfy this condition will create a subset which will be a horizontal fragment of Employee relation.

Horizontal fragmentation divides a relation horizontally by grouping rows to create subsets of tuples where each subset has a certain logical meaning.

HORIZONTAL FRAGMENTATIONHORIZONTAL FRAGMENTATION

Horizontal fragment is a subset of tuples in that Horizontal fragment is a subset of tuples in that relationrelation

Tuples are specified by a condition on one or Tuples are specified by a condition on one or more attributes of the relationmore attributes of the relation

Divides a relation horizontally by grouping rows Divides a relation horizontally by grouping rows to create subset of tuplesto create subset of tuples

Derived Horizontal Fragmentation – partitioning Derived Horizontal Fragmentation – partitioning a primary relation into secondary relations a primary relation into secondary relations related to primary through a foreign keyrelated to primary through a foreign key

Vertical fragmentationVertical fragmentation

It is a subset of a relation which is created by a subset It is a subset of a relation which is created by a subset of columns. Thus a vertical fragment of a relation will of columns. Thus a vertical fragment of a relation will contain values of selected columns. There is no contain values of selected columns. There is no selection condition used in vertical fragmentation.selection condition used in vertical fragmentation.

Consider the Employee relation. A vertical fragment Consider the Employee relation. A vertical fragment can be created by keeping the values of Name, can be created by keeping the values of Name, Bdate, Sex, and Address.Bdate, Sex, and Address.

Because there is no condition for creating a vertical Because there is no condition for creating a vertical fragment, each fragment must include the primary key fragment, each fragment must include the primary key attribute of the parent relation Employee. In this way attribute of the parent relation Employee. In this way all vertical fragments of a relation are connected.all vertical fragments of a relation are connected.

VERTICAL FRAGMENTATIONVERTICAL FRAGMENTATION

A vertical fragment keeps only certain A vertical fragment keeps only certain attributes of that relationattributes of that relation

Divides a relation vertically by columnsDivides a relation vertically by columns It is necessary to include primary key or It is necessary to include primary key or

some candidate key attribute some candidate key attribute The full relation can be reconstructed from The full relation can be reconstructed from

the fragmentsthe fragments

MIXED FRAGMENTATIONMIXED FRAGMENTATION

Intermixing the two types of fragmentationIntermixing the two types of fragmentation Original relation can be reconstructed by Original relation can be reconstructed by

applying UNION and OUTER JOIN applying UNION and OUTER JOIN operations in the appropriate orderoperations in the appropriate order

DATA FRAGMENTATIONDATA FRAGMENTATION

Complete Horizontal FragmentationComplete Horizontal Fragmentation Set of horizontal fragments that include all the tuples in Set of horizontal fragments that include all the tuples in

a relationa relation To reconstruct a relation, apply the UNION operation to To reconstruct a relation, apply the UNION operation to

the horizontal fragmentsthe horizontal fragments

Complete Vertical FragmentationComplete Vertical Fragmentation Set of vertical fragments whose projection lists include Set of vertical fragments whose projection lists include

all the attributes but share only the primary key attributeall the attributes but share only the primary key attribute To reconstruct a relation, apply the OUTER UNION To reconstruct a relation, apply the OUTER UNION

operation to the vertical fragmentsoperation to the vertical fragments

DATA REPLICATIONDATA REPLICATION

Process of storing data in more than one siteProcess of storing data in more than one site Replication SchemaReplication Schema

Description of the replication of fragmentsDescription of the replication of fragments Fully replicated distributed database Fully replicated distributed database

Replicating the whole database at every siteReplicating the whole database at every site Improves availabilityImproves availability Improves performance of retrievalImproves performance of retrieval Can slow down update operations drasticallyCan slow down update operations drastically Expensive concurrency control and recovery Expensive concurrency control and recovery

techniquestechniques

DATA REPLICATION (contd…)DATA REPLICATION (contd…)

No replication distributed databaseNo replication distributed database Each fragment is stored exactly at one siteEach fragment is stored exactly at one site All fragments must be disjoint except primary keysAll fragments must be disjoint except primary keys Also called Non-redundant allocationAlso called Non-redundant allocation

Partial ReplicationPartial Replication Some fragments may be replicated while others Some fragments may be replicated while others

may notmay not Number of copies range from one to total number of Number of copies range from one to total number of

sites in a distributed systemsites in a distributed system

DATA ALLOCATIONDATA ALLOCATION

Each fragment or each copy of the fragment must Each fragment or each copy of the fragment must be assigned to a particular site be assigned to a particular site

Also called Data DistributionAlso called Data Distribution Choice of sites and degree of replication depend onChoice of sites and degree of replication depend on

Performance of the systemPerformance of the system Availability goals of the systemAvailability goals of the system Types of transactionsTypes of transactions Frequencies of transactions submitted at any siteFrequencies of transactions submitted at any site

Allocation SchemaAllocation Schema Describes the allocation of fragments to sites of the DDBsDescribes the allocation of fragments to sites of the DDBs

TYPES OF DISTRIBUTED TYPES OF DISTRIBUTED DATABASE SYSTEMDATABASE SYSTEM

HomogeneousHomogeneous

All sites of the database All sites of the database system have identical system have identical setup, i.e., same database setup, i.e., same database system software. The system software. The underlying operating underlying operating system may be different. system may be different. For example, all sites run For example, all sites run Oracle or DB2, or Sybase Oracle or DB2, or Sybase or some other database or some other database system. The underlying system. The underlying operating systems can be operating systems can be a mixture of Linux, a mixture of Linux, Window, Unix, etc. The Window, Unix, etc. The clients thus have to use clients thus have to use identical client software.identical client software.

Communicationsneteork

Site 5Site 1

Site 2Site 3Oracle Oracle

OracleOracle

Site 4

Oracle

LinuxLinux

Window

WindowUnix

HeterogeneousHeterogeneous

Federated: Each site Federated: Each site may run different may run different database system but database system but the data access is the data access is managed through a managed through a single conceptual single conceptual schema. This implies schema. This implies that the degree of that the degree of local autonomy is local autonomy is minimum. Each site minimum. Each site must adhere to a must adhere to a centralized access centralized access policy. There may be policy. There may be a global schema.a global schema.

Communicationsnetwork

Site 5Site 1

Site 2Site 3

NetworkDBMS

Relational

Site 4

ObjectOriented

LinuxLinux

Unix

Hierarchical

ObjectOriented

RelationalUnix

Window

Types of Distributed Database Types of Distributed Database SystemsSystems

Factors that make DDS differentFactors that make DDS different Degree of homogeneityDegree of homogeneity

If all the servers use identical software and If all the servers use identical software and all the users use identical software.all the users use identical software.

Degree of local autonomyDegree of local autonomy

If there is no provision for the local site to If there is no provision for the local site to function as a stand-alone DBMS, then the function as a stand-alone DBMS, then the system as no local autonomy.system as no local autonomy.

cont…cont…

Types of Distributed Database Types of Distributed Database SystemsSystems

Centralized Database SystemCentralized Database System• No local autonomy exists.No local autonomy exists.

Federated Distributed Database SystemFederated Distributed Database System• Each server is an independent and Each server is an independent and

autonomous centralized DBMS that has its autonomous centralized DBMS that has its own local users, local transaction, and DBA own local users, local transaction, and DBA and hence has a very high degree of local and hence has a very high degree of local autonomy.autonomy.

• Used when there is some global view of Used when there is some global view of databases shared by applications.databases shared by applications.

Federated Database Management Federated Database Management Systems IssuesSystems Issues

Differences in data modelsDifferences in data models• Deal with different data models via a single global Deal with different data models via a single global

schema or to process them in a single language is schema or to process them in a single language is challenging. challenging.

Differences in constraintsDifferences in constraints• Constraint facilities for specification and Constraint facilities for specification and

implementation vary from system to system which implementation vary from system to system which should be dealt using global schemashould be dealt using global schema

Differences in languagesDifferences in languages• Same data model but different languages could be Same data model but different languages could be

used and their version may vary.used and their version may vary.

Semantic HeterogeneitySemantic Heterogeneity

Occurs when there are differences in the meaning, Occurs when there are differences in the meaning, interpretation, and intented use or related data.interpretation, and intented use or related data. Design autonomyDesign autonomy

Refers to their freedom of choosing design patterns.Refers to their freedom of choosing design patterns. Communication autonomyCommunication autonomy

Refers to the ability to decide whether to Refers to the ability to decide whether to communicate with another component DBS.communicate with another component DBS.

Association AutonomyAssociation AutonomyAbility to decide whether and how much to share its Ability to decide whether and how much to share its functionality and resources with the other component functionality and resources with the other component DBs.DBs.

Five-level schema architecture to Five-level schema architecture to support global applications in the FDBSsupport global applications in the FDBS

Component

Local schema

Component Schema

Export schemaExport schema

Federated schema

External Schema External Schema

cont..cont..

Five-level schema architecture to Five-level schema architecture to support global applications in the FDBSsupport global applications in the FDBS Local schema: Is the conceptual schema of the Local schema: Is the conceptual schema of the

component database.component database. Component schema: Derived by translating the local Component schema: Derived by translating the local

schema into canonical data model or common data schema into canonical data model or common data model for the FDBS.model for the FDBS.

Export model: Represents the subset of a component Export model: Represents the subset of a component schema that is available to the FDBS.schema that is available to the FDBS.

Federated schema: Is the global schema or view, which Federated schema: Is the global schema or view, which is the result of integrating all the shareable export is the result of integrating all the shareable export schemas.schemas.

External schema: Schema for a user group or an External schema: Schema for a user group or an application, as in the three-level schema architecture. application, as in the three-level schema architecture.

QUERY PROCESSING IN QUERY PROCESSING IN DISTRIBUTED DATABASESDISTRIBUTED DATABASES

Query Processing in Distributed Query Processing in Distributed DatabasesDatabases

Cost of transferring data (files and results) over the network.Cost of transferring data (files and results) over the network.

FnameFname MinitMinit LnameLname SSNSSN BdateBdate AddressAddress SexSex SalarySalary SuperssnSuperssn DnoDno

This cost is usually high so some optimization is necessary.Example relations: Employee at site 1 and Department at Site 2

DnameDname DnumberDnumber MgrssnMgrssn MgrstartdateMgrstartdate

Employee at site 1. 10, 000 rows. Row size = 100 bytes. Table size = 106 bytes.

Department at Site 2. 100 rows. Row size = 35 bytes. Table size = 3500 bytes.

Q: For each employee, retrieve employee name and department nameWhere the employee works.

Q: Fname,Lname,Dname (Employee Dno = Dnumber Department)

cont…cont…

Query Processing In Distributed Query Processing In Distributed DatabasesDatabases

Factor which effects query processingFactor which effects query processing• The cost of transferring data over the network.The cost of transferring data over the network.

Goal of query processingGoal of query processing• The goal of reducing the amount of data transfer in choosing a The goal of reducing the amount of data transfer in choosing a

distributed query execution strategy.distributed query execution strategy.

Eg : At site 1:Eg : At site 1:EmployeeEmployee(Fname,Lname,SSN,Address,Superssn,Dno)(Fname,Lname,SSN,Address,Superssn,Dno)10,000 records each record is 100 bytes long10,000 records each record is 100 bytes longSSN field is 9 bytes long ,Fname field is 15bytesSSN field is 9 bytes long ,Fname field is 15bytesDno field is 4 bytes long, Lname field is 15 bytes longDno field is 4 bytes long, Lname field is 15 bytes long

Site 2:Site 2:DepartmentDepartment((Dname,Dnumber,MGRSSN,MGRSTARTDATE)Dname,Dnumber,MGRSSN,MGRSTARTDATE)100 records100 recordsEach record is 35 bytes longEach record is 35 bytes longDnumber field is 4 bytes long,Dname field is 10 bytesDnumber field is 4 bytes long,Dname field is 10 bytesMGRSSN field is 9 bytes longMGRSSN field is 9 bytes longSuppose you ask a query Suppose you ask a query Q: For each employee, retrieve employee name and

department name Where the employee works.Q: Fname,Lname,Dname (Employee Dno = Dnumber Department)

cont…cont…


cont…cont…


The result of this query will select 10,000 record assuming that The result of this query will select 10,000 record assuming that every employee is related to a department.every employee is related to a department.Each record in the query result will be of 40 bytes long.Each record in the query result will be of 40 bytes long.This query is submitted at site 3 (result site)This query is submitted at site 3 (result site)There are three different strategies for executing this distributed There are three different strategies for executing this distributed

queryquery1) Transfer both the employee and the department relations to the 1) Transfer both the employee and the department relations to the

result site and form a join at site 3.In this case a total of result site and form a join at site 3.In this case a total of 1,000,000+3500=1,003,500 bytes must be transferred .1,000,000+3500=1,003,500 bytes must be transferred .

2) Transfer the Employee to site 2, execute the join at site 2, and 2) Transfer the Employee to site 2, execute the join at site 2, and send the result to site 3.The size of the query is send the result to site 3.The size of the query is 40*10,000=400,000 bytes, so 400,000+1,000,000=1,400,000 40*10,000=400,000 bytes, so 400,000+1,000,000=1,400,000 bytes must be transferred.bytes must be transferred.

3) Transfer the Department relation to site 3) Transfer the Department relation to site 1,execute the join at site 1 and send the result to 1,execute the join at site 1 and send the result to site 3.un this case 400,000+3500=403,500 bytes site 3.un this case 400,000+3500=403,500 bytes must be transferred.must be transferred.

To minimize the amount of data transfer we To minimize the amount of data transfer we should use the strategy 3.should use the strategy 3.

So we should select the strategy for which the So we should select the strategy for which the data transfer is minimum.data transfer is minimum.

cont…cont…


Distributed Query Processing Using Distributed Query Processing Using SemijoinSemijoin Goal: To reduce the number of tuples in a relation before transferring it to Goal: To reduce the number of tuples in a relation before transferring it to

another site.another site.

Eg: For Q (previous query)Eg: For Q (previous query)1) Project the join attributes of Department at site 2, and transfer them to site 1) Project the join attributes of Department at site 2, and transfer them to site 11F= Pro F= Pro Dnumber Dnumber (Department) whose size is 4* 100=400 bytes.(Department) whose size is 4* 100=400 bytes.

2) Join the transferred file with the Employee2) Join the transferred file with the Employeerelation at site 1, and transfer the required attributes from resulting file to site relation at site 1, and transfer the required attributes from resulting file to site 2. For Q, we transfer 2. For Q, we transfer R= Pro R= Pro Dno,Fname,Lname Dno,Fname,Lname (F join (F join Dnumber=DnoDnumber=Dno Employee) whose Employee) whose

size is 39*100=3900 bytes.size is 39*100=3900 bytes. 3) Execute the query by joining the transferred file R with Department , and 3) Execute the query by joining the transferred file R with Department , and

present the result at site 2.present the result at site 2.

Consider the query Q’: For each department, retrieve the

department name and the name of the department manager

Relational Algebra expression: Fname,Lname,Dname (Employee Mgrssn = SSN Department)

Query Processing in Distributed DatabasesQuery Processing in Distributed Databases

The result of this query will have 100 tuples, assuming that every department has a manager, the execution strategies are:

Strategies:1. Transfer Employee and Department to the result site and perorm

the join at site 3. Total bytes transferred = 1,000,000 + 3500 = 1,003,500 bytes.

2. Transfer Employee to site 2, execute join at site 2 and send the result to site 3. Query result size = 40 * 100 = 4000 bytes. Total transfer size = 4000 + 1,000,000 = 1,004,000 bytes.

3. Transfer Department relation to site 1, execute join at site 1 and send the result to site 3. Total transfer size = 4000 + 3500 = 7500 bytes.

Query Processing in Distributed DatabasesQuery Processing in Distributed Databases

Preferred strategy: Chose strategy 3.

Now suppose the result site is 2. Possible strategies:

Possible strategies :1. Transfer Employee relation to site 2, execute the query and

present the result to the user at site 2. Total transfer size = 1,000,000 bytes for both queries Q and Q’.

2. Transfer Department relation to site 1, execute join at site 1 and send the result back to site 2. Total transfer size for Q = 400,000 + 3500 = 403,500 bytes and for Q’ = 4000 + 3500 = 7500 bytes.

cont..cont..

Distributed Query Distributed Query Processing Using SemijoinProcessing Using Semijoin A semi join operation R Semijoin A semi join operation R Semijoin A=B A=B S where A S where A

and B are domain-compatible attributes of R and and B are domain-compatible attributes of R and S, respectively, and produces the same result as S, respectively, and produces the same result as the relational algebra expression Prothe relational algebra expression ProR R (Rjoin (Rjoin A=BA=B S).S).In a distributed environment where R and S In a distributed environment where R and S reside at different sites, the semijoin is typically reside at different sites, the semijoin is typically implemented by first transferring F=Pro implemented by first transferring F=Pro B B (S) to (S) to the site where R resides and then joining F with the site where R resides and then joining F with R.R.Note that the semijoin operation is not Note that the semijoin operation is not commutative, that iscommutative, that isR semijoin S not equal to S semijoin RR semijoin S not equal to S semijoin R..

Semijoin Query Processing in Distributed Semijoin Query Processing in Distributed DatabasesDatabases

Semijoin: Objective is to reduce the number of tuples in a relation before transferring it to another site.

Example execution of Q or Q’:1. Project the join attributes of Department at site 2, and transfer

them to site 1. For Q, 4 * 100 = 400 bytes are transferred and for Q’, 9 * 100 = 900 bytes are transferred.

2. Join the transferred file with the Employee relation at site 1, and transfer the required attributes from the resulting file to site 2. For Q, 34 * 10,000 = 340,000 bytes are transferred and for Q’, 39 * 100 = 3900 bytes are transferred.

3. Execute the query by joining the transferred file with Department and present the result to the user at site 2.

Query and Update DecompositionQuery and Update Decomposition

The user must also maintain consistency of replicated data The user must also maintain consistency of replicated data items when updating a DDBMS with no replication items when updating a DDBMS with no replication transparency.transparency.

The DDBMS supports full distribution, fragmentation and The DDBMS supports full distribution, fragmentation and replication transparency and allows the user to specify a replication transparency and allows the user to specify a query or update request on the schema as though the DBMS query or update request on the schema as though the DBMS were centralized.were centralized.

For queries the query decomposition module must break up For queries the query decomposition module must break up or decompose a query into subqueries that can be executed or decompose a query into subqueries that can be executed at the individual sites and combining the results of the at the individual sites and combining the results of the subqueries to form the query result.subqueries to form the query result.

To determine which replicas include the data To determine which replicas include the data items referenced in a query, the DDBMS refers items referenced in a query, the DDBMS refers to the fragmentation, replication, and distribution to the fragmentation, replication, and distribution information stored in the DDBMS catalog.information stored in the DDBMS catalog.

For vertical fragmentation the attribute list for For vertical fragmentation the attribute list for each fragment is kept in catalog.each fragment is kept in catalog.

For horizontal fragmentation, a condition, some For horizontal fragmentation, a condition, some times called a guard, is kept for each fragment.times called a guard, is kept for each fragment.

Guard is a selection condition which specifies Guard is a selection condition which specifies which tuples exist in the fragment.which tuples exist in the fragment.

CONT…CONT…


cont…cont…

Query and Update DecompositionQuery and Update DecompositionEg: A user requests to insert a new tupleEg: A user requests to insert a new tuple<‘Alex’, ‘B’, ,’Coleman’, ‘348889793’,’22-apr-64’, ‘3306 <‘Alex’, ‘B’, ,’Coleman’, ‘348889793’,’22-apr-64’, ‘3306 sandstone, houston, TX’, M,33000,’234412414’,4> would be sandstone, houston, TX’, M,33000,’234412414’,4> would be decomposed into two insert requests.decomposed into two insert requests.The first insert inserts the preceding tuple in the Employee The first insert inserts the preceding tuple in the Employee fragment at site1, and the second inserts the projected tuple fragment at site1, and the second inserts the projected tuple <‘Alex’, ’B’, ‘Coleman’, ‘348889793’, 33000, ’234412414’, 4> in <‘Alex’, ’B’, ‘Coleman’, ‘348889793’, 33000, ’234412414’, 4> in the Empd4 fragment at site 3 for easy retrieval.the Empd4 fragment at site 3 for easy retrieval.For query decomposition ,the DDBMS can determine which For query decomposition ,the DDBMS can determine which fragments may contain the required tuples by comparing the fragments may contain the required tuples by comparing the query condition with the guard conditions.query condition with the guard conditions.

Eg: Retrieve the names and hours per week for each employee who works on Eg: Retrieve the names and hours per week for each employee who works on some project controlled by department 5.some project controlled by department 5.

SQL statement will beSQL statement will beSelect Fname, Lname, HoursSelect Fname, Lname, HoursFrom Employee , Project, Works_OnFrom Employee , Project, Works_OnWhere Dnum=5 and Pnumber = Pno and Where Dnum=5 and Pnumber = Pno and ESSN=SSN.ESSN=SSN.Suppose that the query is submitted at site 2,where the query result is also Suppose that the query is submitted at site 2,where the query result is also needed. The DDBMS can determine from guard condition on Projs5 and needed. The DDBMS can determine from guard condition on Projs5 and Works_On5 that the tuple satisfy the condition (Dnum=5 and Pnumber=Pno)Works_On5 that the tuple satisfy the condition (Dnum=5 and Pnumber=Pno)where Projs5 iswhere Projs5 isattribute list: *(all attributes Pname, Pnumber,Plocation,Dnum)attribute list: *(all attributes Pname, Pnumber,Plocation,Dnum)guard condition: Dnum=5guard condition: Dnum=5

cont…cont…


Works_On5Works_On5 Attribute list:*(all attributes ESSN, PNO, HOURS)Attribute list:*(all attributes ESSN, PNO, HOURS) Guard condition: ESSN IN (Proj Guard condition: ESSN IN (Proj SSNSSN (EMPD5)) OR (EMPD5)) OR

PNO IN (Proj PNO IN (Proj PnumberPnumber(Projs5)(Projs5) Hence it may decompose the query into the Hence it may decompose the query into the

following relational algebra subqueries:following relational algebra subqueries:T1<- Pro T1<- Pro ESSNESSN (Projs5 Join (Projs5 Join Pnumber=PnoPnumber=Pno Works_On5) Works_On5)T2<-Pro T2<-Pro ESSN,Fname,LnameESSN,Fname,Lname(T1 Join (T1 Join ESSN=SSN ESSN=SSN Employee)Employee)Result<- Pro Result<- Pro Fname, Lname, HoursFname, Lname, Hours (T2 * Work_On5) (T2 * Work_On5)

This decomposition can be used to execute the This decomposition can be used to execute the query by using a semijoin strategy.query by using a semijoin strategy.

cont…cont…


The DDBMS knows from the guard condition that Projs5 contains The DDBMS knows from the guard condition that Projs5 contains exactly those tuples satisfy (Dnum=5) and works on contains all exactly those tuples satisfy (Dnum=5) and works on contains all the tuples to be joined with Projs5,hence the subquery T1 can be the tuples to be joined with Projs5,hence the subquery T1 can be executed at site2, and the projected columns ESSN can be sent executed at site2, and the projected columns ESSN can be sent to site 1.to site 1.Subquery T2 can then execute at site 1, and the result is sent Subquery T2 can then execute at site 1, and the result is sent back to site 2,where the final query result is calculated and back to site 2,where the final query result is calculated and displayed to the user. displayed to the user.

An alternative strategy would be to send the query Q itself to site An alternative strategy would be to send the query Q itself to site 1, which includes all the database tuples, where it would be 1, which includes all the database tuples, where it would be executed locally and from which result would be sent back to site executed locally and from which result would be sent back to site 2. 2.

The query optimizer would estimate the costs of both strategies The query optimizer would estimate the costs of both strategies and would choose the one with the lower cost estimate.and would choose the one with the lower cost estimate.

cont…cont…


OVERVIEW OF OVERVIEW OF CONCURRENCY CONTROLCONCURRENCY CONTROL

Overview Of Concurrency Control & Overview Of Concurrency Control & Recovery in Distributed DatabasesRecovery in Distributed Databases

Distributed Databases encounter a number of concurrency control and recovery problems which are not present in centralized databases. Some of them are listed below.

These techniques are needed to deal with following problems ->These techniques are needed to deal with following problems -> Dealing with multiple copies of data itemsDealing with multiple copies of data items :- :- The concurrency control must

maintain global consistency. Likewise the recovery mechanism must recover all copies and maintain consistency after recovery.

Failure of individual sitesFailure of individual sites :- :- Database availability must not be affected due to the failure of one or two sites and the recovery scheme must recover them before they are available for use.

Failure of communication linksFailure of communication links :- :- This failure may create network partition which would affect database availability even though all database sites may be running.

Distributed commitDistributed commit :- :- A transaction may be fragmented and they may be executed by a number of sites. This require a two or three-phase commit approach for transaction commit.

Distributed deadlockDistributed deadlock :- :- Since transactions are processed at multiple sites, two or more sites may get involved in deadlock. This must be resolved in a distributed manner.

..

Overview Of Concurrency Control & Recovery in Distributed Databases cont…Overview Of Concurrency Control & Recovery in Distributed Databases cont…

Concurrency Control Based on Distributed Concurrency Control Based on Distributed Copy of a Data ItemCopy of a Data Item

Terminology :- Terminology :- Distinguished CopyDistinguished Copy : particular copy of each data : particular copy of each data

item, and the lock for this data item is associated item, and the lock for this data item is associated with it.with it.

Techniques :-Techniques :-Primary SitePrimary Site : The single Primary site is : The single Primary site is

designated as Coordinator site for all dbase items. designated as Coordinator site for all dbase items. Hence, all Locking & Unlocking request are sent Hence, all Locking & Unlocking request are sent here.here.

Concurrency Control and RecoveryConcurrency Control and RecoveryDistributed Concurrency control based on a distributed copy of a data item

Primary site technique: A single site is designated as a primary site which serves as a coordinator for transaction management.

Communications neteork

Site 5Site 1

Site 2

Site 4

Site 3

Primary site

Concurrency Control and RecoveryConcurrency Control and RecoveryTransaction management: Concurrency control and commit are managed by this site. In two phase locking, this site manages locking and releasing data items. If all transactions follow two-phase policy at all sites, then serializability is guaranteed.

Advantages: An extension to the centralized two phase locking so implementation and management is simple. Data items are locked only at one site but they can be accessed at any site.

Disadvantages: All transaction management activities go to primary site which is likely to overload the site. If the primary site fails, the entire system is inaccessible.

To aid recovery a backup site is designated which behaves as a shadow of primary site. In case of primary site failure, backup site can act as primary site.


Concurrency Control Based on Concurrency Control Based on Distributed Copy of a Data ItemDistributed Copy of a Data Item

Techniques Techniques (cont..)(cont..):-:-Primary Site with Backup SitePrimary Site with Backup Site : All locking : All locking

information is maintained at both sites, in case, information is maintained at both sites, in case, Primary site fails the Backup site takes over Primary site fails the Backup site takes over Primary site.Primary site.

Primary CopyPrimary Copy : The distinguished copies of : The distinguished copies of different data items stored at different sites.different data items stored at different sites.

Choosing New Coordinator Site in Case of Choosing New Coordinator Site in Case of FailureFailure: In case if coordinator fails, the sites which : In case if coordinator fails, the sites which are running chooses new Coordinator are running chooses new Coordinator

Concurrency Control and RecoveryConcurrency Control and Recovery

Primary Copy Technique: This method attempts to distribute the load of lock coordination among various sites by having the distinguished copies of different data items stored at different sites.

Advantages: Since primary copies are distributed at various sites, a single site is not overloaded with locking and unlocking requests.

Disadvantages: Identification of a primary copy is complex. A distributed directory must be maintained, possibly at all sites.


Recovery from a coordinator failure

In both approaches a coordinator site or copy may become unavailable. This will require the selection of a new coordinator.

Primary site approach with no backup site: Aborts and restarts all active transactions at all sites. Elects a new coordinator and initiates transaction processing.

Primary site approach with backup site: Suspends all active transactions, designates the backup site as the primary site and identifies a new back up site. Primary site receives all transaction management information to resume processing.

Primary and backup sites fail or no backup site: Use election process to select a new coordinator site.


Concurrency Control Based on VotingConcurrency Control Based on Voting

Voting MethodVoting MethodThere is no distinguished copyThere is no distinguished copyAll sites includes a copy of data item, and also All sites includes a copy of data item, and also

each maintains its own lock.each maintains its own lock.When a transaction request lock ,then that request When a transaction request lock ,then that request

is sent to all sites, and it gets granted, when it is is sent to all sites, and it gets granted, when it is locked by majority of copies. And it informs all the locked by majority of copies. And it informs all the copies that Lock has been granted . copies that Lock has been granted .


Concurrency control based on voting: There is no primary copy of coordinator.

Send lock request to sites that have data item. If majority of sites grant lock then the requesting transaction gets

the data item. Locking information (grant or denied) is sent to all these sites. To avoid unacceptably long wait, a time-out period is defined. If

the requesting transaction does not get any vote information then the transaction is aborted.


Distributed RecoveryDistributed Recovery

Case ICase I :When X sends message to Y , expects, :When X sends message to Y , expects, response from Y, but Y fails.response from Y, but Y fails. Possibility :-Possibility :-

Message deliver fails because of Communication failure.Message deliver fails because of Communication failure. Site Y is down.Site Y is down. Response deliver fails.Response deliver fails.

Case IICase II : When Transaction is updating at several : When Transaction is updating at several sites, it cannot commit until it is sure that effect sites, it cannot commit until it is sure that effect of transaction is on every site.of transaction is on every site.

OVERVIEW OF 3-TIER CLIENT OVERVIEW OF 3-TIER CLIENT SERVER ARCHITECTURESERVER ARCHITECTURE

Overview of 3-Tier . Overview of 3-Tier . Client-Server ArchitectureClient-Server Architecture3-Tier Architecture3-Tier Architecture Presentation LayerPresentation Layer :- This provides the user interface :- This provides the user interface

and interacts with the user. The programs at this layer and interacts with the user. The programs at this layer present Web interfaces or forms to the client in order to present Web interfaces or forms to the client in order to interface with the application.interface with the application.

Application LayerApplication Layer :- This layer programs the application :- This layer programs the application logic. The queries can be formulated based on user input logic. The queries can be formulated based on user input from the client or query results can be formatted and from the client or query results can be formatted and sent to client for presentation.sent to client for presentation.

Database ServerDatabase Server :- This layer handles the query and :- This layer handles the query and update requests from the application layer, process the update requests from the application layer, process the requests, and send the results. Usually SQL is used to requests, and send the results. Usually SQL is used to access the database. access the database.

3-Tier Client-Server Database 3-Tier Client-Server Database ArchitectureArchitecture

The interaction between the three layers during the processing of an SQL The interaction between the three layers during the processing of an SQL query.query.

• The presentation layer first takes an user input and displays the needed The presentation layer first takes an user input and displays the needed information to the user.information to the user.

• The application server formulates a user query based on input from the The application server formulates a user query based on input from the client layer and decomposes it into a number of independent site queries. client layer and decomposes it into a number of independent site queries. Each site query is sent to appropriate database server site.Each site query is sent to appropriate database server site.

• Each database server processes the local query and sends the results to Each database server processes the local query and sends the results to the application server site.the application server site.

• The application server combines the results of the sub queries to produce The application server combines the results of the sub queries to produce the result of the originally required query, formats it into HTML or some the result of the originally required query, formats it into HTML or some other form accepted by the client, and sends it to the client site for display. other form accepted by the client, and sends it to the client site for display.

Distributed Database .Distributed Database .In ORACLE In ORACLE In Client-Server Arch., Oracle dbase is In Client-Server Arch., Oracle dbase is

divided into 2 partsdivided into 2 partsFront-end as ClientFront-end as Client : It interacts with user. Its : It interacts with user. Its

main purpose is to handle requesting, processing, main purpose is to handle requesting, processing, and presentation of data managed by server. and presentation of data managed by server.

Back-end as Server Back-end as Server : It runs Oracle and handles : It runs Oracle and handles the functions related to concurrent shared access. the functions related to concurrent shared access. And also process Client’s SQL & PL/SQL queries.And also process Client’s SQL & PL/SQL queries.

Oracle Client-Server Application provides Oracle Client-Server Application provides location Transparency, making data location Transparency, making data transparent to users.transparent to users.

Oracle dbases in a distributed dbase systems use Oracle’s Oracle dbases in a distributed dbase systems use Oracle’s networking software Net8 for inter-database communication.networking software Net8 for inter-database communication.

Oracles supports database links that define a one-way Oracles supports database links that define a one-way communication path from one Oracle database to another.communication path from one Oracle database to another.

For eg :For eg : CREATE DATABASE LINK sales.us.americas;CREATE DATABASE LINK sales.us.americas; establishes a connection to the “sales” dbase, under n/w establishes a connection to the “sales” dbase, under n/w

domain “us” that comes under domain “americas”.domain “us” that comes under domain “americas”. Data in a Oracle DDBS can be replicated.Data in a Oracle DDBS can be replicated.

Basic replicationBasic replication : Replicas of tables are managed for read-only : Replicas of tables are managed for read-only access.access.

Advanced replicationAdvanced replication : Allows to update table replica’s : Allows to update table replica’s throughout a replicated DDBS. Thus, data can be read or throughout a replicated DDBS. Thus, data can be read or updated a any site.updated a any site.

Distributed Database Distributed Database (cont..) (cont..) In ORACLEIn ORACLE

Distributed Database Distributed Database (cont..) (cont..) In ORACLEIn ORACLE

Heterogeneous DBASE in Oracle : Heterogeneous DBASE in Oracle : Here at least one dbase is a non-Oracle System.Here at least one dbase is a non-Oracle System. Oracle Open GatewayOracle Open Gateway provides access to a non- provides access to a non-

Oracle System.Oracle System. The features are :-The features are :-

Distributed TransactionsDistributed Transactions Transparent SQL accessTransparent SQL access Pass-through SQL & stored procedurePass-through SQL & stored procedure Global Query optimizationGlobal Query optimization Procedure accessProcedure access

Distributed Databases in OracleDistributed Databases in Oracle

• In the client-server architecture, the oracle database system is divided into two partsIn the client-server architecture, the oracle database system is divided into two parts1) A front end client portion which 1) A front end client portion which

interacts with the user.interacts with the user.2) A back –end server portion runs 2) A back –end server portion runs

oracle and handles the functions oracle and handles the functions related to concurrent shared access.related to concurrent shared access.

• Oracle client-server applications provide location transparency by making location of Oracle client-server applications provide location transparency by making location of data transparent to users, several features like views, procedures are used to achieve data transparent to users, several features like views, procedures are used to achieve this.this.

• Oracle uses a two phase commit protocol to deal with concurrent distributed Oracle uses a two phase commit protocol to deal with concurrent distributed transactions.transactions.

a) The COMMIT statement triggers the two phase commit mechanism.a) The COMMIT statement triggers the two phase commit mechanism.

b) The RECO (recoverer) background process automatically resolves the b) The RECO (recoverer) background process automatically resolves the outcome of those distributed transactions in which the commit was outcome of those distributed transactions in which the commit was

interrupted. interrupted.

Distributed Databases in OracleDistributed Databases in Oracle

• All oracle database in Distributed Database system uses Oracle’s Networking Software Net8 for All oracle database in Distributed Database system uses Oracle’s Networking Software Net8 for interdatabase communication.interdatabase communication.

• Oracle supports Database links that define a one-way communication path from one Oracle Oracle supports Database links that define a one-way communication path from one Oracle database to another. For example,database to another. For example,

CREATECREATE DATABASE LINKDATABASE LINK sales.us.americas; sales.us.americas;

• Data in Oracle DDBS can be replicated using snapshots or replicated master tables. This can be Data in Oracle DDBS can be replicated using snapshots or replicated master tables. This can be provided at the following two levels.provided at the following two levels.

1) Basic replication: Replicas of tables are managed for read-only access. For updates data must 1) Basic replication: Replicas of tables are managed for read-only access. For updates data must be be

accessed at a single primary site. accessed at a single primary site. 2)Advanced replication: This allows application to update table replicas throughout a 2)Advanced replication: This allows application to update table replicas throughout a

replicated DDBS. Data can be read and updated at any site. This requires additional Software replicated DDBS. Data can be read and updated at any site. This requires additional Software

called advanced replication option called advanced replication option

• A snapshot generates replicas by means of a query called the A snapshot generates replicas by means of a query called the snapshot defining query, snapshot defining query, an an example is shown below.example is shown below.

CREATE SNAPSHOTCREATE SNAPSHOT sales.orders sales.orders ASASSELECT * FROM SELECT * FROM [email protected]; [email protected];

..

A & QA & Q

Documents

DISTRIBUTED DATABASES AND CLIENT-SERVER ARCHITECHURES