DDBMS Architecture, Session-8: Data Management for Decision Support

DDBMS Architecture
- DDBMS and Distribution Transparency
- Architecture Alternatives
- DDBMS Components

Distributed Database Management System

A distributed database is a collection of multiple, logically interrelated databases that stores data on multiple computers (nodes) connected by a network and permits access to the shared data from any node.

A distributed database management system (DDBMS) is a software system that permits the management of the distributed database and makes the distribution transparent to the users.
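Making the distribution transparent means a user names a relation, not the site that stores it. The minimal sketch below assumes a hypothetical global catalog (`CATALOG`) mapping each relation to one site, with in-memory dictionaries (`SITES`) standing in for remote nodes; `fetch` and `move` are illustrative names, not a real DDBMS API.

```python
# Hypothetical in-memory "sites": site name -> {relation name -> rows}.
SITES = {
    "site1": {"EMP": [{"eno": "E1", "name": "Ann"}]},
    "site2": {"PROJ": [{"pno": "P1", "title": "CAD"}]},
}

# Hypothetical global catalog: relation name -> site that stores it.
CATALOG = {"EMP": "site1", "PROJ": "site2"}


def fetch(relation):
    """Return all rows of `relation`; the caller never names a site."""
    site = CATALOG.get(relation)
    if site is None:
        raise KeyError(f"unknown relation {relation!r}")
    # In a real DDBMS this would be a request sent over the network to `site`.
    return list(SITES[site][relation])


def move(relation, new_site):
    """Relocate a relation and update the catalog; callers of fetch() are unaffected."""
    old_site = CATALOG[relation]
    SITES.setdefault(new_site, {})[relation] = SITES[old_site].pop(relation)
    CATALOG[relation] = new_site
```

After `move("PROJ", "site1")`, the call `fetch("PROJ")` still returns the same rows, which is the point of location transparency: only the catalog changes.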

Reasons for Data Distribution

Several factors have led to the development of DDBSs:
- Distributed nature of some database applications
- Increased reliability and availability
- Allowing data sharing while maintaining some measure of local control
- Improved performance

Distributed DBMS Environment

[Figure: Sites 1 to 6 interconnected by a communication network]

Additional Functionality of DDBMS

Distribution leads to increased complexity in system design and implementation.

A DDBMS must provide functions in addition to those of a centralized DBMS. Some of these are:
- Accessing remote sites and transmitting queries and data among the various sites
- Keeping track of the data distribution and replication
- Devising execution strategies for queries
- Copy identification: deciding which copy of a replicated data item to access
- Maintaining consistency of the copies of a replicated data item
- Maintaining the global conceptual schema of the distributed database
- Recovering from individual site crashes

What is not a Distributed Database System?

A DDBS is not a "collection of files" that can be individually stored at each node of a computer network:
- the files are not logically related
- there is no access via a common interface

Centralized DBMS on a Network
- Data resides at only one node.
- Database management is no different from that of a centralized DBMS.
- Remote processing: a single server with multiple clients.

[Figure: Sites 1 to 6 connected by a communication network]

Distributed Database System Technology

Distributed database technology attempts to achieve integration without centralization.

[Figure: Database Technology (Integration) + Computer Networks (Distributed Computing) → Distributed Database Systems (Integration Without Centralization)]

Example

Multinational manufacturing company:
- headquarters in New York
- manufacturing plants in Chicago and Montreal
- warehouses in Phoenix and Edmonton
- R&D facilities in San Francisco

Data and information:
- employee records (by working location)
- projects (R&D)
- engineering data (manufacturing plants, R&D)
- inventory (manufacturing plants, warehouses)
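Storing employee records at each employee's working location is a horizontal fragmentation of one global relation. The sketch below assumes a hypothetical `EMP` relation with a `loc` attribute and checks the usual correctness rules for fragmentation (completeness, disjointness, and reconstruction by union); the rows and function names are illustrative only.

```python
# Hypothetical global EMP relation; "loc" is the employee's working location.
EMP = [
    {"eno": "E1", "name": "Ann", "loc": "New York"},
    {"eno": "E2", "name": "Bob", "loc": "Chicago"},
    {"eno": "E3", "name": "Cho", "loc": "Montreal"},
    {"eno": "E4", "name": "Dee", "loc": "Chicago"},
]

LOCATIONS = ["New York", "Chicago", "Montreal", "Phoenix", "Edmonton", "San Francisco"]


def fragment_by_location(rows, locations):
    """Horizontal fragmentation: one fragment per predicate loc = L."""
    return {loc: [r for r in rows if r["loc"] == loc] for loc in locations}


def reconstruct(fragments):
    """Reconstruction of the global relation as the union of all fragments."""
    return [r for frag in fragments.values() for r in frag]


fragments = fragment_by_location(EMP, LOCATIONS)

# Completeness and reconstruction: every tuple of EMP appears in some fragment.
assert sorted(r["eno"] for r in reconstruct(fragments)) == sorted(r["eno"] for r in EMP)

# Disjointness: no tuple appears in more than one fragment.
enos = [r["eno"] for frag in fragments.values() for r in frag]
assert len(enos) == len(set(enos))
```

Each fragment (for example `fragments["Chicago"]`) would then be allocated to the site at that location, so most employee lookups stay local.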

Promises of Distributed DBMS
- Transparent management of distributed, fragmented, and replicated data
- Improved reliability and availability through distributed transactions
- Improved performance
- Higher system extendibility

Transparency

Transparency refers to the separation of the higher-level semantics of a system from lower-level implementation details.

The idea extends from data independence in a centralized DBMS to fragmentation transparency in a DDBMS.

Issues:
- Who should provide transparency?
- What is the state of the art in the industry?

Improved Reliability

A distributed DBMS can use replicated components to eliminate single points of failure.
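Why replication removes the single point of failure can be made concrete with a small calculation. Assuming, as a simplification, that each site is up independently with probability p, a data item stored at n sites is reachable with probability 1 - (1 - p)^n. The function name below is illustrative.

```python
def availability(p_up, n_copies):
    """Probability that at least one of n replicas is reachable,
    assuming each site is up independently with probability p_up."""
    if not 0.0 <= p_up <= 1.0 or n_copies < 1:
        raise ValueError("need 0 <= p_up <= 1 and n_copies >= 1")
    # The item is unavailable only if every copy is down at the same time.
    return 1.0 - (1.0 - p_up) ** n_copies
```

With p = 0.95, one copy gives 0.95, two copies about 0.9975, and three copies about 0.999875. Real site failures are often correlated (shared network links, power), so these figures are an upper bound rather than a guarantee.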

Users can still access part of the distributed database, with "proper care", even though some of the data is unreachable.

Distributed transactions help maintain a consistent database state even when failures occur.
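Distributed transactions usually rely on an atomic commitment protocol so that all sites either commit or abort together. The slides do not name one; the sketch below shows the classic two-phase commit (2PC) under simplifying assumptions: no timeouts, no coordinator failure, and an in-memory `log` list instead of forced writes to stable storage. `Participant` and `two_phase_commit` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """A hypothetical site taking part in a distributed transaction."""
    name: str
    can_commit: bool = True   # whether this site's local work succeeded
    state: str = "active"
    log: list = field(default_factory=list)

    def prepare(self):
        # Phase 1: vote. A YES vote is a promise to commit if told to.
        self.state = "ready" if self.can_commit else "aborted"
        self.log.append(self.state)
        return self.can_commit

    def finish(self, decision):
        # Phase 2: apply the coordinator's global decision.
        self.state = decision
        self.log.append(decision)


def two_phase_commit(participants):
    """Coordinator: commit only if every participant votes YES, else abort everywhere."""
    votes = [p.prepare() for p in participants]
    decision = "committed" if all(votes) else "aborted"
    for p in participants:
        p.finish(decision)
    return decision
```

A single NO vote (for example `Participant("Montreal", can_commit=False)`) makes every site abort, so no site is left with a partial update.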

Improved Performance

Since each site handles only a portion of the database, contention for CPU and I/O resources is less severe than in a centralized system. Data localization reduces communication overhead.

The inherent parallelism of distributed systems may be exploited:
- inter-query parallelism
- intra-query parallelism
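Intra-query parallelism means splitting one query into sub-queries that run on several fragments at once. In the sketch below, threads stand in for remote sites (in a real system each sub-query would run on a separate machine); `FRAGMENTS`, `local_select`, and `parallel_select` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fragments of one relation, each stored at a different site.
FRAGMENTS = {
    "site1": [{"part": "bolt", "qty": 120}, {"part": "nut", "qty": 15}],
    "site2": [{"part": "gear", "qty": 40}],
    "site3": [{"part": "axle", "qty": 200}, {"part": "pin", "qty": 5}],
}


def local_select(site, min_qty):
    """Sub-query executed at one site: rows with qty >= min_qty."""
    return [row for row in FRAGMENTS[site] if row["qty"] >= min_qty]


def parallel_select(min_qty):
    """Run the same selection on every fragment concurrently, then union the results."""
    with ThreadPoolExecutor(max_workers=len(FRAGMENTS)) as pool:
        # pool.map returns results in the order of the input sites.
        partials = pool.map(lambda site: local_select(site, min_qty), FRAGMENTS)
        return [row for part in partials for row in part]
```

`parallel_select(100)` returns the `bolt` and `axle` rows; each site scans only its own fragment, which is also why contention at any single site drops.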

Performance models are not sufficiently developed.

Easier System Expansion

Ability to add new sites, data, and users over time without major restructuring.

Huge centralized database systems (mainframes) are history (almost!).

The PC revolution (Compaq buying Digital, 1998) will create natural distributed processing environments.

New applications (such as supply chain management) are naturally distributed; centralized systems will just not work.

Disadvantages of DDBMSs
- Lack of experience: no truly distributed database systems in operation
- Complexity: DDBMS problems are inherently more complex than centralized DBMS ones
- Cost: more hardware, software, and people costs
- Distribution of control: problems of synchronization and coordination to maintain data consistency
- Security: database security plus network security
- Difficult to convert: no tools to convert centralized DBMSs to DDBMSs

Complicating Factors

Data may be replicated in a distributed environment; consequently, the DDBMS is responsible for:
- choosing one of the stored copies of the requested data for access in case of retrievals
- making sure that the effect of an update is reflected on each and every copy of that data item

If there is a site or link failure while an update is being executed, the DDBMS must make sure that the effects will be reflected on the data residing at the failing or unreachable sites as soon as the system recovers from the failure.
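The three duties above (read one copy, update every copy, catch up failed sites on recovery) can be sketched as a simplified read-one/write-all-available scheme. This is a minimal illustration, assuming a single data item, no concurrent transactions, and a clean distinction between "down" and "up" (real network partitions are harder); `ReplicatedItem` and its methods are hypothetical.

```python
class ReplicatedItem:
    """One data item replicated at several hypothetical sites."""

    def __init__(self, sites, value):
        self.copies = {s: value for s in sites}  # site -> stored value
        self.up = {s: True for s in sites}       # site -> currently reachable?
        self.pending = {s: [] for s in sites}    # site -> updates it missed

    def read(self):
        # Retrieval: any one reachable copy is enough.
        for site, alive in self.up.items():
            if alive:
                return self.copies[site]
        raise RuntimeError("no copy reachable")

    def write(self, value):
        if not any(self.up.values()):
            raise RuntimeError("no copy reachable")
        # Update: apply to every reachable copy, queue it for the others.
        for site, alive in self.up.items():
            if alive:
                self.copies[site] = value
            else:
                self.pending[site].append(value)

    def fail(self, site):
        self.up[site] = False

    def recover(self, site):
        # On recovery, replay the missed updates in their original order.
        for value in self.pending[site]:
            self.copies[site] = value
        self.pending[site].clear()
        self.up[site] = True
```

If Phoenix fails, a write of 20 reaches New York and Chicago immediately and reaches Phoenix only when `recover("Phoenix")` runs, which is exactly the obligation the slide places on the DDBMS.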

Complicating Factors

Maintaining consistency of distributed/replicated data.

Since no site can have instantaneous information about the actions currently being carried out at other sites, synchronizing transactions across multiple sites is harder than in a centralized system.

Distributed DBMS Issues
- Distributed Database Design
- Distributed Query Processing
- Distributed Directory Management
- Distributed Concurrency Control
- Distributed Deadlock Management
- Reliability of Distributed Databases
- Operating Systems Support
- Heterogeneous Databases

Distributed Database Design

The problem is how the database and the applications that run against it should be placed across the sites.

The two fundamental design issues are fragmentation (the separation of the database into partitions called fragments) and allocation (distribution), that is, the optimum distribution of fragments. The general problem is NP-hard.
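To see why allocation becomes a search problem, the sketch below enumerates every assignment of fragments to sites and keeps the one with the fewest remote accesses. The access counts, the one-fragment-per-site capacity limit, and the cost model (count of remote accesses, no replication) are all hypothetical assumptions chosen to keep the example small.

```python
from itertools import product

# Hypothetical workload: ACCESSES[site][fragment] = number of accesses.
ACCESSES = {
    "NY":      {"F1": 50, "F2": 10, "F3": 5},
    "Chicago": {"F1": 20, "F2": 60, "F3": 45},
    "Phoenix": {"F1": 5,  "F2": 5,  "F3": 40},
}
CAPACITY = 1  # hypothetical: each site can store at most one fragment


def remote_cost(allocation):
    """Total number of accesses that must cross the network."""
    return sum(
        count
        for site, row in ACCESSES.items()
        for frag, count in row.items()
        if allocation[frag] != site
    )


def best_allocation():
    """Exhaustive search over all |sites| ** |fragments| allocations."""
    sites = list(ACCESSES)
    frags = list(next(iter(ACCESSES.values())))
    best, best_cost = None, float("inf")
    for choice in product(sites, repeat=len(frags)):
        if any(choice.count(s) > CAPACITY for s in sites):
            continue  # violates a site's storage capacity
        allocation = dict(zip(frags, choice))
        cost = remote_cost(allocation)
        if cost < best_cost:
            best, best_cost = allocation, cost
    return best, best_cost
```

Here the capacity limit matters: placing each fragment at its busiest site would put both F2 and F3 in Chicago. The search checks only 27 candidates, but with m sites and n fragments it checks m^n, which is why practical designs fall back on heuristics.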

Distributed Query Processing

Query processing deals with designing algorithms that analyze queries and convert them into a series of data manipulation operations.

The problem is how to decide on a strategy for executing each query over the network in the most cost-effective way, however cost is defined. The objective is to optimize execution so that the inherent parallelism of the system is used to improve the performance of executing the transaction.
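One common way to define cost is the number of bytes shipped between sites. The sketch below picks where to execute a join of two relations stored at different sites, assuming hypothetical relation sizes, a result that must end up at a third site, and free local processing; a real optimizer would also weigh semijoins, CPU and I/O costs, and response time.

```python
# Hypothetical sizes in bytes and placement; the result is needed at site3.
SIZE = {"EMP": 400_000, "PROJ": 100_000, "RESULT": 50_000}
LOCATION = {"EMP": "site1", "PROJ": "site2"}
RESULT_SITE = "site3"


def transfer_cost(join_site):
    """Bytes shipped if the join EMP x PROJ is executed at join_site."""
    # Ship every operand that is not already stored at the join site.
    shipped = sum(SIZE[r] for r, site in LOCATION.items() if site != join_site)
    if join_site != RESULT_SITE:
        shipped += SIZE["RESULT"]  # send the join result to the requesting site
    return shipped


def choose_join_site():
    """Pick the candidate site with the lowest transfer cost."""
    candidates = sorted(set(LOCATION.values()) | {RESULT_SITE})
    return min(candidates, key=transfer_cost)
```

With these numbers, joining at `site1` ships 150,000 bytes (PROJ in, result out), against 450,000 at `site2` and 500,000 at `site3`, so moving the smaller relation wins.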

Distributed Directory Management

A directory contains information (such as descriptions and locations) about data items in the database.

A directory may be global to the entire DDBMS or local to each site; it may also be distributed, or kept in multiple copies.
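One possible combination of the options above is a single global directory with a local copy-on-demand cache at each site. The sketch below assumes that layout; `GlobalDirectory` and `LocalDirectory` are hypothetical names, and cache invalidation (needed when a data item moves) is deliberately left out.

```python
class GlobalDirectory:
    """Hypothetical global directory: data item -> (description, sites)."""

    def __init__(self):
        self.entries = {}

    def register(self, item, description, sites):
        self.entries[item] = (description, tuple(sites))

    def lookup(self, item):
        return self.entries[item]


class LocalDirectory:
    """Per-site directory that caches entries fetched from the global one."""

    def __init__(self, global_dir):
        self.global_dir = global_dir
        self.cache = {}
        self.misses = 0  # how often the global directory had to be consulted

    def lookup(self, item):
        if item not in self.cache:
            self.misses += 1
            # In practice this is a remote call to the site holding the global directory.
            self.cache[item] = self.global_dir.lookup(item)
        return self.cache[item]
```

Repeated lookups of the same item are then answered locally, trading extra storage and the risk of stale entries for fewer network round trips.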

Distributed Concurrency Control

Concurrency control involves the synchronization of accesses to the distributed database such that the integrity of the database is maintained.

One has to worry not only about the integrity of a single database but also about the consistency of multiple copies of the database (mutual consistency).
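A simple way to protect mutual consistency is to lock every copy of an item before writing any of them. The sketch below assumes exclusive locks only, one lock table per site, and an immediate release after the write (a full scheme such as strict two-phase locking would hold locks until commit); `SiteLockManager` and `write_all_copies` are hypothetical names.

```python
class SiteLockManager:
    """Hypothetical lock table at one site (exclusive locks only)."""

    def __init__(self):
        self.owner = {}  # item -> id of the transaction holding the lock

    def try_lock(self, item, txn):
        if self.owner.get(item, txn) != txn:
            return False  # held by another transaction
        self.owner[item] = txn
        return True

    def unlock(self, item, txn):
        if self.owner.get(item) == txn:
            del self.owner[item]


def write_all_copies(managers, copies, item, txn, value):
    """Lock every copy of `item`, update all of them, then release the locks.
    If any copy is locked by another transaction, release what was taken and
    refuse the write, so the copies never diverge."""
    locked = []
    for site, mgr in managers.items():
        if not mgr.try_lock(item, txn):
            for s in locked:
                managers[s].unlock(item, txn)
            return False
        locked.append(site)
    for site in managers:
        copies[site][item] = value
    for site in locked:
        managers[site].unlock(item, txn)
    return True
```

If another transaction already holds the lock on one copy, the write is refused before any copy changes, so all sites keep the same value.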