CSI5311 Distributed Databases and Transaction Processing

Page 1: CSI5311  Distributed  Databases and Transaction Processing

Distributed DBMS © M. T. Özsu & P. Valduriez Ch.1/1

CSI5311 Distributed Databases and Transaction Processing

Iluju Kiringa

Textbook: T. Özsu and P. Valduriez, Principles of Distributed Database Systems, 3rd edition, Springer 2011

Notes based on those by TO and PV

Page 2: CSI5311  Distributed  Databases and Transaction Processing

Outline

• Introduction
  ➡ What is a distributed DBMS
  ➡ Distributed DBMS Architecture
• Background
• Distributed Database Design
• Database Integration
• Semantic Data Control
• Distributed Query Processing
• Multidatabase Query Processing
• Distributed Transaction Management
• Data Replication
• Parallel Database Systems
• Distributed Object DBMS
• Peer-to-Peer Data Management
• Web Data Management
• Current Issues: Streams and Clouds

Page 3: CSI5311  Distributed  Databases and Transaction Processing

File Systems

[Figure: program 1, program 2, and program 3, each with its own data description (1, 2, 3), operating on File 1, File 2, and File 3.]

Page 4: CSI5311  Distributed  Databases and Transaction Processing

Database Management

[Figure: application programs 1, 2, and 3 (each with data semantics) access a shared database through the DBMS, which handles description, manipulation, and control.]

Page 5: CSI5311  Distributed  Databases and Transaction Processing

Motivation

[Figure: Database Technology (integration) and Computer Networks (distribution) combine into Distributed Database Systems (integration).]

integration ≠ centralization

Page 6: CSI5311  Distributed  Databases and Transaction Processing

Distributed Computing

• A number of autonomous processing elements (not necessarily homogeneous) that are interconnected by a computer network and that cooperate in performing their assigned tasks.
• What is being distributed?
  ➡ Processing logic
  ➡ Function
  ➡ Data
  ➡ Control

Page 7: CSI5311  Distributed  Databases and Transaction Processing

What is a Distributed Database System?

A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.

A distributed database management system (D–DBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users.

Distributed database system (DDBS) = DDB + D–DBMS

Page 8: CSI5311  Distributed  Databases and Transaction Processing

What is not a DDBS?

• A timesharing computer system
• A loosely or tightly coupled multiprocessor system
• A database system which resides at one of the nodes of a network of computers; this is a centralized database on a network node

Page 9: CSI5311  Distributed  Databases and Transaction Processing

Centralized DBMS on a Network

[Figure: Site 1 through Site 5 connected by a communication network.]

Page 10: CSI5311  Distributed  Databases and Transaction Processing

Distributed DBMS Environment

[Figure: Site 1 through Site 5 connected by a communication network.]

Page 11: CSI5311  Distributed  Databases and Transaction Processing

Implicit Assumptions

• Data stored at a number of sites; each site logically consists of a single processor.
• Processors at different sites are interconnected by a computer network, not a multiprocessor system
  ➡ Parallel database systems
• Distributed database is a database, not a collection of files; data logically related as exhibited in the users' access patterns
  ➡ Relational data model
• D-DBMS is a full-fledged DBMS
  ➡ Not remote file system, not a TP system

Page 12: CSI5311  Distributed  Databases and Transaction Processing

Data Delivery Alternatives

• Delivery modes
  ➡ Pull-only
  ➡ Push-only
  ➡ Hybrid
• Frequency
  ➡ Periodic
  ➡ Conditional
  ➡ Ad-hoc or irregular
• Communication methods
  ➡ Unicast
  ➡ One-to-many
• Note: not all combinations make sense

Page 13: CSI5311  Distributed  Databases and Transaction Processing

Distributed DBMS Promises

• Transparent management of distributed, fragmented, and replicated data
• Improved reliability/availability through distributed transactions
• Improved performance
• Easier and more economical system expansion

Page 14: CSI5311  Distributed  Databases and Transaction Processing

Transparency

• Transparency is the separation of the higher level semantics of a system from the lower level implementation issues.
• Fundamental issue is to provide data independence in the distributed environment
  ➡ Network (distribution) transparency
  ➡ Replication transparency
  ➡ Fragmentation transparency
    ✦ horizontal fragmentation: selection
    ✦ vertical fragmentation: projection
    ✦ hybrid
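The link between fragmentation and the relational operators can be sketched in a few lines of Python. This is only an illustration: the sample EMP rows, the split predicate, and the helper names (`select`, `project`, `union`, `join_on_key`) are assumptions made for the example, not part of the notes.

```python
# Toy EMP relation; rows and the fragmentation predicate are made up for illustration.
EMP = [
    {"ENO": "E1", "ENAME": "J. Doe", "TITLE": "Elect. Eng."},
    {"ENO": "E2", "ENAME": "M. Smith", "TITLE": "Syst. Anal."},
    {"ENO": "E3", "ENAME": "A. Lee", "TITLE": "Mech. Eng."},
]

def select(relation, predicate):
    """Horizontal fragmentation: keep whole rows that satisfy a predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """Vertical fragmentation: keep only some columns of every row."""
    return [{a: row[a] for a in attributes} for row in relation]

def union(*fragments):
    """Rebuild a relation from horizontal fragments."""
    return sorted((row for f in fragments for row in f), key=lambda r: r["ENO"])

def join_on_key(left, right, key="ENO"):
    """Rebuild a relation from vertical fragments that share the key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]

# Horizontal fragments: a selection and its complement.
emp_h1 = select(EMP, lambda r: r["ENO"] <= "E2")
emp_h2 = select(EMP, lambda r: r["ENO"] > "E2")

# Vertical fragments: each projection repeats the key ENO so rows can be rejoined.
emp_v1 = project(EMP, ["ENO", "ENAME"])
emp_v2 = project(EMP, ["ENO", "TITLE"])

reconstructed_h = union(emp_h1, emp_h2)
reconstructed_v = join_on_key(emp_v1, emp_v2)
```

Under fragmentation transparency the user would query EMP directly, and the system would perform the union or join behind the scenes.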

Page 15: CSI5311  Distributed  Databases and Transaction Processing

Example

Page 16: CSI5311  Distributed  Databases and Transaction Processing

Transparent Access

SELECT ENAME, SAL
FROM   EMP, ASG, PAY
WHERE  DUR > 12
AND    EMP.ENO = ASG.ENO
AND    PAY.TITLE = EMP.TITLE

[Figure: sites Boston, Montreal, Paris, New York, and Tokyo connected by a communication network. Each site stores some fragments of the data, such as Paris employees, Montreal assignments, Boston projects, or New York projects with budget > 200000.]
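The query above names only the logical relations EMP, ASG, and PAY, never a site. A minimal sketch of this user view, using Python's standard `sqlite3` module, is given below. It runs against a single in-memory database, so it illustrates what the user writes rather than how a distributed DBMS would execute it. The column lists beyond those used in the query (for instance PNO) and all sample rows are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMP (ENO TEXT, ENAME TEXT, TITLE TEXT);
    CREATE TABLE ASG (ENO TEXT, PNO TEXT, DUR INTEGER);
    CREATE TABLE PAY (TITLE TEXT, SAL INTEGER);
    -- Hypothetical sample rows
    INSERT INTO EMP VALUES ('E1', 'J. Doe', 'Elect. Eng.'), ('E2', 'M. Smith', 'Syst. Anal.');
    INSERT INTO ASG VALUES ('E1', 'P1', 12), ('E2', 'P1', 24);
    INSERT INTO PAY VALUES ('Elect. Eng.', 40000), ('Syst. Anal.', 34000);
""")

# The same text as on the slide: no site names, no fragment names.
rows = conn.execute("""
    SELECT ENAME, SAL
    FROM EMP, ASG, PAY
    WHERE DUR > 12
      AND EMP.ENO = ASG.ENO
      AND PAY.TITLE = EMP.TITLE
""").fetchall()
conn.close()
```

With these sample rows only the employee whose assignment lasts longer than 12 qualifies.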

Page 17: CSI5311  Distributed  Databases and Transaction Processing

Distributed Database - User View

Distributed Database

Page 18: CSI5311  Distributed  Databases and Transaction Processing

Distributed DBMS - Reality

[Figure: several sites, each running DBMS software and serving user queries or user applications, connected through a communication subsystem.]

Page 19: CSI5311  Distributed  Databases and Transaction Processing

Types of Transparency

• Data independence
• Network transparency (or distribution transparency)
  ➡ Location transparency
  ➡ Fragmentation transparency
• Replication transparency
• Fragmentation transparency

Page 20: CSI5311  Distributed  Databases and Transaction Processing

Reliability Through Transactions

• Replicated components and data should make distributed DBMS more reliable.
• Distributed transactions provide
  ➡ Concurrency transparency
  ➡ Failure atomicity
• Distributed transaction support requires implementation of
  ➡ Distributed concurrency control protocols
  ➡ Commit protocols
• Data replication
  ➡ Great for read-intensive workloads, problematic for updates
  ➡ Replication protocols
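Two-phase commit is one widely used commit protocol for achieving failure atomicity across sites. The sketch below shows only its core decision rule, assuming reliable in-process calls. The `Participant` class and the site names are hypothetical, and real implementations also need logging, timeouts, and recovery, which are omitted here.

```python
class Participant:
    """A site taking part in a distributed transaction (hypothetical class)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "INITIAL"

    def prepare(self):
        # Phase 1: vote. A site that votes yes promises it can commit later.
        self.state = "READY" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self):
        self.state = "COMMITTED"

    def abort(self):
        self.state = "ABORTED"


def two_phase_commit(participants):
    # Phase 1 (voting): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): commit only if all voted yes, otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return "COMMIT"
    for p in participants:
        p.abort()
    return "ABORT"


ok_sites = [Participant("Paris"), Participant("Boston")]
outcome_ok = two_phase_commit(ok_sites)

bad_sites = [Participant("Paris"), Participant("Tokyo", can_commit=False)]
outcome_bad = two_phase_commit(bad_sites)
```

Either every site ends in COMMITTED or every site ends in ABORTED, which is the all-or-nothing behaviour that failure atomicity asks for.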

Page 21: CSI5311  Distributed  Databases and Transaction Processing

Potentially Improved Performance

• Proximity of data to its points of use
  ➡ Requires some support for fragmentation and replication
• Parallelism in execution
  ➡ Inter-query parallelism
  ➡ Intra-query parallelism

Page 22: CSI5311  Distributed  Databases and Transaction Processing

Parallelism Requirements

• Have as much of the data required by each application at the site where the application executes
  ➡ Full replication
• How about updates?
  ➡ Mutual consistency
  ➡ Freshness of copies
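The tension between replication and updates can be made concrete with a small sketch. It contrasts a write-all update, which keeps copies mutually consistent, with an update applied to one copy only, which leaves the other copies stale. The `ReplicatedItem` class, the site names, and the values are assumptions for illustration.

```python
class ReplicatedItem:
    """One logical data item with a copy at each site (hypothetical class)."""

    def __init__(self, sites, value):
        self.copies = {site: value for site in sites}

    def read(self, site):
        # Any single copy can serve a read, which is why replication helps reads.
        return self.copies[site]

    def write_all(self, value):
        # Update every copy, so the copies stay mutually consistent.
        for site in self.copies:
            self.copies[site] = value

    def write_one(self, site, value):
        # Update a single copy only; the other copies are stale until refreshed.
        self.copies[site] = value

    def mutually_consistent(self):
        return len(set(self.copies.values())) == 1


item = ReplicatedItem(["Paris", "Boston"], 100)

item.write_all(120)
consistent_after_write_all = item.mutually_consistent()

item.write_one("Paris", 150)
consistent_after_write_one = item.mutually_consistent()
stale_value_at_boston = item.read("Boston")
```

Writing to all copies costs more per update, while writing to one copy is cheaper but trades away freshness at the other sites.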

Page 23: CSI5311  Distributed  Databases and Transaction Processing

System Expansion

• Issue is database scaling
• Emergence of microprocessor and workstation technologies
  ➡ Demise of Grosch's law
  ➡ Client-server model of computing
• Data communication cost vs telecommunication cost

Page 24: CSI5311  Distributed  Databases and Transaction Processing

Distributed DBMS Issues

• Distributed Database Design
  ➡ How to distribute the database
  ➡ Replicated & non-replicated database distribution
  ➡ A related problem in directory management
• Query Processing
  ➡ Convert user transactions to data manipulation instructions
  ➡ Optimization problem
    ✦ min{cost = data transmission + local processing}
  ➡ General formulation is NP-hard
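The objective min{cost = data transmission + local processing} can be illustrated with a toy example that picks the site at which to join two relations. All numbers, site assignments, and the linear cost model are assumptions. Exhaustive enumeration, as done here, is only feasible for tiny inputs, which is consistent with the general formulation being NP-hard.

```python
# Hypothetical placement and sizes (in tuples) of two relations.
RELATION_SITE = {"EMP": "Paris", "ASG": "Montreal"}
RELATION_SIZE = {"EMP": 400, "ASG": 1000}
SITES = ["Paris", "Montreal", "Boston"]

TRANSFER_COST_PER_TUPLE = 10  # assumed unit cost of shipping one tuple
PROCESS_COST_PER_TUPLE = 1    # assumed unit cost of processing one tuple locally


def strategy_cost(join_site):
    """Cost of shipping every non-local relation to join_site and joining there."""
    transmission = sum(
        RELATION_SIZE[rel] * TRANSFER_COST_PER_TUPLE
        for rel, site in RELATION_SITE.items()
        if site != join_site
    )
    local_processing = sum(RELATION_SIZE.values()) * PROCESS_COST_PER_TUPLE
    return transmission + local_processing


# Enumerate every candidate strategy and keep the cheapest one.
costs = {site: strategy_cost(site) for site in SITES}
best_site = min(costs, key=costs.get)
```

Under these assumed numbers the cheapest plan ships the smaller relation to the site that already holds the larger one.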

Page 25: CSI5311  Distributed  Databases and Transaction Processing

Distributed DBMS Issues

• Concurrency Control
  ➡ Synchronization of concurrent accesses
  ➡ Consistency and isolation of transactions' effects
  ➡ Deadlock management
• Reliability
  ➡ How to make the system resilient to failures
  ➡ Atomicity and durability
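For the deadlock management item, one common detection technique is to look for a cycle in a wait-for graph, where an edge from T1 to T2 means that transaction T1 waits for a lock held by T2. The sketch below assumes the graph has already been assembled in one place. In a distributed setting the edges would first have to be collected from several sites, which this example does not model. The function name and transaction labels are hypothetical.

```python
def has_deadlock(wait_for):
    """Return True if the wait-for graph has a cycle.

    wait_for maps each transaction to the set of transactions it waits for.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}

    def visit(txn):
        color[txn] = GREY
        for other in wait_for.get(txn, ()):
            state = color.get(other, WHITE)
            if state == GREY:
                return True  # back edge: the current path loops onto itself
            if state == WHITE and visit(other):
                return True
        color[txn] = BLACK
        return False

    return any(color.get(t, WHITE) == WHITE and visit(t) for t in wait_for)


deadlocked = has_deadlock({"T1": {"T2"}, "T2": {"T1"}})
not_deadlocked = has_deadlock({"T1": {"T2"}, "T2": {"T3"}})
```

Once a cycle is found, a system would typically abort one transaction on the cycle to break it.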

Page 26: CSI5311  Distributed  Databases and Transaction Processing

Relationship Between Issues

[Figure: interrelated issues: Directory Management, Distribution Design, Query Processing, Concurrency Control, Deadlock Management, and Reliability.]

Page 27: CSI5311  Distributed  Databases and Transaction Processing

Related Issues

• Operating System Support
  ➡ Operating system with proper support for database operations
  ➡ Dichotomy between general purpose processing requirements and database processing requirements
• Open Systems and Interoperability
  ➡ Distributed Multidatabase Systems
  ➡ More probable scenario
  ➡ Parallel issues

Page 28: CSI5311  Distributed  Databases and Transaction Processing

Architecture

• Defines the structure of the system
  ➡ components identified
  ➡ functions of each component defined
  ➡ interrelationships and interactions between components defined

Page 29: CSI5311  Distributed  Databases and Transaction Processing

ANSI/SPARC Architecture

[Figure: users work with external views (External Schema), which map to the conceptual view (Conceptual Schema), which in turn maps to the internal view (Internal Schema).]

Page 30: CSI5311  Distributed  Databases and Transaction Processing

Generic DBMS Architecture

Page 31: CSI5311  Distributed  Databases and Transaction Processing

DBMS Implementation Alternatives

Page 32: CSI5311  Distributed  Databases and Transaction Processing

Dimensions of the Problem

• Distribution
  ➡ Whether the components of the system are located on the same machine or not
• Heterogeneity
  ➡ Various levels (hardware, communications, operating system)
  ➡ DBMS important one
    ✦ data model, query language, transaction management algorithms
• Autonomy
  ➡ Most troublesome
  ➡ Various versions
    ✦ Design autonomy: Ability of a component DBMS to decide on issues related to its own design.
    ✦ Communication autonomy: Ability of a component DBMS to decide whether and how to communicate with other DBMSs.
    ✦ Execution autonomy: Ability of a component DBMS to execute local operations in any manner it wants to.

Page 33: CSI5311  Distributed  Databases and Transaction Processing

Client/Server Architecture

Page 34: CSI5311  Distributed  Databases and Transaction Processing

Advantages of Client-Server Architectures

• More efficient division of labor
• Horizontal and vertical scaling of resources
• Better price/performance on client machines
• Ability to use familiar tools on client machines
• Client access to remote data (via standards)
• Full DBMS functionality provided to client workstations
• Overall better system price/performance

Page 35: CSI5311  Distributed  Databases and Transaction Processing

Database Server

Page 36: CSI5311  Distributed  Databases and Transaction Processing

Distributed Database Servers

Page 37: CSI5311  Distributed  Databases and Transaction Processing

Datalogical Distributed DBMS Architecture

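[Figure: external schemas ES1, ES2, ..., ESn are defined over the global conceptual schema (GCS), which maps to the local conceptual schemas LCS1, LCS2, ..., LCSn, each with its local internal schema LIS1, LIS2, ..., LISn.]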

Page 38: CSI5311  Distributed  Databases and Transaction Processing

Peer-to-Peer Component Architecture

[Figure: component architecture]

• USER: sends user requests, receives system responses
• USER PROCESSOR: User Interface Handler, Semantic Data Controller, Global Query Optimizer, Global Execution Monitor (using the External Schema, Global Conceptual Schema, and GD/D)
• DATA PROCESSOR: Local Query Processor, Local Recovery Manager, Runtime Support Processor (using the Local Conceptual Schema, System Log, and Local Internal Schema)
• Database

Page 39: CSI5311  Distributed  Databases and Transaction Processing

Datalogical Multi-DBMS Architecture

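[Figure: global external schemas GES1, GES2, ..., GESn are defined over the global conceptual schema (GCS), which maps to the local conceptual schemas LCS1, LCS2, ..., LCSn and their local internal schemas LIS1, LIS2, ..., LISn. Local external schemas LES11 ... LES1n through LESn1 ... LESnm are defined over the local conceptual schemas.]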

Page 40: CSI5311  Distributed  Databases and Transaction Processing

MDBS Components & Execution

[Figure: a global user request enters the Multi-DBMS Layer, which issues global subrequests to DBMS1, DBMS2, and DBMS3; local user requests go directly to individual DBMSs.]

Page 41: CSI5311  Distributed  Databases and Transaction Processing

Mediator/Wrapper Architecture