Upload
angelica-harris
View
222
Download
2
Embed Size (px)
Citation preview
Distributed DB 12. 1
CSE2132 Database Systems
Week 12 Lecture
Distributed Database
Distributed DB 12. 2
Evolution of Distributed DBMS CENTRALISED systems - all system components located on a single
computer• data• DBMS• secondary storage (disks etc)
– Access via serially connected 'dumb terminals' - all processing occurs at central site
1980s changes:– business operations decentralised geographically, need to become
'lean-and-meaner, quick reacting, dispersed operations'– technological change low cost powerful computing platforms
led to DISTRIBUTED SYSTEMS
Distributed DB 12. 3
Distributed Data Base Management Systems system components distributed over multiple sites, interconnected via
communication system (network) managed by Distributed Data Base Management System (DDBMS) DDBMS advantages:
– data located near the greatest demand site
– faster data access - desired data subset locally available
– faster data processing - system processing load spread out over multiple cpu's
– growth facilitation - easy to add new sites to network
– less danger of single point failure DDBMS disadvantages:
– complexity of management and control
– security - weaker due to distribution (thus more people involved) and network traffic
– lack of standards - many communication protocols exist eg. tcp/ip netbios DECnet etc.
Distributed DB 12. 4
Distributed Processing database's logical processing shared among multiple cpu's actual database resides on a single computer
Distributed DB 12. 5
Distributed Database actual database stored over two or more independent cpu's - each part
know as a database fragment
Distributed DB 12. 6
Distribution Options
Two important components of a DDBMS are– Transaction Processor (TP) - receives and processes the applications
data request– Data Processor (DP) - stores and retrieves data located at the site
Data And Processing Distribution Options:
– Single site processing, single site data
• all processing done on single cpu (host computer)• all data stored on host computers local disk/s• eg. traditional mainframe / minicomputer DBMS with dumb terminals
or single user microcomputer DBMS
Distributed DB 12. 7
Distribution Options– Multi site processing, single site data
• multiple processes run on different computers
• all data stored on single computers local disk/s
• e.g. LAN file server
TP acts as redirector - routes all network data requests to file server
file server appears to end user as a hard disk eg. F:
all data selection, search and update occur at workstation entire file must be transported across network
inefficient, costly (communication)
Distributed DB 12. 8
Distribution Options (1) Client-server is a term used very loosely - difficult to clearly define
Client-server, similar to file server except database processing occurs at the server site - know as a database server. Client server is about a split in processing rather than a split in data. A 3 tier architecture is often employed. This requires communications middleware to resolve translation issues by the use of an agreed protocol(e.g. TCP/IP, IPX/SPX or NetBEUI). Client-server approaches overlap and and are used in conjunction with distributed database.
Multiple site processing, multiple site data
This describes a fully distributed DBMS. They can then be classified as homogeneous or heterogeneous.
Distributed DB 12. 9
Distribution Options (2)
• homogeneous DDBMS
The same DBMS running at each site. While there is much complexity to deal with the task of accessing data over many sites is simpler than in the heterogeneous case.
• heterogeneous DDBMS
Will support different DBMSs and even different DBMS models (relational, network, hierarchical) at each site. Such implementations of a DDBMS operate under certain restrictions. (e.g. the access to other databasesis read only or only other relational databases can be accessed.)
Distributed DB 12. 10
Dates Distributed Database RulesDISTRIBUTED DATABASE SHOULD
LOOK EXACTLY LIKE A NON DISTRIBUTED 'LOCAL' DATABASE
LOCAL ACCESS TO LOCAL DATA
Many vendors claim to be delivering distributed database applications - need criteria:
DATES 12 RULES
– 0 - TO USER A DISTRIBUTED DATABASE SHOULD LOOK NO DIFFERENT TO A NON DDB - Distribution transparency, database treated as single logical database
– 1 - LOCAL AUTONOMY, local data owned and managed locally
– 2 - NO RELIANCE ON CENTRAL SITE
– 3 - CONTINUOUS OPERATION, failure transparency - site independence, even in the event of a node failure the system continues to operate
– 4 - LOCATION INDEPENDENCE, location transparency
– 5 - FRAGMENTATION INDEPENDENCE, fragmentation transparency
Distributed DB 12. 11
Dates rules continued
– 6 - REPLICATION INDEPENDENCE
– 7 - DISTRIBUTED QUERY PROCESSING, multiple site queries
– 8 - DISTRIBUTED TRANSACTION MANAGEMENT, transaction transparency - multiple site updates
– 9 - HARDWARE INDEPENDENCE
– 10 - OPERATING SYSTEM INDEPENDENCE
– 11 - COMMUNICATION NETWORK INDEPENDENCE
– 12 - DATABASE INDEPENDENCE
Distributed DB 12. 12
Levels of Distribution Transparency Can be used as a method of
classification by determining level of transparency supported by DDBMS at highest level:
– FRAGMENTATION transparency
• No need to specify fragment names or locations
• select *
from employee
where dob < 01/01/40
– LOCATION transparency
• Specify fragment names but not locations
select *
from e1
where dob < 01/01/40;
UNION
select *
from e2
where dob < 01/01/40;
UNION
select *
from e3
where dob < 01/01/40;
Distributed DB 12. 13
Levels of Distribution Transparency– local MAPPING transparency (lowest level)
Need to specify both fragment and location (using pseudo-SQL)
select * from e1 node melbourne
where dob < 01/01/40;
UNION
select * from e2 node sydney
where dob < 01/01/40;
UNION
select * from e3 node adelaide
where dob < 01/01/40;
Distributed DB 12. 14
DDBMS Operations Join operation (most vendors supply)
– Easier to achieve
– Query optimisation critical Update operation
– eg debit / credit of two accounts at different sites
– more difficult to manage
– Need sophisticated transaction management - most popular strategy : TWO PHASE COMMIT
Two Phase Commit requires three operations:
– DO - performs operation & records before and after image in transaction log
– UNDO - undoes an operation using log entries created in DO
– REDO - redoes an operation using log entries created in DO
Distributed DB 12. 15
Two Phase Commit Site originating transaction
(coordinator) sends request to sites (subordinates), each site processes sub transaction but does not commit.
Phase 1 - Preparation:
– 1. coordinator sends prepare to commit to all subordinates
– 2. subordinates receive message, write log entries and reply to coordinator - READY to COMMIT or NOT READY
– 3. coordinator checks all nodes ready to commit - if not broadcasts an ABORT, if all ready:
Phase 2 - Final COMMIT
– 1. coordinator broadcasts a commit message to all subordinates and awaits a reply
– 2. subordinate receives commit and updates database
– 3. subordinates reply with COMMITTED or NOT COMMITTED to the coordinator
– If any subordinates did not commit, coordinator sends ABORT forcing an UNDO
Distributed DB 12. 16
DISTRIBUTION STRATEGIES
The overall data model - the company view Distribution principles
– Examine geography / frequency of access
– Guiding principle -minimise network traffic and communication costs
Distributed DB 12. 17
The sales team– Situated in Sydney– Need most access to customer/order/order-line/product
The supply branch– Situated in Melbourne– Need most access to warehouse/inventory/product
Data model partitioning
What do we do with RELATIONS at the BOUNDARY ?FRAGMENTATION vs REPLICATION
Distributed DB 12. 18
Possible starting points:
– Which site accesses it most?
• Storage at a single site
• Minimises update complexity
– Is there a case for fragmentation?
• Minimises local access time
• Minimises network traffic
– Is there a case for replication?
• Minimises local access time
• Minimises network traffic Fragmentation
– Horizontal fragmentation
• based on SELECTION, eg fragment customer table on customer_city
Distributed DB 12. 19
– Vertical fragmentation
• Based on projection, eg fragment product table on attributes needed by each group
SALES : P_CODE, DESC, UNIT_PRICE
SUPPLY : P_CODE, PACK_SIZE
– Hybrid Fragmentation
• Assume 3 warehouses
Footscray - p_code < 100
Collingwood - p_code = 100
Dandenong - p_code > 100
• further subdivide vertically supply fragment
SELECT P_CODE < 100 --> FOOTSCRAY
SELECT P_CODE = 100 --> COLLINGWOOD
SELECT P_CODE > 100 --> DANDENONG
Distributed DB 12. 20
Replication Efficient retrievals vs
Multi site updating
– Costly, accident prone
Updating techniques
– Conservative
– Don't commit until all sites accept
– Primary node, one site accept updates and broadcast
– Majority voting
– Snapshots, etc
Distributed DB 12. 21
Query Optimization Query - list the supplier numbers for cleveland suppliers of red parts ?
– SUPPLIER(SUPPLIER#, CITY) 10,000 DETROIT
– PART(PART#, COLOUR) 100,000 CHICAGO
– SHIPMENT(SUPPLIER#, PART#) 1,000,000 DETROIT
SELECT
S.SUPPLIER#
FROM
SUPPLIER S, PART P, SHIPMENT H
WHERE
S.SUPPLIER# = H.SUPPLIER# AND
H.PART# = P.PART# AND
P.COLOUR = 'RED'; Time for query varies from 1 second to 2.3 days depending on the Query Plan
selected