Dr Mohamed Osman Hegazi
Distributed Database Systems
Definitions
• Distributed database (DDB): a collection of multiple logically interrelated databases distributed over a computer network.
• Distributed database management system (D-DBMS): the software that permits the management of the DDBS and makes the distribution transparent to the users.
• Distributed database system (DDBS) = DDB + D-DBMS
The two important terms in these definitions are:
- Logically interrelated (the application)
- Distributed over a network
Motivation for Distributed Database
1. The development of computer networks promotes decentralization.
2. In a company, the database organization might reflect the organizational structure, which is distributed into units. Each unit maintains its own database.
3. Sharing of data can be achieved by developing a distributed database system which:
• Makes data accessible by all units
• Stores data close to where it is most frequently used
DDBMS Advantages
• Data are located near the "greatest demand" site
• Faster data access
• Faster data processing
• Growth facilitation
• Improved communications
• Reduced operating costs
• User-friendly interface
• Less danger of a single-point failure
• Processor independence
DDBMS Disadvantages
• Complexity of management and control
• Security
• Lack of standards
• Increased storage requirements
• Greater difficulty in managing the data environment
• Increased training cost
The concept of DDB
A DDBS is not a collection of files that can be individually stored at each node of a computer network. To form a DDBS, files should not only be logically related; there should also be structure among the files, and access should be via a common interface.
Distributed Database Management Systems
An Example
EMP(ENO, ENAME, TITLE)
ASG(ENO, PNO, DUR, RESP)
PROJ(PNO, PNAME, BUDGET)
PAY(TITLE, SAL)
Distributed Query
• If these tables are stored in one place, then we can, for example, use the following query to get the name and the salary of every employee who has worked on an assignment for more than 12 months:
SELECT ENAME, SAL
FROM EMP, ASG, PAY
WHERE ASG.DUR > 12
AND EMP.ENO = ASG.ENO
AND PAY.TITLE = EMP.TITLE
• But if these tables are distributed over different sites, then executing this query requires a lot of processing. The DDBMS does this processing and lets the end user feel like the database's only user (transparency).
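To make the centralized case concrete, here is a minimal sketch that loads the four relations into one in-memory SQLite database and runs the query from this slide. Only the schema and the query come from the slides; the sample rows, the column types, and the function name `long_assignments` are assumptions made for illustration.

```python
import sqlite3

def long_assignments(min_months=12):
    """Run the slide's query against a single, centralized database."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE EMP (ENO TEXT, ENAME TEXT, TITLE TEXT);
        CREATE TABLE ASG (ENO TEXT, PNO TEXT, DUR INTEGER, RESP TEXT);
        CREATE TABLE PROJ (PNO TEXT, PNAME TEXT, BUDGET INTEGER);
        CREATE TABLE PAY (TITLE TEXT, SAL INTEGER);
        -- Hypothetical sample rows, not taken from the slides.
        INSERT INTO EMP VALUES ('E1', 'Ann', 'Engineer'), ('E2', 'Bob', 'Analyst');
        INSERT INTO ASG VALUES ('E1', 'P1', 18, 'Manager'), ('E2', 'P1', 6, 'Tester');
        INSERT INTO PROJ VALUES ('P1', 'Instrumentation', 150000);
        INSERT INTO PAY VALUES ('Engineer', 40000), ('Analyst', 34000);
    """)
    rows = con.execute(
        """SELECT ENAME, SAL
           FROM EMP, ASG, PAY
           WHERE ASG.DUR > ?
             AND EMP.ENO = ASG.ENO
             AND PAY.TITLE = EMP.TITLE""",
        (min_months,),
    ).fetchall()
    con.close()
    return rows
```

In a distributed setting the same SQL text would be submitted unchanged; locating the fragments of EMP, ASG, and PAY and shipping intermediate results between sites is the work the DDBMS hides.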
Distributed Database Transparency
• The concept of a DDB is to fragment the data and store each fragment at its own site (fragmentation).
• Data may be replicated at different sites (replication).
• The DDBMS hides these details and makes the distribution transparent to the users.
Distributed Database Transparency Features
• Distribution transparency
• Transaction transparency
• Failure transparency
• Performance transparency
• Heterogeneity transparency
Distributed DB Design
Top-down approach
• have a database
• how to split and allocate to individual sites
Two issues in top-down design:
- Fragmentation
- Allocation
Multi-databases (or bottom-up)
• combine existing databases
• how to deal with heterogeneity & autonomy
Fragmentation
• Horizontal (splits the tuples of a relation R)
  - Primary: depends on local attributes
  - Derived: depends on a foreign relation
• Vertical (splits the attributes of a relation R)
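The difference between primary and derived horizontal fragmentation can be sketched as follows. The owner relation is already split on its own attribute `loc`; the member relation is then split with a semijoin so that each of its tuples follows the owner tuple it references. The relation `JOBS`, its foreign key `emp_id`, and all rows are hypothetical names and data chosen only for this illustration.

```python
# Owner relation, already fragmented by a predicate on its own attribute loc
# (this is primary horizontal fragmentation).
E_Sa = [{"id": 5, "loc": "Sa"}, {"id": 8, "loc": "Sa"}]
E_Sb = [{"id": 7, "loc": "Sb"}]

# Member relation referencing the owner through the foreign key emp_id.
JOBS = [
    {"emp_id": 5, "project": "P1"},
    {"emp_id": 7, "project": "P2"},
    {"emp_id": 8, "project": "P1"},
]

def semijoin(member, owner_fragment, fk="emp_id", key="id"):
    """Keep the member tuples whose foreign key matches a tuple in the owner fragment."""
    keys = {row[key] for row in owner_fragment}
    return [row for row in member if row[fk] in keys]

# Derived horizontal fragments: defined by the owner's fragments,
# not by a predicate on JOBS itself.
JOBS_Sa = semijoin(JOBS, E_Sa)
JOBS_Sb = semijoin(JOBS, E_Sb)
```

Placing `JOBS_Sa` at the same site as `E_Sa` keeps joins between the two relations local.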
Example
Employee relation E(#, name, loc, sal, …)
• 40% of queries: Qa: select * from E where loc=Sa and …
• 40% of queries: Qb: select * from E where loc=Sb and …
Motivation: two sites, Sa and Sb; Qa runs at Sa and Qb runs at Sb.
Primary horizontal fragmentation
E
#  Name   Loc  Sal
5  Joe    Sa   10
7  Sally  Sb   25
8  Tom    Sa   15
F = {F1, F2}
F1 = σ_{loc=Sa}(E), stored at Sa
#  Name   Loc  Sal
5  Joe    Sa   10
8  Tom    Sa   15
F2 = σ_{loc=Sb}(E), stored at Sb
#  Name   Loc  Sal
7  Sally  Sb   25
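The fragmentation above can be sketched in a few lines: each fragment is a selection on E, and E is recovered as the union of the fragments. The dictionary layout and the helper names `select` and `reconstruct` are assumptions for illustration; the tuples are the ones in the example.

```python
# Employee relation E from the example, one dict per tuple.
E = [
    {"id": 5, "name": "Joe", "loc": "Sa", "sal": 10},
    {"id": 7, "name": "Sally", "loc": "Sb", "sal": 25},
    {"id": 8, "name": "Tom", "loc": "Sa", "sal": 15},
]

def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

# Primary horizontal fragments, defined by predicates on E's own attribute loc.
F1 = select(E, lambda r: r["loc"] == "Sa")  # stored at site Sa
F2 = select(E, lambda r: r["loc"] == "Sb")  # stored at site Sb

def reconstruct(*fragments):
    """Rebuild the relation as the union of its horizontal fragments."""
    rows = [row for fragment in fragments for row in fragment]
    return sorted(rows, key=lambda r: r["id"])
```

Because the two predicates are disjoint and together cover every value of `loc` in E, no tuple is lost or duplicated when the fragments are unioned.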
[Figure: candidate fragmentations F1, F2, and F3 of E]
Predicates: loc=SA ∧ sal<10; loc=SA ∧ sal≥10; loc=SB ∧ sal<10; loc=SB ∧ sal≥10
Qa: select … loc = SA
Qb: select … loc = SB
Prefer F2 to F1 and F3
Horizontal Fragmentation: peer-to-peer relationship (brothers)
[Figure: a University Faculties system horizontally fragmented into the systems of Faculty 1, Faculty 2, and Faculty 3]
Vertical fragmentation
E
#  NM     Loc  Sal
5  Joe    Sa   10
7  Sally  Sb   25
8  Fred   Sa   15
…
E1
#  NM     Loc
5  Joe    Sa
7  Sally  Sb
8  Fred   Sa
…
E2
#  Sal
5  10
7  25
8  15
…
Example
R[T] ⇒ R1[T1], R2[T2], …, Rn[Tn], with Ti ⊆ T
Just like normalization of relations
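A minimal sketch of the vertical split above: each fragment is a projection of E that keeps the key column, and E is recovered by joining the fragments on that key. The helper names `project` and `join_on_key` and the dictionary layout are assumptions for illustration.

```python
# Relation E from the example; "id" stands for the key column (#).
E = [
    {"id": 5, "nm": "Joe", "loc": "Sa", "sal": 10},
    {"id": 7, "nm": "Sally", "loc": "Sb", "sal": 25},
    {"id": 8, "nm": "Fred", "loc": "Sa", "sal": 15},
]

def project(relation, attributes):
    """Relational projection: keep only the listed attributes of every tuple."""
    return [{a: row[a] for a in attributes} for row in relation]

# Every vertical fragment repeats the key so the tuples can be matched again.
E1 = project(E, ["id", "nm", "loc"])
E2 = project(E, ["id", "sal"])

def join_on_key(left, right, key="id"):
    """Rebuild the relation by joining two vertical fragments on the key."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]
```

Dropping the key from either fragment would make the join, and therefore the reconstruction of E, impossible.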
Vertical Fragmentation example
PROJ
PNO  PNAME              BUDGET  LOC
P1   Instrumentation    150000  Montreal
P2   Database Develop.  135000  New York
P3   CAD/CAM            250000  New York
P4   Maintenance        310000  Paris
P5   CAD/CAM            500000  Boston
PROJ1: information about project budgets
PNO  BUDGET
P1   150000
P2   135000
P3   250000
P4   310000
P5   500000
PROJ2: information about project names and locations
PNO  PNAME              LOC
P1   Instrumentation    Montreal
P2   Database Develop.  New York
P3   CAD/CAM            New York
P4   Maintenance        Paris
P5   CAD/CAM            Boston
Grouping Attributes
Example: E(#, NM, LOC, SAL)
• Option 1: E1(#, NM, LOC), E2(#, SAL)
• Option 2: E1(#, NM), E2(#, LOC), E3(#, SAL)
Which is the right vertical fragmentation?
Vertical Fragmentation: branch relationship (parents and son)
[Figure: a personnel affairs system vertically fragmented into a salaries fragment (salary, allowances, tax) and an appointments fragment (name, address, grade)]
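Hybrid Fragmentation
[Figure: fragmentation tree]
• R is horizontally fragmented (HF) into R1 and R2
• R1 is vertically fragmented (VF) into R11 and R12
• R2 is vertically fragmented (VF) into R21, R22, and R23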
Allocation
Example: E ⇒ F1 = σ_{loc=Sa}(E), F2 = σ_{loc=Sb}(E)
[Figure: the fragments of E placed on Site a, Site b, and Site c, with two copies of F1 and one copy of F2]
• Do we replicate fragments?
• Where do we place each copy of each fragment?
Allocation Alternatives
• Non-replicated
  - partitioned: each fragment resides at only one site
• Replicated
  - fully replicated: each fragment at each site
  - partially replicated: each fragment at some of the sites
• Rule
  If (read-only queries / update queries) ≥ 1, replication is advantageous; otherwise replication may cause problems.
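The rule on this slide can be written as a small check. This is only a sketch: the function name is hypothetical, the comparison is taken to be "greater than or equal to 1", and the handling of a workload with no updates is an assumption, since the slide does not cover that case.

```python
def replication_advantageous(read_only_queries, update_queries):
    """Rule of thumb: replicate when read-only queries / update queries >= 1."""
    if update_queries == 0:
        # Assumption: with no updates the copies can never diverge, so any
        # read workload at all counts in favour of replication.
        return read_only_queries > 0
    return read_only_queries / update_queries >= 1
```

For example, a fragment that receives 800 reads and 200 updates passes the check, while one that receives 100 reads and 400 updates does not, because every update would have to be applied to every copy.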
Optimization problem
• What is the best placement of fragments and/or the best number of copies to:
  - minimize query response time
  - maximize throughput
  - minimize "some cost"
  - …
• Subject to constraints:
  - Available storage
  - Available bandwidth, processing power, …
  - Keep 90% of response time below X
  - …
Very hard problem
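A toy version of this optimization problem can be solved by exhaustive search, which also shows why it is hard: the number of candidate placements grows as (number of sites) to the power of (number of fragments). The sketch below assumes a non-replicated allocation, a storage constraint per site, and a cost of one unit per remote access; the function name, parameters, and cost model are all hypothetical.

```python
from itertools import product

def best_allocation(fragment_sizes, site_capacity, access_counts, remote_cost=1):
    """Exhaustively search non-replicated placements of fragments on sites.

    fragment_sizes: {fragment: size}
    site_capacity:  {site: storage available}
    access_counts:  {(site, fragment): accesses issued at that site}
    Returns (cost, placement) with the lowest remote-access cost, or None
    when no placement satisfies the storage constraint.
    """
    fragments = sorted(fragment_sizes)
    sites = sorted(site_capacity)
    best = None
    for choice in product(sites, repeat=len(fragments)):
        placement = dict(zip(fragments, choice))
        # Constraint: the fragments placed at a site must fit in its storage.
        used = {site: 0 for site in sites}
        for fragment, site in placement.items():
            used[site] += fragment_sizes[fragment]
        if any(used[site] > site_capacity[site] for site in sites):
            continue
        # Cost: every access issued at a site that does not hold the fragment.
        cost = sum(
            count * remote_cost
            for (site, fragment), count in access_counts.items()
            if placement[fragment] != site
        )
        if best is None or cost < best[0]:
            best = (cost, placement)
    return best
```

Allowing replication, or adding bandwidth and response-time constraints, enlarges the search space further, so practical allocators rely on heuristics rather than enumeration.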
Static data allocation & Dynamic data allocation
• Static data allocation: the allocation sites do not change and no extra storage space is needed (the data does not grow).
• Dynamic data allocation: the location of the data changes dynamically as the data expands, which usually follows from the nature of the systems producing the data.
Data placement problems can be treated through two types of models:
• Adaptive models: apply when the location changes because of system activity (online systems with online data storage, for example airline bookings). These models keep additional temporary copies of the data and then process these duplicate copies (replication).
• Non-adaptive models: solve dynamic allocation at the stage of establishing the system or at the stage of reorganizing the system.
Replication
Replication stores copies of the same data in more than one location (site); these copies must then be updated consistently, despite their distance from each other. When the copies are updated is controlled by one of two techniques:
• Lazy replication: the other copies are updated after work on one of the copies (the master copy) has completed, so the update is done outside the boundaries of the transaction.
• Eager replication: the replicated data is updated within the transaction boundaries, while working on one of the copies.
Where the update is applied:
  - Central update (primary copy): update the primary copy first and then update the secondary copies. Updates do not have to be synchronized across sites, which makes consistency easier to control, but the primary site may become a bottleneck.
  - Update everywhere: the copies can be updated at any site, so all copies have an equal opportunity for update.
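The difference between eager and lazy propagation under a primary-copy scheme can be sketched with a toy class. It simulates the copies as plain Python values and the transaction boundary as a single method call; the class and method names are hypothetical, and no real networking or transaction manager is involved.

```python
class ReplicatedItem:
    """One data item with a primary copy and several secondary copies."""

    def __init__(self, value, n_secondaries=2):
        self.primary = value
        self.secondaries = [value] * n_secondaries
        self.pending = []  # updates not yet propagated (lazy only)

    def write_eager(self, value):
        # Eager: every copy changes inside the same (simulated) transaction.
        self.primary = value
        self.secondaries = [value] * len(self.secondaries)

    def write_lazy(self, value):
        # Lazy: the transaction finishes after changing the primary copy only.
        self.primary = value
        self.pending.append(value)

    def propagate(self):
        # Runs later, outside the transaction, and brings secondaries up to date.
        for value in self.pending:
            self.secondaries = [value] * len(self.secondaries)
        self.pending.clear()
```

After `write_lazy` and before `propagate`, a read from a secondary copy returns the old value; that window of staleness is the price lazy replication pays for shorter transactions.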
Replication strategies (where / when)
• Primary copy
  - Eager: early solutions in Ingres
  - Lazy: Sybase/IBM/Oracle; placement strategies; serialization-graph based
• Update everywhere
  - Eager: ROWA/ROWAA; quorum based; Oracle synchronous replication
  - Lazy: Oracle Advanced Replication; weak consistency strategies
Distributed Query Processing
The aim of query processing over distributed data is to make working on distributed data appear like working on a single database system. The problem can be broken into several levels. Query processing takes SQL or OQL statements as input and processes them through several stages until the query is executed.
Query Decomposition & Data Localization
• The first stage of distributed query processing analyzes the query and translates it into relational algebra; the second stage localizes the data by mapping the query onto the fragments it needs.
Query Optimization
• The third stage aims at an optimal execution of the query by keeping the execution cost as low as possible and deleting unneeded expressions.
• Query optimization is one of the most important aspects of dealing with queries; many algorithms have been proposed for it.
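Data localization can be illustrated with the horizontally fragmented employee relation E used earlier in the deck. The sketch below takes the constant from a predicate of the form `loc = <constant>` and returns only the fragments that can contain matching tuples, which is the step that deletes unneeded expressions. The `FRAGMENTS` dictionary and the function name `localize` are hypothetical.

```python
# Fragment schema: for each horizontal fragment of E, the value of loc that
# defines it and the site that stores it.
FRAGMENTS = {
    "F1": {"loc": "Sa", "site": "Sa"},
    "F2": {"loc": "Sb", "site": "Sb"},
}

def localize(loc_filter=None):
    """Return the names of the fragments a query on E must read.

    loc_filter is the constant in a predicate `loc = <constant>`, or None
    when the query does not restrict loc.
    """
    if loc_filter is None:
        # No restriction on loc: the query needs the union of all fragments.
        return sorted(FRAGMENTS)
    # A fragment whose defining predicate contradicts the query contributes
    # no tuples, so it is dropped from the fragment query.
    return sorted(
        name for name, info in FRAGMENTS.items() if info["loc"] == loc_filter
    )
```

A query such as Qa, which restricts `loc` to Sa, is therefore answered from F1 alone and never contacts site Sb.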
[Figure: layers of distributed query processing]
Control site:
• Input: calculus query on distributed relations
• Query decomposition (uses the global schema)
• Output: algebraic query on distributed relations
• Data localization (uses the fragment schema)
• Output: fragment query
• Global optimization (uses statistics on fragments)
• Output: optimized fragment query with communication operations
Local sites:
• Local optimization (uses the local schema)
• Output: optimized local queries
Concurrency Control in Distributed Database
• Concurrency control in databases comprises the activities that keep transactions consistent across all the system's data.
• The DDBMS takes care of synchronizing data that is distributed over the network's sites, each of which runs programs that work on its own data. In this situation, controlling concurrency over the distributed data is one of the more complex issues.
• There are four techniques used to control concurrency in a distributed database:
  - Locking techniques
  - Timestamps
  - Optimistic algorithm: all operations on the data are performed, except for operations that update the data; in that case the operation updates the local data first
  - Complex algorithms for timestamps
Distributed Concurrency Control
• Nonreplicated Scheme
  - Each site maintains a local lock manager to administer lock and unlock requests for local data
  - Deadlock handling is more complex
• Single-Coordinator Approach
  - The system maintains a single lock manager that resides in a single chosen site
  - Can be used with replicated data
  - Advantages
    • simple implementation
    • simple deadlock handling
  - Disadvantages
    • bottleneck
    • vulnerability
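The single-coordinator approach can be sketched as one lock table that every site consults. The sketch assumes exclusive locks only and models a refused request as a `False` return value rather than a wait queue; the class and method names are hypothetical.

```python
class CentralLockManager:
    """Single lock manager at one chosen site; every site sends requests here."""

    def __init__(self):
        self.owner = {}  # data item -> transaction holding its exclusive lock

    def lock(self, txn, item):
        """Grant the lock if the item is free or already held by txn."""
        holder = self.owner.get(item)
        if holder is None or holder == txn:
            self.owner[item] = txn
            return True
        return False  # the caller must wait or abort

    def unlock(self, txn, item):
        """Release the lock, but only if txn is the current holder."""
        if self.owner.get(item) == txn:
            del self.owner[item]
```

Because the whole lock table lives in one place, detecting a deadlock only needs this one structure, which is the "simple deadlock handling" advantage; the same fact makes the coordinator a bottleneck and a single point of failure.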
Distributed Concurrency Control
• Majority Protocol
  - A lock manager at each site
  - When a transaction wishes to lock a data item Q which is replicated in n different sites, it must send a lock request to more than half of the n sites in which Q is stored
  - complex to implement
  - difficult to handle deadlocks
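A minimal sketch of the majority rule follows. For simplicity it asks every replica site for the lock and succeeds only when more than half of them grant it, whereas the protocol itself only requires contacting a majority. It assumes exclusive locks, and all class and function names are hypothetical.

```python
class SiteLockManager:
    """Local lock manager for the replicas stored at one site."""

    def __init__(self):
        self.owner = {}  # data item -> transaction holding its lock here

    def try_lock(self, txn, item):
        if self.owner.get(item) in (None, txn):
            self.owner[item] = txn
            return True
        return False

    def release(self, txn, item):
        if self.owner.get(item) == txn:
            del self.owner[item]

def majority_lock(txn, item, replica_sites):
    """Lock item for txn if more than half of its n replica sites grant it."""
    granted = [site for site in replica_sites if site.try_lock(txn, item)]
    if len(granted) > len(replica_sites) / 2:
        return True
    # No majority: give back the partial locks so other transactions can proceed.
    for site in granted:
        site.release(txn, item)
    return False
```

With four replicas, two transactions can each hold two site locks and neither reaches a majority; this sketch backs off in that case, and handling such ties in general is part of what makes deadlock handling difficult under this protocol.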
1 The development of computer network promotes de-centralization
2 In a company the database organization might reflect the organizational structure which is distributed into units Each unit maintains its own database
3 Sharing of data can be achieved by developing a distributed database system whichbull Makes data accessible by all unitsbull Stores data close to where it is most
frequently used
Motivation for Distributed Database
Dr Mohamed Osman Hegazi
DDBMS Advantages
bull Data are located near ldquogreatest demandrdquo sitebull Faster data accessbull Faster data processingbull Growth facilitationbull Improved communicationsbull Reduced operating costsbull User-friendly interfacebull Less danger of a single-point failurebull Processor independence
Dr Mohamed Osman Hegazi
DDBMS Disadvantages
bull Complexity of management and controlbull Securitybull Lack of standardsbull Increased storage requirementsbull Greater difficulty in managing the data
environmentbull Increased training cost
Dr Mohamed Osman Hegazi
The concept of DDBA DDBS is not a collection of files that can be individually stored at each node of computer network To form a DDBS files should not only be logically related but there should be structure among the files and access should be via a common interface
Dr Mohamed Osman Hegazi
Distributed Database Management Systems
Dr Mohamed Osman Hegazi
An ExampleEMP(ENO ENAME TITLE)
ASG(ENO PNO DUR RESP)
PROJ(PNO PNAME BUDGET)PAY(TITLESAL)
Dr Mohamed Osman Hegazi
Distributed Query
bullIf these table is stored in one place then we can ldquofor examplerdquo using the following query to get the name and the salary of the employee who works more
than 12 months SELECT ENAME SALFROM EMP ASG PAYWHERE ASG DUR gt12AND EMPENO=ASGENOAND PAYTITLE=EMPTITLE
But if these table are distributed over deferent site then the execution of this query needs allot of process to be done DDMS do this process and let the end user feel like databasersquos only user (transparence)
Dr Mohamed Osman Hegazi
bullThe concepts of DDB is to fragment the data and store each fragment on its site
bullData may be replicated on different site (replication)
bullDDBMS hide these details from the user and makes the distribution transparent to the users Distributed Database Transparency FeaturesDistribution transparency Transaction transparencyFailure transparency Performance transparency Heterogeneity transparency
Distributed Database Transparency
Dr Mohamed Osman Hegazi
Distributed DB Design
Top-down approach bull have a databasebull how to split and allocate to individual sitesTwo issues in top-down designFragmentationAllocation
Multi-databases (or bottom-up)bull combine existing databasesbull how to deal with heterogeneity amp autonomy
Dr Mohamed Osman Hegazi
Fragmentationbull Horizontal Primary
depends on local attributesR Derived
depends on foreign relation
bull Vertical
R
Dr Mohamed Osman Hegazi
Example
Employee relation E (namelocsalhellip) 40 of queries 40 of queries Qa select Qb select
from E from Ewhere loc=Sa where loc=Sbandhellip and
Motivation Two sites Sa Sb Qa Qb
Sa Sb
Dr Mohamed Osman Hegazi
Name Loc Sal578
Sa 10Sally Sb 25Tom Sa 15
Joe
58
Sa 10Tom Sa 15Joe 7 Sb 25Sally
F = F1F2
At Sa At Sb
E
F1 = loc=Sa(E) F2 = loc=Sb(E)
primary horizontal fragmentation
Dr Mohamed Osman Hegazi
Loc=SA sal lt 10
Loc=SA
sal 10
Loc=SB sal lt 10
Loc=SB
sal 10
F1
F3F2
Qa Select hellip loc = SA
Qb Select hellip loc = SB
Prefer F2 to F1 and F3
Dr Mohamed Osman Hegazi
(ŚŕƔƆƄƅŔƐŧţŌƇŕŴƊFaculty1(
(ΔϴόϣΎΠϟΕΎϴϠϜϟϢψϧUniversity Faculties system(
(ŚŕƔƆƄƅŔƐŧţŌƇŕŴƊFaculty2(
(ŚŕƔƆƄƅŔƐŧţŌƇŕŴƊFaculty3(
Horizontal Fragmentation Peer to peer relationship ndash brothers
Dr Mohamed Osman Hegazi
Vertical fragmentation
E1
NM Loc Sal5 Joe Sa 107 Sally Sb 258 Fred Sa 15hellip
NM Loc5 Joe Sa7 Sally Sb8 Fred Sahellip
Sal5 107 258 15hellip
E
E2
Example
R[T] R1[T1] R2[T2]hellip Rn[Tn] Ti T
Just like normalization of relations
Dr Mohamed Osman Hegazi
Vertical Fragmentation example
PROJ1 information about project budgets
PROJ2 information about project names and locations
PNO BUDGET
P1 150000
P3 250000P2 135000
P4 310000P5 500000
PNO PNAME LOC
P1 Instrumentation Montreal
P3 CADCAM New YorkP2 Database DevelopNew York
P4 Maintenance ParisP5 CADCAM Boston
PROJ1 PROJ2
New YorkNew York
PROJ
PNO PNAME BUDGET LOC
P1 Instrumentation 150000 Montreal
P3 CADCAM 250000P2 Database Develop135000
P4 Maintenance 310000 ParisP5 CADCAM 500000 Boston
New YorkNew York
Dr Mohamed Osman Hegazi
E1(NMLOC)E2(SAL)
ExampleE(NMLOCSAL) E1(NM)
E2(LOC)E3(SAL)
Which is the right vertical fragmentationhellip
Grouping Attributes
Dr Mohamed Osman Hegazi
Vertical Fragmentation branch relationship ndash parents and son
ΩήϓϷϥϮΌη
ΕΎΒΗή
Ϥϟ
(Sal
ary
allo
wan
ces
Tax
(
ΎϨϴϴόΘ
ϟΕ
(N
ame ad
dre
ss g
rade(
Dr Mohamed Osman Hegazi
Hybrid Fragmentation
R
HFHF
R1
VF VFVFVFVF
R11 R12 R21 R22 R23
R2
Dr Mohamed Osman Hegazi
AllocationExample E F1 = loc=Sa(E) F2 = loc=Sb(E)
Site a
Site b
Fragment E
Do we replicate fragments
Where do we place each copy of each fragment
Site c
F1F1
F2
Dr Mohamed Osman Hegazi
Allocation Alternativesbull Non-replicated
ndash partitioned each fragment resides at only one site
bull Replicatedndash fully replicated each fragment at each sitendash partially replicated each fragment at some of the
sitesbull Rule
If replication is advantageous
otherwise replication may cause problems
read - only queriesupdate queries
1
Dr Mohamed Osman Hegazi
Optimization problem
bull What is the best placement of fragments andor best number of copies tondash minimize query response timendash maximize throughputndash minimize ldquosome costrdquondash
bull Subject to constraintsndash Available storagendash Available bandwidth processing powerhellipndash Keep 90 of response time below Xndash
Very hard problem
Dr Mohamed Osman Hegazi
Static data allocation No change on allocation sites or no need for extra storage space (no expanding on the size No increasing on data )Dynamic data allocation dynamically changed the location of the data as a result of expansion in the data which usually results because of the nature of the systems producing data Problems of data sites can be treated through two types of models
bull Adaptive Models models that apply when the reason for the change of location due to system activity( online systems- data storage on line( Example airline bookings) These models saves the additional temporary copies of data and then dealing with these copies by processors duplicate copies (replication)
bull non-adaptive models These models solve dynamically allocation at the stage of establishing the system or at the stage of reorganization the system
Static data allocation amp Dynamic data allocation
Dr Mohamed Osman Hegazi
ReplicationReplication is to store copies of the same data in more than one location (site) and then these copies must be consistency updated Despite the distance from each other Controlling the updating of these copies is done by one of two techniquesLazy replication it is to update the data after the completion of work on one of the copies (master copy) This means that update is done outside the boundaries of transaction Eager replication is to update the replicated data within the transaction boundaries while working on one of the copies
ndash central update(initial copy primary copy) update the primary copy first and then update the secondary copy This method leads to lack of synchronization of the update which facilitates control of consistency but may lead to the problems of the bottleneck
ndash Or update everywhere updating the copies in all places make all the copies of equal opportunities for the update
Dr Mohamed Osman Hegazi
Wherewhen Eager Lazy
Primary Copy
Early Solutions in Ingres
SybaseIBMOracle Placement Strat
Serialization- Graph Based
Update Everywhere
ROWAROWAA Quorum based Oracle Synchr Repl
Oracle Advanced Repl Weak consistency Strat
Distributed Query Processing
The aim of query processing in a distributed database is to make work on distributed data appear like work on a single database system. The problem can be divided into several levels, according to the data problem handled at each level. Query processing takes SQL or OQL statements as input and processes them through several stages until the query is executed.
Query Decomposition & Data Localization
• The first stage of distributed query processing analyzes the query and translates it into relational algebra. The second stage localizes the data by distributing the query over the fragments.
Query Optimization
• The third stage aims at an optimal execution of the query by keeping the execution cost as low as possible and deleting unneeded expressions.
• Query optimization is one of the most important aspects of dealing with queries, and many algorithms have been developed for it.
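As a rough illustration of the localization stage, the sketch below decides which horizontal fragments a selection must touch. The fragment schema, the `loc` attribute, and the `localize` function are assumptions made for this example only; a real DDBMS works on relational algebra trees rather than a single attribute.

```python
# Hypothetical fragment schema: each horizontal fragment is defined by the
# value of the "loc" attribute and is stored at one site.
FRAGMENTS = {
    "F1": {"loc": "Sa", "site": "Sa"},
    "F2": {"loc": "Sb", "site": "Sb"},
}


def localize(query_loc=None):
    """Return the fragments a selection on loc must touch.

    query_loc=None means the query has no predicate on loc, so every
    fragment is needed. Otherwise, fragments whose defining predicate
    contradicts the query predicate are dropped.
    """
    return sorted(
        name
        for name, frag in FRAGMENTS.items()
        if query_loc is None or frag["loc"] == query_loc
    )


print(localize())       # ['F1', 'F2']
print(localize("Sa"))   # ['F1']
```

Dropping fragments that cannot contribute to the result is one of the "unneeded expression" reductions that keeps the later optimization stages small.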
Layers of distributed query processing (figure)
Control site
Calculus Query on distributed relations
-> Query Decomposition (uses the Global Schema)
Algebraic Query on distributed relations
-> Data Localization (uses the Fragment Schema)
Fragment Query
-> Global Optimization (uses Statistics on fragments)
Optimized fragment query with communication operations
Local sites
-> Local Optimization (uses the Local Schema)
Optimized local queries
Concurrency Control in distributed database
• Concurrency control in databases comprises the activities that keep transactions consistent across all of the system's data.
• A DDBMS must synchronize data that is distributed over the network sites, each of which runs programs that deal with its own data. This makes controlling the concurrency of distributed data one of the more complex issues.
• Four techniques are used to control concurrency in a distributed database:
- Locking techniques
- Timestamps
- Optimistic algorithm: all operations on the data are performed, except that an operation that updates the data updates the local data first
- Complex algorithms for timestamps
Distributed Concurrency Control
• Nonreplicated Scheme
- Each site maintains a local lock manager to administer lock and unlock requests for local data
- Deadlock handling is more complex
• Single-Coordinator Approach
- The system maintains a single lock manager that resides in a single chosen site
- Can be used with replicated data
- Advantages
  • simple implementation
  • simple deadlock handling
- Disadvantages
  • bottleneck
  • vulnerability
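The single-coordinator approach can be sketched as one lock table that every transaction must go through. The `CentralLockManager` class below is a hypothetical, simplified model: it handles exclusive locks only and has no waiting queue or deadlock detection.

```python
class CentralLockManager:
    """Single lock manager at one chosen site (exclusive locks only)."""
    def __init__(self):
        self.owner = {}  # data item -> transaction holding the lock

    def lock(self, txn, item):
        """Grant the lock if the item is free or already held by txn."""
        holder = self.owner.get(item)
        if holder is None or holder == txn:
            self.owner[item] = txn
            return True
        return False  # the caller must wait or retry

    def unlock(self, txn, item):
        """Release the lock only if txn actually holds it."""
        if self.owner.get(item) == txn:
            del self.owner[item]


manager = CentralLockManager()
print(manager.lock("T1", "Q"))   # True
print(manager.lock("T2", "Q"))   # False, T1 holds the lock
manager.unlock("T1", "Q")
print(manager.lock("T2", "Q"))   # True
```

Because every request passes through this one object, the implementation stays simple, but the site that hosts it is both the bottleneck and the single point of vulnerability.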
Distributed Concurrency Control
• Majority Protocol
- A lock manager at each site
- When a transaction wishes to lock a data item Q, which is replicated in n different sites, it must send a lock request to more than half of the n sites in which Q is stored
- complex to implement
- difficult to handle deadlocks
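A minimal sketch of the majority rule follows. It assumes, for simplicity, that the request is sent to every site holding Q and that the lock is obtained once more than half of them grant it. The function names and the dictionary-based lock tables are hypothetical.

```python
def majority(n):
    """Smallest number of sites that is more than half of n."""
    return n // 2 + 1


def try_lock(item, txn, site_locks):
    """Request a lock on item at every site; succeed on a majority of grants.

    site_locks is a list of dicts, one per site, mapping item -> holder.
    """
    granted = []
    for locks in site_locks:
        if locks.get(item) in (None, txn):
            locks[item] = txn
            granted.append(locks)
    if len(granted) >= majority(len(site_locks)):
        return True
    # Not enough grants: release what was obtained so others can proceed.
    for locks in granted:
        del locks[item]
    return False


sites = [{}, {}, {}]
print(try_lock("Q", "T1", sites))   # True: 3 of 3 sites grant the lock
print(try_lock("Q", "T2", sites))   # False: no site grants it
```

Two transactions that each hold some, but not a majority, of the sites can block each other, which is one reason deadlock handling is difficult under this protocol.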
An Example
EMP(ENO, ENAME, TITLE)
ASG(ENO, PNO, DUR, RESP)
PROJ(PNO, PNAME, BUDGET)
PAY(TITLE, SAL)
Distributed Query
• If these tables are stored in one place, we can use, for example, the following query to get the name and salary of every employee who has worked more than 12 months:
SELECT ENAME, SAL
FROM EMP, ASG, PAY
WHERE ASG.DUR > 12
AND EMP.ENO = ASG.ENO
AND PAY.TITLE = EMP.TITLE
But if these tables are distributed over different sites, executing this query requires a lot of processing. The DDBMS does this processing and lets the end user feel like the database's only user (transparency).
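For the centralized case, the query can be tried out with Python's built-in `sqlite3` module. The column types and the sample rows below are made up for illustration; only the table and column names come from the example schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMP (ENO TEXT, ENAME TEXT, TITLE TEXT);
CREATE TABLE ASG (ENO TEXT, PNO TEXT, DUR INTEGER, RESP TEXT);
CREATE TABLE PAY (TITLE TEXT, SAL INTEGER);
-- Made-up sample rows, for illustration only.
INSERT INTO EMP VALUES ('E1', 'Ali', 'Engineer'), ('E2', 'Sara', 'Analyst');
INSERT INTO ASG VALUES ('E1', 'P1', 24, 'Manager'), ('E2', 'P2', 6, 'Analyst');
INSERT INTO PAY VALUES ('Engineer', 40000), ('Analyst', 34000);
""")

# Name and salary of employees assigned for more than 12 months.
rows = conn.execute("""
    SELECT ENAME, SAL
    FROM EMP, ASG, PAY
    WHERE ASG.DUR > 12
      AND EMP.ENO = ASG.ENO
      AND PAY.TITLE = EMP.TITLE
""").fetchall()
print(rows)  # [('Ali', 40000)]
```

In a distributed setting the same statement would be submitted unchanged. The joins would then have to be executed across sites, which is the work the DDBMS hides from the user.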
Distributed Database Transparency
• The concept of a DDB is to fragment the data and store each fragment at its own site
• Data may be replicated at different sites (replication)
• The DDBMS hides these details from the user and makes the distribution transparent to the users
Distributed Database Transparency Features
• Distribution transparency
• Transaction transparency
• Failure transparency
• Performance transparency
• Heterogeneity transparency
Distributed DB Design
Top-down approach
• have a database
• how to split and allocate to individual sites
Two issues in top-down design
• Fragmentation
• Allocation
Multi-databases (or bottom-up)
• combine existing databases
• how to deal with heterogeneity & autonomy
Fragmentation
• Horizontal (of a relation R)
- Primary: depends on local attributes
- Derived: depends on foreign relation
• Vertical (of a relation R)
Example
Employee relation E(name, loc, sal, ...)
40% of queries: Qa: select ... from E where loc = Sa and ...
40% of queries: Qb: select ... from E where loc = Sb and ...
Motivation: two sites, Sa and Sb (Qa -> Sa, Qb -> Sb)
E
     Name   Loc  Sal
5    Joe    Sa   10
7    Sally  Sb   25
8    Tom    Sa   15

F = {F1, F2}
F1 = σ loc=Sa (E), at Sa
5    Joe    Sa   10
8    Tom    Sa   15
F2 = σ loc=Sb (E), at Sb
7    Sally  Sb   25
-> primary horizontal fragmentation
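The same fragmentation can be reproduced in Python to check that the fragments are disjoint and that their union rebuilds E. The tuple layout (key, name, loc, sal) and the `select` helper are choices made for this sketch.

```python
# Rows of E as (key, name, loc, sal), taken from the example table.
E = [
    (5, "Joe", "Sa", 10),
    (7, "Sally", "Sb", 25),
    (8, "Tom", "Sa", 15),
]


def select(relation, loc):
    """Selection on the Loc attribute (third column)."""
    return [row for row in relation if row[2] == loc]


F1 = select(E, "Sa")   # stored at site Sa
F2 = select(E, "Sb")   # stored at site Sb

# Every row lands in exactly one fragment, so the union rebuilds E.
assert sorted(F1 + F2) == sorted(E)
print(F1)
print(F2)
```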
Alternative fragmentations F1, F2 and F3 of E (figure). Fragments shown:
Loc=SA and sal < 10
Loc=SA and sal ≥ 10
Loc=SB and sal < 10
Loc=SB and sal ≥ 10
Qa: Select ... loc = SA
Qb: Select ... loc = SB
Prefer F2 to F1 and F3
Horizontal Fragmentation: peer-to-peer relationship (brothers)
University Faculties system
• Faculty1
• Faculty2
• Faculty3
Vertical fragmentation
Example
E
     NM     Loc  Sal
5    Joe    Sa   10
7    Sally  Sb   25
8    Fred   Sa   15
...
E1
     NM     Loc
5    Joe    Sa
7    Sally  Sb
8    Fred   Sa
...
E2
     Sal
5    10
7    25
8    15
...
R[T] => R1[T1], R2[T2], ..., Rn[Tn], with Ti ⊆ T
Just like normalization of relations
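A short Python sketch of the same split shows why each vertical fragment keeps the key: the original relation is rebuilt by joining the fragments on it. The tuple layout and variable names are choices made for this illustration.

```python
# Rows of E as (key, NM, Loc, Sal), taken from the example table.
E = [
    (5, "Joe", "Sa", 10),
    (7, "Sally", "Sb", 25),
    (8, "Fred", "Sa", 15),
]

# Each vertical fragment keeps the key (first column) plus some attributes.
E1 = [(key, nm, loc) for key, nm, loc, _sal in E]
E2 = [(key, sal) for key, _nm, _loc, sal in E]

# Reconstruction: join the fragments on the key.
sal_by_key = dict(E2)
rebuilt = [(key, nm, loc, sal_by_key[key]) for key, nm, loc in E1]
assert rebuilt == E
print(E1)
print(E2)
```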
Vertical Fragmentation example
PROJ
PNO  PNAME              BUDGET  LOC
P1   Instrumentation    150000  Montreal
P2   Database Develop.  135000  New York
P3   CAD/CAM            250000  New York
P4   Maintenance        310000  Paris
P5   CAD/CAM            500000  Boston

PROJ1: information about project budgets
PNO  BUDGET
P1   150000
P2   135000
P3   250000
P4   310000
P5   500000

PROJ2: information about project names and locations
PNO  PNAME              LOC
P1   Instrumentation    Montreal
P2   Database Develop.  New York
P3   CAD/CAM            New York
P4   Maintenance        Paris
P5   CAD/CAM            Boston
Grouping Attributes
Example: E(NM, LOC, SAL)
• E1(NM, LOC), E2(SAL)
• E1(NM), E2(LOC), E3(SAL)
Which is the right vertical fragmentation?
Vertical Fragmentation: branch relationship (parent and children)
• (Salary, allowances, Tax)
• (Name, address, grade)
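Hybrid Fragmentation
• R is split by horizontal fragmentation (HF) into R1 and R2
• R1 is split by vertical fragmentation (VF) into R11 and R12
• R2 is split by vertical fragmentation (VF) into R21, R22 and R23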
Allocation
Example: E is fragmented into F1 = σ loc=Sa (E) and F2 = σ loc=Sb (E)
Sites: Site a, Site b, Site c (figure: two copies of F1 and one copy of F2 placed at the sites)
• Do we replicate fragments?
• Where do we place each copy of each fragment?
Allocation Alternatives
• Non-replicated
- partitioned: each fragment resides at only one site
• Replicated
- fully replicated: each fragment at each site
- partially replicated: each fragment at some of the sites
• Rule
If (read-only queries / update queries) ≥ 1, replication is advantageous; otherwise replication may cause problems
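The rule can be written as a small check. This sketch assumes the threshold is read as "at least 1", and it treats a workload with no updates at all as favourable to replication; the function name and that edge-case choice are assumptions.

```python
def replication_advantageous(read_only_queries, update_queries):
    """Rule of thumb: replicate when reads are at least as frequent as updates."""
    if update_queries == 0:
        # No updates means no copies to keep consistent (assumed edge case).
        return read_only_queries > 0
    return read_only_queries / update_queries >= 1


print(replication_advantageous(80, 20))   # True: mostly reads
print(replication_advantageous(10, 40))   # False: mostly updates
```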
Optimization problem
• What is the best placement of fragments and/or best number of copies to
- minimize query response time
- maximize throughput
- minimize "some cost"
• Subject to constraints
- Available storage
- Available bandwidth, processing power, ...
- Keep 90% of response time below X
Very hard problem
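For a toy instance, the placement problem can be solved by trying every assignment. The sketch below covers only the non-replicated case. Its access counts, cost model and storage limit are invented for illustration. The search space grows exponentially with the number of fragments, which is why the general problem is so hard.

```python
from itertools import product

SITES = ["a", "b", "c"]
FRAGMENTS = ["F1", "F2"]
# Hypothetical access counts: how often each site reads each fragment.
READS = {("a", "F1"): 40, ("b", "F2"): 40, ("c", "F1"): 10, ("c", "F2"): 10}
REMOTE_COST = 1   # assumed cost of one remote access; local access is free
CAPACITY = 1      # assumed storage constraint: fragments per site


def cost(placement):
    """Total remote-access cost of a non-replicated placement."""
    return sum(
        count * REMOTE_COST
        for (site, frag), count in READS.items()
        if placement[frag] != site
    )


def best_placement():
    """Try every assignment of fragments to sites and keep the cheapest."""
    best = None
    for choice in product(SITES, repeat=len(FRAGMENTS)):
        placement = dict(zip(FRAGMENTS, choice))
        used = [choice.count(site) for site in SITES]
        if max(used) > CAPACITY:
            continue  # violates the storage constraint
        if best is None or cost(placement) < cost(best):
            best = placement
    return best


print(best_placement(), cost(best_placement()))
```

With these numbers the search places F1 at site a and F2 at site b, the sites that read them most often, leaving only the 20 accesses from site c as remote.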
Static data allocation & Dynamic data allocation
Static data allocation: the allocation sites do not change and no extra storage space is needed (the size does not expand and the data does not grow).
Dynamic data allocation: the location of the data changes dynamically as a result of growth in the data, which usually follows from the nature of the systems producing the data.
Problems of data sites can be treated through two types of models:
• Adaptive models: models that apply when the location changes because of system activity (online systems with online data storage, for example airline bookings). These models save additional temporary copies of the data, and the processors then work with these duplicate copies (replication).
• Non-adaptive models: these models solve the dynamic allocation problem at the stage of establishing the system or at the stage of reorganizing it.
The concept of DDBA DDBS is not a collection of files that can be individually stored at each node of computer network To form a DDBS files should not only be logically related but there should be structure among the files and access should be via a common interface
Dr Mohamed Osman Hegazi
Distributed Database Management Systems
Dr Mohamed Osman Hegazi
An ExampleEMP(ENO ENAME TITLE)
ASG(ENO PNO DUR RESP)
PROJ(PNO PNAME BUDGET)PAY(TITLESAL)
Dr Mohamed Osman Hegazi
Distributed Query
bullIf these table is stored in one place then we can ldquofor examplerdquo using the following query to get the name and the salary of the employee who works more
than 12 months SELECT ENAME SALFROM EMP ASG PAYWHERE ASG DUR gt12AND EMPENO=ASGENOAND PAYTITLE=EMPTITLE
But if these table are distributed over deferent site then the execution of this query needs allot of process to be done DDMS do this process and let the end user feel like databasersquos only user (transparence)
Dr Mohamed Osman Hegazi
bullThe concepts of DDB is to fragment the data and store each fragment on its site
bullData may be replicated on different site (replication)
bullDDBMS hide these details from the user and makes the distribution transparent to the users Distributed Database Transparency FeaturesDistribution transparency Transaction transparencyFailure transparency Performance transparency Heterogeneity transparency
Distributed Database Transparency
Dr Mohamed Osman Hegazi
Distributed DB Design
Top-down approach bull have a databasebull how to split and allocate to individual sitesTwo issues in top-down designFragmentationAllocation
Multi-databases (or bottom-up)bull combine existing databasesbull how to deal with heterogeneity amp autonomy
Dr Mohamed Osman Hegazi
Fragmentationbull Horizontal Primary
depends on local attributesR Derived
depends on foreign relation
bull Vertical
R
Dr Mohamed Osman Hegazi
Example
Employee relation E (namelocsalhellip) 40 of queries 40 of queries Qa select Qb select
from E from Ewhere loc=Sa where loc=Sbandhellip and
Motivation Two sites Sa Sb Qa Qb
Sa Sb
Dr Mohamed Osman Hegazi
Name Loc Sal578
Sa 10Sally Sb 25Tom Sa 15
Joe
58
Sa 10Tom Sa 15Joe 7 Sb 25Sally
F = F1F2
At Sa At Sb
E
F1 = loc=Sa(E) F2 = loc=Sb(E)
primary horizontal fragmentation
Dr Mohamed Osman Hegazi
Loc=SA sal lt 10
Loc=SA
sal 10
Loc=SB sal lt 10
Loc=SB
sal 10
F1
F3F2
Qa Select hellip loc = SA
Qb Select hellip loc = SB
Prefer F2 to F1 and F3
Dr Mohamed Osman Hegazi
(ŚŕƔƆƄƅŔƐŧţŌƇŕŴƊFaculty1(
(ΔϴόϣΎΠϟΕΎϴϠϜϟϢψϧUniversity Faculties system(
(ŚŕƔƆƄƅŔƐŧţŌƇŕŴƊFaculty2(
(ŚŕƔƆƄƅŔƐŧţŌƇŕŴƊFaculty3(
Horizontal Fragmentation Peer to peer relationship ndash brothers
Dr Mohamed Osman Hegazi
Vertical fragmentation
E1
NM Loc Sal5 Joe Sa 107 Sally Sb 258 Fred Sa 15hellip
NM Loc5 Joe Sa7 Sally Sb8 Fred Sahellip
Sal5 107 258 15hellip
E
E2
Example
R[T] R1[T1] R2[T2]hellip Rn[Tn] Ti T
Just like normalization of relations
Dr Mohamed Osman Hegazi
Vertical Fragmentation example
PROJ1 information about project budgets
PROJ2 information about project names and locations
PNO BUDGET
P1 150000
P3 250000P2 135000
P4 310000P5 500000
PNO PNAME LOC
P1 Instrumentation Montreal
P3 CADCAM New YorkP2 Database DevelopNew York
P4 Maintenance ParisP5 CADCAM Boston
PROJ1 PROJ2
New YorkNew York
PROJ
PNO PNAME BUDGET LOC
P1 Instrumentation 150000 Montreal
P3 CADCAM 250000P2 Database Develop135000
P4 Maintenance 310000 ParisP5 CADCAM 500000 Boston
New YorkNew York
Dr Mohamed Osman Hegazi
E1(NMLOC)E2(SAL)
ExampleE(NMLOCSAL) E1(NM)
E2(LOC)E3(SAL)
Which is the right vertical fragmentationhellip
Grouping Attributes
Dr Mohamed Osman Hegazi
Vertical Fragmentation branch relationship ndash parents and son
ΩήϓϷϥϮΌη
ΕΎΒΗή
Ϥϟ
(Sal
ary
allo
wan
ces
Tax
(
ΎϨϴϴόΘ
ϟΕ
(N
ame ad
dre
ss g
rade(
Dr Mohamed Osman Hegazi
Hybrid Fragmentation
R
HFHF
R1
VF VFVFVFVF
R11 R12 R21 R22 R23
R2
Dr Mohamed Osman Hegazi
AllocationExample E F1 = loc=Sa(E) F2 = loc=Sb(E)
Site a
Site b
Fragment E
Do we replicate fragments
Where do we place each copy of each fragment
Site c
F1F1
F2
Dr Mohamed Osman Hegazi
Allocation Alternativesbull Non-replicated
ndash partitioned each fragment resides at only one site
bull Replicatedndash fully replicated each fragment at each sitendash partially replicated each fragment at some of the
sitesbull Rule
If replication is advantageous
otherwise replication may cause problems
read - only queriesupdate queries
1
Dr Mohamed Osman Hegazi
Optimization problem
bull What is the best placement of fragments andor best number of copies tondash minimize query response timendash maximize throughputndash minimize ldquosome costrdquondash
bull Subject to constraintsndash Available storagendash Available bandwidth processing powerhellipndash Keep 90 of response time below Xndash
Very hard problem
Dr Mohamed Osman Hegazi
Static data allocation No change on allocation sites or no need for extra storage space (no expanding on the size No increasing on data )Dynamic data allocation dynamically changed the location of the data as a result of expansion in the data which usually results because of the nature of the systems producing data Problems of data sites can be treated through two types of models
bull Adaptive Models models that apply when the reason for the change of location due to system activity( online systems- data storage on line( Example airline bookings) These models saves the additional temporary copies of data and then dealing with these copies by processors duplicate copies (replication)
bull non-adaptive models These models solve dynamically allocation at the stage of establishing the system or at the stage of reorganization the system
Static data allocation amp Dynamic data allocation
Dr Mohamed Osman Hegazi
ReplicationReplication is to store copies of the same data in more than one location (site) and then these copies must be consistency updated Despite the distance from each other Controlling the updating of these copies is done by one of two techniquesLazy replication it is to update the data after the completion of work on one of the copies (master copy) This means that update is done outside the boundaries of transaction Eager replication is to update the replicated data within the transaction boundaries while working on one of the copies
ndash central update(initial copy primary copy) update the primary copy first and then update the secondary copy This method leads to lack of synchronization of the update which facilitates control of consistency but may lead to the problems of the bottleneck
ndash Or update everywhere updating the copies in all places make all the copies of equal opportunities for the update
Dr Mohamed Osman Hegazi
Wherewhen Eager Lazy
Primary Copy
Early Solutions in Ingres
SybaseIBMOracle Placement Strat
Serialization- Graph Based
Update Everywhere
ROWAROWAA Quorum based Oracle Synchr Repl
Oracle Advanced Repl Weak consistency Strat
Dr Mohamed Osman Hegazi
Distributed Query ProcessingThe aim of queries processing in distributed data is to let the work on distributed data appear like a single database system The problem of query processing in distributed data can be fragmented into several levels according to the problems of dataThe query processing takes SQL statements or OQL as input and then process it through several stages until it is executing the query Query Decomposition amp Data Localizationbull The first stages of the distributed query processing is to analyze the query
to the relation algebra then the second stages localize the data by distribute the query
Query Optimizationbull The third stages is to achieve optimal implementation of the query by
making the executive be as little as possible and delete the unneeded expression
bull The query optimization is one of the important aspects in dealing with queries there are many algorithms used in the investigation of this aspect
Figure: layers of distributed query processing
Control site:
1. Input: calculus query on distributed relations
2. Query decomposition (uses the global schema), producing an algebraic query on distributed relations
3. Data localization (uses the fragment schema), producing a fragment query
4. Global optimization (uses statistics on fragments), producing an optimized fragment query with communication operations
Local sites:
5. Local optimization (uses the local schema), producing optimized local queries
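As a rough illustration of the data localization stage, the sketch below maps a selection on a global relation onto only those fragments that can contain matching rows. The `FRAGMENTS` catalog and the `localize` helper are assumptions made up for this example; they do not come from the slides or from any particular DDBMS.

```python
# Hypothetical fragment catalog: each horizontal fragment of relation E is
# described by the value of `loc` that defines it and the site that stores it.
FRAGMENTS = {
    "F1": {"loc": "Sa", "site": "Sa"},
    "F2": {"loc": "Sb", "site": "Sb"},
}


def localize(loc_value=None):
    """Return the names of the fragments a query on E has to visit.

    A query without a predicate on `loc` must read every fragment, because E
    is the union of its fragments. A query with `loc = loc_value` can skip
    every fragment whose defining predicate contradicts that condition.
    """
    return sorted(
        name
        for name, fragment in FRAGMENTS.items()
        if loc_value is None or fragment["loc"] == loc_value
    )


print(localize())      # every fragment is needed
print(localize("Sa"))  # only the fragment defined by loc = Sa is needed
```

Pruning fragments this early keeps the later optimization stages from planning data transfers that cannot contribute to the result.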
Concurrency Control in Distributed Databases
• Concurrency control in databases is the set of activities that keep transactions consistent across all the data in the system.
• The DDBMS synchronizes data that is distributed over the sites of the network, each of which runs programs that work on its own data. In this situation, controlling concurrent access to the distributed data is one of the more complex issues.
• Four techniques are used to control concurrency in a distributed database:
- Locking techniques
- Timestamps
- Optimistic algorithm: all operations on the data are allowed to proceed; an operation that updates the data applies the update to the local data first
- Complex algorithm for timestamps
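To make the timestamp technique more concrete, here is a minimal sketch of basic timestamp ordering on a single data item. It assumes each transaction carries one integer timestamp and leaves out restarts, commits and distribution; the `Item` class and the `read`/`write` helpers are hypothetical names chosen for the example.

```python
class Item:
    """A data item with the largest read and write timestamps seen so far."""

    def __init__(self, value):
        self.value = value
        self.read_ts = 0
        self.write_ts = 0


def read(item, ts):
    """Return (value, True), or (None, False) if the read arrives too late."""
    if ts < item.write_ts:
        # A younger transaction has already overwritten the item.
        return None, False
    item.read_ts = max(item.read_ts, ts)
    return item.value, True


def write(item, ts, value):
    """Apply the write unless a younger transaction already read or wrote."""
    if ts < item.read_ts or ts < item.write_ts:
        return False
    item.value = value
    item.write_ts = ts
    return True
```

A rejected operation normally means the transaction is rolled back and restarted with a new timestamp; that part is omitted here.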
Distributed Concurrency Control
• Nonreplicated scheme
- Each site maintains a local lock manager to administer lock and unlock requests for local data
- Deadlock handling is more complex
• Single-coordinator approach
- The system maintains a single lock manager that resides at a single chosen site
- Can be used with replicated data
- Advantages: simple implementation, simple deadlock handling
- Disadvantages: bottleneck, vulnerability (the coordinator site is a single point of failure)
Distributed Concurrency Control
• Majority protocol
- A lock manager runs at each site
- When a transaction wishes to lock a data item Q that is replicated at n different sites, it must send a lock request to more than half of the n sites at which Q is stored
- Complex to implement
- Difficult to handle deadlocks
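The majority rule can be sketched as follows. This is a simplified illustration with exclusive locks only and no messaging, timeouts or lock release; `SiteLockManager` and `lock_majority` are hypothetical names introduced for the example.

```python
class SiteLockManager:
    """Local lock manager of one site (exclusive locks only, for brevity)."""

    def __init__(self):
        self.holder = {}  # data item -> transaction currently holding the lock

    def request(self, item, txn):
        """Grant the lock if the item is free or already held by `txn`."""
        if self.holder.get(item, txn) == txn:
            self.holder[item] = txn
            return True
        return False


def lock_majority(item, txn, managers):
    """Ask every replica site; succeed only with more than half of the grants."""
    grants = sum(1 for manager in managers if manager.request(item, txn))
    return 2 * grants > len(managers)


sites = [SiteLockManager() for _ in range(3)]
print(lock_majority("Q", "T1", sites))  # T1 is granted the lock at every site
print(lock_majority("Q", "T2", sites))  # T2 is refused everywhere
```

The sketch also hints at why deadlocks are hard to handle: with an even number of replicas, two transactions can each hold exactly half of the locks, and neither reaches a majority until one of them gives its locks up.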
Distributed Database Management Systems
An Example
EMP(ENO, ENAME, TITLE)
ASG(ENO, PNO, DUR, RESP)
PROJ(PNO, PNAME, BUDGET)
PAY(TITLE, SAL)
Distributed Query
• If these tables are stored in one place, we can use, for example, the following query to get the name and salary of every employee who has worked on a project for more than 12 months:
SELECT ENAME, SAL
FROM EMP, ASG, PAY
WHERE ASG.DUR > 12
AND EMP.ENO = ASG.ENO
AND PAY.TITLE = EMP.TITLE
• If these tables are distributed over different sites, executing the query requires a lot of extra processing. The DDBMS does this processing and lets the end user feel like the database's only user (transparency).
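The centralized case can be tried out with Python's built-in `sqlite3` module. The schema follows the example relations EMP, ASG and PAY; the sample rows are invented purely to make the query runnable and are not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMP (ENO TEXT, ENAME TEXT, TITLE TEXT);
    CREATE TABLE ASG (ENO TEXT, PNO TEXT, DUR INTEGER, RESP TEXT);
    CREATE TABLE PAY (TITLE TEXT, SAL INTEGER);
    -- Made-up sample rows, only to make the query runnable.
    INSERT INTO EMP VALUES ('E1', 'Alice', 'Engineer'), ('E2', 'Bob', 'Analyst');
    INSERT INTO ASG VALUES ('E1', 'P1', 24, 'Manager'), ('E2', 'P2', 6, 'Analyst');
    INSERT INTO PAY VALUES ('Engineer', 40000), ('Analyst', 34000);
""")

# The query from the slide, with the punctuation restored.
rows = conn.execute("""
    SELECT ENAME, SAL
    FROM EMP, ASG, PAY
    WHERE ASG.DUR > 12
      AND EMP.ENO = ASG.ENO
      AND PAY.TITLE = EMP.TITLE
""").fetchall()
print(rows)
```

In a distributed setting the same statement would have to be answered from fragments of EMP, ASG and PAY held at different sites, which is the work the DDBMS hides from the user.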
Distributed Database Transparency
• The concept of a DDB is to fragment the data and store each fragment at its own site
• Data may be replicated at different sites (replication)
• The DDBMS hides these details and makes the distribution transparent to the users
Distributed database transparency features:
• Distribution transparency
• Transaction transparency
• Failure transparency
• Performance transparency
• Heterogeneity transparency
Distributed DB Design
Top-down approach
• have a database
• how to split it and allocate it to individual sites
Two issues in top-down design:
• Fragmentation
• Allocation
Multi-databases (or bottom-up)
• combine existing databases
• how to deal with heterogeneity & autonomy
Fragmentation
• Horizontal
- Primary: depends on local attributes
- Derived: depends on a foreign relation
• Vertical
Example
Employee relation E(#, name, loc, sal, …)
• 40% of queries are Qa: select * from E where loc=Sa and …
• 40% of queries are Qb: select * from E where loc=Sb and …
Motivation: two sites, Sa and Sb; Qa is issued at Sa and Qb at Sb
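E:
#  Name   Loc  Sal
5  Joe    Sa   10
7  Sally  Sb   25
8  Tom    Sa   15
F1, stored at Sa:
#  Name   Loc  Sal
5  Joe    Sa   10
8  Tom    Sa   15
F2, stored at Sb:
#  Name   Loc  Sal
7  Sally  Sb   25
$F = \{F_1, F_2\}$, with $F_1 = \sigma_{loc=Sa}(E)$ and $F_2 = \sigma_{loc=Sb}(E)$
This is primary horizontal fragmentation.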
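The fragmentation above can be reproduced in a few lines. This is only a sketch: relations are modelled as lists of dictionaries, and the `select` helper and the `id` field name (standing in for the # column) are assumptions of the example.

```python
# The employee relation E from the example.
E = [
    {"id": 5, "name": "Joe", "loc": "Sa", "sal": 10},
    {"id": 7, "name": "Sally", "loc": "Sb", "sal": 25},
    {"id": 8, "name": "Tom", "loc": "Sa", "sal": 15},
]


def select(relation, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


# Primary horizontal fragmentation on the local attribute `loc`.
F1 = select(E, lambda row: row["loc"] == "Sa")  # stored at site Sa
F2 = select(E, lambda row: row["loc"] == "Sb")  # stored at site Sb

# The original relation is recovered as the union of its fragments.
reconstructed = sorted(F1 + F2, key=lambda row: row["id"])
print(reconstructed == E)
```

Because the two predicates are mutually exclusive and together cover every row, the fragments are disjoint and their union loses nothing.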
Figure: candidate fragmentations F1, F2 and F3 of E. The fragment predicates shown are:
• Loc=SA ∧ sal < 10
• Loc=SA ∧ sal ≥ 10
• Loc=SB ∧ sal < 10
• Loc=SB ∧ sal ≥ 10
Qa: Select … loc = SA
Qb: Select … loc = SB
Prefer F2 to F1 and F3
Horizontal Fragmentation: peer-to-peer relationship (siblings)
Figure: the University Faculties system is fragmented horizontally into one system per faculty: Faculty1, Faculty2 and Faculty3
Vertical fragmentation
E:
#  NM     Loc  Sal
5  Joe    Sa   10
7  Sally  Sb   25
8  Fred   Sa   15
…
E1:
#  NM     Loc
5  Joe    Sa
7  Sally  Sb
8  Fred   Sa
…
E2:
#  Sal
5  10
7  25
8  15
…
In general: $R[T] \Rightarrow R_1[T_1], R_2[T_2], \dots, R_n[T_n]$ with $T_i \subseteq T$
Just like normalization of relations
Vertical Fragmentation example
PROJ:
PNO  PNAME              BUDGET  LOC
P1   Instrumentation    150000  Montreal
P2   Database Develop.  135000  New York
P3   CAD/CAM            250000  New York
P4   Maintenance        310000  Paris
P5   CAD/CAM            500000  Boston
PROJ1: information about project budgets
PNO  BUDGET
P1   150000
P2   135000
P3   250000
P4   310000
P5   500000
PROJ2: information about project names and locations
PNO  PNAME              LOC
P1   Instrumentation    Montreal
P2   Database Develop.  New York
P3   CAD/CAM            New York
P4   Maintenance        Paris
P5   CAD/CAM            Boston
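The PROJ example can be checked with a short sketch. Relations are again lists of dictionaries; `project` and `join_on_key` are hypothetical helpers written for this illustration. The point it demonstrates is that every vertical fragment keeps the key PNO, so the original relation can be rebuilt by a join.

```python
PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation", "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "Database Develop.", "BUDGET": 135000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "CAD/CAM", "BUDGET": 250000, "LOC": "New York"},
    {"PNO": "P4", "PNAME": "Maintenance", "BUDGET": 310000, "LOC": "Paris"},
    {"PNO": "P5", "PNAME": "CAD/CAM", "BUDGET": 500000, "LOC": "Boston"},
]


def project(relation, attributes):
    """Relational projection: keep only the listed attributes of each row."""
    return [{a: row[a] for a in attributes} for row in relation]


def join_on_key(left, right, key):
    """Join two fragments on their shared key attribute."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


PROJ1 = project(PROJ, ["PNO", "BUDGET"])        # project budgets
PROJ2 = project(PROJ, ["PNO", "PNAME", "LOC"])  # project names and locations

rebuilt = join_on_key(PROJ1, PROJ2, "PNO")
print(rebuilt == PROJ)
```

Dropping PNO from either fragment would make the join impossible, which is why the key is repeated in every vertical fragment.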
Grouping Attributes
Example: E(#, NM, LOC, SAL)
• Option 1: E1(#, NM, LOC), E2(#, SAL)
• Option 2: E1(#, NM), E2(#, LOC), E3(#, SAL)
Which is the right vertical fragmentation?
Vertical Fragmentation: branch relationship (parent and children)
Figure: Personnel Affairs is fragmented vertically into Payroll (Salary, allowances, Tax) and Appointments (Name, address, grade)
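Hybrid Fragmentation
Figure: R is first split by horizontal fragmentation (HF) into R1 and R2. Each of these is then split by vertical fragmentation (VF): R1 into R11 and R12, and R2 into R21, R22 and R23.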
Allocation
Example: E with $F_1 = \sigma_{loc=Sa}(E)$ and $F_2 = \sigma_{loc=Sb}(E)$
Figure: the fragments of E (two copies of F1 and one copy of F2) placed across Site a, Site b and Site c
• Do we replicate fragments?
• Where do we place each copy of each fragment?
Allocation Alternatives
• Non-replicated
- partitioned: each fragment resides at only one site
• Replicated
- fully replicated: each fragment at each site
- partially replicated: each fragment at some of the sites
• Rule: if $\frac{\text{read-only queries}}{\text{update queries}} \geq 1$, replication is advantageous; otherwise replication may cause problems
Optimization problem
• What is the best placement of fragments and/or the best number of copies to:
- minimize query response time
- maximize throughput
- minimize "some cost"
- …
• Subject to constraints:
- available storage
- available bandwidth, processing power, …
- keep 90% of response times below X
- …
Very hard problem
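A toy version of the placement problem can be solved by exhaustive search. The sketch below assumes a non-replicated allocation, a single cost (the number of remote accesses) and a single constraint (storage per site). All sizes, capacities and access frequencies are invented for the illustration.

```python
from itertools import product

SITES = ["a", "b", "c"]
FRAGMENT_SIZE = {"F1": 2, "F2": 1}   # hypothetical storage units
CAPACITY = {"a": 2, "b": 2, "c": 1}  # hypothetical storage per site
# ACCESSES[site][fragment]: how often queries issued at `site` read `fragment`.
ACCESSES = {
    "a": {"F1": 40, "F2": 5},
    "b": {"F1": 5, "F2": 40},
    "c": {"F1": 10, "F2": 10},
}


def cost(placement):
    """Number of remote accesses for a placement {fragment: site}."""
    return sum(
        frequency
        for site, per_fragment in ACCESSES.items()
        for fragment, frequency in per_fragment.items()
        if placement[fragment] != site
    )


def fits(placement):
    """Check the storage constraint at every site."""
    used = {site: 0 for site in SITES}
    for fragment, site in placement.items():
        used[site] += FRAGMENT_SIZE[fragment]
    return all(used[site] <= CAPACITY[site] for site in SITES)


def best_placement():
    """Try every assignment of fragments to sites and keep the cheapest."""
    fragments = sorted(FRAGMENT_SIZE)
    candidates = (
        dict(zip(fragments, sites))
        for sites in product(SITES, repeat=len(fragments))
    )
    return min((p for p in candidates if fits(p)), key=cost)


print(best_placement(), cost(best_placement()))
```

The search visits every one of the |sites|^|fragments| assignments, so it only works for tiny inputs. Allowing replication and adding further constraints enlarges the search space again, which is one way to see why the slide calls this a very hard problem.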
Static data allocation & Dynamic data allocation
Static data allocation: the allocation sites do not change and no extra storage space is needed (the data does not grow in size).
Dynamic data allocation: the location of the data changes dynamically as the data expands, which usually follows from the nature of the systems producing the data. The data placement problem can be treated with two types of models:
• Adaptive models: apply when the location changes because of system activity (online systems with online data storage, for example airline bookings). These models keep additional temporary copies of the data and then work with these duplicate copies (replication).
• Non-adaptive models: solve the dynamic allocation when the system is established or when it is reorganized.
Replication
Replication stores copies of the same data at more than one location (site). These copies must be updated consistently, despite their distance from each other. Updating of the copies is controlled by one of two techniques:
• Lazy replication: the other copies are updated after work on one of the copies (the master copy) has completed, so the update is propagated outside the boundaries of the transaction.
• Eager replication: the replicated data is updated within the transaction boundaries, while the transaction is working on one of the copies.
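The difference between the two techniques can be sketched in a few lines. This is a deliberately simplified model, assuming one master copy, in-memory secondary copies and no real transactions or network; the `ReplicatedItem` class and its method names are made up for the example.

```python
class ReplicatedItem:
    """One logical value stored as a master copy plus secondary copies."""

    def __init__(self, value, n_secondaries):
        self.master = value
        self.secondaries = [value] * n_secondaries
        self.pending = []  # updates not yet propagated (lazy mode only)

    def write_eager(self, value):
        """Eager: update every copy before the transaction is finished."""
        self.master = value
        self.secondaries = [value] * len(self.secondaries)

    def write_lazy(self, value):
        """Lazy: update the master copy only and remember the update."""
        self.master = value
        self.pending.append(value)

    def propagate(self):
        """Run after the transaction: bring the secondaries up to date."""
        for value in self.pending:
            self.secondaries = [value] * len(self.secondaries)
        self.pending.clear()


item = ReplicatedItem(1, n_secondaries=2)
item.write_lazy(3)
print(item.master, item.secondaries)  # the secondaries still hold the old value
item.propagate()
print(item.master, item.secondaries)  # now every copy agrees
```

Between `write_lazy` and `propagate` a reader of a secondary copy sees stale data; eager replication avoids that at the price of touching every copy inside the transaction.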
bullThe concepts of DDB is to fragment the data and store each fragment on its site
bullData may be replicated on different site (replication)
bullDDBMS hide these details from the user and makes the distribution transparent to the users Distributed Database Transparency FeaturesDistribution transparency Transaction transparencyFailure transparency Performance transparency Heterogeneity transparency
Distributed Database Transparency
Dr Mohamed Osman Hegazi
Distributed DB Design
Top-down approach bull have a databasebull how to split and allocate to individual sitesTwo issues in top-down designFragmentationAllocation
Multi-databases (or bottom-up)bull combine existing databasesbull how to deal with heterogeneity amp autonomy
Dr Mohamed Osman Hegazi
Fragmentationbull Horizontal Primary
depends on local attributesR Derived
depends on foreign relation
bull Vertical
R
Dr Mohamed Osman Hegazi
Example
Employee relation E(name, loc, sal, …)
40% of queries:
  Qa: select *
      from E
      where loc = Sa
      and …
40% of queries:
  Qb: select *
      from E
      where loc = Sb
      and …
Motivation: two sites, Sa and Sb; Qa is issued at Sa and Qb is issued at Sb

E
#   Name   Loc  Sal
5   Joe    Sa   10
7   Sally  Sb   25
8   Tom    Sa   15

$F_1 = \sigma_{loc=Sa}(E)$, stored at Sa
#   Name   Loc  Sal
5   Joe    Sa   10
8   Tom    Sa   15

$F_2 = \sigma_{loc=Sb}(E)$, stored at Sb
#   Name   Loc  Sal
7   Sally  Sb   25

$F = \{F_1, F_2\}$
This is primary horizontal fragmentation.

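A minimal Python sketch of this fragmentation, assuming the rows of E are stored as plain tuples. The helper select_loc and the variable names are illustrative only. The sketch also checks two properties a fragmentation is normally expected to have: the fragments together contain every row of E, and no row appears in two fragments.

```python
# Rows of E as (number, name, loc, sal), taken from the table above.
E = [
    (5, "Joe", "Sa", 10),
    (7, "Sally", "Sb", 25),
    (8, "Tom", "Sa", 15),
]


def select_loc(relation, site):
    """Selection on loc: the rows of the relation located at the given site."""
    return [row for row in relation if row[2] == site]


F1 = select_loc(E, "Sa")  # stored at Sa
F2 = select_loc(E, "Sb")  # stored at Sb

# Reconstruction by union, plus completeness and disjointness checks.
reconstructed = sorted(F1 + F2)
is_complete = reconstructed == sorted(E)
is_disjoint = not set(F1) & set(F2)
```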
Fragments defined by loc and sal (figure):
• Loc = SA and sal < 10
• Loc = SA and sal ≥ 10
• Loc = SB and sal < 10
• Loc = SB and sal ≥ 10
Fragment labels in the figure: F1, F2, F3
Qa: select … loc = SA
Qb: select … loc = SB
Prefer F2 to F1 and F3

Horizontal Fragmentation: peer-to-peer relationship (siblings)
Figure: the University Faculties system is fragmented horizontally into the Faculty1, Faculty2, and Faculty3 systems.

Vertical fragmentation
Example
E
#   NM     Loc  Sal
5   Joe    Sa   10
7   Sally  Sb   25
8   Fred   Sa   15
…

E1
#   NM     Loc
5   Joe    Sa
7   Sally  Sb
8   Fred   Sa
…

E2
#   Sal
5   10
7   25
8   15
…

$R[T] \Rightarrow R_1[T_1], R_2[T_2], \ldots, R_n[T_n]$ with $T_i \subseteq T$
Just like normalization of relations

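The vertical fragmentation of E can be sketched in Python as two projections that both keep the key column, so that E can be rebuilt by a join. The dictionary layout, the column name "no" for the # column, and the helpers project and join_on_key are assumptions made for this illustration.

```python
E = [
    {"no": 5, "nm": "Joe", "loc": "Sa", "sal": 10},
    {"no": 7, "nm": "Sally", "loc": "Sb", "sal": 25},
    {"no": 8, "nm": "Fred", "loc": "Sa", "sal": 15},
]


def project(relation, attributes):
    """Projection: keep only the listed attributes of every row."""
    return [{a: row[a] for a in attributes} for row in relation]


def join_on_key(left, right, key):
    """Join two fragments on their shared key column."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


E1 = project(E, ["no", "nm", "loc"])  # key plus name and location
E2 = project(E, ["no", "sal"])        # key plus salary
rebuilt = join_on_key(E1, E2, "no")
```

Repeating the key in every fragment is what makes the reconstruction lossless; without it the rows of E1 and E2 could not be matched up again.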
Vertical Fragmentation example
PROJ
PNO  PNAME              BUDGET  LOC
P1   Instrumentation    150000  Montreal
P2   Database Develop.  135000  New York
P3   CAD/CAM            250000  New York
P4   Maintenance        310000  Paris
P5   CAD/CAM            500000  Boston

PROJ1: information about project budgets
PNO  BUDGET
P1   150000
P2   135000
P3   250000
P4   310000
P5   500000

PROJ2: information about project names and locations
PNO  PNAME              LOC
P1   Instrumentation    Montreal
P2   Database Develop.  New York
P3   CAD/CAM            New York
P4   Maintenance        Paris
P5   CAD/CAM            Boston

Grouping Attributes
Example: E(#, NM, LOC, SAL)
• Option 1: E1(#, NM), E2(#, LOC), E3(#, SAL)
• Option 2: E1(#, NM, LOC), E2(#, SAL)
Which is the right vertical fragmentation …?

Vertical Fragmentation: branch relationship (parent and children)
Figure: the Personnel Affairs system is fragmented vertically into
• Salaries (salary, allowances, tax)
• Appointments (name, address, grade)

Hybrid Fragmentation
• R is fragmented horizontally (HF) into R1 and R2
  - R1 is fragmented vertically (VF) into R11 and R12
  - R2 is fragmented vertically (VF) into R21, R22, and R23

Allocation
Example: E is fragmented into $F_1 = \sigma_{loc=Sa}(E)$ and $F_2 = \sigma_{loc=Sb}(E)$
Figure: copies of the fragments of E (F1, F1, F2) placed at Site a, Site b, and Site c
• Do we replicate fragments?
• Where do we place each copy of each fragment?

Allocation Alternatives
• Non-replicated
  - partitioned: each fragment resides at only one site
• Replicated
  - fully replicated: each fragment at each site
  - partially replicated: each fragment at some of the sites
• Rule: if $\frac{\text{read-only queries}}{\text{update queries}} \geq 1$, replication is advantageous; otherwise replication may cause problems

Optimization problem
• What is the best placement of fragments and/or best number of copies to
  - minimize query response time
  - maximize throughput
  - minimize "some cost"
• Subject to constraints
  - Available storage
  - Available bandwidth, processing power, …
  - Keep 90% of response time below X
Very hard problem

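To make the optimization problem concrete, here is a brute-force sketch for a tiny non-replicated instance. All numbers (fragment sizes, site capacities, access frequencies, and the unit remote-access cost) are hypothetical, and the cost model counts only remote accesses. Exhaustive search like this works only for very small inputs, because the number of placements grows exponentially with the number of fragments.

```python
from itertools import product

sites = ["a", "b", "c"]
fragment_size = {"F1": 2, "F2": 1}   # hypothetical fragment sizes
capacity = {"a": 2, "b": 2, "c": 1}  # hypothetical storage per site
# accesses[site][fragment]: how often queries issued at a site touch a fragment
accesses = {
    "a": {"F1": 40, "F2": 5},
    "b": {"F1": 5, "F2": 40},
    "c": {"F1": 10, "F2": 10},
}
REMOTE_COST = 1  # hypothetical cost of one remote access; local access is free


def cost(placement):
    """Total remote-access cost of a non-replicated placement."""
    return sum(
        freq * REMOTE_COST
        for site, per_fragment in accesses.items()
        for fragment, freq in per_fragment.items()
        if placement[fragment] != site
    )


def fits(placement):
    """Check the storage constraint at every site."""
    used = {site: 0 for site in sites}
    for fragment, site in placement.items():
        used[site] += fragment_size[fragment]
    return all(used[s] <= capacity[s] for s in sites)


# Enumerate every assignment of fragments to sites and keep the cheapest feasible one.
candidates = [
    dict(zip(fragment_size, choice))
    for choice in product(sites, repeat=len(fragment_size))
]
best = min((p for p in candidates if fits(p)), key=cost)
```

With these numbers the search places F1 at site a and F2 at site b, that is, each fragment at the site that uses it most.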
Static data allocation & Dynamic data allocation
Static data allocation: the allocation sites do not change and no extra storage space is needed (the data does not grow in size).
Dynamic data allocation: the location of the data changes dynamically as the data expands, which usually results from the nature of the systems that produce the data.
Data-location problems can be treated with two types of models:
• Adaptive models: apply when the location changes because of system activity (online systems with online data storage, for example airline bookings). These models keep additional temporary copies of the data and then process these duplicate copies (replication).
• Non-adaptive models: solve the dynamic allocation when the system is established or when it is reorganized.

Replication
Replication stores copies of the same data at more than one location (site); these copies must be updated consistently despite their distance from each other. When the copies are updated is controlled by one of two techniques:
• Lazy replication: the other copies are updated after work on one of the copies (the master copy) has completed, so the update is done outside the boundaries of the transaction.
• Eager replication: the replicated data is updated within the transaction boundaries, while working on one of the copies.
Where the updates may be made:
- Central update (primary copy): update the primary copy first and then update the secondary copies. Because all updates go through one copy, concurrent updates at different sites do not have to be synchronized, which simplifies consistency control, but the primary copy may become a bottleneck.
- Update everywhere: updates can be applied to the copies at any site, so all copies have an equal opportunity to be updated.

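The difference between eager and lazy propagation can be sketched for the primary-copy case. The PrimaryCopy class and its methods are hypothetical and greatly simplified: a real eager scheme would use an atomic commit protocol across sites, which this in-memory sketch leaves out.

```python
class PrimaryCopy:
    """Hypothetical primary copy with secondary copies at other sites."""

    def __init__(self, n_secondaries, eager):
        self.primary = {}
        self.secondaries = [{} for _ in range(n_secondaries)]
        self.eager = eager
        self.pending = []  # updates committed at the primary, not yet propagated

    def write(self, key, value):
        """Run one update transaction against the primary copy."""
        self.primary[key] = value
        if self.eager:
            # Eager: secondaries are updated inside the transaction boundary.
            for copy in self.secondaries:
                copy[key] = value
        else:
            # Lazy: the transaction commits now; propagation happens later.
            self.pending.append((key, value))

    def propagate(self):
        """Lazy only: push committed updates to the secondaries."""
        for key, value in self.pending:
            for copy in self.secondaries:
                copy[key] = value
        self.pending.clear()


eager = PrimaryCopy(2, eager=True)
eager.write("sal", 10)
eager_after_commit = [copy.get("sal") for copy in eager.secondaries]

lazy = PrimaryCopy(2, eager=False)
lazy.write("sal", 10)
lazy_after_commit = [copy.get("sal") for copy in lazy.secondaries]
lazy.propagate()
lazy_after_propagation = [copy.get("sal") for copy in lazy.secondaries]
```

Between the commit and the call to propagate, a read at a secondary site of the lazy system sees stale data; that window is the price paid for the shorter transaction.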
Where / when       | Eager                                                    | Lazy
Primary Copy       | Early solutions in Ingres                                | Sybase/IBM/Oracle; placement strategies; serialization-graph based
Update Everywhere  | ROWA/ROWAA; quorum based; Oracle synchronous replication | Oracle Advanced Replication; weak consistency strategies

Distributed Query Processing
The aim of query processing in a distributed database is to make work on distributed data appear like work on a single database system. The problem can be divided into several levels according to the data problems involved. Query processing takes SQL or OQL statements as input and processes them through several stages until the query is executed.
Query Decomposition & Data Localization
• The first stage of distributed query processing analyzes the query and translates it into relational algebra; the second stage localizes the data by distributing the query over the fragments.
Query Optimization
• The third stage aims at an optimal execution of the query by making the execution cost as small as possible and deleting unneeded expressions.
• Query optimization is one of the important aspects of dealing with queries; many algorithms have been developed for it.

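Data localization can be illustrated with the employee fragments F1 and F2 used earlier. The FRAGMENTS dictionary and the functions localize and execute are hypothetical names for this sketch, and only equality predicates on loc are handled. A query on E is rewritten into a query on the fragments, and fragments whose defining predicate contradicts the query predicate are dropped.

```python
# Fragment schema: each fragment of E with its site and defining loc value.
FRAGMENTS = {
    "F1": {"site": "Sa", "loc": "Sa",
           "rows": [(5, "Joe", "Sa", 10), (8, "Tom", "Sa", 15)]},
    "F2": {"site": "Sb", "loc": "Sb",
           "rows": [(7, "Sally", "Sb", 25)]},
}


def localize(query_loc=None):
    """Replace E by its fragments and drop fragments the predicate rules out."""
    return [
        name
        for name, fragment in FRAGMENTS.items()
        if query_loc is None or fragment["loc"] == query_loc
    ]


def execute(query_loc=None, min_sal=0):
    """Run the fragment query: select at each relevant site, then union."""
    result = []
    for name in localize(query_loc):
        rows = FRAGMENTS[name]["rows"]
        result.extend(
            row for row in rows
            if (query_loc is None or row[2] == query_loc) and row[3] >= min_sal
        )
    return sorted(result)


qa_fragments = localize("Sa")  # Qa touches only F1, so site Sb is never contacted
all_fragments = localize()     # a query without a loc predicate needs both
qa_rows = execute("Sa", min_sal=12)
```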
Figure: layers of distributed query processing
Control site:
1. Query Decomposition (uses the Global Schema): Calculus Query on distributed relations → Algebraic Query on distributed relations
2. Data Localization (uses the Fragment Schema): Algebraic Query on distributed relations → Fragment Query
3. Global Optimization (uses Statistics on fragments): Fragment Query → Optimized fragment query with communication operations
Local sites:
4. Local Optimization (uses the Local Schema): Optimized fragment query with communication operations → Optimized local queries

Concurrency Control in distributed database
• Concurrency control in databases comprises the activities that keep transactions consistent across all the data in the system.
• The DDBMS is responsible for synchronizing data that is distributed over the sites of the network, each of which runs programs that deal with its own data. In this situation, controlling concurrency over the distributed data is one of the more complex issues.
• Four techniques are used to control concurrency in a distributed database:
  - Locking techniques
  - Timestamps
  - Optimistic algorithm: all operations on the data are performed, except for operations that update the data; in that case the operation updates the local data first
  - Complex algorithm for timestamps

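As an illustration of the timestamp technique, here is a sketch of basic timestamp ordering on a single data item. The slides do not say which timestamp algorithm is meant, so this is one common textbook variant, not necessarily the one intended. The Item class and the integer timestamps are assumptions; in a distributed system a timestamp is often a pair of local counter and site identifier so that it is unique across sites.

```python
class Item:
    """A data item with the bookkeeping needed for timestamp ordering."""

    def __init__(self):
        self.read_ts = 0   # largest timestamp of a transaction that read the item
        self.write_ts = 0  # largest timestamp of a transaction that wrote the item
        self.value = None


def read(item, ts):
    """Return (ok, value); the read is rejected if a younger txn already wrote."""
    if ts < item.write_ts:
        return False, None
    item.read_ts = max(item.read_ts, ts)
    return True, item.value


def write(item, ts, value):
    """Return True if the write is accepted, False if the txn must restart."""
    if ts < item.read_ts or ts < item.write_ts:
        return False
    item.write_ts = ts
    item.value = value
    return True


q = Item()
log = [
    write(q, 1, "a"),  # T1 writes
    read(q, 3)[0],     # T3 reads
    write(q, 2, "b"),  # T2 is older than the reader T3: rejected
    write(q, 4, "c"),  # T4 is the newest: accepted
    read(q, 3)[0],     # T3 now reads too late: rejected
]
```

A rejected transaction is restarted with a new, larger timestamp. No locks are held, so this scheme cannot deadlock, but it may restart transactions repeatedly.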
Dr Mohamed Osman Hegazi
Distributed Concurrency Control
• Nonreplicated Scheme
– Each site maintains a local lock manager to administer lock and unlock requests for local data
– Deadlock handling is more complex
• Single-Coordinator Approach
– The system maintains a single lock manager that resides in a single chosen site
– Can be used with replicated data
– Advantages
  • simple implementation
  • simple deadlock handling
– Disadvantages
  • bottleneck
  • vulnerability
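The single-coordinator approach can be illustrated with a very small lock manager. This is only a sketch under simplifying assumptions: the `LockManager` class and its methods are hypothetical, it grants exclusive locks only, and a refused request simply returns False instead of queueing. In the single-coordinator approach one such object would live at the chosen site and serve every transaction in the system.

```python
class LockManager:
    """Exclusive-lock table kept at the single coordinator site (sketch)."""

    def __init__(self):
        self.owner = {}  # data item -> transaction currently holding its lock

    def lock(self, txn, item):
        # Grant the lock if the item is free or already held by txn.
        holder = self.owner.setdefault(item, txn)
        return holder == txn

    def unlock(self, txn, item):
        # Only the current holder may release the lock.
        if self.owner.get(item) == txn:
            del self.owner[item]


coordinator = LockManager()
first = coordinator.lock("T1", "Q")   # granted
second = coordinator.lock("T2", "Q")  # refused while T1 holds Q
coordinator.unlock("T1", "Q")
third = coordinator.lock("T2", "Q")   # granted after the release
```

Because every request goes through this one table, implementation and deadlock detection stay simple, which is also why the coordinator site can become a bottleneck.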
Distributed Concurrency Control
• Majority Protocol
– A lock manager at each site
– When a transaction wishes to lock a data item Q, which is replicated in n different sites, it must send a lock request to more than half of the n sites in which Q is stored
– Complex to implement
– Difficult to handle deadlocks
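The counting rule of the majority protocol can be sketched as follows. The function `majority_lock` and the callback `try_lock` are hypothetical names introduced for illustration; `try_lock(site, item)` stands in for a request to the local lock manager of one site and is assumed to return True when that site grants the lock. Message passing, waiting, and deadlock handling are left out.

```python
def majority_lock(item, replica_sites, try_lock):
    """Try to lock `item` at more than half of the sites that store it.

    Returns (success, granted_sites). On failure the caller is expected
    to release the locks listed in granted_sites.
    """
    needed = len(replica_sites) // 2 + 1  # strictly more than half of n
    granted = []
    for site in replica_sites:
        if try_lock(site, item):
            granted.append(site)
            if len(granted) >= needed:
                return True, granted
    return False, granted


sites = ["s1", "s2", "s3", "s4", "s5"]

# Two of the five replicas are locked by another transaction.
ok, held = majority_lock("Q", sites, lambda s, i: s not in {"s2", "s4"})

# Three of the five replicas are locked by another transaction.
ok2, held2 = majority_lock("Q", sites, lambda s, i: s not in {"s1", "s2", "s3"})
```

Since two transactions cannot both hold more than half of the n replicas, at most one of them can succeed for the same item at a time.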
Primary horizontal fragmentation
E
     Name   Loc  Sal
5    Joe    Sa   10
7    Sally  Sb   25
8    Tom    Sa   15
$F = \{F_1, F_2\}$
$F_1 = \sigma_{loc=Sa}(E)$, at Sa
5    Joe    Sa   10
8    Tom    Sa   15
$F_2 = \sigma_{loc=Sb}(E)$, at Sb
7    Sally  Sb   25
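As a rough sketch, primary horizontal fragmentation is a selection per fragment, and the original relation is rebuilt as the union of the fragments. The Python below uses the tuples of E from the example; the attribute name `id` for the unlabeled first column and the helper `select` are assumptions made for this illustration.

```python
# Relation E from the example; "id" is an assumed name for the first column.
E = [
    {"id": 5, "name": "Joe", "loc": "Sa", "sal": 10},
    {"id": 7, "name": "Sally", "loc": "Sb", "sal": 25},
    {"id": 8, "name": "Tom", "loc": "Sa", "sal": 15},
]


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


F1 = select(E, lambda t: t["loc"] == "Sa")  # stored at site Sa
F2 = select(E, lambda t: t["loc"] == "Sb")  # stored at site Sb

# Reconstruction: the union of the fragments gives back E.
reconstructed = sorted(F1 + F2, key=lambda t: t["id"])

# Disjointness: no tuple appears in both fragments.
overlap = [t for t in F1 if t in F2]
```

The two checks at the end correspond to the usual correctness conditions of a fragmentation: it can be reconstructed, and its fragments do not overlap.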
[Figure: fragmentation alternatives F1, F2, and F3 for E; the fragment predicates shown are listed below]
loc = SA ∧ sal < 10
loc = SA ∧ sal ≥ 10
loc = SB ∧ sal < 10
loc = SB ∧ sal ≥ 10
Qa: Select … loc = SA
Qb: Select … loc = SB
Prefer F2 to F1 and F3
Horizontal Fragmentation: peer-to-peer relationship (siblings)
[Figure: the University Faculties system is fragmented horizontally into the systems of Faculty1, Faculty2, and Faculty3]
Vertical fragmentation
E
     NM     Loc  Sal
5    Joe    Sa   10
7    Sally  Sb   25
8    Fred   Sa   15
…
E1
     NM     Loc
5    Joe    Sa
7    Sally  Sb
8    Fred   Sa
…
E2
     Sal
5    10
7    25
8    15
…
Example
$R[T] \Rightarrow R_1[T_1], R_2[T_2], \ldots, R_n[T_n], \quad T_i \subseteq T$
Just like normalization of relations
Vertical Fragmentation example
PROJ
PNO  PNAME              BUDGET  LOC
P1   Instrumentation    150000  Montreal
P2   Database Develop.  135000  New York
P3   CAD/CAM            250000  New York
P4   Maintenance        310000  Paris
P5   CAD/CAM            500000  Boston
PROJ1: information about project budgets
PNO  BUDGET
P1   150000
P2   135000
P3   250000
P4   310000
P5   500000
PROJ2: information about project names and locations
PNO  PNAME              LOC
P1   Instrumentation    Montreal
P2   Database Develop.  New York
P3   CAD/CAM            New York
P4   Maintenance        Paris
P5   CAD/CAM            Boston
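The PROJ example can be reproduced with a projection per fragment and a join on the key to rebuild the relation. The sketch below is illustrative only; the helpers `project` and `join_on` are hypothetical names, and the data is the PROJ table from the example.

```python
PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation", "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "Database Develop.", "BUDGET": 135000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "CAD/CAM", "BUDGET": 250000, "LOC": "New York"},
    {"PNO": "P4", "PNAME": "Maintenance", "BUDGET": 310000, "LOC": "Paris"},
    {"PNO": "P5", "PNAME": "CAD/CAM", "BUDGET": 500000, "LOC": "Boston"},
]


def project(relation, attrs):
    """Relational projection: keep only the listed attributes."""
    return [{a: t[a] for a in attrs} for t in relation]


def join_on(left, right, key):
    """Join two fragments on a shared key attribute."""
    index = {t[key]: t for t in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


# Each vertical fragment repeats the key PNO so that PROJ can be rebuilt.
PROJ1 = project(PROJ, ["PNO", "BUDGET"])
PROJ2 = project(PROJ, ["PNO", "PNAME", "LOC"])

rebuilt = join_on(PROJ1, PROJ2, "PNO")
```

Keeping PNO in both fragments is what makes the join lossless; without a shared key the two fragments could not be matched up again.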
Grouping Attributes
Example: E(NM, LOC, SAL)
Option 1: E1(NM, LOC), E2(SAL)
Option 2: E1(NM), E2(LOC), E3(SAL)
Which is the right vertical fragmentation?
Vertical Fragmentation: branch relationship (parent and child)
[Figure: the Personnel Affairs relation is split vertically into Salaries (salary, allowances, tax) and Appointments (name, address, grade)]
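Hybrid Fragmentation
[Figure: fragmentation tree]
R is split by horizontal fragmentation (HF) into R1 and R2
R1 is split by vertical fragmentation (VF) into R11 and R12
R2 is split by vertical fragmentation (VF) into R21, R22, and R23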
Allocation
Example: E, $F_1 = \sigma_{loc=Sa}(E)$, $F_2 = \sigma_{loc=Sb}(E)$
[Figure: the fragments of E placed at Site a, Site b, and Site c; F1 appears twice and F2 once]
Do we replicate fragments?
Where do we place each copy of each fragment?
Allocation Alternatives
• Non-replicated
– partitioned: each fragment resides at only one site
• Replicated
– fully replicated: each fragment at each site
– partially replicated: each fragment at some of the sites
• Rule:
If $\dfrac{\text{read-only queries}}{\text{update queries}} \geq 1$, replication is advantageous; otherwise replication may cause problems
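The replication rule can be written as a one-line check. This is a sketch only: the function name `replication_advisable` is hypothetical, the threshold of 1 for the ratio of read-only queries to update queries is an assumption about the rule on the slide, and the treatment of a workload without updates is a choice made here for the illustration.

```python
def replication_advisable(read_only_queries, update_queries):
    """Rule of thumb: replicate when reads are at least as frequent as updates."""
    if update_queries == 0:
        # Assumption: with no updates at all, any read load favors replication.
        return read_only_queries > 0
    return read_only_queries / update_queries >= 1


mostly_reads = replication_advisable(900, 100)    # ratio 9.0
mostly_updates = replication_advisable(100, 900)  # ratio about 0.11
```

A read-heavy workload benefits because each extra copy serves reads locally, while an update-heavy workload pays for every copy that has to be kept consistent.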
Optimization problem
• What is the best placement of fragments and/or best number of copies to
– minimize query response time
– maximize throughput
– minimize "some cost"
• Subject to constraints
– Available storage
– Available bandwidth, processing power, …
– Keep 90% of response time below X
Very hard problem
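A toy version of the allocation problem shows why it is hard: even for one fragment, every non-empty subset of sites is a candidate placement. The sketch below enumerates them by brute force. Everything in it is an assumption made for illustration: the site names, the read and update counts, the unit cost of a remote access, and the cost model itself (a read is free when the issuing site holds a copy, and an update pays once for every copy held at another site). The limit on the number of copies stands in for the storage constraint.

```python
from itertools import combinations

SITES = ["a", "b", "c"]
reads = {"a": 50, "b": 10, "c": 40}  # reads issued at each site (assumed)
updates = {"a": 5, "b": 1, "c": 2}   # updates issued at each site (assumed)
REMOTE_COST = 1                      # cost of one remote access (assumed)


def cost(placement):
    """Total access cost when copies are stored at the sites in `placement`."""
    total = 0
    for s in SITES:
        if s not in placement:
            # Reads from a site without a copy go to a remote copy.
            total += reads[s] * REMOTE_COST
        # Each update must reach every copy stored at another site.
        remote_copies = sum(1 for c in placement if c != s)
        total += updates[s] * remote_copies * REMOTE_COST
    return total


def best_placement(max_copies):
    """Cheapest placement that uses at most `max_copies` copies."""
    candidates = [
        set(c)
        for k in range(1, max_copies + 1)
        for c in combinations(SITES, k)
    ]
    return min(candidates, key=cost)


single = best_placement(1)
double = best_placement(2)
```

The number of candidates doubles with each additional site, and real instances add many fragments, query mixes, and bandwidth limits, so exhaustive search of this kind is only practical for very small examples.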
Static data allocation & Dynamic data allocation
Static data allocation: the allocation sites do not change and no extra storage space is needed (the data does not grow in size).
Dynamic data allocation: the location of the data changes dynamically as the data expands, which usually follows from the nature of the systems producing the data. Data placement problems can be handled with two types of models:
• Adaptive models: models that apply when the location changes because of system activity (online systems with online data storage, for example airline bookings). These models keep additional temporary copies of the data and then process those duplicate copies (replication).
• Non-adaptive models: these models settle dynamic allocation when the system is established or when it is reorganized.
Replication
Replication stores copies of the same data at more than one location (site); these copies must be updated consistently despite their distance from each other. Updates to the copies are controlled by one of two techniques:
Lazy replication: the other copies are updated after work on one of the copies (the master copy) has completed. The update therefore happens outside the transaction boundaries.
Eager replication: the replicated data is updated within the transaction boundaries, while work on one of the copies is in progress.
Updates can also be classified by where they are applied:
– Central update (initial copy, primary copy): update the primary copy first and then update the secondary copies. This method needs no synchronization among update sites, which makes consistency easier to control, but the primary copy may become a bottleneck.
– Update everywhere: an update may be applied at any copy, so all copies have equal standing for updates.
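The difference between eager and lazy replication can be sketched with a primary copy and a list of secondary copies. The class `ReplicatedItem` and its methods are hypothetical and heavily simplified: there is no network, no failure handling, and a "transaction" is a single method call. An eager write changes all copies before it returns; a lazy write changes only the primary copy and leaves propagation to a later step.

```python
class ReplicatedItem:
    """One data item with a primary copy and several secondary copies."""

    def __init__(self, value, n_secondaries):
        self.primary = value
        self.secondaries = [value] * n_secondaries
        self.pending = []  # lazy updates not yet sent to the secondaries

    def write_eager(self, value):
        # Eager: every copy is updated inside the transaction boundary.
        self.primary = value
        self.secondaries = [value] * len(self.secondaries)

    def write_lazy(self, value):
        # Lazy: the transaction ends once the primary copy is updated.
        self.primary = value
        self.pending.append(value)

    def propagate(self):
        # Runs after the transaction and applies pending updates in order.
        for value in self.pending:
            self.secondaries = [value] * len(self.secondaries)
        self.pending.clear()


x = ReplicatedItem(0, 2)
x.write_eager(1)
after_eager = list(x.secondaries)
x.write_lazy(2)
after_lazy = list(x.secondaries)  # secondaries still hold the old value
x.propagate()
after_propagate = list(x.secondaries)
```

Between `write_lazy` and `propagate` a read at a secondary copy sees stale data, which is the consistency cost that lazy replication accepts in exchange for shorter transactions.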
Vertical fragmentation
E1
NM Loc Sal5 Joe Sa 107 Sally Sb 258 Fred Sa 15hellip
NM Loc5 Joe Sa7 Sally Sb8 Fred Sahellip
Sal5 107 258 15hellip
E
E2
Example
R[T] R1[T1] R2[T2]hellip Rn[Tn] Ti T
Just like normalization of relations
Dr Mohamed Osman Hegazi
Vertical Fragmentation example
PROJ1 information about project budgets
PROJ2 information about project names and locations
PNO BUDGET
P1 150000
P3 250000P2 135000
P4 310000P5 500000
PNO PNAME LOC
P1 Instrumentation Montreal
P3 CADCAM New YorkP2 Database DevelopNew York
P4 Maintenance ParisP5 CADCAM Boston
PROJ1 PROJ2
New YorkNew York
PROJ
PNO PNAME BUDGET LOC
P1 Instrumentation 150000 Montreal
P3 CADCAM 250000P2 Database Develop135000
P4 Maintenance 310000 ParisP5 CADCAM 500000 Boston
New YorkNew York
Dr Mohamed Osman Hegazi
E1(NMLOC)E2(SAL)
ExampleE(NMLOCSAL) E1(NM)
E2(LOC)E3(SAL)
Which is the right vertical fragmentationhellip
Grouping Attributes
Dr Mohamed Osman Hegazi
Vertical Fragmentation branch relationship ndash parents and son
ΩήϓϷϥϮΌη
ΕΎΒΗή
Ϥϟ
(Sal
ary
allo
wan
ces
Tax
(
ΎϨϴϴόΘ
ϟΕ
(N
ame ad
dre
ss g
rade(
Dr Mohamed Osman Hegazi
Hybrid Fragmentation
R
HFHF
R1
VF VFVFVFVF
R11 R12 R21 R22 R23
R2
Dr Mohamed Osman Hegazi
AllocationExample E F1 = loc=Sa(E) F2 = loc=Sb(E)
Site a
Site b
Fragment E
Do we replicate fragments
Where do we place each copy of each fragment
Site c
F1F1
F2
Dr Mohamed Osman Hegazi
Allocation Alternativesbull Non-replicated
ndash partitioned each fragment resides at only one site
bull Replicatedndash fully replicated each fragment at each sitendash partially replicated each fragment at some of the
sitesbull Rule
If replication is advantageous
otherwise replication may cause problems
read - only queriesupdate queries
1
Dr Mohamed Osman Hegazi
Optimization problem
bull What is the best placement of fragments andor best number of copies tondash minimize query response timendash maximize throughputndash minimize ldquosome costrdquondash
bull Subject to constraintsndash Available storagendash Available bandwidth processing powerhellipndash Keep 90 of response time below Xndash
Very hard problem
Dr Mohamed Osman Hegazi
Static data allocation No change on allocation sites or no need for extra storage space (no expanding on the size No increasing on data )Dynamic data allocation dynamically changed the location of the data as a result of expansion in the data which usually results because of the nature of the systems producing data Problems of data sites can be treated through two types of models
bull Adaptive Models models that apply when the reason for the change of location due to system activity( online systems- data storage on line( Example airline bookings) These models saves the additional temporary copies of data and then dealing with these copies by processors duplicate copies (replication)
bull non-adaptive models These models solve dynamically allocation at the stage of establishing the system or at the stage of reorganization the system
Static data allocation amp Dynamic data allocation
Dr Mohamed Osman Hegazi
ReplicationReplication is to store copies of the same data in more than one location (site) and then these copies must be consistency updated Despite the distance from each other Controlling the updating of these copies is done by one of two techniquesLazy replication it is to update the data after the completion of work on one of the copies (master copy) This means that update is done outside the boundaries of transaction Eager replication is to update the replicated data within the transaction boundaries while working on one of the copies
ndash central update(initial copy primary copy) update the primary copy first and then update the secondary copy This method leads to lack of synchronization of the update which facilitates control of consistency but may lead to the problems of the bottleneck
ndash Or update everywhere updating the copies in all places make all the copies of equal opportunities for the update
Dr Mohamed Osman Hegazi
Wherewhen Eager Lazy
Primary Copy
Early Solutions in Ingres
SybaseIBMOracle Placement Strat
Serialization- Graph Based
Update Everywhere
ROWAROWAA Quorum based Oracle Synchr Repl
Oracle Advanced Repl Weak consistency Strat
Dr Mohamed Osman Hegazi
Distributed Query ProcessingThe aim of queries processing in distributed data is to let the work on distributed data appear like a single database system The problem of query processing in distributed data can be fragmented into several levels according to the problems of dataThe query processing takes SQL statements or OQL as input and then process it through several stages until it is executing the query Query Decomposition amp Data Localizationbull The first stages of the distributed query processing is to analyze the query
to the relation algebra then the second stages localize the data by distribute the query
Query Optimizationbull The third stages is to achieve optimal implementation of the query by
making the executive be as little as possible and delete the unneeded expression
bull The query optimization is one of the important aspects in dealing with queries there are many algorithms used in the investigation of this aspect
Dr Mohamed Osman Hegazi
Data localization ΕΎϧΎϴΒϟΰϛήϤΗ
Query Decomposition ϡϼόΘγϻϞϴϠΤΗ
Global Optimization ΔϣΎόϟΔϴϠΜϣϷ
local Optimization ΔϴϠΤϤϟΔϴϠΜϣϷ
Calculus Query on distributed relation ϡϼόΘγϻΏΎδΣ Δϋί ϮϤϟΕΎϗϼόϟϲ ϓ
Algebraic Query on distributed relation ϡϼόΘγϻήΒΟ Δϋί ϮϤϟΕΎϗϼόϟϲ ϓ
Fragment Query ϡϼόΘγϻΔΰΠΗ
Optimized fragment query with communication operation
ϝΎμ ΗϻΕΎϴϠϤϋϊ ϣΓ ΰΠϤϟϡϼόΘγϻϲ ϓΔϴϠπ ϓϷϖϴϘΤΗ
Optimized local queries ϞΜϣϷΔϴϠΤϤϟΕΎϣϼόΘγϻ
Global Schema ϡΎόϟΕΎϧΎϴΒϟς τ Ψϣ
Statistics on fragments ΔΰΠΘϟΕΎϴΎμ Σ
fragment Schema ΓΰΠϤϟΕΎϧΎϴΒϟς τ Ψϣ
Local Schema ϲ ϠΤϤϟΕΎϧΎϴΒϟς τ Ψϣ
Control site ϢϜΤΘϟϊ ϗϮϣ
local site ϲ ϠΤϤϟϊ ϗϮϤϟ
Dr Mohamed Osman Hegazi
bull Concurrency control in databases is the activities that make the transactions consistence among all the system data
bull DDBMS take care of synchronized the data that distributed over the network side each of these sites are running programs dealing with it is own data In this situation the process of controlling the concurrency of the distributed data is one of the more complex issues
bull There are four techniques used to control the concurrence on distributed database locking techniquesTimestamp Optimistic algorithm make all operations on the data
performed except for the operation that updates the data in this case operation updates the local data first
Complex algorithm for timestamps
Concurrency Control in distributed database
Dr Mohamed Osman Hegazi
Distributed Concurrency Control
• Nonreplicated Scheme
– Each site maintains a local lock manager to administer lock and unlock requests for local data
– Deadlock handling is more complex
• Single-Coordinator Approach
– The system maintains a single lock manager that resides in a single chosen site
– Can be used with replicated data
– Advantages
  • simple implementation
  • simple deadlock handling
– Disadvantages
  • bottleneck
  • vulnerability

Distributed Concurrency Control
• Majority Protocol
– A lock manager at each site
– When a transaction wishes to lock a data item Q, which is replicated in n different sites, it must send a lock request to more than half of the n sites in which Q is stored
– Complex to implement
– Difficult to handle deadlocks

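The majority rule can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: `SiteLockManager` and `majority_lock` are hypothetical names, only exclusive locks are modelled, and the request is sent to every replica site rather than to a chosen subset of more than half of them. Messaging and failures are not modelled.

```python
class SiteLockManager:
    """Local lock manager of one site (exclusive locks only, for brevity)."""

    def __init__(self):
        self.locked_by = {}  # data item -> id of the transaction holding it

    def request(self, item, txn):
        """Grant the lock if the item is free or already held by txn."""
        holder = self.locked_by.setdefault(item, txn)
        return holder == txn

    def release(self, item, txn):
        if self.locked_by.get(item) == txn:
            del self.locked_by[item]


def majority_lock(item, txn, replicas):
    """Lock item for txn if more than half of its replica sites grant it."""
    granted = [site for site in replicas if site.request(item, txn)]
    if len(granted) > len(replicas) // 2:
        return True
    for site in granted:  # no majority: give the partial locks back
        site.release(item, txn)
    return False
```

With three replicas, a transaction needs at least two grants. Releasing partial locks on failure is one simple way to reduce the deadlocks the slide warns about, since two transactions could otherwise each hold a minority of the locks indefinitely.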
Vertical Fragmentation example

PROJ
PNO  PNAME             BUDGET  LOC
P1   Instrumentation   150000  Montreal
P2   Database Develop  135000  New York
P3   CAD/CAM           250000  New York
P4   Maintenance       310000  Paris
P5   CAD/CAM           500000  Boston

PROJ1: information about project budgets
PNO  BUDGET
P1   150000
P2   135000
P3   250000
P4   310000
P5   500000

PROJ2: information about project names and locations
PNO  PNAME             LOC
P1   Instrumentation   Montreal
P2   Database Develop  New York
P3   CAD/CAM           New York
P4   Maintenance       Paris
P5   CAD/CAM           Boston

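The PROJ example can be reproduced with a short Python sketch that builds both fragments by projection and rebuilds PROJ by joining them on the key PNO. The helper names `project` and `join_on_pno` are hypothetical; the rows are the ones from the slide.

```python
PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation", "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "Database Develop", "BUDGET": 135000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "CAD/CAM", "BUDGET": 250000, "LOC": "New York"},
    {"PNO": "P4", "PNAME": "Maintenance", "BUDGET": 310000, "LOC": "Paris"},
    {"PNO": "P5", "PNAME": "CAD/CAM", "BUDGET": 500000, "LOC": "Boston"},
]


def project(rows, attrs):
    """Relational projection: keep only the listed attributes of every row."""
    return [{a: row[a] for a in attrs} for row in rows]


def join_on_pno(left, right):
    """Join two vertical fragments on their shared key PNO."""
    by_key = {row["PNO"]: row for row in right}
    return [{**row, **by_key[row["PNO"]]} for row in left]


# Every vertical fragment repeats the key so the relation can be rebuilt.
PROJ1 = project(PROJ, ["PNO", "BUDGET"])
PROJ2 = project(PROJ, ["PNO", "PNAME", "LOC"])
rebuilt = join_on_pno(PROJ1, PROJ2)
```

Because PNO appears in both fragments, the join returns exactly the original rows, which is the reconstruction property a vertical fragmentation has to satisfy.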
Grouping Attributes
Example: E(NM, LOC, SAL)
– E1(NM, LOC), E2(SAL)
– E1(NM), E2(LOC), E3(SAL)
Which is the right vertical fragmentation?

Vertical Fragmentation: branch relationship (parent and children)
Personnel Affairs
– Payroll (Salary, allowances, Tax)
– Appointments (Name, address, grade)

Hybrid Fragmentation
R
– HF → R1
  – VF → R11, R12
– HF → R2
  – VF → R21, R22, R23

Allocation
Example: E, with F1 = σ loc=Sa (E) and F2 = σ loc=Sb (E)
• Do we replicate fragments?
• Where do we place each copy of each fragment?
(Diagram: the fragments of E are allocated to Site a, Site b, and Site c; the figure shows two copies of F1 and one copy of F2.)

Allocation Alternatives
• Non-replicated
– partitioned: each fragment resides at only one site
• Replicated
– fully replicated: each fragment at each site
– partially replicated: each fragment at some of the sites
• Rule: if (read-only queries / update queries) ≥ 1, replication is advantageous; otherwise replication may cause problems

Optimization problem
• What is the best placement of fragments and/or best number of copies to:
– minimize query response time
– maximize throughput
– minimize "some cost"
– ...
• Subject to constraints:
– Available storage
– Available bandwidth, processing power, ...
– Keep 90% of response time below X
– ...
Very hard problem

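To make the allocation problem concrete, here is a small exhaustive-search sketch in Python for a single fragment and three sites. The cost model is an assumption made for illustration only: a read costs 1 when the reading site holds no copy, and an update costs 1 for every copy stored at another site. The names `SITES`, `cost`, and `best_placement`, and the `max_copies` storage limit, are hypothetical.

```python
from itertools import combinations

SITES = ["a", "b", "c"]


def cost(placement, reads, updates):
    """Total access cost of storing copies of one fragment at `placement`."""
    total = 0
    for site in SITES:
        if site not in placement:
            total += reads[site]  # every read from this site is remote
        remote_copies = len(placement - {site})
        total += updates[site] * remote_copies  # updates reach every remote copy
    return total


def best_placement(reads, updates, max_copies):
    """Try every non-empty set of at most max_copies sites and keep the cheapest."""
    candidates = [
        frozenset(combo)
        for k in range(1, max_copies + 1)
        for combo in combinations(SITES, k)
    ]
    return min(candidates, key=lambda p: (cost(p, reads, updates), sorted(p)))


read_heavy = best_placement(
    {"a": 100, "b": 100, "c": 100}, {"a": 1, "b": 1, "c": 1}, max_copies=3
)
update_heavy = best_placement(
    {"a": 1, "b": 1, "c": 1}, {"a": 100, "b": 1, "c": 1}, max_copies=3
)
```

Under this cost model the read-heavy workload ends up fully replicated and the update-heavy workload keeps a single copy at the site that issues most updates. The number of candidate placements grows exponentially with the number of sites and fragments, which is one reason the slide calls the problem very hard.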
Static data allocation & Dynamic data allocation
Static data allocation: the allocation sites do not change and no extra storage space is needed (the data does not grow in size).
Dynamic data allocation: the location of the data changes dynamically as the data expands, which usually follows from the nature of the systems producing the data.
Data placement problems can be treated with two types of models:
• Adaptive models: apply when the location changes because of system activity, as in online systems that store data online (for example, airline bookings). These models keep additional temporary copies of the data, which are then handled as duplicate copies (replication).
• Non-adaptive models: solve the dynamic allocation when the system is established or when it is reorganized.

Replication
Replication stores copies of the same data at more than one location (site). The copies must be updated consistently, regardless of the distance between them. When the copies are updated is controlled by one of two techniques:
• Lazy replication: the other copies are updated after work on one of the copies (the master copy) has completed, so the update is done outside the boundaries of the transaction.
• Eager replication: the replicated data is updated within the transaction boundaries, while the transaction is working on one of the copies.
Where the update is made is also one of two choices:
– Central update (initial copy, primary copy): update the primary copy first and then the secondary copies. Updates do not have to be synchronized among sites, which makes consistency easier to control, but the primary copy can become a bottleneck.
– Update everywhere: any copy can be updated, so all copies have an equal opportunity to receive the update.

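The difference between eager and lazy propagation can be shown with a minimal primary-copy sketch in Python. The class `ReplicatedItem` and its methods are hypothetical names; network communication, transactions, and failures are not modelled, so this is only an illustration of when the secondary copies change.

```python
class ReplicatedItem:
    """One value with a primary copy and several secondary copies."""

    def __init__(self, value, n_secondaries):
        self.primary = value
        self.secondaries = [value] * n_secondaries
        self.pending = []  # updates committed on the primary, not yet propagated

    def eager_update(self, value):
        """Eager: primary and secondaries change inside the same transaction."""
        self.primary = value
        self.secondaries = [value] * len(self.secondaries)

    def lazy_update(self, value):
        """Lazy: the transaction commits on the primary only."""
        self.primary = value
        self.pending.append(value)

    def propagate(self):
        """Later, outside the transaction, apply queued updates in commit order."""
        for value in self.pending:
            self.secondaries = [value] * len(self.secondaries)
        self.pending.clear()
```

After `lazy_update`, a read from a secondary copy can return a stale value until `propagate` runs, which is the price lazy replication pays for shorter transactions.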
Where / When       | Eager                                            | Lazy
Primary Copy       | Early Solutions in Ingres                        | Sybase/IBM/Oracle; Placement Strat.; Serialization-Graph Based
Update Everywhere  | ROWA/ROWAA; Quorum based; Oracle Synchr. Repl.   | Oracle Advanced Repl.; Weak consistency Strat.

Distributed Query Processing
The aim of query processing in a distributed database is to make work on distributed data look like work on a single database system. The problem can be divided into several levels, according to the data problem handled at each level. Query processing takes SQL or OQL statements as input and processes them through several stages until the query is executed.
Query Decomposition & Data Localization
• The first stage of distributed query processing analyzes the query and translates it into relational algebra; the second stage localizes the data by distributing the query over the data fragments.
AllocationExample E F1 = loc=Sa(E) F2 = loc=Sb(E)
Site a
Site b
Fragment E
Do we replicate fragments
Where do we place each copy of each fragment
Site c
F1F1
F2
Dr Mohamed Osman Hegazi
Allocation Alternativesbull Non-replicated
ndash partitioned each fragment resides at only one site
bull Replicatedndash fully replicated each fragment at each sitendash partially replicated each fragment at some of the
sitesbull Rule
If replication is advantageous
otherwise replication may cause problems
read - only queriesupdate queries
1
Dr Mohamed Osman Hegazi
Optimization problem
bull What is the best placement of fragments andor best number of copies tondash minimize query response timendash maximize throughputndash minimize ldquosome costrdquondash
bull Subject to constraintsndash Available storagendash Available bandwidth processing powerhellipndash Keep 90 of response time below Xndash
Very hard problem
Dr Mohamed Osman Hegazi
Static data allocation No change on allocation sites or no need for extra storage space (no expanding on the size No increasing on data )Dynamic data allocation dynamically changed the location of the data as a result of expansion in the data which usually results because of the nature of the systems producing data Problems of data sites can be treated through two types of models
bull Adaptive Models models that apply when the reason for the change of location due to system activity( online systems- data storage on line( Example airline bookings) These models saves the additional temporary copies of data and then dealing with these copies by processors duplicate copies (replication)
bull non-adaptive models These models solve dynamically allocation at the stage of establishing the system or at the stage of reorganization the system
Static data allocation amp Dynamic data allocation
Dr Mohamed Osman Hegazi
ReplicationReplication is to store copies of the same data in more than one location (site) and then these copies must be consistency updated Despite the distance from each other Controlling the updating of these copies is done by one of two techniquesLazy replication it is to update the data after the completion of work on one of the copies (master copy) This means that update is done outside the boundaries of transaction Eager replication is to update the replicated data within the transaction boundaries while working on one of the copies
ndash central update(initial copy primary copy) update the primary copy first and then update the secondary copy This method leads to lack of synchronization of the update which facilitates control of consistency but may lead to the problems of the bottleneck
ndash Or update everywhere updating the copies in all places make all the copies of equal opportunities for the update
Dr Mohamed Osman Hegazi
Wherewhen Eager Lazy
Primary Copy
Early Solutions in Ingres
SybaseIBMOracle Placement Strat
Serialization- Graph Based
Update Everywhere
ROWAROWAA Quorum based Oracle Synchr Repl
Oracle Advanced Repl Weak consistency Strat
Dr Mohamed Osman Hegazi
Distributed Query ProcessingThe aim of queries processing in distributed data is to let the work on distributed data appear like a single database system The problem of query processing in distributed data can be fragmented into several levels according to the problems of dataThe query processing takes SQL statements or OQL as input and then process it through several stages until it is executing the query Query Decomposition amp Data Localizationbull The first stages of the distributed query processing is to analyze the query
to the relation algebra then the second stages localize the data by distribute the query
Query Optimizationbull The third stages is to achieve optimal implementation of the query by
making the executive be as little as possible and delete the unneeded expression
bull The query optimization is one of the important aspects in dealing with queries there are many algorithms used in the investigation of this aspect
Dr Mohamed Osman Hegazi
Data localization ΕΎϧΎϴΒϟΰϛήϤΗ
Query Decomposition ϡϼόΘγϻϞϴϠΤΗ
Global Optimization ΔϣΎόϟΔϴϠΜϣϷ
local Optimization ΔϴϠΤϤϟΔϴϠΜϣϷ
Calculus Query on distributed relation ϡϼόΘγϻΏΎδΣ Δϋί ϮϤϟΕΎϗϼόϟϲ ϓ
Algebraic Query on distributed relation ϡϼόΘγϻήΒΟ Δϋί ϮϤϟΕΎϗϼόϟϲ ϓ
Fragment Query ϡϼόΘγϻΔΰΠΗ
Optimized fragment query with communication operation
ϝΎμ ΗϻΕΎϴϠϤϋϊ ϣΓ ΰΠϤϟϡϼόΘγϻϲ ϓΔϴϠπ ϓϷϖϴϘΤΗ
Optimized local queries ϞΜϣϷΔϴϠΤϤϟΕΎϣϼόΘγϻ
Global Schema ϡΎόϟΕΎϧΎϴΒϟς τ Ψϣ
Statistics on fragments ΔΰΠΘϟΕΎϴΎμ Σ
fragment Schema ΓΰΠϤϟΕΎϧΎϴΒϟς τ Ψϣ
Local Schema ϲ ϠΤϤϟΕΎϧΎϴΒϟς τ Ψϣ
Control site ϢϜΤΘϟϊ ϗϮϣ
local site ϲ ϠΤϤϟϊ ϗϮϤϟ
Dr Mohamed Osman Hegazi
Concurrency Control in Distributed Databases
• Concurrency control in databases comprises the activities that keep transactions consistent across all of the system's data.
• A DDBMS must synchronize data that is distributed over the sites of a network, each of which runs programs that work on its own data. In this setting, controlling concurrency on the distributed data is one of the more complex issues.
• Four techniques are used to control concurrency in a distributed database:
– Locking techniques
– Timestamps
– Optimistic algorithm: all operations on the data are carried out, except that an operation that updates the data first updates the local data
– Complex algorithm for timestamps
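The slide only names the timestamp technique. As a minimal sketch, assuming the textbook basic timestamp-ordering rule (not necessarily the variant the lecture has in mind), each data item remembers the largest timestamps that have read and written it, and an operation arriving too late is rejected. The class and method names below are hypothetical.

```python
class TimestampOrdering:
    """Basic timestamp ordering for single data items (illustrative only)."""

    def __init__(self):
        self.read_ts = {}   # item -> largest timestamp that has read it
        self.write_ts = {}  # item -> largest timestamp that has written it

    def read(self, item, ts):
        """Allow the read unless a younger transaction already wrote the item."""
        if ts < self.write_ts.get(item, 0):
            return False  # the transaction would have to be restarted
        self.read_ts[item] = max(self.read_ts.get(item, 0), ts)
        return True

    def write(self, item, ts):
        """Allow the write unless a younger transaction already read or wrote the item."""
        if ts < self.read_ts.get(item, 0) or ts < self.write_ts.get(item, 0):
            return False  # the transaction would have to be restarted
        self.write_ts[item] = ts
        return True
```

A rejected operation returns `False`; in a real system the transaction would be aborted and restarted with a new timestamp, which this sketch leaves out.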
Distributed Concurrency Control
• Nonreplicated Scheme
– Each site maintains a local lock manager to administer lock and unlock requests for local data
– Deadlock handling is more complex
• Single-Coordinator Approach
– The system maintains a single lock manager that resides at a single chosen site
– Can be used with replicated data
– Advantages
  • simple implementation
  • simple deadlock handling
– Disadvantages
  • bottleneck
  • vulnerability
Distributed Concurrency Control
• Majority Protocol
– A lock manager at each site
– When a transaction wishes to lock a data item Q, which is replicated at n different sites, it must send a lock request to more than half of the n sites at which Q is stored
– complex to implement
– difficult to handle deadlocks
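The majority rule above can be illustrated with a small Python sketch. It assumes exclusive locks only and a synchronous, failure-free exchange with every site; `SiteLockManager` and `majority_lock` are hypothetical names, and the sketch makes no attempt at the deadlock handling that the slide calls difficult.

```python
from typing import Dict, List


class SiteLockManager:
    """Lock manager of one site; grants at most one exclusive lock per item."""

    def __init__(self, name: str):
        self.name = name
        self.locks: Dict[str, str] = {}  # item -> id of the transaction holding it

    def try_lock(self, item: str, txn: str) -> bool:
        holder = self.locks.get(item)
        if holder is None or holder == txn:
            self.locks[item] = txn
            return True
        return False

    def unlock(self, item: str, txn: str) -> None:
        if self.locks.get(item) == txn:
            del self.locks[item]


def majority_lock(sites: List[SiteLockManager], item: str, txn: str) -> bool:
    """Return True if txn holds the lock at more than half of the sites."""
    granted = [s for s in sites if s.try_lock(item, txn)]
    if 2 * len(granted) > len(sites):
        return True
    for s in granted:  # no majority: give back the partial locks
        s.unlock(item, txn)
    return False
```

For simplicity the sketch asks every site; the protocol only requires replies from more than half of them. Because two majorities of the same n sites always share at least one site, two transactions cannot both hold the lock on Q at the same time.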
Allocation Alternatives
• Non-replicated
– partitioned: each fragment resides at only one site
• Replicated
– fully replicated: each fragment at each site
– partially replicated: each fragment at some of the sites
• Rule: if (read-only queries) / (update queries) ≥ 1, replication is advantageous; otherwise replication may cause problems
Optimization problem
• What is the best placement of fragments and/or the best number of copies to
– minimize query response time
– maximize throughput
– minimize "some cost"
• Subject to constraints
– Available storage
– Available bandwidth, processing power, …
– Keep 90% of response time below X
Very hard problem
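To make the shape of this optimization problem concrete, here is a minimal brute-force sketch in Python. It covers only the non-replicated case with a single storage constraint, and the cost table is an invented input standing in for "some cost"; the function name `best_allocation` and all sample figures are assumptions for illustration.

```python
from itertools import product


def best_allocation(fragment_sizes, site_capacity, access_cost):
    """Exhaustively search allocations that place each fragment at exactly one site.

    fragment_sizes: {fragment: size}
    site_capacity:  {site: available storage}
    access_cost:    {(fragment, site): cost of serving the workload from that site}
    Returns (total_cost, {fragment: site}), or None if nothing fits.
    """
    fragments = list(fragment_sizes)
    sites = list(site_capacity)
    best = None
    for choice in product(sites, repeat=len(fragments)):
        used = {s: 0 for s in sites}
        for frag, site in zip(fragments, choice):
            used[site] += fragment_sizes[frag]
        if any(used[s] > site_capacity[s] for s in sites):
            continue  # violates the storage constraint
        cost = sum(access_cost[(f, s)] for f, s in zip(fragments, choice))
        if best is None or cost < best[0]:
            best = (cost, dict(zip(fragments, choice)))
    return best


# Hypothetical input: two fragments, two sites.
sizes = {"F1": 40, "F2": 70}
capacity = {"A": 100, "B": 100}
cost = {("F1", "A"): 5, ("F1", "B"): 9, ("F2", "A"): 4, ("F2", "B"): 6}
```

The search examines every one of the (number of sites) to the power (number of fragments) candidate placements, so it is usable only for tiny inputs. That growth, before replication or further constraints are even considered, is one way to see why the slide calls this a very hard problem.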
Static Data Allocation & Dynamic Data Allocation
Static data allocation: the allocation sites do not change and no extra storage space is needed (the data does not grow in size).
Dynamic data allocation: the location of the data changes dynamically as the data expands, which usually follows from the nature of the systems producing the data. Data placement problems can be treated with two types of models:
• Adaptive models: apply when the change of location is caused by system activity (online systems that store data online, for example airline bookings). These models keep additional temporary copies of the data and then handle these copies through replication.
• Non-adaptive models: solve the dynamic allocation problem when the system is established or when the system is reorganized.
Replication
Replication stores copies of the same data at more than one location (site); these copies must be updated consistently despite their distance from one another. Updates to the copies are controlled by one of two techniques:
Lazy replication: the other copies are updated after work on one of the copies (the master copy) has completed. The update therefore takes place outside the boundaries of the transaction.
Eager replication: the replicated data is updated within the transaction boundaries, while work on one of the copies is in progress.
– Central update (primary copy): update the primary copy first, then update the secondary copies. Because every update passes through one copy, updates do not have to be synchronized between sites, which makes consistency easier to control, but the primary site may become a bottleneck.
– Update everywhere: an update may be applied at any copy, so all copies have equal standing for updates.
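The difference between eager and lazy propagation can be sketched for the primary-copy case as follows. This is a toy in-memory model under stated assumptions: copies are plain dictionaries, a call to `write` stands for one committed transaction, and the class `PrimaryCopyStore` and its methods are hypothetical names.

```python
class PrimaryCopyStore:
    """One primary copy plus secondary copies, updated eagerly or lazily."""

    def __init__(self, n_secondaries, eager):
        self.primary = {}
        self.secondaries = [{} for _ in range(n_secondaries)]
        self.eager = eager
        self.pending = []  # updates committed at the primary but not yet propagated

    def write(self, key, value):
        """Apply an update at the primary copy (one transaction)."""
        self.primary[key] = value
        if self.eager:
            # Eager: the secondaries are updated inside the same transaction.
            for copy in self.secondaries:
                copy[key] = value
        else:
            # Lazy: queue the update; it is propagated after the transaction ends.
            self.pending.append((key, value))

    def propagate(self):
        """Apply the queued updates to the secondaries (used by lazy replication)."""
        for key, value in self.pending:
            for copy in self.secondaries:
                copy[key] = value
        self.pending.clear()
```

With `eager=False`, a read from a secondary between `write` and `propagate` returns stale data. That window is the price lazy replication pays for keeping the transaction short.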
Data localization ΕΎϧΎϴΒϟΰϛήϤΗ
Query Decomposition ϡϼόΘγϻϞϴϠΤΗ
Global Optimization ΔϣΎόϟΔϴϠΜϣϷ
local Optimization ΔϴϠΤϤϟΔϴϠΜϣϷ
Calculus Query on distributed relation ϡϼόΘγϻΏΎδΣ Δϋί ϮϤϟΕΎϗϼόϟϲ ϓ
Algebraic Query on distributed relation ϡϼόΘγϻήΒΟ Δϋί ϮϤϟΕΎϗϼόϟϲ ϓ
Fragment Query ϡϼόΘγϻΔΰΠΗ
Optimized fragment query with communication operation
ϝΎμ ΗϻΕΎϴϠϤϋϊ ϣΓ ΰΠϤϟϡϼόΘγϻϲ ϓΔϴϠπ ϓϷϖϴϘΤΗ
Optimized local queries ϞΜϣϷΔϴϠΤϤϟΕΎϣϼόΘγϻ
Global Schema ϡΎόϟΕΎϧΎϴΒϟς τ Ψϣ
Statistics on fragments ΔΰΠΘϟΕΎϴΎμ Σ
fragment Schema ΓΰΠϤϟΕΎϧΎϴΒϟς τ Ψϣ
Local Schema ϲ ϠΤϤϟΕΎϧΎϴΒϟς τ Ψϣ
Control site ϢϜΤΘϟϊ ϗϮϣ
local site ϲ ϠΤϤϟϊ ϗϮϤϟ
Dr Mohamed Osman Hegazi
bull Concurrency control in databases is the activities that make the transactions consistence among all the system data
bull DDBMS take care of synchronized the data that distributed over the network side each of these sites are running programs dealing with it is own data In this situation the process of controlling the concurrency of the distributed data is one of the more complex issues
bull There are four techniques used to control the concurrence on distributed database locking techniquesTimestamp Optimistic algorithm make all operations on the data
performed except for the operation that updates the data in this case operation updates the local data first
Complex algorithm for timestamps
Concurrency Control in distributed database
Dr Mohamed Osman Hegazi
Distributed Concurrency Controlbull Nonreplicated Scheme
ndash Each site maintains a local lock manager to administer lock and unlock requests for local data
ndash Deadlock handling is more complexbull Single-Coordinator Approach
ndash The system maintains a single lock manager that resides in a single chosen site
ndash Can be used with replicated datandash Advantages
bull simple implementationbull simple deadlock handling
ndash Disadvantagesbull bottleneckbull vulnerability
Dr Mohamed Osman Hegazi
Distributed Concurrency Control
• Majority Protocol
– A lock manager at each site
– When a transaction wishes to lock a data item Q, which is replicated in n different sites, it must send a lock request to more than half of the n sites in which Q is stored
– Complex to implement
– Difficult to handle deadlocks
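The majority rule can be sketched as follows. This is a simplified illustration, not a full protocol: `Site` and `majority_lock` are hypothetical names, requests are sent to all n replicas instead of just over half, and a transaction that fails to reach a majority gives back the locks it obtained instead of waiting. That last choice is an assumption made here to keep the example free of deadlock.

```python
class Site:
    """One replica site with its own local lock table (exclusive locks only)."""

    def __init__(self, name):
        self.name = name
        self.locks = {}  # item -> transaction holding the local lock

    def request(self, txn, item):
        """Grant the local lock if it is free or already held by `txn`."""
        holder = self.locks.setdefault(item, txn)
        return holder == txn

    def release(self, txn, item):
        """Release the local lock if `txn` holds it."""
        if self.locks.get(item) == txn:
            del self.locks[item]


def majority_lock(txn, item, sites):
    """Try to lock `item` at more than half of the replica `sites`.

    Returns True if a majority of sites granted the lock. Otherwise the
    locks obtained so far are released and False is returned.
    """
    granted = [site for site in sites if site.request(txn, item)]
    if 2 * len(granted) > len(sites):
        return True
    for site in granted:
        site.release(txn, item)
    return False
```

With n = 5 replicas a transaction needs at least 3 grants. Two transactions can never both hold a majority on the same item, since any two majorities share at least one site.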