Incremental evaluation of updates in factorized in-memory databases
Master Thesis Presentation
Muhammad Idris
Ph.D. Student-(ULB, TUD)
Email: [email protected]
Joint work with: Martin Ugarte, Stijn Vansummeren
Wednesday, October 26, 2016
DBDBD-2016
Incremental view maintenance (IVM)

Query: Q = R(A, B) ⋈ S(B, C) ⋈ T(C, D)
Ref: http://www.dbtoaster.org/

Database D:

R:  A B
    1 5

S:  B C
    5 2

T:  C D
    2 4

Q(D):  A B C D
       1 5 2 4

Update: U(R) = (A:2, B:5)

1: IVM

U(R) ⋈ S ⋈ T = (A:2, B:5, C:2, D:4)

Q(D + ΔD) = Q(D) + U(R) ⋈ S ⋈ T:

A B C D
1 5 2 4
2 5 2 4

2: Higher Order IVM

Materialized auxiliary views:

R ⋈ S:  A B C
        1 5 2

S ⋈ T:  B C D
        5 2 4

Q(D + ΔD) = Q(D) + U(R) ⋈ (S ⋈ T):

A B C D
1 5 2 4
2 5 2 4

The auxiliary view is maintained as well: R ⋈ S ⇒ (R ⋈ S) + U(R) ⋈ S:

A B C
1 5 2
2 5 2
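The two strategies in the example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: relations are lists of dicts, `natural_join` is a hypothetical helper written for this sketch, and nothing here reflects how DBToaster is actually implemented.

```python
from collections import defaultdict


def natural_join(left, right):
    """Hash join of two lists of dict-rows on their shared attribute names."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    index = defaultdict(list)
    for row in right:
        index[tuple(row[a] for a in shared)].append(row)
    joined = []
    for row in left:
        for match in index.get(tuple(row[a] for a in shared), ()):
            joined.append({**row, **match})
    return joined


# Database D from the slide.
R = [{"A": 1, "B": 5}]
S = [{"B": 5, "C": 2}]
T = [{"C": 2, "D": 4}]
Q = natural_join(natural_join(R, S), T)

# Update U(R) = (A:2, B:5).
delta_R = [{"A": 2, "B": 5}]

# 1: IVM. Join the delta with the base relations S and T.
delta_Q_ivm = natural_join(natural_join(delta_R, S), T)
Q_ivm = Q + delta_Q_ivm

# 2: Higher-order IVM. Keep R ⋈ S and S ⋈ T materialized,
# so the delta needs a single join against S ⋈ T.
RS = natural_join(R, S)
ST = natural_join(S, T)
delta_Q_hivm = natural_join(delta_R, ST)
Q_hivm = Q + delta_Q_hivm
RS = RS + natural_join(delta_R, S)  # the auxiliary view is maintained too

print(Q_hivm)
print(RS)
```

Both strategies produce the same Q(D + ΔD); the higher-order variant does less join work per update but has to store and maintain `RS` and `ST`, which is the space-for-time trade-off the slides refer to.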
1- IVM: re-evaluation of the query; expensive under live updates
2- HIVM: materializes auxiliary views in memory
• High memory footprint
• Trades space for time

Can we do better?
• Less memory footprint
• Efficient processing time

Yes!
• Give up materialization of Q(D)
• Maintain constant-delay enumeration of Q(D)
• All tuples contribute
• Indexed
Relations: R(A,B), S(B,C), T(C,D), U(A,E)

R(A,B):  A B
         1 5

S(B,C):  B C
         5 2
         5 1

T(C,D):  C D
         2 2
         1 4

U(A,E):  A E
         1 2

Indexes: IC, IB, IA
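As a rough illustration of enumerating Q(D) from indexes instead of materializing it, the sketch below uses the four relations above. It rests on assumptions not stated on the slide: IB, IC and IA are taken to be hash indexes on S.B, T.C and U.A, the enumeration starts from R, and `build_index` and `enumerate_result` are hypothetical names.

```python
from collections import defaultdict


def build_index(relation, attr):
    """Hash index: attribute value -> list of rows carrying that value."""
    index = defaultdict(list)
    for row in relation:
        index[row[attr]].append(row)
    return index


R = [{"A": 1, "B": 5}]
S = [{"B": 5, "C": 2}, {"B": 5, "C": 1}]
T = [{"C": 2, "D": 2}, {"C": 1, "D": 4}]
U = [{"A": 1, "E": 2}]

IB = build_index(S, "B")  # assumed: index on S.B
IC = build_index(T, "C")  # assumed: index on T.C
IA = build_index(U, "A")  # assumed: index on U.A


def enumerate_result(R, IB, IC, IA):
    """Yield result tuples one at a time; nothing is materialized."""
    for r in R:
        for s in IB.get(r["B"], ()):
            for t in IC.get(s["C"], ()):
                for u in IA.get(r["A"], ()):
                    yield {**r, **s, **t, **u}


results = list(enumerate_result(R, IB, IC, IA))
for row in results:
    print(row)
```

When every stored tuple contributes to some result, no index lookup in the nested loops comes back empty, so the work between two consecutive outputs depends only on the size of the query and not on the size of the data.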
Experimentation

Constant delay enumeration

[Figure: time (seconds, axis from 0 to 40) against data size (megabytes: 2, 11, 28, 45, 58, 92, 258, 585, 928) for DYN and Array.]
All tuples contribute to the result?

Yannakakis Algorithm

Relations: R(A,B), S(B,C), T(C,D), U(A,E)

1. Bottom-up semi joins
2. Top-down semi joins
3. Bottom-up joins

R(A,B):  A B
         1 5
         2 3

S(B,C):  B C
         5 2
         3 2
         4 3
         5 1

T(C,D):  C D
         2 2
         4 3
         1 4

U(A,E):  A E
         1 2
         6 3

Join results:

B C D
5 2 2
5 1 4

A B C D
1 5 2 2
1 5 1 4

A B C D E
1 5 2 2 2
1 5 1 4 2

• Result materialization
• Static algorithm
• O(In + Out)

Q = R(A, B) ⋈ S(B, C) ⋈ T(C, D)
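The three passes listed above can be sketched on the slide's four relations as follows. The shape of the join tree is an assumption made for this sketch (R at the root with children S and U, and T as the child of S); `rows`, `semijoin` and `natural_join` are hypothetical helpers.

```python
from collections import defaultdict


def rows(attrs, *tuples):
    """Build dict-rows from an attribute string such as "AB"."""
    return [dict(zip(attrs, t)) for t in tuples]


def semijoin(left, right):
    """Keep the rows of `left` that join with at least one row of `right`."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    keys = {tuple(r[a] for a in shared) for r in right}
    return [l for l in left if tuple(l[a] for a in shared) in keys]


def natural_join(left, right):
    """Hash join of two lists of dict-rows on their shared attribute names."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    index = defaultdict(list)
    for row in right:
        index[tuple(row[a] for a in shared)].append(row)
    return [{**l, **m} for l in left
            for m in index.get(tuple(l[a] for a in shared), ())]


R = rows("AB", (1, 5), (2, 3))
S = rows("BC", (5, 2), (3, 2), (4, 3), (5, 1))
T = rows("CD", (2, 2), (4, 3), (1, 4))
U = rows("AE", (1, 2), (6, 3))

# 1. Bottom-up semi-joins: children filter their parents.
S = semijoin(S, T)
R = semijoin(semijoin(R, S), U)

# 2. Top-down semi-joins: parents filter their children.
S = semijoin(S, R)
U = semijoin(U, R)
T = semijoin(T, S)

# 3. Bottom-up joins: every remaining tuple contributes to the result.
ST = natural_join(S, T)
RST = natural_join(R, ST)
result = natural_join(RST, U)

for row in result:
    print(row)
```

After the two semi-join passes no relation holds a dangling tuple, so the final joins never build an intermediate row that is later discarded; this is where the O(In + Out) bound comes from.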
How can we handle dynamic updates?
Dynamic Yannakakis

Relations: R(A,B), S(B,C), T(C,D), U(A,E)

R(A,B):  A B
         1 5
         2 4

T(C,D):  C D
         2 2

U(A,E):  A E
         1 2
         2 3

S(B,C), as it appears across the successive states of the slide:

B C        B C        B C
5 2        5 2        5 2
3 2        3 2        3 2
4 3        5 4        5 4
                      4 2

Indexes: IB, IC, IA

1. Live tuples

Factorized Query result
Experimentation

DYN vs DBToaster
Memory Footprint and Update processing time

Query         ds1      ds2      ds3      ds4
Q1            1199969  2000497  2999671  6001215
Q3            1529969  2550497  3824671  7651215
Q4            1499969  2500497  3749671  7501215
Q6            1199969  2000497  2999671  6001215
Q9            1701994  2837185  4254696  8511240
Q10           1529994  2550522  3824696  7651240
Q11           162025   270022   405025   810025
Q13           330000   550000   825000   1650000
Q15           1201969  2003830  3004671  6011215
Q16           202000   336663   505000   1010000
Q18           1529969  2550497  3824671  7651215
Q4-full join  757056   973153   1135047  1361969
Q3-full join  940388   1208866  1410047  1691969
Q2-full join  872611   1121725  1308382  1569969
Q1-full join  944833   1214581  1416714  1699969
Q5-full join  4892553  7780920
Experimentation

Memory footprint – TPC-H Full Join Queries

[Figure: memory (megabytes, axis from 0 to 18000) for queries Q1 to Q5 over datasets ds1 to ds4 (Q5: ds1 and ds2 only); series: DYN, DB.]
Experimentation

Memory footprint – TPC-H aggregate queries

[Figure: memory (MB, axis from 0 to 2500) for queries Q1, Q3, Q4, Q6, Q7, Q9, Q10, Q11, Q12, Q13, Q15, Q16, Q18; series: DYN, DBToaster.]
Experimentation

Processing time – TPC-H aggregate queries

[Figure: time (seconds, axis from 0 to 30) for queries Q1, Q3, Q4, Q6, Q7, Q9, Q10, Q11, Q12, Q13, Q15, Q16, Q18; series: DYN, DBToaster.]
Any questions or comments?
THANK YOU!