65
1 Optimal Join Algorithms meet Top- SIGMOD 2020 tutorial Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald Northeastern University, Boston Slides: https://northeastern-datalab.github.io/topk-join-tutorial/ DOI: https://doi.org/10.1145/3318464.3383132 Data Lab: https://db.khoury.northeastern.edu Ranked results Time This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 4.0 International License. See https://creativecommons.org/licenses/by-nc-sa/4.0/ for details Part 2 : Optimal Join Algorithms

SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

1

Optimal Join Algorithms meet Top-๐‘˜SIGMOD 2020 tutorial

Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald

Northeastern University, Boston

Slides: https://northeastern-datalab.github.io/topk-join-tutorial/DOI: https://doi.org/10.1145/3318464.3383132Data Lab: https://db.khoury.northeastern.edu

Ranked results

Time

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 4.0 International License.See https://creativecommons.org/licenses/by-nc-sa/4.0/ for details

Part 2 : Optimal Join Algorithms

Page 2: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

2

Outline tutorial

โ€ข Part 1: Top-๐‘˜ (Wolfgang): ~20minโ€ข Part 2: Optimal Join Algorithms (Mirek): ~30min

โ€“ Lower Bound and the Yannakakis Algorithm

โ€“ Problems Caused by Cycles

โ€“ Tree Decompositions

โ€“ Summary and Further Reading

โ€ข Part 3: Ranked enumeration over joins (Nikolaos): ~40min

Page 3: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

3

Basic Terminology and Assumptions

โ€ข Terminology

- Full conjunctive query (CQ)

โ€ข Natural join of ๐‘™ relations with O(๐‘›) tuples each

โ€ข E.g.: ๐‘„ ๐ด1, ๐ด2, ๐ด3, ๐ด4 = ๐‘…1 ๐ด1, ๐ด2 โ‹ˆ ๐‘…2 ๐ด1, ๐ด2, ๐ด3 โ‹ˆ ๐‘…3 ๐ด2 โ‹ˆ ๐‘…4 ๐ด1, ๐ด2, ๐ด4โ€ข Any selections comparing attributes to constants, e.g., ๐ด4 < 1

- Query size: O(๐‘™)

- Output cardinality: ๐‘Ÿ

โ€ข Assumptions

- No pre-computed data structures such as indexes, sorted representation, materialized views

Page 4: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

4

Complexity Notation

โ€ข Standard O and ฮฉ notation for time and memory complexity in the RAM model of computation

โ€ข Common practice: focus on data complexity

- We care about scalability in data size

โ€ข Treat query size ๐‘™ as a constant

- E.g., O ๐‘“ ๐‘™ โ‹… ๐‘›๐‘“(๐‘™) + log๐‘› ๐‘“(๐‘™) โ‹… ๐‘Ÿ simplifies to O ๐‘›๐‘“(๐‘™) + log๐‘› ๐‘“(๐‘™) โ‹… ๐‘Ÿ

Page 5: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

5

Complexity Notation

โ€ข Standard O and ฮฉ notation for time and memory complexity in the RAM model of computation

โ€ข Common practice: focus on data complexity

- We care about scalability in data size

โ€ข Treat query size ๐‘™ as a constant

- E.g., O ๐‘“ ๐‘™ โ‹… ๐‘›๐‘“(๐‘™) + log๐‘› ๐‘“(๐‘™) โ‹… ๐‘Ÿ simplifies to O ๐‘›๐‘“(๐‘™) + log๐‘› ๐‘“(๐‘™) โ‹… ๐‘Ÿ

โ€ข We mostly use เทฉO-notation (soft-O) data complexity

- Abstracts away polylog factors in input size that clutter formulas

- E.g., O ๐‘›๐‘“(๐‘™) + log๐‘› ๐‘“(๐‘™) โ‹… ๐‘Ÿ further simplifies to เทฉO ๐‘›๐‘“(๐‘™) + ๐‘Ÿ

Page 6: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

6

Outline tutorial

โ€ข Part 1: Top-๐‘˜ (Wolfgang): ~20minโ€ข Part 2: Optimal Join Algorithms (Mirek): ~30min

โ€“ Lower Bound and the Yannakakis Algorithm

โ€“ Problems Caused by Cycles

โ€“ Tree Decompositions

โ€“ Summary and Further Reading

โ€ข Part 3: Ranked enumeration over joins (Nikolaos): ~40min

Page 7: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

7

Lower Bound for Any Query

โ€ข Need to read entire input at least once: ฮฉ(๐‘™๐‘›)

- ฮฉ(๐‘›) data complexity

โ€ข Need to output every result, each of size ๐‘™: ฮฉ(๐‘™๐‘Ÿ)

- ฮฉ(๐‘Ÿ) data complexity

โ€ข Together: ฮฉ(๐‘› + ๐‘Ÿ) time complexity to compute any CQ

โ€ข Amazingly, the Yannakakis algorithm essentially matches the lower bound for acyclic CQs

- Time complexity เทฉO ๐‘› + ๐‘Ÿ

Page 8: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

8

Yannakakis Algorithm

โ€ข Given: acyclic conjunctive query Q as a rooted join tree

โ€ข Step 1: semi-join reduction (two sweeps)

- Semi-join reduction sweep from the leaves to root

- Semi-join reduction sweep from root to the leaves

โ€ข Step 2: use the join tree as the query plan

- Compute the joins bottom up, with early projections

[Mihalis Yannakakis. Algorithms for acyclic database schemes. VLDBโ€™81] https://dl.acm.org/doi/10.5555/1286831.1286840

Page 9: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

9

Database TheoryYannakakis Algorithm โ€“ Example ๐‘จ๐Ÿ ๐‘จ๐Ÿ

1 201 104 60

๐‘…1

๐‘จ๐Ÿ ๐‘จ๐Ÿ ๐‘จ๐Ÿ‘

1 10 1001 20 1003 10 3001 40 3002 30 200

๐‘…2

๐‘จ๐Ÿ

102030

๐‘…3 ๐‘จ๐Ÿ ๐‘จ๐Ÿ ๐‘จ๐Ÿ’

1 10 10001 20 10001 20 20002 20 2000

๐‘…4

๐ด2 ๐ด1, ๐ด2

๐ด1, ๐ด2

๐‘„ = ๐‘…1 ๐ด1, ๐ด2 โ‹ˆ ๐‘…2 ๐ด1, ๐ด2, ๐ด3 โ‹ˆ ๐‘…3 ๐ด2 โ‹ˆ ๐‘…4 ๐ด1, ๐ด2, ๐ด4

Page 10: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

10

Database TheoryYannakakis Algorithm โ€“ Example

1. Bottom-up traversal (semi-joins)

๐‘จ๐Ÿ ๐‘จ๐Ÿ

1 201 104 60

๐‘…1

๐‘จ๐Ÿ ๐‘จ๐Ÿ ๐‘จ๐Ÿ‘

1 10 1001 20 1003 10 3001 40 3002 30 200

๐‘…2

๐‘จ๐Ÿ

102030

๐‘…3

๐ด2

๐ด1, ๐ด2

๐‘„ = ๐‘…1 ๐ด1, ๐ด2 โ‹ˆ ๐‘…2 ๐ด1, ๐ด2, ๐ด3 โ‹ˆ ๐‘…3 ๐ด2 โ‹ˆ ๐‘…4 ๐ด1, ๐ด2, ๐ด4

๐‘น๐Ÿ โ‹‰ ๐‘น๐Ÿ’

๐‘จ๐Ÿ ๐‘จ๐Ÿ ๐‘จ๐Ÿ’

1 10 10001 20 10001 20 20002 20 2000

๐‘…4

๐ด1, ๐ด2

Page 11: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

11

Database TheoryYannakakis Algorithm โ€“ Example

1. Bottom-up traversal (semi-joins)

๐‘จ๐Ÿ ๐‘จ๐Ÿ

1 201 104 60

๐‘…1

๐‘จ๐Ÿ ๐‘จ๐Ÿ ๐‘จ๐Ÿ‘

1 10 1001 20 1003 10 3001 40 3002 30 200

๐‘…2

๐‘จ๐Ÿ

102030

๐‘…3 ๐‘จ๐Ÿ ๐‘จ๐Ÿ ๐‘จ๐Ÿ’

1 10 10001 20 10001 20 20002 20 2000

๐‘…4

๐ด2

๐ด1, ๐ด2

๐‘„ = ๐‘…1 ๐ด1, ๐ด2 โ‹ˆ ๐‘…2 ๐ด1, ๐ด2, ๐ด3 โ‹ˆ ๐‘…3 ๐ด2 โ‹ˆ ๐‘…4 ๐ด1, ๐ด2, ๐ด4

๐‘น๐Ÿ โ‹‰ ๐‘น๐Ÿ’

๐‘จ๐Ÿ ๐‘จ๐Ÿ

1 101 202 20

Page 12: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

12

Database TheoryYannakakis Algorithm โ€“ Example

1. Bottom-up traversal (semi-joins)

๐‘จ๐Ÿ ๐‘จ๐Ÿ

1 201 104 60

๐‘…1

๐‘จ๐Ÿ ๐‘จ๐Ÿ ๐‘จ๐Ÿ‘

1 10 1001 20 1003 10 3001 40 3002 30 200

๐‘…2

๐‘จ๐Ÿ

102030

๐‘…3 ๐‘จ๐Ÿ ๐‘จ๐Ÿ ๐‘จ๐Ÿ’

1 10 10001 20 10001 20 20002 20 2000

๐‘…4

๐ด2

๐ด1, ๐ด2

๐‘„ = ๐‘…1 ๐ด1, ๐ด2 โ‹ˆ ๐‘…2 ๐ด1, ๐ด2, ๐ด3 โ‹ˆ ๐‘…3 ๐ด2 โ‹ˆ ๐‘…4 ๐ด1, ๐ด2, ๐ด4

๐‘น๐Ÿ โ‹‰ ๐‘น๐Ÿ’

๐‘จ๐Ÿ ๐‘จ๐Ÿ

1 101 202 20

Page 13: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

13

Database TheoryYannakakis Algorithm โ€“ Example

1. Bottom-up traversal (semi-joins)

๐‘จ๐Ÿ ๐‘จ๐Ÿ

1 201 104 60

๐‘…1

๐‘จ๐Ÿ ๐‘จ๐Ÿ ๐‘จ๐Ÿ‘

1 10 1001 20 1003 10 3001 40 3002 30 200

๐‘…2

๐‘จ๐Ÿ

102030

๐‘…3 ๐‘จ๐Ÿ ๐‘จ๐Ÿ ๐‘จ๐Ÿ’

1 10 10001 20 10001 20 20002 20 2000

๐‘…4

๐ด1, ๐ด2

๐‘„ = ๐‘…1 ๐ด1, ๐ด2 โ‹ˆ ๐‘…2 ๐ด1, ๐ด2, ๐ด3 โ‹ˆ ๐‘…3 ๐ด2 โ‹ˆ ๐‘…4 ๐ด1, ๐ด2, ๐ด4

๐‘น๐Ÿ โ‹‰ ๐‘น๐Ÿ’

๐‘จ๐Ÿ ๐‘จ๐Ÿ

1 101 202 20

๐‘น๐Ÿ โ‹‰ ๐‘น๐Ÿ‘

๐‘จ๐Ÿ

102030

Page 14: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

14

Database TheoryYannakakis Algorithm โ€“ Example

1. Bottom-up traversal (semi-joins)

๐‘จ๐Ÿ ๐‘จ๐Ÿ

1 201 104 60

๐‘…1

๐‘จ๐Ÿ ๐‘จ๐Ÿ ๐‘จ๐Ÿ‘

1 10 1001 20 1003 10 3001 40 3002 30 200

๐‘…2

๐‘จ๐Ÿ

102030

๐‘…3 ๐‘จ๐Ÿ ๐‘จ๐Ÿ ๐‘จ๐Ÿ’

1 10 10001 20 10001 20 20002 20 2000

๐‘…4

๐‘„ = ๐‘…1 ๐ด1, ๐ด2 โ‹ˆ ๐‘…2 ๐ด1, ๐ด2, ๐ด3 โ‹ˆ ๐‘…3 ๐ด2 โ‹ˆ ๐‘…4 ๐ด1, ๐ด2, ๐ด4

๐‘น๐Ÿ โ‹‰ ๐‘น๐Ÿ’

๐‘จ๐Ÿ ๐‘จ๐Ÿ

1 101 202 20

๐‘น๐Ÿ โ‹‰ ๐‘น๐Ÿ‘

๐‘จ๐Ÿ

102030

๐‘น๐Ÿ โ‹‰ ๐‘น๐Ÿ

๐‘จ๐Ÿ ๐‘จ๐Ÿ

1 101 20

Page 15: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

15

Database TheoryYannakakis Algorithm โ€“ Example

1. Bottom-up traversal (semi-joins)

2. Top-down traversal (semi-joins)

๐‘จ๐Ÿ ๐‘จ๐Ÿ

1 201 104 60

๐‘…1

๐‘จ๐Ÿ ๐‘จ๐Ÿ ๐‘จ๐Ÿ‘

1 10 1001 20 1003 10 3001 40 3002 30 200

๐‘…2

๐‘จ๐Ÿ

102030

๐‘…3 ๐‘จ๐Ÿ ๐‘จ๐Ÿ ๐‘จ๐Ÿ’

1 10 10001 20 10001 20 20002 20 2000

๐‘…4

๐‘„ = ๐‘…1 ๐ด1, ๐ด2 โ‹ˆ ๐‘…2 ๐ด1, ๐ด2, ๐ด3 โ‹ˆ ๐‘…3 ๐ด2 โ‹ˆ ๐‘…4 ๐ด1, ๐ด2, ๐ด4

๐‘น๐Ÿ’ โ‹‰ ๐‘น๐Ÿ๐‘น๐Ÿ‘ โ‹‰ ๐‘น๐Ÿ

๐‘น๐Ÿ โ‹‰ ๐‘น๐Ÿ

Page 16: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

16

Database TheoryYannakakis Algorithm โ€“ Example ๐‘จ๐Ÿ ๐‘จ๐Ÿ

1 201 10

๐‘…1

๐‘จ๐Ÿ ๐‘จ๐Ÿ ๐‘จ๐Ÿ‘

1 10 1001 20 100

๐‘…2

๐‘จ๐Ÿ

1020

๐‘…3 ๐‘จ๐Ÿ ๐‘จ๐Ÿ ๐‘จ๐Ÿ’

1 10 10001 20 10001 20 2000

๐‘…4

๐‘„ = ๐‘…1 ๐ด1, ๐ด2 โ‹ˆ ๐‘…2 ๐ด1, ๐ด2, ๐ด3 โ‹ˆ ๐‘…3 ๐ด2 โ‹ˆ ๐‘…4 ๐ด1, ๐ด2, ๐ด4

1. Bottom-up traversal (semi-joins)

2. Top-down traversal (semi-joins)

3. Join bottom-up

Page 17: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

17

Database TheoryYannakakis Algorithm โ€“ Example ๐‘จ๐Ÿ ๐‘จ๐Ÿ

1 201 10

๐‘…1

๐‘จ๐Ÿ ๐‘จ๐Ÿ ๐‘จ๐Ÿ‘

1 10 1001 20 100

๐‘…2

๐‘จ๐Ÿ

1020

๐‘…3 ๐‘จ๐Ÿ ๐‘จ๐Ÿ ๐‘จ๐Ÿ’

1 10 10001 20 10001 20 2000

๐‘…4

๐‘„ = ๐‘…1 ๐ด1, ๐ด2 โ‹ˆ ๐‘…2 ๐ด1, ๐ด2, ๐ด3 โ‹ˆ ๐‘…3 ๐ด2 โ‹ˆ ๐‘…4 ๐ด1, ๐ด2, ๐ด4

1. Bottom-up traversal (semi-joins)

2. Top-down traversal (semi-joins)

3. Join bottom-up

In each join step, each left input tuple joins with at least 1 right input tuple, and vice versa!

Page 18: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

18

Yannakakis Algorithm โ€“ Properties

โ€ข Semi-join sweeps take เทฉO ๐‘›

โ€ข A join step can never shrink intermediate result size

- This does not hold for all trees

- Tree must be attribute-connected (more on this soon)

โ€ข Hence all intermediate results are of size O ๐‘Ÿ

โ€ข Each join step therefore has O ๐‘› + ๐‘Ÿ input and O ๐‘Ÿ output

โ€ข Easy to compute a binary join with O ๐‘› + ๐‘Ÿ input and O ๐‘Ÿ output in time เทฉO ๐‘› + ๐‘Ÿ , e.g., using sort-merge join

Page 19: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

19

Outline tutorial

โ€ข Part 1: Top-๐‘˜ (Wolfgang): ~20minโ€ข Part 2: Optimal Join Algorithms (Mirek): ~30min

โ€“ Lower Bound and the Yannakakis Algorithm

โ€“ Problems Caused by Cycles

โ€“ Tree Decompositions

โ€“ Summary and Further Reading

โ€ข Part 3: Ranked enumeration over joins (Nikolaos): ~40min

Page 20: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

20

CQs with Cycles

โ€ข 3-path: ๐‘„3๐‘ = ๐‘…1(๐ด1, ๐ด2) โ‹ˆ ๐‘…2(๐ด2, ๐ด3) โ‹ˆ ๐‘…3(๐ด3, ๐ด4)

โ€ข 3-cycle: ๐‘„3๐‘ = ๐‘…1(๐ด1, ๐ด2) โ‹ˆ ๐‘…2(๐ด2, ๐ด3) โ‹ˆ ๐‘…3(๐ด3, ๐ด1)

๐‘„3๐‘ ๐‘„3๐‘

๐ด1๐ด2

๐ด2 ๐ด3

๐ด3๐ด4

๐ด1 ๐ด2

๐ด3

๐ด2

๐ด3

๐ด1

Page 21: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

21

CQs with Cycles

โ€ข 3-path: ๐‘„3๐‘ = ๐‘…1(๐ด1, ๐ด2) โ‹ˆ ๐‘…2(๐ด2, ๐ด3) โ‹ˆ ๐‘…3(๐ด3, ๐ด4)

โ€ข 3-cycle: ๐‘„3๐‘ = ๐‘…1(๐ด1, ๐ด2) โ‹ˆ ๐‘…2(๐ด2, ๐ด3) โ‹ˆ ๐‘…3(๐ด3, ๐ด1)

โ€ข Already semi-join-reduced input

๐‘จ๐Ÿ ๐‘จ๐Ÿ

1 1

2 1

โ€ฆ โ€ฆ

n 1

๐‘จ๐Ÿ ๐‘จ๐Ÿ‘

1 1

1 2

โ€ฆ โ€ฆ

1 n

๐‘จ๐Ÿ‘ *

1 1

2 2

โ€ฆ โ€ฆ

n n

๐‘…1

๐‘…1

๐‘…2

Join tree

๐‘…3

๐‘…2 ๐‘…3

Page 22: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

22

CQs with Cycles

โ€ข 3-path: ๐‘„3๐‘ = ๐‘…1(๐ด1, ๐ด2) โ‹ˆ ๐‘…2(๐ด2, ๐ด3) โ‹ˆ ๐‘…3(๐ด3, ๐ด4)

โ€ข 3-cycle: ๐‘„3๐‘ = ๐‘…1(๐ด1, ๐ด2) โ‹ˆ ๐‘…2(๐ด2, ๐ด3) โ‹ˆ ๐‘…3(๐ด3, ๐ด1)

โ€ข Already semi-join reduced in the example

โ€ข ๐‘…1 โ‹ˆ ๐‘…2 produces ๐‘›2 intermediate results

- Final output size: ๐‘›2 for ๐‘„3๐‘, but only ๐‘› for ๐‘„3๐‘

๐‘จ๐Ÿ ๐‘จ๐Ÿ

1 1

2 1

โ€ฆ โ€ฆ

n 1

๐‘จ๐Ÿ ๐‘จ๐Ÿ‘

1 1

1 2

โ€ฆ โ€ฆ

1 n

๐‘จ๐Ÿ‘ *

1 1

2 2

โ€ฆ โ€ฆ

n n

๐‘…1

๐‘…1

๐‘…2

Join tree

๐‘…3

๐‘…2 ๐‘…3

Page 23: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

23

What Went Wrong?

โ€ข The tree for the 3-cycle is not attribute-connected!

โ€ข Attribute-connectedness:

- For each attribute, the nodes containing it form a connected sub-graph

โ€ข In the right tree, ๐ด1 violates this property

๐‘…1(๐ด1, ๐ด2)

๐‘…2(๐ด2, ๐ด3)

๐‘…3(๐ด3, ๐ด4)๐‘„3๐‘ ๐‘„3๐‘

๐‘…1(๐ด1, ๐ด2)

๐‘…2(๐ด2, ๐ด3)

๐‘…3(๐ด3, ๐ด1)

Page 24: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

24

Solutions for Cycles? Some Bad News

โ€ข Maybe we just need an algorithm that is better suited for cyclic CQs?

โ€ข Yes, butโ€ฆ

โ€ข โ€ฆ [Ngo+ 18]:

- เทฉO ๐‘› + ๐‘Ÿ unattainable based on well-accepted complexity-theoretic assumptions

[Ngo+ 18] Ngo, Porat, Rรฉ, Rudra. Worst-case optimal join algorithms. J. ACMโ€™18 https://doi.org/10.1145/3180143

Page 25: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

25

What Can Be Done?

โ€ข Worst-case-optimal (WCO) join algorithms[Veldhuizen 14, Ngo+ 14, Ngo+ 18]

โ€ข Instead of เทฉO ๐‘› + ๐‘Ÿ , getเทฉO ๐‘› + ๐‘ŸWC = เทฉO ๐‘ŸWC

โ€ข ๐‘ŸWC = largest output of Q over any possible DB instance

- Determined by the AGM bound[4]

- Based on fractional edge cover of the join hypergraph

โ€ข 3-cycle: ๐‘›1.5 vs naive upper bound ๐‘›3

[Ngo+ 18] Ngo, Porat, Rรฉ, Rudra. Worst-case optimal join algorithms. J. ACMโ€™18 https://doi.org/10.1145/3180143

[Veldhuizen 14] Veldhuizen. Triejoin: A simple, worst-case optimal join algorithm. ICDTโ€™14 https://doi.org/10.5441/002/icdt.2014.13[Ngo+ 14] Ngo, Re, Rudra. Skew strikes back: New developments in the theory of join algorithms. SIGMOD Rec.โ€™14 https://doi.org/10.1145/2590989.2590991

[Atserias+ 13] Atserias, Grohe, Marx. Size bounds and query plans for relational joins. . SIAM J. Comput.โ€™13 https://doi.org/10.1137/110859440

Page 26: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

26

What Can Be Done?

โ€ข Worst-case-optimal (WCO) join algorithms[Veldhuizen 14, Ngo+ 14, Ngo+ 18]

โ€ข Instead of เทฉO ๐‘› + ๐‘Ÿ , getเทฉO ๐‘› + ๐‘ŸWC = เทฉO ๐‘ŸWC

โ€ข ๐‘ŸWC = largest output of Q over any possible DB instance

- Determined by the AGM bound[4]

- Based on fractional edge cover of the join hypergraph

โ€ข 3-cycle: ๐‘›1.5 vs naive upper bound ๐‘›3

โ€ข Tree decompositions

โ€ข Put more effort into pre-processing to avoid large intermediate results

- Use WCO joins as sub-routine

โ€ข Goal: เทฉO ๐‘›๐‘‘ + ๐‘Ÿ for smallest ๐‘‘possible

- เทฉO ๐‘›๐‘‘ captures pre-processing cost

- ๐‘‘ = 1 for acyclic CQ

[Ngo+ 18] Ngo, Porat, Rรฉ, Rudra. Worst-case optimal join algorithms. J. ACMโ€™18 https://doi.org/10.1145/3180143

[Veldhuizen 14] Veldhuizen. Triejoin: A simple, worst-case optimal join algorithm. ICDTโ€™14 https://doi.org/10.5441/002/icdt.2014.13[Ngo+ 14] Ngo, Re, Rudra. Skew strikes back: New developments in the theory of join algorithms. SIGMOD Rec.โ€™14 https://doi.org/10.1145/2590989.2590991

[Atserias+ 13] Atserias, Grohe, Marx. Size bounds and query plans for relational joins. . SIAM J. Comput.โ€™13 https://doi.org/10.1137/110859440

Page 27: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

27

Outline tutorial

โ€ข Part 1: Top-๐‘˜ (Wolfgang): ~20minโ€ข Part 2: Optimal Join Algorithms (Mirek): ~30min

โ€“ Lower Bound and the Yannakakis Algorithm

โ€“ Problems Caused by Cycles

โ€“ Tree Decompositions

โ€“ Summary and Further Reading

โ€ข Part 3: Ranked enumeration over joins (Nikolaos): ~40min

Page 28: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

28

Main Idea of Tree Decompositions

โ€ข Convert cyclic CQ to a rooted tree-shaped CQ

A1

A4

A3

A2A6

A5

R6 R1

R2

R3R4

R5

S = ๐‘…1 ๐ด1, ๐ด2 โ‹ˆ ๐‘…2 ๐ด2, ๐ด3 โ‹ˆ ๐‘…3 ๐ด3, ๐ด4

T = ๐‘…4 ๐ด4, ๐ด5 โ‹ˆ ๐‘…5 ๐ด5, ๐ด6 โ‹ˆ ๐‘…6(๐ด6, ๐ด1)

decomposition

Page 29: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

29

Main Idea of Tree Decompositions

โ€ข Convert cyclic CQ to a rooted tree-shaped CQ

โ€ข Materialize all tree nodes (โ€œbagsโ€) using a WCO join algorithm

A1

A4

A3

A2A6

A5

R6 R1

R2

R3R4

R5

S = ๐‘…1 ๐ด1, ๐ด2 โ‹ˆ ๐‘…2 ๐ด2, ๐ด3 โ‹ˆ ๐‘…3 ๐ด3, ๐ด4

T = ๐‘…4 ๐ด4, ๐ด5 โ‹ˆ ๐‘…5 ๐ด5, ๐ด6 โ‹ˆ ๐‘…6(๐ด6, ๐ด1)

decomposition

S

T

Page 30: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

30

Main Idea of Tree Decompositions

โ€ข Convert cyclic CQ to a rooted tree-shaped CQ

โ€ข Materialize all tree nodes (โ€œbagsโ€) using a WCO join algorithm

โ€ข Apply Yannakakis algorithm on the tree

- Acyclic CQ whose input relations are the bags

- Achieves เทฉO ๐‘ฅ + ๐‘Ÿ where ๐‘ฅ is the size of the largest bag

A1

A4

A3

A2A6

A5

R6 R1

R2

R3R4

R5

S = ๐‘…1 ๐ด1, ๐ด2 โ‹ˆ ๐‘…2 ๐ด2, ๐ด3 โ‹ˆ ๐‘…3 ๐ด3, ๐ด4

T = ๐‘…4 ๐ด4, ๐ด5 โ‹ˆ ๐‘…5 ๐ด5, ๐ด6 โ‹ˆ ๐‘…6(๐ด6, ๐ด1)

decomposition

S

T

Yannakakis

Page 31: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

31

Tree Decomposition Intuition

๐‘„6๐‘ ๐ด1, โ€ฆ , ๐ด6 = ๐‘…1 ๐ด1, ๐ด2 โ‹ˆ ๐‘…2 ๐ด2, ๐ด3โ‹ˆ ๐‘…3 ๐ด3, ๐ด4 โ‹ˆ ๐‘…4 ๐ด4, ๐ด5โ‹ˆ ๐‘…5(๐ด5, ๐ด6) โ‹ˆ ๐‘…6(๐ด6, ๐ด1)

Page 32: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

32

Tree Decomposition Intuition

๐‘„6๐‘ ๐ด1, โ€ฆ , ๐ด6 = ๐‘…1 ๐ด1, ๐ด2 โ‹ˆ ๐‘…2 ๐ด2, ๐ด3โ‹ˆ ๐‘…3 ๐ด3, ๐ด4 โ‹ˆ ๐‘…4 ๐ด4, ๐ด5โ‹ˆ ๐‘…5(๐ด5, ๐ด6) โ‹ˆ ๐‘…6(๐ด6, ๐ด1)

Every relation appearing in the query is covered by a bag (tree node)

For each attribute, the bags containing it are connected

Page 33: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

33

Tree Decomposition Intuition

๐‘„6๐‘ ๐ด1, โ€ฆ , ๐ด6 = ๐‘…1 ๐ด1, ๐ด2 โ‹ˆ ๐‘…2 ๐ด2, ๐ด3โ‹ˆ ๐‘…3 ๐ด3, ๐ด4 โ‹ˆ ๐‘…4 ๐ด4, ๐ด5โ‹ˆ ๐‘…5(๐ด5, ๐ด6) โ‹ˆ ๐‘…6(๐ด6, ๐ด1)

Every relation appearing in the query is covered by a bag (tree node)

For each attribute, the bags containing it are connected

Page 34: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

34

Tree Decomposition Intuition

Every relation appearing in the query is covered by a bag (tree node)

For each attribute, the bags containing it are connected

๐‘„6๐‘ ๐ด1, โ€ฆ , ๐ด6 = ๐‘…1 ๐ด1, ๐ด2 โ‹ˆ ๐‘…2 ๐ด2, ๐ด3โ‹ˆ ๐‘…3 ๐ด3, ๐ด4 โ‹ˆ ๐‘…4 ๐ด4, ๐ด5โ‹ˆ ๐‘…5(๐ด5, ๐ด6) โ‹ˆ ๐‘…6(๐ด6, ๐ด1)

๐‘…1 ๐ด1, ๐ด2 , ๐‘…2 ๐ด2, ๐ด3 , ๐‘…3 ๐ด3, ๐ด4๐‘…4 ๐ด4, ๐ด5 , ๐‘…5 ๐ด5, ๐ด6 , ๐‘…6(๐ด6, ๐ด1)

๐’ฏ1

Bag materialization costs O(๐‘›3) (AGM bound)

Page 35: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

35

Tree Decomposition Intuition

Every relation appearing in the query is covered by a bag (tree node)

For each attribute, the bags containing it are connected

๐‘„6๐‘ ๐ด1, โ€ฆ , ๐ด6 = ๐‘…1 ๐ด1, ๐ด2 โ‹ˆ ๐‘…2 ๐ด2, ๐ด3โ‹ˆ ๐‘…3 ๐ด3, ๐ด4 โ‹ˆ ๐‘…4 ๐ด4, ๐ด5โ‹ˆ ๐‘…5(๐ด5, ๐ด6) โ‹ˆ ๐‘…6(๐ด6, ๐ด1)

๐‘…1 ๐ด1, ๐ด2 , ๐‘…2 ๐ด2, ๐ด3 , ๐‘…3 ๐ด3, ๐ด4๐‘…4 ๐ด4, ๐ด5 , ๐‘…5 ๐ด5, ๐ด6 , ๐‘…6(๐ด6, ๐ด1)

๐’ฏ1

Bag materialization costs O(๐‘›3) (AGM bound)

Page 36: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

36

Tree Decomposition Intuition

Every relation appearing in the query is covered by a bag (tree node)

For each attribute, the bags containing it are connected

๐‘…1 ๐ด1, ๐ด2 , ๐‘…2 ๐ด2, ๐ด3 , ๐‘…3 ๐ด3, ๐ด4๐’ฏ2

๐‘…4 ๐ด4, ๐ด5 , ๐‘…5 ๐ด5, ๐ด6 , ๐‘…6(๐ด6, ๐ด1)

Bag materialization costs O(๐‘›2) (AGM bound)

๐‘„6๐‘ ๐ด1, โ€ฆ , ๐ด6 = ๐‘…1 ๐ด1, ๐ด2 โ‹ˆ ๐‘…2 ๐ด2, ๐ด3โ‹ˆ ๐‘…3 ๐ด3, ๐ด4 โ‹ˆ ๐‘…4 ๐ด4, ๐ด5โ‹ˆ ๐‘…5(๐ด5, ๐ด6) โ‹ˆ ๐‘…6(๐ด6, ๐ด1)

Page 37: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

37

Tree Decomposition Intuition

Every relation appearing in the query is covered by a bag (tree node)

For each attribute, the bags containing it are connected

๐‘…1 ๐ด1, ๐ด2 , ๐‘…2 ๐ด2, ๐ด3 , ๐‘…3 ๐ด3, ๐ด4๐’ฏ2

๐‘…4 ๐ด4, ๐ด5 , ๐‘…5 ๐ด5, ๐ด6 , ๐‘…6(๐ด6, ๐ด1)

Bag materialization costs O(๐‘›2) (AGM bound)

๐‘„6๐‘ ๐ด1, โ€ฆ , ๐ด6 = ๐‘…1 ๐ด1, ๐ด2 โ‹ˆ ๐‘…2 ๐ด2, ๐ด3โ‹ˆ ๐‘…3 ๐ด3, ๐ด4 โ‹ˆ ๐‘…4 ๐ด4, ๐ด5โ‹ˆ ๐‘…5(๐ด5, ๐ด6) โ‹ˆ ๐‘…6(๐ด6, ๐ด1)

Page 38: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

38

Tree Decomposition Intuition

Every relation appearing in the query is covered by a bag (tree node)

For each attribute, the bags containing it are connected

๐’ฏ3 ๐‘…1 ๐ด1, ๐ด2

๐‘…2 ๐ด2, ๐ด3

๐‘…3 ๐ด3, ๐ด4

๐‘…4 ๐ด4, ๐ด5

๐‘…5 ๐ด5, ๐ด6

๐‘…6(๐ด6, ๐ด1)

O(๐‘›) bag materializationโ€ฆ?

๐‘„6๐‘ ๐ด1, โ€ฆ , ๐ด6 = ๐‘…1 ๐ด1, ๐ด2 โ‹ˆ ๐‘…2 ๐ด2, ๐ด3โ‹ˆ ๐‘…3 ๐ด3, ๐ด4 โ‹ˆ ๐‘…4 ๐ด4, ๐ด5โ‹ˆ ๐‘…5(๐ด5, ๐ด6) โ‹ˆ ๐‘…6(๐ด6, ๐ด1)

Page 39: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

39

Tree Decomposition Intuition

Every relation appearing in the query is covered by a bag (tree node)

For each attribute, the bags containing it are connected

๐’ฏ3 ๐‘…1 ๐‘จ๐Ÿ, ๐ด2

๐‘…2 ๐ด2, ๐ด3

๐‘…3 ๐ด3, ๐ด4

๐‘…4 ๐ด4, ๐ด5

๐‘…5 ๐ด5, ๐ด6

๐‘…6(๐ด6, ๐‘จ๐Ÿ)

๐‘„6๐‘ ๐ด1, โ€ฆ , ๐ด6 = ๐‘…1 ๐ด1, ๐ด2 โ‹ˆ ๐‘…2 ๐ด2, ๐ด3โ‹ˆ ๐‘…3 ๐ด3, ๐ด4 โ‹ˆ ๐‘…4 ๐ด4, ๐ด5โ‹ˆ ๐‘…5(๐ด5, ๐ด6) โ‹ˆ ๐‘…6(๐ด6, ๐ด1)

Page 40: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

40

Tree Decomposition Intuition

Every relation appearing in the query is covered by a bag (tree node)

For each attribute, the bags containing it are connected

๐’ฏ3 ๐‘…1 ๐ด1, ๐ด2

๐‘…2 ๐ด2, ๐ด3 , ๐‘…1 ๐ด1, _

๐‘…3 ๐ด3, ๐ด4 , ๐‘…1 ๐ด1, _

๐‘…4 ๐ด4, ๐ด5 , ๐‘…1 ๐ด1, _

๐‘…5 ๐ด5, ๐ด6 , ๐‘…1 ๐ด1, _

๐‘…6(๐ด6, ๐ด1)

O ๐‘› โˆ™ ๐œ‹๐ด1 ๐‘…1 bag materialization: still O(๐‘›2)

๐‘„6๐‘ ๐ด1, โ€ฆ , ๐ด6 = ๐‘…1 ๐ด1, ๐ด2 โ‹ˆ ๐‘…2 ๐ด2, ๐ด3โ‹ˆ ๐‘…3 ๐ด3, ๐ด4 โ‹ˆ ๐‘…4 ๐ด4, ๐ด5โ‹ˆ ๐‘…5(๐ด5, ๐ด6) โ‹ˆ ๐‘…6(๐ด6, ๐ด1)

Page 41: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

41

Tree Decomposition: Formal Definition

โ€ข Given: hypergraph โ„‹ = (๐’ฑ, โ„ฐ)

- ๐’ฑ: attributes

โ€ข E.g., ๐ด1, ๐ด2, ๐ด3, ๐ด4, ๐ด5, ๐ด6

- โ„ฐ: relations

โ€ข E.g., ๐‘…3 is hyperedge (๐ด3, ๐ด4)

โ€ข A tree decomposition of โ„‹ is a pair ๐’ฏ, ๐œ’ where

- ๐’ฏ = ๐‘‰ ๐’ฏ ,๐ธ(๐’ฏ) is a tree

- ๐œ’:๐‘‰ ๐’ฏ โ†’ 2๐’ฑ assigns a bag ๐œ’(๐‘ฃ) to each tree node ๐‘ฃ such that

โ€ข Each hyperedge ๐น โˆˆ โ„ฐ is covered, i.e., โˆ€๐น โˆˆ โ„ฐ: โˆƒ๐‘ฃ โˆˆ ๐‘‰ ๐’ฏ : ๐น โŠ† ๐œ’(๐‘ฃ)

โ€ข For each ๐‘ข โˆˆ ๐’ฑ, the bags containing ๐‘ข are connected

A1

A4

A3

A2A6

A5

R6 R1

R2

R3R4

R5

๐’ฏ3 ๐‘…1 ๐ด1, ๐ด2

๐‘…2 ๐ด2, ๐ด3 , ๐‘…1 ๐ด1, _

๐‘…3 ๐ด3, ๐ด4 , ๐‘…1 ๐ด1, _

๐‘…4 ๐ด4, ๐ด5 , ๐‘…1 ๐ด1, _

๐‘…5 ๐ด5, ๐ด6 , ๐‘…1 ๐ด1, _

๐‘…6(๐ด6, ๐ด1)

[Khamis, Ngo, Suciu. What do shannon-type inequalities, submodular width, and disjunctive datalog have to do with one another? PODSโ€™17] https://doi.org/10.1145/3034786.3056105

Page 42: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

42

Tree-Decomposition Properties

โ€ข Query has multiple decompositionsโ€”which is best?

Page 43: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

43

Tree-Decomposition Properties

โ€ข Query has multiple decompositionsโ€”which is best?

โ€ข Consider a tree with O(๐‘™) nodes, each materialized using WCO join

- Worst-case output of bag ๐‘– is of size O(๐‘›๐‘‘๐‘–) for some ๐‘‘๐‘– โ‰ฅ 1 (AGM bound)

- Fractional hypertree width (fhw) ๐‘‘ = max๐‘–

๐‘‘๐‘– [Grohe+ 14]

- Total bag-materialization cost: O(๐‘›๐‘‘)

- Size of a materialized bag: O(๐‘›๐‘‘)

- Resulting cost for Yannakakis algorithm on materialized tree: เทฉO(๐‘›๐‘‘ + ๐‘Ÿ)

[Grohe+ 14] Grohe and Marx. Constraint solving via fractional edge covers. ACM TALGโ€™14. https://doi.org/10.1145/2636918

Page 44: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

44

Who Wins?๐‘…1 ๐ด1, ๐ด2 , ๐‘…2 ๐ด2, ๐ด3 , ๐‘…3 ๐ด3, ๐ด4๐‘…4 ๐ด4, ๐ด5 , ๐‘…5 ๐ด5, ๐ด6 , ๐‘…6(๐ด6, ๐ด1)

๐’ฏ1

๐‘…1 ๐ด1, ๐ด2 , ๐‘…2 ๐ด2, ๐ด3 , ๐‘…3 ๐ด3, ๐ด4๐’ฏ2

๐‘…4 ๐ด4, ๐ด5 , ๐‘…5 ๐ด5, ๐ด6 , ๐‘…6(๐ด6, ๐ด1)

๐’ฏ3 ๐‘…1 ๐ด1, ๐ด2

๐‘…2 ๐ด2, ๐ด3 , ๐‘…1 ๐ด1, _

๐‘…3 ๐ด3, ๐ด4 , ๐‘…1 ๐ด1, _

๐‘…4 ๐ด4, ๐ด5 , ๐‘…1 ๐ด1, _

๐‘…5 ๐ด5, ๐ด6 , ๐‘…1 ๐ด1, _

๐‘…6(๐ด6, ๐ด1)

๐“ฃ๐Ÿ > ๐“ฃ๐Ÿ = ๐“ฃ๐Ÿ‘

O(๐‘›2)

O(๐‘›2)

O(๐‘›3)

Page 45: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

45

A Closer Look

โ€ข ๐’ฏ1 loses, because it does not decompose the query

Page 46: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

46

A Closer Look

โ€ข ๐’ฏ1 loses, because it does not decompose the query

โ€ข Are ๐’ฏ2 and ๐’ฏ3 really equally good?

- In ๐’ฏ2, bag computation requires joining 3 relations

- In ๐’ฏ3, at most 2 relations are joined

โ€ข One of them is just the set of distinct ๐ด1-values in ๐‘…1

Page 47: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

47

A Closer Look

โ€ข ๐’ฏ1 loses, because it does not decompose the query

โ€ข Are ๐’ฏ2 and ๐’ฏ3 really equally good?

- In ๐’ฏ2, bag computation requires joining 3 relations

- In ๐’ฏ3, at most 2 relations are joined

โ€ข One of them is just the set of distinct ๐ด1-values in ๐‘…1

โ€ข ๐’ฏ3 is better when the number of distinct ๐ด1-values in ๐‘…1 is low, e.g., ๐‘‚(๐‘›2/3) instead of ๐‘‚ ๐‘›

Page 48: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

48

Who Wins?

๐’ฏ3

๐‘…1 ๐ด1, ๐ด2

๐‘…2 ๐ด2, ๐ด3 , ๐‘…1 ๐ด1, _

๐‘…3 ๐ด3, ๐ด4 , ๐‘…1 ๐ด1, _

๐‘…4 ๐ด4, ๐ด5 , ๐‘…1 ๐ด1, _

๐‘…5 ๐ด5, ๐ด6 , ๐‘…1 ๐ด1, _

๐‘…6(๐ด6, ๐ด1)

Degree constraint: ๐œ‹๐ด1 ๐‘…1 โ‰ค ๐‘›2/3

O(๐‘›)

O(๐‘›)

O(๐‘›5/3)

O(๐‘›5/3)

O(๐‘›5/3)

O(๐‘›5/3)

โ€œThe number of distinct ๐ด1values in ๐‘…1 is at most ๐‘›2/3โ€

Page 49: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

49

Who Wins?

๐‘…1 ๐ด1, ๐ด2 , ๐‘…2 ๐ด2, ๐ด3 , ๐‘…3 ๐ด3, ๐ด4

๐’ฏ2

๐‘…4 ๐ด4, ๐ด5 , ๐‘…5 ๐ด5, ๐ด6 , ๐‘…6(๐ด6, ๐ด1)

๐’ฏ3

๐‘…1 ๐ด1, ๐ด2

๐‘…2 ๐ด2, ๐ด3 , ๐‘…1 ๐ด1, _

๐‘…3 ๐ด3, ๐ด4 , ๐‘…1 ๐ด1, _

๐‘…4 ๐ด4, ๐ด5 , ๐‘…1 ๐ด1, _

๐‘…5 ๐ด5, ๐ด6 , ๐‘…1 ๐ด1, _

๐‘…6(๐ด6, ๐ด1)

Degree constraint: ๐œ‹๐ด1 ๐‘…1 โ‰ค ๐‘›2/3

O(๐‘›)

O(๐‘›)

O(๐‘›5/3)

O(๐‘›5/3)

O(๐‘›5/3)

O(๐‘›5/3)

O(๐‘›2)

O(๐‘›2)

Page 50: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

50

Could ๐’ฏ2 Win?

โ€ข Consider bag ๐‘…1 ๐ด1, ๐ด2 , ๐‘…2 ๐ด2, ๐ด3 , ๐‘…3 ๐ด3, ๐ด4 in ๐’ฏ2

โ€ข What if each ๐‘…1-tuple joins with only โ€œa fewโ€ ๐‘…2-tuples?

โ€ข What if each ๐‘…2-tuple joins with only โ€œa fewโ€ ๐‘…3-tuples?

โ€ข What if โ€œa fewโ€ was ๐‘›1/3?

Page 51: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

51

Who Wins Now?

๐‘…1 ๐ด1, ๐ด2 , ๐‘…2 ๐ด2, ๐ด3 , ๐‘…3 ๐ด3, ๐ด4

๐’ฏ2

๐‘…4 ๐ด4, ๐ด5 , ๐‘…5 ๐ด5, ๐ด6 , ๐‘…6(๐ด6, ๐ด1)

Degree constraint: โˆ€๐‘– โˆˆ {2,3,5,6}:

โˆ€๐‘—: ๐œ‹๐ด๐‘–+1๐œŽ๐ด๐‘–=๐‘— ๐‘…๐‘– โ‰ค ๐‘›1/3

O(๐‘›5/3)

O(๐‘›5/3)

Page 52: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

52

Who Wins Now?

๐‘…1 ๐ด1, ๐ด2 , ๐‘…2 ๐ด2, ๐ด3 , ๐‘…3 ๐ด3, ๐ด4

๐’ฏ2

๐‘…4 ๐ด4, ๐ด5 , ๐‘…5 ๐ด5, ๐ด6 , ๐‘…6(๐ด6, ๐ด1)

๐’ฏ3

๐‘…1 ๐ด1, ๐ด2

๐‘…2 ๐ด2, ๐ด3 , ๐‘…1 ๐ด1, _

๐‘…3 ๐ด3, ๐ด4 , ๐‘…1 ๐ด1, _

๐‘…4 ๐ด4, ๐ด5 , ๐‘…1 ๐ด1, _

๐‘…5 ๐ด5, ๐ด6 , ๐‘…1 ๐ด1, _

๐‘…6(๐ด6, ๐ด1)

Degree constraint: โˆ€๐‘– โˆˆ {2,3,5,6}:

โˆ€๐‘—: ๐œ‹๐ด๐‘–+1๐œŽ๐ด๐‘–=๐‘— ๐‘…๐‘– โ‰ค ๐‘›1/3

O(๐‘›)

O(๐‘›)

O(๐‘›2)

O(๐‘›2)

O(๐‘›2)

O(๐‘›2)

O(๐‘›5/3)

O(๐‘›5/3)

Page 53: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

53

Best of Both Worlds

โ€ข Depending on the degree constraints that hold for a DB instance, we may sometimes prefer ๐’ฏ2 and sometimes ๐’ฏ3

โ€ข What if we used both? [Alon+ 97, Marx 13]

- Intuition: each decomposition is a different query โ€œplanโ€

โ€ข Query output = union of individual plansโ€™ results

- Decide for each input tuple to which plan(s) to send it

- Main idea: split each input relation into heavy and light

โ€ข Goal: enforce desirable degree constraints for each tree

[Marx 13] Marx. Tractable hypergraph properties for constraint satisfaction and conjunctive queries. J.ACMโ€™13 https://doi.org/10.1145/2535926

[Alon+ 97] Alon, Yuster, Zwick. Finding and counting given length cycles. Algorithmicaโ€™97 https://doi.org/10.1007/BF02523189

Page 54: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

54

Multiple Plans: Plan 1

๐’ฏ3

๐‘…1๐ป ๐ด1, ๐ด2

๐‘…2 ๐ด2, ๐ด3 , ๐‘…1๐ป ๐ด1, _

๐‘…3 ๐ด3, ๐ด4 , ๐‘…1๐ป ๐ด1, _

๐‘…4 ๐ด4, ๐ด5 , ๐‘…1๐ป ๐ด1, _

๐‘…5 ๐ด5, ๐ด6 , ๐‘…1๐ป ๐ด1, _

๐‘…6(๐ด6, ๐ด1)

๐‘…1๐ป: contains all tuples whose

๐ด1-values occur more than

๐‘›1/3 times (fewer than ๐‘›2/3

such ๐ด1-values exist)

Page 55: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

55

Multiple Plans: Plan 1

๐’ฏ3: computes ๐‘…1๐ป โ‹ˆ ๐‘…2 โ‹ˆ โ‹ฏ โ‹ˆ ๐‘…6

๐‘…1๐ป ๐ด1, ๐ด2

๐‘…2 ๐ด2, ๐ด3 , ๐‘…1๐ป ๐ด1, _

๐‘…3 ๐ด3, ๐ด4 , ๐‘…1๐ป ๐ด1, _

๐‘…4 ๐ด4, ๐ด5 , ๐‘…1๐ป ๐ด1, _

๐‘…5 ๐ด5, ๐ด6 , ๐‘…1๐ป ๐ด1, _

๐‘…6(๐ด6, ๐ด1)

Degree constraint: ๐œ‹๐ด1 ๐‘…1๐ป โ‰ค ๐‘›2/3

O(๐‘›)

O(๐‘›)

O(๐‘›5/3)

O(๐‘›5/3)

O(๐‘›5/3)

O(๐‘›5/3)

๐‘…1๐ป: contains all tuples whose

๐ด1-values occur more than

๐‘›1/3 times (fewer than ๐‘›2/3

such ๐ด1-values exist)

Page 56: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

56

More Plans

โ€ข Note that

- ๐‘„6๐‘ = ๐‘…1 โ‹ˆ ๐‘…2 โ‹ˆ ๐‘…3 โ‹ˆ ๐‘…4 โ‹ˆ ๐‘…5 โ‹ˆ ๐‘…6 together with

- ๐‘…1๐ฟ = ๐‘…1 โˆ– ๐‘…1

๐ป

โ€ข implies that ๐‘„6๐‘ is the union of

- ๐‘…1๐ป โ‹ˆ ๐‘…2 โ‹ˆ ๐‘…3 โ‹ˆ ๐‘…4 โ‹ˆ ๐‘…5 โ‹ˆ ๐‘…6 and

- ๐‘…1๐ฟ โ‹ˆ ๐‘…2 โ‹ˆ ๐‘…3 โ‹ˆ ๐‘…4 โ‹ˆ ๐‘…5 โ‹ˆ ๐‘…6

โ€ข To compute the latter, apply the same trick to ๐‘…2

Page 57: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

57

Multiple Plans: Plan 2

๐’ฏ3: computes ๐‘…1๐ฟ โ‹ˆ ๐‘…2

๐ป โ‹ˆ ๐‘…3 โ‹ˆ โ‹ฏ โ‹ˆ ๐‘…6

๐‘…2๐ป ๐ด2, ๐ด3

๐‘…3 ๐ด3, ๐ด4 , ๐‘…2๐ป ๐ด2, _

๐‘…4 ๐ด4, ๐ด5 , ๐‘…2๐ป ๐ด2, _

๐‘…5 ๐ด5, ๐ด6 , ๐‘…2๐ป ๐ด2, _

๐‘…6 ๐ด6, ๐ด1 , ๐‘…2๐ป ๐ด2, _

๐‘…1๐ฟ(๐ด1, ๐ด2)

Degree constraint: ๐œ‹๐ด2 ๐‘…2๐ป โ‰ค ๐‘›2/3

๐‘…2๐ป: contains all tuples whose

๐ด2-values occur more than ๐‘›1/3 times (fewer than ๐‘›2/3

such ๐ด2-values exist)

๐‘…2๐ฟ = ๐‘…2 โˆ– ๐‘…2

๐ป

Page 58: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

58

Plans 3 to 6

โ€ข Plans discussed so far

- ๐‘…1๐ฟ โ‹ˆ ๐‘…2

๐ฟ โ‹ˆ ๐‘…3๐ป โ‹ˆ ๐‘…4 โ‹ˆ ๐‘…5 โ‹ˆ ๐‘…6

- ๐‘…1๐ฟ โ‹ˆ ๐‘…2

๐ฟ โ‹ˆ ๐‘…3๐ฟ โ‹ˆ ๐‘…4

๐ป โ‹ˆ ๐‘…5 โ‹ˆ ๐‘…6

โ€ข Continue analogously to compute

- ๐‘…1๐ฟ โ‹ˆ ๐‘…2

๐ฟ โ‹ˆ ๐‘…3๐ป โ‹ˆ ๐‘…4 โ‹ˆ ๐‘…5 โ‹ˆ ๐‘…6

- ๐‘…1๐ฟ โ‹ˆ ๐‘…2

๐ฟ โ‹ˆ ๐‘…3๐ฟ โ‹ˆ ๐‘…4

๐ป โ‹ˆ ๐‘…5 โ‹ˆ ๐‘…6

- ๐‘…1๐ฟ โ‹ˆ ๐‘…2

๐ฟ โ‹ˆ ๐‘…3๐ฟ โ‹ˆ ๐‘…4

๐ฟ โ‹ˆ ๐‘…5๐ป โ‹ˆ ๐‘…6

- ๐‘…1๐ฟ โ‹ˆ ๐‘…2

๐ฟ โ‹ˆ ๐‘…3๐ฟ โ‹ˆ ๐‘…4

๐ฟ โ‹ˆ ๐‘…5๐ฟ โ‹ˆ ๐‘…6

๐ป

โ€ข What is missing?

Page 59: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

59

The 7-th Plan

โ€ข Join all light-only partitions with each other:

- ๐‘…1๐ฟ โ‹ˆ ๐‘…2

๐ฟ โ‹ˆ ๐‘…3๐ฟ โ‹ˆ ๐‘…4

๐ฟ โ‹ˆ ๐‘…5๐ฟ โ‹ˆ ๐‘…6

๐ฟ

โ€ข Input now satisfies the other degree constraint:

- โˆ€๐‘– โˆˆ {2,3,5,6}: โˆ€๐‘—: ๐œ‹๐ด๐‘–+1๐œŽ๐ด๐‘–=๐‘— ๐‘…๐‘– โ‰ค ๐‘›1/3

โ€ข Use decomposition ๐’ฏ2 for it!

Page 60: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

60

Analysis and Discussion

โ€ข Rewrite 6-cycle into 7 sub-queries

- Six of them use ๐’ฏ3, copying the heavy attribute to intermediate bags

- One uses ๐’ฏ2 on the all-light case

โ€ข Analysis

- Assigning input tuples to subqueries: O(๐‘›)

- Bag materialization: O(๐‘›5/3)

- Bag size: O(๐‘›5/3)

โ€ข Running Yannakakis on each of the 7 trees takes เทฉO ๐‘›5/3 + ๐‘Ÿ

- Beats single-tree complexity เทฉO ๐‘›2 + ๐‘Ÿ and WCO-join complexity เทฉO ๐‘›3

Page 61: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

61

Tree Decompositions: The Big Picture

โ€ข WCO join algorithms applied to a query Q: complexity determined by the AGM bound for Q

โ€ข Decomposing Q into a tree creates bags that each may have a lower AGM bound value (fractional hypertree width)

โ€ข This reduces time complexity

- Materialize each bag using a WCO join algorithm

- Run Yannakakis algorithm on the tree with materialized bags as input

โ€ข Design goal: minimize width of tree, but ensure that each input relation is covered by a bag and the tree is attribute-connected!

Page 62: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

62

Tree Decompositions: Degree Constraints

โ€ข Degree constraints generalize cardinality constraints and functional dependencies

โ€ข Enable improved complexity guarantees

โ€ข Application to general input (with only cardinality constraints):

- Partition input so that each partition satisfies stronger degree constraints

- Use appropriate tree for each partition

- Current frontier: submodular width

โ€ข Application to input DB with degree constraints:

- Degree-aware submodular width

[Khamis, Ngo, Suciu. What do shannon-type inequalities, submodular width, and disjunctive datalog have to do with one another? PODSโ€™17] https://doi.org/10.1145/3034786.3056105

[Joglekar, Rรฉ. It's all a matter of degree. Theory of Computing Systemsโ€™18] https://doi.org/10.1007/s00224-017-9811-8

[Marx. Tractable hypergraph properties for constraint satisfaction and conjunctive queries. J.ACMโ€™13] https://doi.org/10.1145/2535926

Page 63: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

63

Outline tutorial

โ€ข Part 1: Top-๐‘˜ (Wolfgang): ~20minโ€ข Part 2: Optimal Join Algorithms (Mirek): ~30min

โ€“ Lower Bound and the Yannakakis Algorithm

โ€“ Problems Caused by Cycles

โ€“ Tree Decompositions

โ€“ Summary and Further Reading

โ€ข Part 3: Ranked enumeration over joins (Nikolaos): ~40min

Page 64: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

64

Optimal-Joins Summary

โ€ข On acyclic CQs, the Yannakakis algorithmโ€™s complexity เทฉO ๐‘› + ๐‘Ÿmatches the lower bound

- Simple join-at-a-time query plan after semi-join reduction sweeps

โ€ข Lower bound cannot be matched for general cyclic CQs

โ€ข WCO joins achieve เทฉO ๐‘› + ๐‘ŸWC

- โ€œHolisticโ€ multi-way join approach

โ€ข Tree decomposition approaches achieve เทฉO ๐‘›๐‘‘ + ๐‘Ÿ

- Best ๐‘‘ is submodular width subw(Q) using multi-tree decomposition

- ๐‘›subw(๐‘„) = O(๐‘ŸWC), but usually better

โ€ข 6-cycle: ๐‘ŸWC = ๐‘›3, but multi-tree approach achieves เทฉO ๐‘›5/3 + ๐‘Ÿ

Page 65: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-๐‘˜

65

Further Reading

โ€ข Extensions of the query model

- Projections and aggregation

โ€ข Factorized DB

โ€ข Stronger notions of optimality

โ€ข โ€ฆand many more (see our tutorial paper)

[Khamis, Ngo, Rudra. FAQ: questions asked frequently. PODSโ€™16] https://doi.org/10.1145/2902251.2902280

[Bakibayev, Koฤiskรฝ, Olteanu, Zรกvodnรฝ. Aggregation and Ordering in Factorised Databases. PVLDBโ€™13] https://doi.org/10.14778/2556549.2556579

[Bakibayev, Olteanu, Zรกvodnรฝ. FDB: A Query Engine for Factorised Relational Databases. PVLDBโ€™12] https://doi.org/10.14778/2350229.2350242

[Olteanu, Schleich. Factorized databases. SIGMOD Recordโ€™16] https://doi.org/10.1145/3003665.3003667

[Olteanu, Zรกvodny. Size bounds for factorised representations of query results. TODSโ€™15] https://doi.org/10.1145/2656335

[Khamis, Rรฉ, Rudra. Joins via Geometric Resolutions: Worst Case and Beyond. TODSโ€™16] https://doi.org/10.1145/2967101

[Ngo, Nguyen, Re, Rudra. Beyond worst-case analysis for joins with minesweeper. PODSโ€™14] https://doi.org/10.1145/2594538.2594547