Mike Carey UC Irvine [email protected] M. Carey, Winter 2020: CS 190 (CS122D Beta) CS190 (CS122D): Beyond SQL Data Management Lecture #2

Beyond SQL Data Management


Page 1: Beyond SQL Data Management

Mike Carey, UC Irvine

[email protected]

M. Carey, Winter 2020: CS 190 (CS122D Beta)

CS190 (CS122D): Beyond SQL Data Management

⎯ Lecture #2 ⎯

Page 2: Beyond SQL Data Management

Announcements

• Might start bringing your laptops to class
  • We may want to play with systems in class
  • We might have “lab time” in class later on, too

• Keep your eyes on the course wiki page
  • https://grape.ics.uci.edu/wiki/asterix/wiki/cs190-2020-winter

• Be sure to drop in daily on the Piazza page
  • https://piazza.com/uci/winter2020/cs190/home

• Hopefully your cobwebs have been cleared
  • If not: Stanford Self-Paced Intro to DBMS Course

• Today: Parallel (Relational) Database Systems
  • But first: https://checkin.ics.uci.edu/

Page 3: Beyond SQL Data Management

What If My Problem Size Grows?

“Scaling Up” vs. “Scaling Out”

Parallelism to the rescue...!

Page 4: Beyond SQL Data Management

Just How “Big” Is My Data?

[Figure: a system's cores, main memory, and disks]

Bigger than the system’s aggregate memory!

See: J. Gray and G. Putzolu, “The 5 Minute Rule for Trading Memory for Disk Accesses and The 10 Byte Rule for Trading Memory for CPU Time”, SIGMOD 1987, pp. 395-398. (And subsequent 10-year follow-ups by Gray and also G. Graefe...)

Page 5: Beyond SQL Data Management

Parallel Performance Objectives

Source: Principles of Distributed Database Systems, 3rd Edition, T. Özsu and P. Valduriez, Springer, 2011.

Page 6: Beyond SQL Data Management

Performance Objectives (cont.)

• Linear speed-up: Adding more resources results in proportionally faster execution time for a fixed amount of data

• Batch scale-up: Adding more resources can handle a proportionally bigger problem in the same amount of time

• Transaction scale-up: Adding more resources allows the handling of proportionally more concurrent requests

[Figure: 4X speedup (100 GB of data on both systems) and 10X batch scaleup (100 GB of data growing to 1 TB)]
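The first two objectives can be expressed as simple ratios of elapsed times. The sketch below is only an illustration: the function names and the timings are made up, not measurements from the lecture.

```python
def speedup(time_small_system: float, time_large_system: float) -> float:
    """Same data on both systems; the ideal value equals the resource ratio."""
    return time_small_system / time_large_system


def scaleup(time_small_problem_small_system: float,
            time_big_problem_big_system: float) -> float:
    """Data and resources grow together; the ideal (linear) value is 1.0."""
    return time_small_problem_small_system / time_big_problem_big_system


# Hypothetical timings in seconds, chosen only to illustrate the definitions.
t_1node_100gb = 400.0    # 1 node, 100 GB
t_4nodes_100gb = 100.0   # 4 nodes, same 100 GB
t_10nodes_1tb = 400.0    # 10 nodes, 1 TB

linear_speedup = speedup(t_1node_100gb, t_4nodes_100gb)
batch_scaleup = scaleup(t_1node_100gb, t_10nodes_1tb)
print(linear_speedup, batch_scaleup)
```

With these numbers, 4X the resources give a 4X speedup, and 10X the resources handle 10X the data in the same time.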

Page 7: Beyond SQL Data Management

Alternative Parallel DB Architectures

• Shared Everything (SE)
• Shared Disk (SD)
• Shared Nothing (SN)

Notable research systems
  • Gamma, Grace, Bubba, …

Many commercial examples
  • Teradata
  • IBM DB2 Parallel
  • Microsoft PDW
  • Vertica
  • …

Over 30 years of R&D!

Page 8: Beyond SQL Data Management

Teradata: The Big Gorilla :-)

A 1TB Teradata system on its way to K-Mart!

(Thanks to Todd Walter, a retired Teradata CTO)

Page 9: Beyond SQL Data Management

Two Forms of Intra-Query Parallelism

• Partitioned parallelism: Many machines each running the same step on a different subset (partition) of the data

• Pipelined parallelism: Many machines each working on a different step of a multi-step process

Source: Principles of Distributed Database Systems, 3rd Edition, T. Özsu and P. Valduriez, Springer, 2011.

(including pipelining)
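A small sketch may help separate the two forms. It assumes a made-up two-step plan (filter, then square) and uses threads as stand-ins for machines. The generators show only the pipelined dataflow, with rows handed from one step to the next as they are produced; they do not run the two steps on separate machines.

```python
from concurrent.futures import ThreadPoolExecutor


def select_even(rows):
    """Step 1 of a two-step plan: a filter, written as a generator."""
    for r in rows:
        if r % 2 == 0:
            yield r


def square(rows):
    """Step 2: consumes step 1's output one row at a time (pipelining)."""
    for r in rows:
        yield r * r


def run_plan(partition):
    # Pipelined: square() pulls rows from select_even() as they are produced.
    return list(square(select_even(partition)))


# Partitioned: the same plan runs on each partition of the data.
partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9, 10]]
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    per_partition = list(pool.map(run_plan, partitions))

result = [row for part in per_partition for row in part]
print(result)
```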

Page 10: Beyond SQL Data Management

DBMS: The || Success Story

• For many years, relational DBMSs were the most (only?) successful/commercial application of parallelism
  • Early years: Teradata, Tandem, Thinking Machines, KSR, ...
  • Now every major RDBMS vendor has some || offering
  • (Now we have parallel Web search and ML too)

• Reasons for parallel RDBMS success include:
  • Set-oriented processing (partitioned parallelism)
  • Natural pipelining (plan trees)
  • Applicability of inexpensive hardware to the problem
  • Users/application developers don’t have to think about ||
    → SQL’s programming model is scale-independent!

Notes:
  • Storage manager and runtime executor per node
  • Upper layers orchestrate them
  • One way in/out: via the SQL door

Page 11: Beyond SQL Data Management

Parallelism in RDBMS Terms

• Intra-operator parallelism:
  • Get all machines working together to concurrently compute a given operation (e.g., a scan, sort, or join operator)

• Inter-operator parallelism:
  • Get different operations (query operators) running concurrently on different machines within the system

• Inter-query parallelism:
  • Admit multiple queries into the system at once and allow them to run concurrently (possibly on different machines)

(We’ll focus mainly on this form today)

Page 12: Beyond SQL Data Management

Automatic Data Partitioning

Partitioning a SQL table’s data:

• Range: Good for equijoins, exact-match queries, and range queries
• Hash: Good for equijoins and exact-match queries
• Round Robin: Good to spread load (e.g., full scans)

[Figure: incoming data spread over five partitions (A...E, F...J, K...N, O...S, T...Z) under each of the three schemes]

Notes:
- Shared disk (SD) and shared memory (SM) less sensitive to data partitioning
- Shared nothing (SN) benefits from "good" partitioning (to avoid skew)
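The three schemes differ only in how a row is mapped to a partition number. A minimal sketch, assuming five partitions, range boundaries taken from the A...E through T...Z labels, and `zlib.crc32` as an arbitrary stand-in hash function (none of this is prescribed by the slide):

```python
import bisect
import zlib

NUM_PARTITIONS = 5
# Inclusive upper bounds on the first letter for the first four ranges,
# mirroring the A...E, F...J, K...N, O...S, T...Z labels.
RANGE_UPPER_BOUNDS = ["E", "J", "N", "S"]


def range_partition(key: str) -> int:
    """Partition by which first-letter range the key falls into."""
    return bisect.bisect_left(RANGE_UPPER_BOUNDS, key[0].upper())


def hash_partition(key: str) -> int:
    """Partition by a hash of the key; crc32 is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def round_robin_partition(arrival_number: int) -> int:
    """Partition by arrival order, ignoring the key entirely."""
    return arrival_number % NUM_PARTITIONS


names = ["Adams", "Evans", "Fox", "Nash", "Young"]  # made-up keys
range_ids = [range_partition(n) for n in names]
hash_ids = [hash_partition(n) for n in names]
rr_ids = [round_robin_partition(i) for i in range(len(names))]
print(range_ids, hash_ids, rr_ids)
```

Range and hash placement depend on the key, which is what lets a query locate matching rows later. Round robin depends only on arrival order, so it balances load but gives the query processor nothing to prune with.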

Page 13: Beyond SQL Data Management

Parallel Scan/Select Queries

• Scan in parallel and merge (a.k.a. union all)

• Note: Selections may not require all sites for range or hash partitioning, but always do for round robin

• Secondary indexes can be constructed on each partition
  • Indexes useful for local lookups, as expected
  • Per-partition index lookups will occur in parallel
  • However, what about unique indexes...?
    (May not always want primary key partitioning...)
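The note about which sites a selection must visit can be sketched as a small routing function. The four sites, the range boundaries, and the single-letter keys are all assumptions made to keep the example short:

```python
import bisect
import zlib

NUM_SITES = 4
# Hypothetical inclusive upper bounds for sites 0..2; site 3 takes the rest.
RANGE_UPPER_BOUNDS = ["F", "M", "S"]


def sites_to_scan(scheme, low, high):
    """Sites that a selection `low <= key <= high` has to visit.

    Keys are single uppercase letters here to keep the sketch simple.
    """
    if scheme == "range":
        first = bisect.bisect_left(RANGE_UPPER_BOUNDS, low)
        last = bisect.bisect_left(RANGE_UPPER_BOUNDS, high)
        return list(range(first, last + 1))
    if scheme == "hash" and low == high:
        # Exact match: the hash of the key names the one site to visit.
        return [zlib.crc32(low.encode("utf-8")) % NUM_SITES]
    # Hash with a true range predicate, or round robin: no pruning possible.
    return list(range(NUM_SITES))


print(sites_to_scan("range", "G", "K"))
print(sites_to_scan("range", "A", "N"))
print(sites_to_scan("round_robin", "G", "G"))
```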

Page 14: Beyond SQL Data Management

Secondary Indexes

[Figure: a secondary index split by key range (A..C, D..F, G...M, N...R, S..Z) over the base table, contrasted with one A..Z secondary index per base table partition]

• Secondary indexes become “interesting” in the face of partitioning...

• Local secondary indexes: Partition secondary indexes by the base table’s primary key
  • Inserts are then local (unless unique index, if allowed)
  • Lookups, however, then go to ALL indexes
  • Query-oriented systems do this (including AsterixDB :-))

• Global secondary indexes: Partition secondary indexes by the secondary key range
  • Inserts must then hit 2 nodes (base table + index)
  • Ditto for index lookups (index + base table)
  • Uniqueness is easy to enforce
  • Performance for short queries/updates is very good

• Teradata’s index partitioning solution (a hybrid):
  • Partition non-unique indices by base table key
  • Partition unique indices by secondary key
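The lookup trade-off between the two layouts can be sketched with plain dictionaries. This is an illustration only: the three nodes, the modulo placement function (used here in place of key ranges), and the rows are all made up.

```python
from collections import defaultdict

NUM_NODES = 3


def node_of(key):
    # Stand-in placement function (an assumption); keys are small ints here.
    return key % NUM_NODES


# rows: (primary key, city code); the secondary index is on city code.
rows = [(1, 10), (2, 11), (3, 10), (4, 12), (5, 10)]

local_index = [defaultdict(list) for _ in range(NUM_NODES)]
global_index = [defaultdict(list) for _ in range(NUM_NODES)]
for pk, city in rows:
    local_index[node_of(pk)][city].append(pk)     # entry stays with its row
    global_index[node_of(city)][city].append(pk)  # entry goes to the secondary key's node


def lookup_local(city):
    """Return (matching primary keys, number of index nodes consulted)."""
    visited = list(range(NUM_NODES))              # must ask every node
    pks = [pk for n in visited for pk in local_index[n].get(city, [])]
    return sorted(pks), len(visited)


def lookup_global(city):
    """Return (matching primary keys, number of index nodes consulted)."""
    n = node_of(city)                             # exactly one index node
    return sorted(global_index[n].get(city, [])), 1


print(lookup_local(10), lookup_global(10))
```

Both lookups return the same primary keys. The local layout consults every node's index, while the global layout consults one index node and then still has to fetch the rows from the base table nodes.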

Page 15: Beyond SQL Data Management

Local Joins: Hybrid Hash Join

SELECT * FROM Item i, Manufacturer m
WHERE i.mid = m.id;

Rules:
1) In each Join, Probe is blocked by Build
2) Hash table (on Item.mid) should be in memory
3) One output produced (concurrent with Probe)
4) Spills to disk in case of insufficient memory

[Figure: hash join of i and m with Build (B) and Probe (P) phases in two cases. Memory >= Build Relation + Hash Table: a single in-memory build and probe. Memory < Build Relation + Hash Table: the inputs are split into partitions i1...i4 and m1...m4]

(Thanks to UCI Ph.D. student Shiva Jahangiri!)
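The rules above can be sketched in a few lines. This is a simplified illustration, not the algorithm as any particular system implements it: Python lists stand in for spilled partitions on disk, partition 0 is always the memory-resident one, and the row layout is made up.

```python
from collections import defaultdict


def hybrid_hash_join(build_rows, probe_rows, build_key, probe_key,
                     num_partitions=4):
    """Join two lists of dicts; partition 0 stays in memory, the rest 'spill'."""
    resident = defaultdict(list)        # in-memory hash table for partition 0
    spilled_build = defaultdict(list)   # stand-in for build partitions on disk
    for b in build_rows:                # Build phase
        part = hash(b[build_key]) % num_partitions
        if part == 0:
            resident[b[build_key]].append(b)
        else:
            spilled_build[part].append(b)

    output = []
    spilled_probe = defaultdict(list)   # stand-in for probe partitions on disk
    for p in probe_rows:                # Probe phase (starts after Build ends)
        part = hash(p[probe_key]) % num_partitions
        if part == 0:
            output.extend((b, p) for b in resident.get(p[probe_key], []))
        else:
            spilled_probe[part].append(p)

    # Join each spilled partition pair with an ordinary in-memory hash join.
    for part in range(1, num_partitions):
        table = defaultdict(list)
        for b in spilled_build[part]:
            table[b[build_key]].append(b)
        for p in spilled_probe[part]:
            output.extend((b, p) for b in table.get(p[probe_key], []))
    return output


# Made-up rows; the build side is Item, hashed on mid.
items = [{"iid": 1, "mid": 4}, {"iid": 2, "mid": 5}, {"iid": 3, "mid": 6},
         {"iid": 4, "mid": 4}, {"iid": 5, "mid": 9}]
manufacturers = [{"id": 4}, {"id": 5}, {"id": 6}, {"id": 7}]

joined = hybrid_hash_join(items, manufacturers, "mid", "id")
pairs = sorted((i["iid"], m["id"]) for i, m in joined)
print(pairs)
```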

Page 16: Beyond SQL Data Management

Parallel Hash Join

[Figure: rows of R and S are distributed by h1(jk) to the join processing nodes, where h2(jk) is applied to each node's R and S rows]

Hash-distribute the rows from each table to the join processing nodes based on join key (jk) values (if/as needed)

Page 17: Beyond SQL Data Management

Parallel Hash Join (cont.)

R join S = union (Rij join Sij)

(Tij ≡ Local partition j of global partition i of input table T)
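The global part of the identity above (repartition by join key, join locally, union the results) can be sketched as follows. The three nodes, the modulo function standing in for h1, and the rows are assumptions; the second-level split by h2 inside each node is omitted.

```python
from collections import defaultdict

NUM_NODES = 3


def repartition(node_fragments, key):
    """Route every row to node h1(jk); h1 is `jk % NUM_NODES` in this sketch."""
    received = [[] for _ in range(NUM_NODES)]
    for fragment in node_fragments:
        for row in fragment:
            received[row[key] % NUM_NODES].append(row)
    return received


def local_hash_join(r_rows, s_rows, r_key, s_key):
    """Ordinary in-memory hash join: build on R, probe with S."""
    table = defaultdict(list)
    for r in r_rows:
        table[r[r_key]].append(r)
    return [(r, s) for s in s_rows for r in table.get(s[s_key], [])]


def parallel_hash_join(r_fragments, s_fragments, r_key, s_key):
    r_parts = repartition(r_fragments, r_key)
    s_parts = repartition(s_fragments, s_key)
    result = []
    for i in range(NUM_NODES):   # each iteration stands in for one node's work
        result.extend(local_hash_join(r_parts[i], s_parts[i], r_key, s_key))
    return result                # the union of the per-node join results


# Made-up fragments of R and S as initially stored on three nodes.
r_fragments = [[{"id": 1}, {"id": 2}], [{"id": 3}], [{"id": 4}]]
s_fragments = [[{"rid": 3}], [{"rid": 1}, {"rid": 1}], [{"rid": 5}]]

joined = parallel_hash_join(r_fragments, s_fragments, "id", "rid")
pairs = sorted((r["id"], s["rid"]) for r, s in joined)
print(pairs)
```

Because both inputs are routed with the same function on the join key, matching rows always meet on the same node, so no node needs to see another node's partition.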

Page 18: Beyond SQL Data Management

Multiple Hash Join Queries

[Figure: three plan shapes for a query with four hash joins, each join having a build phase (B1...B4) and a probe phase (P1...P4)]

• Left Deep Tree
  - Low parallelism (Sequential)
  - Low memory usage (2 relations at a time)

• Bushy Tree
  - Independent parallelism
  - Plan-dependent memory usage

• Right Deep Tree
  - High parallelism
  - High memory usage

(Thanks to UCI Ph.D. student Shiva Jahangiri!)

Page 19: Beyond SQL Data Management

Some Parallel Join Methods

• Parallel hash join (we just saw this one)
  • Each dataset (R, S) is repartitioned (if needed) by hashing its records on the join key
  • A local hash join is performed between each of the resulting partition pairs

• Nested loops index join
  • The “outer” dataset R is replicated to all partitions of the “inner” indexed dataset S
  • Arriving R tuples are used (as they arrive – pipelining!) to probe the S index for matches

• Broadcast join
  • The build dataset R is broadcast (replicated) to all partitions of the probe dataset S, and arriving tuples are treated as the build tuples in a local hash join
  • The probe dataset S is then used to probe the build dataset
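A broadcast join can be sketched in the same style as the parallel hash join. The brewery rows below are hypothetical; the beer partitions reuse the example data from the aggregation and sorting slides.

```python
from collections import defaultdict


def broadcast_join(small_r, s_partitions, r_key, s_key):
    """Copy all of R to every partition of S, then hash join locally."""
    result = []
    for s_part in s_partitions:            # one iteration per node holding S
        table = defaultdict(list)          # each node builds on its own copy of R
        for r in small_r:
            table[r[r_key]].append(r)
        for s in s_part:                   # local S rows probe; S never moves
            result.extend((r, s) for r in table.get(s[s_key], []))
    return result


# Hypothetical build dataset R (note: no brewery with id 2).
breweries = [{"id": 1, "city": "X"}, {"id": 3, "city": "Y"}]

# Probe dataset S, already spread over three partitions.
beer_partitions = [
    [{"name": "dubbel", "brewery_id": 3}, {"name": "saison", "brewery_id": 3},
     {"name": "pils", "brewery_id": 2}],
    [{"name": "kölsch", "brewery_id": 2}, {"name": "lager", "brewery_id": 1},
     {"name": "ale", "brewery_id": 1}],
    [{"name": "tripel", "brewery_id": 3}, {"name": "stout", "brewery_id": 1},
     {"name": "weizen", "brewery_id": 2}],
]

joined = broadcast_join(breweries, beer_partitions, "id", "brewery_id")
names = sorted(s["name"] for _, s in joined)
print(names)
```

Only the build side crosses the network here, which is why this method suits a small R joined with a large, already partitioned S.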

Page 20: Beyond SQL Data Management

Parallel Grouped Aggregation

SELECT br.brewery_id, COUNT(*) AS num_beers
FROM beers br
GROUP BY br.brewery_id;

• Hash-based partitioned parallelism also works for grouping!
  • In this case, hash partitioning is based on the grouping attribute(s)

Input (three partitions of beers):
  Partition 1:
    { "name": "dubbel", "brewery_id": 3 }
    { "name": "saison", "brewery_id": 3 }
    { "name": "pils", "brewery_id": 2 }
  Partition 2:
    { "name": "kölsch", "brewery_id": 2 }
    { "name": "lager", "brewery_id": 1 }
    { "name": "ale", "brewery_id": 1 }
  Partition 3:
    { "name": "tripel", "brewery_id": 3 }
    { "name": "stout", "brewery_id": 1 }
    { "name": "weizen", "brewery_id": 2 }

After local aggregation:
  Partition 1:
    { "num_beers": 1, "brewery_id": 2 }
    { "num_beers": 2, "brewery_id": 3 }
  Partition 2:
    { "num_beers": 2, "brewery_id": 1 }
    { "num_beers": 1, "brewery_id": 2 }
  Partition 3:
    { "num_beers": 1, "brewery_id": 1 }
    { "num_beers": 1, "brewery_id": 2 }
    { "num_beers": 1, "brewery_id": 3 }

After hash repartition:
  First receiving partition:
    { "num_beers": 2, "brewery_id": 3 }
    { "num_beers": 2, "brewery_id": 1 }
    { "num_beers": 1, "brewery_id": 1 }
    { "num_beers": 1, "brewery_id": 3 }
  Second receiving partition:
    { "num_beers": 1, "brewery_id": 2 }
    { "num_beers": 1, "brewery_id": 2 }
    { "num_beers": 1, "brewery_id": 2 }

After final aggregation:
  First receiving partition:
    { "num_beers": 3, "brewery_id": 1 }
    { "num_beers": 3, "brewery_id": 3 }
  Second receiving partition:
    { "num_beers": 3, "brewery_id": 2 }
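The three stages (local aggregation, hash repartition, final aggregation) can be traced with the slide's beer data. The sketch assumes two receiving nodes and `brewery_id % 2` as the hash function, neither of which is stated on the slide. Note that the final step sums the partial counts rather than counting rows again.

```python
from collections import Counter

# (name, brewery_id) pairs, partitioned as on the slide.
partitions = [
    [("dubbel", 3), ("saison", 3), ("pils", 2)],
    [("kölsch", 2), ("lager", 1), ("ale", 1)],
    [("tripel", 3), ("stout", 1), ("weizen", 2)],
]
NUM_NODES = 2  # number of receiving nodes (an assumption)

# Step 1: local aggregation, one partial count per brewery per partition.
local_counts = [Counter(bid for _, bid in part) for part in partitions]

# Step 2: hash repartition of the partial counts on the grouping attribute.
received = [[] for _ in range(NUM_NODES)]
for counts in local_counts:
    for bid, n in counts.items():
        received[bid % NUM_NODES].append((bid, n))

# Step 3: final aggregation, summing the partial counts for each brewery.
final = []
for node_rows in received:
    totals = Counter()
    for bid, n in node_rows:
        totals[bid] += n
    final.append(dict(totals))

print(final)
```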

Page 21: Beyond SQL Data Management

Parallel Sorting w/Global Merge

SELECT br.brewery_id, br.name
FROM beers br
ORDER BY br.name;

• Partitioned parallelism also works for sorting!
  • The O(N lg N) step is parallelized, followed by a global merge

Input (three partitions of beers):
  Partition 1:
    { "name": "dubbel", "brewery_id": 3 }
    { "name": "saison", "brewery_id": 3 }
    { "name": "pils", "brewery_id": 2 }
  Partition 2:
    { "name": "kölsch", "brewery_id": 2 }
    { "name": "lager", "brewery_id": 1 }
    { "name": "ale", "brewery_id": 1 }
  Partition 3:
    { "name": "tripel", "brewery_id": 3 }
    { "name": "stout", "brewery_id": 1 }
    { "name": "weizen", "brewery_id": 2 }

After local sort:
  Partition 1:
    { "name": "dubbel", "brewery_id": 3 }
    { "name": "pils", "brewery_id": 2 }
    { "name": "saison", "brewery_id": 3 }
  Partition 2:
    { "name": "ale", "brewery_id": 1 }
    { "name": "kölsch", "brewery_id": 2 }
    { "name": "lager", "brewery_id": 1 }
  Partition 3:
    { "name": "stout", "brewery_id": 1 }
    { "name": "tripel", "brewery_id": 3 }
    { "name": "weizen", "brewery_id": 2 }

After “repartition” and final merge:
  { "name": "ale", "brewery_id": 1 }
  { "name": "dubbel", "brewery_id": 3 }
  { "name": "kölsch", "brewery_id": 2 }
  { "name": "lager", "brewery_id": 1 }
  { "name": "pils", "brewery_id": 2 }
  { "name": "saison", "brewery_id": 3 }
  { "name": "stout", "brewery_id": 1 }
  { "name": "tripel", "brewery_id": 3 }
  { "name": "weizen", "brewery_id": 2 }
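The local-sort-then-merge plan maps directly onto Python's standard library. In this sketch `sorted` plays the per-partition sort and `heapq.merge` plays the single global merge; only the beer names are kept, for brevity.

```python
import heapq

# Beer names, partitioned as on the slide.
partitions = [
    ["dubbel", "saison", "pils"],
    ["kölsch", "lager", "ale"],
    ["tripel", "stout", "weizen"],
]

# Local sort: each partition sorts its own rows (this part runs in parallel).
sorted_runs = [sorted(part) for part in partitions]

# Final merge: one node merges the already sorted runs into a single stream.
merged = list(heapq.merge(*sorted_runs))
print(merged)
```

The merge itself is sequential, which is the cost that the range-based variant avoids.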

Page 22: Beyond SQL Data Management

Range-Based Parallel Sorting

SELECT br.brewery_id, br.name
FROM beers br
ORDER BY br.name;

• Also possible to eliminate a global merge
  • Scan (sample), range partition, and sort ranges

Input (three partitions of beers), scanned to sample:
  Partition 1:
    { "name": "dubbel", "brewery_id": 3 }
    { "name": "saison", "brewery_id": 3 }
    { "name": "pils", "brewery_id": 2 }
  Partition 2:
    { "name": "kölsch", "brewery_id": 2 }
    { "name": "lager", "brewery_id": 1 }
    { "name": "ale", "brewery_id": 1 }
  Partition 3:
    { "name": "tripel", "brewery_id": 3 }
    { "name": "stout", "brewery_id": 1 }
    { "name": "weizen", "brewery_id": 2 }

After range (re)partition, split points [‘k’, ‘s’]:
  First range:
    { "name": "dubbel", "brewery_id": 3 }
    { "name": "kölsch", "brewery_id": 2 }
    { "name": "ale", "brewery_id": 1 }
  Second range:
    { "name": "lager", "brewery_id": 1 }
    { "name": "saison", "brewery_id": 3 }
    { "name": "pils", "brewery_id": 2 }
  Third range:
    { "name": "tripel", "brewery_id": 3 }
    { "name": "stout", "brewery_id": 1 }
    { "name": "weizen", "brewery_id": 2 }

After local sort:
  First range:
    { "name": "ale", "brewery_id": 1 }
    { "name": "dubbel", "brewery_id": 3 }
    { "name": "kölsch", "brewery_id": 2 }
  Second range:
    { "name": "lager", "brewery_id": 1 }
    { "name": "pils", "brewery_id": 2 }
    { "name": "saison", "brewery_id": 3 }
  Third range:
    { "name": "stout", "brewery_id": 1 }
    { "name": "tripel", "brewery_id": 3 }
    { "name": "weizen", "brewery_id": 2 }
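A sketch of the sample, range-partition, and local-sort plan. It departs from the slide in one labeled respect: instead of the single-letter split points [‘k’, ‘s’], it derives full-name split points from the sample, and the "sample" here is simply every name so that the run is deterministic. The helper name `choose_split_points` is hypothetical.

```python
import bisect

# Beer names, partitioned as on the slide.
partitions = [
    ["dubbel", "saison", "pils"],
    ["kölsch", "lager", "ale"],
    ["tripel", "stout", "weizen"],
]
NUM_RANGES = 3


def choose_split_points(sample, num_ranges):
    """Pick inclusive upper bounds that cut the sorted sample into equal parts."""
    ordered = sorted(sample)
    step = len(ordered) / num_ranges
    return [ordered[int(step * i) - 1] for i in range(1, num_ranges)]


# Scan (to sample): a real system would read only a fraction of the rows.
sample = [name for part in partitions for name in part]
split_points = choose_split_points(sample, NUM_RANGES)

# Range (re)partition: route each row to the range that covers its key.
ranges = [[] for _ in range(NUM_RANGES)]
for part in partitions:
    for name in part:
        ranges[bisect.bisect_left(split_points, name)].append(name)

# Local sort: each range sorts independently; no global merge is needed,
# because every key in range i precedes every key in range i + 1.
sorted_ranges = [sorted(r) for r in ranges]
result = [name for r in sorted_ranges for name in r]
print(split_points, sorted_ranges)
```

The quality of the sample decides how evenly the ranges are filled; a skewed sample leaves some nodes with far more rows to sort than others.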

Page 23: Beyond SQL Data Management

Parallel RDBMS Summary

• ||-ism occurs naturally in relational query processing

• Both pipelined and partitioned ||-ism arise

• Intra-operator, inter-operator, and inter-query ||-ism

• Shared Nothing is the dominant choice for scaling out

• Shared Disk is used too (e.g., Oracle), but is less common

• The cloud may bring SD back, however...!

• Shared Memory approach is actually easiest but costly

• And there are limits to scaling up – and then what?

• Shared Nothing is cheap, scales well, supports incremental growth and has nice fault isolation – but harder to implement

Page 24: Beyond SQL Data Management

Parallel RDBMS Summary (cont.)

• Data layout choices are important!
  • Hash, range, and round-robin partitioning options
  • Local vs. global secondary index layout

• Most DB operations can be done partition-||
  • Select, project, join, ...
  • Sorting, grouping/aggregation, ...

• Complex queries are supportable
  • Involve a mix of pipelining and blocking steps
  • Note: Query optimization can be challenging!

• Transactions and consistency are “interesting” for SN
  • Need two-phase commit if transactions span nodes

Page 25: Beyond SQL Data Management

Questions...?
