CS4432: Database Systems II
Query Operator & Algebraic Expressions


Why SQL

SQL is a very high-level language: you say “what to do” rather than “how to do it,” which avoids many of the data-manipulation details needed in procedural languages like C++ or Java.

The database management system figures out the “best” way to execute a query. This is called “query optimization.”

Query Processing

SELECT pNumber, count(*) AS CNT
FROM Student
WHERE sNumber > 1
GROUP BY pNumber;

[Figure: an SQL query is translated into query plans]

Query Example

SELECT B, D FROM R, S WHERE R.A = 'c' AND S.E = 2 AND R.C = S.C;


• How do we execute the query? One idea:

- Form the Cartesian product of all tables in the FROM clause.
- Select the tuples that match the WHERE clause.
- Project the columns that occur in the SELECT clause.
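The three steps above can be sketched in Python as a nested loop. This is only an illustration of the naive strategy, not how a DBMS executes SQL: it assumes R(A, B, C) and S(C, D, E) are lists of tuples, and the two-row instances are hypothetical, built from the rows shown in the Cartesian-product table that follows.

```python
from itertools import product

# Hypothetical instances of R(A, B, C) and S(C, D, E).
R = [("a", 1, 10), ("c", 2, 10)]
S = [(10, "x", 2), (20, "y", 2)]

def naive_query(R, S):
    """SELECT B, D FROM R, S WHERE R.A = 'c' AND S.E = 2 AND R.C = S.C"""
    result = []
    # 1. Form the Cartesian product of all tables in the FROM clause.
    for (a, b, rc), (sc, d, e) in product(R, S):
        # 2. Keep only the combinations that match the WHERE clause.
        if a == "c" and e == 2 and rc == sc:
            # 3. Project the columns listed in the SELECT clause.
            result.append((b, d))
    return result

print(naive_query(R, S))  # [(2, 'x')]
```

Even on this tiny instance the loop examines |R| × |S| = 4 combinations to produce a single row; on real tables that product is what makes the approach too slow.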


R × S:

R.A  R.B  R.C  S.C  S.D  S.E
a    1    10   10   x    2
a    1    10   20   y    2
...
c    2    10   10   x    2    <- Bingo! Got one...
...

SELECT B, D FROM R, S WHERE R.A = 'c' AND S.E = 2 AND R.C = S.C;


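But?

Performance would be unacceptable! The Cartesian product alone has |R| × |S| tuples, and all of them must be generated before the WHERE clause discards the ones that do not match.

We need a better approach for reasoning about queries, their execution orders, and their respective costs.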


Formal Relational Query Languages

Relational Algebra: More operational, very useful for representing execution plans.

Operators working on relations

Core Relational Algebra (Recap)

Union, intersection, and difference: the usual set operations; both operands must have the same relation schema.

Selection: picking certain rows.

Projection: picking certain columns.

Products and joins: compositions of relations.

Renaming of relations and attributes.

Grouping and aggregation: grouping tuples with matching values and aggregating each group.

Duplicate elimination: eliminates all but one copy of identical tuples.

Sorting: orders tuples based on given criteria.

Relational Algebra Expresses Query Plans

$\pi_{B,D}\left(\sigma_{R.A = \text{'c'} \wedge S.E = 2 \wedge R.C = S.C}(R \times S)\right)$
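The expression above is one plan for the query. As a hedged sketch (Python, hypothetical data, tuples addressed by position), the snippet below evaluates it literally and then evaluates an equivalent plan that applies the single-relation selections R.A = 'c' and S.E = 2 before the product. Both return the same answer, but the second builds a much smaller intermediate result; pushing selections below a product is one of the standard algebraic equivalences an optimizer can use.

```python
from itertools import product

# Hypothetical instances of R(A, B, C) and S(C, D, E).
R = [("a", 1, 10), ("c", 2, 10), ("b", 3, 20)]
S = [(10, "x", 2), (20, "y", 2), (10, "z", 3)]

def select(rel, pred):
    """sigma: keep the tuples for which pred is true."""
    return [t for t in rel if pred(t)]

def cross(r, s):
    """x: concatenate every tuple of r with every tuple of s."""
    return [tr + ts for tr, ts in product(r, s)]

def project_bd(rel):
    """pi_{B,D} over tuples laid out as (R.A, R.B, R.C, S.C, S.D, S.E)."""
    return [(t[1], t[4]) for t in rel]

# Plan 1 (the expression above): product first, then one combined selection.
big = cross(R, S)
plan1 = project_bd(select(big, lambda t: t[0] == "c" and t[5] == 2 and t[2] == t[3]))

# Plan 2: selections on R and S pushed below the product.
small = cross(select(R, lambda t: t[0] == "c"), select(S, lambda t: t[2] == 2))
plan2 = project_bd(select(small, lambda t: t[2] == t[3]))

print(plan1, plan2, len(big), len(small))  # [(2, 'x')] [(2, 'x')] 9 2
```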

Recap on Relational Algebra & Operators

Algebra Behind the Query Language

Relational Algebra

Set of operators that operate on relations

Operator semantics are based on set or bag theory

Relational algebra forms the underlying basis (and the optimization rules) for SQL

SELECT pNumber, count(*) AS CNT
FROM Student
WHERE sNumber > 1
GROUP BY pNumber;


Relational Algebra Basic operators

- Set operations: union (∪), intersection (∩), difference (–)
- Select: σ
- Project: π
- Cartesian product: ×
- Rename: ρ

More advanced operators, e.g., grouping and joins

The operators take one or two relations as inputs and produce a new relation as an output

An operator with one input is unary; an operator with two inputs is binary

Union over sets: ∪

Consider two relations R and S that are union-compatible (same schema).

R:
A B
1 2
3 4

S:
A B
1 2
3 4
5 6

R ∪ S:
A B
1 2
3 4
5 6

Binary Op.
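A minimal Python sketch of union under set semantics, assuming (as a hypothetical representation) that a relation is a pair of a schema tuple and a Python set of value tuples. The schema check mirrors the union-compatibility requirement.

```python
def union(r, s):
    """R ∪ S for relations given as (schema, set_of_tuples) pairs."""
    r_schema, r_rows = r
    s_schema, s_rows = s
    # Union-compatibility: both operands must have the same schema.
    if r_schema != s_schema:
        raise ValueError("relations are not union-compatible")
    # Set union keeps each tuple once, even if it occurs in both inputs.
    return r_schema, r_rows | s_rows

R = (("A", "B"), {(1, 2), (3, 4)})
S = (("A", "B"), {(1, 2), (3, 4), (5, 6)})

schema, rows = union(R, S)
print(schema, sorted(rows))  # ('A', 'B') [(1, 2), (3, 4), (5, 6)]
```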

Difference over sets: –

R – S is the set of tuples that appear in R but not in S. R and S must be union-compatible.

Defined as: $R - S = \{t \mid t \in R \wedge t \notin S\}$

R:
A B
1 2
3 4

S:
A B
1 2
5 6

R – S:
A B
3 4

Binary Op.

In general, R – S ≠ S – R (here, S – R contains only the tuple (5, 6)).
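The definition R – S = {t | t ∈ R and t ∉ S} translates almost directly into a Python set comprehension. This sketch uses the example relations as sets of (A, B) tuples and shows that the operation is not symmetric.

```python
R = {(1, 2), (3, 4)}
S = {(1, 2), (5, 6)}

def difference(r, s):
    """R - S: the tuples of r that do not appear in s."""
    return {t for t in r if t not in s}

print(difference(R, S))  # {(3, 4)}
print(difference(S, R))  # {(5, 6)}
```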

Intersection over sets: ∩

Consider two relations R and S that are union-compatible.

R:
A B
1 2
3 4

S:
A B
1 2
3 4
5 6

R ∩ S:
A B
1 2
3 4

Binary Op.
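Under set semantics intersection can also be written with difference alone, as R ∩ S = R – (R – S). The Python sketch below (sets of tuples, using the example relations) computes it both directly and through that identity.

```python
R = {(1, 2), (3, 4)}
S = {(1, 2), (3, 4), (5, 6)}

def intersection(r, s):
    """R ∩ S: the tuples that appear in both r and s."""
    return {t for t in r if t in s}

direct = intersection(R, S)
via_difference = R - (R - S)  # Python's set '-' is set difference

print(sorted(direct), direct == via_difference)  # [(1, 2), (3, 4)] True
```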

Selection: σ

Select: $\sigma_c(R)$, where c is a condition on R’s attributes, selects the subset of tuples from R that satisfy the selection condition c.

R:
A B C
1 2 5
3 4 6
1 2 7

σ(C ≥ 6)(R):
A B C
3 4 6
1 2 7

Unary Op.
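As a minimal sketch, assuming a relation is a Python list of rows and each row is a dict keyed by attribute name, σ becomes a filter whose condition is passed as a function. The data are the relation R shown above; the second call shows a conjunctive condition.

```python
R = [
    {"A": 1, "B": 2, "C": 5},
    {"A": 3, "B": 4, "C": 6},
    {"A": 1, "B": 2, "C": 7},
]

def select(rel, condition):
    """sigma_condition(rel): keep the rows for which condition(row) is true."""
    return [row for row in rel if condition(row)]

print(select(R, lambda row: row["C"] >= 6))
# [{'A': 3, 'B': 4, 'C': 6}, {'A': 1, 'B': 2, 'C': 7}]

# A conjunction (A = 1) AND (B = 2) is just a combined predicate.
both = select(R, lambda row: row["A"] == 1 and row["B"] == 2)
print([row["C"] for row in both])  # [5, 7]
```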

Selection: Example

$\sigma_{(A=B) \wedge (D>5)}(R)$

$\sigma_{D > C}(R)$

Project: π

$\pi_{A_1, A_2, \ldots, A_n}(R)$, where $A_1, A_2, \ldots, A_n$ are attributes of R, returns all tuples in R, but only the columns $A_1, A_2, \ldots, A_n$.

$A_1, A_2, \ldots, A_n$ is called the projection list.

R:
A B C
1 2 5
3 4 6
1 2 7
1 2 8

$\pi_{A,C}(R)$:
A C
1 5
3 6
1 7
1 8

Unary Op.
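A sketch of π in the same hypothetical representation (a list of dict rows). It keeps every row, so it follows bag semantics; under set semantics, duplicate result rows would also be removed.

```python
R = [
    {"A": 1, "B": 2, "C": 5},
    {"A": 3, "B": 4, "C": 6},
    {"A": 1, "B": 2, "C": 7},
    {"A": 1, "B": 2, "C": 8},
]

def project(rel, attrs):
    """pi_attrs(rel): every row of rel, restricted to the columns in attrs."""
    return [{a: row[a] for a in attrs} for row in rel]

# Prints the pairs 1 5, 3 6, 1 7, 1 8 (one per line).
for row in project(R, ["A", "C"]):
    print(row["A"], row["C"])

# Bag semantics: projecting onto A alone keeps the repeated value 1.
print([row["A"] for row in project(R, ["A"])])  # [1, 3, 1, 1]
```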

Extended Projection: $\pi_L(R)$ Example

$\pi_{C,\; A \rightarrow V,\; C*3+B \rightarrow X}(R)$

R:
A B C
1 2 5
3 4 6
1 2 7
1 2 8

Result:
C V X
5 1 17
6 3 22
7 1 23
8 1 26

A → V renames column A to V; C*3+B → X computes this expression and calls it X.
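Extended projection generalizes π: each output column is computed from the input row, which covers copying, renaming, and arithmetic. A hedged Python sketch, assuming dict rows and a hypothetical mapping from output column name to a function of the row, reproduces the example above.

```python
R = [
    {"A": 1, "B": 2, "C": 5},
    {"A": 3, "B": 4, "C": 6},
    {"A": 1, "B": 2, "C": 7},
    {"A": 1, "B": 2, "C": 8},
]

def extended_project(rel, outputs):
    """pi_L(rel), where L maps each output column name to a function of a row."""
    return [{name: f(row) for name, f in outputs.items()} for row in rel]

result = extended_project(R, {
    "C": lambda r: r["C"],               # copy C unchanged
    "V": lambda r: r["A"],               # rename A to V
    "X": lambda r: r["C"] * 3 + r["B"],  # computed column X = C*3+B
})

# Prints 5 1 17, 6 3 22, 7 1 23, 8 1 26 (one row per line).
for row in result:
    print(row["C"], row["V"], row["X"])
```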

Cross Product (Cartesian Product): ×

[Figure: example relations R and S and their product R × S]

Each tuple in R is paired with each tuple in S: $R \times S = \{tq \mid t \in R \wedge q \in S\}$, where tq is the concatenation of tuples t and q.

Binary Op.
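Following the definition R × S = {tq | t ∈ R and q ∈ S}, where tq concatenates two tuples, here is a short Python sketch built on `itertools.product`. The relations are hypothetical, since the slide's example tables are not recoverable.

```python
from itertools import product

R = [(1, 2), (3, 4)]          # hypothetical R(A, B)
S = [("x",), ("y",), ("z",)]  # hypothetical S(C)

def cross_product(r, s):
    """R x S: each tuple t of r concatenated with each tuple q of s."""
    return [t + q for t, q in product(r, s)]

RS = cross_product(R, S)
print(len(RS))  # 6, which is len(R) * len(S)
print(RS[:3])   # [(1, 2, 'x'), (1, 2, 'y'), (1, 2, 'z')]
```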

Natural Join: R ⋈ S

[Figure: example relations R and S and their natural join R ⋈ S]

An implicit equality condition on the common columns: in the example, R and S share columns B and D, so the implicit condition is (R.B = S.B and R.D = S.D). Each common column appears only once in the result.

Binary Op.
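A minimal nested-loop sketch of natural join in Python, assuming rows are dicts and all rows of a relation share one schema. The common columns are found from the attribute names, and merging the two dicts keeps each common column once. The relations R(A, B, C, D) and S(B, D, E) are hypothetical stand-ins for the slide's figure, chosen so that B and D are the common columns.

```python
def natural_join(r, s):
    """R ⋈ S on all attribute names that r and s have in common."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    out = []
    for tr in r:
        for ts in s:
            # Implicit condition: equality on every common column.
            if all(tr[a] == ts[a] for a in common):
                # Merge rows; each common column appears once in the output.
                out.append({**tr, **ts})
    return out

R = [{"A": 1, "B": 2, "C": 3, "D": 4}, {"A": 5, "B": 6, "C": 7, "D": 8}]
S = [{"B": 2, "D": 4, "E": 9}, {"B": 6, "D": 0, "E": 1}]

print(natural_join(R, S))  # [{'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 9}]
```

The second row of R matches on B but not on D, which is why it is dropped.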

Theta Join: $R \bowtie_C S$

A join based on any arbitrary condition C.

It is defined as:

$R \bowtie_C S = \sigma_C(R \times S)$

R:
A B
1 2
3 2

S:
D C
2 3
4 5
4 5

$R \bowtie_{R.A \geq S.C} S$:
A B D C
3 2 2 3

Recommendation: always use the theta join instead of the natural join (it is more explicit and clearer).

Binary Op.
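The definition R ⋈C S = σC(R × S) can be sketched directly: build the product, then filter. This Python version assumes dict rows whose attribute names do not overlap (otherwise they would need prefixes such as R.A), and reproduces the R.A ≥ S.C example above. It mirrors the definition only; materializing the whole product is exactly the cost a real system tries to avoid.

```python
from itertools import product

R = [{"A": 1, "B": 2}, {"A": 3, "B": 2}]
S = [{"D": 2, "C": 3}, {"D": 4, "C": 5}, {"D": 4, "C": 5}]

def theta_join(r, s, condition):
    """R join_C S = sigma_C(R x S), assuming r and s share no attribute names."""
    combined = ({**tr, **ts} for tr, ts in product(r, s))  # R x S
    return [row for row in combined if condition(row)]      # sigma_C

result = theta_join(R, S, lambda row: row["A"] >= row["C"])
print(result)  # [{'A': 3, 'B': 2, 'D': 2, 'C': 3}]
```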

Duplicate Elimination: δ(R)

Deletes all duplicate records: converts a bag (allows duplicates) to a set (does not allow duplicates).

R:
A B
1 2
3 4
1 2
1 2

δ(R):
A B
1 2
3 4

Unary Op.
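A short Python sketch of δ, assuming a bag is represented as a list of tuples. It keeps the first copy of each tuple, so the output follows first-seen order; since relations are unordered, that order carries no meaning.

```python
R = [(1, 2), (3, 4), (1, 2), (1, 2)]  # a bag: (1, 2) occurs three times

def delta(bag):
    """δ(R): one copy of each distinct tuple, in first-seen order."""
    seen = set()
    result = []
    for t in bag:
        if t not in seen:  # first occurrence of this tuple
            seen.add(t)
            result.append(t)
    return result

print(delta(R))  # [(1, 2), (3, 4)]
```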

Grouping & Aggregation operator:

The grouping & aggregation operation in relational algebra:

$\gamma_{g_1, g_2, \ldots, g_m,\; F_1(A_1), F_2(A_2), \ldots, F_n(A_n)}(R)$

Unary Op.

$g_1, \ldots, g_m$: group by these columns (the list can be empty)

$F_1(A_1), \ldots, F_n(A_n)$: aggregation functions applied over each group

avg: average value

min: minimum value

max: maximum value

sum: sum of values

count: number of values
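A hedged Python sketch of γ: rows are grouped by the values of the grouping columns, then each aggregate is computed per group. The parameter layout (a dict from output name to a pair of aggregate function and attribute, with attribute `None` standing for count(*)) is a hypothetical choice for this illustration. The usage applies it to the Student query from earlier, $\gamma_{pNumber,\; count(*) \rightarrow CNT}(\sigma_{sNumber > 1}(Student))$, on made-up rows.

```python
from collections import defaultdict

def group_aggregate(rel, group_by, aggregates):
    """gamma over rel (a list of dict rows).

    group_by:   grouping column names (an empty list puts all rows in one group).
    aggregates: output name -> (function, attribute); attribute None = count(*).
    """
    groups = defaultdict(list)
    for row in rel:
        groups[tuple(row[g] for g in group_by)].append(row)

    result = []
    for key, rows in groups.items():
        out = dict(zip(group_by, key))  # the grouping columns
        for name, (func, attr) in aggregates.items():
            values = rows if attr is None else [r[attr] for r in rows]
            out[name] = func(values)
        result.append(out)
    return result

# Hypothetical Student rows.
Student = [
    {"sNumber": 1, "pNumber": 10},
    {"sNumber": 2, "pNumber": 10},
    {"sNumber": 3, "pNumber": 20},
    {"sNumber": 4, "pNumber": 10},
]

filtered = [r for r in Student if r["sNumber"] > 1]  # WHERE sNumber > 1
answer = group_aggregate(filtered, ["pNumber"], {"CNT": (len, None)})
print(answer)  # [{'pNumber': 10, 'CNT': 2}, {'pNumber': 20, 'CNT': 1}]

# Empty grouping list: a single group over all filtered rows.
print(group_aggregate(filtered, [], {"total": (sum, "sNumber")}))  # [{'total': 9}]
```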

Grouping & Aggregation Operator: Example

$\gamma_{sum(c)}(R)$ (no grouping columns, so all of R forms a single group)

$\gamma_{branch\_name,\; sum(balance)}(S)$

[Figure: example relations R and S and the results of the two expressions]

Assignment Operator (←): write a query as a sequence of lines consisting of:

Series of assignments

Result expression containing the final answer

A variable may be used multiple times in subsequent expressions

Example:

$R_1 \leftarrow \sigma_{(A=B) \wedge (D>5)}(R - S) \cap W$

$R_2 \leftarrow R_1 \bowtie_{R.A = T.C} T$

$Result \leftarrow R_1 \cup R_2$

Banking Example

branch (branch_name, branch_city, assets)

customer (customer_name, customer_street, customer_city)

account (account_number, branch_name, balance)

loan (loan_number, branch_name, amount)

depositor (customer_name, account_number)

borrower (customer_name, loan_number)

Example Queries

Find the names of customers with an account balance below 100 or above 10,000

πcustomer_name (depositor ⋈ πaccount_number(σbalance <100 OR balance > 10,000 (account)))
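To check the expression on concrete data, the Python sketch below evaluates it on a few hypothetical account and depositor rows under set semantics. Joining depositor with the single-column result of the inner projection on account_number amounts to keeping the depositor rows whose account_number is in that set.

```python
# Hypothetical rows: account(account_number, branch_name, balance)
account = [
    ("A-101", "Downtown", 50),
    ("A-102", "Downtown", 500),
    ("A-201", "Uptown", 20_000),
]
# depositor(customer_name, account_number)
depositor = [("Ann", "A-101"), ("Bob", "A-102"), ("Cy", "A-201")]

# pi_account_number(sigma_{balance < 100 OR balance > 10,000}(account))
selected = {num for num, _branch, bal in account if bal < 100 or bal > 10_000}

# pi_customer_name(depositor join selected), joining on account_number
names = {name for name, num in depositor if num in selected}
print(sorted(names))  # ['Ann', 'Cy']
```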

Example Queries

Example Queries (Cont’d)

Example Queries

Find the names of customers who have neither accounts nor loans

$\pi_{customer\_name}(customer) - (\pi_{customer\_name}(borrower) \cup \pi_{customer\_name}(depositor))$
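The same query on hypothetical rows, as a Python sketch under set semantics: each projection onto customer_name becomes a set comprehension, and difference and union become Python's set `-` and `|`.

```python
# Hypothetical rows: customer(customer_name, customer_street, customer_city)
customer = [
    ("Ann", "Main St", "Rye"),
    ("Bob", "Elm St", "Troy"),
    ("Cy", "Oak St", "Rye"),
    ("Dee", "Pine St", "Troy"),
]
borrower = [("Bob", "L-17")]                     # (customer_name, loan_number)
depositor = [("Ann", "A-101"), ("Cy", "A-201")]  # (customer_name, account_number)

all_names = {name for name, _street, _city in customer}
has_loan = {name for name, _loan in borrower}
has_account = {name for name, _acct in depositor}

# customers minus (borrowers ∪ depositors)
neither = all_names - (has_loan | has_account)
print(neither)  # {'Dee'}
```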

Example Queries

For branches that gave loans > 100,000 or hold accounts with balances > 50,000, report the branch name along with whether it is reported because of a loan or an account

$R_1 \leftarrow \pi_{branch\_name,\; \text{'Loan'} \rightarrow Type}(\sigma_{amount > 100{,}000}(loan))$

$R_2 \leftarrow \pi_{branch\_name,\; \text{'Account'} \rightarrow Type}(\sigma_{balance > 50{,}000}(account))$

$Result \leftarrow R_1 \cup R_2$
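A Python sketch of the three assignments on hypothetical loan and account rows. The constant 'Loan' or 'Account' is produced by the extended projection, so both intermediate results have the schema (branch_name, Type) and the union is well defined.

```python
# Hypothetical rows: loan(loan_number, branch_name, amount)
loan = [("L-1", "Downtown", 150_000), ("L-2", "Uptown", 5_000)]
# account(account_number, branch_name, balance)
account = [("A-1", "Uptown", 60_000), ("A-2", "Downtown", 100)]

# R1: branches with a loan over 100,000, tagged with the constant 'Loan'
R1 = {(branch, "Loan") for _num, branch, amount in loan if amount > 100_000}
# R2: branches holding an account with balance over 50,000, tagged 'Account'
R2 = {(branch, "Account") for _num, branch, balance in account if balance > 50_000}

result = R1 | R2
print(sorted(result))  # [('Downtown', 'Loan'), ('Uptown', 'Account')]
```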


Example Queries

Find the names of customers whose loans sum to more than 20,000

$\pi_{customer\_name}(\sigma_{sum > 20{,}000}(\gamma_{customer\_name,\; sum(amount) \rightarrow sum}(loan \bowtie borrower)))$
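A Python sketch of this query on hypothetical rows. The join of loan and borrower on loan_number is done with a dictionary lookup (a simple hash join), the grouping sums the amounts per customer, and the final selection keeps totals above 20,000.

```python
from collections import defaultdict

# Hypothetical rows: loan(loan_number, branch_name, amount)
loan = [("L-1", "Downtown", 15_000), ("L-2", "Uptown", 8_000), ("L-3", "Downtown", 12_000)]
# borrower(customer_name, loan_number)
borrower = [("Ann", "L-1"), ("Ann", "L-2"), ("Bob", "L-3")]

# Build side of the hash join: loan_number -> amount
amount_of = {num: amount for num, _branch, amount in loan}

# gamma_{customer_name, sum(amount) -> sum}(loan join borrower)
totals = defaultdict(int)
for name, num in borrower:
    totals[name] += amount_of[num]  # probe; assumes every loan_number exists in loan

# pi_customer_name(sigma_{sum > 20,000}(...))
result = {name for name, total in totals.items() if total > 20_000}
print(dict(totals), result)  # {'Ann': 23000, 'Bob': 12000} {'Ann'}
```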


Example Queries

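Find the branch name with the largest number of accounts

$R_1 \leftarrow \gamma_{branch\_name,\; count(account\_number) \rightarrow countAccounts}(account)$

$R_2 \leftarrow \gamma_{max(countAccounts) \rightarrow Max}(R_1)$

$Result \leftarrow \pi_{branch\_name}(R_1 \bowtie_{countAccounts = Max} R_2)$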
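A Python sketch of the three steps on hypothetical account rows, using `collections.Counter` for the count per branch. Because the final step is a join on countAccounts = Max rather than picking a single row, every branch tied for the largest count is returned.

```python
from collections import Counter

# Hypothetical rows: account(account_number, branch_name, balance)
account = [
    ("A-1", "Downtown", 100),
    ("A-2", "Downtown", 200),
    ("A-3", "Uptown", 300),
    ("A-4", "Brighton", 50),
    ("A-5", "Brighton", 70),
]

# R1: branch_name -> countAccounts
R1 = Counter(branch for _num, branch, _balance in account)
# R2: the single value Max
Max = max(R1.values())
# Result: branches whose count equals Max (ties are all kept)
result = {branch for branch, count in R1.items() if count == Max}
print(sorted(result))  # ['Brighton', 'Downtown']
```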


Summary of Relational-Algebra Operators

Set operators: union, intersection, difference

Selection, projection, and extended projection

Joins: natural, theta, outer

Rename and assignment

Duplicate elimination

Grouping and aggregation
