CS4432: Database Systems II
Query Operator & Algebraic Expressions
Why SQL
SQL is a very-high-level language: you say “what to do” rather than “how to do it.”
This avoids many of the data-manipulation details needed in procedural languages like C++ or Java.
The database management system figures out the “best” way to execute the query. This is called “query optimization.”
Query Processing
SELECT pNumber, count(*) AS CNT
FROM Student
WHERE sNumber > 1
GROUP BY pNumber;
SQL Query
Query Plans
Query Example
SELECT B, D FROM R, S WHERE R.A = 'c' AND S.E = 2 AND R.C = S.C
• How do we execute the query?
One idea:
- Form the Cartesian product of all tables in the FROM-clause
- Select the tuples that match the WHERE-clause
- Project the columns that occur in the SELECT-clause
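The three steps of this idea can be sketched in Python. This is only an illustration: the sample rows for R(A, B, C) and S(C, D, E) are assumed (chosen to agree with the few rows visible on the next slide), and `naive_query` is a hypothetical helper name.

```python
from itertools import product

# Assumed sample data; only a few rows are visible in the slides.
R = [("a", 1, 10), ("c", 2, 10)]   # schema R(A, B, C)
S = [(10, "x", 2), (20, "y", 2)]   # schema S(C, D, E)

def naive_query(R, S):
    # Step 1: Cartesian product of all tables in the FROM-clause.
    pairs = product(R, S)
    # Step 2: keep the combined tuples that satisfy the WHERE-clause
    # (R.A = 'c' AND S.E = 2 AND R.C = S.C).
    matches = [
        (r, s) for r, s in pairs
        if r[0] == "c" and s[2] == 2 and r[2] == s[0]
    ]
    # Step 3: project the SELECT-clause columns B and D.
    return [(r[1], s[1]) for r, s in matches]

result = naive_query(R, S)
print(result)
```

Note that step 1 enumerates every pair of tuples before any filtering happens, which is the cost the following slides object to.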
R X S
R.A R.B R.C S.C S.D S.E
a   1   10  10  x   2
a   1   10  20  y   2
. .
c   2   10  10  x   2     (Bingo! Got one...)
. .
SELECT B, D FROM R, S WHERE R.A = 'c' AND S.E = 2 AND R.C = S.C
But?
Performance would be unacceptable: the Cartesian product alone produces |R| × |S| tuples before any filtering happens.
We need a better approach for reasoning about queries, their execution orders, and their respective costs.
Formal Relational Query Languages
Relational Algebra: More operational, very useful for representing execution plans.
Operators working on relations
Core Relational Algebra (Recap)
Union, intersection, and difference. Usual set operations, both operands have the same relation schema.
Selection: picking certain rows.
Projection: picking certain columns.
Products and joins: compositions of relations.
Renaming of relations and attributes.
Grouping and Aggregation: groups matching tuples and computes aggregates per group
Duplicate Elimination: keeps exactly one copy of each distinct tuple
Sorting: orders tuples based on a given criterion
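Relational Algebra Expresses Query Plans
Plan tree (read bottom-up):
π B,D
σ R.A=“c” ∧ S.E=2 ∧ R.C=S.C
X
R   S
As a single expression: π B,D (σ R.A=“c” ∧ S.E=2 ∧ R.C=S.C (R X S))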
Recap on Relational Algebra & Operators
Algebra Behind the Query Language
Relational Algebra
Set of operators that operate on relations
Operator semantics based on Set or Bag theory
Relational algebra forms the underlying basis (and optimization rules) for SQL
SELECT pNumber, count(*) AS CNT
FROM Student
WHERE sNumber > 1
GROUP BY pNumber;
Relational Algebra: Basic Operators
Set operations (union: ∪, intersection: ∩, difference: –)
Select: σ
Project: π
Cartesian product: X
Rename: ρ
More advanced operators, e.g., grouping and joins
The operators take one or two relations as inputs and produce a new relation as an output
One input → unary operator, two inputs → binary operator
Union over sets: ∪
Consider two relations R and S that are union-compatible (same schema)
R:
A B
1 2
3 4
S:
A B
1 2
3 4
5 6
R ∪ S:
A B
1 2
3 4
5 6
Binary Op.
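As a minimal sketch, the union example can be reproduced with Python sets of tuples. Representing a relation as a `set` of `(A, B)` tuples is an assumption made for illustration; it matches set semantics, where duplicates collapse.

```python
# Relations from the union slide, as sets of (A, B) tuples.
R = {(1, 2), (3, 4)}
S = {(1, 2), (3, 4), (5, 6)}

# Set union: every tuple that appears in R, in S, or in both, once.
union = R | S
print(sorted(union))
```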
Difference over sets: –
R – S are the tuples that appear in R and not in S
R & S must be union-compatible
Defined as: R – S = {t | t ∈ R and t ∉ S}
R:
A B
1 2
3 4
S:
A B
1 2
5 6
R – S:
A B
3 4
Binary Op.
R-S ≠ S-R
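The definition of difference translates almost literally into a set comprehension. The sketch below uses the R and S from the difference slide (as Python sets of tuples, an illustrative choice) and also checks that the operator is not symmetric.

```python
# Relations from the difference slide, as sets of (A, B) tuples.
R = {(1, 2), (3, 4)}
S = {(1, 2), (5, 6)}

# R - S: tuples t with t in R and t not in S.
r_minus_s = {t for t in R if t not in S}
# S - R: the same definition with the operands swapped.
s_minus_r = {t for t in S if t not in R}

print(r_minus_s, s_minus_r)
```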
Intersection over sets: ∩
Consider two relations R and S that are union-compatible
R:
A B
1 2
3 4
S:
A B
1 2
3 4
5 6
R ∩ S:
A B
1 2
3 4
Binary Op.
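A short sketch of the intersection example, again with relations as Python sets of tuples (an illustrative representation). It also checks the standard identity R ∩ S = R – (R – S), which shows that intersection can be derived from difference.

```python
# Relations from the intersection slide, as sets of (A, B) tuples.
R = {(1, 2), (3, 4)}
S = {(1, 2), (3, 4), (5, 6)}

# Direct intersection: tuples that appear in both R and S.
intersection = R & S

# The same result expressed through difference only.
via_difference = R - (R - S)

print(sorted(intersection))
```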
Selection: σ
Select: σc (R), where c is a condition on R’s attributes
Selects the subset of tuples from R that satisfy selection condition c
R:
A B C
1 2 5
3 4 6
1 2 7
σ(C ≥ 6) (R):
A B C
3 4 6
1 2 7
Unary Op.
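Selection can be sketched as a filter over rows. In this illustration each tuple is a Python dict keyed by attribute name, and `select` is a hypothetical helper that takes the condition c as a function.

```python
# Relation R from the selection slide; each tuple is a dict.
R = [
    {"A": 1, "B": 2, "C": 5},
    {"A": 3, "B": 4, "C": 6},
    {"A": 1, "B": 2, "C": 7},
]

def select(relation, condition):
    """Return the tuples of `relation` for which `condition` is true."""
    return [row for row in relation if condition(row)]

# sigma (C >= 6) (R)
result = select(R, lambda row: row["C"] >= 6)
print(result)
```

The schema is unchanged by selection: every output row has the same attributes A, B, and C as the input.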
Selection: Example
σ ((A=B) ^ (D>5)) (R)
σ (D > C) (R)
Project: π
πA1, A2, …, An (R), with A1, A2, …, An attributes of R
Returns all tuples in R, but only columns A1, A2, …, An
A1, A2, …, An are called the projection list
R:
A B C
1 2 5
3 4 6
1 2 7
1 2 8
πA, C (R):
A C
1 5
3 6
1 7
1 8
Unary Op.
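Projection keeps every tuple but only the listed columns. The sketch below assumes dict-based rows and a hypothetical `project` helper; it follows bag semantics, so projecting onto columns with repeated values keeps the repeats.

```python
# Relation R from the projection slide; each tuple is a dict.
R = [
    {"A": 1, "B": 2, "C": 5},
    {"A": 3, "B": 4, "C": 6},
    {"A": 1, "B": 2, "C": 7},
    {"A": 1, "B": 2, "C": 8},
]

def project(relation, attributes):
    """Keep only the columns in the projection list, for every tuple."""
    return [{a: row[a] for a in attributes} for row in relation]

# pi A, C (R): four rows, all distinct.
ac = project(R, ["A", "C"])
# pi A, B (R): under bag semantics the duplicate (1, 2) rows remain.
ab = project(R, ["A", "B"])
print(ac)
print(ab)
```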
Extended Projection: πL (R)
Example: π C, V←A, X←C*3+B (R)
Rename column A to V; compute the expression C*3+B and call it X
R:
A B C
1 2 5
3 4 6
1 2 7
1 2 8
π C, V←A, X←C*3+B (R):
C V X
5 1 17
6 3 22
7 1 23
8 1 26
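The extended projection example can be checked with a small sketch. Here each output column is described by a function of the input row; `extended_project` and the dict-based row format are illustrative assumptions, not part of the slides.

```python
# Relation R from the extended projection slide.
R = [
    {"A": 1, "B": 2, "C": 5},
    {"A": 3, "B": 4, "C": 6},
    {"A": 1, "B": 2, "C": 7},
    {"A": 1, "B": 2, "C": 8},
]

def extended_project(relation, outputs):
    """`outputs` maps each output column name to a function of the input row."""
    return [{name: fn(row) for name, fn in outputs.items()} for row in relation]

result = extended_project(R, {
    "C": lambda r: r["C"],               # keep column C as-is
    "V": lambda r: r["A"],               # rename A to V
    "X": lambda r: r["C"] * 3 + r["B"],  # computed column X = C*3 + B
})
print(result)
```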
Cross Product (Cartesian Product): X
Each tuple in R is joined with each tuple in S
R X S = {t q | t ∈ R and q ∈ S}
Binary Op.
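A minimal sketch of the cross product follows. The example tables did not survive in this transcript, so the two small relations below are assumed data; tuples are concatenated as in the definition {t q | t ∈ R and q ∈ S}.

```python
from itertools import product

# Assumed sample data (not from the slides).
R = [(1, 2), (3, 4)]            # hypothetical R(A, B)
S = [("x",), ("y",), ("z",)]    # hypothetical S(C)

# Concatenate every tuple t of R with every tuple q of S.
cross = [t + q for t, q in product(R, S)]
print(cross)
```

The result always has |R| × |S| tuples, here 2 × 3 = 6.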
Natural Join: R ⋈ S
Implicit condition (R.B = S.B and R.D = S.D)
Binary Op.
An implicit equality condition on the common columns
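The implicit equality condition can be made concrete with a sketch. The relations below are assumed data with common columns B and D, chosen to mirror the implicit condition R.B = S.B and R.D = S.D; `natural_join` is a hypothetical helper.

```python
# Assumed sample data (not from the slides); B and D are the common columns.
R = [{"A": 1, "B": 2, "D": 3}, {"A": 4, "B": 5, "D": 6}]
S = [{"B": 2, "D": 3, "E": 7}, {"B": 2, "D": 9, "E": 8}]

def natural_join(R, S):
    """Join on equality of all attributes that R and S have in common."""
    if not R or not S:
        return []
    common = [a for a in R[0] if a in S[0]]
    joined = []
    for r in R:
        for s in S:
            if all(r[a] == s[a] for a in common):
                # Common columns appear once in the output tuple.
                joined.append({**r, **s})
    return joined

result = natural_join(R, S)
print(result)
```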
Theta Join: R ⋈C S
A join based on any arbitrary condition C
It is defined as: R ⋈C S = σC (R X S)
R:
A B
1 2
3 2
S:
D C
2 3
4 5
4 5
R ⋈(R.A >= S.C) S:
A B D C
3 2 2 3
Recommendation: always use theta join (it is more explicit and clearer)
Binary Op.
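Following the definition R ⋈C S = σC (R X S), a theta join can be sketched as a filtered cross product. The data is taken from the theta join slide; `theta_join` is a hypothetical helper, and merging the two dicts assumes R and S have no attribute names in common (true here).

```python
from itertools import product

# Relations from the theta join slide.
R = [{"A": 1, "B": 2}, {"A": 3, "B": 2}]
S = [{"D": 2, "C": 3}, {"D": 4, "C": 5}, {"D": 4, "C": 5}]

def theta_join(R, S, condition):
    """Cross product of R and S, keeping pairs that satisfy `condition`."""
    return [{**r, **s} for r, s in product(R, S) if condition(r, s)]

# R join (R.A >= S.C) S
result = theta_join(R, S, lambda r, s: r["A"] >= s["C"])
print(result)
```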
Duplicate Elimination: δ(R)
Delete all duplicate records
Convert a bag (allows duplicates) to a set (does not allow duplicates)
R:
A B
1 2
3 4
1 2
1 2
δ(R):
A B
1 2
3 4
Unary Op.
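A small sketch of duplicate elimination on the example relation. The bag is modeled as a Python list of tuples; `distinct` is a hypothetical helper that keeps the first copy of each tuple, so the output order follows the input.

```python
# Bag R from the duplicate elimination slide.
R = [(1, 2), (3, 4), (1, 2), (1, 2)]

def distinct(relation):
    """Keep the first occurrence of each tuple and drop later copies."""
    seen = set()
    result = []
    for t in relation:
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result

deduplicated = distinct(R)
print(deduplicated)
```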
Grouping & Aggregation operator: γ
Grouping & aggregate operation in relational algebra:
γ g1, g2, …, gm, F1(A1), F2(A2), …, Fn(An) (R)
Unary Op.
g1, g2, …, gm: group by these columns (can be empty)
F1(A1), …, Fn(An): aggregation functions applied over each group
avg: average value
min: minimum value
max: maximum value
sum: sum of values
count: number of values
Grouping & Aggregation Operator: Example
γ sum(c) (R)
γ branch_name, sum(balance) (S)
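Both forms of the operator, with and without grouping columns, can be sketched as follows. The example tables are not preserved in this transcript, so the rows of S are assumed data, and `group_aggregate` is a hypothetical helper that handles a single aggregate.

```python
from collections import defaultdict

# Assumed sample data (not from the slides): S(branch_name, balance).
S = [
    {"branch_name": "Downtown", "balance": 500},
    {"branch_name": "Downtown", "balance": 700},
    {"branch_name": "Uptown", "balance": 300},
]

def group_aggregate(relation, group_cols, agg_col, agg_fn):
    """Group rows by `group_cols`, then apply `agg_fn` to `agg_col` per group."""
    groups = defaultdict(list)
    for row in relation:
        key = tuple(row[c] for c in group_cols)
        groups[key].append(row[agg_col])
    return {key: agg_fn(values) for key, values in groups.items()}

# Grouping by branch_name with sum(balance).
by_branch = group_aggregate(S, ["branch_name"], "balance", sum)
# Empty grouping list: the whole relation forms a single group.
total = group_aggregate(S, [], "balance", sum)
print(by_branch, total)
```

With an empty grouping list the key is the empty tuple, so the result contains exactly one group covering all rows of a non-empty input.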
Assignment Operator: ←
Write a query as a sequence of lines consisting of:
Series of assignments
Result expression containing the final answer
May use a variable multiple times in subsequent expressions
Example:
R1 ← (σ ((A=B) ^ (D>5)) (R – S)) ∩ W
R2 ← R1 ⋈(R.A = T.C) T
Result ← R1 ∪ R2
Banking Example
branch (branch_name, branch_city, assets)
customer (customer_name, customer_street, customer_city)
account (account_number, branch_name, balance)
loan (loan_number, branch_name, amount)
depositor (customer_name, account_number)
borrower (customer_name, loan_number)
Example Queries
Find customer names having account balance below 100 or above 10,000
πcustomer_name (depositor ⋈ πaccount_number(σbalance <100 OR balance > 10,000 (account)))
Example Queries
Find customers’ names who have neither accounts nor loans
πcustomer_name(customer) – (πcustomer_name(borrower) ∪ πcustomer_name(depositor))
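This query combines projection, union, and difference, which the sketch below mirrors step by step. All names and rows are assumed sample data for the banking schema, not values from the slides.

```python
# Assumed sample data for the banking schema.
customer_names = {"Ann", "Bob", "Cara", "Dev"}   # pi customer_name (customer)
borrower = {("Ann", "L-1")}                      # (customer_name, loan_number)
depositor = {("Bob", "A-1"), ("Ann", "A-2")}     # (customer_name, account_number)

# pi customer_name (borrower) and pi customer_name (depositor)
borrower_names = {name for name, _ in borrower}
depositor_names = {name for name, _ in depositor}

# customers - (borrowers union depositors)
neither = customer_names - (borrower_names | depositor_names)
print(sorted(neither))
```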
Example Queries
For branches that gave loans > 100,000 or hold accounts with balances > 50,000, report the branch name along with whether it is reported because of a loan or an account
R1 ← πbranch_name, ‘Loan’ As Type (σamount > 100,000 (loan))
R2 ← πbranch_name, ‘Account’ As Type (σbalance > 50,000 (account))
Result ← R1 ∪ R2
Example Queries
Find customer names having loans with sum > 20,000
πcustomer_name (σsum > 20,000 (γ customer_name, sum←sum(amount) (loan ⋈ borrower)))
Example Queries
Find the branch name with the largest number of accounts
R1 ← γ branch_name, countAccounts←count(account_number) (account)
R2 ← γ Max←max(countAccounts) (R1)
Result ← πbranch_name (R1 ⋈countAccounts = Max R2)
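The three assignment steps can be traced in a short sketch. The account rows are assumed sample data, and the variable names follow the R1, R2, Result steps of the query; the sketch assumes the account relation is non-empty.

```python
from collections import Counter

# Assumed sample data: account(account_number, branch_name, balance).
account = [
    ("A-1", "Downtown", 500),
    ("A-2", "Downtown", 700),
    ("A-3", "Uptown", 300),
]

# R1: group by branch_name and count the accounts in each group.
R1 = Counter(branch for _, branch, _ in account)

# R2: the maximum of countAccounts over R1 (a single value).
R2 = max(R1.values())

# Result: branches in R1 whose count equals the maximum, i.e. the join
# of R1 with R2 on countAccounts = Max followed by the projection.
result = sorted(branch for branch, count in R1.items() if count == R2)
print(result)
```

If several branches tie for the largest count, the join keeps all of them, and so does this sketch.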
Summary of Relational-Algebra Operators
Set operators: Union, Intersection, Difference
Selection & Projection & Extended Projection
Joins: Natural, Theta, Outer join
Rename & Assignment
Duplicate elimination
Grouping & Aggregation