Upload
truongtu
View
224
Download
0
Embed Size (px)
Citation preview
INFS614 2
Example Instances
sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0
sid bid day22 101 10/10/9658 103 11/12/96
R1
S1
S2
INFS614 3
SQL: Seen so far… ❖ Syntax of Basic SQL query:
– Relation-list (FROM) – Target-list (SELECT) – Qualification (WHERE)
❖ Semantic: Conceptual Evaluation Strategy; ❖ String operators: LIKE, NOT LIKE; ❖ Set operators: Union (all), Intersect (all), Except
(all); ❖ Qualification involving sets: IN, EXISTS,
UNIQUE, ANY, ALL (and their negated version); ❖ Nested queries.
INFS614 4
Post Processing ❖ Processes the result of an SQL query: ❖ 1) Order by: sort the result tuples by any
column(s), (even columns not appearing in the SELECT clause).
❖ 2) Distinct: Duplicate removal in result tuples ❖ Example:
❖ Aggregates ... (later in lecture)
SELECT Distinct S.sname FROM Sailors S, Reserves R WHERE S.sid=R.sid and R.bid=103 Order by S.sid asc;
INFS614 5
Order By: Example
• Sort order: • ASC : Ascending (default) • DESC : Descending
• Listed by priority • E.g., ties in S.age ordered by
S.sname
SELECT S.sname, S.age FROM Sailors S WHERE S.sname LIKE ‘%U%’ ORDER BY S.age,S.sname
sid sname rating age 28 yuppy 9 35.0 31 lubber 8 55.0 32 andy 8 25.5 44 guppy 5 35.0 58 rusty 10 35.0 71 zorba 10 16.0
sid sname rating age 44 guppy 5 35.0 58 rusty 10 35.0 28 yuppy 9 35.0 31 lubber 8 55.0
S.age ASC, S.sname DESC
sid sname rating age 28 yuppy 9 35.0 58 rusty 10 35.0 44 guppy 5 35.0 31 lubber 8 55.0
INFS614 6
Aggregate Operators ❖ Significant extension of relational algebra.
COUNT (*) COUNT ( [DISTINCT] A) SUM ( [DISTINCT] A) AVG ( [DISTINCT] A) MAX (A) MIN (A)
single column
INFS614 7
Aggregate Operators COUNT (*) COUNT ( [DISTINCT] A) SUM ( [DISTINCT] A) AVG ( [DISTINCT] A) MAX (A) MIN (A)
SELECT AVG (S.age) FROM Sailors S WHERE S.rating=10
SELECT COUNT (*) FROM Sailors S
SELECT AVG ( DISTINCT S.age) FROM Sailors S WHERE S.rating=10
SELECT S.sname FROM Sailors S WHERE S.rating= (SELECT MAX(S2.rating) FROM Sailors S2)
single column SELECT COUNT (DISTINCT S.rating) FROM Sailors S WHERE S.sname=‘Bob’
INFS614 8
Find name and age of the oldest sailor(s)
❖ The first query is illegal! (We’ll look into the reason a bit later, when we discuss GROUP BY.)
❖ The third query is equivalent to the second query, and is allowed in the SQL/92 standard, but is not supported in many systems.
SELECT S.sname, MAX (S.age) FROM Sailors S
SELECT S.sname, S.age FROM Sailors S WHERE S.age = (SELECT MAX (S2.age) FROM Sailors S2)
SELECT S.sname, S.age FROM Sailors S WHERE (SELECT MAX (S2.age) FROM Sailors S2) = S.age
INFS614 9
GROUP BY and HAVING ❖ So far, we’ve applied aggregate operators to
all (qualifying) tuples. Sometimes, we want to apply them to each of several groups of tuples.
❖ Consider: Find the age of the youngest sailor for each rating level. – In general, we don’t know how many rating levels
exist, and what the rating values for these levels are!
– Suppose we know that rating values go from 1 to 10; we can write 10 queries that look like this (!):
SELECT MIN (S.age) FROM Sailors S WHERE S.rating = i
For i = 1, 2, ... , 10:
INFS614 10
Solution with GROUP BY
❖ Query: Find the age of the youngest sailor for each rating level.
❖ This query:
❖ 1) Patitions Sailors S tuples into groups, according to their s.rating value
❖ 2) Returns one tuple per partion including: a) the s.rating value, b) the minimum s.age for that partition.
SELECT S.rating, MIN (S.age) FROM Sailors S GROUP BY S.rating
INFS614 11
Queries With GROUP BY and HAVING
❖ The grouping-list forms a partition of tuples with the same value for attributes in the list (e.g., same s.rating value).
❖ The target-list consists of an (i) attribute list (e.g., s.rating) and (ii) aggregate terms (e.g., MIN (S.age)).
❖ The attribute list must be a subset of the grouping-list – This is because each target-list attribute must have a single value
per group/partition!
SELECT [DISTINCT] target-list FROM relation-list WHERE qualification GROUP BY grouping-list [HAVING group-qualification]
INFS614 12
Conceptual Evaluation ❖ The cross-product of relation-list is computed, tuples that
fail qualification are discarded, `unnecessary’ fields are deleted.
❖ The remaining tuples are partitioned into groups by the value of attributes in grouping-list.
❖ The group-qualification is then applied to eliminate some groups.
– Any attribute in group-qualification that is not an argument of an aggregate op must also appear in the grouping-list.
– This is because expressions in group-qualification must have a single value per group!
❖ One answer tuple is generated per qualifying group.
INFS614 13
For each red boat, find the number of reservations for this boat
❖ Grouping over a join of two relations.
❖ What do we get if we remove B.color=‘red’ from the WHERE clause and add a HAVING clause with this condition? – ... an error!
SELECT B.bid, COUNT (*) AS rescount FROM Boats B, Reserves R WHERE R.bid=B.bid AND B.color=‘red’ GROUP BY B.bid
(B.color is not in the grouping list)
INFS614 14
For each sailor with more than three reservations, find the number of his reservations
SELECT R.sid, COUNT (*) AS rescount FROM Reserves R GROUP BY R.sid HAVING COUNT(*) > 3
Find the age of the youngest sailor for each rating level > 5 which has an average age > 25
SELECT S.rating, MIN(S.age) AS minAge FROM Sailors S WHERE S.rating > 5 GROUP BY S.rating HAVING AVG(S.age) > 25
SELECT S.rating, MIN(S.age) AS minAge FROM Sailors S GROUP BY S.rating HAVING S.rating > 5 AND AVG(S.age) > 25
INFS614 15
SELECT S.rating, MIN (S.age) FROM Sailors S WHERE S.age > 18 GROUP BY S.rating HAVING COUNT (*) > 1
sid sname rating age 22 Dustin 7 450 31 lubber 8 55.0 32 Andy 8 25.5 71 Zorba 10 16.0 64 Horatio 7 35.0 29 brutus 1 33.0 58 rusty 10 35.0 74 Horatio 9 40.0 85 Art 3 25.5 95 Bob 3 63.5
sid sname rating age 22 dustin 7 45.0 31 lubber 8 55.0 32 andy 8 25.5 64 horatio 7 35.0 29 brutus 1 33.0 58 rusty 10 35.0 74 horatio 9 40.0 85 art 3 25.5 95 bob 3 63.5
Answer relation
rating 3 25.5 7 35.0 8 25.5
Find the age of the youngest sailor older than 18, for each rating with at least 2 (such) sailors older than 18
rating age 1 33.0 3 25.5 3 63.5 7 45.0 7 35.0 8 55.0 8 25.5 9 40.0 10 35.0
• GROUP BY applied • S.rating and S.age are
mentioned in the SELECT, GROUP BY, HAVING clauses; other attributes discarded
• The 2nd column of result is unnamed. (Use AS to name it.)
INFS614 16
Find the age of the youngest sailor with age > 18, for each rating with at least 2 sailors (of any age!)
• HAVING clause includes a correlated subquery. • Counts partitions without an S.age>18 predicate • What if HAVING clause is replaced by:
– HAVING COUNT(*) >1 (previous query)? – WHERE clause always applied before HAVING clause!!
SELECT S.rating, MIN (S.age) FROM Sailors S WHERE S.age > 18 GROUP BY S.rating HAVING 1 < (SELECT COUNT (*) FROM Sailors S2 WHERE S.rating=S2.rating)
Computes total number of sailors who have rating equal to S.rating
INFS614 17
Find the ratings with the minimum average sailor age.
❖ Aggregate operations cannot be nested! ❖ The following is WRONG:
SELECT S.rating FROM Sailors S WHERE AVG(S.age) = (SELECT MIN (AVG (S2.age))
FROM Sailors S2 GROUP BY S2.rating)
INFS614 18
Continue from previous slide ❖ Use nested FROM to compute a temp table
of average sailor age, for each rating level – (A bit like relational algebra, or views!)
❖ Any query with a HAVING can be rewritten without a HAVING via the nested FROM
SELECT Temp.rating, Temp.avgage FROM (SELECT S.rating, AVG (S.age) AS avgage FROM Sailors S GROUP BY S.rating) AS Temp WHERE Temp.avgage = (SELECT MIN (Temp.avgage) FROM Temp)
INFS614 19
Note on Oracle SQL in the labs
Do not use the keyword “AS” to name a temporary table within the FROM clause.
INFS614 20
Note on Oracle SQL in the labs
SELECT Temp.rating, Temp.avgage FROM (SELECT S.rating, AVG (S.age) AS avgage FROM Sailors S GROUP BY S.rating) AS Temp WHERE Temp.avgage = (SELECT MIN (Temp.avgage) FROM Temp)
INFS614 21
Conclusions (so far…)
❖ Post processing on the result of queries is supported.
❖ Aggregation is the most complex “post processing” – “Group by” clause partitions the results into
groups – “Having” clause puts condition on groups (just
like Where clause on tuples).
INFS614 22
Null Values
❖ Field values in a tuple are sometimes unknown (e.g., a rating has not been assigned) or inapplicable (e.g., no spouse’s name).
❖ They can also be hidden (for security reasons)
❖ SQL provides a single special value null for all such situations.
INFS614 23
Dealing with the null value ❖ Special operators are needed to check if value is/
is not null. – Attribute_name IS NULL : TRUE if the
attribute’s value is null. FALSE otherwise – Attribute_name IS NOT NULL : TRUE if the
attribute’s value is not null. FALSE otherwise ❖ Is rating>8 true or false when rating is equal to
null? – Actually, it’s unknown.
❖ How about: rating > 8 AND age < 40 ? – Three-valued logic.
INFS614 24
Three Valued Logic ❖ Logical operators AND, OR and NOT using a 3-
valued logic (TRUE, FALSE and unknown).
NOT T F F T U U
AND T F U T T F U F F F F U U F U
OR T F U T T T T F T F U U T U U
INFS614 25
Null Values: Some issues The presence of null complicates many issues: ❖ WHERE clause eliminates tuples for which the qualification
does not evaluate to TRUE. Important in nested queries involving EXISTS or UNIQUE
❖ Two tuples in SQL are duplicates in a relation if corresponding attributes have the same value or both contains null.
SELECT S1.sname FROM SAILORS S1 WHERE EXISTS ( SELECT * FROM SAILORS S2 WHERE S1.sname = S2.sname AND AND S1.SID < S2.SID AND S1.RATING <> S2.RATING)
sid sname rating age 22 dustin 7 450 31 lubber 8 55.0 64 horatio null 35.0 32 andy 8 25.5 74 horatio 9 40.0
INFS614 26
Issues with the null value: Summary
❖ WHERE and HAVING clause eliminates rows (groups) for which the qualification does not evaluate to true (i.e., evaluate to false or unknown).
❖ Aggregate functions ignore null values (except count(*)).
❖ Distinct treats all null values as the same. ❖ The arithmetic operations +, -, *, / return
null if one of the arguments is null.
INFS614 27
Outer joins sid bid day22 101 10/10/9658 103 11/12/96
sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0
(left outer-join)
= sid sname rating age bid day 22 dustin 7 45.0 101 10/10/96 31 lubber 8 55.5 Null Null 58 rusty 10 35.0 103 11/12/96
Outer joins are generators of null values!
INFS614 28
Oracle Notation: Select * From Sailor S left/right/full outer join Reserves R on S.sid =
R.sid;
Select S.sid, count(R.bid) From Sailor S left outer join Reserves R on S.sid = R.sid; Group by s.sid;
Select S.sid, count(*) From Sailor S left outer join Reserves R on S.sid = R.sid; Group by s.sid;
What do these return??
INFS614 29
SELECT statement functions ❖ Values can be transformed before aggregation:
e.g.: SELECT sum(S.A/2) FROM S;
❖ The decode function (Oracle specific): decode(attribute, domain value1, output value1,
domain value2, output value2, …, default output): SELECT sum(decode(Major, ‘INFS’, 1, 0)) as Total_IS_Stu,
sum(decode(Major, ‘INFS’, 0, 1)) as Total_NonIS_Stu FROM Student ;
❖ Calculating GPA from letter grades?
INFS614 30
Examples Department (D-code, D-Name, Chair-SSn)
Course (D-code, C-no, Title, Units) Prereq (D-code, C-no, P-code, P-no) Class (Class-no, D-code, C-no, Instructor-SSn) Faculty (Ssn, F-Name, D-Code, Rank) Student (Ssn, S-Name, Major, Status) Enrollment (Class-no, Student-Ssn) Transcript (Student-Ssn, D-Code, C-no, Grade)
INFS614 31
Query 1
List the classes (Class-no) currently taken by students whose names start with 'T'.
SELECT distinct e.Class-no FROM Enrollment e, Student s WHERE e.Student-Ssn = s.Ssn AND s.S-Name LIKE 'T%'
Department (D-code, D-Name, Chair-SSn) Course (D-code, C-no, Title, Units) Prereq (D-code, C-no, P-code, P-no) Class (Class-no, D-code, C-no, Instructor-SSn) Faculty (Ssn, F-Name, D-Code, Rank) Student (Ssn, S-Name, Major, Status) Enrollment (Class-no, Student-Ssn) Transcript (Student-Ssn, D-Code, C-no, Grade)
INFS614 32
Query 2
List the students (SSN) who are currently taking exactly one class.
SELECT e.Student-Ssn FROM Enrollment e GROUP BY e.Student-Ssn HAVING 1=count(*)
Department (D-code, D-Name, Chair-SSn) Course (D-code, C-no, Title, Units) Prereq (D-code, C-no, P-code, P-no) Class (Class-no, D-code, C-no, Instructor-SSn) Faculty (Ssn, F-Name, D-Code, Rank) Student (Ssn, S-Name, Major, Status) Enrollment (Class-no, Student-Ssn) Transcript (Student-Ssn, D-Code, C-no, Grade)
INFS614 33
Query 3
Give the percentage of the students (among all students) who are currently taking courses offered by ISE (D-code='ISE').
SELECT count(distinct e.Student-Ssn)/count(distinct s.Ssn) *100.0 as Percent
FROM Enrollment e, Class c, Student s WHERE e.Class-no=c.Class-no and c.D-code='ISE';
Department (D-code, D-Name, Chair-SSn) Course (D-code, C-no, Title, Units) Prereq (D-code, C-no, P-code, P-no) Class (Class-no, D-code, C-no, Instructor-SSn) Faculty (Ssn, F-Name, D-Code, Rank) Student (Ssn, S-Name, Major, Status) Enrollment (Class-no, Student-Ssn) Transcript (Student-Ssn, D-Code, C-no, Grade)
INFS614 34
Query 4
List the faculty members (F-Name) who teach 2 or more classes. List these faculty members by the number of classes they teach (ascending order).
SELECT f.F-Name FROM Faculty f, Class c WHERE f.Ssn=c.Instructor-SSn GROUP BY f.Ssn, f.F-Name HAVING count(distinct c.Class-no)>=2 ORDER BY count(distinct c.Class-no), F-Name;
Department (D-code, D-Name, Chair-SSn) Course (D-code, C-no, Title, Units) Prereq (D-code, C-no, P-code, P-no) Class (Class-no, D-code, C-no, Instructor-SSn) Faculty (Ssn, F-Name, D-Code, Rank) Student (Ssn, S-Name, Major, Status) Enrollment (Class-no, Student-Ssn) Transcript (Student-Ssn, D-Code, C-no, Grade)
INFS614 35
Query 5
List the students (SSN and Name) along with the number of classes they are taking. If a student is not taking any class, the student should also be listed (with 0 as the number of classes he/she is taking). The list should be ordered by the number of classes (in an ascending order), and in case of a tie, by the SSN of the students.
SELECT s.Ssn, s.S-Name, count(distinct Class-no) FROM Student s, Enrollment e WHERE s.Ssn = e.Student-Ssn (+) GROUP BY s.Ssn, s.S-Name ORDER BY count(distinct Class-no), Ssn
Department (D-code, D-Name, Chair-SSn) Course (D-code, C-no, Title, Units) Prereq (D-code, C-no, P-code, P-no) Class (Class-no, D-code, C-no, Instructor-SSn) Faculty (Ssn, F-Name, D-Code, Rank) Student (Ssn, S-Name, Major, Status) Enrollment (Class-no, Student-Ssn) Transcript (Student-Ssn, D-Code, C-no, Grade)
INFS614 36
Query 6
List the faculty members (F-Name) who teach more than twice as many classes as Professor Smith (F-Name='Smith') is teaching. (Note that if Professor Smith is not teaching anything, then any professor who teaches at least one class will satisfy the query.)
SELECT f.F-Name FROM Faculty f, Class c WHERE f.Ssn=c.Instructor-SSn GROUP BY f.Ssn, f.F-Name HAVING count(distinct c.Class-no) > (SELECT 2*count(distinct c2.Class-no) FROM Faculty f2, Class c2 WHERE f2.F-Name='Smith' and f2.Ssn=c2.Instructor-SSn)
Department (D-code, D-Name, Chair-SSn) Course (D-code, C-no, Title, Units) Prereq (D-code, C-no, P-code, P-no) Class (Class-no, D-code, C-no, Instructor-SSn) Faculty (Ssn, F-Name, D-Code, Rank) Student (Ssn, S-Name, Major, Status) Enrollment (Class-no, Student-Ssn) Transcript (Student-Ssn, D-Code, C-no, Grade)
INFS614 37
Query 7
Find the number of departments which do not have a chairman (Chair_ssn is 'NULL').
SELECT count(d.D-code) FROM Department d WHERE d.Chair-SSn is NULL
Department (D-code, D-Name, Chair-SSn) Course (D-code, C-no, Title, Units) Prereq (D-code, C-no, P-code, P-no) Class (Class-no, D-code, C-no, Instructor-SSn) Faculty (Ssn, F-Name, D-Code, Rank) Student (Ssn, S-Name, Major, Status) Enrollment (Class-no, Student-Ssn) Transcript (Student-Ssn, D-Code, C-no, Grade)
INFS614 38
Query 8
For each department (i.e., student’s Major), give the number of graduate students (status='Grad') and the number of other students (status <> 'Grad'). The two numbers must be shown in the same row as the department code. Hint: use decode.
SELECT Major, sum(decode(Status, 'Grad', 1,0)) as Grad, sum(decode(Status, 'Grad', 0,1)) as NoGrad FROM Student GROUP BY Major
Department (D-code, D-Name, Chair-SSn) Course (D-code, C-no, Title, Units) Prereq (D-code, C-no, P-code, P-no) Class (Class-no, D-code, C-no, Instructor-SSn) Faculty (Ssn, F-Name, D-Code, Rank) Student (Ssn, S-Name, Major, Status) Enrollment (Class-no, Student-Ssn) Transcript (Student-Ssn, D-Code, C-no, Grade)
INFS614 39
Query 9 For each department (D-code), give the highest rank(s) of the professors in the department along with the number of the faculty with that highest rank. The output contains one row for each department. Assume Full>Associate>Assistant, i.e., lexicographic order is fine. Note that some department may not have professors in some ranks (e.g., a department may not have full, or associate or assistant professors).
SELECT d.D-Code as dept, d.maxrank, count(f.Ssn) as num FROM (SELECT D-code, max(Rank) as maxrank FROM Faculty
GROUP BY D-code) AS d, Faculty f WHERE d.D-code=f.D-code AND f.Rank=d.maxrank GROUP BY d.D-code, d.maxrank ORDER BY d.D-code
INFS614 40
Query 10 Student(snum: integer, sname: string, major: string, level: string, age: integer) Class(name: string, meets at: string, room: string, fid: integer)!Enrolled(snum: integer, cname: string)!Faculty(fid: integer, fname: string, deptid: integer)"
SELECT DISTINCT S.sname!FROM Student S!WHERE S.snum IN! (SELECT E1.snum! FROM Enrolled E1, Enrolled E2, Class C1, Class C2! WHERE E1.snum = E2.snum AND E1.cname <> E2.cname! AND E1.cname = C1.name AND E2.cname = C2.name! AND C1.meets at = C2.meets at)"
Find the names of all students who are enrolled in two classes that meet at the same time"