www.uptunotes.com
ARTICLE ON
RELATIONAL ALGEBRA
Tips to crack queries in GATE Exams:-
• In the GATE exam you do not need to memorize the syntax of the different operations. You only need to understand how each operation is executed.
• So here we are concerned only with the execution of each operation.
• To crack a query, take small relational instances of the given relations and use them to confirm or contradict the options.
• Last but not least: “GATE is just like a game between you and the IITs. Either you play well and win the game, or they play and you lose the game.”
By: JAGRAT GUPTA email: [email protected] Page 1
Query languages
Mathematical query languages (relational algebra, relational calculus) and SQL
• Relational Algebra: It is operational and very useful for representing execution plans.
• Relational Calculus: It lets users describe what they want rather than how to compute it (non-operational, or declarative).
• Relational algebra and relational calculus are the keys to understanding SQL query processing.
• The most important thing to remember is: “A query is applied to relational instances, and the result of the query is also a relational instance.”
• The schemas of the input relations for a query are fixed, but the query will run regardless of the instance.
• The schema of the result of a given query is also fixed; it is determined by the definitions of the query language constructs.
Queries step by step (Relational algebra)
• For the queries that follow, we use these relational instances.

R1
sid   bid   day
22    101   10-10-96
58    103   11-12-96

S1
sid   sname   rating   age
22    D       7        45
31    L       8        55
58    R       10       35

S2
sid   sname   rating   age
28    Y       9        35
31    L       8        55
44    G       5        35
58    R       10       35

• In Relational Algebra, “Duplicates are automatically eliminated.”
• Selection Operation ( σ ) –
1. Selects a subset of the rows of a relation.
2. Selects the tuples that satisfy a given predicate.
3. Remember that the whole tuple is selected in the selection operation.
4. Syntax -> σ condition(Relation)
Ex-> σ sid=22(S1)

sid   sname   rating   age
22    D       7        45

σ age=35(S2)

sid   sname   rating   age
28    Y       9        35
44    G       5        35
58    R       10       35

• Projection Operation ( П ) –
1. Deletes unwanted columns from a relation.
2. The main difference between the selection operation ( σ ) and the projection operation ( П ) is ->
“In the selection operation the whole tuple is selected, while in the projection operation only the attributes (columns) that we specify are projected.”
3. Syntax -> П attribute1, attribute2, ..., attributeN (Relation)

Ex-> П sname,rating(S2)

sname   rating
Y       9
L       8
G       5
R       10

П age(S2)

age
35
55

(Duplicates are automatically removed.)
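The selection and projection examples above can be replayed with a minimal Python sketch. It assumes a relation is modelled as a set of tuples (so duplicates disappear automatically); the helper names `select` and `project` and the column constants are illustrative, not part of the notes.

```python
# Relation instances from the notes, as sets of tuples.
# Column order for S1 and S2: (sid, sname, rating, age).
S1 = {(22, "D", 7, 45), (31, "L", 8, 55), (58, "R", 10, 35)}
S2 = {(28, "Y", 9, 35), (31, "L", 8, 55), (44, "G", 5, 35), (58, "R", 10, 35)}
SID, SNAME, RATING, AGE = 0, 1, 2, 3  # column positions (illustrative names)


def select(relation, predicate):
    """Selection: keep every whole tuple that satisfies the predicate."""
    return {t for t in relation if predicate(t)}


def project(relation, *columns):
    """Projection: keep only the listed columns; the set drops duplicates."""
    return {tuple(t[c] for c in columns) for t in relation}


sid_22 = select(S1, lambda t: t[SID] == 22)
age_35 = select(S2, lambda t: t[AGE] == 35)
names_ratings = project(S2, SNAME, RATING)
ages = project(S2, AGE)

print(sid_22)         # the single whole tuple with sid 22
print(len(age_35))    # number of S2 tuples with age 35
print(sorted(ages))   # four rows collapse to the distinct ages
```

Because `project` builds a set, the four ages in S2 collapse to two rows, which mirrors the rule that duplicates are eliminated.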
• Combination of Selection and Projection ->
Syntax -> П attribute names(σ condition(Relation))

Ex-> П sname,rating(σ rating>8 (S2))

sname   rating
Y       9
R       10

• In general we allow comparisons using =, ≠, >, <, ≤, ≥ in the selection predicate. We can combine two or more conditions with the logical connectives AND (∧), OR (∨) and NOT (¬).

Ex-> П sname,rating,age(σ rating>5 ∧ age≥35 (S2))

sname   rating   age
Y       9        35
L       8        55
R       10       35

• UNION OPERATION (U)
1. It combines all the tuples that are in the first relation, in the second relation, or in both.
2. Most important: we can take the union only of relations that have the same arity (the same number of attributes), and the domain of the ith attribute of r must be the same as the domain of the ith attribute of s, for all i.

Ex-> S1 U S2

sid   sname   rating   age
22    D       7        45
31    L       8        55
58    R       10       35
28    Y       9        35
44    G       5        35

Duplicate rows are automatically eliminated.
Ex-> Π sid(R1) U Π sid(S1)

sid         sid         sid
22     U    22     =    22
58          31          31
            58          58

Duplicate rows are automatically eliminated.

• SET DIFFERENCE (-)
1. S1-S2 means find the tuples that are in relation S1 but not in relation S2.
2. S2-S1 means find the tuples that are in relation S2 but not in relation S1.
3. The same-arity and same-domain requirements defined for the union operation also apply to the set difference operation.

Ex-> (S1-S2)

sid   sname   rating   age
22    D       7        45

(S2-S1)

sid   sname   rating   age
28    Y       9        35
44    G       5        35

Π sid(S1) - Π sid(R1)

sid         sid         sid
22     -    22     =    31
31          58
58

Π sid(R1) - Π sid(S1)
This operation results in no tuples.
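As a quick check of the union and set difference examples, here is a small sketch that again assumes relations are Python sets of tuples, so the built-in `|` and `-` operators play the role of U and -. The helper `sids` is an illustrative name for projection on sid.

```python
# Relation instances from the notes.
# R1: (sid, bid, day); S1 and S2: (sid, sname, rating, age).
R1 = {(22, 101, "10-10-96"), (58, 103, "11-12-96")}
S1 = {(22, "D", 7, 45), (31, "L", 8, 55), (58, "R", 10, 35)}
S2 = {(28, "Y", 9, 35), (31, "L", 8, 55), (44, "G", 5, 35), (58, "R", 10, 35)}


def sids(relation):
    """Projection on sid, which is the first column of R1, S1 and S2."""
    return {(t[0],) for t in relation}


union_all = S1 | S2                  # S1 U S2
s1_minus_s2 = S1 - S2                # tuples in S1 but not in S2
s2_minus_s1 = S2 - S1                # tuples in S2 but not in S1
sid_union = sids(R1) | sids(S1)      # same arity: one column each
sid_diff = sids(S1) - sids(R1)
sid_diff_empty = sids(R1) - sids(S1)

print(len(union_all))   # 3 + 4 tuples with two shared ones
print(sid_diff)
print(sid_diff_empty)   # an empty set: no tuples
```

Union and difference are only applied here to sets whose tuples have the same number of columns, matching the same-arity rule stated above.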
• Cartesian Product (×)
i) The cross product (×) operation allows us to combine information from any two relations.
ii) Most important: “If a relation R has m tuples and another relation S has n tuples, then R×S has (m×n) tuples.”
iii) Another important point: “If a relation R has p attributes and another relation S has q attributes, then R×S has (p+q) attributes.”

Ex-> S1 × S2

S1.sid   S1.sname   S1.rating   S1.age   S2.sid   S2.sname   S2.rating   S2.age
22       D          7           45       28       Y          9           35
22       D          7           45       31       L          8           55
22       D          7           45       44       G          5           35
22       D          7           45       58       R          10          35
31       L          8           55       28       Y          9           35
31       L          8           55       31       L          8           55
31       L          8           55       44       G          5           35
31       L          8           55       58       R          10          35
58       R          10          35       28       Y          9           35
58       R          10          35       31       L          8           55
58       R          10          35       44       G          5           35
58       R          10          35       58       R          10          35

R1×S1

R1.sid   bid   day        S1.sid   sname   rating   age
22       101   10-10-96   22       D       7        45
22       101   10-10-96   31       L       8        55
22       101   10-10-96   58       R       10       35
58       103   11-12-96   22       D       7        45
58       103   11-12-96   31       L       8        55
58       103   11-12-96   58       R       10       35
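The two counting rules for the cross product (m×n tuples, p+q attributes) can be confirmed with a short sketch. It assumes relations are sets of tuples and uses `itertools.product` from the Python standard library; the helper name `cross` is illustrative.

```python
from itertools import product

# Relation instances from the notes.
R1 = {(22, 101, "10-10-96"), (58, 103, "11-12-96")}
S1 = {(22, "D", 7, 45), (31, "L", 8, 55), (58, "R", 10, 35)}
S2 = {(28, "Y", 9, 35), (31, "L", 8, 55), (44, "G", 5, 35), (58, "R", 10, 35)}


def cross(r, s):
    """Cartesian product: every tuple of r concatenated with every tuple of s."""
    return {a + b for a, b in product(r, s)}


s1_x_s2 = cross(S1, S2)
r1_x_s1 = cross(R1, S1)

# Tuple count is m*n and attribute count is p+q.
print(len(s1_x_s2), len(next(iter(s1_x_s2))))
print(len(r1_x_s1), len(next(iter(r1_x_s1))))
```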
• HOW TO CRACK QUERIES ON CROSS PRODUCT-
Consider the query
σ R1.sid=S1.sid (R1×S1)
1. First compute R1×S1 (already computed).
2. Then pick out the tuples in which the sid of R1 is equal to the sid of S1.
3. The result is ->

R1.sid   bid   day        S1.sid   sname   rating   age
22       101   10-10-96   22       D       7        45
58       103   11-12-96   58       R       10       35

Consider another query: П sname,rating(σ R1.sid<S1.sid ∧ rating>7 (R1×S1))
1. First compute R1×S1 (already computed).
2. Pick out the tuples for which the sid of R1 is less than the sid of S1 AND rating>7 (remember, both conditions must be true).

R1.sid   bid   day        S1.sid   sname   rating   age
22       101   10-10-96   31       L       8        55
22       101   10-10-96   58       R       10       35

3. Now project only the sname and rating attributes from the above table (final output).

sname   rating
L       8
R       10

So far we have discussed the 5 basic relational algebra operations: selection, projection, union, set difference and Cartesian (cross) product.
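The two cross-product queries worked above follow the same recipe: build R1×S1, filter, then project. A minimal sketch under the assumption that relations are sets of tuples (the variable names are illustrative):

```python
from itertools import product

R1 = {(22, 101, "10-10-96"), (58, 103, "11-12-96")}            # (sid, bid, day)
S1 = {(22, "D", 7, 45), (31, "L", 8, 55), (58, "R", 10, 35)}   # (sid, sname, rating, age)

# Columns of R1 x S1:
# 0 R1.sid, 1 bid, 2 day, 3 S1.sid, 4 sname, 5 rating, 6 age
r1_x_s1 = {r + s for r, s in product(R1, S1)}

# Query 1: selection with R1.sid = S1.sid
same_sid = {t for t in r1_x_s1 if t[0] == t[3]}

# Query 2: selection with R1.sid < S1.sid AND rating > 7,
# followed by projection on (sname, rating)
names_ratings = {(t[4], t[5]) for t in r1_x_s1 if t[0] < t[3] and t[5] > 7}

print(sorted(same_sid))
print(sorted(names_ratings))
```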
• Additional Operations in Relational Algebra->
Intersection operation.
Join operations.
Division operation.
Rename operation.
These additional operations are implemented through the five basic relational algebra operations.

• Intersection operation (∩):-
1. The intersection operation (∩) finds the tuples that occur in both relations.
2. Like the union and set difference operations, intersection can be applied only to relations that have the same arity and the same domains.

Ex-> S1 ∩ S2

sid   sname   rating   age
31    L       8        55
58    R       10       35

Ex-> Π sid(R1) ∩ Π sid(S1)

sid         sid         sid
22     ∩    22     =    22
58          31          58
            58

3. We can implement this operation with the help of the set difference operation-
r ∩ s = r - (r - s), where r and s are any two relational instances.

• Join operations:-
(Many students are confused by join operations, and many questions on joins are asked in the GATE exam.)

• Conditional Join-
1. R ⋈C S ≡ σ C(R × S)

Ex-> S1 ⋈ S1.sid<R1.sid R1

S1.sid   sname   rating   age   R1.sid   bid   day
22       D       7        45    58       103   11-12-96
31       L       8        55    58       103   11-12-96

2. You can see that this output is equivalent to σ S1.sid<R1.sid (S1 × R1).
3. Sometimes the conditional join is also called the theta (θ) join.
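Both identities in this part, r ∩ s = r - (r - s) and R ⋈C S ≡ σ C(R × S), are easy to test on the running instances. The sketch below assumes relations are sets of tuples; `theta_join` is an illustrative helper, not a library function.

```python
from itertools import product

R1 = {(22, 101, "10-10-96"), (58, 103, "11-12-96")}            # (sid, bid, day)
S1 = {(22, "D", 7, 45), (31, "L", 8, 55), (58, "R", 10, 35)}   # (sid, sname, rating, age)
S2 = {(28, "Y", 9, 35), (31, "L", 8, 55), (44, "G", 5, 35), (58, "R", 10, 35)}


def theta_join(r, s, condition):
    """Conditional join: the cross product filtered by a condition on each pair."""
    return {a + b for a, b in product(r, s) if condition(a, b)}


# Intersection expressed with set difference only.
intersection = S1 - (S1 - S2)

# S1 joined with R1 under the condition S1.sid < R1.sid.
joined = theta_join(S1, R1, lambda s, r: s[0] < r[0])

print(sorted(intersection))
print(sorted(joined))
```

Note that the joined tuples carry both sid columns, which is the difference from the equi join and natural join.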
• Equi join-
1. A special case of the conditional join in which the condition C contains only equalities.

Ex-> S1 ⋈sid R1 ≡ S1 ⋈ S1.sid=R1.sid R1

sid   sname   rating   age   bid   day
22    D       7        45    101   10-10-96
58    R       10       35    103   11-12-96

2. Most important: “Only one copy of the join attribute is present, i.e. the sid attribute occurs only once, unlike in the conditional join.”

• Natural join-
1. It is an equi join on all common fields.

Ex-> S1 ⋈ S2

sid   sname   rating   age
31    L       8        55
58    R       10       35

All of these fields are common to both relations.

R1 ⋈ S1

sid   sname   rating   age   bid   day
22    D       7        45    101   10-10-96
58    R       10       35    103   11-12-96

Only the sid field is common to both relations.

2. In general, the natural join of r and s is
r ⋈ s = π R∪S (σ r.A1=s.A1 ∧ r.A2=s.A2 ∧ … ∧ r.An=s.An (r × s))
where A1, A2, …, An are the attributes common to relations r(R) and s(S), i.e. R ∩ S = {A1, A2, …, An}.

3. Most important: “Let r(R) and s(S) be relations without any attributes in common, i.e. R ∩ S = φ.” Then
r ⋈ s = r × s
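The general definition of the natural join can be sketched directly: compare every pair of tuples on the attributes they share and keep one copy of each shared attribute. This sketch assumes a relation is a list of dicts keyed by attribute name; `rows`, `natural_join` and the relation `T` are hypothetical names introduced only for illustration.

```python
def rows(names, tuples):
    """Build a relation as a list of dicts keyed by attribute name."""
    return [dict(zip(names, t)) for t in tuples]


def natural_join(r, s):
    """Equi join on every attribute name the two relations share."""
    result = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()
            if all(a[k] == b[k] for k in common):
                merged = {**a, **b}        # one copy of each common attribute
                if merged not in result:   # keep set semantics
                    result.append(merged)
    return result


R1 = rows(("sid", "bid", "day"), [(22, 101, "10-10-96"), (58, 103, "11-12-96")])
S1 = rows(("sid", "sname", "rating", "age"),
          [(22, "D", 7, 45), (31, "L", 8, 55), (58, "R", 10, 35)])
S2 = rows(("sid", "sname", "rating", "age"),
          [(28, "Y", 9, 35), (31, "L", 8, 55), (44, "G", 5, 35), (58, "R", 10, 35)])
T = rows(("x",), [(1,), (2,)])  # hypothetical relation sharing no attribute with R1

print([row["sid"] for row in natural_join(S1, S2)])   # all four fields must agree
print(natural_join(R1, S1)[0])                        # sid appears only once
print(len(natural_join(R1, T)))                       # no common attribute: cross product
```

When the two schemas share no attribute, `common` is empty and every pair is kept, which is exactly the case r ⋈ s = r × s.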
• OUTER JOIN:-
1. The outer join operation is an extension of the join operation that deals with missing information.
2. We take the following two relations to understand outer joins ->

Employee
Emp-name   Street   City
A          P        a
B          Q        b
C          R        c
D          S        d

FT-WORKS
Emp-name   Branch-name   Salary
A          Mesa          1500
B          Mesa          1300
E          Red           5300
D          Red           1500

3. There are three types of outer join -
i. Left outer join ( ⟕ )
ii. Right outer join ( ⟖ )
iii. Full outer join ( ⟗ )

i. Left outer join -> It takes all tuples in the left relation that did not match any tuple in the right relation, pads them with null values for the attributes of the right relation, and adds them to the result of the natural join. All information from the left relation is present in the result of the left outer join.

Emp-name   Street   City   Branch-name   Salary
A          P        a      Mesa          1500     (result of natural join)
B          Q        b      Mesa          1300     (result of natural join)
D          S        d      Red           1500     (result of natural join)
C          R        c      Null          Null     (padded tuple from left relation)

ii. Right outer join -> Just swap the words left and right in the above definition and you get the definition of the right outer join.

Emp-name   Street   City   Branch-name   Salary
A          P        a      Mesa          1500
B          Q        b      Mesa          1300
D          S        d      Red           1500
E          Null     Null   Red           5300     (padded tuple from right relation)

iii. Full outer join -> It pads the tuples from the left relation that did not match any tuple from the right relation, as well as the tuples from the right relation that did not match any tuple from the left relation, and adds them to the result of the natural join.

Emp-name   Street   City   Branch-name   Salary
A          P        a      Mesa          1500     (result of natural join)
B          Q        b      Mesa          1300     (result of natural join)
D          S        d      Red           1500     (result of natural join)
C          R        c      Null          Null     (padded tuple from left relation)
E          Null     Null   Red           5300     (padded tuple from right relation)
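The three outer joins differ only in which unmatched tuples get padded. A minimal sketch, assuming relations are non-empty lists of dicts, that both relations share the single attribute `emp_name`, and that Python's `None` stands in for null; `outer_join` and its flags are illustrative names.

```python
employee = [
    {"emp_name": "A", "street": "P", "city": "a"},
    {"emp_name": "B", "street": "Q", "city": "b"},
    {"emp_name": "C", "street": "R", "city": "c"},
    {"emp_name": "D", "street": "S", "city": "d"},
]
ft_works = [
    {"emp_name": "A", "branch_name": "Mesa", "salary": 1500},
    {"emp_name": "B", "branch_name": "Mesa", "salary": 1300},
    {"emp_name": "E", "branch_name": "Red", "salary": 5300},
    {"emp_name": "D", "branch_name": "Red", "salary": 1500},
]


def outer_join(left, right, key, keep_left=True, keep_right=True):
    """Join two non-empty relations on `key`, padding unmatched tuples with None."""
    left_attrs = [k for k in left[0] if k != key]
    right_attrs = [k for k in right[0] if k != key]
    result = []
    matched_keys = set()
    for a in left:
        matches = [b for b in right if b[key] == a[key]]
        for b in matches:
            result.append({**a, **b})          # ordinary join tuple
            matched_keys.add(a[key])
        if not matches and keep_left:          # pad an unmatched left tuple
            result.append({**a, **dict.fromkeys(right_attrs)})
    if keep_right:
        for b in right:
            if b[key] not in matched_keys:     # pad an unmatched right tuple
                result.append({key: b[key], **dict.fromkeys(left_attrs), **b})
    return result


left = outer_join(employee, ft_works, "emp_name", keep_right=False)
right = outer_join(employee, ft_works, "emp_name", keep_left=False)
full = outer_join(employee, ft_works, "emp_name")

print([row["emp_name"] for row in left])
print([row["emp_name"] for row in right])
print(len(full))
```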
• Division Operation (÷)-
1. The division operation is suited to queries that include the phrase “for all”.
2. Let us take the following schemas-
branch(branch_name, branch_city, assets)
depositor(customer_name, account_no)
account(account_no, branch_name, balance)
3. Now consider the following query->
“Find all customers who have an account at all the branches located in Brooklyn city.”
Here the part “customers who have an account at” is r2, and the part “the branches located in Brooklyn city” is r1.
4. Tips for cracking division queries->
I. First solve the part of the query written after the quantifier “all” (here, “all the branches ...”) and name it r1.
II. Now solve the part of the query written before it and name it r2.
III. Now you can write r = r2 ÷ r1.
r1 <- π branch_name(σ branch_city=“Brooklyn”(branch))
r2 <- π customer_name,branch_name(depositor ⋈ account)
r <- r2 ÷ r1
5. Most important: “The attribute projected in r1 must also be projected in r2; here this applies to the branch_name attribute.”
r2 has the schema (customer_name, branch_name) and r1 has the schema (branch_name), so in r = r2 ÷ r1 only customer_name remains.
6. For GATE, questions usually ask you to find the output.
7. Let us take the following example through relational instances->

A
S    P
S1   P1
S1   P2
S1   P3
S1   P4
S2   P1
S2   P2
S3   P2
S4   P2
S4   P4

B1        B2        B3
P         P         P
P2        P2        P1
          P4        P2
                    P4

A ÷ B1 =>   A ÷ B2 =>   A ÷ B3 =>
S           S           S
S1          S1          S1
S2          S4
S3
S4

• Rename operation ( ρ )
1. The result of a relational algebra expression is also a relation, but it has no name. The rename operation allows us to name the output relation. It is denoted by the lowercase Greek letter rho (ρ).
2. Notation:- ρ x (E)
where the result of expression E is saved under the name x.
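Returning to the division example A ÷ B1, A ÷ B2 and A ÷ B3: the three outputs can be checked with a few lines of Python. The sketch assumes A is a set of (S, P) pairs and each divisor is a set of P values; `divide` is an illustrative helper name.

```python
# Relation A(S, P) from the division example.
A = {
    ("S1", "P1"), ("S1", "P2"), ("S1", "P3"), ("S1", "P4"),
    ("S2", "P1"), ("S2", "P2"),
    ("S3", "P2"),
    ("S4", "P2"), ("S4", "P4"),
}
B1 = {"P2"}
B2 = {"P2", "P4"}
B3 = {"P1", "P2", "P4"}


def divide(a, b):
    """A(S, P) / B(P): the S values that are paired in A with every P value of B."""
    candidates = {s for s, _ in a}
    return {s for s in candidates if all((s, p) in a for p in b)}


print(sorted(divide(A, B1)))
print(sorted(divide(A, B2)))
print(sorted(divide(A, B3)))
```

The `all(...)` test is the “for all” in the query: an S value survives only if it is paired with every P value of the divisor.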
• Aggregate Functions (ℱ)
1. Mathematical functions applied to collections of values from the database.
2. We can apply the following aggregate functions to attributes and tuples: SUM, AVERAGE, MAXIMUM, COUNT, MINIMUM.
3. Syntax: ℱ <function list>(R)

Ex-> Student

Name   Number   Sex
Ben    3412     M
Dan    1234     M
Nel    2341     F

ℱ COUNT(Student)

Count
3

ℱ MIN(Number)(Student)

Number
1234

4. Grouping: <grouping attributes> ℱ <function list>(R)

Ex-> Sex ℱ COUNT, SUM(Number)(Student)

Sex   Count   SUM
M     2       4646
F     1       2341
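The aggregate and grouping examples on the Student relation can be reproduced with plain Python built-ins. This is only a sketch, assuming the relation is a list of (Name, Number, Sex) tuples; the variable names are illustrative.

```python
# Student(Name, Number, Sex) from the example above.
student = [("Ben", 3412, "M"), ("Dan", 1234, "M"), ("Nel", 2341, "F")]

# COUNT over the whole relation and MIN over the Number attribute.
count_all = len(student)
min_number = min(number for _, number, _ in student)

# Grouping on Sex, computing COUNT and SUM(Number) per group.
groups = {}
for _, number, sex in student:
    count, total = groups.get(sex, (0, 0))
    groups[sex] = (count + 1, total + number)

print(count_all)
print(min_number)
print(groups)   # maps each Sex value to (Count, SUM)
```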
QUESTIONS FROM GATE

GATE CS (2003):-
Consider the following SQL query
select distinct a1, a2, ..., an from r1, r2, ..., rm where P
For an arbitrary predicate P, this query is equivalent to which of the following relational algebra expressions?
A. Π a1,a2,...,an σP (r1 × r2 × ... × rm)
B. Π a1,a2,...,an σP (r1 ⋈ r2 ⋈ ... ⋈ rm)
C. Π a1,a2,...,an σP (r1 ∪ r2 ∪ ... ∪ rm)
D. Π a1,a2,...,an σP (r1 ∩ r2 ∩ ... ∩ rm)

Solution:- If we compare SQL and relational algebra, the following operations are equivalent->
Select ≡ Projection operation
From ≡ Cartesian product of relations
Where ≡ Selection operation
So
select distinct a1, a2, ..., an from r1, r2, ..., rm where P
is equivalent to
П a1,a2,...,an (σP (r1 × r2 × ... × rm))
So option (a) is correct.
GATE CS (2004):-
Let R1(A,B,C) and R2(D,E) be two relation schemas, where the primary keys (A in R1, D in R2) are shown underlined, and let C be a foreign key in R1 referring to R2. Suppose there is no violation of the above referential integrity constraint in the corresponding relation instances r1 and r2. Which one of the following relational algebra expressions would necessarily produce an empty relation?
A. ΠD(r2) − ΠC(r1)
B. ΠC(r1) − ΠD(r2)
C. ΠD(r1 ⋈C≠D r2)
D. ΠC(r1 ⋈C=D r2)

Solution:- Here C is a foreign key in R1 referring to R2. This means every value of C must also appear as a value of D, because D is the primary key of relation R2. So if we take the following relational instances ->

r1                r2
A   B   C         D   E
1   2   3         3   10
4   5   6         6   11
2   8   7         7   12
                  8   13

then on executing the following operation ->
ΠC(r1) − ΠD(r2)

C         D
3    −    3     => Empty relation
6         6
7         7
          8

(Here − refers to the set difference operation.) So option (b) is correct.

GATE CS (2004):-
Consider the following relation schema pertaining to a students database:
Students (rollno, name, address)
Enroll (rollno, courseno, coursename)
where the primary keys are shown underlined. The number of tuples in the student and enroll tables are 120 and 8 respectively. What are the maximum and minimum number of tuples that can be present in (Student * Enroll), where ‘*’ denotes natural join?
A. 8,8
B. 120,8
C. 960,8
D. 960,120

Solution:-
If we take the natural join of the student and enroll relations, we are concerned only with the common fields. Here rollno is the common attribute of both relations, and it is the primary key of the student relation, so each enroll tuple can match at most one student tuple.
Relation student -> 120 tuples
Relation enroll -> 8 tuples
Assuming every rollno in enroll refers to an existing student, each of the 8 enroll tuples matches exactly one student tuple. So the result has 8 tuples in both the maximum and the minimum case.
So option (a) is correct.

GATE CS (2007):-
Information about a collection of students is given by the relation studInfo(studId, name, sex). The relation enroll(studId, courseId) gives which student has enrolled for (or taken) what course(s). Assume that every course is taken by at least one male and at least one female student. What does the following relational algebra expression represent?
ΠcourseId((ΠstudId(σsex="female"(studInfo)) × ΠcourseId(enroll)) − enroll)
A. Courses in which all the female students are enrolled.
B. Courses in which a proper subset of female students are enrolled.
C. Courses in which only male students are enrolled.
D. None of the above
Solution:- As already discussed, query questions are easy to crack by taking relational instances. Let us see how.

studInfo
studID   name   sex
1        A      M
2        B      F
3        C      M
4        D      M
5        E      F
6        F      F

enroll
studID   courseID
1        101
5        101
3        101
2        102
6        102

Now perform the query:-
ΠcourseID((ΠstudID(σ sex="F"(studInfo)) × ΠcourseID(enroll)) − enroll)

ΠstudID(σ sex="F"(studInfo))
studID
2
5
6

ΠcourseID(enroll)
courseID
101
102

Now, according to the query, we take the cross product of the above two relational instances.

ΠstudID(σ sex="F"(studInfo)) × ΠcourseID(enroll)
studID   courseID
2        101
2        102
5        101
5        102
6        101
6        102

Now subtract the enroll relation from the relation computed above. So, the result is

studID   courseID
2        101
5        102
6        101

Projecting the courseID attribute, we get

courseID
101
102

Each tuple in the difference pairs a female student with a course she is not enrolled in, so the query returns the courses in which at least one female student is not enrolled. In this instance, course 101 has students 1, 3 and 5 (A, C, E) and course 102 has students 2 and 6 (B, F). Option C is incorrect because female students are also enrolled in these courses.
Now take the following enroll relation instead of the one above.

enroll
studID   courseID
1        101
5        101
3        101

(Repeat the whole query.) Final output is:-

courseID
101

Course 101 has students 1, 5 and 3 (A, E, C), of whom only E is female. Option (a) is also incorrect because not all the female students are enrolled in this course.
So option (b) is correct: since every course is taken by at least one female student, each returned course has some but not all of the female students enrolled, and this holds for every instance.
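The instance-based check used in this solution can be automated. The following sketch evaluates the query on both enroll instances from the solution, assuming relations are sets of tuples; the function name `query` is illustrative.

```python
from itertools import product

# studInfo(studID, name, sex) from the solution.
stud_info = {
    (1, "A", "M"), (2, "B", "F"), (3, "C", "M"),
    (4, "D", "M"), (5, "E", "F"), (6, "F", "F"),
}


def query(enroll):
    """courseID values of (female studIDs x courseIDs of enroll) - enroll."""
    females = {sid for sid, _, sex in stud_info if sex == "F"}
    courses = {course for _, course in enroll}
    not_enrolled = set(product(females, courses)) - enroll
    return {course for _, course in not_enrolled}


enroll_first = {(1, 101), (5, 101), (3, 101), (2, 102), (6, 102)}
enroll_second = {(1, 101), (5, 101), (3, 101)}

print(sorted(query(enroll_first)))
print(sorted(query(enroll_second)))
```

A course appears in the output exactly when some female student is missing from it, which is why the answer speaks of a proper subset of the female students.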
GATE CS (2008):-
Let R and S be two relations with the following schemas
R (P, Q, R1, R2, R3)
S (P, Q, S1, S2)
where {P, Q} is the key for both schemas. Which of the following queries are equivalent?
I. ΠP(R ⋈ S)
II. ΠP(R) ⋈ ΠP(S)
III. ΠP(ΠP,Q(R) ∩ ΠP,Q(S))
IV. ΠP(ΠP,Q(R) − (ΠP,Q(R) − ΠP,Q(S)))
A. Only I and II
B. Only I and III
C. Only I, II and III
D. Only I, III and IV

Solution:- Let us go through the expressions one by one-

1. The 1st and 2nd expressions are not equivalent. The 1st joins on both P and Q and then projects P, while the 2nd joins on P alone, so a P value that appears in both relations with different Q values is returned by the 2nd expression but not by the 1st. (This is related to the concept of lossless and lossy join decomposition.) So options (a) and (c) are incorrect.

2. The 1st and 3rd expressions must be equivalent: R ⋈ S keeps exactly the tuples whose (P, Q) values occur in both relations, which is what the intersection in the 3rd expression computes. We can check this on a relational instance.

Ex->
R                            S
P   Q   R1   R2   R3         P   Q   S1   S2
1   1   a    d    g          1   1   J    k
1   2   b    e    h          2   3   L    m
2   3   c    f    i

If you apply the 3rd expression to these two relations, you get the same output as for the 1st expression, i.e.-
ΠP(ΠP,Q(R) ∩ ΠP,Q(S))

P
1
2

3. The 3rd and 4th expressions must be equivalent because of the identity-
r ∩ s = r − (r − s)
So option (d) is correct, i.e. the 1st, 3rd and 4th expressions are equivalent.
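To see the four expressions side by side, the sketch below evaluates them on the instance from the solution and on a second, hypothetical instance (`R_alt`, `S_alt`, invented here) in which the same P value occurs with different Q values. It assumes relations are sets of tuples with P and Q as the first two columns; all function names are illustrative.

```python
def pq(rel):
    """Projection on the key attributes (P, Q), the first two columns."""
    return {(t[0], t[1]) for t in rel}


def expr1(r, s):
    """P values of the natural join of R and S (joined on P and Q)."""
    return {a[0] for a in r for b in s if a[0] == b[0] and a[1] == b[1]}


def expr2(r, s):
    """Natural join of the two P projections, i.e. the P values common to both."""
    return {a[0] for a in r} & {b[0] for b in s}


def expr3(r, s):
    return {p for p, _ in pq(r) & pq(s)}


def expr4(r, s):
    return {p for p, _ in pq(r) - (pq(r) - pq(s))}


# Instance from the solution.
R = {(1, 1, "a", "d", "g"), (1, 2, "b", "e", "h"), (2, 3, "c", "f", "i")}
S = {(1, 1, "J", "k"), (2, 3, "L", "m")}

# Hypothetical instance: P = 1 appears in both relations, but with different Q.
R_alt = {(1, 1, "a", "d", "g")}
S_alt = {(1, 2, "J", "k")}

for r, s in ((R, S), (R_alt, S_alt)):
    print(expr1(r, s), expr2(r, s), expr3(r, s), expr4(r, s))
```

On the instance from the solution all four expressions agree, so that instance alone cannot separate the 2nd expression; the second instance does.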
GATE CS (2012):-
Suppose R1(A,B) and R2(C,D) are two relation schemas. Let r1 and r2 be the corresponding relation instances. B is a foreign key that refers to C in R2. If the data in r1 and r2 satisfy referential integrity constraints, which of the following is ALWAYS TRUE?
A. ΠB(r1) − ΠC(r2) = ϕ
B. ΠC(r2) − ΠB(r1) = ϕ
C. ΠB(r1) = ΠC(r2)
D. ΠB(r1) − ΠC(r2) ≠ ϕ

Solution:- Here B is a foreign key that refers to C in R2. This means every value in B must also appear in C, i.e. the values in B are a subset of the values in C. So now let us take the following instances:-

R1            R2
A   B         C   D
1   5         5   a
2   5         6   b
3   6         7   c
4   7         8   d

1. Option (a) is correct because the expression ΠB(r1) − ΠC(r2) gives an empty relation in each and every case.
2. Option (b) is incorrect: for the above instance it gives the output

C
8

which is not an empty relation.
3. Option (c) is incorrect because in the above two relations the values of B in R1 ({5, 6, 7}) are not equal to the values of C in R2 ({5, 6, 7, 8}).
4. Option (d) is incorrect because it contradicts option (a).
GATE CS (2012):-
Consider the following relations A, B and C:

A
ID   NAME     AGE
12   Arun     60
15   Shreya   24
99   Rohit    11

B
ID   NAME     AGE
15   Shreya   24
25   Hari     40
98   Rohit    20
99   Rohit    11

C
ID   PHONE   AREA
10   2200    02
99   2100    01

How many tuples does the result of the following relational algebra expression contain? Assume the schema of A∪B is the same as that of A.
(A∪B) ⋈ A.Id>40 ∨ C.Id<15 C
A. 7
B. 4
C. 5
D. 9

Solution:- First we compute (A∪B):-

ID   NAME     AGE
12   Arun     60
15   Shreya   24
99   Rohit    11
25   Hari     40
98   Rohit    20

This question is on the conditional join, i.e.-
R ⋈C S ≡ σ C(R × S)
where in this question R = (A∪B), S = C, and the condition C is A.Id>40 ∨ C.Id<15.
So first we calculate R × S, i.e. (A∪B) × C

ID   NAME     AGE   ID   PHONE   AREA
12   Arun     60    10   2200    02     *
12   Arun     60    99   2100    01
15   Shreya   24    10   2200    02     *
15   Shreya   24    99   2100    01
99   Rohit    11    10   2200    02     *
99   Rohit    11    99   2100    01     *
25   Hari     40    10   2200    02     *
25   Hari     40    99   2100    01
98   Rohit    20    10   2200    02     *
98   Rohit    20    99   2100    01     *

Now if we apply the condition A.Id>40 ∨ C.Id<15, we extract 7 tuples (marked * in the above table). So option (a) is correct.
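The count of 7 can be double-checked mechanically. A minimal sketch, assuming relations are sets of tuples with ID as the first column (AREA is kept as a string so the leading zero survives):

```python
from itertools import product

A = {(12, "Arun", 60), (15, "Shreya", 24), (99, "Rohit", 11)}
B = {(15, "Shreya", 24), (25, "Hari", 40), (98, "Rohit", 20), (99, "Rohit", 11)}
C = {(10, 2200, "02"), (99, 2100, "01")}

a_union_b = A | B   # duplicates between A and B collapse

# Conditional join: cross product filtered by A.Id > 40 OR C.Id < 15.
joined = {ab + c for ab, c in product(a_union_b, C) if ab[0] > 40 or c[0] < 15}

print(len(a_union_b))
print(len(joined))
```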
GATE CS (2014) (2nd March, Morning Session):-
Consider the following table:-

Studid   Studname   Age   Course
1245     Sachin     x     C1
4279     Manjit     18    C4
2481     Sachin     19    C5
9436     Sumit      17    C2

If we are using (Studname, Age) as a key, then what should the value of x not be?

Solution:- A key identifies each and every tuple of the relation uniquely. So x cannot be equal to 19, because then the (Studname, Age) values of the 1st tuple would equal the (Studname, Age) values of the 3rd tuple, which violates the key concept.