25
Spring 2020 – University of Virginia 1 © Praphamontripong © Praphamontripong SQL - Join CS 4750 Database Systems [A. Silberschatz, H. F. Korth, S. Sudarshan, Database System Concepts, Ch.4.1] [ https://www.dofactory.com/sql/join ]

SQL - Join · 2020. 6. 3. · (INNER) JOIN left table right table (INNER) JOIN SELECT * FROM one JOIN two ON one.p1 = two.p1 p1 colA colB 1 A aa 2 B bb 3 C cc 4 D dd 5 E ee 7 G gg

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

  • Spring 2020 – University of Virginia 1© Praphamontripong© Praphamontripong

    SQL - Join

    CS 4750Database Systems

    [A. Silberschatz, H. F. Korth, S. Sudarshan, Database System Concepts, Ch.4.1][ https://www.dofactory.com/sql/join ]

  • Spring 2020 – University of Virginia 2© Praphamontripong

    Recap 1: Aggregation

    ID name dept_name salary10101 Srinivasan Comp. Sci. 65000.0012121 Wu Finance 90000.0015151 Mozart Music 40000.0022222 Einstein Physics 95000.0032343 El Said History 60000.0033456 Gold Physics 87000.0045565 Kate Comp. Sci. 75000.0058583 Katz History 62000.0076543 Singh Finance 80000.0076766 Brandt Biology 72000.0083821 Kate Comp. Sci. 92000.0098345 Kim Elec. Eng. 80000.00

    Instructor (refer to alldbs.sql)

    Find average salary for each department

    SELECT dept_name, AVG(salary)FROM instructorGROUP BY dept_name;

  • Spring 2020 – University of Virginia 3© Praphamontripong

    Recap 2: Aggregation

    ID name dept_name salary10101 Srinivasan Comp. Sci. 65000.0012121 Wu Finance 90000.0015151 Mozart Music 40000.0022222 Einstein Physics 95000.0032343 El Said History 60000.0033456 Gold Physics 87000.0045565 Kate Comp. Sci. 75000.0058583 Katz History 62000.0076543 Singh Finance 80000.0076766 Brandt Biology 72000.0083821 Kate Comp. Sci. 92000.0098345 Kim Elec. Eng. 80000.00

    Instructor (refer to alldbs.sql)

    Find average salary for each department that is greater than 70000

    SELECT dept_name, AVG(salary)FROM instructorGROUP BY dept_nameHAVING AVG(salary) > 70000;

  • Spring 2020 – University of Virginia 4© Praphamontripong

    Recap: Primary Keys, Foreign KeysHiring (TA_id, Name, Year, Two_week_hours)TA_id Name Year Two_week_hours1234 Minnie 4 202345 Humpty 3 243456 Dumpty 4 303333 Minnie 3 125678 Mickey 2 16

    Key = one or more attributes that uniquely

    identify a rowTA_Assignment (TA_id, Course)TA_id Course1234 Database Systems2345 Database Systems3456 Cloud Computing1234 Software Analysis5678 Web Programming Lang.

    Foreign key = one or more attributes that uniquely identify a row in another

    table

    references

    Foreign key describes a relationship between tables

  • Spring 2020 – University of Virginia 5© Praphamontripong

    JOIN• Combine related tables or sets of data on key values

    • All tables (or relations) must have a unique identifier (primary key)

    • Any related tables must contain a copy of the unique identifier (foreign key) from the related table

    lefttable

    righttable

    FULL JOIN

    lefttable

    righttable

    LEFT JOIN

    lefttable

    righttable

    RIGHT JOIN

    lefttable

    righttable

    (INNER) JOIN

    lefttable

    righttable

    NATURAL JOIN

    (preserve all rows, uncommon, thus skip)

    (most common)

    (preserve all rowsin right table)

    (preserve all rowsin left table)

  • Spring 2020 – University of Virginia 6© Praphamontripong

    NATURAL JOIN

    lefttable

    righttable

    NATURAL JOIN

    p1 colA colB1 A aa2 B bb3 C cc4 D dd5 E ee7 G gg8 H hh

    p2 p1 colX colY9001 1 55 ABC9002 3 77 PQR9003 3 95 DEF9004 7 62 GHI9005 5 50 KLM9006 2 42 STU9007 7 83 XYZ

    Not include the repeated columnone(p1, colA, colB) two(p2, p1, colX, colY)

    SELECT * FROM one NATURAL JOIN two;

    for each row1 in one:for each row2 in two:

    if (row1.p1 = row2.p1):output (row1.p1, row1.colA, row1.colB,

    row2.p2, row2.colX, row2.colY)

    Join on common named

    attribute(s). If no match

    found, move on.

  • Spring 2020 – University of Virginia 7© Praphamontripong

    p1 colA colB p2 colX colY1 A aa 9001 55 ABC2 B bb 9006 42 STU3 C cc 9002 77 PQR3 C cc 9003 95 DEF5 E ee 9005 50 KLM7 G gg 9004 62 GHI7 G gg 9007 83 XYZ

    NATURAL JOIN

    lefttable

    righttable

    NATURAL JOINNot include the repeated column

    SELECT * FROM one NATURAL JOIN two

    p1 colA colB1 A aa2 B bb3 C cc4 D dd5 E ee7 G gg8 H hh

    p2 p1 colX colY9001 1 55 ABC9002 3 77 PQR9003 3 95 DEF9004 7 62 GHI9005 5 50 KLM9006 2 42 STU9007 7 83 XYZ

    one(p1, colA, colB) two(p2, p1, colX, colY)

    Note the order of the columns; order of rows does not matter

  • Spring 2020 – University of Virginia 8© Praphamontripong

    (INNER) JOIN

    lefttable

    righttable

    (INNER) JOIN

    p1 colA colB1 A aa2 B bb3 C cc4 D dd5 E ee7 G gg8 H hh

    p2 p1 colX colY9001 1 55 ABC9002 3 77 PQR9003 3 95 DEF9004 7 62 GHI9005 5 50 KLM9006 2 42 STU9007 7 83 XYZ

    Include the repeated columnone(p1, colA, colB) two(p2, p1, colX, colY)

    SELECT * FROM one JOIN two ON one.p1 = two.p1;

    for each row1 in one:for each row2 in two:

    if (row1.p1 = row2.p1):output (row1.p1, row1.colA, row1.colB,

    row2.p2, row2.p1, row2.colX, row2.colY)

    Join on specified attribute(s). If no match

    found, move on.

  • Spring 2020 – University of Virginia 9© Praphamontripong

    p1 colA colB p2 p1 colX colY1 A aa 9001 1 55 ABC2 B bb 9006 2 42 STU3 C cc 9002 3 77 PQR3 C cc 9003 3 95 DEF5 E ee 9005 5 50 KLM7 G gg 9004 7 62 GHI7 G gg 9007 7 83 XYZ

    (INNER) JOIN

    lefttable

    righttable

    (INNER) JOIN

    SELECT * FROM one JOIN twoON one.p1 = two.p1

    p1 colA colB1 A aa2 B bb3 C cc4 D dd5 E ee7 G gg8 H hh

    p2 p1 colX colY9001 1 55 ABC9002 3 77 PQR9003 3 95 DEF9004 7 62 GHI9005 5 50 KLM9006 2 42 STU9007 7 83 XYZ

    one(p1, colA, colB) two(p2, p1, colX, colY)

    SELECT * FROM one INNER JOIN twoON one.p1 = two.p1

    Include the repeated column

    Note the order of the columns; order of rows does not matter

  • Spring 2020 – University of Virginia 10© Praphamontripong

    Cartesian Productp1 colA colB1 A aa2 B bb3 C cc4 D dd5 E ee7 G gg8 H hh

    p2 p1 colX colY9001 1 55 ABC9002 3 77 PQR9003 3 95 DEF9004 7 62 GHI9005 5 50 KLM9006 2 42 STU9007 7 83 XYZ

    one(p1, colA, colB) two(p2, p1, colX, colY)

    SELECT * FROM one JOIN two;

    for each row1 in one:for each row2 in two:

    output (row1.p1, row1.colA, row1.colB,row2.p2, row2.p1, row2.colX, row2.colY)

    SELECT * FROM one, two;

  • Spring 2020 – University of Virginia 11© Praphamontripong

    p1 colA colB p2 p1 colX colY1 A aa 9001 1 55 ABC1 A aa 9005 5 50 KLM1 A aa 9004 7 62 GHI1 A aa 9003 3 95 DEF1 A aa 9007 7 83 XYZ1 A aa 9002 3 77 PQR1 A aa 9006 2 42 STU2 B bb 9002 3 77 PQR2 B bb 9006 2 42 STU2 B bb 9001 1 55 ABC2 B bb 9005 5 50 KLM2 B bb 9004 7 62 GHI2 B bb 9003 3 95 DEF2 B bb 9007 7 83 XYZ3 C cc 9003 3 95 DEF3 C cc 9007 7 83 XYZ... … … ... … … …

    Cartesian Product

    p1 colA colB1 A aa2 B bb3 C cc4 D dd5 E ee7 G gg8 H hh

    p2 p1 colX colY9001 1 55 ABC9002 3 77 PQR9003 3 95 DEF9004 7 62 GHI9005 5 50 KLM9006 2 42 STU9007 7 83 XYZ

    one(p1, colA, colB)

    two(p2, p1, colX, colY)

    one × two

    Note the order of the columns; order of rows does not matter

  • Spring 2020 – University of Virginia 12© Praphamontripong

    NATURAL LEFT OUTTER JOINAugment a join by the dangling tuples

    p1 colA colB1 A aa2 B bb3 C cc4 D dd5 E ee7 G gg8 H hh

    p2 p1 colX colY9001 1 55 ABC9002 3 77 PQR9003 3 95 DEF9004 7 62 GHI9005 5 50 KLM9006 2 42 STU9007 7 83 XYZ

    one(p1, colA, colB) two(p2, p1, colX, colY)

    SELECT * FROM one NATURAL LEFT OUTTER JOIN two;

    Join on common named

    attribute(s). If no match

    found, pad with NULL values

    lefttable

    righttable

    LEFT JOIN

    Preserve all rows in left table

  • Spring 2020 – University of Virginia 13© Praphamontripong

    p1 colA colB p2 colX colY1 A aa 9001 55 ABC2 B bb 9006 42 STU3 C cc 9002 77 PQR3 C cc 9003 95 DEF4 D dd NULL NULL NULL5 E ee 9005 50 KLM7 G gg 9004 62 GHI7 G gg 9007 83 XYZ8 H hh NULL NULL NULL

    p1 colA colB1 A aa2 B bb3 C cc4 D dd5 E ee7 G gg8 H hh

    p2 p1 colX colY9001 1 55 ABC9002 3 77 PQR9003 3 95 DEF9004 7 62 GHI9005 5 50 KLM9006 2 42 STU9007 7 83 XYZ

    one(p1, colA, colB) two(p2, p1, colX, colY)

    Note the order of the columns; order of rows does not matter

    NATURAL LEFT OUTTER JOIN

    lefttable

    righttable

    LEFT JOINAugment a join by the dangling tuples

  • Spring 2020 – University of Virginia 14© Praphamontripong

    LEFT OUTTER JOINAugment a join by the dangling tuples

    p1 colA colB1 A aa2 B bb3 C cc4 D dd5 E ee7 G gg8 H hh

    p2 p1 colX colY9001 1 55 ABC9002 3 77 PQR9003 3 95 DEF9004 7 62 GHI9005 5 50 KLM9006 2 42 STU9007 7 83 XYZ

    one(p1, colA, colB) two(p2, p1, colX, colY)

    SELECT * FROM one LEFT OUTTER JOIN twoON one.p1 = two.p1;

    Join on specified attribute(s). If no match

    found, pad with NULL values

    lefttable

    righttable

    LEFT JOIN

    Preserve all rows in left table

  • Spring 2020 – University of Virginia 15© Praphamontripong

    p1 colA colB p2 colX colY1 A aa 9001 55 ABC2 B bb 9006 42 STU3 C cc 9002 77 PQR3 C cc 9003 95 DEF4 D dd NULL NULL NULL5 E ee 9005 50 KLM7 G gg 9004 62 GHI7 G gg 9007 83 XYZ8 H hh NULL NULL NULL

    p1 colA colB1 A aa2 B bb3 C cc4 D dd5 E ee7 G gg8 H hh

    p2 p1 colX colY9001 1 55 ABC9002 3 77 PQR9003 3 95 DEF9004 7 62 GHI9005 5 50 KLM9006 2 42 STU9007 7 83 XYZ

    one(p1, colA, colB) two(p2, p1, colX, colY)

    Note the order of the columns; order of rows does not matter

    LEFT OUTTER JOIN

    lefttable

    righttable

    LEFT JOINAugment a join by the dangling tuples

  • Spring 2020 – University of Virginia 16© Praphamontripong

    NATURAL RIGHT OUTTER JOINAugment a join by the dangling tuples

    p1 colA colB1 A aa2 B bb3 C cc4 D dd5 E ee7 G gg8 H hh

    p2 p1 colX colY9001 1 55 ABC9002 3 77 PQR9003 3 95 DEF9004 7 62 GHI9005 5 50 KLM9006 2 42 STU9007 7 83 XYZ

    one(p1, colA, colB) two(p2, p1, colX, colY)

    SELECT * FROM one NATURAL RIGHT OUTTER JOIN two;

    Join on common named

    attribute(s). If no match

    found, pad with NULL values

    lefttable

    righttable

    RIGHT JOIN

    Preserve all rows in right table

  • Spring 2020 – University of Virginia 17© Praphamontripong

    p1 p2 colX colY colA colB1 9001 55 ABC A aa2 9006 42 STU B bb3 9002 77 PQR C cc3 9003 95 DEF C cc5 9005 50 KLM E ee7 9004 62 GHI G gg7 9007 83 XYZ G gg

    p1 colA colB1 A aa2 B bb3 C cc4 D dd5 E ee7 G gg8 H hh

    p2 p1 colX colY9001 1 55 ABC9002 3 77 PQR9003 3 95 DEF9004 7 62 GHI9005 5 50 KLM9006 2 42 STU9007 7 83 XYZ

    one(p1, colA, colB) two(p2, p1, colX, colY)

    Note the order of the columns; order of rows does not matter

    NATURAL RIGHT OUTTER JOINAugment a join by the dangling tuples

    Since two.p1 is a foreign key referencing one.p1, matches are always founded

    lefttable

    righttable

    RIGHT JOIN

  • Spring 2020 – University of Virginia 18© Praphamontripong

    RIGHT OUTTER JOINAugment a join by the dangling tuples

    p1 colA colB1 A aa2 B bb3 C cc4 D dd5 E ee7 G gg8 H hh

    p2 p1 colX colY9001 1 55 ABC9002 3 77 PQR9003 3 95 DEF9004 7 62 GHI9005 5 50 KLM9006 2 42 STU9007 7 83 XYZ

    one(p1, colA, colB) two(p2, p1, colX, colY)

    Join on specified attribute(s). If no match

    found, pad with NULL values

    lefttable

    righttable

    RIGHT JOIN

    SELECT * FROM one RIGHT OUTTER JOIN twoON one.p1 = two.p1;

    Preserve all rows in right table

  • Spring 2020 – University of Virginia 19© Praphamontripong

    p1 p2 colX colY colA colB1 9001 55 ABC A aa2 9006 42 STU B bb3 9002 77 PQR C cc3 9003 95 DEF C cc5 9005 50 KLM E ee7 9004 62 GHI G gg7 9007 83 XYZ G gg

    p1 colA colB1 A aa2 B bb3 C cc4 D dd5 E ee7 G gg8 H hh

    p2 p1 colX colY9001 1 55 ABC9002 3 77 PQR9003 3 95 DEF9004 7 62 GHI9005 5 50 KLM9006 2 42 STU9007 7 83 XYZ

    one(p1, colA, colB) two(p2, p1, colX, colY)

    Note the order of the columns; order of rows does not matter

    RIGHT OUTTER JOINAugment a join by the dangling tuples

    Since two.p1 is a foreign key referencing one.p1, matches are always founded

    lefttable

    righttable

    RIGHT JOIN

  • Spring 2020 – University of Virginia 20© Praphamontripong

    Self Joins• Join with itself

    • Useful when the table has a FOREIGN KEY which references its own PRIMARY KEY

    • Can be viewed as joining two copies of the same table

    • To do self join, use aliases

  • Spring 2020 – University of Virginia 21© Praphamontripong

    (Challenging) Example: Self JoinsFind names of all TAs who are assigned to Database Systems and Software Analysis

    TA_id Name Year Two_week_hours1234 Minnie 4 202345 Humpty 3 243456 Dumpty 4 303333 Minnie 3 125678 Mickey 2 16

    TA_id Course1234 Database Systems2345 Database Systems3456 Cloud Computing1234 Software Analysis5678 Web Programming Lang.2345 Software Analysis

    Hiring(TA_id, Name, Year, Two_week_hours) TA_Assignment(TA_id, Course)

    Discuss/work with your neighbor. How should we write SQL?

  • Spring 2020 – University of Virginia 22© Praphamontripong

    (Challenging) Example: Self JoinsFind names of all TAs who are assigned to Database Systems and Software Analysis

    TA_id Name Year Two_week_hours1234 Minnie 4 202345 Humpty 3 243456 Dumpty 4 303333 Minnie 3 125678 Mickey 2 16

    TA_id Course1234 Database Systems2345 Database Systems3456 Cloud Computing1234 Software Analysis5678 Web Programming Lang.2345 Software Analysis

    Hiring(TA_id, Name, Year, Two_week_hours) TA_Assignment(TA_id, Course)

    SELECT H.NameFrom Hiring AS H, TA_Assignment AS TWHERE H.TA_id = T.TA_id AND

    T.Course = "Database Systems" AND

    T.Course = "Software Analysis";

    Will this work?

    No.Returns empty set

  • Spring 2020 – University of Virginia 23© Praphamontripong

    (Challenging) Example: Self JoinsFind names of all TAs who are assigned to Database Systems and Software Analysis

    TA_id Name Year Two_week_hours1234 Minnie 4 202345 Humpty 3 243456 Dumpty 4 303333 Minnie 3 125678 Mickey 2 16

    TA_id Course1234 Database Systems2345 Database Systems3456 Cloud Computing1234 Software Analysis5678 Web Programming Lang.2345 Software Analysis

    Hiring(TA_id, Name, Year, Two_week_hours) TA_Assignment(TA_id, Course)

    SELECT H.NameFrom Hiring AS H, TA_Assignment AS T1,

    TA_Assignment AS T2WHERE H.TA_id = T1.TA_id AND

    H.TA_id = T2.TA_id ANDT1.Course = "Database Systems" ANDT2.Course = "Software Analysis";

    Will this work?

    NameMinnie

    Humpty

  • Spring 2020 – University of Virginia 24© Praphamontripong

    (Challenging) Example: Self JoinsFind names of all TAs who are assigned to Database Systems and Software Analysis

    TA_id Name Year Two_week_hours1234 Minnie 4 202345 Humpty 3 243456 Dumpty 4 303333 Minnie 3 125678 Mickey 2 16

    TA_id Course1234 Database Systems2345 Database Systems3456 Cloud Computing1234 Software Analysis5678 Web Programming Lang.2345 Software Analysis

    Hiring(TA_id, Name, Year, Two_week_hours) TA_Assignment(TA_id, Course)

    SELECT H.NameFrom Hiring AS H, TA_Assignment AS T1,

    TA_Assignment AS T2WHERE H.TA_id = T1.TA_id AND

    H.TA_id = T2.TA_id ANDT1.Course = "Database Systems" ANDT2.Course = "Software Analysis";

    Let’s consider each part of the query

    All pairs of courses a TA is assigned

    For each pair, check for courses

  • Spring 2020 – University of Virginia 25© Praphamontripong

    Wrap-UpJoin combines data across tables

    • Nested-loop• Natural join (most common)• Inner join (filter Cartesian product)• Outer joins (preserve non-matching tuples)• Self join pattern

    Different joining techniques can be used to achieve particular goals

    What’s next? • SQL – subqueries, triggers

    lefttable

    righttable

    LEFT JOIN

    lefttable

    righttable

    RIGHT JOIN

    lefttable

    righttable

    NATURAL JOIN

    lefttable

    righttable

    (INNER) JOIN