CSE 232A Graduate Database Systems 1 Fall 2019 Arun Kumar

CSE 232A Graduate Database Systemscseweb.ucsd.edu/classes/fa19/cse232-a/slides/Topic0... · 2019-10-09 · 19 What this course is NOT about NOT a course on basics of relational algebra


CSE 232A Graduate Database Systems


Fall 2019

Arun Kumar


About Myself

2009: Bachelors in CSE from IIT Madras

2009-16: MS and PhD in CS from UW-Madison

PhD thesis area: Data systems for ML workloads

2016-: Asst. Prof. at UC San Diego CSE
2019-: + Asst. Prof. at UC San Diego HDSI

Summers: 110F!

Winters: -40F!

Ahh! :)


What is this course about? Why take it?


1. IBM’s Watson beats humans in Jeopardy!


How did Watson achieve that?


Watson devoured LOTS of data!


2. “Structured” data with Google search results


How does Google know that?


Google also devours LOTS of data!


3. Amazon’s “spot-on” recommendations


How does Amazon know that?


You guessed it! LOTS and LOTS of data!

Analysis


And innumerable “traditional” applications


Large-scale data management systems are the cornerstone of many digital applications, both modern and traditional


The Age of “Big Data”/“Data Science”


Data data everywhere,
All the wallets did shrink!
Data data everywhere,
Nor any moment to think?


CSE 232A will get you thinking about the fundamentals of database systems

1. How are large structured datasets stored and organized?
2. How are “queries” handled?
3. How to make the system faster?
4. Deeper and more recent issues


What this course is NOT about

❖ NOT a course on basics of relational algebra or SQL
    ❖ Take CSE 132A instead (pre-requisite for 232A!)

❖ NOT a course on how to use an RDBMS for database-backed applications (triggers, physical tuning, etc.)
    ❖ Take CSE 132B instead

❖ NOT a course on distributed systems or transactions
    ❖ Take CSE 223B instead

http://cseweb.ucsd.edu/classes/fa19/cse232-a


And now for the (boring) logistics …


Prerequisites

❖ CSE 132A (or equivalent) is essential
❖ CSE 120 is also helpful
❖ For all other cases, email the instructor with proper justification. A waiver can be considered.


Course Administrivia

❖ Lectures: MWF 3:00-3:50pm, Ledden Auditorium
    Attending ALL lectures is mandatory!
❖ Instructor: Arun Kumar; [email protected]
    Office hours: Wed 4-5pm, 3218 CSE (EBU3b)
❖ TAs: Nikos Koulouris, Kaiqi Yao, Aman Achpal, and Allen Ordookhanians
❖ Piazza: TBD


Grading

❖ Midterm Exam 1: 20%
    Date: Friday, October 25; in-class
❖ Midterm Exam 2: 20%
    Date: Monday, November 25; in-class
❖ Final Exam: 60% (cumulative)
    Date: Friday, December 13, 3-6pm; Room TBD
❖ (Optional/Extra Credit) 6 Paper Reviews: 3%


Course Outline

1. How are large structured datasets stored?
    Storage and file layout

2. How are “queries” handled?
    Indexing, sorting, relational operator implementations, and query processing

3. How to make the system faster?
    Query optimization and parallelism

4. Recent issues and trends
    Data systems for ML, data integration and cleaning, semi-structured data, ML for RDBMSs


The primary focus will be the relational data model and Relational Database Management Systems (RDBMS)


Relational model in a nutshell

Basically, Relation:Table :: Pilot:Driver (okay, a bit more)

The model formalizes “operations” to manipulate relations

RatingID  Rating  Date      UserID  MovieID
1         3.5     08/27/15  23294   20
2         4.0     07/20/15  4232    293
3         2.5     08/02/15  54551   846
…         …       …         …       …


Relational model in a nutshell

Invented by E. F. Codd in the 1970s at IBM Research in California


Relational DBMS in a nutshell

A software system to implement the relational model, i.e., enable users to manage data stored as relations


Relational DBMS in a nutshell

First RDBMSs: System R (IBM) and Ingres (Berkeley) in the 1970s

A rare photo of the original System R manual

Mike Stonebraker won the Turing Award in 2015!


Relational DBMS in a nutshell

RDBMS software is now a USD 40+ billion/year industry!

Numerous open source RDBMSs are also popular

People still start companies about what are basically RDBMSs!


Course Textbook

Prescribed:
    “Database Management Systems” 3rd Edition, by Raghu Ramakrishnan and Johannes Gehrke
    Aka The “Cow Book” (Which cow are you?)

Optional:
    “Database Systems: The Complete Book” by Garcia-Molina, Widom, and Ullman
    “Big Data Integration” by Dong and Srivastava


Tentative Course Schedule

Week Topic

0-1 Introduction; Data Storage (Disks; Files; New Hardware)

2-3 Indexing (B+ Tree; Hash Indexes; Learned Indexes); Sorting

3-4 Relational Operator Implementations; Query Processing

4 Midterm Exam 1 on Friday, 10/25

5-6 Query Optimization; Materialized Views

6-7 Parallel RDBMSs; Dataflow Systems; Cloud RDBMSs

7-8 Data Systems for ML Workloads

9 Midterm Exam 2 on Monday, 11/25

9-10 Data Integration and Cleaning; Semi-Structured Data

10 More ML for RDBMSs; Recap

11 Final Exam on Friday, 12/13


General Dos and Do NOTs

Do:
❖ Raise your hand before asking questions during lectures
❖ Discuss papers in the reading list with peers and others
❖ Participate in class discussions; and on Piazza, if you like
❖ Use “CSE232A:” as subject prefix for all emails to me/TAs

Do NOT:
❖ Plagiarize or share content for your paper reviews
❖ Harass, cut off, or be disrespectful to others in class
❖ Use email as primary communication mechanism for doubts/questions instead of Office Hours
❖ Record or quote the instructor’s anecdotes out of class! ☺


Questions?


Example Relational DB: Netflix!

Ratings

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
…         …         …          …       …

Users

UserID  Name   Age  JoinDate
79      Alice  23   01/10/13
…       …      …    …

Movies

MovieID  Name       ReleaseDate  Director
20       Inception  07/13/2010   Christopher Nolan
…        …          …            …


Recap: Relational Model


Relational Model: Basic Terms

Ratings

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
2         4.0       07/20/15   4232    293
3         2.5       08/02/15   54551   846
…         …         …          …       …

What is a Relation?
A glorified table!

What are Attributes?
The named columns of the relation (RatingID, NumStars, Timestamp, UserID, MovieID)

What are Domains?
The mathematical “domains” for the attributes (Integers, Real, …)

What is Arity?
Number of attributes


Relational Model: Basic Terms

Ratings

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
2         4.0       07/20/15   4232    293
3         2.5       08/02/15   54551   846
…         …         …          …       …

What are Tuples?
The rows of the relation

What is Cardinality?
Number of tuples


Referring to “tuples”: Two notations

1. Without using attribute names (positional/sequence)
2. Using attribute names (named/set)

Ratings (R)

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
2         4.0       07/20/15   4232    293
3         2.5       08/02/15   54551   846
…         …         …          …       …

For the first tuple t:
t[1] = 3.5 (positional)
t.NumStars = 3.5 (named)
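The two notations can be tried out with a small Python sketch. It assumes that a `namedtuple` called `Rating` (a hypothetical name, not from the slides) is an acceptable stand-in for a tuple of Ratings. Python indexes from 0, which happens to agree with the slide's `t[1]` referring to NumStars.

```python
from collections import namedtuple

# Hypothetical stand-in for the schema of Ratings on this slide.
Rating = namedtuple(
    "Rating", ["RatingID", "NumStars", "Timestamp", "UserID", "MovieID"]
)

# The first tuple of the example instance.
t = Rating(1, 3.5, "08/27/15", 79, 20)

positional = t[1]    # positional/sequence notation (0-based position)
named = t.NumStars   # named/set notation
```

Both expressions read the same value; only the way the attribute is addressed differs.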


Relational Model: Basic Terms

Ratings

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
2         4.0       07/20/15   4232    293
3         2.5       08/02/15   54551   846
…         …         …          …       …

What is Schema?

The relation name, and the name and logical descriptions of the attributes (including domains)

Aka “metadata”


Relational Model: Basic Terms

Ratings

What is an Instance?

A given relation populated with a set of tuples (loose analogy: schema:instance::type:value in PL)

Instance 1

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
2         4.0       07/20/15   4232    293
3         2.5       08/02/15   54551   846
…         …         …          …       …

Instance 2

RatingID  NumStars  Timestamp  UserID  MovieID
3292      1.5       06/27/14   794     10
294122 4.0 07/10/14 232 32974423 0.5 03/08/14 8451 846
…         …         …          …       …


Relational Model: Basic Terms

What is a Relational Database?

A collection of relations; similarly, schema vs. instance

Ratings

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
…         …         …          …       …

Users

UserID  Name   Age  JoinDate
79      Alice  23   01/10/13
…       …      …    …

Movies

MovieID  Name       ReleaseDate  Director
20       Inception  07/13/2010   Christopher Nolan
…        …          …            …


“Write Operations” on a Relation

❖ Insert: Add tuples to a relation
❖ Delete: Remove tuples from a relation (typically based on “predicate” matches, e.g., “NumStars <= 4.5”)
❖ Modify: Logically, deletes + inserts, but typically implemented as in-place updates to a relation instance
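A minimal sketch of the three write operations, assuming a relation instance is modeled as a Python list of dicts. The helper names `insert`, `delete`, and `modify` and the sample values are made up for illustration; a real RDBMS works on pages and files, not Python lists.

```python
# Relation instance modeled as a list of dicts (an assumption for illustration).
ratings = [
    {"RatingID": 1, "NumStars": 3.5, "UserID": 79, "MovieID": 20},
    {"RatingID": 2, "NumStars": 4.0, "UserID": 80, "MovieID": 20},
    {"RatingID": 3, "NumStars": 5.0, "UserID": 79, "MovieID": 16},
]

def insert(relation, new_tuple):
    """Add one tuple to the relation instance."""
    relation.append(dict(new_tuple))

def delete(relation, predicate):
    """Remove every tuple that matches the predicate."""
    relation[:] = [t for t in relation if not predicate(t)]

def modify(relation, predicate, changes):
    """Update matching tuples in place (logically a delete plus an insert)."""
    for t in relation:
        if predicate(t):
            t.update(changes)

insert(ratings, {"RatingID": 4, "NumStars": 4.5, "UserID": 80, "MovieID": 16})
delete(ratings, lambda t: t["NumStars"] <= 4.5)   # the slide's example predicate
modify(ratings, lambda t: t["RatingID"] == 3, {"NumStars": 4.0})
```

After these calls only the tuple with RatingID 3 remains, and its NumStars has been updated in place.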


“Read Operations” on a Relation

❖ “Select”: Select all tuples from Ratings with “UserID == 19”
❖ “Project”: Select only the Director attribute from Movies
❖ “Aggregate”: Select the average of all NumStars in Ratings

And a few more formal “algebraic” operations …


Recap: Relational Algebra


Basic Relational Operations

Select
Project
Rename
Cross Product (aka Cartesian Product)
Set Operations:
    Union
    Set Difference


Select

Ratings (R)

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
2         4.0       07/20/15   80      20
3         2.5       08/02/14   79      16
4         4.5       03/05/14   80      16

Example: Get all ratings with 4.0 or more stars

$\sigma_{\text{NumStars} \geq 4.0}(R)$

Here $\sigma$ is the Select “operator” and the subscript is the “selection condition/predicate”.


Select

R

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
2         4.0       07/20/15   80      20
3         2.5       08/02/14   79      16
4         4.5       03/05/14   80      16

R’

RatingID  NumStars  Timestamp  UserID  MovieID
2         4.0       07/20/15   80      20
4         4.5       03/05/14   80      16

Schema preserved
Subset of tuples (satisfying selection condition)
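A minimal sketch of Select over the example instance, assuming tuples are modeled as Python dicts and the selection condition as a function; `select` is a hypothetical helper name, not an RDBMS API.

```python
COLUMNS = ("RatingID", "NumStars", "Timestamp", "UserID", "MovieID")

# The example instance R from the slide.
R = [dict(zip(COLUMNS, row)) for row in [
    (1, 3.5, "08/27/15", 79, 20),
    (2, 4.0, "07/20/15", 80, 20),
    (3, 2.5, "08/02/14", 79, 16),
    (4, 4.5, "03/05/14", 80, 16),
]]

def select(relation, predicate):
    """Keep the tuples that satisfy the predicate; the schema is unchanged."""
    return [t for t in relation if predicate(t)]

# "Get all ratings with 4.0 or more stars"
R_prime = select(R, lambda t: t["NumStars"] >= 4.0)
```

The result keeps all five attributes and contains only the tuples with RatingID 2 and 4, matching R’ above.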


Complex Selection Conditions

R

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
2         4.0       07/20/15   80      20
3         2.5       08/02/14   79      16
4         4.5       03/05/14   80      16

R’

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20


Project

Ratings (R)

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
2         4.0       07/20/15   80      20
3         2.5       08/02/14   79      16
4         4.5       03/05/14   80      16

Example: Get all MovieIDs

$\pi_{\text{MovieID}}(R)$

The subscript is the “projection list”.


Project

R

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
2         4.0       07/20/15   80      20
3         2.5       08/02/14   79      16
4         4.5       03/05/14   80      16

R’

MovieID
20
16

Schema reduced (to projection list)

Tuple values “deduplicated” (slightly different semantics in SQL)
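A minimal sketch of Project with set semantics, assuming tuples are modeled as Python dicts; `project` is a hypothetical helper name. Duplicates are dropped explicitly, which is the relational-algebra behavior described above (SQL keeps duplicates unless DISTINCT is used).

```python
COLUMNS = ("RatingID", "NumStars", "Timestamp", "UserID", "MovieID")

# The example instance R from the slide.
R = [dict(zip(COLUMNS, row)) for row in [
    (1, 3.5, "08/27/15", 79, 20),
    (2, 4.0, "07/20/15", 80, 20),
    (3, 2.5, "08/02/14", 79, 16),
    (4, 4.5, "03/05/14", 80, 16),
]]

def project(relation, attrs):
    """Keep only the attributes in the projection list and drop duplicates."""
    seen, out = set(), []
    for t in relation:
        key = tuple(t[a] for a in attrs)
        if key not in seen:          # set semantics: each value combination once
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

# "Get all MovieIDs"
R_prime = project(R, ["MovieID"])
```

Four input tuples collapse to two output tuples because MovieID 20 and 16 each appear twice in R.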


Composition of Relational Ops

R

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
2         4.0       07/20/15   80      20
3         2.5       08/02/14   79      16
4         4.5       03/05/14   80      16

Example: Get UserID and NumStars of ratings with less than 3 stars

$\pi_{\text{UserID, NumStars}}(\sigma_{\text{NumStars} < 3}(R))$

R’

UserID  NumStars
79      2.5

RelOp(Relation) gives a Relation
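Because every operator takes a relation and returns a relation, operators nest like ordinary function calls. The sketch below assumes tuples are Python dicts; `select` and `project` are hypothetical helpers written for this example.

```python
COLUMNS = ("RatingID", "NumStars", "Timestamp", "UserID", "MovieID")

R = [dict(zip(COLUMNS, row)) for row in [
    (1, 3.5, "08/27/15", 79, 20),
    (2, 4.0, "07/20/15", 80, 20),
    (3, 2.5, "08/02/14", 79, 16),
    (4, 4.5, "03/05/14", 80, 16),
]]

def select(relation, predicate):
    """Keep tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, attrs):
    """Keep only the listed attributes, removing duplicate tuples."""
    out = []
    for t in relation:
        reduced = {a: t[a] for a in attrs}
        if reduced not in out:
            out.append(reduced)
    return out

# Select first, then Project: the output of one operator feeds the next.
R_prime = project(select(R, lambda t: t["NumStars"] < 3), ["UserID", "NumStars"])
```

Only the rating with 2.5 stars survives the Select, so the final relation has a single tuple with two attributes.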


Rename

Ratings (R)

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
2         4.0       07/20/15   80      20
3         2.5       08/02/14   79      16
4         4.5       03/05/14   80      16

Example: Rename Timestamp to RateDate

$\rho_{C(2 \to \text{RateDate})}(R)$

$\rho_{\text{RatingID, NumStars, RateDate, UserID, MovieID}}(R)$


Rename

R

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
2         4.0       07/20/15   80      20
3         2.5       08/02/14   79      16
4         4.5       03/05/14   80      16

R’

RatingID  NumStars  RateDate  UserID  MovieID
1         3.5       08/27/15  79      20
2         4.0       07/20/15  80      20
3         2.5       08/02/14  79      16
4         4.5       03/05/14  80      16

Instance preserved
Schema modified

$\rho_{C(2 \to \text{RateDate})}(R)$
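A minimal sketch of Rename, assuming a relation is modeled as a schema tuple plus a list of value tuples. `rename_column` is a hypothetical helper, and positions are taken to be 0-based, which is consistent with position 2 naming Timestamp in the slide's expression.

```python
schema = ("RatingID", "NumStars", "Timestamp", "UserID", "MovieID")
rows = [
    (1, 3.5, "08/27/15", 79, 20),
    (2, 4.0, "07/20/15", 80, 20),
    (3, 2.5, "08/02/14", 79, 16),
    (4, 4.5, "03/05/14", 80, 16),
]

def rename_column(schema, position, new_name):
    """Return a new schema with the attribute at `position` (0-based) renamed."""
    renamed = list(schema)
    renamed[position] = new_name
    return tuple(renamed)

new_schema = rename_column(schema, 2, "RateDate")
new_rows = list(rows)   # instance preserved: no tuple value changes
```

Only the metadata changes; the tuples of R’ are exactly the tuples of R.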


Cross Product

Users (U)

UserID  Name   Age  JoinDate
79      Alice  23   01/10/13
80      Bob    41   05/10/13

Movies (M)

MovieID  Name       ReleaseYear  Director
20       Inception  2010         Christopher Nolan
16       Avatar     2009         Jim Cameron

❖ Cartesian product (construct all pairs of tuples across tables)
❖ Schema of output “concatenates” the input schemas
❖ Be careful with attribute name conflicts! Use Rename op


Cross Product

Users (U)

UserID  Name   Age  JoinDate
79      Alice  23   01/10/13
80      Bob    41   05/10/13

Movies (M)

MovieID  Name       ReleaseYear  Director
20       Inception  2010         Christopher Nolan
16       Avatar     2009         Jim Cameron

$\rho_{C(1 \to \text{U.Name})}(U) \times \rho_{C(1 \to \text{M.Name})}(M)$

UserID  U.Name  Age  JoinDate  MovieID  M.Name     ReleaseYear  Director
79      Alice   23   01/10/13  20       Inception  2010         Christopher Nolan
79      Alice   23   01/10/13  16       Avatar     2009         Jim Cameron
80      Bob     41   05/10/13  20       Inception  2010         Christopher Nolan
80      Bob     41   05/10/13  16       Avatar     2009         Jim Cameron
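A minimal sketch of Cross Product, assuming each relation is a schema tuple plus a list of value tuples and that the Name attributes have already been renamed as in the expression above. `cross_product` is a hypothetical helper that refuses to run while attribute names still clash.

```python
from itertools import product

# Schemas after the Rename step (position 1 renamed in each input).
users_schema = ("UserID", "U.Name", "Age", "JoinDate")
users = [(79, "Alice", 23, "01/10/13"), (80, "Bob", 41, "05/10/13")]

movies_schema = ("MovieID", "M.Name", "ReleaseYear", "Director")
movies = [
    (20, "Inception", 2010, "Christopher Nolan"),
    (16, "Avatar", 2009, "Jim Cameron"),
]

def cross_product(schema_a, rows_a, schema_b, rows_b):
    """Pair every tuple of A with every tuple of B; schemas are concatenated."""
    clash = set(schema_a) & set(schema_b)
    if clash:
        raise ValueError(f"rename conflicting attributes first: {sorted(clash)}")
    return schema_a + schema_b, [a + b for a, b in product(rows_a, rows_b)]

out_schema, out_rows = cross_product(users_schema, users, movies_schema, movies)
```

Two users times two movies gives four output tuples, each with eight attributes.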


Union

R1

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
2         4.0       07/20/15   80      20
3         2.5       08/02/14   79      16

R2

RatingID  NumStars  Timestamp  UserID  MovieID
3         2.5       08/02/14   79      16
4         4.5       03/05/14   80      16
5         5.0       06/09/13   135     20

Union of sets of tuples (instances)

Inputs must have identical schema: “Union-compatible”


Union

R’ = R1 ∪ R2

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
2         4.0       07/20/15   80      20
3         2.5       08/02/14   79      16
4         4.5       03/05/14   80      16
5         5.0       06/09/13   135     20

 


Set Difference

R1

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
2         4.0       07/20/15   80      20
3         2.5       08/02/14   79      16

R2

RatingID  NumStars  Timestamp  UserID  MovieID
3         2.5       08/02/14   79      16
4         4.5       03/05/14   80      16
5         5.0       06/09/13   135     20

Set difference of sets of tuples (instances)
Inputs must be “Union-compatible”


Set Difference

R1

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
2         4.0       07/20/15   80      20
3         2.5       08/02/14   79      16

R2

RatingID  NumStars  Timestamp  UserID  MovieID
3         2.5       08/02/14   79      16
4         4.5       03/05/14   80      16
5         5.0       06/09/13   135     20

R’ = R1 – R2

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
2         4.0       07/20/15   80      20


Derived and Other Relational Ops

Set Operation: Intersection
Join
Group By Aggregate


Intersection

R1

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
2         4.0       07/20/15   80      20
3         2.5       08/02/14   79      16

R2

RatingID  NumStars  Timestamp  UserID  MovieID
3         2.5       08/02/14   79      16
4         4.5       03/05/14   80      16
5         5.0       06/09/13   135     20

Set intersection of sets of tuples (instances)
Inputs must be “Union-compatible”


Intersection

R1

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
2         4.0       07/20/15   80      20
3         2.5       08/02/14   79      16

R2

RatingID  NumStars  Timestamp  UserID  MovieID
3         2.5       08/02/14   79      16
4         4.5       03/05/14   80      16
5         5.0       06/09/13   135     20

R’ = R1 ∩ R2

RatingID  NumStars  Timestamp  UserID  MovieID
3         2.5       08/02/14   79      16
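Intersection is listed as a derived operation because it can be written with set difference alone: R1 ∩ R2 = R1 – (R1 – R2). A minimal sketch, assuming union-compatible instances are modeled as Python sets of value tuples:

```python
# Union-compatible instances from the slides, as sets of tuples.
R1 = {
    (1, 3.5, "08/27/15", 79, 20),
    (2, 4.0, "07/20/15", 80, 20),
    (3, 2.5, "08/02/14", 79, 16),
}
R2 = {
    (3, 2.5, "08/02/14", 79, 16),
    (4, 4.5, "03/05/14", 80, 16),
    (5, 5.0, "06/09/13", 135, 20),
}

union = R1 | R2              # basic operation
difference = R1 - R2         # basic operation
intersection = R1 - (R1 - R2)  # derived: built from set difference only
```

The derived expression agrees with Python's built-in `R1 & R2`, and the three results match the Union, Set Difference, and Intersection outputs shown on the slides.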

 


Join

$R \bowtie_{\text{JoinCondition}} M$

❖ Equivalent to a Select on the Cross Product, $\sigma_{\text{JoinCondition}}(R \times M)$, but “bypasses” the full cross product

❖ Perhaps the most intensively studied Rel Op!
❖ Several “types” of Joins:
    Natural Join and Equi-Join
    Condition Join (aka Theta Join)
    Semi-Join, Inner Join, Outer Join, Anti-Join, etc.


Natural Join

Movies (M)

MovieID  Name       ReleaseYear  Director
20       Inception  2010         Christopher Nolan
16       Avatar     2009         Jim Cameron

Ratings (R)

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
2         4.0       07/20/15   80      20
3         2.5       08/02/14   79      16
4         4.5       03/05/14   80      16

$R \bowtie M$


Natural Join

$R \bowtie M$

RatingID  NumStars  Timestamp  UserID  MovieID  Name       ReleaseYear  Director
1         3.5       08/27/15   79      20       Inception  2010         Christopher Nolan
2         4.0       07/20/15   80      20       Inception  2010         Christopher Nolan
3         2.5       08/02/14   79      16       Avatar     2009         Jim Cameron
4         4.5       03/05/14   80      16       Avatar     2009         Jim Cameron

❖ “Join attributes”: attributes that determine matching tuples
    Have same name in both inputs! “MovieID” in R and M
❖ Implicit equality condition on join attributes
    If > 1 pair, implicit logical “and” of all equality terms
❖ Output schema concatenates input schemas
    But join attributes appear only once in output (Project)
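A minimal sketch of Natural Join as a nested loop, assuming tuples are Python dicts. `natural_join` is a hypothetical helper: it finds the attribute names shared by both inputs, requires equality on all of them, and emits each shared attribute once. Real systems use far more efficient join algorithms.

```python
R = [  # Ratings
    {"RatingID": 1, "NumStars": 3.5, "Timestamp": "08/27/15", "UserID": 79, "MovieID": 20},
    {"RatingID": 2, "NumStars": 4.0, "Timestamp": "07/20/15", "UserID": 80, "MovieID": 20},
    {"RatingID": 3, "NumStars": 2.5, "Timestamp": "08/02/14", "UserID": 79, "MovieID": 16},
    {"RatingID": 4, "NumStars": 4.5, "Timestamp": "03/05/14", "UserID": 80, "MovieID": 16},
]
M = [  # Movies
    {"MovieID": 20, "Name": "Inception", "ReleaseYear": 2010, "Director": "Christopher Nolan"},
    {"MovieID": 16, "Name": "Avatar", "ReleaseYear": 2009, "Director": "Jim Cameron"},
]

def natural_join(left, right):
    """Nested-loop natural join: equality on every same-named attribute."""
    if not left or not right:
        return []
    shared = [a for a in left[0] if a in right[0]]   # the join attributes
    out = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in shared):    # implicit "and" of equalities
                out.append({**l, **r})               # shared attributes appear once
    return out

joined = natural_join(R, M)
```

Every rating matches exactly one movie here, so the output has four tuples with eight attributes each.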


Equi-Join

$R \bowtie_{\text{EqualityCondition}} M$

❖ Generalization of the Natural Join

❖ Attribute names of “join attributes” need not be the same
❖ EqualityCondition is a general boolean expression (logical “and”, and/or “or”) of terms with equality predicates only
❖ Join attributes from both R and M in output (no Project)

Perhaps the most important and common type of Join! Lots of R&D on efficient implementations!


Equi-Join: Example

R1(J,K)

J   K
10  x
20  y
30  x

R2(P,Q)

P  Q
x  4
y  9
x  8

$T(J,K,P,Q) = R_1 \bowtie_{K=P} R_2$

T(J,K,P,Q)

J   K  P  Q
10  x  x  4
20  y  y  9
30  x  x  8
10  x  x  8
30  x  x  4
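The example can be checked with a short sketch that evaluates the equi-join the naive way, as a Select over the Cross Product. Relations are assumed to be lists of value tuples; an actual RDBMS would avoid materializing the full cross product.

```python
from itertools import product

R1 = [(10, "x"), (20, "y"), (30, "x")]   # schema (J, K)
R2 = [("x", 4), ("y", 9), ("x", 8)]      # schema (P, Q)

# Select over the cross product with the equality condition K = P.
# Both join attributes K and P stay in the output (no Project).
T = [(j, k, p, q) for (j, k), (p, q) in product(R1, R2) if k == p]
```

Each `x` tuple of R1 matches both `x` tuples of R2 and the `y` tuple matches once, giving the five tuples of T shown above (in a different order).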


(Primary) Key-Foreign Key Join

❖ A special kind of equi-join
❖ One of the join attributes is the (Primary) Key of an input relation; the other is a Foreign Key in the other relation

Ratings

RatingID  NumStars  Timestamp  UID  MovieID

Users

UserID  Name  Age  JoinDate

$\text{Ratings} \bowtie_{\text{UID} = \text{UserID}} \text{Users}$

Also a common and important (sub) type of join with even more specialized efficient implementations!


Condition Join (aka “Theta” Join)

$R \bowtie_{\text{JoinCondition}} M$

❖ Generalization of the Equi-Join


Condition Join: Example

R1(J,K)

J   K
10  x
20  y
30  x

R2(P,Q)

P  Q
x  4
y  9
x  12

$T(J,K,P,Q) = R_1 \bowtie_{J/2 > Q} R_2$

T(J,K,P,Q)

J   K  P  Q
10  x  x  4
20  y  x  4
20  y  y  9
30  x  x  4
30  x  y  9
30  x  x  12

Perhaps the most difficult type of Join to implement efficiently!
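The same naive evaluation works for a condition join: only the predicate changes. A minimal sketch, again assuming relations are lists of value tuples:

```python
from itertools import product

R1 = [(10, "x"), (20, "y"), (30, "x")]   # schema (J, K)
R2 = [("x", 4), ("y", 9), ("x", 12)]     # schema (P, Q)

# Theta join with the non-equality condition J/2 > Q.
T = [(j, k, p, q) for (j, k), (p, q) in product(R1, R2) if j / 2 > q]
```

Since the condition is an inequality, hashing or sorting on a single matching value does not apply directly, which hints at why such joins are harder to implement efficiently than equi-joins.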


Join Expressions

79

Can compose many joins into a single complex expression

RatingID NumStars Timestamp UserID MovieID

UserID UName Age JoinDate

MovieID Name ReleaseYear Director

Ratings

UsersMovies

 

Q. What do we get as the output?

UserID

UName

Age

JoinDate

RatingID

NumStars

Timestamp

MovieID

Name ReleaseYEar

Director

AllStuff

Taxonomy of Joins

All kinds of joins
Inner joins
Outer joins
Semi joins
Anti joins
Theta joins
Equi joins
Natural joins
Key-Foreign Key joins
“Snowflake” joins
“Star” joins

Derived and Other Relational Ops

Set Operation: Intersection
Join
Group By Aggregate

Group By Aggregate

❖ NOT a part of relational algebra, but “Extended RA”!
❖ Useful for “analytics” queries that aggregate numerical data

Ratings(RatingID, NumStars, Timestamp, UserID, MovieID)

What is the average rating for each movie?
How many movies has each user rated?

❖ Standard 5 numerical aggregations supported in SQL: Count, Sum, Average, Maximum, and Minimum
Extra: Median, Mode, Variance, Standard Deviation, etc.

Group By Aggregate

$\gamma_{X,\,\mathrm{Agg}(Y)}(R)$

X: “Grouping Attributes” (subset of R’s attributes)
Agg: “Aggregate Function” (SUM, COUNT, AVG, MAX, MIN)
Y: a numerical attribute in R

❖ Output schema will have X and an extra numerical attribute (result of the aggregate function)
❖ Can list multiple aggregate functions in the same operation

Group By Aggregate

What is the average rating for each movie?

R
RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
2         4.0       07/20/15   80      20
3         2.5       08/02/14   79      16
4         4.5       03/05/14   80      16

$\gamma_{MovieID,\,\mathrm{AVG}(NumStars)}(R)$

R’
MovieID  AVG(NumStars)
20       3.75
16       3.5
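The grouping-then-aggregating behaviour of this operator can be sketched in Python. The helper names `group_by_aggregate` and `avg`, and the list-of-dicts layout, are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict

def group_by_aggregate(rows, grouping_attrs, agg, agg_attr):
    """Sketch of the Group By Aggregate operator: one output entry per group."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in grouping_attrs)  # values of the grouping attributes X
        groups[key].append(row[agg_attr])            # collect values of Y per group
    return {key: agg(values) for key, values in groups.items()}

def avg(values):
    return sum(values) / len(values)

R = [
    {"RatingID": 1, "NumStars": 3.5, "UserID": 79, "MovieID": 20},
    {"RatingID": 2, "NumStars": 4.0, "UserID": 80, "MovieID": 20},
    {"RatingID": 3, "NumStars": 2.5, "UserID": 79, "MovieID": 16},
    {"RatingID": 4, "NumStars": 4.5, "UserID": 80, "MovieID": 16},
]

avg_by_movie = group_by_aggregate(R, ["MovieID"], avg, "NumStars")
print(avg_by_movie)

# An empty list of grouping attributes puts every tuple in a single group.
overall_avg = group_by_aggregate(R, [], avg, "NumStars")
print(overall_avg)
```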

Group By Aggregate

What is the average age of the users?

Users (U)
UserID  Name   Age  JoinDate
79      Alice  23   01/10/13
80      Bob    41   05/10/13
123     Carol  19   08/09/14
420     Dan    20   03/01/15

$\gamma_{\mathrm{AVG}(Age)}(U)$

U’
AVG(Age)
25.75

The set of Grouping Attributes can be empty too!



Recap: SQL

Basic Form of an SQL Query

SELECT [DISTINCT] target-list
FROM relation-list
[WHERE condition]

DISTINCT: Optional
target-list: List of attributes to project
relation-list: List of relations (possibly with “aliases”)
condition: Selection/join condition (optional)

What does it mean logically?

SELECT [DISTINCT] target-list
FROM relation-list
[WHERE condition]

1. Cross-product of relations in relation-list
2. If condition given, apply it to filter out tuples
3. Remove attributes not present in target-list
4. If DISTINCT given, deduplicate tuples in result

The above is only a logical interpretation. It is NOT a “plan” an RDBMS would use in general to run an SQL query!
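The four logical steps can be followed literally in a small Python sketch. The function `logical_select`, its parameters, and the "alias.attr" naming of attributes are assumptions made for illustration. As noted above, this mirrors the logical interpretation only and is not how an RDBMS would execute the query.

```python
from itertools import product

def logical_select(relations, target_list, condition=None, distinct=False):
    """Evaluate SELECT [DISTINCT] target-list FROM relation-list [WHERE condition]
    by applying the four logical steps in order (hypothetical helper)."""
    aliases = list(relations)
    # Step 1: cross-product of all relations; attributes are named "alias.attr".
    rows = []
    for combo in product(*(relations[a] for a in aliases)):
        row = {}
        for alias, tup in zip(aliases, combo):
            for attr, value in tup.items():
                row[f"{alias}.{attr}"] = value
        rows.append(row)
    # Step 2: keep only the tuples that satisfy the condition, if one is given.
    if condition is not None:
        rows = [row for row in rows if condition(row)]
    # Step 3: drop attributes that are not in the target-list.
    result = [tuple(row[attr] for attr in target_list) for row in rows]
    # Step 4: deduplicate only when DISTINCT is requested.
    if distinct:
        result = list(dict.fromkeys(result))
    return result

movies = [
    {"MovieID": 20, "Name": "Inception", "Year": 2010, "Director": "Christopher Nolan"},
    {"MovieID": 16, "Name": "Avatar", "Year": 2009, "Director": "Jim Cameron"},
    {"MovieID": 53, "Name": "Gravity", "Year": 2013, "Director": "Alfonso Cuaron"},
    {"MovieID": 74, "Name": "Blue Jasmine", "Year": 2013, "Director": "Woody Allen"},
]

names_2013 = logical_select({"M": movies}, ["M.Name"], lambda row: row["M.Year"] == 2013)
years = logical_select({"M": movies}, ["M.Year"])
distinct_years = logical_select({"M": movies}, ["M.Year"], distinct=True)
print(names_2013, years, distinct_years)
```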

Example SQL Query

Example: Get the names of movies released in 2013

SELECT M.Name
FROM Movies M
WHERE M.Year = 2013

$\pi_{Name}(\sigma_{Year=2013}(M))$

Movies (M)
MovieID  Name          Year  Director
20       Inception     2010  Christopher Nolan
16       Avatar        2009  Jim Cameron
53       Gravity       2013  Alfonso Cuaron
74       Blue Jasmine  2013  Woody Allen

Example SQL Query

SELECT M.Name
FROM Movies M
WHERE M.Year = 2013

Result:
Name
Gravity
Blue Jasmine

Example SQL Query

SELECT *
FROM Movies M
WHERE M.Year = 2013

Movies (M)
MovieID  Name          Year  Director
20       Inception     2010  Christopher Nolan
16       Avatar        2009  Jim Cameron
53       Gravity       2013  Alfonso Cuaron
74       Blue Jasmine  2013  Woody Allen

Result:
MovieID  Name          Year  Director
53       Gravity       2013  Alfonso Cuaron
74       Blue Jasmine  2013  Woody Allen

Example SQL Query

Example: Get the names of movies from years other than 2013

SELECT M.Name
FROM Movies M
WHERE M.Year <> 2013

$\pi_{Name}(\sigma_{Year \neq 2013}(M))$

Movies (M)
MovieID  Name          Year  Director
20       Inception     2010  Christopher Nolan
16       Avatar        2009  Jim Cameron
53       Gravity       2013  Alfonso Cuaron
74       Blue Jasmine  2013  Woody Allen

Example SQL Query

Example: For which years do we have movie data?

SELECT M.Year
FROM Movies M

$\pi_{Year}(M)$

Movies (M)
MovieID  Name          Year  Director
20       Inception     2010  Christopher Nolan
16       Avatar        2009  Jim Cameron
53       Gravity       2013  Alfonso Cuaron
74       Blue Jasmine  2013  Woody Allen

Example SQL Query

SELECT M.Year
FROM Movies M

Result:
Year
2010
2009
2013
2013

SQL allows repetitions of tuples in a relation!
Not the same semantics as RA’s Project
Called “bag semantics” vs. RA’s set semantics

DISTINCT in SQL

SELECT DISTINCT M.Year
FROM Movies M

Movies (M)
MovieID  Name          Year  Director
20       Inception     2010  Christopher Nolan
16       Avatar        2009  Jim Cameron
53       Gravity       2013  Alfonso Cuaron
74       Blue Jasmine  2013  Woody Allen

Result:
Year
2010
2009
2013

DISTINCT needed to achieve set semantics of RA’s Project in SQL
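The difference between bag and set semantics can be checked directly with Python's built-in `sqlite3` module. This is a minimal sketch that assumes an in-memory SQLite database loaded with the Movies instance from the slides. The column types in `CREATE TABLE` are guesses made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movies (MovieID INTEGER, Name TEXT, Year INTEGER, Director TEXT)")
conn.executemany("INSERT INTO Movies VALUES (?, ?, ?, ?)", [
    (20, "Inception", 2010, "Christopher Nolan"),
    (16, "Avatar", 2009, "Jim Cameron"),
    (53, "Gravity", 2013, "Alfonso Cuaron"),
    (74, "Blue Jasmine", 2013, "Woody Allen"),
])

# Plain SELECT keeps duplicates (bag semantics).
bag_years = sorted(year for (year,) in conn.execute("SELECT M.Year FROM Movies M"))
# SELECT DISTINCT removes them (set semantics, like RA's Project).
set_years = sorted(year for (year,) in conn.execute("SELECT DISTINCT M.Year FROM Movies M"))

print(bag_years)
print(set_years)
```

The results are sorted in Python only to make the comparison deterministic; neither query promises an output order.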

Aliases in SQL

Movies (M)
MovieID  Name          Year  Director
20       Inception     2010  Christopher Nolan
16       Avatar        2009  Jim Cameron
53       Gravity       2013  Alfonso Cuaron
74       Blue Jasmine  2013  Woody Allen

SELECT M.Name
FROM Movies M
WHERE M.Year = 2013

Why bother with the alias? Not needed here!

SELECT Name
FROM Movies
WHERE Year = 2013

Aliases in SQL – Useful for Joins!

Movies (M): MovieID, Name, Year, DirectorID
Directors (D): DID, Name, Age

Example: Get names of movies directed by “Jim Cameron”

SELECT M.Name
FROM Movies M, Directors D
WHERE D.Name = 'Jim Cameron'
  AND M.DirectorID = D.DID

Aliases help disambiguate attributes with the same name from multiple relations (or even a self-join!)

More SQL Examples

Movies (M): MovieID, Name, Year, DirectorID
Directors (D): DID, Name, Age

Example: Get names of movies released in 2013 by Woody Allen or some other director 50 years or older

SELECT M.Name
FROM Movies M, Directors D
WHERE (D.Name = 'Woody Allen' OR D.Age >= 50)
  AND M.Year = 2013
  AND M.DirectorID = D.DID

LIKE in SQL

Example: Get the directors of movies whose names start with “Blue”

SELECT DISTINCT M.Director
FROM Movies M
WHERE M.Name LIKE 'Blue%'

Movies (M)
MovieID  Name          Year  Director
20       Inception     2010  Christopher Nolan
16       Avatar        2009  Jim Cameron
53       Gravity       2013  Alfonso Cuaron
74       Blue Jasmine  2013  Woody Allen

“%” matches any number of characters; “_” matches exactly one character
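The two wildcards can be tried out against the same Movies instance using Python's `sqlite3` module. The helper `names_like` and the column types are assumptions of this sketch. Note that SQLite's `LIKE` is case-insensitive for ASCII letters by default, which other systems may not share.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movies (MovieID INTEGER, Name TEXT, Year INTEGER, Director TEXT)")
conn.executemany("INSERT INTO Movies VALUES (?, ?, ?, ?)", [
    (20, "Inception", 2010, "Christopher Nolan"),
    (16, "Avatar", 2009, "Jim Cameron"),
    (53, "Gravity", 2013, "Alfonso Cuaron"),
    (74, "Blue Jasmine", 2013, "Woody Allen"),
])

def names_like(pattern):
    """Return movie names matching a LIKE pattern (hypothetical helper)."""
    rows = conn.execute(
        "SELECT M.Name FROM Movies M WHERE M.Name LIKE ? ORDER BY M.Name", (pattern,))
    return [name for (name,) in rows]

prefix_match = names_like("Blue%")      # starts with "Blue"
single_char_match = names_like("_vatar")  # exactly one character, then "vatar"
substring_match = names_like("%ce%")    # contains "ce" anywhere

print(prefix_match, single_char_match, substring_match)
```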

ORDER BY in SQL

SELECT M.Name
FROM Movies M
WHERE M.Year = 2013
ORDER BY M.Name

Movies (M)
MovieID  Name          Year  Director
20       Inception     2010  Christopher Nolan
16       Avatar        2009  Jim Cameron
53       Gravity       2013  Alfonso Cuaron
74       Blue Jasmine  2013  Woody Allen

Result without ORDER BY:
Name
Gravity
Blue Jasmine

Result with ORDER BY M.Name:
Name
Blue Jasmine
Gravity

Useful for data readability
Ordering defined by domain semantics
Can specify DESC; multiple attributes

LIMIT in SQL

SELECT M.Name
FROM Movies M
WHERE M.Year >= 2010
ORDER BY M.Year
LIMIT 2

Movies (M)
MovieID  Name          Year  Director
20       Inception     2010  Christopher Nolan
16       Avatar        2009  Jim Cameron
53       Gravity       2013  Alfonso Cuaron
74       Blue Jasmine  2013  Woody Allen

Result before LIMIT 2 is applied:
Name
Inception
Gravity
Blue Jasmine

LIMIT 2 keeps only the first 2 tuples of this result.

Also useful for data readability
Prevents “flooding” of screen with data
Be wary of using it without ORDER BY!
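ORDER BY and LIMIT can be combined as in the following `sqlite3` sketch. It assumes the same Movies instance. The extra sort key `M.Name` is an addition made here so that the tie between the two 2013 movies is broken deterministically; with `ORDER BY M.Year` alone, which of them survives `LIMIT 2` is not specified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movies (MovieID INTEGER, Name TEXT, Year INTEGER, Director TEXT)")
conn.executemany("INSERT INTO Movies VALUES (?, ?, ?, ?)", [
    (20, "Inception", 2010, "Christopher Nolan"),
    (16, "Avatar", 2009, "Jim Cameron"),
    (53, "Gravity", 2013, "Alfonso Cuaron"),
    (74, "Blue Jasmine", 2013, "Woody Allen"),
])

# Two oldest movies from 2010 onward; ties on Year are broken by Name.
first_two = [name for (name,) in conn.execute(
    "SELECT M.Name FROM Movies M WHERE M.Year >= 2010 "
    "ORDER BY M.Year, M.Name LIMIT 2")]

# DESC reverses the order; multiple attributes can each carry their own direction.
latest = [name for (name,) in conn.execute(
    "SELECT M.Name FROM Movies M ORDER BY M.Year DESC, M.Name DESC LIMIT 1")]

print(first_two, latest)
```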

Aggregate Functions in SQL

How many movies came out after 2010?

SELECT COUNT(*)
FROM Movies M
WHERE M.Year > 2010

Movies (M)
MovieID  Name          Year  Director
20       Inception     2010  Christopher Nolan
16       Avatar        2009  Jim Cameron
53       Gravity       2013  Alfonso Cuaron
91       Interstellar  2014  Christopher Nolan

Aggregate Functions in SQL

How many movies came out after 2010?

SELECT COUNT(*)
FROM Movies M
WHERE M.Year > 2010

Result:
COUNT(*)
2

5 Native Aggregate Functions in SQL

❖ COUNT ([DISTINCT] attribute)
❖ AVG ([DISTINCT] attribute)
❖ SUM ([DISTINCT] attribute)
❖ MAX (attribute)
❖ MIN (attribute)

Aggregate Functions in SQL

How many directors do we have?

SELECT COUNT(DISTINCT M.Director)
FROM Movies M

Movies (M)
MovieID  Name          Year  Director
20       Inception     2010  Christopher Nolan
16       Avatar        2009  Jim Cameron
53       Gravity       2013  Alfonso Cuaron
91       Interstellar  2014  Christopher Nolan

Aggregate Functions in SQL

Which MovieID(s) have the highest rating?

Ratings (R)
RatingID  Stars  UserID  MovieID
1         3.5    79      42
2         4.0    80      20
3         2.5    79      53
4         4.5    123     42

SELECT R.MovieID, MAX(R.Stars)
FROM Ratings R

Other attributes NOT allowed in the target-list as such!

Aggregate Functions in SQL

Which MovieID(s) have the highest rating?

Ratings (R)
RatingID  Stars  UserID  MovieID
1         3.5    79      42
2         4.0    80      20
3         2.5    79      53
4         4.5    123     42

SELECT DISTINCT R.MovieID
FROM Ratings R
WHERE R.Stars = (SELECT MAX(R2.Stars) FROM Ratings R2)
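The subquery formulation can be run against the Ratings instance with Python's `sqlite3` module. This is a small sketch assuming an in-memory database; the column types are guesses made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Ratings (RatingID INTEGER, Stars REAL, UserID INTEGER, MovieID INTEGER)")
conn.executemany("INSERT INTO Ratings VALUES (?, ?, ?, ?)", [
    (1, 3.5, 79, 42),
    (2, 4.0, 80, 20),
    (3, 2.5, 79, 53),
    (4, 4.5, 123, 42),
])

# The inner query computes the single highest Stars value;
# the outer query returns every MovieID that has a rating equal to it.
top_movies = [movie_id for (movie_id,) in conn.execute("""
    SELECT DISTINCT R.MovieID
    FROM Ratings R
    WHERE R.Stars = (SELECT MAX(R2.Stars) FROM Ratings R2)
""")]

print(top_movies)
```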

Group By Aggregate in SQL

$\gamma_{X,\,\mathrm{Agg}(Y)}(R)$

SELECT [DISTINCT] target-list
FROM relation-list
[WHERE condition]
GROUP BY grouping-list
HAVING group-condition

grouping-list: X
group-condition: Condition on each group in aggregate
target-list must be in this form: X’, Agg(Y), where X’ is a subset of X

Group By Aggregate in SQL

What is the average rating for each movie?

Ratings (R)
RatingID  Stars  UserID  MovieID
1         3.5    79      42
2         4.0    80      20
3         2.5    79      53
4         4.5    123     42

SELECT R.MovieID, AVG(R.Stars) AS AvgRating
FROM Ratings R
GROUP BY R.MovieID

$\gamma_{MovieID,\,\mathrm{AVG}(Stars)}(R)$

Group By Aggregate in SQL

SELECT R.MovieID, AVG(R.Stars) AS AvgRating
FROM Ratings R
GROUP BY R.MovieID

Result:
MovieID  AvgRating
20       4.0
42       4.0
53       2.5

One tuple in output per unique value of R.MovieID (aka “group”)
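The GROUP BY query, and the effect of a HAVING clause on it, can be checked with a short `sqlite3` sketch. The in-memory database and column types are assumptions. The `ORDER BY` clauses and the specific HAVING condition (`COUNT(*) >= 2`) are additions made here for a deterministic, illustrative result and do not appear on the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Ratings (RatingID INTEGER, Stars REAL, UserID INTEGER, MovieID INTEGER)")
conn.executemany("INSERT INTO Ratings VALUES (?, ?, ?, ?)", [
    (1, 3.5, 79, 42),
    (2, 4.0, 80, 20),
    (3, 2.5, 79, 53),
    (4, 4.5, 123, 42),
])

# One output tuple per distinct MovieID.
avg_per_movie = conn.execute("""
    SELECT R.MovieID, AVG(R.Stars) AS AvgRating
    FROM Ratings R
    GROUP BY R.MovieID
    ORDER BY R.MovieID
""").fetchall()

# HAVING filters whole groups: keep only movies with at least two ratings.
avg_well_rated = conn.execute("""
    SELECT R.MovieID, AVG(R.Stars) AS AvgRating
    FROM Ratings R
    GROUP BY R.MovieID
    HAVING COUNT(*) >= 2
    ORDER BY R.MovieID
""").fetchall()

print(avg_per_movie)
print(avg_well_rated)
```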

Review Questions

Q1. Which of the following is not a basic operation in relational algebra?

A. $\sigma$
B. $\cup$
C. $\rho$
D. $\bowtie$

Review Questions

Q2. What type of join is the following NOT an example of?

R(RID, NumStars, RateDate, UID, MID)
U(UID, UName, Age, JoinDate)
M(MID, MName, Year, Director)

$R \bowtie U$

A. Inner join
B. Outer join
C. Natural join
D. Theta join

Review Questions

Q3. How many movies did “Jim Cameron” direct?

R(RID, NumStars, RateDate, UID, MID)
U(UID, UName, Age, JoinDate)
M(MID, MName, Year, Director)

A. $\gamma_{\mathrm{COUNT}(*)}(\sigma_{Director=\text{“Jim Cameron”}}(M))$
B. $\gamma_{ReleaseYear,\,\mathrm{COUNT}(*)}(\sigma_{Director=\text{“Jim Cameron”}}(M))$
C. $\gamma_{\mathrm{COUNT}(*)}(\sigma_{Director=\text{“Jim Cameron”}}(M \bowtie R))$
D. $\sigma_{Director=\text{“Jim Cameron”}}(\gamma_{\mathrm{COUNT}(*)}(M))$

Review Questions

Q4. Which of the following attributes is a primary key?

R(RID, NumStars, RateDate, UID, MID)
U(UID, UName, Age, JoinDate)
M(MID, MName, Year, Director)

A. U.UID
B. R.UID
C. R.MID
D. M.Year

Review Questions

Q5. Which of the following SQL features does not have a counterpart operation in extended relational algebra?

A. SELECT DISTINCT
B. WHERE
C. LIMIT
D. GROUP BY

Review Questions

Q6. What is the cardinality of this query’s output?

SELECT COUNT(DISTINCT MID)
FROM R

R
RID  NumStars  RateDate  UID  MID
1    3.5       08/27/15  79   20
2    4.0       07/20/15  80   20
3    2.5       08/02/14  79   16
4    4.5       03/05/14  80   16
5    5.0       06/09/13  135  20

A. 1
B. 2
C. 3
D. 4

Review Questions

Q7. Get the year and director of all movies from the last decade that have the term “Avengers” in their title.

M(MID, MName, Year, Director)

A. SELECT Year, Director FROM M WHERE Year >= 2008 AND MName LIKE '%Avengers%'
B. SELECT Year, Director FROM M WHERE Year >= 2008 AND MName LIKE '%Avengers'
C. SELECT Year, Director FROM M WHERE Year >= 2008 AND MName LIKE 'Avengers%'
D. SELECT Year, Director FROM M WHERE Year >= 2008 AND MName = 'Avengers'

Review Questions

Q8. Write an SQL query to get the number of 5-star ratings for each movie directed by “Christopher Nolan”

Ratings (R): RatingID, Stars, UserID, MovieID
Movies (M): MovieID, Name, Year, Director

SELECT R.MovieID, COUNT(R.Stars) AS NumHighRatings
FROM Ratings R, Movies M
WHERE M.Director = 'Christopher Nolan'
  AND R.MovieID = M.MovieID
  AND R.Stars = 5
GROUP BY R.MovieID


Advanced SQL Operations (Optional)

UNION in SQL

Get the IDs of users that have rated a movie directed by “Ang Lee” or a movie that was released in 2013

Ratings (R): RID, Stars, UID, MID
Movies (M): MID, Name, Year, Director

SELECT R.UID
FROM Ratings R, Movies M
WHERE R.MID = M.MID AND M.Director = 'Ang Lee'
UNION
SELECT R.UID
FROM Ratings R, Movies M
WHERE R.MID = M.MID AND M.Year = 2013

Union-compatible!

Semantics of UNION in SQL

Movies (M)
MID  Name        Year  Director
20   Inception   2010  Christopher Nolan
42   Life of Pi  2012  Ang Lee
53   Gravity     2013  Alfonso Cuaron

Ratings (R)
RID  Stars  UID  MID
1    3.5    79   42
2    4.0    80   20
3    2.5    79   53
4    4.5    123  42

SELECT R.UID
FROM Ratings R, Movies M
WHERE R.MID = M.MID AND M.Director = 'Ang Lee'
UNION
SELECT R.UID
FROM Ratings R, Movies M
WHERE R.MID = M.MID AND M.Year = 2013

Result of the first SELECT: UID: 79, 123
Result of the second SELECT: UID: 79

Semantics of UNION in SQL

UNION
Inputs: UID: 79, 123 and UID: 79
Output: UID: 79, 123

UNION implicitly deduplicates tuples (unlike SELECT)!

Q. How to retain duplicates with UNION?

UNION ALL
Inputs: UID: 79, 123 and UID: 79
Output: UID: 79, 79, 123

INTERSECT in SQL

Movies (M)
MID  Name        Year  Director
20   Inception   2010  Christopher Nolan
42   Life of Pi  2012  Ang Lee
53   Gravity     2013  Alfonso Cuaron

Ratings (R)
RID  Stars  UID  MID
1    3.5    79   42
2    4.0    80   20
3    2.5    79   53
4    4.5    123  42

SELECT R.UID
FROM Ratings R, Movies M
WHERE R.MID = M.MID AND M.Director = 'Ang Lee'
INTERSECT
SELECT R.UID
FROM Ratings R, Movies M
WHERE R.MID = M.MID AND M.Year = 2013

Inputs: UID: 79, 123 and UID: 79
Output of INTERSECT: UID: 79

EXCEPT (Set Difference) in SQL

Movies (M)
MID  Name        Year  Director
20   Inception   2010  Christopher Nolan
42   Life of Pi  2012  Ang Lee
53   Gravity     2013  Alfonso Cuaron

Ratings (R)
RID  Stars  UID  MID
1    3.5    79   42
2    4.0    80   20
3    2.5    79   53
4    4.5    123  42

SELECT R.UID
FROM Ratings R, Movies M
WHERE R.MID = M.MID AND M.Director = 'Ang Lee'
EXCEPT
SELECT R.UID
FROM Ratings R, Movies M
WHERE R.MID = M.MID AND M.Year = 2013

Inputs: UID: 79, 123 and UID: 79
Output of EXCEPT: UID: 123
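The four set operations on these two subqueries can be compared side by side using Python's `sqlite3` module. This sketch assumes an in-memory database with the Ratings and Movies instances from the slides. The helper `run` and the variable names are hypothetical, and results are sorted in Python only to make them easy to compare.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movies (MID INTEGER, Name TEXT, Year INTEGER, Director TEXT)")
conn.execute("CREATE TABLE Ratings (RID INTEGER, Stars REAL, UID INTEGER, MID INTEGER)")
conn.executemany("INSERT INTO Movies VALUES (?, ?, ?, ?)", [
    (20, "Inception", 2010, "Christopher Nolan"),
    (42, "Life of Pi", 2012, "Ang Lee"),
    (53, "Gravity", 2013, "Alfonso Cuaron"),
])
conn.executemany("INSERT INTO Ratings VALUES (?, ?, ?, ?)", [
    (1, 3.5, 79, 42),
    (2, 4.0, 80, 20),
    (3, 2.5, 79, 53),
    (4, 4.5, 123, 42),
])

ang_lee = ("SELECT R.UID FROM Ratings R, Movies M "
           "WHERE R.MID = M.MID AND M.Director = 'Ang Lee'")
from_2013 = ("SELECT R.UID FROM Ratings R, Movies M "
             "WHERE R.MID = M.MID AND M.Year = 2013")

def run(operator):
    """Combine the two union-compatible subqueries with the given set operator."""
    return sorted(uid for (uid,) in conn.execute(f"{ang_lee} {operator} {from_2013}"))

results = {op: run(op) for op in ("UNION", "UNION ALL", "INTERSECT", "EXCEPT")}
for op, uids in results.items():
    print(op, uids)
```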

The Contentious Bag Semantics!

Left input: UID: 79, 79, 79, 123, 123, 80
Right input: UID: 79, 123, 123, 92

UNION ALL
Output: UID: 79, 79, 79, 79, 123, 123, 123, 123, 80, 92

Add the number of repetitions

The Contentious Bag Semantics!

Left input: UID: 79, 79, 79, 123, 123, 80
Right input: UID: 79, 123, 123, 92

EXCEPT ALL
Output: UID: 79, 79, 80

Subtract the number of repetitions

The Contentious Bag Semantics!

Left input: UID: 79, 79, 79, 123, 123, 80
Right input: UID: 79, 123, 123, 92

INTERSECT ALL
Output: UID: 79, 123, 123

Minimum of the number of repetitions
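The three bag rules (add, subtract, and take the minimum of the repetition counts) happen to match the arithmetic of Python's `collections.Counter`, which makes for a compact sanity check. Modelling a bag as a `Counter` is a choice made for this sketch only; it says nothing about how an RDBMS implements these operators.

```python
from collections import Counter

# Each bag maps a UID to its number of repetitions.
left = Counter([79, 79, 79, 123, 123, 80])
right = Counter([79, 123, 123, 92])

union_all = left + right      # add the number of repetitions
except_all = left - right     # subtract the number of repetitions (never below zero)
intersect_all = left & right  # minimum of the number of repetitions

print(sorted(union_all.elements()))
print(sorted(except_all.elements()))
print(sorted(intersect_all.elements()))
```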