CMPT354: DatabaseSystemI · SQL provides pattern matching support with the LIKE operator and two...

Preview:

Citation preview

CMPT 354:Database System I

Lecture 3. SQL Basics

1

Announcements!

• About Piazza• 97 enrolled (as of today)• Postsareanonymous to classmates

• You should have starteddoingA1• Please come to office hours if you need any help

2

SQLMotivation• Darktimesin2000s• Arerelationaldatabasesdead?

• Now,asbefore:everyonesellsSQL• Pig,Hive,Impala• SparkSQL

• NoSQL• “NonSQL”• “Not-Only-SQL”• “Not-Yet-SQL”

3

SQL:Introduction

• “S.Q.L.”or“sequel”• Supportedbyallmajorcommercialdatabasesystems• Standardized– manynewfeaturesovertime• Declarativelanguage

4

5

SQLisa…

• DataDefinitionLanguage(DDL)• Definerelationalschema• Create/alter/deletetablesandtheirattributes

• DataManipulationLanguage(DML)• Insert/delete/modifytuplesintables• Queryoneormoretables– discussednext!

Outline

• Single-table Queries• The SFW query• Usefuloperators:DISTINCT,ORDER BY,LIKE• Handlemissingvalues:NULLs

• Multiple-table Queries• Foreign key constraints• Joins: basics• Joins: SQL semantics

6

• To write the query, ask yourself three questions:• Which table are you interested in?• Which rows are you interested in?• Which columns are you interested in?

The SFWQuery

7

SELECT <columns>FROM <table name>WHERE <conditions>

Conditions

• Which rows are you interested in?• WHERE gpa > 3.5• WHERE school = ‘SFU’ AND gpa > 3.5• WHERE (school = ‘SFU’ OR school = ‘UBC’) AND gpa > 3.5• WHERE age *365 > 7500

8

SELECT <columns>FROM <table name>WHERE <conditions>

Columns

• Which columns are you interested in?• SELECT *• SELECT name, age• SELECT name as studentName, age• SELECT name, age *365 as ageDay

9

SELECT <columns>FROM <table name>WHERE <conditions>

AFewDetails

• SQLcommands arecaseinsensitive:• Same:SELECT,Select,select• Same:Student,student• Same:gpa, GPA

• Values arenot:• Different: 'SFU','sfu'

• SQLstringsareenclosedinsinglequotes• e.g.name='Mike’• Singlequotesinastringcanbespecifiedusinganinitialsinglequotecharacterasanescape• author='ShaqO''Neal'

• Stringscanbecomparedalphabetically withthecomparisonoperators• e.g.'fodder'<'foo'isTRUE

10

11

DISTINCT:EliminatingDuplicates

SELECT DISTINCT SchoolFROM Students

Versus

SELECT SchoolFROM Students

School

SFU

UBC

UT

School

SFU

SFU

UBC

UT

UT

12

ORDERBY:SortingtheResults

SELECT name, gpa, ageFROM StudentsWHERE school = 'SFU'ORDER BY gpa DESC, age ASC

• TheoutputofanSQLquerycanbeordered• Byanynumberofattributes,and• Ineitherascendingordescendingorder

• Thedefaultistouseascendingorder,thekeywordsASCandDESC,followingthecolumnname,setstheorder

13

LIKE:SimpleStringPatternMatching

SELECT *FROM StudentsWHERE name LIKE ' Sm_t%'

SQLprovidespatternmatchingsupportwiththeLIKEoperatorandtwosymbols• The% symbolstandsforzeroormorearbitrarycharacters• The_ symbolstandsforexactlyonearbitrarycharacter• The% and _ characterscanbeescapedwith\

• E.g., nameLIKE ’Michael\_Jordan'

Exercise - 1

• Which names will be returned?

14

SELECT *FROM StudentsWHERE name LIKE 'Sm_t%'

1. Smit2. SMIT3. Smart4. Smith5. Smythe6. Smut7. Smeath8. Smt

Exercise - 1

• Which names will be returned?

15

SELECT *FROM StudentsWHERE name LIKE 'Sm_t%'

1. Smit2. SMIT3. Smart4. Smith5. Smythe6. Smut7. Smeath8. Smt

1, 4, 5, 6

16

NULLSinSQL

• Wheneverwedon’thaveavalue,wecanputaNULL• Canmeanmanythings:

• Valuedoesnotexists• Valueexistsbutisunknown• Valuenotapplicable• Etc.

• NULLconstraints

CREATE TABLE Students (name CHAR(20) NOT NULL,age CHAR(20) NOT NULL,gpa FLOAT

)

What will happen?

name age gpa

Mike 20 4.0

Joe 18 NULL

Alice 21 3.8

17

1. SELECT gpa*100 FROM students2. SELECT name FROM students WHERE gpa > 3.53. SELECT name FROM students WHERE age > 15 OR gpa > 3.5

Two Important Rules

• Arithmeticoperations (+, -, *, /) onnullsreturnNULL• NULL *100

• NULL• NULL*0

• NULL

• ComparisonswithnullsevaluatetoUNKNOWN• NULL > 3.5

• UNKNOWN• NULL=NULL

• UNKNOWN

18

1. SELECT gpa*100 FROM students

2. SELECT gpa*0 FROM students

3. SELECT name FROM students WHERE gpa > 3.5

4. SELECT name FROM students WHERE gpa = NULL

Combinations of true, false, unknown

• Truthvaluesforunknown results• true OR unknown =true,• falseOR unknown= unknown,• unknown OR unknown =unknown ,• true AND unknown =unknown,• false AND unknown =false,• unknown AND unknown =unknown

• TheresultofaWHERE clauseistreatedasfalse ifitevaluatestounknown• WHERE unknownà false

19

SELECT *FROM students WHEREage > 15 OR gpa > 3.5

SELECT *FROM students WHEREage > 15 AND gpa > 3.5

What will happen?

name age gpa

Mike 20 4.0

Joe 18 NULL

Alice 21 3.8

20

1. SELECT gpa*100 FROM students2. SELECT name FROM students WHERE gpa > 3.53. SELECT name FROM students WHERE age > 15 OR gpa > 3.5

gpa

400

NULL

380

name

Mike

Alice

name

Mike

Joe

Alice

21

Exercise - 2

• Will it return all students?

SELECT *FROM StudentsWHERE age < 25 OR age >= 25

22

Exercise - 2

• Will it return all students?

SELECT *FROM StudentsWHERE age < 25 OR age >= 25

OR age is NULL

Therearespecialoperatorstotestfornullvalues• ISNULLtestsforthepresenceofnullsand• ISNOTNULL testsfortheabsenceofnulls

Outline

• Single-table Queries• The SFW query• Other useful operators: DISTINCT, LIKE, ORDER BY• NULLs

• Multiple-table Queries• Foreign key constraints• Joins: basics• Joins: SQL semantics

23

• Foreign-keyconstraint:• student_id referencessid

ForeignKeyconstraints

sid name gpa101 Bob 3.2123 Mary 3.8

student_id cid grade123 354 A123 454 A+156 354 A

Students Enrolled

24

• Foreign-keyconstraint:• student_id referencessid

ForeignKeyconstraints

sid name gpa101 Bob 3.2123 Mary 3.8156 Mike 3.7

Students Enrolledstudent_id cid grade

123 354 A123 454 A+156 354 A

25

DeclaringForeignKeys

CREATE TABLE Enrolled(student_id CHAR(20),cid CHAR(20),grade CHAR(10),PRIMARY KEY (student_id, cid),FOREIGN KEY (student_id) REFERENCES Students(sid)

)

student_id cid grade123 354 A123 454 A+156 354 A

26

Insertoperations• WhatifweinsertatupleintoEnrolled,butnocorrespondingstudent?• INSERTisrejected

27

sid name gpa123 Mary 3.8156 Mike 3.7

Students Enrolledstudent_id cid grade

123 354 A123 454 A+156 354 A190 354 A

Delete operations• Whatifwedeleteastudent,whohasenrolledcourses?• Disallowthedelete(ONDELETERESTRICT)

28

sid name gpa123 Mary 3.8

Students Enrolledstudent_id cid grade

123 354 A123 454 A+156 354 A

156 Mike 3.7

ONDELETERESTRICT

CREATE TABLE Enrolled(student_id CHAR(20),cid CHAR(20),grade CHAR(10),PRIMARY KEY (student_id, cid),FOREIGN KEY (student_id) REFERENCES Students(sid)ON DELETE RESTRICT

)

student_id cid grade123 354 A123 454 A+156 354 A

29

Delete operations• Whatifwedeleteastudent,whohasenrolledcourses?• Removeallofthecoursesforthatstudent(ONDELETE

CASCADE)

30

sid name gpa123 Mary 3.8

Students Enrolledstudent_id cid grade

123 354 A123 454 A+156 Mike 3.7156 354 A

ONDELETECASCADE

CREATE TABLE Enrolled(student_id CHAR(20),cid CHAR(20),grade CHAR(10),PRIMARY KEY (student_id, cid),FOREIGN KEY (student_id) REFERENCES Students(sid)ON DELETE CASCADE

)

student_id cid grade123 354 A123 454 A+156 354 A

31

Delete operations• Whatifwedeleteastudent,whohasenrolledcourses?• Set Foreign Key to NULL (ONDELETESETNULL)

32

sid name gpa123 Mary 3.8

Students Enrolledstudent_id cid grade

123 354 A123 454 A+156 354 A

156 Mike 3.7NULL 354 A

Interestingly, although it satisfies the foreign-key constraint,it violates the primary-key constraint, thus the deletionoperation is disallowed.

ONDELETESETNULL

CREATE TABLE Enrolled(student_id CHAR(20),cid CHAR(20),grade CHAR(10),PRIMARY KEY (student_id, cid),FOREIGN KEY (student_id) REFERENCES Students(sid)ON DELETE SET NULL

)

student_id cid grade123 354 A123 454 A+156 354 A

33

Outline

• Single-table Queries• The SFW query• Other useful operators: DISTINCT, LIKE, ORDER BY• NULLs

• Multiple-table Queries• Foreign key constraints• Joins: basics• Joins: SQL semantics• SetOperators

34

Why do we havemultiple tables?

35

sid name gpa123 Mary 3.8

Students Enrolledstudent_id cid grade

123 354 A123 454 A+156 Mike 3.7156 354 A

EnrolledStudents

student_id name gpa cid grade123 Mary 3.8 354 A123 Mary 3.8 454 A+156 Mike 3.7 354 A

VS.

• Multiple tables• Data updating is easier (e.g., update Mary’s gpa to 3.9)• Queryingeachindividualtableis faster (e.g., retrieveMary’s gpa)

• Asingle table• Data exchange is easier (e.g., share your data withothers)• Avoid the cost of joining multiple tables (e.g., retrievalall the courses that Mary has taken)

36

Storedataintomultiple tablesvs.singletable

Joins

37

SELECT <columns>FROM <table name>WHERE <conditions>

SELECT <columns>FROM <table names>WHERE <conditions>

TheSFWqueryoverasingletable

TheSFWqueryovermultipletables

Whichrowsareyouinterestedin?

Whichrowsareyouinterestedin?

Howtojointhemultipletables?

Joins:Example

38

SELECT name

FROM Students, Enrolled

WHERE sid = student_id AND

cid = 354 AND grad = ‘A+’Whichrowsareyouinterestedin?

Howtojointhetwotables?

sid name gpa123 Mary 3.8

Students Enrolledstudent_id cid grade

123 354 A+123 454 A+156 Mike 3.7156 354 A

FindallstudentwhohavegotanA+in354;returntheirnamesandgpas

Otherwaystowritejoins

39

SELECT nameFROM Students, EnrolledWHERE sid = student_id AND

cid = 354 AND grad = ‘A+’

SELECT nameFROM StudentsJOIN Enrolled ON sid = student_id

AND cid = 354 AND grad = ‘A+’

SELECT nameFROM StudentsJOIN Enrolled ON sid = student_idWHERE cid = 354 AND grad = ‘A+’

The Need fo TupleVariable

40

sid name gpa123 Mary 3.8

Students Enrolledstudent_id cid name grade123 354 DBI A+123 454 DBII A+156 354 DBI A

156 Mike 3.7

SELECT name

FROM Students, Enrolled

WHERE sid = student_id

Whichname?

TupleVariable

41

sid name gpa123 Mary 3.8

Students Enrolledstudent_id cid name grade123 354 DBI A+123 454 DBII A+156 354 DBI A

156 Mike 3.7

SELECT Students.nameFROM Students, EnrolledWHERE sid = student_id

SELECT S.nameFROM Students S, EnrolledWHERE sid = student_id

Outline

• Single-table Queries• The SFW query• Other useful operators: DISTINCT, LIKE, ORDER BY• NULLs

• Multiple-table Queries• Foreign key constraints• Joins: basics• Joins: SQL semantics• SetOperators

42

Meaning(Semantics)ofJoinQueries

43

SELECT x1.a1, x1.a2, …, xn.akFROM R1 AS x1, R2 AS x2, …, Rn AS xnWHERE Conditions(x1,…, xn)

Answer={}for x1 in R1 do

for x2 in R2 do…..

for xn in Rn doif Conditions(x1,…,xn)

then Answer=AnswerÈ {(x1.a1,x1.a2,…,xn.ak)}return Answer

Thisiscallednestedloopsemantics sinceweareinterpretingwhatajoinmeansusinganestedloop

Note:this isamultisetunion

Threesteps

1. Takecrossproduct• R1 × R2 × … × Rn

2. Applyconditions• Conditions(x1,…, xn)

3. Applyprojections• x1.a1, x1.a2, …, xn.ak

44

SELECT x1.a1, x1.a2, …, xn.akFROM R1 AS x1, R2 AS x2, …, Rn AS xnWHERE Conditions(x1,…, xn)

Note:ThisisNOThowtheDBMSexecutesthequery.

Exercise

45

SELECT nameFROM Students, EnrolledWHERE sid = student_id AND grade >= ‘A’

sid name gpa123 Mary 3.8

Students Enrolledstudent_id cid grade

123 354 A+123 454 A+156 Mike 3.7156 354 A

nameMary

nameMaryMike

nameMaryMikeMary

nameMaryMaryMike

Whichone(s)arecorrect?

(A) (B) (C) (D)

Outline

• Single-table Queries• The SFW query• Other useful operators: DISTINCT, LIKE, ORDER BY• NULLs

• Multiple-table Queries• Foreign key constraints• Joins: basics• Joins: SQL semantics• SetOperators

46

SetOperations

• SQLsupportsunion,intersectionandsetdifferenceoperations• CalledUNION,INTERSECT,andEXCEPT• Theseoperationsmustbeperformedonunioncompatibletables

• AlthoughtheseoperationsaresupportedintheSQLstandard,implementationsmayvary• EXCEPTmaynotbeimplemented

• Whenitis,itissometimescalledMINUS

47

OneofTwoCourses

• Findallstudentswhohavetakeneither354or454

48

sid name gpa123 Mary 3.8

Students Enrolledstudent_id cid grade

123 354 A+123 454 A+156 Mike 3.7156 354 A

SELECT nameFROM Students, EnrolledWHERE sid = student_id AND (cid = 354 OR cid = 454)

OneofTwoCourses- UNION

• Findallstudentswhohavetakeneither354or454

49

sid name gpa123 Mary 3.8

Students Enrolledstudent_id cid grade

123 354 A+123 454 A+156 Mike 3.7156 354 A

SELECT nameFROM Students, EnrolledWHERE sid = student_id AND cid = 354UNIONSELECT nameFROM Students, EnrolledWHERE sid = student_id AND cid = 454

BothCourses

• Findallstudentswhohavetakenboth354and454

50

sid name gpa123 Mary 3.8

Students Enrolledstudent_id cid grade

123 354 A+123 454 A+156 Mike 3.7156 354 A

SELECT nameFROM Students S, Enrolled EWHERE sid = student_id AND

(E.cid = 354 AND E.cid = 454)

BothCoursesAgain

• Findallstudentswhohavetakenboth354and454

51

sid name gpa123 Mary 3.8

Students Enrolledstudent_id cid grade

123 354 A+123 454 A+156 Mike 3.7156 354 A

SELECT nameFROM Students S, Enrolled E1, Enrolled E2WHERE S.sid = E1.student_id AND S.sid = E2.student_id

AND (E1.cid = 354 AND E2.cid = 454)

BothCourses- INTERSECT

• Findallstudentswhohavetakenboth354and454

52

sid name gpa123 Mary 3.8

Students Enrolledstudent_id cid grade

123 354 A+123 454 A+156 Mike 3.7156 354 A

SELECT nameFROM Students, Enrolled WHERE sid = student_id AND cid = 354INTERSECTSELECT nameFROM Students, EnrolledWHERE sid = student_id AND cid = 454

OneCourseButNotTheOther

• Findallstudentswhohavetaken354butnot454

53

sid name gpa123 Mary 3.8

Students Enrolledstudent_id cid grade

123 354 A+123 454 A+156 Mike 3.7156 354 A

SELECT nameFROM Students, Enrolled WHERE sid = student_id AND cid = 354EXCEPTSELECT nameFROM Students, EnrolledWHERE sid = student_id AND cid = 454

SetOperationsandDuplicates

• UnlikeotherSQLoperations,UNION,INTERSECT,andEXCEPT querieseliminateduplicatesbydefault• SQLallowsduplicatestoberetainedinthesethreeoperationsusingtheALL keyword(i.e.,multi-setoperations)

54

SELECT nameFROM Students, Enrolled WHERE sid = student_id AND cid = 354INTERSECT ALLSELECT nameFROM Students, EnrolledWHERE sid = student_id AND cid = 454

Acknowledge

• Some lecture slides were copied from or inspired by thefollowing course materials• “W4111: Introduction to databases” by Eugene Wu atColumbia University• “CSE344: IntroductiontoDataManagement” by Dan Suciu atUniversityof Washington• “CMPT354: Database System I” by JohnEdgar at Simon FraserUniversity• “CS186: Introduction to Database Systems” by Joe Hellersteinat UC Berkeley• “CS145: IntroductiontoDatabases” by Peter Bailis at Stanford• “CS348:IntroductiontoDatabaseManagement” by GrantWeddell at University of Waterloo

55

Recommended