The Structured Query Language
Zachary G. Ives
University of Pennsylvania
CIS 550 – Database & Information Systems
September 27, 2005
Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan

Administrivia
- Homework 2 handed out today; due in 1 week, 10/4
- Please sign up for Oracle accounts ASAP: http://www.seas.upenn.edu/ora/
- You can do the homework without an account, but:
  - It will be useful in testing your HW2’s!
  - You’ll need to get familiar with Oracle anyway

Recall Basic SQL

  SELECT [DISTINCT] {T1.attrib, …, T2.attrib}    (select-list)
  FROM {relation} T1, {relation} T2, …           (from-list)
  WHERE {predicates}                             (qualification)

- SELECT * returns every attribute (e.g., all STUDENTs)
- AS
  - As a “range variable” (tuple variable): optional
  - As an attribute rename operator

Expressions in SQL
- Can do computation over scalars (int, real or string) in the select-list or the qualification
  - Example: show all student IDs decremented by 1
- Strings:
  - Fixed (CHAR(x)) or variable length (VARCHAR(x))
  - Use single quotes: 'A string'
  - Special comparison operator: LIKE
  - Not equal: <>
- Typecasting: CAST(S.sid AS VARCHAR(255))
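
A minimal sketch of these expressions, run through Python's built-in sqlite3 module against the STUDENT table of the example data instance. Using SQLite and the Python variable names are assumptions of this sketch, not part of the lecture; other engines may render CAST results differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (sid INTEGER, name TEXT);
INSERT INTO STUDENT VALUES (1,'Jill'),(2,'Qun'),(3,'Nitin'),(4,'Marty');
""")

# Arithmetic in the select-list: every student ID decremented by 1.
decremented = [r[0] for r in conn.execute(
    "SELECT S.sid - 1 FROM STUDENT S ORDER BY S.sid")]

# String tests in the qualification: LIKE with the % wildcard, <> for "not equal".
m_names = [r[0] for r in conn.execute(
    "SELECT S.name FROM STUDENT S WHERE S.name LIKE 'M%' AND S.name <> 'Martin'")]

# Typecasting: the integer sid is returned as a string.
as_text = [r[0] for r in conn.execute(
    "SELECT CAST(S.sid AS VARCHAR(255)) FROM STUDENT S ORDER BY S.sid")]

print(decremented)  # [0, 1, 2, 3]
print(m_names)      # ['Marty']
print(as_text)      # ['1', '2', '3', '4']
```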

Set Operations
- Set operations default to set semantics, not bag semantics:

  (SELECT … FROM … WHERE …)
  {op}
  (SELECT … FROM … WHERE …)

- Where op is one of:
  - UNION
  - INTERSECT, MINUS/EXCEPT (many DBs don’t support these last ones!)
- Bag semantics: append ALL to the operator (e.g., UNION ALL)
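
The difference between set and bag semantics can be sketched with the example data instance, again using Python's sqlite3 module (an assumption of this sketch). Note that SQLite does not accept parentheses around the operands of a compound SELECT, and it spells the difference operator EXCEPT rather than MINUS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE COURSE (cid TEXT, subj TEXT, sem TEXT);
CREATE TABLE Takes (sid INTEGER, exp_grade TEXT, cid TEXT);
CREATE TABLE Teaches (fid INTEGER, cid TEXT);
INSERT INTO COURSE VALUES ('550-0105','DB','F05'),('700-1005','AI','S05'),
                          ('501-0105','Arch','F05'),('555-1006','Sys','S06');
INSERT INTO Takes VALUES (1,'A','550-0105'),(1,'A','700-1005'),(3,'A','700-1005'),
                         (3,'C','501-0105'),(4,'C','501-0105');
INSERT INTO Teaches VALUES (1,'550-0105'),(2,'700-1005'),(8,'501-0105');
""")

def cids(sql):
    """Run a query that returns a single cid column and collect it into a list."""
    return [r[0] for r in conn.execute(sql)]

# Set semantics: duplicates are eliminated.
set_union = cids("SELECT cid FROM Takes UNION SELECT cid FROM Teaches ORDER BY cid")

# Bag semantics: ALL keeps every copy (5 Takes rows + 3 Teaches rows).
bag_union = cids("SELECT cid FROM Takes UNION ALL SELECT cid FROM Teaches ORDER BY cid")

# Difference: course offerings that nobody takes.
untaken = cids("SELECT cid FROM COURSE EXCEPT SELECT cid FROM Takes")

print(len(set_union), len(bag_union), untaken)  # 3 8 ['555-1006']
```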

Revised Example Data Instance

STUDENT
sid  name
1    Jill
2    Qun
3    Nitin
4    Marty

Takes
sid  exp-grade  cid
1    A          550-0105
1    A          700-1005
3    A          700-1005
3    C          501-0105
4    C          501-0105

COURSE
cid       subj  sem
550-0105  DB    F05
700-1005  AI    S05
501-0105  Arch  F05
555-1006  Sys   S06

PROFESSOR
fid  name
1    Ives
2    Saul
8    Martin

Teaches
fid  cid
1    550-0105
2    700-1005
8    501-0105

Nested Queries in SQL
- Simplest: IN/NOT IN
- Example: Students who have taken subjects that have (at any point) been taught by Martin
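
One possible formulation of this example with an IN subquery, sketched with Python's sqlite3 module over the example data instance. The query text is an illustration rather than the query shown in lecture; the column name exp_grade (for exp-grade) and the alias names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (sid INTEGER, name TEXT);
CREATE TABLE PROFESSOR (fid INTEGER, name TEXT);
CREATE TABLE COURSE (cid TEXT, subj TEXT, sem TEXT);
CREATE TABLE Takes (sid INTEGER, exp_grade TEXT, cid TEXT);
CREATE TABLE Teaches (fid INTEGER, cid TEXT);
INSERT INTO STUDENT VALUES (1,'Jill'),(2,'Qun'),(3,'Nitin'),(4,'Marty');
INSERT INTO PROFESSOR VALUES (1,'Ives'),(2,'Saul'),(8,'Martin');
INSERT INTO COURSE VALUES ('550-0105','DB','F05'),('700-1005','AI','S05'),
                          ('501-0105','Arch','F05'),('555-1006','Sys','S06');
INSERT INTO Takes VALUES (1,'A','550-0105'),(1,'A','700-1005'),(3,'A','700-1005'),
                         (3,'C','501-0105'),(4,'C','501-0105');
INSERT INTO Teaches VALUES (1,'550-0105'),(2,'700-1005'),(8,'501-0105');
""")

QUERY = """
SELECT DISTINCT S.name
FROM STUDENT S, Takes T, COURSE C
WHERE S.sid = T.sid AND T.cid = C.cid
  AND C.subj IN (SELECT C2.subj              -- subjects Martin has ever taught
                 FROM PROFESSOR P, Teaches E, COURSE C2
                 WHERE P.fid = E.fid AND E.cid = C2.cid AND P.name = 'Martin')
ORDER BY S.name
"""
students = [row[0] for row in conn.execute(QUERY)]
print(students)  # ['Marty', 'Nitin']
```

The inner query does not mention any variable of the outer query, so it can be evaluated once and its result reused for every outer tuple.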

Correlated Subqueries
- Most common: EXISTS/NOT EXISTS
- Example: Find all students who have taken DB but not AI
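
A sketch of the DB-but-not-AI example with correlated EXISTS/NOT EXISTS subqueries, using Python's sqlite3 module. The helper taken_but_not and the parameter names are hypothetical, introduced only for illustration. Each subquery refers to S.sid from the outer query, which is what makes it correlated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (sid INTEGER, name TEXT);
CREATE TABLE COURSE (cid TEXT, subj TEXT, sem TEXT);
CREATE TABLE Takes (sid INTEGER, exp_grade TEXT, cid TEXT);
INSERT INTO STUDENT VALUES (1,'Jill'),(2,'Qun'),(3,'Nitin'),(4,'Marty');
INSERT INTO COURSE VALUES ('550-0105','DB','F05'),('700-1005','AI','S05'),
                          ('501-0105','Arch','F05'),('555-1006','Sys','S06');
INSERT INTO Takes VALUES (1,'A','550-0105'),(1,'A','700-1005'),(3,'A','700-1005'),
                         (3,'C','501-0105'),(4,'C','501-0105');
""")

QUERY = """
SELECT S.name
FROM STUDENT S
WHERE EXISTS (SELECT * FROM Takes T, COURSE C
              WHERE T.sid = S.sid AND T.cid = C.cid AND C.subj = :taken)
  AND NOT EXISTS (SELECT * FROM Takes T, COURSE C
                  WHERE T.sid = S.sid AND T.cid = C.cid AND C.subj = :not_taken)
ORDER BY S.name
"""

def taken_but_not(taken, not_taken):
    """Names of students who took subject `taken` but never `not_taken`."""
    params = {"taken": taken, "not_taken": not_taken}
    return [r[0] for r in conn.execute(QUERY, params)]

db_not_ai = taken_but_not("DB", "AI")
ai_not_db = taken_but_not("AI", "DB")
print(db_not_ai)  # [] : Jill is the only DB student, and she also took AI
print(ai_not_db)  # ['Nitin']
```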

Universal and Existential Quantification
- Generally used with subqueries: {op} ANY, {op} ALL
- Example: Find the students with the best expected grades
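
In standard SQL this example could be written with a quantified comparison such as T.exp_grade <= ALL (SELECT exp_grade FROM Takes). SQLite, used for the runnable sketches here, does not implement ANY/ALL, so the sketch below runs the logically equivalent NOT EXISTS form instead: a grade is best if no strictly better grade exists. Treating the alphabetically smallest letter as the best grade is an assumption of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (sid INTEGER, name TEXT);
CREATE TABLE Takes (sid INTEGER, exp_grade TEXT, cid TEXT);
INSERT INTO STUDENT VALUES (1,'Jill'),(2,'Qun'),(3,'Nitin'),(4,'Marty');
INSERT INTO Takes VALUES (1,'A','550-0105'),(1,'A','700-1005'),(3,'A','700-1005'),
                         (3,'C','501-0105'),(4,'C','501-0105');
""")

# Standard SQL (not runnable in SQLite):
#   ... WHERE S.sid = T.sid AND T.exp_grade <= ALL (SELECT exp_grade FROM Takes)
# Equivalent here, since exp_grade is never NULL: no grade sorts strictly before T's.
QUERY = """
SELECT DISTINCT S.name
FROM STUDENT S, Takes T
WHERE S.sid = T.sid
  AND NOT EXISTS (SELECT * FROM Takes T2 WHERE T2.exp_grade < T.exp_grade)
ORDER BY S.name
"""
best = [r[0] for r in conn.execute(QUERY)]
print(best)  # ['Jill', 'Nitin']
```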

Table Expressions
- Can substitute a subquery for any relation in the FROM clause:

  SELECT S.sid
  FROM (SELECT sid FROM STUDENT WHERE sid = 5) S
  WHERE S.sid = 4

- Notice that we can actually simplify this query! What is this equivalent to?

Aggregation
- GROUP BY

  SELECT {group-attribs}, {aggregate-operator}(attrib)
  FROM {relation} T1, {relation} T2, …
  WHERE {predicates}
  GROUP BY {group-list}

- Aggregate operators
  - AVG, COUNT, SUM, MAX, MIN
  - DISTINCT keyword for AVG, COUNT, SUM

Some Examples
- Number of students in each course offering
- Number of different grades expected for each course offering
- Number of (distinct) students taking AI courses
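
The three examples above might be written as follows; this is a sketch over the example data instance using Python's sqlite3 module, and the query texts and the column name exp_grade are assumptions rather than the lecture's own solutions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE COURSE (cid TEXT, subj TEXT, sem TEXT);
CREATE TABLE Takes (sid INTEGER, exp_grade TEXT, cid TEXT);
INSERT INTO COURSE VALUES ('550-0105','DB','F05'),('700-1005','AI','S05'),
                          ('501-0105','Arch','F05'),('555-1006','Sys','S06');
INSERT INTO Takes VALUES (1,'A','550-0105'),(1,'A','700-1005'),(3,'A','700-1005'),
                         (3,'C','501-0105'),(4,'C','501-0105');
""")

# Number of students in each course offering (offerings without students do not appear).
students_per_course = conn.execute(
    "SELECT T.cid, COUNT(T.sid) FROM Takes T GROUP BY T.cid ORDER BY T.cid").fetchall()

# Number of different grades expected for each course offering.
grades_per_course = conn.execute(
    "SELECT T.cid, COUNT(DISTINCT T.exp_grade) FROM Takes T "
    "GROUP BY T.cid ORDER BY T.cid").fetchall()

# Number of (distinct) students taking AI courses: no GROUP BY, one overall count.
ai_students = conn.execute(
    "SELECT COUNT(DISTINCT T.sid) FROM Takes T, COURSE C "
    "WHERE T.cid = C.cid AND C.subj = 'AI'").fetchone()[0]

print(students_per_course)  # [('501-0105', 2), ('550-0105', 1), ('700-1005', 2)]
print(grades_per_course)    # [('501-0105', 1), ('550-0105', 1), ('700-1005', 1)]
print(ai_students)          # 2
```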

What If You Want to Only Show Some Groups?
- The HAVING clause lets you do a selection based on an aggregate (there must be 1 value per group):

  SELECT C.subj, COUNT(S.sid)
  FROM STUDENT S, Takes T, COURSE C
  WHERE S.sid = T.sid AND T.cid = C.cid
  GROUP BY subj
  HAVING COUNT(S.sid) > 5

- Exercise: For each subject taught by at least two professors, list the minimum expected grade
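
A runnable sketch of the HAVING query from this slide, using Python's sqlite3 module. The example data instance is too small for any subject to exceed 5 students, so the threshold is made a parameter; the helper name big_subjects is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (sid INTEGER, name TEXT);
CREATE TABLE COURSE (cid TEXT, subj TEXT, sem TEXT);
CREATE TABLE Takes (sid INTEGER, exp_grade TEXT, cid TEXT);
INSERT INTO STUDENT VALUES (1,'Jill'),(2,'Qun'),(3,'Nitin'),(4,'Marty');
INSERT INTO COURSE VALUES ('550-0105','DB','F05'),('700-1005','AI','S05'),
                          ('501-0105','Arch','F05'),('555-1006','Sys','S06');
INSERT INTO Takes VALUES (1,'A','550-0105'),(1,'A','700-1005'),(3,'A','700-1005'),
                         (3,'C','501-0105'),(4,'C','501-0105');
""")

QUERY = """
SELECT C.subj, COUNT(S.sid)
FROM STUDENT S, Takes T, COURSE C
WHERE S.sid = T.sid AND T.cid = C.cid
GROUP BY C.subj
HAVING COUNT(S.sid) > ?      -- filter applied to whole groups, after aggregation
ORDER BY C.subj
"""

def big_subjects(min_size):
    """Subjects whose enrollment count is strictly greater than min_size."""
    return conn.execute(QUERY, (min_size,)).fetchall()

print(big_subjects(1))  # [('AI', 2), ('Arch', 2)]
print(big_subjects(5))  # []
```

WHERE filters individual tuples before grouping, whereas HAVING filters groups after the aggregate has been computed, which is why the count test cannot be moved into the WHERE clause.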

Aggregation and Table Expressions (aka Derived Relations)
- Sometimes need to compute results over the results of a previous aggregation:

  SELECT subj, AVG(size)
  FROM (
    SELECT C.cid AS id, C.subj AS subj, COUNT(S.sid) AS size
    FROM STUDENT S, Takes T, COURSE C
    WHERE S.sid = T.sid AND T.cid = C.cid
    GROUP BY C.cid, C.subj)
  GROUP BY subj
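
Run against the example data instance (a sketch using Python's sqlite3 module), the derived relation first computes one enrollment count per course offering, and the outer query then averages those counts per subject. The grouping columns are qualified as C.cid and C.subj because both Takes and COURSE carry a cid column. Some engines also require an alias on the derived table; SQLite does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (sid INTEGER, name TEXT);
CREATE TABLE COURSE (cid TEXT, subj TEXT, sem TEXT);
CREATE TABLE Takes (sid INTEGER, exp_grade TEXT, cid TEXT);
INSERT INTO STUDENT VALUES (1,'Jill'),(2,'Qun'),(3,'Nitin'),(4,'Marty');
INSERT INTO COURSE VALUES ('550-0105','DB','F05'),('700-1005','AI','S05'),
                          ('501-0105','Arch','F05'),('555-1006','Sys','S06');
INSERT INTO Takes VALUES (1,'A','550-0105'),(1,'A','700-1005'),(3,'A','700-1005'),
                         (3,'C','501-0105'),(4,'C','501-0105');
""")

QUERY = """
SELECT subj, AVG(size)
FROM (SELECT C.cid AS id, C.subj AS subj, COUNT(S.sid) AS size
      FROM STUDENT S, Takes T, COURSE C
      WHERE S.sid = T.sid AND T.cid = C.cid
      GROUP BY C.cid, C.subj)     -- inner aggregation: one row per course offering
GROUP BY subj                     -- outer aggregation: one row per subject
ORDER BY subj
"""
avg_sizes = conn.execute(QUERY).fetchall()
print(avg_sizes)  # [('AI', 2.0), ('Arch', 2.0), ('DB', 1.0)]
```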

Thought Exercise…
- Tables are great, but…
  - Not everyone is uniform – I may have a cell phone but not a fax
  - We may simply be missing certain information
  - We may be unsure about values
- How do we handle these things?

One Answer: Null Values
- We designate a special “null” value to represent “unknown” or “N/A”

  CONTACT
  Name   Home      Fax
  Sam    123-4567  NULL
  Li     234-8972  234-8766
  Maria  789-2312  789-2121

- But a question: what does the following query do?

  SELECT * FROM CONTACT WHERE Fax < '789-1111'
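
One way to explore the question is to run the query; the sketch below uses Python's sqlite3 module and stores the phone numbers as strings (an assumption). The comparison against Sam's NULL fax evaluates to neither true nor false, so Sam is returned by neither the query nor its negation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CONTACT (Name TEXT, Home TEXT, Fax TEXT);
INSERT INTO CONTACT VALUES ('Sam','123-4567',NULL),
                           ('Li','234-8972','234-8766'),
                           ('Maria','789-2312','789-2121');
""")

def names(where):
    """Names of the CONTACT tuples for which the given condition is true."""
    sql = f"SELECT Name FROM CONTACT WHERE {where} ORDER BY Name"
    return [r[0] for r in conn.execute(sql)]

below = names("Fax < '789-1111'")
not_below = names("NOT (Fax < '789-1111')")
no_fax = names("Fax IS NULL")

print(below)      # ['Li']
print(not_below)  # ['Maria']
print(no_fax)     # ['Sam']
```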

Three-State Logic
- Need ways to evaluate boolean expressions and have the result be “unknown” (or T/F)
- Need ways of composing these three-state expressions using AND, OR, NOT:

  T AND U = U    T OR U = T    NOT U = U
  F AND U = F    F OR U = U
  U AND U = U    U OR U = U

- Can also test for null-ness: attr IS NULL, attr IS NOT NULL
- Finally: need rules for arithmetic, aggregation
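
The truth tables can be checked directly against a SQL engine. This sketch uses Python's sqlite3 module; SQLite has no separate boolean type, so T is represented as 1, F as 0, and U (unknown) as NULL, which Python sees as None. The helper name evaluate is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def evaluate(expr):
    """Evaluate a scalar SQL expression; SQL NULL comes back as Python None."""
    return conn.execute(f"SELECT {expr}").fetchone()[0]

truth_table = {expr: evaluate(expr) for expr in [
    "1 AND NULL", "0 AND NULL", "NULL AND NULL",
    "1 OR NULL", "0 OR NULL", "NULL OR NULL",
    "NOT NULL",
]}
for expr, value in truth_table.items():
    print(f"{expr:15} -> {value}")

# Arithmetic: any operand that is NULL makes the result NULL.
null_sum = evaluate("NULL + 1")

# Aggregation: COUNT(x) and SUM(x) skip NULLs, while COUNT(*) counts every row.
agg = conn.execute(
    "SELECT COUNT(x), COUNT(*), SUM(x) FROM (SELECT 5 AS x UNION ALL SELECT NULL)"
).fetchone()
print(null_sum, agg)  # None (1, 2, 5)
```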

Nulls and Joins
- Sometimes need special variations of joins:
  - I want to see all courses and their students
  - … But what if there’s a course with no students?
- Outer join:
  - Most common is left outer join:

  SELECT C.subj, C.cid, T.sid
  FROM COURSE C LEFT OUTER JOIN Takes T ON C.cid = T.cid
  WHERE …
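
On the example data instance, the Sys offering 555-1006 has no students. The sketch below (Python's sqlite3 module, with the elided WHERE clause simply left out) compares an ordinary join with the left outer join: the outer join keeps the Sys course and pads the missing T.sid with NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE COURSE (cid TEXT, subj TEXT, sem TEXT);
CREATE TABLE Takes (sid INTEGER, exp_grade TEXT, cid TEXT);
INSERT INTO COURSE VALUES ('550-0105','DB','F05'),('700-1005','AI','S05'),
                          ('501-0105','Arch','F05'),('555-1006','Sys','S06');
INSERT INTO Takes VALUES (1,'A','550-0105'),(1,'A','700-1005'),(3,'A','700-1005'),
                         (3,'C','501-0105'),(4,'C','501-0105');
""")

inner_rows = conn.execute("""
SELECT C.subj, C.cid, T.sid
FROM COURSE C, Takes T
WHERE C.cid = T.cid
ORDER BY C.cid, T.sid
""").fetchall()

outer_rows = conn.execute("""
SELECT C.subj, C.cid, T.sid
FROM COURSE C LEFT OUTER JOIN Takes T ON C.cid = T.cid
ORDER BY C.cid, T.sid
""").fetchall()

# The only row the outer join adds is the course nobody takes.
extra = [row for row in outer_rows if row not in inner_rows]
print(len(inner_rows), len(outer_rows), extra)  # 5 6 [('Sys', '555-1006', None)]
```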
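
Warning on Outer Join
- Older Oracle versions (before 9i) don’t support the standard SQL syntax here; they mark the optional (null-padded) side of the join condition with (+):

  SELECT C.subj, C.cid, T.sid
  FROM COURSE C, Takes T
  WHERE C.cid = T.cid (+)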

Beyond Null
- Can have much more complex ideas of incomplete or approximate information
  - Probabilistic models (tuple 80% likely to be an answer)
  - Naïve tables (can have variables instead of NULLs)
  - Conditional tables (tuple IF some condition holds)
- … And what if you want “0 or more”?
  - In relational databases, create a new table and foreign key
  - But can have semistructured data (like XML)

Modifying the Database: Inserting Data
- Inserting a new literal tuple is easy, if wordy:

  INSERT INTO PROFESSOR (fid, name)
  VALUES (4, 'Simpson')

- But we can also insert the results of a query!

  INSERT INTO PROFESSOR (fid, name)
    SELECT sid AS fid, name
    FROM STUDENT
    WHERE sid < 20
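
Both forms of INSERT can be tried on the example data instance; this is a sketch using Python's sqlite3 module. No key is declared on PROFESSOR in this sketch (an assumption), so the inserted student IDs are accepted even where they repeat an existing fid; with a primary key on fid, the second statement would fail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (sid INTEGER, name TEXT);
CREATE TABLE PROFESSOR (fid INTEGER, name TEXT);
INSERT INTO STUDENT VALUES (1,'Jill'),(2,'Qun'),(3,'Nitin'),(4,'Marty');
INSERT INTO PROFESSOR VALUES (1,'Ives'),(2,'Saul'),(8,'Martin');
""")

# Insert one literal tuple; the values are passed as parameters.
conn.execute("INSERT INTO PROFESSOR (fid, name) VALUES (?, ?)", (4, "Simpson"))

# Insert the result of a query: every student with sid < 20 becomes a professor.
conn.execute("""
INSERT INTO PROFESSOR (fid, name)
SELECT sid AS fid, name FROM STUDENT WHERE sid < 20
""")

professors = conn.execute("SELECT fid, name FROM PROFESSOR ORDER BY fid, name").fetchall()
print(len(professors))  # 8: the 3 original rows, Simpson, and 4 former students
```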

Updating Tuples
- What kinds of updates might you want to do?

  UPDATE STUDENT S
  SET S.sid = 1 + S.sid, S.name = 'Janet'
  WHERE S.name = 'Jane'
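
A sketch of this update using Python's sqlite3 module. The example data instance contains no student named Jane, so a hypothetical tuple (5, 'Jane') is added first. SQLite does not accept qualified column names such as S.sid on the left-hand side of SET, so the range variable is dropped in this version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (sid INTEGER, name TEXT);
INSERT INTO STUDENT VALUES (1,'Jill'),(2,'Qun'),(3,'Nitin'),(4,'Marty');
INSERT INTO STUDENT VALUES (5,'Jane');   -- hypothetical tuple, not in the lecture data
""")

# Right-hand sides are evaluated against the old tuple, so sid becomes old sid + 1.
cursor = conn.execute(
    "UPDATE STUDENT SET sid = 1 + sid, name = 'Janet' WHERE name = 'Jane'")
updated = cursor.rowcount   # number of tuples the WHERE clause selected

students = conn.execute("SELECT sid, name FROM STUDENT ORDER BY sid").fetchall()
print(updated, students[-1])  # 1 (6, 'Janet')
```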

Now, How Do I Talk to the DB?
- Generally, apps are in a different (“host”) language with embedded SQL statements
  - Static: SQLJ, embedded SQL in C
  - Runtime: ODBC, JDBC, ADO, OLE DB, …
- Typically, predefined mappings between host language types and SQL types (e.g., VARCHAR maps to String or char[])
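
Embedded SQL in C

  EXEC SQL BEGIN DECLARE SECTION;
    int sid;
    char name[20];
  EXEC SQL END DECLARE SECTION;
  …
  EXEC SQL INSERT INTO STUDENT VALUES (:sid, :name);

  /* The selected columns must line up with the host variables. */
  /* A SELECT ... INTO works only if at most one tuple qualifies. */
  EXEC SQL SELECT sid, name
           INTO :sid, :name
           FROM STUDENT
           WHERE sid < 20;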

The Impedance Mismatch and Cursors
- SQL is set-oriented – it returns relations
- There’s no relation type in most languages!
- Solution: a cursor that’s opened, then read one tuple at a time

  EXEC SQL DECLARE sinfo CURSOR FOR
    SELECT sid, name
    FROM STUDENT;
  …
  EXEC SQL OPEN sinfo;
  while (…) {
    EXEC SQL FETCH sinfo INTO :sid, :name;
    …
  }
  EXEC SQL CLOSE sinfo;
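
The same open/fetch/close pattern appears in most host-language database APIs. As an illustration in a different host language than the slide's C, the sketch below uses the cursor object of Python's sqlite3 module (an assumption of this sketch, not the embedded SQL interface itself): the relation is never materialized as a host-language value, and tuples are pulled into scalar variables one at a time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (sid INTEGER, name TEXT);
INSERT INTO STUDENT VALUES (1,'Jill'),(2,'Qun'),(3,'Nitin'),(4,'Marty');
""")

cursor = conn.cursor()
# Plays the role of DECLARE ... CURSOR FOR plus OPEN.
cursor.execute("SELECT sid, name FROM STUDENT ORDER BY sid")

seen = []
while True:
    row = cursor.fetchone()      # FETCH: the next tuple, or None when exhausted
    if row is None:
        break
    sid, name = row              # corresponds to INTO :sid, :name
    seen.append(f"{sid}:{name}")

cursor.close()                   # CLOSE
print(seen)  # ['1:Jill', '2:Qun', '3:Nitin', '4:Marty']
```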

JDBC: Dynamic SQL
- Roughly speaking, a Java version of ODBC
- See Chapter 6 of the text for more info

  import java.sql.*;

  Connection conn = DriverManager.getConnection(…);
  PreparedStatement stmt =
      conn.prepareStatement("SELECT * FROM STUDENT");
  …
  ResultSet rs = stmt.executeQuery();
  while (rs.next()) {
    sid = rs.getInt(1);  // ResultSet columns are numbered from 1
    …
  }

Database-Backed Web Sites
- We all know traditional static HTML web sites:

  [Figure: the web browser sends an HTTP request (GET ...) to the web server, which loads an HTML file from the file system and returns it to the browser]

Common Gateway Interface (CGI)
- Can have the web server invoke code (with parameters) to generate HTML

  [Figure: on an HTTP request, the web server checks whether the target is an HTML file or a program. A file is loaded from the file system; a program is executed (with access to I/O, network, DB) and its output is returned as HTML]
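
A rough sketch of what such an invoked program might do, written in Python with the sqlite3 module. The function handle_request, the subj parameter, and the page layout are all hypothetical. A real CGI program would read the parameters from the QUERY_STRING environment variable and write the response to standard output; here the query string is passed in directly so the logic can be run on its own.

```python
import html
import sqlite3
from urllib.parse import parse_qs

def handle_request(query_string, conn):
    """Build the response for one request: a header block, a blank line, then HTML."""
    params = parse_qs(query_string)            # e.g. "subj=DB" -> {"subj": ["DB"]}
    subj = params.get("subj", [""])[0]
    rows = conn.execute(
        "SELECT cid, sem FROM COURSE WHERE subj = ? ORDER BY cid", (subj,)
    ).fetchall()
    # Escape everything that is interpolated into the generated page.
    items = "".join(
        f"<li>{html.escape(cid)} ({html.escape(sem)})</li>" for cid, sem in rows)
    body = f"<html><body><h1>{html.escape(subj)}</h1><ul>{items}</ul></body></html>"
    return "Content-Type: text/html\r\n\r\n" + body

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE COURSE (cid TEXT, subj TEXT, sem TEXT);
INSERT INTO COURSE VALUES ('550-0105','DB','F05'),('700-1005','AI','S05'),
                          ('501-0105','Arch','F05'),('555-1006','Sys','S06');
""")

page = handle_request("subj=DB", conn)
print(page)
```

Passing the user-supplied subject as a query parameter rather than splicing it into the SQL text is one of the security duties that fall on the CGI programmer.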

CGI: Discussion
- Advantages:
  - Standardized: works for every web-server, browser
  - Flexible: Any language (C++, Perl, Java, …) can be used
- Disadvantages:
  - Statelessness: query-by-query approach
  - Inefficient: new process forked for every request
  - Security: CGI programmer is responsible for security
  - Updates: To update layout, one has to be a programmer

DB Access in Java

  [Figure: a Java applet running in the browser's JVM connects over TCP/UDP and IP to a Java server process, whose JDBC driver manager dispatches to per-database JDBC drivers (Sybase, Oracle, ...)]

Java Applets: Discussion
- Advantages:
  - Can take advantage of client processing
  - Platform independent – assuming standard Java
- Disadvantages:
  - Requires JVM on client; self-contained
  - Inefficient: loading can take a long time ...
  - Resource intensive: Client needs to be state of the art
  - Restrictive: can only connect to server where applet was loaded from (for security … can be configured)
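
*SP Server Pages and Servlets (IIS, Tomcat, …)

  [Figure: on an HTTP request, the web server checks whether the target is an HTML file or a script/servlet. A file is loaded from the file system; a script or servlet runs in a server extension (with access to I/O, network, DB), which may have a built-in VM (JVM, CLR), and its output is returned as HTML]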

One Step Beyond: DB-Driven Web Sites (Strudel, Cocoon, …)

  [Figure: DB-driven web server. On an HTTP request, the web server (with a cache) runs a script that pulls data from a local database and other data sources, applies styles, and returns dynamically generated HTML]

Wrapping Up
- We’ve seen how to query in SQL (DML)
  - Basic foundation is TRC-based
  - Subqueries and aggregation add extra power beyond *RC
  - Nulls and outer joins add flexibility of representation
  - We can update tables
- We’ve also seen that SQL doesn’t precisely match standard host language semantics
  - Embedded SQL
  - Dynamic SQL
- We’ve seen a hint of data-driven web site architectures