Introduction to SQL
Ben Smith
Washington State University
SQL is used in a lot of places
Big database servers:
SQL Server, MySQL, Oracle, DB2
But programs can also connect to those servers:
SAS, Python, R
Four Examples
SQLite in Firefox
MySQL & SQLite in R (Omitted)
MS SQL (using ODBC) in SAS
MS SQL Server Management Studio
Let’s talk about data types
Data types
Chars, Varchars, Text (strings)
Ints, Floats (binary numbers)
Decimal (base 10 number)
I’m proposing there is in fact only one datatype
This is what data really looks like
0011 0 1010010001000100
Op Code
Redirect Bit
Memory Address
http://goo.gl/9nZ9C
Memory really looks like this
0 0 0 0 0 1 1 0 0 1 1 1 1 0 0 1 1 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 1 1 1 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 1 1 1 0 0 0 1 1 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 0 0 1 0 1 0 1 1 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 1 1 1 1 0 0 1 1 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 1 1 1 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 1 1 1 0 0 0 1 0 0 0 0 0 0 1 1
Data types, really
If we group 8 bits together, we can represent 256 different things. Let’s say we map those to characters of the alphabet
Using this method, a bunch of “bytes” (8-bit groups) make up a “string”
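As a rough illustration of the mapping described above, the following Python sketch turns three 8-bit groups into a string. The bit patterns are made up for this example, and ASCII is assumed as the character mapping.

```python
# Eight bits can take 2**8 distinct values (0 through 255).
values_per_byte = 2 ** 8

# Three hypothetical 8-bit groups, written as text for readability.
bit_groups = ["01010011", "01010001", "01001100"]

# Interpret each 8-bit group as an unsigned integer.
byte_values = [int(bits, 2) for bits in bit_groups]

# Map each integer to a character, assuming the ASCII table.
text = bytes(byte_values).decode("ascii")

print(values_per_byte)  # 256
print(byte_values)      # [83, 81, 76]
print(text)             # SQL
```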
Floats (binary numbers with a fractional part) can be represented by using 1 bit for the sign, some number of bits for the exponent (e.g. 8), and the rest for the fraction (e.g. 23)
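The 1 / 8 / 23 split mentioned above matches the common 32-bit single-precision layout. The sketch below, which assumes that layout, uses Python's `struct` module to pull the three fields out of one number. The helper name `float32_fields` is made up for this example.

```python
import struct


def float32_fields(value):
    """Return the (sign, exponent, fraction) bit strings of a 32-bit float."""
    # Pack the number as a big-endian 32-bit float, then reread the same
    # four bytes as an unsigned integer so the raw bits can be inspected.
    (raw,) = struct.unpack(">I", struct.pack(">f", value))
    bits = format(raw, "032b")
    return bits[0], bits[1:9], bits[9:]


# -6.5 is -1.101 (binary) times 2**2; the stored exponent is 2 + 127 = 129.
sign, exponent, fraction = float32_fields(-6.5)
print(sign)      # 1
print(exponent)  # 10000001
print(fraction)  # 10100000000000000000000
```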
Decimal Data Type
Just like 1/3 can’t be written exactly in base 10, there are numbers that can’t be written exactly in base 2 (for example, 0.1)
The decimal data type solves this issue by storing each individual digit in multiple bits
About 100 times slower than float
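The difference between the two representations can be seen in a small Python sketch. Python's `decimal.Decimal` stands in for a database decimal type here; that is an assumption made for illustration, not a statement about how any particular database stores decimals.

```python
from decimal import Decimal

# Binary floats: 0.1 and 0.2 have no exact base-2 representation,
# so their sum is not exactly 0.3.
float_sum = 0.1 + 0.2
float_is_exact = float_sum == 0.3

# Base-10 decimals: the same digits are stored exactly.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_is_exact = decimal_sum == Decimal("0.3")

print(float_is_exact)    # False
print(decimal_is_exact)  # True
```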
So what is a query
It’s a question with three parts:
What do I want
Where is it located
Under what conditions
Syntax
SELECT column, ...
FROM table
... JOIN table ON ...
WHERE column = VALUE
OR column LIKE 'VALUE'
AND column >= VALUE OR column IN(...)
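To make the three parts of a query concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `student` table, its columns, and its rows are invented for this example.

```python
import sqlite3

# Hypothetical in-memory table, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (emplid INTEGER, acad_prog TEXT, term_gpa REAL)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "MATH", 3.5), (2, "STAT", 2.9), (3, "MUSIC", 3.8), (4, "STAT", 3.1)],
)

rows = conn.execute(
    """
    SELECT emplid, acad_prog       -- what do I want
    FROM student                   -- where is it located
    WHERE acad_prog LIKE 'M%'      -- under what conditions
       OR term_gpa >= 3.0
    ORDER BY emplid
    """
).fetchall()

print(rows)  # [(1, 'MATH'), (3, 'MUSIC'), (4, 'STAT')]
```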
Think about Joins as Merges
That is, you are running a query on each table independently, then merging the results
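A small sketch of the merge idea, again using Python's `sqlite3` with made-up tables (`student` and `program`) and made-up rows. Rows from the two tables are matched on `acad_prog`; with a plain JOIN, a student whose program has no match drops out of the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical tables that share the acad_prog column.
conn.executescript(
    """
    CREATE TABLE student (emplid INTEGER, acad_prog TEXT);
    CREATE TABLE program (acad_prog TEXT, college TEXT);
    INSERT INTO student VALUES (1, 'MATH'), (2, 'STAT'), (3, 'ART');
    INSERT INTO program VALUES ('MATH', 'Sciences'), ('STAT', 'Sciences');
    """
)

rows = conn.execute(
    """
    SELECT s.emplid, p.college
    FROM student AS s
    JOIN program AS p ON p.acad_prog = s.acad_prog
    ORDER BY s.emplid
    """
).fetchall()

# Student 3 ('ART') has no matching program row, so it is not in the result.
print(rows)  # [(1, 'Sciences'), (2, 'Sciences')]
```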
Demo
Considering Complicated Conditions
Conditions can be nested with parentheses, just like in math
Example
WHERE
(
    (
        student.degree_program_1_major_code IN (@majorone, @majortwo, @majorthree, @majorfour)
        AND student.[degree_program_1_level_code] = @levelcode
        AND student.degree_program_1_obj_start_date > bCensus.date
        AND student.degree_program_1_obj_start_date <= aCensus.date
        AND student.[center_1_code] = @center_code
    )
    OR
    (
        student.degree_program_2_major_code IN (@majorone, @majortwo, @majorthree, @majorfour)
        AND student.[degree_program_2_level_code] = @levelcode
        AND student.degree_program_2_obj_start_date > bCensus.date
        AND student.degree_program_2_obj_start_date <= aCensus.date
        AND student.[center_2_code] = @center_code
    )
)
AND student.enrollment_status_code = 3
AND student.total_credits > 0
AND student.term_code = CAST((CAST((@myyear + 1) AS char(4)) + '3') AS INT)
AND student.class_standing_code = 6
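The parentheses in the example above matter because AND binds more tightly than OR. The sketch below shows this on a much smaller, invented table (`student` with `major_1`, `major_2`, and `enrolled` columns) using Python's `sqlite3`; the same condition returns different rows with and without grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: two possible majors per student plus an enrollment flag.
conn.execute(
    "CREATE TABLE student (emplid INTEGER, major_1 TEXT, major_2 TEXT, enrolled INTEGER)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(1, "MATH", "ART", 1), (2, "ART", "MATH", 1), (3, "MATH", "ART", 0), (4, "ART", "ART", 1)],
)

# Grouped: (either major is MATH) AND enrolled.
grouped = conn.execute(
    """
    SELECT emplid FROM student
    WHERE (major_1 = 'MATH' OR major_2 = 'MATH') AND enrolled = 1
    ORDER BY emplid
    """
).fetchall()

# Ungrouped: read as major_1 = 'MATH' OR (major_2 = 'MATH' AND enrolled = 1),
# so the unenrolled student 3 slips through.
ungrouped = conn.execute(
    """
    SELECT emplid FROM student
    WHERE major_1 = 'MATH' OR major_2 = 'MATH' AND enrolled = 1
    ORDER BY emplid
    """
).fetchall()

print(grouped)    # [(1,), (2,)]
print(ungrouped)  # [(1,), (2,), (3,)]
```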
Demo
Groups
You always get a set of results first, THEN group them
Aggregate functions work WITHIN each group
Example
SELECT COUNT(DISTINCT emplid) AS c,
acad_prog, sex
...
GROUP BY acad_prog, sex
Example
SELECT COUNT(DISTINCT emplid) AS c,
acad_prog, sex, MAX(term_gpa)
...
GROUP BY acad_prog, sex
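Since the FROM clause is elided in the examples above, the following sketch fills it with an invented table (`student_term`, one row per student per term) so the grouping can be run with Python's `sqlite3`. The table name and its rows are assumptions made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: student 1 appears twice (two terms).
conn.execute(
    "CREATE TABLE student_term (emplid INTEGER, acad_prog TEXT, sex TEXT, term_gpa REAL)"
)
conn.executemany(
    "INSERT INTO student_term VALUES (?, ?, ?, ?)",
    [
        (1, "MATH", "F", 3.5),
        (1, "MATH", "F", 3.9),
        (2, "MATH", "F", 3.0),
        (3, "MATH", "M", 2.5),
        (4, "STAT", "F", 3.2),
    ],
)

rows = conn.execute(
    """
    SELECT COUNT(DISTINCT emplid) AS c,
           acad_prog, sex, MAX(term_gpa)
    FROM student_term
    GROUP BY acad_prog, sex
    ORDER BY acad_prog, sex
    """
).fetchall()

# Each (acad_prog, sex) pair becomes one row; COUNT and MAX are computed per group.
print(rows)  # [(2, 'MATH', 'F', 3.9), (1, 'MATH', 'M', 2.5), (1, 'STAT', 'F', 3.2)]
```

COUNT(DISTINCT emplid) counts student 1 once even though that student has two term rows.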
Demo
Subquery
It is a query inside of a query
It can exist anywhere in the query
It can be a slow method; if an approach with joins is available, use that instead
Example
SELECT emplid, (
    SELECT TOP 1 gpa
    FROM student AS inner_s
    WHERE inner_s.emplid = s.emplid
    ORDER BY gpa DESC
)
FROM student AS s
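To compare the two approaches, here is a sketch in Python's `sqlite3` that runs a correlated subquery and one possible join-based rewrite over the same invented rows. SQLite has no TOP, so `LIMIT 1` is substituted. DISTINCT is added so each student appears once. The self-join shown is only one way to express "highest gpa per student", and it assumes no ties.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: student 1 has two gpa rows.
conn.execute("CREATE TABLE student (emplid INTEGER, gpa REAL)")
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, 3.0), (1, 3.7), (2, 2.8)])

# Correlated subquery: the inner query is evaluated against each outer row.
with_subquery = conn.execute(
    """
    SELECT DISTINCT s.emplid, (
        SELECT inner_s.gpa
        FROM student AS inner_s
        WHERE inner_s.emplid = s.emplid
        ORDER BY inner_s.gpa DESC
        LIMIT 1
    )
    FROM student AS s
    ORDER BY s.emplid
    """
).fetchall()

# Join-based rewrite: keep a row only if no row for the same student has a higher gpa.
with_join = conn.execute(
    """
    SELECT s.emplid, s.gpa
    FROM student AS s
    LEFT JOIN student AS higher
        ON higher.emplid = s.emplid AND higher.gpa > s.gpa
    WHERE higher.emplid IS NULL
    ORDER BY s.emplid
    """
).fetchall()

print(with_subquery)  # [(1, 3.7), (2, 2.8)]
print(with_join)      # [(1, 3.7), (2, 2.8)]
```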
Demo