Introduction To SQL: Unit 4
Modern Business Technology
Introduction To TSQL: Unit 4
Developed by
Michael Hotek
Unit 4
Goals
• Primary keys
• Foreign keys
• Joining tables
• Sub-selects
• Advantages/disadvantages of joins and sub-selects
Relationships
• A database derives its usefulness from containing a group of tables that have some relationship to each other
• An entity is a person, place, or thing of importance to an organization
• An entity generally becomes a table
• Relationships are the connections between tables
• Relationships are usually implemented as keys in a database design
Relationships
• For example take the titles and publishers table
• Every publisher publishes a title
• And every title has a publisher
• This relationship is implemented by means of the key pub_id
Relationships
• Relationships come in three different varieties
• One to one
– One row in a table is related to exactly one row in another table
• One to many
– One row in a table is related to one or more rows in another table
• Many to many
– Many rows in a table are related to one or more rows in another table
Relationships
• Implementing a many to many relationship directly between two tables is extremely poor database design
• This type of relationship can cause a large amount of confusion
• The problem is that many to many relationships do exist and must be stored in a database
• This is usually resolved by breaking the relationship into multiple one to many relationships through an additional table, known as an intersection table
Relationships
• An intersection table is an artificial construct that is commonly used in RDBMSs
• It does not have any physical meaning, but instead serves to break up a many to many relationship
• The titleauthor table is an example of an intersection table
Relationships
• Relationships are implemented in a database as keys
• Keys are a logical construct; they are not a physical part of the database
• This means that a key does not represent any physically quantifiable item
• You will generally see numbers used as keys in a database (au_id, pub_id, stor_id)
Primary Key
• A primary key is a special type of key that consists of one or more columns that uniquely identify a row
• Primary keys must be unique and can not contain null values
• A table will only have one primary key
• A primary key will reside on the one side of a 1 - N relationship
Primary Key
• pub_id is the primary key for the publishers table
• This will uniquely identify each publisher in the table
• We do not use the publisher's name, because this could be the same as another publisher
• Also, it is easy to control data input to ensure it is valid
• It is much easier to check 4 digits than 40 characters. Also a name can be null.
Foreign Key
• A foreign key is one or more columns that refer to a primary key of another table
• pub_id is the primary key of publishers
• pub_id is a foreign key in titles
• Example:
A publisher can publish many titles, but a title must have a publisher
This relationship is shown by the primary key of the publishers table being stored with the title that publisher published.
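The publishers/titles relationship described above could be declared roughly as follows. This is a simplified sketch, not the actual pubs schema: the column lists are cut down, the data types and constraint names are assumptions, and pub_id is declared not null in titles only to reflect the statement that a title must have a publisher.

```sql
-- Sketch only: simplified columns, assumed types and constraint names.
create table publishers (
    pub_id   char(4)     not null,
    pub_name varchar(40) null,
    constraint pk_publishers primary key (pub_id)
);

create table titles (
    title_id varchar(6)  not null,
    title    varchar(80) not null,
    pub_id   char(4)     not null,   -- foreign key: which publisher published this title
    constraint pk_titles primary key (title_id),
    constraint fk_titles_publishers
        foreign key (pub_id) references publishers (pub_id)
);
```

With these constraints in place, a row in titles cannot carry a pub_id that does not exist in publishers.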
Composite Keys
• A primary key and a foreign key can consist of more than one column
• When a key contains more than one column, it is known as a composite key
• The primary key of the titleauthor table is a composite (au_id,title_id)
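A composite key like the one on titleauthor might be declared along these lines. Again this is a sketch under assumptions: only the two key columns are shown, the types and constraint names are illustrative, and the authors and titles tables are assumed to exist with au_id and title_id as their primary keys.

```sql
-- Sketch only: the real titleauthor table has additional columns.
create table titleauthor (
    au_id    varchar(11) not null,
    title_id varchar(6)  not null,
    -- composite primary key: the pair (au_id, title_id) must be unique
    constraint pk_titleauthor primary key (au_id, title_id),
    constraint fk_titleauthor_authors
        foreign key (au_id) references authors (au_id),
    constraint fk_titleauthor_titles
        foreign key (title_id) references titles (title_id)
);
```

Each of the two foreign keys is the "many" side of a one to many relationship, which is how the intersection table resolves the many to many relationship between authors and titles.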
Indexes
• A discussion of indexes is well beyond the scope of this course, but there are a few naming items that can be noted
• Keys will be implemented in a database as an object called an index
• An index could be a primary key, a foreign key, or neither
• A primary key is the same as a primary index or unique index
• A foreign key is the same as a foreign index
Joins
• Up to this point we have confined our queries to a single table
• While single-table queries are common for retrieving data into transaction processing applications, they do not represent most real-world data querying
• All of the data in a database is segmented into tables and we generally need data from more than one table to show what we need
• To accomplish this, we use a join
Joins
• You will notice that there is no such thing as a join clause in our SQL syntax
• A join is simply a condition in the where clause
• A join is generally constructed between one primary key and another primary key or between a primary key and a foreign key
• (Discussion of PK/FK symbols on the ER Diagram)
Joins
• Suppose we want to view a list of sales for each store
• We could simply do the following:
select * from sales
stor_id ord_num  ord_date            qty payterms...
------- -------- ------------------- --- -----------
6380    6871     Sep 14 1994 12:00AM 5   Net 60...
6380    722a     Sep 13 1994 12:00AM 3   Net 60...
7066    A2976    May 24 1993 12:00AM 50  Net 30...
7066    QA7442.3 Sep 13 1994 12:00AM 75  ON invoice
7067    D4482    Sep 14 1994 12:00AM 10  Net 60...
7067    P2121    Jun 15 1992 12:00AM 40  Net 30...
7067    P2121    Jun 15 1992 12:00AM 20  Net 30...
7067    P2121    Jun 15 1992 12:00AM 20  Net 30...
7131    N914008  Sep 14 1994 12:00AM 20  Net 30...
7131    N914014  Sep 14 1994 12:00AM 25  Net 30...
7131    P3087a   May 29 1993 12:00AM 20  Net 60...
...
(22 row(s) affected)
• But, the stor_id is meaningless to us
Joins
• What we want to see is the store name, city, and state along with the sales for each order
select stor_name,ord_num,qty from stores,sales where stores.stor_id = sales.stor_id
stor_name                            ord_num  qty
------------------------------------ -------- ---
Eric the Read Books                  6871     5
Eric the Read Books                  722a     3
Barnum's                             A2976    50
Barnum's                             QA7442.3 75
News & Brews                         D4482    10
News & Brews                         P2121    40
News & Brews                         P2121    20
News & Brews                         P2121    20
Doc-U-Mat: Quality Laundry and Books N914008  20
Doc-U-Mat: Quality Laundry and Books N914014  25
Doc-U-Mat: Quality Laundry and Books P3087a   20
Doc-U-Mat: Quality Laundry and Books P3087a   25
Doc-U-Mat: Quality Laundry and Books P3087a   15
Doc-U-Mat: Quality Laundry and Books P3087a   25
Fricative Bookshop                   QQ2299   15
Fricative Bookshop                   TQ456    10
Fricative Bookshop                   X999     35
Bookbeat                             423LL922 15
...
(22 row(s) affected)
Joins
• What does this mean?
• The select clause simply designates which columns we want to see. If we were retrieving a column that had the same name in the two tables, we would have to specify which table the data was coming from
select stores.stor_id,stor_name, ord_num, qty
from stores,sales
where stores.stor_id = sales.stor_id
Joins
• From clause
• We are retrieving data from more than one table, so each table must be specified in the from clause
• The from clause can be seen as the main driver of a SQL statement
• If the table isn’t in the from clause, none of its columns can be used in any other clause
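One way to keep multi-table queries readable is to give each table in the from clause a short alias and qualify every column with it. The sketch below restates the stores/sales join this way; the alias names s and sa are illustrative choices, not something the course prescribes.

```sql
-- Same join as before, with table aliases (s, sa are illustrative names).
select s.stor_id, s.stor_name, sa.ord_num, sa.qty
from stores s, sales sa
where s.stor_id = sa.stor_id
```

Qualifying every column makes it obvious which table each value comes from, and the query keeps working if a column with the same name is later added to the other table.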
Joins
• The where clause
where stores.stor_id = sales.stor_id
• This tells the DBMS to take the first store ID in the stores table and add the data from the corresponding store ID in the sales table to the data retrieved from the stores table
• It then continues with the second store ID, etc. until it reaches the end of the table
Joins
• A join can be seen as a special type of selection criteria.
• If there is a stor_id in the stores table that does not exist in the sales table, the data for that particular store will not be returned
• You can also add additional selection criteria
select stores.stor_id,stor_name,city, state,ord_num,qty
from stores,sales
where stores.stor_id = sales.stor_id and state = 'CA'
Joins
• As we have seen, you can specify just those columns that you want to see
• For instance we are just concerned with the quantity of sales in CA
select sum(qty) from stores,sales where stores.stor_id = sales.stor_id and state = 'CA'
-----------
275
(1 row(s) affected)
• The type of join we have examined so far is also referred to as an equi-join or an inner join
• In the case of stores and sales, we could have a store that doesn't have any sales
• If we use the equi-join, we will not see these stores that do not have any sales
• So, how do we get the list of sales for all stores, regardless of whether they have any sales?
Joins
Outer Joins
• We accomplish this via an outer join
select stores.stor_id,stor_name, ord_num, qty from stores,sales where stores.stor_id *= sales.stor_id
stor_id stor_name                            ord_num  qty
------- ------------------------------------ -------- ---
6380    Eric the Read Books                  6871     5
6380    Eric the Read Books                  722a     3
7066    Barnum's                             A2976    50
7066    Barnum's                             QA7442.3 75
7067    News & Brews                         D4482    10
7067    News & Brews                         P2121    40
7067    News & Brews                         P2121    20
7067    News & Brews                         P2121    20
7131    Doc-U-Mat: Quality Laundry and Books N914008  20
7131    Doc-U-Mat: Quality Laundry and Books N914014  25
7131    Doc-U-Mat: Quality Laundry and Books P3087a   20
7131    Doc-U-Mat: Quality Laundry and Books P3087a   25
7131    Doc-U-Mat: Quality Laundry and Books P3087a   15
7131    Doc-U-Mat: Quality Laundry and Books P3087a   25
7896    Fricative Bookshop                   QQ2299   15
7896    Fricative Bookshop                   TQ456    10
7896    Fricative Bookshop                   X999     35
8042    Bookbeat                             423LL922 15
8042    Bookbeat                             423LL930 10
8042    Bookbeat                             756756   5
8042    Bookbeat                             P723     25
8042    Bookbeat                             QA879.1  30
(22 row(s) affected)
Outer Joins
• Notice the use of the asterisk (*)
where stores.stor_id *= sales.stor_id
• This tells the DBMS to return all of the rows in the stores table along with the corresponding data in the sales table, and not to drop any store IDs that are missing from the sales table. For a store with no sales, the sales columns are returned as null
• Note: The asterisk notation for outer joins is used in SQL Server (Sybase and MS). Most other DBMSs support a slightly different syntax, as does the ANSI-92 standard
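For comparison, the same outer join can be written in the ANSI-92 style, assuming a DBMS that supports the join keywords in the from clause. This is a sketch of the equivalent query, not the syntax the course itself uses.

```sql
-- ANSI-92 form of: where stores.stor_id *= sales.stor_id
select stores.stor_id, stor_name, ord_num, qty
from stores
left outer join sales on stores.stor_id = sales.stor_id
```

Here the join condition moves out of the where clause and into an on clause, and the word left says which table keeps all of its rows.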
Outer Joins
• Outer joins come in three different flavors
– Left
– Right
– Full
• A left outer join is the same thing as a right outer join except for the order
• Left: stores.stor_id *= sales.stor_id
• Right: sales.stor_id =* stores.stor_id
• Every left outer join can also be expressed as a right outer join and vice versa
Full Outer Join
• A full outer join is included here for completeness
• A full outer join returns every row from both tables: matching rows are combined, and rows with no match on either side are returned with nulls in the other table's columns
• You should use a full outer join ONLY under very specific circumstances
• Do not confuse it with a cross product, which is what you get when the join condition is left out
Cross Product
• With a cross product, you are telling the database to give every combination of rows possible
• i.e. Each row is matched to every row in the other table
• If you have one table with 100 rows and another with 1000 rows, a cross product will produce a result set of 100,000 rows
• This type of query will almost never be performed intentionally and should be avoided
• The first time you inadvertently fire one of these off, you will get a rather angry call from your DBA
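The two queries below illustrate the difference between a cross product and a full outer join on the stores and sales tables. The first uses the where-clause style from this unit with the join condition deliberately omitted; the second assumes a DBMS that supports ANSI-92 join keywords, which not every product or version does.

```sql
-- Cross product: no join condition, so every store row is paired
-- with every sales row (rows in stores * rows in sales).
select stor_name, ord_num
from stores, sales

-- Full outer join (ANSI-92 syntax, where supported): matched rows are
-- combined; unmatched rows from either table appear once, with nulls
-- in the columns of the other table.
select stores.stor_id, stor_name, ord_num, qty
from stores
full outer join sales on stores.stor_id = sales.stor_id
```

The first query is the one that grows multiplicatively with table size; the second can never return more rows than the matched rows plus the unmatched rows from each side.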
Subqueries
• A subquery is simply a SQL statement nested inside another SQL statement
• The most common place to do this is in a where or having clause.
select [distinct] select_list
from table_list
where {expression {[not] in | comparison [any|all]}|[not] exists}
(select [distinct] subquery_select_list from table_list where conditions)
[group by group_by_list]
[having conditions]
[order by order_by_list]
Subqueries
• Subqueries come in two basic kinds: correlated and noncorrelated
• A noncorrelated subquery is one in which the inner query is independent, gets evaluated first, and passes its result set back to the outer query
• A correlated subquery is one in which the inner query is dependent upon the results from the outer query
Subqueries
• Below are examples of these two kinds
• noncorrelated:
select pub_name from publishers
where pub_id in (select pub_id from titles
where type = 'business')
• correlated:
select pub_name from publishers p
where 'business' in (select type from titles where pub_id = p.pub_id)
• As is the case with most subqueries, these can also be expressed as joins
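As a sketch of the join form, the query for publishers of business books could be written as follows. The distinct keyword is included on the assumption that a publisher may have more than one business title, which would otherwise repeat its name in the results.

```sql
-- Join equivalent of the subqueries above.
select distinct pub_name
from publishers, titles
where publishers.pub_id = titles.pub_id
  and type = 'business'
```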
Subqueries
• Subqueries also come in three different types:
• They return zero or more items
• They return exactly one item
• They test for existence of a value
• If you have a subquery of the first type, it must be preceded by an IN (or by a comparison operator modified by ANY or ALL). where column = (select…) will return an error if the subquery returns more than one item
Noncorrelated Subqueries
• At a conceptual level, a noncorrelated subquery is executed in two parts.
• First the inner query is executed
• It then passes its results back to the outer query which then finds the rows that match the list passed back
• The column names are resolved implicitly based upon the from clause of the corresponding query. You can always explicitly define the table name.
• This is recommended for complex subqueries
Correlated Subqueries
• Processing of a correlated subquery is much more complicated, but these can handle queries you can't easily do with noncorrelated subqueries or joins
• A correlated subquery depends on data from the outer query
• The inner query will execute once for each row in the outer query
Correlated Subqueries
• The outer query retrieves the first row of data and passes the data values to the inner query
• The inner query finds all rows that match the data passed from the outer query
• Finally the rows from the inner query are checked against the conditions in the outer query
• If one or more rows match the conditions, the data corresponding to that row will be returned to the user
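The processing steps above can be followed on a concrete query. The example below is a hypothetical question, not one from the course: list the titles priced above the average price for their own type. The aliases t1 and t2 are illustrative.

```sql
-- For each row of t1 (outer query), the inner query is evaluated
-- using that row's type, then the outer condition is checked.
select t1.title, t1.type, t1.price
from titles t1
where t1.price > (select avg(t2.price)
                  from titles t2
                  where t2.type = t1.type)
```

The reference to t1.type inside the inner query is what makes it correlated: the inner query cannot be evaluated on its own, because its result depends on which outer row is being considered.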
Joins or Subqueries
select distinct pub_name from publishers, authors where publishers.city = authors.city
and
select pub_name from publishers where city in
(select city from authors)
will return the same results
• But if you want data from both the publishers and authors tables, you must use a join.
select pub_name,au_fname,au_lname
from publishers,authors
where publishers.city = authors.city
Joins or Subqueries
select au_lname,au_fname,city from authors where city in (select city from authors where au_fname = 'Dick' and au_lname = 'Straight')
can also be expressed as
select a1.au_lname,a1.au_fname,a1.city from authors a1, authors a2 where a1.city = a2.city and a2.au_fname = 'Dick' and a2.au_lname = 'Straight'
• This is referred to as a self join
Joins or Subqueries
• Whether you use joins or subqueries is usually a matter of choice
• Most joins can be expressed as subqueries and vice versa
• Calculating an aggregate and using this in the selection criteria is an advantage of subqueries
select title,price from titles
where price = (select min(price) from titles)
• Displaying data from multiple tables is usually done with a join
Common Restrictions
• The select list of an inner query introduced by an IN can have only one column. This column must also be join compatible with the column in the where clause of the outer query
• Subqueries introduced by an unmodified comparison operator (not followed by ANY or ALL) can not include a group by or having clause unless this will force the inner query to return a single value
• Subqueries can not manipulate their results internally. i.e. They can not contain an order by or the keyword INTO
Any and All
• You use the ANY and ALL keywords with a comparison operator in a subquery
• > ALL means greater than every value in the results of the inner query (> maximum value)
• > ANY means greater than at least one value in the results of the inner query (> minimum value)
Any and All
ALL              Results                 ANY              Results
> all (1,2,3)    > 3                     > any (1,2,3)    > 1
< all (1,2,3)    < 1                     < any (1,2,3)    < 3
= all (1,2,3)    = 1 and = 2 and = 3     = any (1,2,3)    = 1 or = 2 or = 3
select title from titles where advance > all
(select advance from publishers,titles where titles.pub_id = publishers.pub_id and pub_name = 'New Age Books')
title
--------------------------------------------------------------------------
The Busy Executive's Database Guide
Cooking with Computers: Surreptitious Balance Sheets
You Can Combat Computer Stress!
Straight Talk About Computers
Silicon Valley Gastronomic Treats
The Gourmet Microwave
The Psychology of Computer Cooking
But Is It User Friendly?
Secrets of Silicon Valley
Net Etiquette
Computer Phobic AND Non-Phobic Individuals: Behavior Variations
Is Anger the Enemy?
Life Without Fear
Prolonged Data Deprivation: Four Case Studies
...
(18 row(s) affected)
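The "> maximum" and "> minimum" readings of ALL and ANY can be made explicit with aggregates. The pairs below are a sketch using a hypothetical filter on business titles. The two forms in each pair agree only under the assumption that the inner query returns at least one row and no null advances. If the inner query returns no rows, > all is true for every outer row, while a comparison against max() of an empty set is never true.

```sql
-- > all compared with > max
select title from titles
where advance > all (select advance from titles where type = 'business')

select title from titles
where advance > (select max(advance) from titles where type = 'business')

-- > any compared with > min
select title from titles
where advance > any (select advance from titles where type = 'business')

select title from titles
where advance > (select min(advance) from titles where type = 'business')
```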
Exists
• The last type of subquery is used to test for the existence of something
• To find all of the publishers who publish business books we would do the following:
select distinct pub_name from publishers where exists (select 1 from titles where pub_id = publishers.pub_id and type = 'business')
pub_name
----------------------------------------
Algodata Infosystems
New Moon Books
(2 row(s) affected)
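The same test can be reversed with not exists. As a sketch, the query below would list the publishers that have no business titles at all; it is an illustrative variation, not an example from the course.

```sql
-- Publishers for which no matching business title row exists.
select pub_name
from publishers
where not exists (select 1
                  from titles
                  where titles.pub_id = publishers.pub_id
                    and type = 'business')
```

Unlike the exists version, no distinct is needed here, because each publisher row is either kept or dropped exactly once.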
Additional Restrictions
• A subquery that tests for existence will contain either one column or an asterisk in the select list. It makes no sense to include a column list, because this type of query simply tests to see if a row exists and does not return any data
Nesting Subqueries
• A subquery may contain another subquery
• In fact you can nest as many levels as you need. However, for most applications more than four levels is an indication of poor database design
select au_lname,au_fname from authors where au_id in (select au_id from titleauthor where title_id in (select title_id from titles where type = 'popular_comp'))
• This will return the list of authors who have written at least one popular computing book
Unit 4 Review
• Relationships are connections between tables
• A primary key is the set of columns that uniquely identifies a row
• You can have only one primary key per table
• A foreign key is one or more columns that refer to a primary key of the same or another table
• You can have up to 255 foreign keys per table
• A composite key consists of more than one column
• Joins are used when you need to retrieve data from more than one table
• The two kinds of joins are: equijoin and outer join
• An outer join comes in three flavors: left, right, full
• A full outer join returns all rows from both tables; leaving out the join condition produces the cross product of the two tables
• Subqueries are nested SQL statements
Unit 4 Review
• Subqueries come in two kinds: correlated and noncorrelated
• Subqueries are also of three different types: return exactly one item, return zero or more items, and test for existence
• Most joins can be written as a subquery and vice versa
• A subquery is used when you need to include an aggregate in the where conditions
• A join is used when you want to retrieve data from more than one table
• All means every value (> all means greater than every value)
• Any means at least one value (> any means greater than at least one value)
• Exists allows us to test to see if a value exists
• Exists queries are used with correlated subqueries
• Subqueries can be nested any number of levels
Unit 4 Exercises
• Time allotted for exercises is 1 hour