Using Relational Databases and SQL John Hurley Department of Computer Science California State University, Los Angeles Lecture 1 Introduction

Using Relational Databases and SQL

John Hurley

Department of Computer Science

California State University, Los Angeles

Lecture 1

Introduction

Introduction

John Hurley

Call me John, especially outside class.

If that’s too informal for you, you can call me “Instructor”

[email protected]

xxx6aTWOb VI Xxx4cATEd

d7xxx8eONEf5gFORE!hij

(text preferred)

Office hours listed on course page. I will often be in A-310A (inside A-310) at other times, too.

Attendance

Administrative

Course page:

http://www.calstatela.edu/faculty/jhurley2/classes/cs122

Syllabus

Software download links

Assignment dates

Grading

Grading: A, B, C (with + and -), NC.

If you are an undergraduate and don’t get a C or better, you get an NC

If you are a graduate student and don’t get a B or better, you get an NC

See the grading scale on the syllabus; no curve

In past terms, I have assigned all grades from A to C as well as NC in this class. Median grade is usually B or B+, which is lower than the median grade in my CS120 sections.

Deadbeats Will Fail!

About 10% of all the course grades I have ever given in CS122 were NCs. Everyone to whom I have ever given an NC missed significant portions of the coursework.

If you decide not to take the class, drop it yourself. Don’t expect me to drop you!

I can’t drop anyone after the no-record drop deadline

You will have your midterm grades before the drop-with-W deadline

Labs

Assignments are in text files linked from the course web page

Turn in on CSNS

Posted before the weekly lecture

Part A usually due at the end of the week’s lab class

Part B usually due before the following week’s lecture

Let me know *in advance* if you won’t be able to attend a lab for some good reason

I may give quizzes towards the end of lab periods if attendance is poor.

Last labs before the midterm and final will be ungraded, very realistic practice exams.

Assignments

Labs and exams will contain the following types of questions:

short answers and multiple choice

1-paragraph answers

SQL to English

English to SQL

Quizzes

Quizzes will be administered either during lectures or labs

Quizzes usually unannounced but open-notes

I may give one pre-announced closed-book quiz which requires you to memorize a few very important definitions

No makeups unless you provide a satisfactory explanation *in advance* for why you won’t be in class.

Cheating

You may discuss general material about databases and the techniques taught in this class with other students

You may give or receive help understanding assignments and debugging work

Cheating

You may copy examples from the lecture notes and then change them to meet assignment requirements.

Working programmers often solve problems in similar ways.

Other instructors may not allow this. I am only saying that it is OK and expected in my sections of CS122.

You may not directly use language from the lecture notes to answer short-answer questions; restate the answers in your own words. This is difficult to do; that’s the point.

Cheating; Copying

There are grey areas in cheating in CS, but presenting an answer that is copied directly from any source other than your brain is always over the line.

You may not copy code from other students or allow anyone to copy your code.

Few to none of my assignment questions are taken from the textbook or other sources, so don’t bother copying published solutions to the textbook exercises.

If you copy code posted by past students, you will likely do poorly anyway, because I change many questions slightly each term.

Cheating on Lab Work

OK on lab work:

Copying examples from the lecture notes and modifying them to fulfill the assignment

Examples of legitimate help for other students:

“The problem with your query is that you forgot to write the join condition”

“That isn’t working because on a Mac the table name is case sensitive”

“You accidentally copied a character from PowerPoint that is invalid in MySQL”

“You need to use a float instead of an int for that field because the values might not be an integer”

Cheating on lab work:

Copying code from other students or internet sources

Copying text from other students for short-answer or essay questions

Copying text from the internet or a book for short-answer and essay questions

Cheating on Exams and Quizzes

OK on exams and open-book quizzes:

Consulting lecture notes, textbooks, your own notes

Checking Wikipedia or other internet sources that do not involve real-time communication with human beings

Copying examples from the lecture notes and modifying them to answer the questions

Cheating on exams and quizzes:

Copying code or text from other students or internet sources

Answering short-answer questions with direct quotes from the notes (restate them in your own words!)

Communicating with any human being other than me via email, chat, phone, or any other means

Cheating Detection

It is obvious to me when students answer short-answer questions with text copied from professional-level sources like Wikipedia and textbooks.

Even for SQL code, there are only a few correct answers to each question using the material we cover. However, if you copy answers from other students, you will sooner or later copy an identifiable incorrect answer or trip up in some other way.

I will be comparing all students’ lab and exam papers using an automatic tool designed to detect copying. I developed this application specifically to detect cheating in CS122!

If I do detect copying, I will penalize all students involved equally. If you understand the material, it is foolish to take this risk by letting other students copy your work.

People who do well on labs but poorly on exams and quizzes receive careful scrutiny!

Using Relational Databases and SQL

Part I

Databases

Database Definition

Data (information) + base (foundation)

A database is a structured collection of persistent data.

Structured: organized according to a set of rules. In this case, organized according to a database model.

Persistent: stored in permanent storage, not just RAM. If you shut down the application or the power goes off, the data is not lost.

Database Definition

Many definitions are like this one:

A collection of data, typically modelling the activities of one or more related organizations (Ramakrishnan and Gehrke, Database Management Systems).

I don’t like this definition, because databases don’t always model anything in particular. Database designers don’t always know what the data will be used for.

What is a Database?

Structured using a database model

No database model, no database!

Often, not always, used to model organizational activities

Examples:

Companies

Stores

Universities

Database Skills

Database skills are foundational in CS

The great majority of modern applications use databases to store information

You will put these skills together with your object-oriented programming skills a little later if you are an undergraduate, very soon if you are a grad student

As a working software engineer, you will probably use the skills you learn in this class every day

Database Skills

Some applications you are familiar with that rely heavily on large databases:

Wikipedia

GET

Amazon.com

iTunes

Tables

Users can add and remove tables, get information from them, update or delete information in them, and change their structure

These are the skills we will study in this class
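As a rough preview of those skills, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example runs without a server), and the Pets table and its columns are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Add a table (Pets and its columns are hypothetical).
cur.execute("CREATE TABLE Pets (PetID INTEGER PRIMARY KEY, Name TEXT, Species TEXT)")

# Put information into it.
cur.executemany(
    "INSERT INTO Pets (PetID, Name, Species) VALUES (?, ?, ?)",
    [(1, "Rex", "dog"), (2, "Tom", "cat"), (3, "Polly", "parrot")],
)

# Get information from it.
cats = cur.execute("SELECT Name FROM Pets WHERE Species = 'cat'").fetchall()

# Update and delete information in it.
cur.execute("UPDATE Pets SET Name = 'Max' WHERE PetID = 1")
cur.execute("DELETE FROM Pets WHERE PetID = 3")
remaining = cur.execute("SELECT PetID, Name FROM Pets ORDER BY PetID").fetchall()

# Change the table itself by adding a column.
cur.execute("ALTER TABLE Pets ADD COLUMN Age INTEGER")
columns = [row[1] for row in cur.execute("PRAGMA table_info(Pets)")]

# Remove the table.
cur.execute("DROP TABLE Pets")
tables_left = cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
conn.close()
```

The SQL statements themselves are close to what MySQL accepts; the PRAGMA and sqlite_master lookups are SQLite-specific ways of inspecting the schema.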

Database Background

Storage was bulky, expensive, and slow in the old days!

Physical/Logical Separation

Before the invention of the DBMS, one had to write a program that traversed pointers at the physical level to extract data from a database

By abstracting the physical level and writing a program at the logical level instead, extracting data from a database became much easier

Database Models

The Two Levels of a Database Model

Physical Level (how data is stored)

The things we don’t have to worry about

Logical Level (how data is organized)

The things we do care about

The Basic Models

Hierarchical Model (IBM’s IMS) represented data as a tree

Network Model (CODASYL)

Relational Model (ALPHA, SEQUEL)

Hierarchical Database Model

Hierarchical Database Model

Example of a query to retrieve info:

for book in get_children("Programming/J.Smith"):
    print(book.field("Title"), book.field("Publisher"))

Mostly superseded by relational model

Has an afterlife with XML
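To make the idea of walking a tree concrete, here is a small runnable sketch of the query above. The Node class, the get_children helper, and the book data are all hypothetical; a real hierarchical DBMS such as IMS has its own interface.

```python
class Node:
    """One record in a tree-shaped (hierarchical) data store."""

    def __init__(self, name, **fields):
        self.name = name
        self.fields = fields
        self.children = []

    def add(self, child):
        """Attach a child record and return it so calls can be chained."""
        self.children.append(child)
        return child

    def field(self, key):
        return self.fields[key]


root = Node("root")


def get_children(path):
    """Follow a '/'-separated path down from the root; return that node's children."""
    node = root
    for part in path.split("/"):
        node = next(c for c in node.children if c.name == part)
    return node.children


# Hypothetical data: subject -> author -> books.
smith = root.add(Node("Programming")).add(Node("J.Smith"))
smith.add(Node("book1", Title="Intro to Trees", Publisher="Example Press"))
smith.add(Node("book2", Title="More Trees", Publisher="Sample House"))

results = [
    (book.field("Title"), book.field("Publisher"))
    for book in get_children("Programming/J.Smith")
]
for title, publisher in results:
    print(title, publisher)
```

Note that the program has to know the path to the data. Finding every book from a given publisher, regardless of author, would mean walking the whole tree.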

XML Data

<CATALOG>
  <CD>
    <TITLE>When a Man Loves A Woman</TITLE>
    <ARTIST>Percy Sledge</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Atlantic</COMPANY>
    <PRICE>8.70</PRICE>
    <YEAR>1987</YEAR>
  </CD>
  <CD>
    <TITLE>Black Angel</TITLE>
    <ARTIST>Savage Rose</ARTIST>
    <COUNTRY>EU</COUNTRY>
    <COMPANY>Mega</COMPANY>
    <PRICE>10.90</PRICE>
    <YEAR>1995</YEAR>
  </CD>
</CATALOG>
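The catalog above is a tree: CATALOG is the root, each CD is a child, and each field is a child of its CD. A minimal sketch of reading it with Python's standard xml.etree.ElementTree module (the variable names are just for illustration):

```python
import xml.etree.ElementTree as ET

catalog_xml = """
<CATALOG>
  <CD>
    <TITLE>When a Man Loves A Woman</TITLE>
    <ARTIST>Percy Sledge</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Atlantic</COMPANY>
    <PRICE>8.70</PRICE>
    <YEAR>1987</YEAR>
  </CD>
  <CD>
    <TITLE>Black Angel</TITLE>
    <ARTIST>Savage Rose</ARTIST>
    <COUNTRY>EU</COUNTRY>
    <COMPANY>Mega</COMPANY>
    <PRICE>10.90</PRICE>
    <YEAR>1995</YEAR>
  </CD>
</CATALOG>
"""

# Parse the text into a tree and take the root element (CATALOG).
root = ET.fromstring(catalog_xml.strip())

# Visit each CD child and pull out two of its fields.
titles = [(cd.findtext("TITLE"), cd.findtext("ARTIST")) for cd in root.findall("CD")]

# Everything in XML is text, so numbers must be converted explicitly.
years = [int(cd.findtext("YEAR")) for cd in root.findall("CD")]
```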

Network Model

Built on hierarchical model but allows multiple parents and multiple children

Relational Model

Proposed by Edgar F. Codd (circa 1969)

Database is a collection of tables (relations)

Relational comes from ‘Relational Algebra/Calculus’ and not from ‘Relationships’

Relational model is based on extensive mathematical theory, which we will not cover in this class

Dominant database model

Oracle was the first to aggressively market a commercial relational database product

Dr. Edgar F(rank) Codd

MA Mathematics, MA Chemistry

MS and PhD in Communication Sciences

ACM Turing Award (1981)

Tables

Artists = Table (Relation)

ArtistID, City, Region, ... = Columns (Attributes)

Each row is called a Record (Tuple)
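A small sketch of this vocabulary, again using Python's sqlite3 module as a stand-in for MySQL. The cut-down Artists table and its two rows are hypothetical; the real Lyric table has more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, cut-down version of the Artists table (relation).
conn.execute("CREATE TABLE Artists (ArtistID INTEGER, City TEXT, Region TEXT)")
conn.executemany(
    "INSERT INTO Artists VALUES (?, ?, ?)",
    [(1, "Los Angeles", "CA"), (2, "Austin", "TX")],
)

cur = conn.execute("SELECT * FROM Artists ORDER BY ArtistID")

# Columns (attributes): the first item of each description entry is the name.
attributes = [col[0] for col in cur.description]

# Rows (records): sqlite3 hands each one back as a Python tuple.
records = cur.fetchall()
conn.close()
```

It is a convenient coincidence that Python returns each row as a tuple, which is also the formal name for a record in the relational model.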

Using Relational Databases and SQL

Part II

Database Management Systems

Database Management Systems (DBMS)

A DBMS handles these functions:

Data definition: Defining new data structures for a database, removing data structures from the database, modifying the structure of existing data.

Update: Inserting, modifying, and deleting data.

Retrieval: Obtaining information either for end-user queries and reports or for processing by applications.

Administration: Registering and monitoring users, enforcing data security, monitoring performance, maintaining data integrity, dealing with concurrency control, and recovering information if the system fails.

Source: Wikipedia

Database Management Systems (DBMS)

Some common relational DBMSs:

MySQL, PostgreSQL (free, open source)

Oracle, MS SQL Server (commercial)

Database Schemas

The definition of the database, where you define

Tables

Relationships

Constraints

Stored Functions and Procedures

Views

Indexes

Schemas are typically represented by a schema diagram; see the Lyric diagram linked from the course page

Database Management Systems (DBMS)

You can have multiple databases, each with a single schema

A separate database for each application

Toystore (First database)

Bookstore (Second database)

Furniture Store (Third database)

Etc.

You can also have a single database, with multiple schemas

Database Management Systems (DBMS)

Using Relational Databases and SQL

Part III

Query Languages

Query Languages

Query: question

Query Language = A computer language used to extract data from a database

Data Sublanguage = A computer language used to extract and manipulate database data

SEQUEL/SQL (1974) developed at IBM

Query Languages

Data Sublanguage Alpha (Codd’s original query language)

Data Sublanguage SEQUEL (SQL)

SQL

Stands for Structured Query Language

A non-procedural, domain-specific language (not like Java, C or C++)

An open ANSI and ISO standard

Supported by most major DBMS

Some variations in implementations

Used by programmers, managers, and database administrators

SQL

SQL is “nonprocedural” or “declarative”

Procedural languages, like Java or C, require programmers to implement an algorithm (“a series of instructions that will solve a problem in a finite amount of time”) to accomplish each task

Nonprocedural / declarative languages, like SQL, require the programmer to describe *what data* s/he wants. The platform (in this case, DBMS) determines how to produce the data

This is an important distinction, but as we will see, it is not as clear-cut for SQL as it is for, say, HTML.
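A minimal sketch of the difference, answering the same question both ways. The People data is hypothetical, and Python's sqlite3 module stands in for MySQL.

```python
import sqlite3

# Hypothetical data: (name, age) pairs.
people = [("Ann", 34), ("Bob", 19), ("Cy", 52), ("Di", 27)]

# Procedural: we spell out the algorithm step by step.
procedural = []
for name, age in people:
    if age >= 21:
        procedural.append(name)
procedural.sort()

# Declarative: we describe the data we want; the DBMS decides how to get it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (Name TEXT, Age INTEGER)")
conn.executemany("INSERT INTO People VALUES (?, ?)", people)
declarative = [
    row[0]
    for row in conn.execute("SELECT Name FROM People WHERE Age >= 21 ORDER BY Name")
]
conn.close()
```

Both lists hold the same names. In the SQL version nothing says whether to scan every row, use an index, or sort before or after filtering; those choices are left to the DBMS.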

SQL Functions

View information from relational databases

Single and multiple table selections

Calculation and analysis

Manipulate information in relational databases

Insert and delete records

Update records

Create relational databases

Create databases, tables, constraints, ...
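A short sketch of the "view information" functions: a single-table selection, then a multiple-table selection with a calculation. The Customers and Orders tables are hypothetical, and Python's sqlite3 module stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Create two hypothetical tables and load a few rows.
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, Amount REAL);
    INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Orders VALUES (10, 1, 20.0), (11, 1, 5.5), (12, 2, 8.0);
""")

# Single-table selection: orders over 7.
big_orders = conn.execute(
    "SELECT OrderID FROM Orders WHERE Amount > 7 ORDER BY OrderID"
).fetchall()

# Multiple-table selection with a calculation: total spent per customer.
totals = conn.execute("""
    SELECT c.Name, SUM(o.Amount)
    FROM Customers c
    JOIN Orders o ON o.CustomerID = c.CustomerID
    GROUP BY c.Name
    ORDER BY c.Name
""").fetchall()
conn.close()
```

The ON clause is the join condition: it tells the DBMS which order rows belong to which customer row.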

Nonstandard Features

• SQL is an open standard, but developers of DBMSs often add additional features that are not part of the standard

• Differentiate their products from competitors

• Vendor lock-in

• What happens when you want to switch to a different DBMS?

• Is it a good idea to use features like this?

Using Relational Databases and SQL

Part IV

Lyric Database Discussion

Primary Keys

Primary key is used to uniquely identify every record in a table

Must be a field or combination of fields with unique values

What would happen if we needed to identify individuals in the university DB and tried to do this using first name? Last name? Both? Height? DOB?

If more than one field is required, we have a composite primary key
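A minimal sketch of how a DBMS enforces both kinds of key. The Students and Enrollments tables are hypothetical, and Python's sqlite3 module stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Single-field primary key: StudentID identifies a person; names do not.
conn.execute(
    "CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)"
)
conn.execute("INSERT INTO Students VALUES (1, 'Maria', 'Lopez')")
conn.execute("INSERT INTO Students VALUES (2, 'Maria', 'Lopez')")  # same name, different person

try:
    conn.execute("INSERT INTO Students VALUES (1, 'John', 'Kim')")  # StudentID 1 is taken
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Composite primary key: neither field is unique alone, but the pair is.
conn.execute(
    "CREATE TABLE Enrollments (StudentID INTEGER, CourseID TEXT, "
    "PRIMARY KEY (StudentID, CourseID))"
)
conn.execute("INSERT INTO Enrollments VALUES (1, 'CS122')")
conn.execute("INSERT INTO Enrollments VALUES (1, 'CS120')")  # same student, another course
conn.execute("INSERT INTO Enrollments VALUES (2, 'CS122')")  # same course, another student

try:
    conn.execute("INSERT INTO Enrollments VALUES (1, 'CS122')")  # this pair already exists
    pair_rejected = False
except sqlite3.IntegrityError:
    pair_rejected = True

student_count = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
conn.close()
```

Two students can share a first and last name, which is why an ID column makes a safer key than a name.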

The Lyric Database

Database for a web-based company that provides services to artists and the studios that they work for

Before we start extracting data from a database, we must understand the database completely

Let’s go over all the tables and attributes

Primary Key Example

What is the primary key of the Studios table?

What is the primary key of the XRefArtistsMembers table? (hint: it may require more than one field to make up a primary key!)

Using Relational Databases and SQL

Part V

MySQL

MySQL

For coursework, we will use MySQL, which you must install on a USB drive.

– Bring a USB drive to the next class meeting!

You may also install it on your own laptop, but note that you will have to use the lab computers for the midterm and final exam, so be sure you can run it from a USB drive before the midterm.

Downloading MySQL, Part I

Go to the CS122 web page and follow the links to the MySQL site

Get MySQL Community Server

mysql-5.5.x has the MySQL database client and server programs

Get the .zip files (not the MSIs) for your OS (Windows vs. OSX) and processor (32 vs 64 bit).

The files are labelled in a way that may confuse you into downloading the source code, which you don’t need. Be careful to get the binaries instead. MySQL 5.5.8-win32, for example, is 132 MB. The 27 MB file is the source code.

Downloading MySQL, Part II

Extract the zip files; you will have two directories

You may also want to use the MySQL Workbench, which is a GUI tool for working with MySQL. However, Workbench only works with the 32 bit version and is buggy in any case. Please don’t ask me to help you with it until at least week 3, after everyone is working smoothly with the main MySQL software.

mysql-workbench-gpl-5.2.x.... is the MySQL GUI Tools

Using MySQL In Windows

This process should only be slightly different in OSX

Open up a Windows command line console

Use the cd command to navigate to the mysql-5.x.xx-xx/bin directory

• If you add this directory to your PATH, you won’t have to navigate there every time. However, you *won’t* be able to add anything to the PATH on the lab computers.

Type in the following to start the database server:

start mysqld

Then type in the following to start the database client:

mysql -u root

Some MySQL Commands

Once MySQL has started and you see the mysql prompt:

At mysql> prompt type in: show databases;

At mysql> prompt type in: create database lyric;

At mysql> prompt type in: use lyric;

At mysql> prompt type in: show tables;

You shouldn’t see any yet

Adding Data to a Database

Now that the database is selected, let's load a database script

Download lyric.sql from the course webpage

At mysql> prompt type in: source [path] lyric.sql;

Where [path] stands for the path to the location where you saved lyric.sql.

If you put lyric.sql in mysql’s bin directory, all you will have to type is source lyric.sql

You should see a bunch of messages like this:

Query OK, 1 row affected (0.01 sec)

Verify that the database is set up

To check whether everything has worked correctly, type

SELECT * FROM Salespeople;

The output should look like this:

+---------+-----------+----------+----------+--------+------------+
| SalesID | FirstName | LastName | Initials | Base   | Supervisor |
+---------+-----------+----------+----------+--------+------------+
|       1 | Bob       | Bentley  | bbb      | 100.00 |          4 |
|       2 | Lisa      | Williams | lmw      | 300.00 |          4 |
|       3 | Clint     | Sanchez  | cls      | 100.00 |          1 |
|       4 | Scott     | Bull     | sjb      |   NULL |       NULL |
+---------+-----------+----------+----------+--------+------------+
4 rows in set (0.39 sec)

Using MySQL in the lab

If you will be using your own laptop in the lab, bring it to the next class meeting

If you will be using MySQL on a lab computer, *bring a USB drive to the lab* on Wednesday