Bases de Dades: introduction to SQL (part 1)
Andrew D. Bagdanov [email protected]
Departamento de Ciencias de la Computación, Universidad Autónoma de Barcelona
Fall, 2010
Andrew D. Bagdanov [email protected] Bases de Dades: introduction to SQL (part 1)
Outline
1 Last week on bases de dades...
2 Goals of today’s lecture
3 Data and data-centricity
4 Iterated design
5 Exercises
Embedded databases
DB archaeology: Repeat my “locate *.sqlite” experiment on your (or a university) computer. Try to find more examples of embedded databases in applications. Other things you might search for are “*.sql” or “*.db”.
The nosql movement
Changing times: I mentioned “nosql” databases a few times, but never said what that means. Do some searching to discover what the principal ideas are behind the nosql movement.
Goals for today
Begin thinking about database design and how it affects database applications.
Introduce the high-level language SQL (Structured Query Language).
Familiarize ourselves with some of the main types of SQL statements.
Understand the importance of data modeling.
Design or use?
We have a chicken-and-egg problem...
It’s hard to teach about database use before we know how a database is designed.
But it’s hard to design a database before we know something about how it will be used.
In this class we will follow an iterated design philosophy.
We will design a little (learning as we go), use a DB a little (learning as we go), and then re-design a little to fix any problems we encounter.
Pretty similar to the real world (but be careful...)
Standards, standards everywhere!
Someone once commented: The great thing about standards is that there are so many to choose from!
This is very true in the DBMS world, especially with SQL.
There have been many revisions and major releases of the ANSI SQL standard.
SQL:1999 is the most broadly supported; SQL:2008 is the latest version.
No implementation is complete (there will be some missing features).
Every implementation implements custom extensions.
Read the manual.
Outline
Data and data-centricity
  When data is central.
  Why use a database?
A simple case study
  But nonetheless a real one
  Choosing a data model
Expressing the design in SQL
  Answering questions
  Exercises
Data-centric applications
Many applications are inherently data-centric: they generate and consume large quantities of data.
Email clients, search engines, reservation systems, product catalogs, you name it: it’s all about the data.
How should this data be stored in a way that is efficient?
How should this data be stored so that it can be retrieved?
How should this data be stored so that it can be maintained?
How should this data be stored so that it can be flexibly retrieved and maintained in unforeseen ways?
Efficiency, standard API
DBMSs provide answers to all of those questions.
They are one of the most researched, advanced, sophisticated and reliable technologies in the world of computer science.
Unless you have a very good reason for doing otherwise, a DBMS should be the backbone of any data-centric application.
Andy’s tenth axiom: Any sufficiently complex data-centric application contains an ad hoc, informally-specified, bug-ridden, slow implementation of half a relational database management system. 1
1 With apologies to Greenspun and his tenth rule of programming.
Design decisions
Life in the database world is often a zero-sum game:
Compromise and cooperation are necessary.
Remember that you may have to wear any of these hats at any time...
A country profile database
Imagine an application (or group of applications) that must routinely deal with information about countries and the languages spoken in them.
It could be a GIS application, or a shipping address database, or just about anything.
What are the data modeling needs of such applications?
What information should we model about countries?
Assume for now that we are interested mostly in geopolitical and linguistic information.
Thought experiments
What types of queries will applications need to perform against this type of database?
What languages are spoken in a particular country (or group of countries)?
In what countries is a particular language spoken?
In how many countries is a language spoken?
What countries are part of a geopolitical region?
...
A first design: mono-tabular
Let’s assume we want to model the following information about a country:
1 Country name
2 Official ISO country code
3 The continent on which it is
4 The geopolitical region it belongs to
5 Language spoken
Hmmm... Already we run into problems.
Second run: mono-tabular
There are few countries in the world where only ONE language is spoken:
1 Country name
2 Official ISO country code
3 The continent on which it is
4 The geopolitical region it belongs to
5 Language spoken
6 Is language official?
7 Percentage of population speaking language
The data definition
Data (relations, actually) are represented as tables in a DBMS. Table columns represent the attributes; rows are instances of data. A description of the structure of all tables is called a database schema.
Our table description (each row corresponds to a column in the DB):
Column Name   Type         Null?   Default
Code          char(3)      no      ''
Name          char(52)     no      ''
Continent     enum         no      'Asia'
Region        char(26)     no      ''
Language      char(30)     no      ''
IsOfficial    enum         no      'F'
Percentage    float(4,1)   no      0.0
Some sqlite meta-commands
This is a small sample; the .help command is very useful:
Command           Action
.read <file>      execute sequence of SQL from file
.schema <table>   describe table structure
.dump <table>     dump SQL representing DB
.quit             quit sqlite3 CLI
.help             get help on meta commands
Our schema in SQL
CREATE TABLE Country (
    Code char(3) NOT NULL default '',
    Name char(52) NOT NULL default '',
    Continent enum NOT NULL default 'Asia',
    Region char(26) NOT NULL default '',
    Language char(30) NOT NULL default '',
    IsOfficial enum NOT NULL default 'F',
    Percentage float NOT NULL default 0.0
);
This SQL statement will create the ’Country’ table.
The table structure is expressed in a Data Definition Language (DDL).
The table is created in the current database.
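One SQLite-specific caveat about the schema above: SQLite has no built-in enum type. It accepts enum as a column type name but does not restrict the values stored in that column. A minimal sketch of one way to get enum-like behaviour is a CHECK constraint. The table name CountrySketch and its reduced column set are assumptions made for illustration; they are not part of the course files.

```sql
/* Hypothetical variant of the Country table (the name CountrySketch is an
   assumption) that restricts IsOfficial to 'T' or 'F' with a CHECK. */
CREATE TABLE CountrySketch (
    Code       char(3)  NOT NULL DEFAULT '',
    Name       char(52) NOT NULL DEFAULT '',
    Language   char(30) NOT NULL DEFAULT '',
    IsOfficial char(1)  NOT NULL DEFAULT 'F'
               CHECK (IsOfficial IN ('T', 'F'))
);

/* Accepted: 'F' is one of the permitted values. */
INSERT INTO CountrySketch VALUES ('AFG', 'Afghanistan', 'Balochi', 'F');

/* Would be rejected with a constraint error if uncommented,
   because 'X' is not in the permitted list:
   INSERT INTO CountrySketch VALUES ('AFG', 'Afghanistan', 'Balochi', 'X'); */

SELECT Code, Language, IsOfficial FROM CountrySketch;
```

Other DBMS products handle this differently, so the manual for the system in use remains the authority.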
Inserting from data
Each row in a table must be INSERT-ed.
A row corresponds to a datum, or to a single element in the relation (using the set-theoretic formulation of relations).
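/* Values follow the column order of CREATE TABLE:
   Code, Name, Continent, Region, Language, IsOfficial, Percentage */
INSERT INTO Country
VALUES ('AFG', 'Afghanistan',
        'Asia', 'Southern and Central Asia',
        'Balochi', 'F', 0.9);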
...
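A positional INSERT like the one above depends on the order in which the columns were declared. As a hedged sketch, the same row can be written with an explicit column list, which makes the mapping visible and lets omitted columns take their defaults. The table name CountryDemo is hypothetical, chosen so the example does not clash with the course database.

```sql
/* Scratch copy of the lecture's schema; the name CountryDemo is an assumption. */
CREATE TABLE CountryDemo (
    Code       char(3)  NOT NULL DEFAULT '',
    Name       char(52) NOT NULL DEFAULT '',
    Continent  enum     NOT NULL DEFAULT 'Asia',
    Region     char(26) NOT NULL DEFAULT '',
    Language   char(30) NOT NULL DEFAULT '',
    IsOfficial enum     NOT NULL DEFAULT 'F',
    Percentage float    NOT NULL DEFAULT 0.0
);

/* Each value is matched to the column named at the same position. */
INSERT INTO CountryDemo (Code, Name, Continent, Region,
                         Language, IsOfficial, Percentage)
VALUES ('AFG', 'Afghanistan', 'Asia', 'Southern and Central Asia',
        'Balochi', 'F', 0.9);

/* Columns left out (Continent, IsOfficial and Percentage here) take their
   declared defaults. This second row only demonstrates defaults. */
INSERT INTO CountryDemo (Code, Name, Region, Language)
VALUES ('AFG', 'Afghanistan', 'Southern and Central Asia', 'Balochi');

SELECT Code, Continent, IsOfficial, Percentage FROM CountryDemo;
```

The second row comes back with 'Asia', 'F' and 0.0 filled in from the DEFAULT clauses.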
SQL syntax diagrams
SQL has a lot of syntax. Too much, some might say...
The sqlite reference manual has nice syntax diagrams to help (http://www.sqlite.org/lang.html).
A sample session
The SQL file used in this example is on the course website 2.
09:09:20> sqlite3 country.sqlite
SQLite version 3.6.22
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> .read mono_country.sql
sqlite>
2 http://www.cvc.uab.es/~bagdanov/database/mono_country.zip
SELECT-ing data
The main way to retrieve data (rows) is through the SELECT statement.
Your SQL reference will “fall open” to the select page.
General form: SELECT <> FROM <> WHERE <>;
Each of the <> can be very, very complex.
Simple examples:
SELECT * FROM Country; /* ALL columns, ALL rows */
/* In which countries is Spanish spoken? */
SELECT Code, Name FROM Country WHERE Language='Spanish';
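To try the SELECT form without the course file, the following sketch builds a tiny stand-in table and asks one of the earlier thought-experiment questions: which languages are spoken in a given country? Only the Balochi row comes from the lecture. The table name CountryMini, the reduced column set and the other rows are illustrative assumptions.

```sql
/* Hypothetical cut-down table; the name CountryMini is an assumption. */
CREATE TABLE CountryMini (
    Code     char(3)  NOT NULL DEFAULT '',
    Name     char(52) NOT NULL DEFAULT '',
    Language char(30) NOT NULL DEFAULT ''
);

/* One row per (country, language) pair, as in the mono-tabular design. */
INSERT INTO CountryMini VALUES ('AFG', 'Afghanistan', 'Balochi');
INSERT INTO CountryMini VALUES ('AFG', 'Afghanistan', 'Dari');
INSERT INTO CountryMini VALUES ('AFG', 'Afghanistan', 'Pashto');
INSERT INTO CountryMini VALUES ('ESP', 'Spain', 'Spanish');
INSERT INTO CountryMini VALUES ('ESP', 'Spain', 'Catalan');

/* SELECT <columns> FROM <table> WHERE <condition>:
   which languages are spoken in the country with code 'AFG'? */
SELECT Language FROM CountryMini WHERE Code='AFG';
```

The query returns the three Afghanistan rows and none of the Spain rows. The column list, the table and the condition are the three slots of the general form.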
More complex queries
/* In which countries is Spanish an official language? */
SELECT Code, Name FROM Country WHERE Language='Spanish'
    AND IsOfficial='T';

/* We can COUNT things too. */
SELECT count(*) FROM Country WHERE Language='Spanish';

/* On what continents is Spanish spoken? */
SELECT Continent FROM Country WHERE Language='Spanish';
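The continent query above returns one row per matching country, so a continent with several Spanish-speaking countries shows up several times. A minimal sketch of how DISTINCT collapses those repeats follows. The table name ContinentDemo, its reduced column set and the sample rows are assumptions for illustration and are not taken from the course data file.

```sql
/* Hypothetical cut-down table; the name ContinentDemo is an assumption. */
CREATE TABLE ContinentDemo (
    Code      char(3)  NOT NULL DEFAULT '',
    Name      char(52) NOT NULL DEFAULT '',
    Continent char(20) NOT NULL DEFAULT 'Asia',
    Language  char(30) NOT NULL DEFAULT ''
);

INSERT INTO ContinentDemo VALUES ('ESP', 'Spain', 'Europe', 'Spanish');
INSERT INTO ContinentDemo VALUES ('MEX', 'Mexico', 'North America', 'Spanish');
INSERT INTO ContinentDemo VALUES ('ARG', 'Argentina', 'South America', 'Spanish');
INSERT INTO ContinentDemo VALUES ('CHL', 'Chile', 'South America', 'Spanish');
INSERT INTO ContinentDemo VALUES ('AFG', 'Afghanistan', 'Asia', 'Balochi');

/* One row per Spanish-speaking country: 'South America' appears twice. */
SELECT Continent FROM ContinentDemo WHERE Language='Spanish';

/* DISTINCT keeps each continent once: three rows instead of four. */
SELECT DISTINCT Continent FROM ContinentDemo WHERE Language='Spanish';
```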
Key points
Things to take home:
  Data-centric applications.
  Andy’s Tenth Axiom.
  Basic sqlite interaction (meta-commands versus SQL statements).
  The CREATE TABLE, INSERT and SELECT SQL statements (basic versions).
  SQL syntax diagrams.
Next week:
  Data Definition Language: specifying structure and constraints on tables.
  Refining the design of our case study: normalization, primary keys.
  More complex queries.
Exercises: lecture 2
A few exercises to do at home. Please come to the next problem session prepared to discuss your findings (items indicated in BOLD will be collected for grading):
1 DB creation: Download the ’mono_country.zip’ file from the course website. Duplicate my experiments with creating the ’Country’ table and each of the sample queries I showed in the course. 3
2 Redundancy: there is a LOT of redundancy in our first design of the database (this is called an “unnormalized database”). In particular, all of the country information is repeated for each language in the country. What problems might this cause for the application programmer and the DB administrator? How might you fix this problem of redundancy?
3 http://www.cvc.uab.es/~bagdanov/database/mono_country.zip
Exercises (TO BE COLLECTED 19 October)
3 Do it yourself: design (just design, do not implement) a data structure implementing the information about countries used in this lecture. What operations must be supported to implement all of the queries we examined today? What are the advantages and disadvantages of implementing your own data structures versus using a DBMS?
4 Distinct attributes: Write queries to determine:
  The number of distinct languages spoken.
  The number of distinct regions in which Spanish is spoken.
  The countries where Spanish is NOT an official language, but is spoken by more than 50% of the population.
5 Inserting new rows: if you search for the countries where Catalan is spoken, you will note that there are some missing entries (France and Italy, at least). Write INSERT statements to insert these missing entries into the DB. Show the new results of a search for Catalan-speaking countries.