1 Lecture 1: Introduction to databases Timothy G. Griffin Easter Term 2008 – IB/Dip/IIG www.cl.cam.ac.uk/Teaching/current/Databases/


Lecture 1: Introduction to databases

Timothy G. Griffin Easter Term 2008 – IB/Dip/IIG

www.cl.cam.ac.uk/Teaching/current/Databases/


Database Prehistory

• Data entry
• Storage and retrieval
• Query processing
• Sorting


Early Automation

• Data management and application code were all tangled together
  – Hard to modify
  – Hard to generalize

• Many competing approaches

• Data manipulation code written at very low levels of abstraction


Our Hero --- E. F. Codd

Edgar F. "Ted" Codd (August 23, 1923 - April 18, 2003) was a British computer scientist who invented relational databases while working for IBM.

He was born in Portland, Dorset, and studied maths and chemistry at Oxford. He was a pilot in the Royal Air Force during WWII. In 1948 he joined IBM in New York as a mathematical programmer. He fled the USA to Canada during the McCarthy period. Later, he returned to the USA to earn a doctorate in CS from the University of Michigan in Ann Arbor. He then joined IBM research in San Jose.

His 1970 paper “A Relational Model of Data for Large Shared Data Banks” changed everything.

In the mid-1990s he coined the term OLAP.


Database Management Systems (DBMSs)

[Diagram: three layers, top to bottom: Your Applications Go Here; DBMS; Raw Resources (bare metal)]

Database abstractions allow the interface between applications and the DBMS to be cleanly defined, and this allows applications and data management systems to be implemented separately.


Today, Database Systems are Ubiquitous

[Figure: Database system design from the European Bioinformatics Institute (Hinxton UK). Diagram labels: Submitters; Submission tools; Development DB; Production DB; Service DB; Service Tools; End Users; Data Distrib.; Add value (computation); Add value (review etc.); Data exchange; Other archives; Q/C etc; Database design; Releases & Updates.]


What is a database system?

• A database is a large, integrated collection of data

• A database contains a model of something!

• A database management system (DBMS) is a software system designed to store, manage and facilitate access to the database


What does a database system do?

• Manages Very Large Amounts of Data

• Supports efficient access to Very Large Amounts of Data

• Supports concurrent access to Very Large Amounts of Data

• Supports secure, atomic access to Very Large Amounts of Data
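As a small illustration of atomic access, here is a minimal sketch using Python's standard `sqlite3` module. The `account` table, the names in it, and the `transfer` helper are hypothetical and chosen only for this example; SQLite stands in for a DBMS in general. The point is that a transfer either happens completely or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the CHECK constraint forbids negative balances.
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic unit of work."""
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit was undone too.
        return False


ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)  # alice cannot cover this
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```

In the failed transfer the credit to `bob` is executed first and then rolled back, so no half-finished transfer is ever visible in `balances`.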


Databases are a Rich Area for Computer Science

• Programming languages and software engineering (obviously)

• Data structures and algorithms (obviously)

• Logic, discrete maths, computation theory
  – Some of today’s most beautiful theoretical results are in “finite model theory” --- an area derived directly from database theory

• Systems problems: concurrency, operating systems, file organisation, networks, distributed systems…

Many of the concepts covered in this course are “classical” --- they form the heart of the subject. But the field of databases is still evolving and producing new and interesting research (hinted at in lectures 11 & 12).


What this course is about

• According to Ullman, there are three aspects to studying databases:

1. Modelling and design of databases

2. Programming

3. DBMS implementation

• This course addresses 1 and 2


Course Outline

1. Introduction
2. Entity-Relationship Model
3. The Relational Model
4. The Relational Algebra
5. The Relational Calculus
6. Schema refinement: Functional dependencies
7. Schema refinement: Normalisation
8. Transactions
9. Online Analytical Processing (OLAP)
10. More OLAP
11. Basic SQL and Integrity Constraints
12. Further relational algebra, further SQL


Recommended Reading

• Date, “An introduction to database systems”, 8th ed.
• Elmasri & Navathe, “Fundamentals of database systems”, 4th ed.
• Silberschatz, Korth & Sudarshan, “Database system concepts”, 4th ed.
• Ullman & Widom, “A first course in database systems”.
• OLAP
  – DB2/400: Mastering Data Warehousing Functions. (IBM Redbook) Chapters 1 & 2 only. http://www.redbooks.ibm.com/abstracts/sg245184.html
  – Data Warehousing and OLAP. Hector Garcia-Molina (Stanford University). http://www.cs.uh.edu/~ceick/6340/dw-olap.ppt
  – Data Warehousing and OLAP Technology for Data Mining. Department of Computing, London Metropolitan University. http://learning.unl.ac.uk/csp002n/CSP002N_wk2.ppt


Some systems to play with

1. MySQL:
  • www.mysql.org
  • Open source, quite powerful

2. PostgreSQL:
  • www.postgresql.org
  • Open source, powerful

3. Microsoft Access:
  • Simple system, lots of nice GUI wrappers

4. Commercial systems:
  1. Oracle 10g (www.oracle.com)
  2. SQL Server 2000 (www.microsoft.com/sql)
  3. DB2 (www.ibm.com/db2)


Database system architecture

• It is common to describe databases in two ways
  – The logical level:
    • What users see, the program or query language interface, …
  – The physical level:
    • How files are organised, what indexing mechanisms are used, …

• It is traditional to split the logical level into two: overall database design (conceptual) and the views that various users get to see

• A schema is a description of a database


Three-level architecture

• External level: External Schema 1, External Schema 2, …, External Schema n

• Conceptual level: Conceptual Schema

• Physical level: Internal Schema


Logical and physical data independence

• Data independence is the ability to change the schema at one level of the database system without changing the schema at the next higher level

• Logical data independence is the capacity to change the conceptual schema without changing the user views

• Physical data independence is the capacity to change the internal schema without having to change the conceptual schema or user views
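Both kinds of independence can be tried out in a few lines. The sketch below uses Python's standard `sqlite3` module; the tables, the `staff_directory` view, and the index name are all hypothetical and invented for this illustration. A user query written against the view keeps returning the same answer after a physical change (adding an index) and after a conceptual change (splitting one table into two and redefining the view).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual schema, version 1: a single table.
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT);
INSERT INTO employee VALUES (1, 'Ada', 'Research'), (2, 'Ted', 'Sales');
-- External schema: the view that one group of users sees.
CREATE VIEW staff_directory AS SELECT name, dept FROM employee;
""")

# The user's query is written against the view only.
QUERY = "SELECT name, dept FROM staff_directory ORDER BY name"
before = conn.execute(QUERY).fetchall()

# Physical change: add an index. Neither table nor view definitions change.
conn.execute("CREATE INDEX employee_by_name ON employee(name)")
after_index = conn.execute(QUERY).fetchall()

# Conceptual change: move departments into their own table, then
# redefine the view so that users still see the same columns.
conn.executescript("""
DROP VIEW staff_directory;
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept TEXT);
INSERT INTO department VALUES (10, 'Research'), (20, 'Sales');
CREATE TABLE employee2 (
  id INTEGER PRIMARY KEY,
  name TEXT,
  dept_id INTEGER REFERENCES department(dept_id)
);
INSERT INTO employee2
  SELECT e.id, e.name, d.dept_id
  FROM employee e JOIN department d ON d.dept = e.dept;
DROP TABLE employee;
CREATE VIEW staff_directory AS
  SELECT e.name, d.dept FROM employee2 e JOIN department d USING (dept_id);
""")
after_split = conn.execute(QUERY).fetchall()

print(before == after_index == after_split)
```

Note that the view definition itself had to be rewritten by the database designer; what stays fixed is the query that users and applications issue against it.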


Database design process

• Requirements analysis
  – User needs; what must database do?

• Conceptual design (next lecture)
  – High-level description; often using E/R model

• Logical design
  – Translate E/R model into (typically) relational schema

• Schema refinement
  – Check schema for redundancies and anomalies

• Physical design/tuning
  – Consider typical workloads, and further optimise
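To give a feel for the logical design step, here is a minimal sketch of one common translation from an E/R description to a relational schema, using Python's standard `sqlite3` module. The entities (`student`, `course`) and the many-to-many relationship (`enrolled`) are hypothetical, and this is only one possible mapping, not a prescribed method.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when asked to.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Each entity set becomes a table keyed by its identifying attribute.
CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE course (cid TEXT PRIMARY KEY, title TEXT NOT NULL);
-- A many-to-many relationship becomes a table of key pairs.
CREATE TABLE enrolled (
  sid INTEGER NOT NULL REFERENCES student(sid),
  cid TEXT NOT NULL REFERENCES course(cid),
  PRIMARY KEY (sid, cid)
);
INSERT INTO student VALUES (1, 'Ada');
INSERT INTO course VALUES ('DB', 'Databases');
INSERT INTO enrolled VALUES (1, 'DB');
""")

# The schema itself rejects an enrolment for a student who does not exist.
try:
    conn.execute("INSERT INTO enrolled VALUES (2, 'DB')")
    dangling_allowed = True
except sqlite3.IntegrityError:
    conn.rollback()
    dangling_allowed = False

enrolment_count = conn.execute("SELECT COUNT(*) FROM enrolled").fetchone()[0]
print(dangling_allowed, enrolment_count)
```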


The Fundamental Tradeoff of Database Performance Tuning

• De-normalized data can often result in faster query response

• Normalized data leads to better transaction throughput, and avoids “update anomalies” (corruption of data integrity)

What is more important in your database --- query response or transaction throughput? The answer will vary. What do the extreme ends of the spectrum look like?

Yes, indexing data can speed up transactions, but this just proves the point --- an index IS redundant data. General rule of thumb: indexing will slow down transactions!
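The tradeoff can be seen on a toy example. The sketch below uses Python's standard `sqlite3` module with hypothetical `supplier`, `shipment`, and `shipment_wide` tables. The de-normalized table answers a query without a join, but one logical update has to touch every redundant copy, and an update that misses a copy leaves the data contradicting itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized: each supplier's city is stored exactly once.
CREATE TABLE supplier (sid INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE shipment (
  sid INTEGER REFERENCES supplier(sid), part TEXT, qty INTEGER
);
INSERT INTO supplier VALUES (1, 'London'), (2, 'Paris');
INSERT INTO shipment VALUES (1, 'bolt', 10), (1, 'nut', 20), (2, 'bolt', 5);
-- De-normalized: the city is repeated on every shipment row.
CREATE TABLE shipment_wide AS
  SELECT s.sid, u.city, s.part, s.qty
  FROM shipment s JOIN supplier u ON u.sid = s.sid;
""")

# Query side: the wide table gives the same answer without a join.
wide_rows = conn.execute(
    "SELECT city, SUM(qty) FROM shipment_wide GROUP BY city ORDER BY city"
).fetchall()
joined_rows = conn.execute(
    "SELECT u.city, SUM(s.qty) FROM shipment s"
    " JOIN supplier u ON u.sid = s.sid GROUP BY u.city ORDER BY u.city"
).fetchall()

# Update side: supplier 1 moves to Leeds.
normalized_touched = conn.execute(
    "UPDATE supplier SET city = 'Leeds' WHERE sid = 1"
).rowcount
wide_touched = conn.execute(
    "UPDATE shipment_wide SET city = 'Leeds' WHERE sid = 1"
).rowcount

# Update anomaly: changing only one copy makes supplier 1 have two cities.
conn.execute(
    "UPDATE shipment_wide SET city = 'York' WHERE sid = 1 AND part = 'bolt'"
)
cities_for_supplier_1 = conn.execute(
    "SELECT COUNT(DISTINCT city) FROM shipment_wide WHERE sid = 1"
).fetchone()[0]

print(normalized_touched, wide_touched, cities_for_supplier_1)
```

On three rows the cost difference is invisible; the example only shows where the extra work and the risk of inconsistency come from.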


A Theme of this Course: OLTP vs. OLAP

• OLTP = Online Transaction Processing
  – Need to support many concurrent transactions (updates and queries)
  – Normally associated with the “operational database” that supports day-to-day activities of an organization.

• OLAP = Online Analytical Processing
  – Often based on data extracted from operational database, as well as other sources
  – Used in long-term analysis, business trends.
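The two styles of workload can be contrasted in a short sketch using Python's standard `sqlite3` module. The `sale` and `sales_summary` tables and their contents are hypothetical; in practice the extract would usually go to a separate system, which a single in-memory database only approximates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sale ("
    " id INTEGER PRIMARY KEY, region TEXT, month TEXT, amount INTEGER)"
)

# OLTP-style work: many small transactions, each touching a few rows.
orders = [
    ("North", "2008-04", 120),
    ("South", "2008-04", 80),
    ("North", "2008-05", 200),
    ("North", "2008-05", 50),
]
for region, month, amount in orders:
    with conn:  # one short transaction per order
        conn.execute(
            "INSERT INTO sale (region, month, amount) VALUES (?, ?, ?)",
            (region, month, amount),
        )

# OLAP-style work: extract the operational data into a derived summary
# table, then run read-only analysis against the summary.
with conn:
    conn.execute(
        "CREATE TABLE sales_summary AS"
        " SELECT region, month, SUM(amount) AS total, COUNT(*) AS n"
        " FROM sale GROUP BY region, month"
    )

trend = conn.execute(
    "SELECT month, SUM(total) FROM sales_summary GROUP BY month ORDER BY month"
).fetchall()
print(trend)
```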


Design Heterogeneity

[Figure: Database system design from the European Bioinformatics Institute (Hinxton UK); diagram repeated from the “Today, Database Systems are Ubiquitous” slide, annotated with two labels: “Normalized Tables” and “De-normalized Derived Tables --- for fast access”.]