32
SEMINAR SELECTED TOPICS IN DATABASE THEORY Lecture 1: Introduction / The Relational Model David Carral, Markus Kr ¨ otzsch Knowledge-Based Systems TU Dresden, 10th October 18

SEMINAR SELECTED TOPICS IN DATABASE THEORY

  • Upload
    others

  • View
    12

  • Download
    0

Embed Size (px)

Citation preview

Page 1: SEMINAR SELECTED TOPICS IN DATABASE THEORY

SEMINAR SELECTED TOPICS INDATABASE THEORY

Lecture 1: Introduction / The Relational Model

David Carral, Markus Krotzsch

Knowledge-Based Systems

TU Dresden, 10th October 18

Page 2: SEMINAR SELECTED TOPICS IN DATABASE THEORY

Introduction

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 2 of 28

Page 3: SEMINAR SELECTED TOPICS IN DATABASE THEORY

Course Tutors

David Carral Markus Krötzsch

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 3 of 28

Page 4: SEMINAR SELECTED TOPICS IN DATABASE THEORY

Structure of the Seminar and Evaluation

Lectures• Wednesday 10th (i.e., today), DS6: Introductory lecture 1

• Wednesday 17th, DS6: Introductory lecture 2

• Afterwards: Office hours in 3035 and presentations

Evaluation• Paper summary: self-selected research paper;a ~15 pages• Presentation:

– 20 minutes + discussion– Participate in the presentations of other students

aSee the “Literature” tab at https://iccl.inf.tu-dresden.de/web/Research_Advances_in_Database_Theory_(WS2018).

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 4 of 28

Page 5: SEMINAR SELECTED TOPICS IN DATABASE THEORY

Other stuff...

Web Pagehttps://iccl.inf.tu-dresden.de/web/Research_Advances_in_Database_Theory_(WS2018)

Lecture NotesAll slides will be available online.

Reading listSerge Abiteboul, Richard Hull, Victor Vianu; Foundations ofdatabases. Available at http://webdam.inria.fr/Alice/

AcknowledgementsCheck out Vim Martens!

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 5 of 28

Page 6: SEMINAR SELECTED TOPICS IN DATABASE THEORY

On to the content...

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 6 of 28

Page 7: SEMINAR SELECTED TOPICS IN DATABASE THEORY

What is a database?

A Database Management System (DBMS) is a software tomanage collections of data. The architecture of DBMS consist ofthree levels:

• External Level: Application-specific user views

• Logical Level: Abstract data model, independent ofimplementation, conceptual view

• Physical Level: Data structures and algorithms,platform-specific

In this seminar: focus on logical view for relational data model

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 7 of 28

Page 8: SEMINAR SELECTED TOPICS IN DATABASE THEORY

The Relational Model

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 8 of 28

Page 9: SEMINAR SELECTED TOPICS IN DATABASE THEORY

Database = collection of tables

Schedule

Movie Cinema Date R-rated

Goodfellas Thalia 15/10 True

Unforgiven Thalia 17/10 True

Boogie Nights Rundkino 21/11 True

Annie Hall Rundkino 21/11 False

A table has a schema:

• Schedule[{Movie, Cinema, Date, R-rated}]

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 9 of 28

Page 10: SEMINAR SELECTED TOPICS IN DATABASE THEORY

Towards a a formal definition of “table”A table row has one value for each column.

• That is, a row is a function from the attributes of the tableschema to specific values.

Schedule

Movie Cinema Date R-rated

. . . . . . . . . . . .

Boogie Nights Rundkino 21/11 True

. . . . . . . . . . . .

The above row can be represented with the function:

f : {Movie 7→ Boogie Nights, Cinema 7→ Rundkino,

Date 7→ 21/11, R-rated 7→ True}

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 10 of 28

Page 11: SEMINAR SELECTED TOPICS IN DATABASE THEORY

Database = set of tables

Let dom (“domain”) be the set of conceivable values in tables.

Definition 1• A relation schema R[U] consists of a relation name R and a

finite set U of attributes

• |U| is the arity of R[U]

• A table for R[U] is a finite set of functions from U to dom

• A database instance I is a finite set of tables

Note: we disregard the order and multiplicity of rows.Tables are also called relation instances. The table with relationschema R[U] in the database instance I is written RI.

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 11 of 28

Page 12: SEMINAR SELECTED TOPICS IN DATABASE THEORY

Database = set of tables

Let dom (“domain”) be the set of conceivable values in tables.

Definition 1• A relation schema R[U] consists of a relation name R and a

finite set U of attributes

• |U| is the arity of R[U]

• A table for R[U] is a finite set of functions from U to dom

• A database instance I is a finite set of tables

Note: we disregard the order and multiplicity of rows.Tables are also called relation instances. The table with relationschema R[U] in the database instance I is written RI.

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 11 of 28

Page 13: SEMINAR SELECTED TOPICS IN DATABASE THEORY

Database = set of tables

Schedule

Movie Cinema Date R-rated

Goodfellas Thalia 15/10 True

Unforgiven Thalia 17/10 True

Boogie Nights Rundkino 21/11 True

Annie Hall Rundkino 21/11 False

• The domain dom of the above table is the following set:{Goodfellas, Thalia, 15/10, True, Unforgiven, Thalia, 17/10,Boogie Nights, Rundkino, 21/11, Annie Hall, Rundkino, False }

• The above is a table for the relation schemaSchedules[{Movie, Cinema, Date, R-rated}]

• Let I be a database instance. Then, SchedulesI is the set ofrows in this table.

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 12 of 28

Page 14: SEMINAR SELECTED TOPICS IN DATABASE THEORY

Database = set of tables

Schedule

Movie Cinema Date R-rated

Goodfellas Thalia 15/10 True

Unforgiven Thalia 17/10 True

Boogie Nights Rundkino 21/11 True

Annie Hall Rundkino 21/11 False

The table represented above is the set {r1, r2, r3, r4} where r1, r2,r3, and r4 are the following functions:

r1 = {M 7→ Goodfellas, C 7→ Thalia, D 7→ 15/10, R 7→ True}

r2 = {M 7→ Unforgiven, C 7→ Thalia, D 7→ 17/10, R 7→ True}

r3 = {M 7→ Boogie Nights, C 7→ Rundkino, D 7→ 21/11, R 7→ True}

r4 = {M 7→ Annie Hall, C 7→ Rundkino, D 7→ 21/11, R 7→ False}

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 13 of 28

Page 15: SEMINAR SELECTED TOPICS IN DATABASE THEORY

Database = set of relations

Remark: Attribute names do not matter. Instead of the function

{M 7→ Goodfellas, C 7→ Thalia, D 7→ 15/10, R 7→ True}

we could also use a tuple:

〈Goodfellas, Thalia, 15/10, True〉

Necessary assumption: Attributes have a fixed order.

Definition 2• A relation schema R[U] is defined as before

• A table for R[U] is a finite subset of dom|U|

• A database instance I is a finite set of tables

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 14 of 28

Page 16: SEMINAR SELECTED TOPICS IN DATABASE THEORY

Database = set of relations

Schedule

Movie Cinema Date R-rated

Goodfellas Thalia 15/10 True

Unforgiven Thalia 17/10 True

Boogie Nights Rundkino 21/11 True

Annie Hall Rundkino 21/11 False

The table represented above is the following set:

{〈Goodfellas, 15/10, True, Thalia〉,

〈Unforgiven, 17/10, True, Thalia〉,

〈Boogie Nights, 21/11, True, Rundkino〉,

〈Annie Hall, 21/11, False, Rundkino〉

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 15 of 28

Page 17: SEMINAR SELECTED TOPICS IN DATABASE THEORY

Database = set of facts

Another convenient way to write databases:

Definition 3A fact is an expression p(t1, . . . , tn) where

• p is an n-ary predicate symbol

• t1, . . . , tn are constan symbols

A database instance is a finite set of facts.

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 16 of 28

Page 18: SEMINAR SELECTED TOPICS IN DATABASE THEORY

Database = set of facts

Schedule

Movie Cinema Date R-rated

Goodfellas Thalia 15/10 True

Unforgiven Thalia 17/10 True

Boogie Nights Rundkino 21/11 True

Annie Hall Rundkino 21/11 False

The information in the above corresponds to the following facts:

Schedule(Goodfellas, 15/10, True, Thalia)

Schedule(Unforgiven, 17/10, True, Thalia)

Schedule(Boogie Nights, 21/11, True, Rundkino)

Schedule(Annie Hall, 21/11, False, Rundkino)

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 17 of 28

Page 19: SEMINAR SELECTED TOPICS IN DATABASE THEORY

Graphical Representation

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 18 of 28

Page 20: SEMINAR SELECTED TOPICS IN DATABASE THEORY

Summary: Different Perspectives

Perspective DB Instance Table Row

Named Set of tables Set of functions Function

Unnamed Set of tables Set of tuples Tuple

Fact-based Set of facts Set of facts Fact

Graph Labelled hypergraph L. hypergraph L. Edge

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 19 of 28

Page 21: SEMINAR SELECTED TOPICS IN DATABASE THEORY

The Relational Algebra

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 20 of 28

Page 22: SEMINAR SELECTED TOPICS IN DATABASE THEORY

Relational Algebra Queries

Query language based on a set of operations on databases.Each operation refers to some tables and produces another table.

Main operations of the named perspective:

• Selection σ

• Projection π

• Join ./

• Renaming δ

• Difference −

• Union ∪

• Intersection ∩

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 21 of 28

Page 23: SEMINAR SELECTED TOPICS IN DATABASE THEORY

Selection“Find all R-rated movies”

σR-rated=“True”Schedule

“Find all connections that begin and end in the same stop”

σFrom=toConnect

Definition 4The selection operator has the form σn=m

• n is an attribute name

• m is an attribute name or a constant value

Consider a table RI for the relational schema R[U].

• For m constant value: σn=m(RI) = {f ∈ RI | f (n) = m}

• For m constant value: σn=m(RI) = {f ∈ RI | f (n) = f (m)}

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 22 of 28

Page 24: SEMINAR SELECTED TOPICS IN DATABASE THEORY

Selection

“Find all dates in which some movie is projected.”

πDateSchedule

Definition 5The projection operator has the form πa1,...,an where each ai is anattribute name.Consider a table RI for R[U].

πa1,...,an (RI) = {f{a1,...,an} | f ∈ RI}

where f{a1,...,an} is the restriction of f to the domain {a1, . . . , an}, i.e.,the function {a1 7→ f (a1), . . . , an 7→ f (an)}.

Remark: Projection is only defined if ai ∈ U for each ai.

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 23 of 28

Page 25: SEMINAR SELECTED TOPICS IN DATABASE THEORY

Natural Join

ScheduleMovie Cinema Date R-rated

Unforgiven Thalia 17/10 True

Boogie Nights Rundkino 21/11 True

LocationCinema Neighborhood

Thalia Neudstadt

Rundkino Altstadt

Schedule ./ LocationMovie Cinema Date R-rated Neighborhood

Unforgiven Thalia 17/10 True Neudstadt

Boogie Nights Rundkino 21/11 True Altstadt

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 24 of 28

Page 26: SEMINAR SELECTED TOPICS IN DATABASE THEORY

Natural Join

Definition 6The natural join operator has the form ./.Consider tables RI for R[U] and SI for S[V].

RI ./ SI = {f : U ∪ V → dom | fU ∈ RI and fV ∈ SI}

where fU (resp. fV ) is the restriction of f to elements in U (resp. V)as before.

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 25 of 28

Page 27: SEMINAR SELECTED TOPICS IN DATABASE THEORY

Rename

δMovie,Cinema,Date,R-rated→Film,Cinema,Date,R-rated(Schedule)

Definition 7The renaming operator has the form δa1,...,an→b1,...,bn with all ai

mutually distinct attribute names, and likewise for all bi.Consider a table RI for R[{a1, . . . , an}]

δa1,...,an→b1,...,bn (RI) = {f ◦ g | f ∈ RI and g : {bi 7→ ai}1≤i≤n}

where f ◦ g is function composition: (f ◦ g)(x) = f (g(x))

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 26 of 28

Page 28: SEMINAR SELECTED TOPICS IN DATABASE THEORY

Difference, Union, Intersection

• Binary operators defined like the usual set operations.

• Remark: These operators are only defined on tables of thesame relational schema. That is, tables with the same set ofattributes.

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 27 of 28

Page 29: SEMINAR SELECTED TOPICS IN DATABASE THEORY

Research on Database Theory

• Relational algebra query answering.– Studying combined/data/query Complexity.

• Developing and studying novel query languages.– Fragments of relational algebra such as conjunctive

queries. That is, relational algebra expressions that useonly select, project, join, and rename.

– Extensions of relational algebra such as Datalog.• Study the “expressivity” of a query language.

– Let’s do this!

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 28 of 28

Page 30: SEMINAR SELECTED TOPICS IN DATABASE THEORY

Research on Database Theory

• Relational algebra query answering.– Studying combined/data/query Complexity.

• Developing and studying novel query languages.– Fragments of relational algebra such as conjunctive

queries. That is, relational algebra expressions that useonly select, project, join, and rename.

– Extensions of relational algebra such as Datalog.

• Study the “expressivity” of a query language.

– Let’s do this!

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 28 of 28

Page 31: SEMINAR SELECTED TOPICS IN DATABASE THEORY

Research on Database Theory

• Relational algebra query answering.– Studying combined/data/query Complexity.

• Developing and studying novel query languages.– Fragments of relational algebra such as conjunctive

queries. That is, relational algebra expressions that useonly select, project, join, and rename.

– Extensions of relational algebra such as Datalog.• Study the “expressivity” of a query language.

– Let’s do this!

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 28 of 28

Page 32: SEMINAR SELECTED TOPICS IN DATABASE THEORY

Research on Database Theory

• Relational algebra query answering.– Studying combined/data/query Complexity.

• Developing and studying novel query languages.– Fragments of relational algebra such as conjunctive

queries. That is, relational algebra expressions that useonly select, project, join, and rename.

– Extensions of relational algebra such as Datalog.• Study the “expressivity” of a query language.

– Let’s do this!

David Carral, 10th October 18 Seminar Selected Topics in Database Theory slide 28 of 28