ADB09 Intro

Embed Size (px)

Citation preview

  • 8/4/2019 ADB09 Intro

    1/46

    Advanced Databases

    Introduction

    dr. Toon Calders

    prof. dr. Jan Paredaens

  • 8/4/2019 ADB09 Intro

    2/46

    Outline

    Motivation for the course

    Other DH courses

    Practical organization Course topics

    Project

    Overview of changes

  • 8/4/2019 ADB09 Intro

    3/46

    Motivation for the Course

    Database = a piece of software to handle data:

    store,

    maintain, and

    query

    Most ideal system situation-dependent

    data type: simple / semi-structured / complex /

    types of queries: simple lookup / analytical / type of usage: multi-user / single-user / distributed /

  • 8/4/2019 ADB09 Intro

    4/46

    Motivation for the Course

    Relational databases are tuned towards:

    simple data

    simple, ad-hoc queries

    multiple users

    Other models are more suitable for other types of

    data

    Object-Oriented,

    Deductive,

    Semi-Structured Databases,

    Data warehouses

  • 8/4/2019 ADB09 Intro

    5/46

    Motivation for the Course

    Study different data models

    Advantages, disadvantages

    Conceptual level

    what are the important notions? Whats underneath?

    In a scientific way

    exact, not just claims

  • 8/4/2019 ADB09 Intro

    6/46

    Motivation for the Course

    Student knows:

    different database models

    Understands:

    why they are introduced conceptual notions

    Is able to:

    quickly master vendor-specific products

  • 8/4/2019 ADB09 Intro

    7/46

    Outline

    Motivation for the course

    Other DH courses

    Practical organization Course topics

    Project

    Overview of changes

  • 8/4/2019 ADB09 Intro

    8/46

    Other DH Courses

    Relational database systems

    (2ID05) Databases and Data Modelling

    (2ID35) Database Technology

    transations, indexing, query optimization, distributed DB Other database models

    (2ID45) Advanced Databases

    (2II15) Data Mining

    (2ID25) Information Retrieval (2ID99) Capita Selecta DH

  • 8/4/2019 ADB09 Intro

    9/46

    Outline

    Motivation for the course

    Other DH courses

    Practical organization Course topics

    Project

    Overview of changes

  • 8/4/2019 ADB09 Intro

    10/46

    Practical Organization

    In principle

    Wed 8:45 10:30 Practical session M 1.46

    no new material opportunity to practice, ask questions

    together solve exercises

    Fri 10:45 12:30 Lectures HG 6.09 XML : Paredaens (6 lectures)

    other parts: Calders

  • 8/4/2019 ADB09 Intro

    11/46

    Practical Organization

    Important information

    http://wwwis.win.tue.nl/~tcalders/teaching/advancedDB/

    Subscribe to 2ID45 on studyweb !

    messages to the whole class group

    lecture postponed, room changes,

    [email protected]

  • 8/4/2019 ADB09 Intro

    12/46

    Practical Organization

    Course material

    Book:

    Silberschatz, Korth, Sudarshan. Database system

    concepts 5th edition. McGraw-Hill International

    Lots of additional material on course webpage

    papers

    slides

    solutions to exercises

  • 8/4/2019 ADB09 Intro

    13/46

    Practical Organization

    Grades:

    70% written exam

    30% group project

    No project = no grade

    Grade for the project can be transfered to August,

    similar for grade for the exam

    Grades expire in August

  • 8/4/2019 ADB09 Intro

    14/46

    Outline

    Motivation for the course

    Other DH courses

    Practical organization Course Topics

    Project

    Overview of changes

  • 8/4/2019 ADB09 Intro

    15/46

    Course Topics

    Limitations of the relational model

    Deductive databases

    Object-Oriented Databases

    Data Warehousing & OLAP

    Semi-Structured data

  • 8/4/2019 ADB09 Intro

    16/46

    Limitations of the relational model

    Not every query can be expressed

    Transitive closure cannot be expressed in Relational

    Algebra

    Give all cities reachable from Antwerp by plane

    Give all smallest components of a part

    Give all decendants of person X

    Not even if youre very smart

    proof

    Extension to other relational query languages

  • 8/4/2019 ADB09 Intro

    17/46

    Deductive Databases

    Motivation is two-fold:

    add deductive capabilities to databases; the database

    contains:

    facts (intensional relations)

    rules to generate derived facts (extensional relations)

    Database is knowledge base

    Extend the querying

    datalog allows for recursion

  • 8/4/2019 ADB09 Intro

    18/46

    Deductive Databases

    Datalog as engine of deductive databases

    similarities with Prolog

    has facts and rules

    rules define -possibly recursive- views Semantics not always clear

    safety

    negation

    recursion

  • 8/4/2019 ADB09 Intro

    19/46

    Deductive Databases

    g(a,b). g(b,c). g(a,d).

    reach(X,X) :- g(X,Y).

    reach(X,Y) :- g(X,Y).

    reach(X,Z) :- reach(X,Y), reach(Y,Z).node(X) :- g(X,Y).

    node(Y) :- g(X,Y).

    unreach(X,Y) :- node(X), node(Y),

    not reach(X,Y).

  • 8/4/2019 ADB09 Intro

    20/46

    Deductive Databases

    In this topic we study:

    How to handle negation and recursion in the same

    program

    How to efficiently evaluate Datalog queries

  • 8/4/2019 ADB09 Intro

    21/46

    OO Databases

    Many applications require the storage and

    manipulation of complex data

    design databases

    geometric databases

    Object-Oriented programming languages manipulate

    complex objects

    classes, methods, inheritance, polymorphism

  • 8/4/2019 ADB09 Intro

    22/46

    OO Databases

    Very simple example:

    Class book

    set of authors

    title set of keywords

    Extremely simple to model in OO language

    Hard in relational database!

  • 8/4/2019 ADB09 Intro

    23/46

    OO Databases

    In many applications persistency of the data is

    nevertheless required

    protection against system failure

    consistency of the data

    Mapping: object in OO language tuples of atomic

    values in relational database is often problematic

  • 8/4/2019 ADB09 Intro

    24/46

    OO Databases

    Either we ignore the multivalued dependencies

    This table is in 3NF, BCNF

    Title Author Keyword

    Database System Concepts Silberschatz Database

    Database System Concepts Korth Database

    Database System Concepts Sudarshan Database

    Database System Concepts Silberschatz Storage

    Database System Concepts Korth Storage

    Database System Concepts Sudarshan Storage

  • 8/4/2019 ADB09 Intro

    25/46

    OO Databases

    Or we go to 4NF

    Title Author

    Database System Concepts Silberschatz

    Database System Concepts Korth

    Database System Concepts Sudarshan

    Title Keyword

    Database System Concepts Database

    Database System Concepts Storage

  • 8/4/2019 ADB09 Intro

    26/46

    OO Databases

    Basically OODB = persistent OO programming

    language

    Very important concept

    rather uninteresting scientifically

    This topic will mainly be self-study

    Reading bookchapter + Q & A session

  • 8/4/2019 ADB09 Intro

    27/46

    Data Warehousing & OLAP

    Data

    Warehouse

    Extract

    Transform

    Load

    Refresh

    OLAP Engine

    Monitor

    &

    Integrator

    Metadata

    Data Sources Front-End Tools

    Serve

    Data Marts

    Operational

    DBs

    other

    sources

    Data Storage

    OLAP

    Server Analysis

    Query/Reporting

    Data Mining

    ROLAP

    Server

  • 8/4/2019 ADB09 Intro

    28/46

    Data Warehousing & OLAP

    Transaction processing

    Operational setting

    Up-to-date = critical

    Simple data

    Simple queries; only touch a

    small part of the database

    Flight reservations

    ticket sales

    do not sell a seat twice

    reservation, date, name

    Give flight details ofX

    List flights to Y

  • 8/4/2019 ADB09 Intro

    29/46

    Data Warehousing & OLAP

    Decision support

    Off-line setting

    Historical data

    Summarized data

    Integrate different databases

    Statistical queries

    Flight company

    Evaluate ROI flights

    Flights of last year

    # passengers per carrier fordestination X

    Passengers, fuel costs,

    maintenance info

    Average % of seats

    sold/month/destination

  • 8/4/2019 ADB09 Intro

    30/46

    Data Warehousing & OLAP

    In this topic we will study:

    Conceptual models for decision support

    Database explosion problem

    Efficient implementation strategies indexing, view materialization

  • 8/4/2019 ADB09 Intro

    31/46

    XML

    Why is XML important?

    simple open non-proprietary widely accepted data

    exchange format

    XML is like HTML but

    no fixed set of tags

    X = extensible

    no fixed semantics (c.q. representation) of tags

    representation determined by separate stylesheet

    semantics determined by application

    no fixed structure

    user-defined schemas

  • 8/4/2019 ADB09 Intro

    32/46

    Jan Vijs

    11

    123

    Turnstreet

    66

    Hole Rd

    XML

  • 8/4/2019 ADB09 Intro

    33/46

    XML

    In this topic:

    XML

    XQuery, XSLT

    LiX

    Query

    Taught by prof Paredaens

  • 8/4/2019 ADB09 Intro

    34/46

    Outline

    Motivation for the course

    Other DH courses

    Practical organization Course Topics

    Project

    Overview of changes

  • 8/4/2019 ADB09 Intro

    35/46

    Project

    Pick one of the 4 topics:

    deductive databases / rule-based systems

    object-oriented databases

    data warehouses semi-structured databases

    Formulate your own project

    illustrating the different course concepts

    showing you mastered the technology

  • 8/4/2019 ADB09 Intro

    36/46

    Project

    Make a project proposal ( WEEK 10 )

    examples of last year will be given

    fulfilling certain constraints

    listing technologies to be used

    Status report ( WEEK 15 )

    Final report ( WEEK 20 )

    Project presentations ( WEEKS 21 & 22 )

  • 8/4/2019 ADB09 Intro

    37/46

    Outline

    Motivation for the course

    Other DH courses

    Practical organization Course Topics

    Project

    Overview of changes

  • 8/4/2019 ADB09 Intro

    38/46

    Overview of Changes

    First some facts and figures regarding Spring 2008

    Heterogeneous group

    Outside NL, HBO, BSc TU/e

    CSE

    BIS

  • 8/4/2019 ADB09 Intro

    39/46

    Overview of Changes

    Some suggestions I decided to act upon:

    1. Start with the difficult material:

    expressiveness of RA

    Gaifman locality

    2. Too much time is being spent on XML

    (5+5) (6+3) & topic (XSLT) has been added

    3. Disproportional weight given to XML in exam

    project no longer exclusively XML

  • 8/4/2019 ADB09 Intro

    40/46

    Overview of Changes

    Some suggestions I decided to act upon:

    4. Some materials and instruction just too hard

    extra exercices will be added; more modular

    5. The course was split up in lots of individual subjects,

    with no apparent relation to one another

    tried to handle that in the course motivation

  • 8/4/2019 ADB09 Intro

    41/46

    Overview of Changes

    Some suggestions that were ignored:

    A google for 'advanced databases' returns quite

    some courses from other universities that lookinteresting to me. Perhaps the lecturers could take a

    look at those.

    When (re-)constructing the course last year other

    universities ADB courses were surveyed. Many of the

    interesting topics are already handled in other courses(Data Mining, Information retrieval, Database technology)

  • 8/4/2019 ADB09 Intro

    42/46

    Overview of Changes

    Some suggestions that were ignored:

    Don't discuss prerequisite knowledge too much, it is

    prerequisite.

    Heterogeneous group.

    Balance the course subjects more, TC was discussed

    very specific while the other 3 subjects where treated

    in global.

    Time spent on TCis justified by its difficulty and its

    importance for database theory + motivates OODB &

    Deductive DB

  • 8/4/2019 ADB09 Intro

    43/46

    Overview of Changes

    Take-away message

    (some?) lecturers do act on questionnaires

    filling out the questionnaires is useful

  • 8/4/2019 ADB09 Intro

    44/46

    Overview of Changes

    Take-away message

    (some?) lecturers do act on questionnaires

    filling out the questionnaires is useful

  • 8/4/2019 ADB09 Intro

    45/46

    Summary

    Relational model has limitations

    simple queries

    simple data

    OO

    DBs allow complex data types Deductive databases, datalog complex queries

    Somewhere in-between: datawarehouses and OLAP

    special requirements, special datastructures

    Semi-structured data can be stored inX

    ML

    Project complements theoretical lectures

    Instructions for clarification

  • 8/4/2019 ADB09 Intro

    46/46

    !! See you on Friday !!