29th November 2001 Graphs and Functions: Recurring Themes in Databases Alex Poulovassilis


29th November 2001

Graphs and Functions: Recurring Themes in Databases

Alex Poulovassilis


Databases

Databases store information of relevance to a group of users, e.g.:

• employees’ personal details, for a Personnel department

• employees’ income details, for a Payroll department

• details of molecular structure and interaction, for a Drug company

• details of TV broadcasts and ratings, for a TV company


Data models

The information stored in a database is expressed using a data model

The binary relational data model is a very simple data model

In this model, information is represented using entities and binary relationships between them

These can be represented as the nodes and edges of a graph
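As a minimal sketch of this idea (not taken from the talk), binary relational data can be held as a set of labelled edges. The entity names and the helper functions `neighbours` and `nodes` are hypothetical, chosen only for illustration:

```python
# Hypothetical sample data: each fact is a (relationship, source, target)
# triple, i.e. one labelled edge of the graph.
EDGES = {
    ("stars", "P205", "Kevin Bacon"),
    ("stars", "P205", "Actor A"),
    ("stars", "P311", "Actor A"),
}


def neighbours(edges, label, source):
    """Follow every edge with the given label out of the given source node."""
    return {t for (l, s, t) in edges if l == label and s == source}


def nodes(edges):
    """Every entity that appears at either end of some edge."""
    return {s for (_, s, _) in edges} | {t for (_, _, t) in edges}
```

Under this representation a lookup such as `neighbours(EDGES, "stars", "P205")` is a single pass over the edge set; a real repository would index the edges instead.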

e.g. here is the schema of a ViewingFigures database:


Data and Schema

The schema of a database defines the type and format of the actual data; it is part of the database’s metadata.

The data in the database conforms to the schema.

So a fragment of the ViewingFigures data might be:


The TriStarp Project

The TriStarp research project, led by Prof Peter King from the mid 1980s, aimed to

(1) develop repository technology for binary relational information

(2) develop languages for computing with this kind of information

Mir Derakhshan worked on (1). Carol Small and I worked on (2).

We were supported by CASE studentships from IBM UK Labs, with Prof Geoff Sharman and Norman Winterbottom as our industrial supervisors


Computing with Binary Relational Data

There are two natural candidates for this:

• logic languages - explored by Carol

• functional languages - the topic of my PhD research, resulting in the FDL language (1990)


The Logic Approach

• Find all actors who star in programme P205

stars(P205,x?)

[Figure: query edge stars from P205 to x?, shown against the schema edge stars from Programme to Actor]


The Logic Approach

• Find all programmes in which Kevin Bacon stars

stars(p?,'Kevin Bacon')

[Figure: query edge stars from p? to Kevin Bacon, shown against the schema edge stars from Programme to Actor]


The Logic Approach

• Find all actors who have starred with Kevin Bacon

stars(p?,'Kevin Bacon'), stars(p?,x?)

[Figure: query edges stars from p? to Kevin Bacon and from p? to x?, shown against the schema edge stars from Programme to Actor]
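The three logic queries above can be mimicked with a very small conjunctive-query evaluator. This is only a sketch under stated assumptions: the facts are hypothetical, variables are written with a trailing `?` as on the slides, and `match` and `solve` are illustrative names rather than part of any language from the talk.

```python
# Hypothetical facts: (programme, actor) pairs of the stars relationship.
STARS = {("P205", "Kevin Bacon"), ("P205", "Actor A"), ("P311", "Actor B")}


def is_var(term):
    """Variables are written with a trailing '?', as in the slides."""
    return term.endswith("?")


def match(goal, fact, binding):
    """Extend binding so that goal equals fact; return None if impossible."""
    binding = dict(binding)
    for term, value in zip(goal, fact):
        if is_var(term):
            # setdefault binds a fresh variable or returns the earlier binding.
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding


def solve(goals, facts, binding=None):
    """Yield every variable binding that satisfies all goals in order."""
    binding = binding or {}
    if not goals:
        yield binding
        return
    for fact in facts:
        extended = match(goals[0], fact, binding)
        if extended is not None:
            yield from solve(goals[1:], facts, extended)


# stars(p?,'Kevin Bacon'), stars(p?,x?)
co_stars = {b["x?"] for b in solve([("p?", "Kevin Bacon"), ("p?", "x?")], STARS)}
```

Because the variable `p?` is shared between the two goals, the second goal is only matched against programmes already bound by the first. Note that the answer includes Kevin Bacon himself, exactly as the query on the slide would.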


The Functional Approach

The functional approach interprets binary relationships as functions, leading to the so-called functional data model

[Figure: function stars from Programme to Actor, with inverse function inv_stars from Actor to Programme]


The Functional Approach

• Find all actors who star in programme P205

stars P205


The Functional Approach

• Find all programmes in which Kevin Bacon stars

inv_stars 'Kevin Bacon'


The Functional Approach

• Find all actors who have starred with Kevin Bacon

[x | p <- inv_stars 'Kevin Bacon'; x <- stars p]
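For comparison, the functional reading translates almost directly into Python list comprehensions. The sketch below assumes a hypothetical in-memory table `STARS`; only the names `stars` and `inv_stars` come from the slides.

```python
# Hypothetical data: each programme maps to the list of actors starring in it.
STARS = {"P205": ["Kevin Bacon", "Actor A"], "P311": ["Actor A", "Actor B"]}


def stars(programme):
    """Multi-valued function from a Programme to its Actors."""
    return STARS.get(programme, [])


def inv_stars(actor):
    """Inverse function: from an Actor to the Programmes they star in."""
    return [p for p, cast in STARS.items() if actor in cast]


# [x | p <- inv_stars 'Kevin Bacon'; x <- stars p]
co_stars = [x for p in inv_stars("Kevin Bacon") for x in stars(p)]
```

The generators of the comprehension nest from left to right, so `p` ranges over Kevin Bacon's programmes and `x` over the cast of each one.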


More complex queries

Find the most popular programme showing at 10pm on 1st November, 2001:

let maxViewers = max [viewers s | s <- inv_date (1,11,2001);
                                  (start s) <= 2200; (end s) > 2200] in
[of s | s <- inv_viewers maxViewers]


Derived Functions

Find the most popular programme showing at time t on date d:

mostPopular t d =
  let maxViewers = max [viewers s | s <- inv_date d;
                                    (start s) <= t; (end s) > t] in
  [of s | s <- inv_viewers maxViewers]
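A rough Python counterpart of mostPopular is sketched below. The `Showing` record and the sample rows are hypothetical. One deliberate difference from the slide: the final step filters the showings that are on air at time t, rather than calling inv_viewers over the whole database, so a showing on another date with the same viewing figure cannot slip into the answer.

```python
from collections import namedtuple

# Hypothetical record type; the field names mirror the functions on the slide.
Showing = namedtuple("Showing", "of date start end viewers")

# Hypothetical sample data, not real viewing figures.
SHOWINGS = [
    Showing("P205", (1, 11, 2001), 2100, 2230, 4_000_000),
    Showing("P311", (1, 11, 2001), 2200, 2300, 6_500_000),
    Showing("P412", (1, 11, 2001), 1900, 2000, 9_000_000),
    Showing("P205", (2, 11, 2001), 2100, 2230, 7_000_000),
]


def most_popular(t, d, showings=SHOWINGS):
    """Programmes with the highest viewing figure among showings on air at t on d."""
    on_air = [s for s in showings if s.date == d and s.start <= t < s.end]
    if not on_air:
        return []
    max_viewers = max(s.viewers for s in on_air)
    return [s.of for s in on_air if s.viewers == max_viewers]
```

For example, `most_popular(2200, (1, 11, 2001))` considers only the two showings that span 22:00 on that date.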


Recursive functions

Find actors linked to Kevin Bacon via any number of edges labelled stars:

linkedTo ['Kevin Bacon']

where:

linkedTo result = let new = [x | y <- result;
                                 p <- inv_stars y;
                                 x <- stars p] in
                  if (subset new result)
                  then result
                  else linkedTo (new U result)

[Figure: schema edge stars from Programme to Actor, annotated with linkedTo]
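The recursion computes a fixpoint: it keeps adding co-stars until a step adds nothing new. A minimal Python sketch follows, written as a loop rather than a recursive call because Python limits recursion depth; the sample data is hypothetical.

```python
# Hypothetical data: each programme maps to the set of actors starring in it.
STARS = {"P1": {"Kevin Bacon", "A"}, "P2": {"A", "B"}, "P3": {"C"}}


def stars(programme):
    return STARS.get(programme, set())


def inv_stars(actor):
    return {p for p, cast in STARS.items() if actor in cast}


def linked_to(seed):
    """Everyone reachable from the seed actors via any number of stars edges."""
    result = set(seed)
    while True:
        new = {x for y in result for p in inv_stars(y) for x in stars(p)}
        if new <= result:        # subset test: nothing new was found
            return result
        result |= new            # new U result
```

With this data, actor C is never reached because P3 shares no cast member with the other programmes.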


Oracle of Bacon at Virginia: www.cs.virginia.edu/oracle

Bacon Number    No. of People
0               1
1               1479
2               115203
3               285896
4               65055
5               4535
6               534
7               81
8               28
9               1
10              1

Total linkable actors: 472814
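A Bacon number is the shortest-path distance from Kevin Bacon in the co-starring graph, so a table like the one above can be produced by breadth-first search. The sketch below uses a tiny hypothetical cast list, not the Oracle's data, and makes no claim about how the Oracle itself is implemented.

```python
from collections import Counter, deque

# Hypothetical data: each programme maps to the set of actors starring in it.
CAST = {"P1": {"Kevin Bacon", "A"}, "P2": {"A", "B"}, "P3": {"B", "C"}, "P4": {"D"}}


def bacon_numbers(cast, root="Kevin Bacon"):
    """Breadth-first search over co-starring links; returns actor -> distance."""
    appears_in = {}
    for programme, actors in cast.items():
        for actor in actors:
            appears_in.setdefault(actor, set()).add(programme)

    distance = {root: 0}
    queue = deque([root])
    while queue:
        actor = queue.popleft()
        for programme in appears_in.get(actor, ()):
            for co_star in cast[programme]:
                if co_star not in distance:
                    distance[co_star] = distance[actor] + 1
                    queue.append(co_star)
    return distance


# Number of people at each distance, in the shape of the table above.
histogram = Counter(bacon_numbers(CAST).values())
```

Actors who are not linkable (here D) simply never enter the distance map.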


Higher-order functions

More generally:

linkedTo s = complete (stars,inv_stars) s

where:

complete (f,inv_f) result = let new = [x | b <- result;
                                           a <- inv_f b;
                                           x <- f a] in
                            if (subset new result)
                            then result
                            else complete (f,inv_f) (new U result)

[Figure: edge f from A to B, annotated with linkedTo]


Collaboration Networks

Find all people linked to a person P via the author relationship:

complete (author,inv_author) [P]

[Figure: function author from Paper to Person, with inverse function inv_author from Person to Paper]
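The higher-order function complete carries over to Python by passing the function and its inverse as arguments. This is a sketch under assumptions: the paper and person names are hypothetical, and the recursion is written as a loop.

```python
def complete(f, inv_f, seed):
    """Close the seed set under 'shares an f-edge with', for any f and inv_f."""
    result = set(seed)
    while True:
        new = {x for b in result for a in inv_f(b) for x in f(a)}
        if new <= result:
            return result
        result |= new


# Hypothetical data: each paper maps to the set of its authors.
AUTHORS = {"paper1": {"P", "Q"}, "paper2": {"Q", "R"}, "paper3": {"S"}}


def author(paper):
    return AUTHORS.get(paper, set())


def inv_author(person):
    return {paper for paper, people in AUTHORS.items() if person in people}


# complete (author,inv_author) [P]
collaborators = complete(author, inv_author, ["P"])
```

The same `complete` would serve for stars and inv_stars; only the pair of functions passed in changes.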


Acknowledgements…

If we ask the simpler query

[x | p <- inv_author 'Alexandra Poulovassilis'; x <- author p]

we obtain the people with whom I have co-authored research papers:

J.Bailey, K.Benkerimi, S.Courtenage, P.Demetriades, M.Derakhshan, B.Heydecker, S.Hild, P.J.H.King, M.Levene, N.Lorentzos, P.J.McBrien, P.Newson, E.Nonas, R.Offen, S.Reddi, S.Schwarz, C.Small, E.Tuv, P.T.Wood, L.Xu


Drawbacks of the Binary Relational Model

Despite its elegance, the binary relational model has some drawbacks:

(a) large binary relational schemas can be hard to understand

(b) it is not so natural for representing higher-dimensional relationships


The Hypernode Model

Drawback (a) led to research, with Mark Levene, into nested-graph data models


Higher-dimensional relationships

An example of problem (b) is the 3-way relationship between Distribution companies, Programmes and TV companies, which has to be represented by an entity and 3 binary relationships:

[Figure: entity Supply linked by three binary relationships to DistrCo, Programme and TVCo]


The PFL Language

This led to the development, with Carol Small, of a new functional language, PFL, which directly supports higher-dimensional relationships

e.g. the supply relationship is accessed by a single selector function

|supply : (DistrCo,Programme,TVCo) -> [(DistrCo,Programme,TVCo)]

Some examples:

|supply (Any,P205,BBC)

|supply (Any,Any,BBC)

|supply (Any,P205,Any)
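The behaviour of such a selector can be sketched in Python as pattern matching with a wildcard. The `ANY` sentinel, the `select` helper and the sample tuples are all hypothetical; PFL's actual implementation is not described in the talk.

```python
ANY = object()  # wildcard sentinel, standing in for PFL's Any

# Hypothetical supply tuples: (DistrCo, Programme, TVCo).
SUPPLY = [
    ("DistrCo1", "P205", "BBC"),
    ("DistrCo2", "P311", "BBC"),
    ("DistrCo1", "P205", "TVCo2"),
]


def select(relation, pattern):
    """Return the tuples that agree with the pattern on every non-wildcard field."""
    return [row for row in relation
            if all(p is ANY or p == v for p, v in zip(pattern, row))]


def supply(pattern):
    """Selector over the 3-way supply relationship."""
    return select(SUPPLY, pattern)
```

The three example calls on the slide correspond to `supply((ANY, "P205", "BBC"))`, `supply((ANY, ANY, "BBC"))` and `supply((ANY, "P205", ANY))`.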


Active Databases

Up to now, I have been looking at schema, data and derived database information

In the 1990s a new kind of database information was being explored, namely event-condition-action rules of the form:

on event if condition do action

ECA rules make a database active in that it can automatically execute actions if events occur and conditions hold


Active PFL

In a project during the mid 1990s, we extended PFL with ECA rules (with Swarup Reddi and Carol Small)

For example:

on insert viewers

if [s | (s,n) <- |viewersInc (Any,Any); n < 500000]

do insert [s | (s,n) <- |viewersInc (Any,Any); n < 500000] lowRated

[Figure: function viewers from Showing to Number]
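The shape of an event-condition-action rule can be illustrated with a small Python class. This is a sketch, not Active PFL: the `Database` class is hypothetical, and it assumes that viewersInc denotes the tuples just inserted into viewers, which the slide does not state explicitly.

```python
class Database:
    """Toy active database with one base function and one derived set."""

    def __init__(self):
        self.viewers = {}       # Showing -> Number
        self.low_rated = set()
        self.rules = []         # (event, condition, action) triples

    def on(self, event, condition, action):
        """Register a rule: on event if condition do action."""
        self.rules.append((event, condition, action))

    def insert_viewers(self, showing, number):
        self.viewers[showing] = number
        # Assumption: the inserted tuples play the role of viewersInc.
        delta = [(showing, number)]
        for event, condition, action in self.rules:
            if event == "insert viewers":
                matched = condition(delta)
                if matched:     # the action fires only if the condition is non-empty
                    action(self, matched)


db = Database()
db.on("insert viewers",
      lambda delta: [s for (s, n) in delta if n < 500_000],
      lambda database, matched: database.low_rated.update(matched))
db.insert_viewers("S1", 120_000)
db.insert_viewers("S2", 3_000_000)
```

Only the first insertion satisfies the condition, so only S1 is recorded as low rated.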


PFL’s ECA rule execution semantics

We specified these in PFL itself, to experiment before implementing:

execSched (db,s) =
  if s = []
  then (db,[])
  else execSched (schedRules (exec (head s,db),s))

schedRules (db,a:s) =
  let (db,pre,suf) =
        fold schedRule (db,[],[]) (triggers a) in
  (db,pre ++ s ++ suf)

schedRule i (db,pre,suf) =
  if (eval (event-condition-query i) db) = {}
  then (db,pre,suf)
  else updateSched (actions i,mode i,db,pre,suf)
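The scheduling loop above can be approximated in Python as follows. Several details are assumptions made for the sketch: rules and actions are plain dictionaries, updateSched is taken to place the actions of Immediate rules in front of the remaining schedule and those of Deferred rules behind it, and a `max_steps` guard is added because rule sets need not terminate.

```python
def exec_sched(db, schedule, rules, max_steps=1000):
    """Execute scheduled actions until the schedule is empty; return the final db."""
    steps = 0
    while schedule:
        if steps == max_steps:
            raise RuntimeError("rule execution did not finish within max_steps")
        steps += 1
        action, rest = schedule[0], schedule[1:]
        db = action["run"](db)                    # exec (head s, db)
        pre, suf = [], []
        for rule in rules:                        # rules listed in priority order
            if rule["event"] != action["name"]:
                continue                          # rule is not triggered by this action
            if not rule["query"](db):
                continue                          # event-condition query is empty
            if rule["mode"] == "Immediate":
                pre = pre + rule["actions"]
            else:
                suf = suf + rule["actions"]
        schedule = pre + rest + suf               # pre ++ s ++ suf
    return db


# Hypothetical actions over a db held as a dict of sets.
def insert_viewers(db):
    return {**db, "viewers": db["viewers"] | {("S1", 120_000)}}


def insert_low_rated(db):
    low = {s for (s, n) in db["viewers"] if n < 500_000}
    return {**db, "lowRated": db["lowRated"] | low}


INSERT_VIEWERS = {"name": "insert viewers", "run": insert_viewers}
INSERT_LOW = {"name": "insert lowRated", "run": insert_low_rated}
RULES = [{"event": "insert viewers",
          "query": lambda db: {s for (s, n) in db["viewers"] if n < 500_000},
          "mode": "Immediate",
          "actions": [INSERT_LOW]}]

final = exec_sched({"viewers": set(), "lowRated": set()}, [INSERT_VIEWERS], RULES)
```

Each action returns a new database value instead of mutating the old one, which keeps the sketch close to the purely functional specification.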


Analysing and Optimising ECA rules

Techniques are needed for analysing and optimising the behaviour of ECA rules

In a project that started in the late 1990s, we have been using the functional semantics of ECA rule execution as the basis for developing such techniques (with James Bailey, Simon Courtenage, Pete Newson)

In particular, we have been investigating abstract interpretation and partial evaluation of the rule execution semantics for analysis and optimisation, respectively.


Abstract execution semantics

execSched* (db*,s*) =
  if s* = []
  then (db*,[])
  else execSched* (schedRules* (exec* (head s*,db*),s*))

schedRules* (db*,a*:s*) =
  let (db*,pre*,suf*) =
        fold schedRule* (db*,[],[]) (triggers a*) in
  (db*,pre* ++ s* ++ suf*)

schedRule* i (db*,pre*,suf*) =
  if (eval* (event-condition-query i) db*) = False
  then (db*,pre*,suf*)
  else updateSched (actions i,mode i,db*,pre*,suf*)


Correctness of the Abstract Execution

If for all queries q, abstract databases db*, and abstract actions a*:

• conc (exec* (a*,db*)) is a superset of [exec (a,db) | (a,db) <- conc (a*,db*)]

• eval* q db* = False implies that for all db in conc db*, eval q db = {}

then execSched* is a conservative test for

• rule termination

• rule unreachability


Partial Evaluation of Rule Execution

Produce a specialised equation for schedRules for each kind of rule action that may appear at the head of the schedule:

schedRules (db,a1:s) =
  let (db,pre,suf) =
        fold schedRule (db,[],[]) (triggers a1) in
  (db,pre ++ s ++ suf)

schedRules (db,a2:s) =
  let (db,pre,suf) =
        fold schedRule (db,[],[]) (triggers a2) in
  (db,pre ++ s ++ suf)

. . .


Partial Evaluation of Rule Execution

Suppose action a1 triggers rule 2 and rule 3 (in that order of priority). Then we can replace triggers a1 above by [2,3] and apply fold, obtaining:

schedRules (db,a1:s) =
  let (db,pre,suf) =
        schedRule (schedRule (db,[],[]) 2) 3 in
  (db,pre ++ s ++ suf)


Partial Evaluation of Rule Execution

Now we can apply schedRule (assuming rule 2 has Immediate scheduling mode and rule 3 Deferred scheduling mode):

schedRules (db,a1:s) =
  let (db,pre,suf) =
        if (eval (event-condition-query 2) db) = {}
        then if (eval (event-condition-query 3) db) = {}
             then (db,[],[])
             else (db,[],bind (actions 3) db)
        else if (eval (event-condition-query 3) db) = {}
             then (db,bind (actions 2) db,[])
             else (db,bind (actions 2) db,bind (actions 3) db)
  in (db,pre ++ s ++ suf)
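The effect of this specialisation can be imitated by hand in Python with closures: everything that depends only on the head action (which rules it triggers, and their modes) is computed once, and only the condition checks remain at run time. This is a hand-staged sketch with hypothetical rule records, not an automatic partial evaluator, and it omits the bind step.

```python
def sched_rules(db, schedule, rules):
    """General version: works out the triggered rules on every call."""
    action, rest = schedule[0], schedule[1:]
    pre, suf = [], []
    for rule in rules:
        if action in rule["triggered_by"] and rule["query"](db):
            (pre if rule["mode"] == "Immediate" else suf).extend(rule["actions"])
    return db, pre + rest + suf


def specialise(action, rules):
    """Specialise sched_rules for one known head action."""
    # Static part: decided once, when the specialised function is built.
    triggered = [r for r in rules if action in r["triggered_by"]]
    immediate = [r for r in triggered if r["mode"] == "Immediate"]
    deferred = [r for r in triggered if r["mode"] == "Deferred"]

    def sched_rules_for_action(db, rest):
        # Dynamic part: only the event-condition queries depend on db.
        pre = [a for r in immediate if r["query"](db) for a in r["actions"]]
        suf = [a for r in deferred if r["query"](db) for a in r["actions"]]
        return db, pre + rest + suf

    return sched_rules_for_action


# Hypothetical rules: a1 triggers rule 2 (Immediate) and rule 3 (Deferred).
RULES = [
    {"id": 2, "triggered_by": {"a1"}, "mode": "Immediate",
     "query": lambda db: db["n"] > 0, "actions": ["b"]},
    {"id": 3, "triggered_by": {"a1"}, "mode": "Deferred",
     "query": lambda db: True, "actions": ["c"]},
    {"id": 4, "triggered_by": {"a2"}, "mode": "Immediate",
     "query": lambda db: True, "actions": ["d"]},
]

sched_rules_a1 = specialise("a1", RULES)
```

The specialised function gives the same schedule as the general one for any database, but no longer inspects rule 4 at all.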


Heterogeneous Databases

So far, I have been discussing single databases

However, larger-scale applications may need to integrate information from several databases, possibly supporting different data models

To integrate information stored in such heterogeneous databases it is necessary to form a single, integrated schema

Conflicts may exist between the various source schemas, which must be removed by applying transformations to these schemas


Graphs for Schema Transformation

In work with Peter McBrien that started in the late 1990s, we have developed a general framework for transforming and integrating heterogeneous database schemas

We represent schemas expressed in higher-level data models, such as relational or object-oriented, in terms of a nested-graph data model, thus allowing us to transform between different data models

In our schema transformation framework, new schema constructs are defined using queries over existing constructs

In our framework, schema transformations are reversible, thus allowing query and data translation between schemas:


addClass Series [p | (p,S) <- category]
addClass Doc [p | (p,D) <- category]
addClass Film [p | (p,F) <- category]
addClass Prog [p | (p,c) <- category]

addSubClass Film Prog
addSubClass Doc Prog
addSubClass Series Prog

delRel category [(p,F) | p <- Film] U
                [(p,D) | p <- Doc] U
                [(p,S) | p <- Series]


addConstraint subset Film Prog
addConstraint subset Doc Prog
addConstraint subset Series Prog

addNode Series [p | (p,S) <- category]
addNode Doc [p | (p,D) <- category]
addNode Film [p | (p,F) <- category]
addNode Prog [p | (p,c) <- category]

delEdge category [(p,F) | p <- Film] U [(p,D) | p <- Doc] U [(p,S) | p <- Series]

delNode Programme Prog
delNode Category [F,D,S]


delConstraint subset Film Prog
delConstraint subset Doc Prog
delConstraint subset Series Prog

delNode Series [p | (p,S) <- category]
delNode Doc [p | (p,D) <- category]
delNode Film [p | (p,F) <- category]
delNode Prog [p | (p,c) <- category]

addEdge category [(p,F) | p <- Film] U [(p,D) | p <- Doc] U [(p,S) | p <- Series]

addNode Programme Prog
addNode Category [F,D,S]
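Reversibility follows from the fact that every add or delete step carries a query that can rebuild the construct from the rest of the schema. The Python sketch below is a strong simplification: a database is just a dictionary from construct names to extents, constraints are left out, and the helpers `add`, `delete`, `reverse` and `apply_all` are hypothetical names rather than the framework's API.

```python
def add(name, query):
    return ("add", name, query)


def delete(name, query):
    return ("del", name, query)


def apply_step(db, step):
    """Apply one transformation step to a db (dict: construct name -> extent)."""
    kind, name, query = step
    db = dict(db)
    if kind == "add":
        db[name] = query(db)   # the extent is derived from existing constructs
    else:
        del db[name]           # the query records how the extent could be rebuilt
    return db


def apply_all(db, steps):
    for step in steps:
        db = apply_step(db, step)
    return db


def reverse(steps):
    """Swap add and delete in every step, and undo the steps in reverse order."""
    flip = {"add": "del", "del": "add"}
    return [(flip[kind], name, query) for (kind, name, query) in reversed(steps)]


# Hypothetical source database with a single category relationship.
source = {"category": frozenset({("P1", "F"), ("P2", "D"), ("P3", "S")})}

forward = [
    add("Film", lambda db: frozenset(p for (p, c) in db["category"] if c == "F")),
    add("Doc", lambda db: frozenset(p for (p, c) in db["category"] if c == "D")),
    add("Series", lambda db: frozenset(p for (p, c) in db["category"] if c == "S")),
    delete("category", lambda db: frozenset({(p, "F") for p in db["Film"]}
                                            | {(p, "D") for p in db["Doc"]}
                                            | {(p, "S") for p in db["Series"]})),
]

target = apply_all(source, forward)
round_trip = apply_all(target, reverse(forward))
```

Applying the forward steps and then their reverse returns the original database, which is the property that makes query and data translation possible in both directions.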


Query Translation

Given a transformation from a schema S1 to a schema S2, and a query Q on S1, we use the delete transformation steps to substitute for constructs of S1 which are not in S2, e.g. from the previous slide:

[title p | p <- Film U Doc] on:

[Figure: schema S1]

translates into

[title p | p <- [p | (p,F) <- category] U
                [p | (p,D) <- category]] on:

[Figure: schema S2]
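The substitution step can be illustrated with plain text rewriting in Python. This is only a sketch: the `DEFINITIONS` table restates the queries of the delete steps from the previous slide, `translate` is a hypothetical helper, and a real implementation would presumably rewrite a parsed query rather than a string.

```python
import re

# Defining queries taken from the delete steps on the previous slide.
DEFINITIONS = {
    "Film": "[p | (p,F) <- category]",
    "Doc": "[p | (p,D) <- category]",
    "Series": "[p | (p,S) <- category]",
}


def translate(query, definitions):
    """Replace each deleted construct name by its defining query."""
    names = "|".join(re.escape(name) for name in definitions)
    pattern = re.compile(r"\b(" + names + r")\b")
    return pattern.sub(lambda m: definitions[m.group(1)], query)


translated = translate("[title p | p <- Film U Doc]", DEFINITIONS)
```

The word-boundary anchors keep a construct name from being replaced when it merely occurs inside a longer identifier.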


Functions for Database Integration

In the formal specification of our framework, each schema transformation is a function

t : Database -> Database

where a database consists of schema+data

We are currently implementing our framework within the Automed project

We are planning to handle query language heterogeneity in Automed by translation into/from a functional intermediate query language


Future Research

Extending Automed to also handle materialised views and view updates, leading to a data warehousing approach to data integration

Data warehousing of genomic data (in collaboration with Profs Thornton, Orengo, Barton, and Drs Keller, Martin, Shepherd)

Moving beyond database integration and database dynamics to data integration on the Web and Web dynamics:

• handling XML data sources within Automed

• developing an ECA rule language for XML