Peer Data Management, Concluded and Model Management
Zachary G. Ives
University of Pennsylvania
CIS 650 – Database & Information Systems
April 18, 2005

Administrivia

Next readings and summaries: Dong and Halevy on Personal Info Management

2-paragraph summary of the problems they focus on and their key contributions

From Piazza to pizza … and scheduling

Today’s Trivia Question

Our Discussion

The SW as originally posed: RDF as “semantic” format

Also RDFS schema format

Ontologies as the standard way of defining concepts

Description logics are the way most ontologies are defined (OWL language)

Piazza PDMS:
  Relations and views
  Query language as mapping language
  Transitive closure of composition of mappings

Peer Data Management: Decentralized Mediation for Ad Hoc Extensibility

[Figure: peers UPenn, UW, Stanford, and IIT Mumbai, labeled "DB Projects"]

Data integration: 1 mediated schema, m mappings to sources

Peer data management system (PDMS):
  n mediated “peer schemas”
  as few as (n - 1) mappings between them, evaluated transitively
  m mappings to sources
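A minimal sketch of what "evaluated transitively" can mean, assuming each mapping is treated as an undirected edge between two peer schemas. The peer names come from the slide; the particular edges in `MAPPINGS` and the function `mapping_chain` are hypothetical, chosen only to show n = 4 peers connected by n - 1 = 3 mappings.

```python
from collections import deque

# Hypothetical mapping edges: 4 peers joined by 3 pairwise mappings.
MAPPINGS = [("UPenn", "UW"), ("UW", "Stanford"), ("Stanford", "IIT Mumbai")]


def mapping_chain(start, goal, mappings):
    """Return the peers along a shortest chain of mappings, or None if unreachable."""
    neighbors = {}
    for a, b in mappings:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    parent = {start: None}
    queue = deque([start])
    while queue:
        peer = queue.popleft()
        if peer == goal:
            # Walk the parent pointers back to the start to recover the chain.
            chain = []
            while peer is not None:
                chain.append(peer)
                peer = parent[peer]
            return chain[::-1]
        for nxt in neighbors.get(peer, []):
            if nxt not in parent:
                parent[nxt] = peer
                queue.append(nxt)
    return None


chain = mapping_chain("UPenn", "IIT Mumbai", MAPPINGS)
print(chain)
```

A query posed at one peer can reach data at another only if such a chain exists, which is why a connected set of n - 1 mappings is the minimum.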

Example Rule-Goal Tree Expansion

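q: Q(a1, a2) :- SameProject(a1,a2,p), Author(a1,w), Author(a2,w)

Rule-goal tree, level by level from the root (rule nodes alternate with goal nodes):

  q
  SameProject(a1,a2,p)    Author(a1,w)    Author(a2,w)
  r0    r1    r1
  ProjMember(a1,p)    ProjMember(a2,p)    CoAuthor(a1,a2)    CoAuthor(a2,a1)
  r3    r3    r2    r2
  S1(a1,p,_)    S1(a2,p,_)    S2(a1,a2)    S2(a2,a1)

Resulting reformulation (a union of two conjunctive queries over the stored relations):

Q’(a1,a2) :- S1(a1,p,_), S1(a2,p,_), S2(a1,a2) ∪ S1(a1,p,_), S1(a2,p,_), S2(a2,a1)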
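The expansion can be sketched as a small unfolding routine. This is a simplification, not Piazza's reformulation algorithm: atoms are plain strings, the two Author goals are grouped under one hypothetical key because a single CoAuthor atom covers both, and the `RULES` table simply hard-codes the alternatives visible in the tree.

```python
from itertools import product

# Each goal maps to a list of alternative bodies (a union of conjunctions).
RULES = {
    "SameProject(a1,a2,p)": [["ProjMember(a1,p)", "ProjMember(a2,p)"]],
    "Author(a1,w) & Author(a2,w)": [["CoAuthor(a1,a2)"], ["CoAuthor(a2,a1)"]],
    "ProjMember(a1,p)": [["S1(a1,p,_)"]],
    "ProjMember(a2,p)": [["S1(a2,p,_)"]],
    "CoAuthor(a1,a2)": [["S2(a1,a2)"]],
    "CoAuthor(a2,a1)": [["S2(a2,a1)"]],
}


def expand(goal):
    """Return every conjunction of stored-relation atoms that the goal unfolds to."""
    if goal not in RULES:
        return [[goal]]  # a stored relation: nothing left to expand
    rewritings = []
    for body in RULES[goal]:
        # Combine one expansion of each body atom (AND), for each alternative body (OR).
        for combo in product(*(expand(atom) for atom in body)):
            rewritings.append([atom for conj in combo for atom in conj])
    return rewritings


def reformulate(query_goals):
    """Unfold every goal of the query and combine the results into a union of conjunctions."""
    combos = product(*(expand(goal) for goal in query_goals))
    return [[atom for conj in combo for atom in conj] for combo in combos]


result = reformulate(["SameProject(a1,a2,p)", "Author(a1,w) & Author(a2,w)"])
for conjunction in result:
    print(", ".join(conjunction))
```

The two printed conjunctions correspond to the two disjuncts of Q’.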

RDF vs. XML

RDF explicitly names relationships:
  (book, title, “ABC”)
  (book, writtenBy, author)
  (author, name, “John Smith”)

XML does not always:

1. <book>
     <title>ABC</title>
     <writtenBy>
       <author><name>John Smith</name></author>
     </writtenBy>
   </book>

2. <book>
     <title>ABC</title>
     <author>John Smith</author>
   </book>

[Figure: RDF graph with nodes book and author and edges title, writtenBy, name]
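A small sketch of the difference, using Python's standard `xml.etree.ElementTree`. The helper `relationships_from_book` is hypothetical: it just lists the tags directly under `<book>`, which are the only relationship labels the XML offers.

```python
import xml.etree.ElementTree as ET

# The three RDF triples from the slide.
triples = [
    ("book", "title", "ABC"),
    ("book", "writtenBy", "author"),
    ("author", "name", "John Smith"),
]

# The two XML encodings from the slide.
xml1 = ("<book><title>ABC</title><writtenBy><author><name>John Smith</name>"
        "</author></writtenBy></book>")
xml2 = "<book><title>ABC</title><author>John Smith</author></book>"


def relationships_from_book(doc):
    """Tags of the direct children of the root <book> element."""
    return [child.tag for child in ET.fromstring(doc)]


rdf_predicates = [p for (s, p, o) in triples if s == "book"]
labels1 = relationships_from_book(xml1)
labels2 = relationships_from_book(xml2)

print(rdf_predicates)  # relationship names stated explicitly in RDF
print(labels1)         # encoding 1 keeps the writtenBy label
print(labels2)         # encoding 2 has only the object's tag, author
```

In the second encoding the tag `author` names the contained object, so the relationship itself is left implicit.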

RDF vs. XML 2

RDF is subject-neutral (a graph)

XML centers around a subject (a tree):

1. <book>
     <title>ABC</title>
     <author>John Smith</author>
   </book>

2. <author>
     <name>John Smith</name>
     <book>ABC</book>
   </author>

This may result in duplication of contained objects

An XML Version of the Semantic Web

Data model: XML + Schema
  Vast volumes of data already in XML (or exported as XML)
  CAVEAT: not all relationships are labeled in XML (“XML has no semantics.”)

Concepts: views ≈ classes; schemas ≈ ontologies
  Views define membership via queries; can reason about containment
  CAVEAT: less expressive than OWL classes

Schema mappings: target schema as query over source
  Sophisticated reasoning about mappings is possible by extending existing data integration techniques
  Can use mappings in “forward” and “reverse” directions
  Allows for “chaining” of mappings to answer queries

Piazza with XML (WWW03)

Goals:
  Build on XQuery and XML (extended with RDF-style identity, following lead of [Patel-Schneider & Simeon 02])
  Remain computationally inexpensive
  Capture the common mapping types

Directional mapping language based on templates:

  <output>
    {: $var IN document("doc")/path WHERE condition :}
    <tag>$var</tag>
  </output>

  Translates between parts of data instances
  Restricted subset of XQuery that’s decidable to reason about
  Supports special annotations and object fusion

Can map XML-XML, XML-RDF, RDF-XML (at data level)

Mapping Example between XML Schemas

Target:
  pubs
    book*
      title
      author*
        name

Source:
  authors
    author*
      full-name
      publication*
        title
        pub-type

[Figure: graph with nodes publication and author and edges title, pub-type, writtenBy, name]

Example Piazza Mapping

<pubs>
  <book piazza:id={$t}>
    {: $a IN document("…")/authors/author,
       $an IN $a/full-name,
       $t IN $a/publication/title,
       $typ IN $a/publication/pub-type
       WHERE $typ = "book"
       PROPERTY $t >= 'A' AND $t < 'B' :}
    <title>{$t}</title>
    <author><name>{$an}</name></author>
  </book>
</pubs>
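To make the effect of this mapping concrete, here is a rough Python sketch of the same source-to-target translation using `xml.etree.ElementTree`. It is not how Piazza evaluates mappings. The sample document, the function `apply_mapping`, and the plain `id` attribute standing in for `piazza:id` are all assumptions; the sketch reads title and pub-type from the same publication element, fuses books that share a title, and ignores the PROPERTY annotation.

```python
import xml.etree.ElementTree as ET

# Hypothetical source instance conforming to the "authors" schema.
SOURCE = """
<authors>
  <author>
    <full-name>Ann Author</full-name>
    <publication><title>Algebra</title><pub-type>book</pub-type></publication>
    <publication><title>A Short Note</title><pub-type>article</pub-type></publication>
  </author>
  <author>
    <full-name>Bob Writer</full-name>
    <publication><title>Algebra</title><pub-type>book</pub-type></publication>
  </author>
</authors>
"""


def apply_mapping(source_xml):
    """Build a <pubs> tree holding one <book> per distinct book title."""
    authors = ET.fromstring(source_xml)
    pubs = ET.Element("pubs")
    books = {}  # title -> <book> element, used to fuse books with the same id
    for a in authors.findall("author"):
        an = a.findtext("full-name")
        for pub in a.findall("publication"):
            if pub.findtext("pub-type") != "book":
                continue  # corresponds to WHERE $typ = "book"
            t = pub.findtext("title")
            book = books.get(t)
            if book is None:
                book = ET.SubElement(pubs, "book", {"id": t})
                ET.SubElement(book, "title").text = t
                books[t] = book
            author = ET.SubElement(book, "author")
            ET.SubElement(author, "name").text = an
    return pubs


pubs = apply_mapping(SOURCE)
print(ET.tostring(pubs, encoding="unicode"))
```

Because both sample authors list the same book title, the output contains a single book element with two author children, which is the intent of object fusion on the id.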

Challenges

Query reformulation for XML is significantly harder
  Hierarchy, 1:n schema constraints, ability to map from values to tags, …
  Redundant paths
  Can only do ~ the XML equivalent of conjunctive queries

See the WWW03 paper (plus later work by Yu and Popa, Deutsch et al., many others) for details

What about Values?

Thus far, we’ve focused on schema mappings

Almost as important in the real world: mappings of values to values
  Proteins to binding sites
  SSNs to customer IDs
  etc.

The Hyperion system (KAM 03) focuses on computing transitive relationships between mappings
  In many cases, we only have partial transitive mappings
  Key idea: divide all of the mappings into partitions, each of which can compute transitive closures separately
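As a rough illustration of a transitive value mapping, the sketch below composes two mapping tables by joining on their shared middle column. It does not reproduce Hyperion's algorithm or its partitioning scheme; the function `compose` and every value in the tables are made up for the example.

```python
def compose(table_ab, table_bc):
    """Join two value-mapping tables on the shared middle value."""
    return {(a, c) for (a, b1) in table_ab for (b2, c) in table_bc if b1 == b2}


# Hypothetical value mappings: SSN -> customer ID, customer ID -> account number.
ssn_to_customer = {("123-45-6789", "C17"), ("987-65-4321", "C42")}
customer_to_account = {("C17", "ACCT-1")}

ssn_to_account = compose(ssn_to_customer, customer_to_account)
print(ssn_to_account)
```

The second SSN has no entry in the composed table because customer C42 maps to nothing, which is one way a transitive mapping ends up partial.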

Assessment: The Semantic Web

The KB world focuses on expressively capturing concepts

The DB world focuses on integrating and restructuring data (but views are less expressive in certain ways)

Does either of these seem likely to change the world?

What barriers need to be removed?

From Managing the Web as a Database to Managing Databases of Databases

Many common operations in:
  Data integration
  Data interchange
  Schema design
  Semantic Web
  Schema maintenance/evolution

For instance:
  Creating a mediated schema
  Defining mappings between schemas
  Seeing what’s different between schemas

The vision: let’s build a system to manage metadata, not data!

Metadata Management

The challenges:

There are lots of metadata representations
  Different data models; different definition types (e.g., Java classes, XML Schemas, SQL DDL, …)

Many of the problems are unsolvable in the abstract
  e.g., schema matching
  But maybe we can customize tools for each task
  And maybe we can get user input to help

We want to create a clean, composable model of operators
  Should be “algebraic” in some sense, with nice properties
  Operators need to be generic but extensible

Data vs. Metadata vs. …

Data
  We know what this is

Metadata (models)
  Schemas, types, classes, etc.

Metamodels
  Things like the relational model, O-R model, …

Bernstein focuses on managing models, with customization for each metamodel (and perhaps special domains)

Models

A model is a set of objects with identity

Objects have at least extended ER-style traits:
  attributes/properties
  is-a, has-a relationships
  loose associations

All of these are assumed to have types
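One possible way to picture such a model in code, as a sketch only: the class names `ModelObject` and `Model`, their fields, and the sample table, columns, and datatypes are all hypothetical and are not taken from Bernstein's proposal.

```python
from dataclasses import dataclass, field


@dataclass
class ModelObject:
    oid: str                                          # object identity
    type_name: str                                    # every object is typed
    attributes: dict = field(default_factory=dict)    # attributes/properties
    is_a: list = field(default_factory=list)          # oids of generalizations
    has_a: list = field(default_factory=list)         # oids of contained parts
    associated: list = field(default_factory=list)    # loose associations (oids)


@dataclass
class Model:
    objects: dict = field(default_factory=dict)       # oid -> ModelObject

    def add(self, obj):
        """Insert an object, insisting that identities are unique."""
        if obj.oid in self.objects:
            raise ValueError(f"duplicate object identity: {obj.oid}")
        self.objects[obj.oid] = obj
        return obj

    def parts_of(self, oid):
        """Follow the has-a relationships of one object."""
        return [self.objects[part] for part in self.objects[oid].has_a]


# A relational table represented as an object that has-a two column objects.
model = Model()
model.add(ModelObject("Employee", "table",
                      has_a=["Employee.EmployeeID", "Employee.Phone"]))
model.add(ModelObject("Employee.EmployeeID", "column", {"datatype": "int"}))
model.add(ModelObject("Employee.Phone", "column", {"datatype": "string"}))

part_ids = [part.oid for part in model.parts_of("Employee")]
print(part_ids)
```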

Mappings

A mapping describes a correspondence between parts of two models; it may be annotated with information about computing the transformation

[Figure: model Emp (Emp#, Name, Address), mapping Map_ee (mapping objects 1 "=" and 2 "≈"), and model Employee (EmployeeID, FirstName, LastName, Phone)]

The Basic Algebraic Operators

Match
  Basically, schema matching: takes two models and returns a mapping between them
  Elementary vs. complex match; reliance on morphisms

Compose
  Takes two mappings and composes them

Diff
  Takes a model A and a mapping A → B, and returns the part of A that’s not mapped

ModelGen
  Takes model A, creates a new model B plus a mapping A → B

Merge
  Takes models A and B and a mapping between them; returns the union C, plus mappings A → C and B → C
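A toy rendering of four of these operators, assuming a model is just a set of element names and a mapping is a set of (source, target) pairs. Real model management operators work on typed object graphs, so this is only a sketch of the operator signatures. The matcher uses naive case-insensitive name equality, `merge` assumes the mapping is one-to-one, and the name `StaffNo` is invented for the example.

```python
def match(a, b):
    """Toy matcher: pair elements whose names are equal ignoring case."""
    return {(x, y) for x in a for y in b if x.lower() == y.lower()}


def compose(m_ab, m_bc):
    """Chain a mapping A -> B with a mapping B -> C."""
    return {(x, z) for (x, y1) in m_ab for (y2, z) in m_bc if y1 == y2}


def diff(a, m_ab):
    """Elements of A that do not take part in the mapping."""
    return a - {x for (x, _) in m_ab}


def merge(a, b, m_ab):
    """Union of A and B in which each mapped B element collapses onto its A partner."""
    b_to_a = {y: x for (x, y) in m_ab}  # assumes a one-to-one mapping
    c = set(a) | {y for y in b if y not in b_to_a}
    map_ac = {(x, x) for x in a}
    map_bc = {(y, b_to_a.get(y, y)) for y in b}
    return c, map_ac, map_bc


emp = {"Emp#", "Name", "Address"}
employee = {"EmployeeID", "FirstName", "LastName", "Phone"}
map_ee = {("Emp#", "EmployeeID")}  # hand-written mapping; the toy matcher would miss it

unmapped = diff(emp, map_ee)
merged, map_ac, map_bc = merge(emp, employee, map_ee)
chained = compose(map_ee, {("EmployeeID", "StaffNo")})
print(sorted(unmapped), sorted(merged), chained)
```

That the toy matcher finds nothing between Emp and Employee is a small reminder of why Match is hard in practice.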

Model Management in Action

23

Schematic of Changes

[Figure callouts:]
  the new parts in S2 that need to be propagated to d2
  Dest. w/o deleted items from s1
  the XML version of s2

24

Actual Operations

25

What’s Hard?

Match
  We saw that LSD is far from perfect, and it’s the best out there…

Merge
  Can we make (A merge B) merge C = A merge (B merge C)? (Buneman, Davidson, Kosky 92)

With Diff, how do we ensure a well-formed model as the result?
  They return a copy of the model, plus mappings showing what is actually part of the diff

Composition – it isn’t always closed within the mapping language!

26

More Challenges

What about:
  Semantics of the meta-model – how do we handle, e.g., constraints?
  What to do about approximate correspondences?
  Can we actually make these things generic but expressive enough to be useful?

Do you think this vision is feasible?
