Survey of Graph Database Models
Renzo Angles and Claudio Gutierrez, University of Chile
ACM Computing Surveys, 2008

Introduction

• Graph Data Model?
  – Data and/or the schema are represented by graphs, or by data structures generalizing the notion of graph
  – Data manipulation is expressed by graph-oriented operations
• Graph DB-Model?
  – A model in which the data structures for the schema and/or instances are modeled as a directed, possibly labeled, graph, or generalizations of the graph data structure, where data manipulation is expressed by graph-oriented operations and type constructors, and appropriate integrity constraints can be defined over the graph structure

Why a Graph Data Model?

• Natural modeling of data
  – Able to keep all the information about an entity in a single node and to show related information by arcs connected to it
  – The graph structure is visible to the user and allows a natural way of handling application data
• Queries can refer directly to this graph structure
  – Allows users to express a query at a high level of abstraction
  – A data model where the operations over data are graph transformations

Comparison with other Database models

Motivations and Applications

• Graph DBs are motivated by real-life applications where component interconnectivity is a key feature
  – Classical applications
    • ‘See’ data connectivity
    • Managing transportation networks
    • Graphical and visual interfaces
    • On-line hypertext
  – Complex networks
    • Social networks
    • Information networks
    • Biological networks

Data structures

• The representation of entities and relations is fundamental to graph DB-models
• A graph DB-model is a framework for the presentation of connectivity among entities
  – Directed/undirected graphs, labeled/unlabeled edges and nodes, hypergraphs
• Representation of entities: schema and instance
  – The schema graph defines entity types (nodes labeled with a type name) and relations (edges labeled with relation names)
  – The instance graph contains entities (nodes labeled with an entity type or identifier) and relations (edges labeled according to the schema)
  – Tuples and sets (PaMaL, GDM) and n-ary relations (GOAL, GDM)
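
The schema/instance distinction above can be illustrated with a minimal Python sketch. The class `LabeledGraph`, the helper `typed_edges`, and the node identifiers (`p1`, `p2`, `v1`) are assumptions made for this illustration; they do not come from any of the surveyed models.

```python
from dataclasses import dataclass, field


@dataclass
class LabeledGraph:
    """Directed graph with labeled nodes and labeled edges."""
    node_labels: dict = field(default_factory=dict)  # node id -> label
    edges: set = field(default_factory=set)          # (source, edge label, target)

    def add_node(self, node_id, label):
        self.node_labels[node_id] = label

    def add_edge(self, source, label, target):
        self.edges.add((source, label, target))


# Schema graph: nodes are entity types, edges are relation names.
schema = LabeledGraph()
schema.add_node("Person", "Person")
schema.add_node("String", "String")
schema.add_edge("Person", "name", "String")
schema.add_edge("Person", "parent", "Person")

# Instance graph: nodes are entities, each labeled with its entity type.
instance = LabeledGraph()
instance.add_node("p1", "Person")
instance.add_node("p2", "Person")
instance.add_node("v1", "String")  # value node for a name (the value itself is omitted)
instance.add_edge("p1", "name", "v1")
instance.add_edge("p1", "parent", "p2")


def typed_edges(graph):
    """Rewrite each edge in terms of the labels of its endpoints."""
    return {(graph.node_labels[s], label, graph.node_labels[t])
            for s, label, t in graph.edges}
```

Under these assumptions, an instance edge is "labeled according to the schema" when `typed_edges(instance)` is a subset of `typed_edges(schema)`.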

Data structures (cont’d)

• Representation of relations
  – Attributes
    • Labeled edges directly related to nodes
    • In the case of GROOVY, attributes are <node,edge,node> triples inside hypernodes
  – Entities
    • Most models do not support this feature because relations are represented as simple labeled edges
  – Standard abstractions
    • Is-part-of, is-composed-by, n-ary relation
  – Derivation
    • ISA, is-of-type
  – Nested
    • This feature is naturally supported by using hypergraph structures

Integrity Constraints

• Schema-instance consistency
  – Entity type checking
    • The instance should contain only entities and relations from entity types and relations that were defined in the schema
    • An entity in the instance may only have those relations or properties defined for its entity type
  – Type checking constitute
• Object identity and referential integrity
  – Set-based data models such as the relational model are value-based
  – Object identity
    • Every node has its own identifier
  – Referential integrity
    • Only existing entities may be referenced
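
As a rough illustration of these two constraints, the following Python sketch checks a list of instance edges against a schema. The function `check_instance` and the sample data are hypothetical and are not taken from any particular graph DB-model.

```python
def check_instance(schema_edges, node_types, instance_edges):
    """Return a list of integrity violations found in an instance graph.

    schema_edges:   set of (source type, relation, target type)
    node_types:     dict mapping each existing node id to its entity type
    instance_edges: list of (source id, relation, target id)
    """
    violations = []
    for source, relation, target in instance_edges:
        # Referential integrity: both endpoints must be existing entities.
        if source not in node_types or target not in node_types:
            violations.append(("dangling reference", source, relation, target))
            continue
        # Schema-instance consistency: the relation must be defined
        # in the schema for these entity types.
        typed = (node_types[source], relation, node_types[target])
        if typed not in schema_edges:
            violations.append(("not in schema", source, relation, target))
    return violations


schema_edges = {("Person", "parent", "Person")}
node_types = {"p1": "Person", "p2": "Person"}  # node ids act as object identifiers
instance_edges = [
    ("p1", "parent", "p2"),  # valid
    ("p1", "parent", "p9"),  # p9 does not exist
    ("p1", "spouse", "p2"),  # 'spouse' is not defined in the schema
]
violations = check_instance(schema_edges, node_types, instance_edges)
```

Here the node identifiers play the role of object identity: two Person nodes stay distinct even if all their property values coincide.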

Query and Manipulation Languages

• A query language is a collection of operators or inferencing rules
• Existing query languages
  – G
    • Based on regular expressions
    • Graphical query: set of labeled directed multigraphs
    • Nodes are variables or constants
    • Edges can be labeled with regular expressions
  – G+
    • Extension of G
    • Graphical query
    • Graph query + summary graph
  – GraphLog
    • Extension of G+
    • Adds negation
    • Graph pattern = graph query + edge query + summary graph
    • Includes transitive closure operator
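
To give a feel for edges labeled with regular expressions, here is a small Python sketch of a path query. It is not the evaluation procedure of G, G+ or GraphLog: it simply enumerates paths up to a fixed length and matches the sequence of edge labels against a pattern. The function name, the `max_length` bound and the sample edges are all assumptions for this illustration.

```python
import re


def regular_path_query(edges, start, pattern, max_length=4):
    """Nodes reachable from start along a path whose labels match pattern.

    edges:   list of (source, label, target)
    pattern: regular expression over space-separated edge labels
    Paths longer than max_length edges are not explored (a simplification).
    """
    regex = re.compile(pattern)
    results = set()
    stack = [(start, [])]
    while stack:
        node, labels = stack.pop()
        if labels and regex.fullmatch(" ".join(labels)):
            results.add(node)
        if len(labels) < max_length:
            for source, label, target in edges:
                if source == node:
                    stack.append((target, labels + [label]))
    return results


# Invented sample data for the sketch.
edges = [
    ("Mary", "mother", "Ana"),
    ("Mary", "father", "John"),
    ("Ana", "mother", "Julia"),
    ("Ana", "father", "George"),
]

maternal_line = regular_path_query(edges, "Mary", r"mother( mother)*")
grandparents = regular_path_query(edges, "Mary", r"(mother|father) (mother|father)")
```

The first pattern follows one or more `mother` edges; the second follows exactly two edges of either kind.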

GraphLog example

• Query A asks for the names of Mary’s grandparents (fixed path query)
• Query B asks for the name of the maternal grandmother of Mary (tree-like query)
• Query C calculates Mary’s ancestors (transitive closure)
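
The three queries can be approximated in plain Python to show what each one computes. This is only a sketch of the query results, not GraphLog syntax, and the genealogy data is invented (only the name Mary appears in the slides).

```python
# Hypothetical genealogy: child -> set of parents. All names other than
# Mary are invented for this sketch.
parents = {
    "Mary": {"Ana", "John"},
    "Ana": {"Julia", "George"},
    "John": {"Rose", "David"},
    "Julia": {"Eva"},
}
mother = {"Mary": "Ana", "Ana": "Julia", "John": "Rose"}


def grandparents(person):
    """Query A: fixed path query, following the parent edge exactly twice."""
    return {g for p in parents.get(person, set())
            for g in parents.get(p, set())}


def maternal_grandmother(person):
    """Query B: follow the mother edge twice (None if unknown)."""
    m = mother.get(person)
    return mother.get(m) if m is not None else None


def ancestors(person):
    """Query C: transitive closure of the parent edge."""
    found, frontier = set(), [person]
    while frontier:
        current = frontier.pop()
        for p in parents.get(current, set()):
            if p not in found:
                found.add(p)
                frontier.append(p)
    return found
```

Query C differs from the other two in that the path length is not fixed in advance, which is why it needs a closure operator rather than a fixed pattern.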

A Genealogy Diagram – an example

LDM (Logical Data Model)

• The schema uses two basic type nodes for representing data values (N and L), and two nodes (NL and PP) to establish relations among data values in a relational style

• The instance is a collection of tables, one for each node of the schema.

Hypernode Model

• The schema defines a person as a complex object with the properties name and lastname of type string, and parent of type person
• The instance shows the relations in the genealogy among different instances of person

GROOVY

• At the schema level, we model an object PERSON as a hypergraph that relates the attributes NAME, LASTNAME and PARENTS
• The value functional dependency NAME,LASTNAME → PARENTS is logically represented by the directed hyperedge ({NAME, LASTNAME} → {PARENTS})
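
What the dependency NAME,LASTNAME → PARENTS requires of the data can be sketched in Python as follows. The function `violates_fd` and the sample records are assumptions for illustration; GROOVY itself expresses the dependency as a hyperedge rather than as a procedural check.

```python
def violates_fd(records, lhs, rhs):
    """Return True if two records agree on the lhs attributes but differ on rhs."""
    seen = {}
    for record in records:
        key = tuple(record[a] for a in lhs)
        value = tuple(record[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return True
    return False


# Invented sample records.
people = [
    {"NAME": "Ana", "LASTNAME": "Stone", "PARENTS": ("Julia", "George")},
    {"NAME": "Mary", "LASTNAME": "Deville", "PARENTS": ("Ana", "John")},
]
conflicting = people + [
    {"NAME": "Ana", "LASTNAME": "Stone", "PARENTS": ("Eva", "David")},
]
```

With these records, `people` satisfies the dependency, while `conflicting` does not, because the same NAME and LASTNAME pair maps to two different PARENTS values.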

Simatic-XT

• This model does not define a schema
• In the first level, the graph contains the relations Name and Lastname to identify people (P1, . . . , P6)
• In the second level, we use the abstraction of Person to compress the attributes Name and Lastname and represent only the relation Parent between people

GGL

• Schema and instances are mixed
• Packaged graph nodes (Person1, Person2, . . . ) are used to encapsulate information about the graph defining a Person
• Relations among these packages are established using edges labeled with parent

PaMaL

• Schema level: basic type (string), class (Person), tuple (X) and set (*) nodes

• Instance level: atomic (George, Ana, etc.), instance (P1, P2, etc.), tuple and set nodes

• Edges labeled ∈ indicate the elements of a set, and the edge typ indicates the type of class Person (typ edges are changed to val at the instance level)

GRAM

• At the schema level, we use generalized names for definition of entities and relations

• At the instance level, we create instance labels (e.g. PERSON 1) to represent entities, and use the edges (defined in the schema) to express relations between data and entities

Object Exchange Model (OEM)

• Schema and instance are mixed
• The data is modeled beginning in a root node &pp, with children person nodes, each of them identified by an Object-ID (e.g. &p2)
• These nodes have children that contain data (name and lastname) or references to other nodes (parent)
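
A minimal Python sketch of this structure is given below. Only the identifiers `&pp` and `&p2` and the labels person, name, lastname and parent come from the slide; every other identifier and value is invented, and the dictionary encoding is an assumption rather than the actual OEM representation.

```python
# Each object has an id and either an atomic value or a list of
# (label, child id) references.
oem = {
    "&pp": [("person", "&p1"), ("person", "&p2")],
    "&p1": [("name", "&n1"), ("lastname", "&l1")],
    "&p2": [("name", "&n2"), ("lastname", "&l2"), ("parent", "&p1")],
    "&n1": "Ana", "&l1": "Stone",
    "&n2": "Mary", "&l2": "Deville",
}


def children(object_id, label):
    """Ids of the children of object_id reached through edges named label."""
    value = oem[object_id]
    if not isinstance(value, list):
        return []  # atomic objects have no children
    return [child for edge_label, child in value if edge_label == label]


def parent_names(person_id):
    """Follow parent references, then read each parent's name value."""
    return [oem[name_id]
            for parent_id in children(person_id, "parent")
            for name_id in children(parent_id, "name")]
```

Because schema and instance are mixed, the labels on the edges are the only type information: nothing outside the data says that a person must have a name.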

RDF
