bdbms: A Database System for Scientific Data Management
Mohamed Y. Eltabakh, Mourad Ouzzani, Walid G. Aref, Ahmed Elmagarmid, Yasin Silva, Umer Arshad, David Salt, Ivan Baxter
Purdue University, Department of Computer Science, Cyber Center, Department of Horticulture and Landscape Architecture
Annotation Management
• Annotations at multiple granularities (tuple, column, cell)
• Annotating data and operations
• Provenance (lineage) is handled as a special type of annotation
[Figure: example annotations at different granularities: "Attach articles about this entry" (tuple level), "This column is computed using a prediction tool" (column level), "Experimentally verified" (cell level), and the provenance annotation "Data copied from Database D1" (table level). Provenance labels: S1 copy, S2 copy, local insert operation, P1 update, S3 overwrite.]
Provenance (lineage) queries:
Q1: Where do these values come from?
Q2: What is the source of this value at time T?
Adding Annotations at various Granularities
Storage Optimization Techniques
Archiving/Restoring Annotations
Propagating/Filtering Annotations
ADD ANNOTATION [AS VIEW]
TO <annotation_table_names> VALUE <annotation_body>
[ON UPDATE PROPAGATE] [ON AGGREGATION PROPAGATE]
ON <SELECT_statement>
ARCHIVE ANNOTATION
FROM <annotation_table_names> WHERE <conditions>
ON <SELECT_statement>
CREATE ANNOTATION TABLE <annotation_table_names> ON <user_table_name>
SELECT [DISTINCT] Ci [PROMOTE (Cj, Ck, …)], …
FROM Relation_name [ANNOTATION (S1, S2, …)], …
[WHERE <data_annotation_conditions>]
[GROUP BY <data_columns>
[HAVING <data_annotation_conditions>]]
[Figure: the relation Gene with its annotation tables Gene_lab, Gene_provenance, and Gene_public. Annotations are drawn over the columns, tuples, and time axes as pairs (B1, T1) to (B5, T5).]
• Compression: Annotation tables store annotations in a compressed form
• Indexing: Building spatial index structures on annotations for efficient retrieval
• Categorization: Annotation tables allow categorization of annotations
Archived annotations are not propagated along with query results
• ANNOTATION: qualifier to specify the propagated annotations
• PROMOTE: carries the annotations from un-projected attributes
[Figure: annotations (A1, T1) to (A4, T4) drawn over the columns, tuples, and time axes; one annotation is crossed out (X) and marked as archived.]
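The propagation rules stated above (the ANNOTATION qualifier, PROMOTE, and the exclusion of archived annotations) can be illustrated with a minimal in-memory sketch. This is not bdbms code: the `Annotation` and `Row` classes, the `select_column` function, and the column names used with it are hypothetical, and the sketch assumes annotations are attached to individual cells.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Annotation:
    table: str              # annotation table holding the note (hypothetical name)
    column: str             # column of the annotated cell
    body: str
    archived: bool = False  # archived annotations are never propagated


@dataclass
class Row:
    values: dict            # column name -> data value
    annotations: List[Annotation]


def select_column(rows, column, promote=(), annotation_tables=None):
    """Project one column and propagate annotations with each output value.

    promote: un-projected columns whose annotations are carried along
             (the role PROMOTE plays in the syntax above).
    annotation_tables: if given, only annotations stored in these tables
             are propagated (the role of the ANNOTATION qualifier).
    """
    sources = {column, *promote}
    result = []
    for row in rows:
        bodies = [
            a.body
            for a in row.annotations
            if a.column in sources
            and not a.archived
            and (annotation_tables is None or a.table in annotation_tables)
        ]
        result.append((row.values[column], bodies))
    return result
```

Calling `select_column(rows, "GID", promote=("GName",))` on rows with made-up columns `GID` and `GName` would return each `GID` value together with the non-archived annotations of both cells.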
ADD ANNOTATION Query Processing
• Execute the SELECT statement
• Identify the output rows and columns
• Map the rows and columns to an ordered domain
• Map the target table cells to be annotated to rectangles
Which mapping is more efficient?
• Storage_Order Mapping
• Correlated_Columns Mapping
• Correlated_Rows Mapping
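The last step, mapping annotated cells to rectangles, can be sketched as follows. The sketch assumes rows and columns have already been mapped to integer positions in an ordered domain, and it uses a simple greedy merge that does not necessarily produce the fewest rectangles. The function name `cells_to_rectangles` is hypothetical, and this is not necessarily how bdbms performs the mapping.

```python
def cells_to_rectangles(cells):
    """Cover a set of (row, col) cells with rectangles.

    Returns a sorted list of (row_lo, row_hi, col_lo, col_hi), all inclusive.
    Step 1 groups contiguous columns in each row into runs.
    Step 2 merges runs with the same column span in consecutive rows.
    """
    by_row = {}
    for r, c in cells:
        by_row.setdefault(r, set()).add(c)

    # Step 1: contiguous column runs per row, in row order.
    runs = []  # (row, col_lo, col_hi)
    for r in sorted(by_row):
        cols = sorted(by_row[r])
        lo = prev = cols[0]
        for c in cols[1:]:
            if c != prev + 1:
                runs.append((r, lo, prev))
                lo = c
            prev = c
        runs.append((r, lo, prev))

    # Step 2: extend a rectangle downward while the same span repeats.
    rects = []
    open_rects = {}  # (col_lo, col_hi) -> [row_lo, row_hi]
    for r, lo, hi in runs:
        span = (lo, hi)
        if span in open_rects and open_rects[span][1] == r - 1:
            open_rects[span][1] = r
        else:
            if span in open_rects:
                r0, r1 = open_rects[span]
                rects.append((r0, r1, lo, hi))
            open_rects[span] = [r, r]
    for (lo, hi), (r0, r1) in open_rects.items():
        rects.append((r0, r1, lo, hi))
    return sorted(rects)
```

With this representation one annotation body is stored once per rectangle instead of once per cell, which is where a mapping that keeps co-annotated cells adjacent pays off.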
Snapshot versus View Annotations
• Snapshot Annotations: the command is evaluated once and the annotation is attached to the current query results
• View Annotations: the command is evaluated on the current database snapshot and continuously applied over new tuples
Eager Approach: apply the annotation command at insertion time
Lazy Approach: apply the annotation command at query time
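The difference between the two approaches for view annotations can be sketched with two small in-memory tables. The classes `EagerTable` and `LazyTable` are hypothetical, and a view annotation is modeled, as an assumption, by a Python predicate over a tuple plus an annotation body.

```python
class EagerTable:
    """Attaches matching view annotations when a tuple is inserted."""

    def __init__(self):
        self.rows = []              # list of (values, attached annotation bodies)
        self.view_annotations = []  # list of (predicate, body)

    def add_view_annotation(self, predicate, body):
        self.view_annotations.append((predicate, body))
        # Evaluate on the current snapshot once.
        for values, notes in self.rows:
            if predicate(values):
                notes.append(body)

    def insert(self, values):
        # Pay the evaluation cost at insertion time.
        notes = [body for pred, body in self.view_annotations if pred(values)]
        self.rows.append((values, notes))

    def query(self):
        return [(values, list(notes)) for values, notes in self.rows]


class LazyTable:
    """Stores only the annotation commands and evaluates them at query time."""

    def __init__(self):
        self.rows = []
        self.view_annotations = []

    def add_view_annotation(self, predicate, body):
        self.view_annotations.append((predicate, body))

    def insert(self, values):
        self.rows.append(values)

    def query(self):
        # Pay the evaluation cost on every query instead.
        return [
            (values, [body for pred, body in self.view_annotations if pred(values)])
            for values in self.rows
        ]
```

Both tables return the same annotated results as long as tuples are never updated; the sketch only shows where the evaluation cost is paid.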
[Figure: panels (1a) to (3a) and (1b) to (3b) show regions labeled Q and A over tuples t1, t2, t3, with the resulting rectangles numbered. Two strategies are labeled: row-oriented division and column-oriented division.]
Archiving Annotations
Query Processing
• Execute the SELECT statement
• Identify cells on which annotations are archived
• Map the cells to rectangles
Representation of Archived Annotations
• A single annotation rectangle may be divided into smaller ones
• How to divide an annotation rectangle?
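One way to read the row-oriented and column-oriented divisions named in the figure labels is sketched below: when a sub-rectangle of an annotation rectangle is archived, the remaining area is cut into pieces that span either full rows or full columns first. This interpretation, the function name `divide`, and the inclusive `(row_lo, row_hi, col_lo, col_hi)` representation are assumptions, not the poster's definitions.

```python
def divide(rect, hole, orientation="row"):
    """Split rect minus hole into smaller rectangles.

    rect, hole: (row_lo, row_hi, col_lo, col_hi), all bounds inclusive.
    orientation "row":    bands above and below the hole span the full width.
    orientation "column": bands left and right of the hole span the full height.
    """
    r0, r1, c0, c1 = rect
    # Clip the archived region to the annotation rectangle.
    h_r0, h_r1 = max(r0, hole[0]), min(r1, hole[1])
    h_c0, h_c1 = max(c0, hole[2]), min(c1, hole[3])
    if h_r0 > h_r1 or h_c0 > h_c1:
        return [rect]  # no overlap, nothing to divide

    pieces = []
    if orientation == "row":
        if r0 < h_r0:
            pieces.append((r0, h_r0 - 1, c0, c1))        # rows above the hole
        if c0 < h_c0:
            pieces.append((h_r0, h_r1, c0, h_c0 - 1))    # left of the hole
        if h_c1 < c1:
            pieces.append((h_r0, h_r1, h_c1 + 1, c1))    # right of the hole
        if h_r1 < r1:
            pieces.append((h_r1 + 1, r1, c0, c1))        # rows below the hole
    elif orientation == "column":
        if c0 < h_c0:
            pieces.append((r0, r1, c0, h_c0 - 1))        # columns left of the hole
        if r0 < h_r0:
            pieces.append((r0, h_r0 - 1, h_c0, h_c1))    # above the hole
        if h_r1 < r1:
            pieces.append((h_r1 + 1, r1, h_c0, h_c1))    # below the hole
        if h_c1 < c1:
            pieces.append((r0, r1, h_c1 + 1, c1))        # columns right of the hole
    else:
        raise ValueError("orientation must be 'row' or 'column'")
    return pieces
```

Both orientations cover the same cells; they differ in the shape of the pieces, which matters if later archive operations tend to touch whole tuples or whole columns.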
Non-traditional and Novel Access Methods
• Efficient indexing structures
• New operators to support complex search operations
• Efficient query processing
Indexing compressed sequences
• Biological sequences are very large
• Data compression techniques
• Compressed sequences
• New index structures for compressed sequences
Indexing Compressed Sequences (SBC-Tree)
[Figure: SBC-tree example: index nodes holding lists of positions with pointers (PT), a list of assigned tags ranging from 5 to 300, a column labeled "Preceding RLE-character", and queries Q1, Q2, Q3 bounded by (min_tag, max_tag) pairs.]
Compression techniques gain significant importance:
• Significant storage reduction
• Reducing buffer requirements
• Reducing number of I/Os
Enhance the overall system performance
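As a small illustration of the kind of compression involved, the sketch below applies run-length encoding (RLE), the scheme the SBC-tree figure labels refer to, to a sequence string. The function names are hypothetical, and the sketch shows only the encoding, not the index built on top of it.

```python
def rle_encode(sequence):
    """Collapse each run of identical characters into a (character, length) pair."""
    runs = []
    for ch in sequence:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((ch, 1))              # start a new run
    return runs


def rle_decode(runs):
    """Rebuild the original sequence from its runs."""
    return "".join(ch * length for ch, length in runs)
```

Sequences with long runs shrink the most; a sequence with no repeated neighbors gains nothing from this encoding.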
Spatial Data Indexing (SP-GiST Framework)
[Figure: SP-GiST inside PostgreSQL: the PostgreSQL Function Manager, PostgreSQL Engine, and PostgreSQL Storage Manager; a storage interface; the SP-GiST internal methods; and the SP-GiST kd-tree and SP-GiST trie instantiations. A trie example over words such as star, space, and spade.]
Trie variants Quadtree variants
Implementing non-traditional indexes involves significant overhead
• Functionalities (insertion, deletion, searching), storage management, integration, recovery and concurrency control
Extensible indexing frameworks
• Software engineering solution: one-time core development, many times low-cost instantiation of a variety of index structures
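The "one-time core, low-cost instantiation" idea can be sketched with a toy partitioning tree whose insertion and search logic is written once and whose partitioning rule is plugged in. The class and function names below are hypothetical, and the sketch does not mirror SP-GiST's actual interface; it only shows a trie and a quadtree-like index sharing the same core.

```python
class Node:
    def __init__(self):
        self.children = {}  # partition label -> child Node
        self.items = []     # keys stored at this node


class PartitionTree:
    """Shared core: insertion and exact-match search, written once."""

    def __init__(self, choose):
        # choose(key, depth) returns the child label to follow,
        # or None when the key should be stored at the current node.
        self.root = Node()
        self.choose = choose

    def _walk(self, key, create):
        node, depth = self.root, 0
        while True:
            label = self.choose(key, depth)
            if label is None:
                return node
            if label not in node.children:
                if not create:
                    return None
                node.children[label] = Node()
            node = node.children[label]
            depth += 1

    def insert(self, key):
        node = self._walk(key, create=True)
        if key not in node.items:
            node.items.append(key)

    def search(self, key):
        node = self._walk(key, create=False)
        return node is not None and key in node.items


def trie_choose(word, depth):
    """Trie instantiation: branch on the next character of the word."""
    return word[depth] if depth < len(word) else None


def make_quadrant_chooser(max_depth):
    """Quadtree-like instantiation for points in the unit square [0, 1] x [0, 1]."""
    def choose(point, depth):
        if depth >= max_depth:
            return None
        x, y = point
        cells = 2 ** (depth + 1)  # grid resolution at this depth
        col = min(int(x * cells), cells - 1) % 2
        row = min(int(y * cells), cells - 1) % 2
        return (col, row)         # which quadrant of the current cell
    return choose
```

A trie over the words from the figure is then `PartitionTree(trie_choose)`, and a point index is `PartitionTree(make_quadrant_chooser(3))`; each instantiation costs one small function, while storage management, recovery, and concurrency control (omitted here) would live in the shared core.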