ORDB Implementation Discussion
From RDB to ORDB
Issues to address when adding OO extensions to a DBMS
Layout of Data
Deal with large data types: ADTs/blobs
– Special-purpose file space for such data, with special access methods
Large fields in one tuple:
– A single tuple may not even fit on one disk page
– Must break it into sub-tuples and link them via disk pointers
Flexible layout:
– Constructed types may have flexible-sized sets, e.g., one attribute can be a set of strings
– Need to provide metadata inside each type concerning the layout of fields within the tuple
– Insertion/deletion will cause problems when a contiguous layout of ‘tuples’ is assumed
Layout of Data
More layout design choices (clustering on disk):
– Lay out complex objects nested and clustered on disk (if nested and not pointer based)
– Where to store objects that are referenced (shared) by possibly several different structures
– Many design options for objects that are in a type hierarchy with inheritance
– Constructed types such as arrays require novel methods, like array chunking into (4x4) subarrays for non-contiguous access
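The array-chunking idea can be illustrated with a minimal Python sketch. It assumes an in-memory dictionary of tiles standing in for disk pages; the names chunk_array and read_column are hypothetical and not taken from any particular system. The point is that a column read touches only the tiles the column passes through, rather than every stored row.

```python
from collections import defaultdict

CHUNK = 4  # tile edge length; 4x4 follows the slide, other sizes are possible


def chunk_array(matrix, chunk=CHUNK):
    """Split a 2-D list into chunk x chunk tiles keyed by (tile_row, tile_col)."""
    tiles = defaultdict(dict)
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            # Each tile stores its cells under tile-local coordinates.
            tiles[(r // chunk, c // chunk)][(r % chunk, c % chunk)] = value
    return dict(tiles)


def read_column(tiles, col, n_rows, chunk=CHUNK):
    """Read one column and report how many distinct tiles were touched."""
    touched = set()
    values = []
    for r in range(n_rows):
        key = (r // chunk, col // chunk)
        touched.add(key)
        values.append(tiles[key][(r % chunk, col % chunk)])
    return values, len(touched)


# An 8x8 array becomes four 4x4 tiles.
matrix = [[r * 8 + c for c in range(8)] for r in range(8)]
tiles = chunk_array(matrix)
column, tiles_read = read_column(tiles, col=5, n_rows=8)
```

If each tile (or, in a row-major layout, each row) were one disk page, the column read here would touch 2 pages instead of 8.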
Objects/OIDs
OID generation: uniqueness across time and system
Object reference handling:
– Must avoid dangling references
– Semantics for object manipulation for shared objects
ADTs
– Type representation: size/storage
– Type access: import/export
– Type manipulation: special methods to serve as filter predicates and join predicates
– Special-purpose index structures: efficiency
ADTs
Mechanism to add index support along with an ADT:
– External storage of index file outside DBMS
– Provide “access method interface” a la:
• Open(), close(), search(x), retrieve-next()
• Plus, statistics on external index
– Or, generic ‘template’ index structure
• Generalized Search Tree (GiST): user-extensible
• Concurrency/recovery provided
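The access method interface listed above might look roughly like the following Python sketch. The class names AccessMethod and SortedListIndex, the method spellings, and the contents of the statistics dictionary are all assumptions made for illustration; a real DBMS defines its own calling conventions.

```python
from abc import ABC, abstractmethod
import bisect


class AccessMethod(ABC):
    """Hypothetical interface an external index exposes to the DBMS."""

    @abstractmethod
    def open(self):
        ...

    @abstractmethod
    def close(self):
        ...

    @abstractmethod
    def search(self, key):
        ...

    @abstractmethod
    def retrieve_next(self):
        ...

    @abstractmethod
    def statistics(self):
        ...


class SortedListIndex(AccessMethod):
    """Toy external index: a sorted list of (key, record_id) pairs."""

    def __init__(self, entries):
        self._entries = sorted(entries)
        self._cursor = None
        self._key = None
        self._is_open = False

    def open(self):
        self._is_open = True

    def close(self):
        self._is_open = False
        self._cursor = None

    def search(self, key):
        if not self._is_open:
            raise RuntimeError("index is not open")
        self._key = key
        # (key,) sorts before every (key, rid), so this finds the first match.
        self._cursor = bisect.bisect_left(self._entries, (key,))

    def retrieve_next(self):
        """Return the next matching record id, or None when exhausted."""
        if self._cursor is None or self._cursor >= len(self._entries):
            return None
        key, rid = self._entries[self._cursor]
        if key != self._key:
            return None
        self._cursor += 1
        return rid

    def statistics(self):
        """Figures an optimizer could use for costing (illustrative only)."""
        return {
            "entries": len(self._entries),
            "distinct_keys": len({k for k, _ in self._entries}),
        }


index = SortedListIndex([(7, "r3"), (5, "r1"), (5, "r2")])
index.open()
index.search(5)
matches = []
rid = index.retrieve_next()
while rid is not None:
    matches.append(rid)
    rid = index.retrieve_next()
stats = index.statistics()
index.close()
```

The DBMS only sees the five interface calls, so the index file itself can live outside the DBMS.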
Query Processing
Query Parsing:
– Type checking for methods
– Subtyping/Overriding
Query Rewriting:
– May translate path expressions into join operators
– Deal with collection hierarchies (UNION?)
– Indices or extraction out of collection hierarchy
Query Optimization Core
– New algebra operators must be designed:
• such as nest, unnest, array-ops, values/objects, etc.
– Query optimizer must integrate them into the optimization process:
• New rewrite rules
• New costing
• New heuristics
Query Optimization Revisited
– Existing algebra operators revisited: SELECT
– WHERE clause expressions can be expensive
– So SELECT pushdown may be a bad heuristic
Selection Condition Rewriting
EXAMPLE:
(tuple.attribute < 50)
– Only CPU time (on the fly)
(tuple.location OVERLAPS lake-object)
– Possibly complex CPU-heavy computations
– May involve both IO and CPU costs
State of the art:
– Consider reduction factor only
Now, we must consider both factors:
– Cost factor: dramatic variations
– Reduction factor: unrelated to cost factor
Operator Ordering
[Figure: plan fragment with two operators, op1 and op2]
Ordering of SELECT Operators
– Cost factor: dramatic variations
– Reduction factor: orthogonal to cost factor
– We want: maximal reduction and minimal cost
– Rank(operator) = (reduction) * (1/cost)
– Apply operators in order of decreasing ‘rank’ (highest rank first)
– High rank (good) -> low in cost, and large reduction
– Low rank (bad) -> high in cost, and small reduction
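A minimal Python sketch of rank-based ordering follows. It assumes that "reduction" means the fraction of input tuples a predicate eliminates and that cost is charged per input tuple; the Predicate class, the numbers, and the expected_cost model are illustrative assumptions, not measurements. Because high rank is the good case, the sketch runs the highest-ranked predicate first.

```python
from dataclasses import dataclass


@dataclass
class Predicate:
    name: str
    reduction: float  # assumed: fraction of input tuples eliminated (0..1)
    cost: float       # assumed: cost units per input tuple evaluated

    @property
    def rank(self):
        # Rank(operator) = reduction * (1 / cost), as on the slide.
        return self.reduction * (1.0 / self.cost)


def order_by_rank(predicates):
    """Highest rank first: cheap, highly selective predicates run early."""
    return sorted(predicates, key=lambda p: p.rank, reverse=True)


def expected_cost(predicates, n_tuples):
    """Total cost when each predicate sees only the survivors of earlier ones."""
    total = 0.0
    remaining = float(n_tuples)
    for p in predicates:
        total += remaining * p.cost
        remaining *= 1.0 - p.reduction
    return total


cheap = Predicate("attribute < 50", reduction=0.5, cost=1.0)
expensive = Predicate("location OVERLAPS lake", reduction=0.9, cost=100.0)

plan = order_by_rank([expensive, cheap])
ranked_cost = expected_cost(plan, 1000)
reduction_only_cost = expected_cost([expensive, cheap], 1000)
```

With these made-up numbers, ordering by reduction factor alone would run the OVERLAPS predicate first on all 1000 tuples, which costs roughly twice as much as the rank-based order.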
Access Methods ( on what ?)
Indexes that are ADT specific
Indexes on navigation path
Indexes on methods, not just on columns
Indexes over collection hierarchies (trade-offs)
Indexes for new WHERE clause expressions: not just =, <, > but also “overlaps”
Registering New Index (to Optimizer)
What WHERE conditions it supports
Estimated cost for “matching tuple”
– Given by index designer (user?)
– Monitor statistics; even construct test plans
Estimation of reduction factors/join factors:
– Register auxiliary function to estimate factor
– Provide simple defaults
Estimation of method costs (~IO/CPU)
Methods
Dynamic linking of methods (outside DB)
Overriding methods in a type hierarchy
Use of “methods” with implied semantics
Incorporation of methods into query process: termination?
“Untrusted” methods: methods may corrupt the server or modify DB content (side effects)
Handling of “untrusted” methods:
– Restrict language; interpret vs compile; run in a separate address space from the DB server
Query Optimization with Methods
Estimation of “costs” of method predicates
Optimization of method execution:
– Similar idea as handling correlated nested subqueries; must recognize repetition and rewrite physical plan
– Provide some level of precomputation and reuse
Optimization of method execution:
– 1. If called on same input, cache that one result
– 2. If on full column, presort column first (groupby)
– 3. Or, precompute results of methods for each possible value in domain and put them in a hash table: val → fct(val); look up in the hash table during query processing, or even join with it, instead of recomputing
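The three strategies can be sketched in Python as follows. The squaring function is a stand-in for a costly user-defined method, and make_counted_method and apply_presorted are hypothetical helper names. Strategy 1 is shown with an unbounded cache from functools.lru_cache, which is more generous than caching a single result.

```python
from functools import lru_cache


def make_counted_method():
    """Return a stand-in 'expensive' method plus a counter of real invocations."""
    counter = {"calls": 0}

    def method(val):
        counter["calls"] += 1
        return val * val

    return method, counter


def apply_presorted(values, method):
    """Strategy 2: sort first, then reuse the last result for runs of equal inputs."""
    out = []
    have_last = False
    last_val = last_result = None
    for v in sorted(values):
        if not have_last or v != last_val:
            last_val, last_result, have_last = v, method(v), True
        out.append((v, last_result))
    return out


column = [3, 1, 3, 2, 1, 3]

# Strategy 1: cache results keyed by input value.
m1, c1 = make_counted_method()
cached = lru_cache(maxsize=None)(m1)
r1 = [cached(v) for v in column]

# Strategy 2: presort the column so a one-slot cache is enough.
m2, c2 = make_counted_method()
r2 = apply_presorted(column, m2)

# Strategy 3: precompute val -> fct(val) for a small, known domain.
m3, c3 = make_counted_method()
lookup = {val: m3(val) for val in range(4)}
r3 = [lookup[v] for v in column]
```

Strategy 3 calls the method once per domain value regardless of the data, so it only pays off when the domain is small relative to the number of tuples.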
Query Processing
User-defined aggregate functions:
– E.g., “second largest” or “second yellowest”
Distributive aggregates: incremental computation
Provide:
– Initialize(): set up state space
– Iterate(): per tuple update the state
– Terminate(): compute final result based on state; and clean up state
For example: “second largest”
– Initialize(): 2 fields
– Iterate(): per tuple compare numbers
– Terminate(): remove 2 fields
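The Initialize/Iterate/Terminate protocol for "second largest" can be sketched in Python as below. The class and function names are hypothetical, and the sketch assumes "second largest" means the second-largest distinct value, returning None when there is no such value.

```python
class SecondLargest:
    """User-defined aggregate following the initialize/iterate/terminate protocol."""

    def initialize(self):
        # State space: the two fields mentioned on the slide.
        self.largest = None
        self.second = None

    def iterate(self, value):
        # Per-tuple update: compare the new value against both fields.
        if self.largest is None or value > self.largest:
            self.second = self.largest
            self.largest = value
        elif value != self.largest and (self.second is None or value > self.second):
            self.second = value

    def terminate(self):
        # Compute the final result, then remove the two state fields.
        result = self.second
        del self.largest, self.second
        return result


def run_aggregate(aggregate, values):
    """What the query executor would do: one pass over the input tuples."""
    aggregate.initialize()
    for v in values:
        aggregate.iterate(v)
    return aggregate.terminate()


answer = run_aggregate(SecondLargest(), [4, 9, 2, 9, 7])
```

The executor never needs to know what the aggregate computes; it only drives the three calls, which is what makes the mechanism user-extensible.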
Following Disk Pointers?
Complex object structures with object pointers may exist (~ disk pointers)
Navigate complex object into memory for a long-running transaction, as in CAD design
What to do about “pointers” between subobjects or related objects?
– Swizzle = replace OID references by in-memory pointers, and unswizzle back at the end
– Issues: in-memory table of OIDs and their state; indicate swizzled state in each object pointer via a bit
– Different policies for swizzling: on access, attached to object brought in, etc.
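Swizzling on access can be sketched in Python as follows. A dictionary stands in for the disk, and ObjectStore, Ref, and Session are hypothetical names; the swizzled flag on Ref plays the role of the per-pointer bit, and Session.resident is the in-memory table of OIDs.

```python
class ObjectStore:
    """Toy 'disk': maps OID -> record whose references are stored as OIDs."""

    def __init__(self, records):
        self.records = records

    def load(self, oid):
        return dict(self.records[oid])


class Ref:
    """A reference slot; `swizzled` plays the role of the per-pointer bit."""

    def __init__(self, oid):
        self.oid = oid
        self.target = None
        self.swizzled = False


class Session:
    def __init__(self, store):
        self.store = store
        self.resident = {}  # in-memory table: OID -> loaded object

    def fetch(self, oid):
        """Bring an object into memory once; its references stay as OIDs."""
        if oid not in self.resident:
            raw = self.store.load(oid)
            self.resident[oid] = {
                "oid": oid,
                "name": raw["name"],
                "refs": [Ref(r) for r in raw["refs"]],
            }
        return self.resident[oid]

    def deref(self, ref):
        """Swizzle on access: replace the OID lookup by a direct pointer."""
        if not ref.swizzled:
            ref.target = self.fetch(ref.oid)
            ref.swizzled = True
        return ref.target

    def unswizzle_all(self):
        """Restore OID-only references so objects can be written back."""
        out = {}
        for oid, obj in self.resident.items():
            for ref in obj["refs"]:
                ref.target = None
                ref.swizzled = False
            out[oid] = {"name": obj["name"], "refs": [r.oid for r in obj["refs"]]}
        return out


store = ObjectStore({
    1: {"name": "car", "refs": [2, 3]},
    2: {"name": "wheel", "refs": []},
    3: {"name": "engine", "refs": [2]},
})
session = Session(store)
car = session.fetch(1)
wheel = session.deref(car["refs"][0])        # first access swizzles the reference
wheel_again = session.deref(car["refs"][0])  # later accesses follow the pointer
written_back = session.unswizzle_all()
```

The engine object is never loaded here because its reference is never followed, which is the appeal of the on-access policy compared with swizzling everything attached to an object when it is brought in.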
Models of Persistence
Different models of persistence for OODB implementations:
Parallel type systems:
– E.g., int and dbint
– User must make decision at object creation time
– Allow for user control by “casting” types
Persistence by container management:
– Objects must be placed into “persistent containers” such as relations in order to stay around
– E.g., Insert o into Collection MyBooks;
– Could be rather dynamic control without casting
Persistence by reachability:
– Use global variable names to refer to objects and structures
– Objects referenced by other objects that are reachable by the application are, by transitivity, also persistent
– Need garbage collection
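Persistence by reachability amounts to a graph traversal from the named roots. The following Python sketch assumes a plain dictionary of outgoing references; the object names and the reachable function are made up for illustration. Anything not reached is a candidate for garbage collection.

```python
def reachable(roots, references):
    """Return every object reachable from the named persistent roots."""
    seen = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        # Follow outgoing references; objects without any are leaves.
        stack.extend(references.get(obj, ()))
    return seen


# Hypothetical object graph: "MyBooks" is the only named global root.
references = {
    "MyBooks": ["book1", "book2"],
    "book1": ["author1"],
    "book2": ["author1"],
    "draft": ["author2"],
}
all_objects = {"MyBooks", "book1", "book2", "author1", "draft", "author2"}

persistent = reachable(["MyBooks"], references)
garbage = all_objects - persistent
```

Here the draft and its author are not reachable from any root, so they would not persist.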
Summary
A lot of work to get there: from physical database design/layout issues up to logical query optimizer extensions
ORDB: reuses the existing implementation base and incrementally adds new features on top (but the relation remains a first-class citizen)