Stage1Raj.ppt

04/11/23 An open source DBMS for handheld devices

1

An open source DBMS for An open source DBMS for handheld deviceshandheld devices

by Rajkumar Sen

04/11/23An open source DBMS for

handheld devices

2

OutlineOutline

• Introduction and Motivation• System Architecture• Storage Model and Index• Query Processing• Data Synchronization• Transaction Management• Summary and Proposed Future Work


handheld devices

3

Introduction and MotivationIntroduction and Motivation

What is a handheld deviceA small computer with limited resourcese.g. Simputer, Palm devices, iPAQ etc.

The Simputer One of the most powerful handhelds Low cost and shareable Developed at IISc Intel StrongArm processor Flash Memory for stable storage (24MB) Limited main memory (32MB) Recognizes Smartcards Linux operating system


handheld devices

4


Why DBMS for a Simputer– Increasing number of applications e.g. Microbanking, E-governance, Agricultural market pricing

etc.– They deal with a fair amount of data– Complex queries involving joins and aggregates– Atomicity and Durability for data consistency – Ease of application development

Need for Synchronization– Data from remote server downloaded on the Simputer– Updates at both places– Common data needs to be synchronized

Open source since Simputer is designed to be low-cost


handheld devices

5


No open source DBMS designed for handheld devices Oracle Lite and DB2 Everyplace are not open source MySQL and BerkeleyDB primarily designed for disk-based

systems

Data in Flash Memory Access time less than disk Random access as good as sequential access

System Constraints Limited storage capacity Less main memory Slow writes in Flash Memory


handheld devices

6


Goal is to design a system that consists of

A lightweight relational DBMS that resides on the Simputer An application that downloads data from a remote server A synchronization tool

Database Module Toolkit

Developing the building blocks as modules Modules can be plugged depending on the type of application


handheld devices

7

System ArchitectureSystem Architecture

Key components of the system Data and Metadata Manager

Representation of relational data and data structures for metadata Main Memory Manager

Manages data to be kept in memory Transaction Manager

Ensures data consistency Access Right Manager

Security rules needed as Simputer is shareable Query Engine

SQL compiler and query processing FetchRelation Module

Fetches data from a remote server Synchronization Module

Synchronizes common data in Simputer and remote server


handheld devices

8

System ArchitectureSystem Architecture

Figure: System Architecture


handheld devices

9

Storage Model and IndexingStorage Model and Indexing Aim at compactness in representation of data as well as index

Existing models – Flat Storage

Tuples are stored sequentially. Ensures access locality but consumesspace. Access locality not an issue since data in flash memory

– Pointer-based Domain Storage• Eliminates duplicates • Values partitioned into domains which are sets of unique values• Tuples reference the attribute value by means of pointers• One domain shared among multiple attributes• No index

– Pointer-based Ring Storage (Bobineau et al)• Modification to Domain Storage• Uses domain structure as index• Tuple-to-tuple pointers connect two tuples that have same attribute

value• Index structure in the form of a ring


handheld devices

10

Storage Model and IndexingStorage Model and Indexing

Ring Storage Model contd.Advantages– Addresses data and index compactness– Join becomes easy

Drawbacks– For projecting value of an attribute, need to traverse half of the ring

Some other Models– DBGraph (Pucheral et al) Maintains value-to-tuple as well as tuple-to-value pointers.

Too many pointers– Domain Tree (Missikov et al) Maintains Domain trees, for simple projection need to scan

all the domain trees.– Another model (Krithi et al ) suggested for data warehouses uses projection index and join index


handheld devices

11


Our Model– Based on Domain Storage Model– Each domain element is a (length, value) tuple– Domains used for duplicates and variable width attributes– When primary key-foreign key relation, instead of storing value in child

table we store pointer to the tuple in the parent table which contains the value

Figure: Our storage model


handheld devices

12


Index structure– Sorted array of Tuple Identifier List (pointers to tuples)– Each List corresponds to a value, consists of pointers to those tuples that share it – Ordering preserved in the index structure, hence efficient execution of range

queries

Figure: Our index model


handheld devices

13

Storage Model and IndexingStorage Model and Indexing Our model eliminates the problems by storing some additional pointers

– Data storage separate from index structure, hence projection is easy– Index created on attribute rather than domains– For primary key-foreign key relation, join requires scanning of only child table– For domain based join attributes, join indices can be built

Figure: Join index


handheld devices

14

Query ProcessingQuery ProcessingEntire data is memory resident

– Aim to reduce number of tuples read from memory

Join without indices

Assuming R and S are the outer and inner relations with n and m tuples – Nested loop join

Most reasonable choice when very less memory available. No additional structure Cost of join is mn

– Hash Join Requires the creation of hash partitions in main memory.

Memory permitted, hash join is faster Instead of storing values in the partitions, we store pointers to tuples Cost = (m + n ) //for building hash partitions + (m + n) //for reading the hash partitions

(m + n) writes to memory and (m + n) pointer chases


handheld devices

15

Query ProcessingQuery Processing Sort-Merge Join Need memory for sorting While sorting, do not store the tuples but only pointers to the tuples

Cost = ( m log m + n log n) //for sorting + ( m + n) // for reading to merge

(m + n) writes to memory and (m + n) pointer chases

Join with Indices

– Indexed Nested loop join Relation on which index is available is treated as the inner relation Cost = n //for reading the outer relation R + n log m //for each tuple of R, index into S


handheld devices

16

Query ProcessingQuery Processing– Sort-Merge Join If indices available on both join attributes, sorting not needed

Cost = m + n //for merging Some cost involved in traversing the index and chasing pointers

Join Indices A Single traversal of the index and chasing pointers gives the tuples

which are candidates for joining

Other operations Selection, Projection and Sorting are straightforward.

Duplicate elimination, aggregation ,union, intersection, and difference

can be computed by sorting and building hash indices on relations


handheld devices

17

Query ProcessingQuery ProcessingQuery Plan Generation and Memory Allocation

– An optimal query execution plan is needed– Reduce materialization, left-deep tree and bushy trees are ruled out– Choose from right-deep and extreme right-deep tree depending on memory available for storing intermediate results– Query optimizer has to be memory cognizant

Memory must be optimally allocated among all operators since we cannot assume entire memory is available for each operatorArvind et. al. suggests memory allocation to operators based on cost functions

Aggregation, sorting and duplicate removal generally performed on materialized results. By enforcing a particular tuple arrival order at the leaf of the plan tree,pipelining of aggregation etc. is possible


handheld devices

18

SynchronizationSynchronizationMain data resides on a remote server

– Information in the form of relations in a remote DBMS– Relevant information is downloaded on the Simputer– Copies of data present on both Simputer and remote server– Need for synchronization

Two scenarios in data synchronization

1. Relation downloaded on the Simputer– Download the entire relation R only once– Subsequently, transfer only the differentials

Rsimpnew = Rsimpold U ∆Rmain

Rmainnew = Rmainold U ∆Rsimp

Problem arises when the same tuple is updated at both places.

Tuple Conflict Detection harder than resolution


handheld devices

19

SynchronizationSynchronization 2.Resultset downloaded on the Simputer

– Execute the query at the remote server and transfer only once– Subsequently transfer only the differentials– q= (R S) qnew = qold U (ir S)

qnew = qold − (dr S)

For each query, find out what differential to compute.• Resembles the Incremental View Maintenance Problem

In-place updates on the resultset in the Simputer means changes have to be propagated to the remote database

• Resembles the View Update Problem

Updates to downloaded relations and resultsets recorded in Intention List and Intention View List respectively. Determine whether the changes can be reflected


handheld devices

20

SynchronizationSynchronizationView Update Considerations

– Only legal view updates– Only legal database updates– Only underlying tuples affected– Insertion of extra tuples– Modification of other view tuples

Keller suggests translations for inserts, deletes, and modifications - Restricts the relations to BCNF - Joins should have an inclusion dependency - Our remote server relations need not satisfy

Langerak suggests translations based on extension joins - No restriction on normal forms and joins - Can be the basis of our synchronization engine

Most of the rules need data and metadata about the base relations - Remote database connection needed to validate updates


handheld devices

21

Transaction ManagementTransaction ManagementACID properties need to be maintained

– Global atomicity - data updates in both Simputer and smartcard.

- Simputer participates in distributed transactions– Simputer a single user system, why Concurrency Control - Transactions on behalf of data synchronization - Long transaction doing some aggregation

Concurrency Control

– Most of the transactions expected to complete fast– Lock contention reduces, hence large locking granules– Korth et al proposes a serial protocol . Main aim is to improve response time rather than throughput – Instead of hash tables, bits can be used for lock information


handheld devices

22

Transaction ManagementTransaction ManagementCommit processing

Shadow updates Data locality not an issue since data in memory Not suitable for pointer-based storage models If one domain value changes, its location also changes and all tuple pointers need to be updated Size of the shadow is also a concern

Log-based updates Better suited for pointer-based models Cost of maintaining the logs Pointer-based logging instead of value-based logging

Limited memory, hence Steal buffer replacement policy. Dirty blocks output to smartcard stable storage when connected If all dirty blocks of a transaction output to smartcard , no Undo

More detailed survey left as future work


handheld devices

23

Summary and Proposed Future WorkSummary and Proposed Future WorkDiscussion and solutions proposed can be generalized for anyhandheld device.However, device specific optimizations possible

Future Work

– More detailed survey of concurrency control and recovery – Protocols for validation of updates on downloaded data – Query engine needs to be designed– Access Rights Management issues need to be addressed– Database module toolkit– Implement the system on the Simputer – Evaluate the system

04/11/23 An open source DBMS for handheld devices

24

Thank YouThank You


handheld devices

25

ReferencesReferences1. A. Ammann, M. Hanrahan, and R. Krishnamurthy. Design of a

Memory Resident DBMS. In IEEE COMPCON, 1985.

2. C. Bobineau, L. Bouganim, P. Pucheral, and P. Valduriez. PicoDBMS: Scaling down Database Techniques for the Smartcard. In VLDB, 2000.

3. Stephen Blott and Henry F. Korth. An Almost Serial Protocol for Transaction Execution in Main Memory Database Systems. In VLDB, 2002.

4. DB2 Everyplace. http://www.ibm.com/software/data/db2/everyplace.

5. Anindya Datta, Debra VanderMeer, Krithi Ramamritham, and Bongki Moon. Applying Parallel Processing Techniques in Data Warehousing and OLAP. In VLDB, 1999.

6. A. Hulgeri, S. Sudarshan, and S. Seshadri. Memory Cognizant Query Optimization. In Advances In Data Management, 2000.


handheld devices

26

ReferencesReferences7. Arthur M. Keller. Algorithms for Translating View Updates to Database Updates for Views Involving Selections, Projections and Joins. In ACM PODS, 1985.

8. Rom Langerak. View Updates in Relational Databases with an Independent Scheme. In ACM PODS, 1990.

9. T. Lehmann and M. Carey. A Study of Index Structures for Main Memory DBMS. In VLDB, 1986.

10. M. Missikov and M. Scholl. Relational Queries in a Domain Based DBMS. In ACM SIGMOD, 1983.

11. Mysql. http://www.mysql.com.

12. P. Pucheral, P. Valduriez, and J.M.Thevenin. EÆcient Main Memory Data Management using the DBGraph Storage Model. In VLDB, 1990.13. The Simputer. http://www.simputer.org.


handheld devices

27


Aim at compactness in representation of data as well as index

Existing models – Flat Storage

Tuples are stored sequentially. Ensures access locality but consumesspace. Access locality not an issue since data in flash memory

– Pointer-based Domain Storage• Eliminates duplicates • Values partitioned into domains which are sets of unique value• Tuples reference the attribute value by means of pointers• One domain shared among multiple attributes• No index

Figure: Domain Storage


handheld devices

28

Storage Model and IndexingStorage Model and IndexingPointer-based Ring Storage (Bobineau et al)

– Modification to Domain Storage– Uses domain structure as index– Tuple-to-tuple pointers connect two tuples that have same

attribute value– Index structure in the form of a ring

Figure: Ring Storage

Documents

Stage1Raj.ppt