An Efficient Multi-Dimensional Index for Cloud Data Management
Xiangyu Zhang, Jing Ai, Zhongyuan Wang, Jiaheng Lu, Xiaofeng Meng
School of Information, Renmin University of China
Outline
Motivation
Query Answering on the Cloud
Related Work
EMINC: Index the Cloud Efficiently
◦ Node Bounding
◦ Extended Node Bounding
◦ Cost Estimation based Index Update
Evaluation
Conclusion & Future Work
Motivation
Cloud systems have proven to work well for web search applications
◦ Simple structure, mostly key-value pairs
◦ Flexible, efficient for analytic work
However, they are insufficient for complex data management needs
◦ No powerful query language such as SQL
◦ Hard to process complex queries
◦ Lack of efficient index structures
Distributed Cloud base?
• BigTable
• HBase
How to query on attributes other than the primary key?
Motivation
As part of our Cloud-based DBMS project, we aim to build an efficient index structure on the Cloud.
Query Answering in the Cloud
Fast locating of relevant slave nodes
Efficient lookup on each slave node
Related Work
S. Wu and K.-L. Wu, “An indexing framework for efficient retrieval on the cloud,” IEEE Data Eng. Bull., vol. 32, pp. 75–82, 2009.
H.-C. Yang and D. S. Parker, “Traverse: Simplified indexing on large map-reduce-merge clusters,” in Proceedings of DASFAA 2009, Brisbane, Australia, April 2009, pp. 308–322.
M. K. Aguilera, W. Golab, and M. A. Shah, “A practical scalable distributed B-tree,” in Proceedings of VLDB’08, Auckland, New Zealand, August 2008, pp. 598–609.
Distributed Database
Data slicing in DDBS
◦ Horizontal, vertical, etc.
◦ Slice based on conditions
◦ Check condition conflict on query processing
Data distribution on the Cloud is different and could be very complex if expressed as a set of conditions
Condition check is too expensive
EMINC: Node Bounding
Node cube of a table on a slave node
◦ Value range of the table on this node
Id  A  B
1   1  1
2   2  2
3   3  4
4   6  7
5   5  10
Node Cube: (1,1), (6,10)
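The node cube in the table above is just the per-attribute minimum and maximum of the rows stored on the node. A minimal sketch, assuming rows are tuples of the indexed attributes (A, B); the function name `node_cube` is hypothetical and not from the slides:

```python
from typing import Sequence, Tuple

Point = Tuple[float, ...]
Cube = Tuple[Point, Point]  # (lower corner, upper corner)


def node_cube(rows: Sequence[Point]) -> Cube:
    """Return the bounding cube of the rows: per-dimension min and max."""
    if not rows:
        raise ValueError("cannot bound an empty set of rows")
    columns = list(zip(*rows))  # one tuple of values per attribute
    lower = tuple(min(col) for col in columns)
    upper = tuple(max(col) for col in columns)
    return lower, upper


# (A, B) values of the five rows in the example table.
rows = [(1, 1), (2, 2), (3, 4), (6, 7), (5, 10)]
print(node_cube(rows))  # ((1, 1), (6, 10))
```

The result matches the node cube given on the slide: lower corner (1,1), upper corner (6,10).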
EMINC: Architecture
Each leaf node of the R-Tree corresponds to one node cube
Use KD-Tree to maintain local index on slave nodes
EMINC: Query Processing
Get the query cube of the query and look it up in the R-Tree to get the relevant data nodes
◦ 1<x<2, 3<y<4 => Query Cube: (1,3), (2,4)
[Figure: two panels, each showing a node cube and a query cube; one case is labeled YES, the other No]
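The routing step can be sketched as an overlap test between the query cube and each node cube. This is an illustration only: a linear scan stands in for the R-Tree lookup named on the slide, and the node names and the second node cube are made-up values.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, ...]
Cube = Tuple[Point, Point]  # (lower corner, upper corner)


def intersects(a: Cube, b: Cube) -> bool:
    """True if two axis-aligned cubes overlap in every dimension.

    Bounds are treated as closed, so a strict query such as 1<x<2 may be
    routed to a node that only touches its boundary (a false positive),
    but a relevant node is never skipped.
    """
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return all(al <= bh and bl <= ah
               for al, ah, bl, bh in zip(a_lo, a_hi, b_lo, b_hi))


def relevant_nodes(query_cube: Cube, node_cubes: Dict[str, Cube]) -> List[str]:
    """Linear scan standing in for the R-Tree lookup on the master node."""
    return [name for name, cube in node_cubes.items()
            if intersects(cube, query_cube)]


node_cubes = {
    "node_a": ((1, 1), (6, 10)),  # node cube from the earlier example
    "node_b": ((7, 0), (9, 5)),   # hypothetical second node
}
query_cube = ((1, 3), (2, 4))     # from 1<x<2, 3<y<4
print(relevant_nodes(query_cube, node_cubes))  # ['node_a']
```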
EMINC: Extended Node Bounding
Problem with single bounding
◦ Bad performance for sparse nodes
Many queries will be misled to this node
EMINC: Cube Cutting
Single Node Cube with Low Accuracy
Multiple Node Cubes with High Accuracy
EMINC: Cube Methods
◦ Random cutting
◦ Equal cutting
◦ Clustering-based cutting
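The slides name the cutting methods without defining them. The sketch below is one plausible reading of equal cutting, assumed here to mean splitting the value range of one dimension into equal-width slabs and bounding the points in each slab separately. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Cube = Tuple[Point, Point]  # (lower corner, upper corner)


def bounding_cube(points: Sequence[Point]) -> Cube:
    columns = list(zip(*points))
    return tuple(min(c) for c in columns), tuple(max(c) for c in columns)


def volume(cube: Cube) -> float:
    lower, upper = cube
    result = 1
    for lo, hi in zip(lower, upper):
        result *= hi - lo
    return result


def equal_cut(points: Sequence[Point], parts: int, dim: int = 0) -> List[Cube]:
    """Assumed reading of equal cutting: equal-width slabs along `dim`."""
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    width = (hi - lo) / parts
    groups: List[List[Point]] = [[] for _ in range(parts)]
    for p in points:
        # The maximum value falls on the last boundary; clamp it into the last slab.
        idx = 0 if width == 0 else min(int((p[dim] - lo) / width), parts - 1)
        groups[idx].append(p)
    return [bounding_cube(g) for g in groups if g]  # empty slabs yield no cube


points = [(1, 1), (2, 2), (3, 4), (6, 7), (5, 10)]
cubes = equal_cut(points, parts=2)
print(cubes)                            # [((1, 1), (3, 4)), ((5, 7), (6, 10))]
print(volume(bounding_cube(points)))    # 45
print(sum(volume(c) for c in cubes))    # 9
```

On the example table, two cubes cover a total area of 9 instead of 45 for the single node cube, which is the effect the cube cutting figure describes: fewer queries are misdirected to the node.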
EMINC: Index Update Strategy
Index update issues:
◦ Cubes may become invalid after certain data updates and then need reconstruction
Insertion invalidates cube
◦ Create a node cube containing the new data
For regular maintenance of index
◦ Cost estimation based update strategy
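The insertion case can be sketched as follows. This is an assumption about how "create a node cube containing new data" might look in the simplest case: if no existing cube covers the inserted point, a new degenerate cube holding just that point is added. The function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, ...]
Cube = Tuple[Point, Point]  # (lower corner, upper corner)


def contains(cube: Cube, point: Point) -> bool:
    lower, upper = cube
    return all(lo <= x <= hi for lo, x, hi in zip(lower, point, upper))


def cubes_after_insert(cubes: List[Cube], point: Point) -> List[Cube]:
    """Return the node's cube list after inserting `point`.

    If an existing cube already covers the point, the index stays valid.
    Otherwise a new cube containing only the new data is appended.
    """
    if any(contains(c, point) for c in cubes):
        return list(cubes)
    return list(cubes) + [(tuple(point), tuple(point))]


cubes = [((1, 1), (3, 4))]
print(cubes_after_insert(cubes, (2, 2)))  # [((1, 1), (3, 4))]
print(cubes_after_insert(cubes, (8, 9)))  # [((1, 1), (3, 4)), ((8, 9), (8, 9))]
```

Small cubes accumulated this way are what the regular, cost estimation based maintenance would later merge or recompute.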
EMINC: Cost Estimation Strategy
Cost of index update:
◦ Recalculate cubes on local node
◦ Transfer to master node and maintain R-Tree
◦ Query performance will be affected
Benefit of index update:
◦ More accurate query directing, less waste
EMINC: Two Phase Method
After one update:
1. Wait for a time period of deltaT
2. deltaT expires, check if an update is needed
Determine deltaT
Check for update
Assumption:
Number of queries to be processed
Total size of node cubes of this node
EMINC: Phase One
After previous update:
benefit = decrement-of-query/time * deltaT
◦ We enjoy the benefit of the previous update for a time period of deltaT
cost = number-of-queries missed
◦ Number of queries we could have processed had the time spent on the previous update been used to answer queries
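Requiring the benefit to exceed the cost gives a lower bound on deltaT: deltaT > cost / (decrement-of-query/time). A minimal sketch of that arithmetic, assuming both quantities are measured in the same query and time units; the parameter names and the numbers are illustrative, not from the slides.

```python
def benefit(decrement_per_time: float, delta_t: float) -> float:
    """benefit = decrement-of-query/time * deltaT."""
    return decrement_per_time * delta_t


def min_delta_t(decrement_per_time: float, queries_missed: float) -> float:
    """Smallest deltaT at which benefit equals cost; any larger deltaT pays off."""
    if decrement_per_time <= 0:
        raise ValueError("the update must reduce the query load to ever pay off")
    return queries_missed / decrement_per_time


# Illustrative numbers: the update saves 4 misdirected queries per second
# and cost 200 queries that could have been answered while it ran.
threshold = min_delta_t(decrement_per_time=4, queries_missed=200)
print(threshold)              # 50.0
print(benefit(4, 60) > 200)   # True: waiting 60 s makes the update worthwhile
print(benefit(4, 40) > 200)   # False: 40 s is too short
```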
EMINC: Phase Two
benefit > cost => deltaT
After deltaT expires, check if an update is needed. This check involves the following:
◦ Record update frequency
◦ Expected benefit ratio
◦ Performance requirement
We leave this as future work
Evaluation
6 machines
◦ 1 as master node
◦ 5 slave nodes simulating 100~1000 nodes
Each machine had a 2.33GHz Intel Core2 Quad CPU, 4GB of main memory, and a 320GB disk.
Machines ran Ubuntu 9.04 Server OS.
Evaluation: Point Query
Evaluation: Range Query
Conclusion
In this paper we presented a series of approaches for building an efficient multi-dimensional index on a Cloud platform.
We developed the node bounding technique to reduce query processing cost on the Cloud platform.
To maintain the efficiency of the index, we proposed a cost estimation-based approach for index update.
Future Work
Complete cost estimation model
Take replication of data into consideration
Implement in HBase to further verify performance
Thanks
Please visit our lab for more information: http://idke.ruc.edu.cn/