Lab of Web And Mobile Data Management （ WAMDM ） Youzhong MA

Index for Cloud Data Management


Outline

• Motivating Applications
• Existing Technologies
• Conclusions & Future work

rowkey  name  price  number
1       beer  3.00$  1000
2       beer  7.00$  2500
3       milk  2.00$  1300
4       milk  4.5$   2100

Motivating Application

Cloud System

select sum(number) from Product where product.name = 'beer' and product.price <= 10$ and product.price >= 5$

Big Data in a Private Cloud

Table： Product

Queries on multiple attributes and on non-rowkey columns are quite common!
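To make the problem concrete, here is a minimal Python sketch. It assumes a store that indexes only the rowkey, modeled as a plain dict; the rows are those of the Product table, and the names `product` and `sum_number` are illustrative only. Because name and price are not part of the rowkey, the query has to touch every row.

```python
# Toy model of a rowkey-indexed store: rowkey -> row.
# The dict and all names below are assumptions made for illustration.
product = {
    1: {"name": "beer", "price": 3.00, "number": 1000},
    2: {"name": "beer", "price": 7.00, "number": 2500},
    3: {"name": "milk", "price": 2.00, "number": 1300},
    4: {"name": "milk", "price": 4.50, "number": 2100},
}


def sum_number(table, name, low, high):
    """Sum `number` over rows matching the name and the price range.

    Nothing but the rowkey is indexed, so this is a full table scan.
    """
    total = 0
    for row in table.values():
        if row["name"] == name and low <= row["price"] <= high:
            total += row["number"]
    return total


print(sum_number(product, "beer", 5.0, 10.0))  # 2500
```

With four rows the scan is harmless; with billions of rows spread over many servers it is the cost that the indexes surveyed in this talk try to avoid.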


Motivating Application: Mobile Coupon Distribution

[Figure: Mobile Coupon Distributer, with Current Location and Coupon flows between it and mobile users]

Distribution Policy
• Area
• # of coupons


Motivating Application: Mobile Coupon Distribution

[Figure: many mobile users, each with a Current Location, receiving Coupons under the Distribution Policy (Area, # of coupons)]

125,000,000 subscribers in Japan

• Large amounts of data
• High throughput
• System scalability
• Multi-dimensional query
• Nearest neighbors query
• Efficient complex queries

Existing Technologies

[Figure: existing technologies placed along two axes, Multi-dimensional Queries and Scalability]

• Relational DBs, Spatial DBs
• Commercial products, but expensive
• Open source products
• Key-Value Stores
• What We Want, at a reasonable price

Solutions: overview

Rowkey
• Single Dimensional Index: [BigTable, HBase], supporting [Point Query, Range Query]

Non-rowkey
• Single Dimensional Index: [Aguilera PVLDB’08] [S. Wu Data Eng’09] [S. Wu PVLDB’10]
• Multiple Dimensional Index: [X. Zhang CloudDB’09] [J. Wang SIGMOD’10] [G. Chen VLDB’11] [Y. Zou NPC’10] [Shoji Nishimura MDM’11]

Local Index + Global Index

NECCAS

Efficient B-tree Based Indexing for Cloud Data Processing

S. Wu, D. Jiang, B. C. Ooi, and K.-L. Wu. PVLDB'10

Efficient B-tree Based Indexing for Cloud Data Processing

Motivation
• Design a scalable, high-throughput indexing scheme that supports efficient queries over huge volumes of data in the cloud
• Keep maintenance cost low while still supporting parallel search

System Architecture

① Local index
② BATON overlay network
③ Publish

Challenges
• How to select the local B+-tree nodes to publish in the global index?
• How to organize the global index?
• How to maximize the throughput?

Selecting local B+-tree nodes: cost modeling

Query cost
1. Routing cost (search in global index): $\frac{1}{2}\,\alpha \log_2 N$
2. Local search cost (search in local index): $\beta\, h(n)$

Update cost: $\frac{1}{2}\,\alpha\, g(n) \log_2 N$

$\alpha$: cost of sending an index message
$\beta$: cost of random I/O

Adaptive indexing strategy

• Index expand
• Index collapse

[Figure: local index]

BATON： Balanced Tree Overlay Network

• A distributed tree structure for P2P systems
• Supports range search

Index Construction
• Assign a range to each node
• For each node n:
  • the range of its left sub-tree is less than that of n
  • the range of its right sub-tree is larger than that of n
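The range invariant above can be sketched as an ordinary in-memory binary tree. This is a minimal illustration only: it omits BATON's routing tables, node joins and load balancing, and the names `Node`, `build` and `range_search` are hypothetical.

```python
class Node:
    """One overlay node that owns the key range [lo, hi)."""

    def __init__(self, lo, hi, left=None, right=None):
        self.lo, self.hi = lo, hi
        self.left, self.right = left, right


def build(ranges):
    """Build a balanced tree from ranges sorted in ascending order.

    The middle range goes to the root, so every range in the left
    sub-tree is smaller and every range in the right sub-tree is larger.
    """
    if not ranges:
        return None
    mid = len(ranges) // 2
    lo, hi = ranges[mid]
    return Node(lo, hi, build(ranges[:mid]), build(ranges[mid + 1:]))


def range_search(node, lo, hi, out):
    """Collect the ranges of all nodes that overlap the query [lo, hi)."""
    if node is None:
        return out
    if lo < node.lo:                       # query extends to smaller keys
        range_search(node.left, lo, hi, out)
    if node.lo < hi and lo < node.hi:      # this node overlaps the query
        out.append((node.lo, node.hi))
    if hi > node.hi:                       # query extends to larger keys
        range_search(node.right, lo, hi, out)
    return out


ranges = [(i * 10, (i + 1) * 10) for i in range(7)]   # [0,10) ... [60,70)
root = build(ranges)
print(range_search(root, 25, 45, []))  # [(20, 30), (30, 40), (40, 50)]
```

Only sub-trees that can overlap the query are visited, which is what makes range search possible on this structure.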

Publish local B+-tree node to BATON

Maximizing the throughput
• Eventual consistency model
• Lazy update: if an update does not affect the key range of a local B+-tree, the stale index will not affect the correctness of query processing.
• Eager update: for updates in the left-most and right-most nodes, which can change the key range
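The lazy/eager rule can be illustrated with a small Python sketch. It simplifies the slide to inserts into a single published node whose key range is known to the global index; the dict layout and the names `update_policy` and `apply_insert` are assumptions, not the paper's interface.

```python
def update_policy(published_range, key):
    """Return 'lazy' if the key falls inside the published key range,
    otherwise 'eager' because the range itself has to change."""
    lo, hi = published_range
    return "lazy" if lo <= key <= hi else "eager"


def apply_insert(published, key):
    """Apply an insert to a published node description.

    `published` is a dict with a 'range' entry (lo, hi) as seen by the
    global index. Only an eager update touches it.
    """
    policy = update_policy(published["range"], key)
    if policy == "eager":
        lo, hi = published["range"]
        published["range"] = (min(lo, key), max(hi, key))
    return policy


node = {"range": (10, 50)}
print(apply_insert(node, 30), node["range"])  # lazy (10, 50)
print(apply_insert(node, 70), node["range"])  # eager (10, 70)
```

In the lazy case the global index still routes queries to the right node, so the update can be propagated later without affecting correctness.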

Pros and cons

Pros
• Supports efficient point queries and range queries on non-rowkey attributes
• Proposes an adaptive indexing strategy based on the cost model of overlay routing

Cons
• Cannot support multi-dimensional queries

Multi-dimensional index

[X.Zhang CloudDB’09]

Multi-dimensional index

[J.Wang SIGMOD’10]

[G.Chen VLDB’11]

MD-HBase: A Scalable Multi-dimensional Data Infrastructure for Location Aware Services

Shoji Nishimura, Sudipto Das. MDM'11

Contributions
• Using linearization to implement a scalable multi-dimensional index structure layered over a range-partitioned key-value store
• Implementing a K-d tree and a Quad tree with this design

Ordered Key-Value Stores

[Figure: an index (key00, key11, …, keynn) pointing to buckets; each bucket holds key-value pairs sorted by key, e.g. key00 … key0X with value00 … value0X]

Good at 1-D Range Query
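A minimal Python sketch of why sorted keys make 1-D range queries cheap: two binary searches find the ends of the range, and everything in between is one contiguous scan. The sorted list stands in for the store and the key names are toy data.

```python
from bisect import bisect_left, bisect_right

# Keys kept in sorted order, as in an ordered key-value store (toy data).
keys = ["key00", "key01", "key05", "key11", "key12", "key20"]
values = {k: "value" + k[3:] for k in keys}


def range_query(lo, hi):
    """Return (key, value) pairs with lo <= key <= hi.

    Two binary searches locate the boundaries; the slice between them
    is a single sequential scan.
    """
    start = bisect_left(keys, lo)
    end = bisect_right(keys, hi)
    return [(k, values[k]) for k in keys[start:end]]


print(range_query("key01", "key12"))
```

The same trick does not carry over directly to several dimensions, because a single sort order cannot keep points close in every dimension at once.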

[Figure: data space with axes Longitude, Latitude, Time]

But, our target is multi-dimensional…

Naïve Solution: Linearization

[Figure: index over buckets of key-value pairs sorted by key]

Projects n-D space to 1-D space

Simple, but problematic…

Apply a Z-ordering curve…

5 7 13 15

4 6 12 14

1 3 9 11

0 2 8 10
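The grid above can be reproduced with a short Python sketch of bit interleaving. The bit order (x bit first, then y bit, from most to least significant) is inferred from the numbers in the grid; `z_value` is an illustrative name.

```python
def z_value(x, y, bits=2):
    """Interleave the bits of x and y into one Z-order value.

    For each bit position, from most to least significant, the x bit
    is emitted first and the y bit second.
    """
    z = 0
    for i in reversed(range(bits)):
        z = (z << 1) | ((x >> i) & 1)
        z = (z << 1) | ((y >> i) & 1)
    return z


# Print the 4x4 grid with y growing upwards, as on the slide.
for y in reversed(range(4)):
    print(" ".join(f"{z_value(x, y):2d}" for x in range(4)))
```

The top row comes out as 5 7 13 15 and the bottom row as 0 2 8 10, matching the grid.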

Problem: False positive scans

MD-query on linearized space
• Translate an MD-query to a linearized range query.
  • Ex. Query from 2 to 9.
• Scan the queried linearized range.
• Filter out points outside the queried area.
  • Ex. blue-hatched area (4 to 7)
• Requires the boundary information of the original space.

5 7 13 15
4 6 12 14
1 3 9 11
0 2 8 10
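The false positives in this example can be computed directly. The sketch below assumes the query from 2 to 9 is the rectangle whose corners are cells 2 and 9, that is x in 1..2 and y in 0..1 (read off the grid, not stated on the slide); `z_value` and `false_positives` are illustrative names.

```python
def z_value(x, y, bits=2):
    """Interleave the bits of x and y, x bit first (order inferred from the grid)."""
    z = 0
    for i in reversed(range(bits)):
        z = (z << 1) | ((x >> i) & 1)
        z = (z << 1) | ((y >> i) & 1)
    return z


def false_positives(x_range, y_range):
    """Return the scanned Z-range and the scanned cells outside the query."""
    inside = {
        z_value(x, y)
        for x in range(x_range[0], x_range[1] + 1)
        for y in range(y_range[0], y_range[1] + 1)
    }
    lo, hi = min(inside), max(inside)
    outside = sorted(z for z in range(lo, hi + 1) if z not in inside)
    return (lo, hi), outside


print(false_positives((1, 2), (0, 1)))  # ((2, 9), [4, 5, 6, 7])
```

Half of the scanned cells (4 to 7) lie outside the queried rectangle, which is the wasted work the slide calls false positive scans.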

Build a Multi-dimensional Index Layer on top of an Ordered Key-Value store

[Figure: layered architecture]
• MD-HBase: Multi-Dimensional Index
• Ordered Key-Value Store (ex. BigTable, HBase, …): Single Dimensional Index

Space Partition By the K-d tree

Binary Z-ordering space (bitwise interleaving):

      00   01   10   11
11   0101 0111 1101 1111
10   0100 0110 1100 1110
01   0001 0011 1001 1011
00   0000 0010 1000 1010

[Figure: the same space partitioned by the K-d tree]

How do we represent these subspaces?

Key Idea: The longest common prefix naming scheme

[Figure: the binary Z-ordering grid partitioned into subspaces such as 000* and 1***]

Subspaces represented as the longest common prefix of keys!

Remarkable Property
• Preserves the boundary information of the original space

Example: subspace 1***
• Left-bottom corner: * → 0 gives 1000, i.e. (10, 00)
• Right-top corner: * → 1 gives 1111, i.e. (11, 11)
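The boundary reconstruction can be written in a few lines of Python. This sketch assumes keys are bit strings interleaved as x, y, x, y, … (the order used in the grid); `deinterleave` and `prefix_bounds` are illustrative names.

```python
def deinterleave(key):
    """Split an interleaved bit string into its (x_bits, y_bits) parts."""
    return key[0::2], key[1::2]


def prefix_bounds(prefix):
    """Return the left-bottom and right-top corners of a subspace.

    Replacing every '*' with '0' gives the smallest key in the subspace,
    replacing every '*' with '1' gives the largest one.
    """
    low = prefix.replace("*", "0")
    high = prefix.replace("*", "1")
    return deinterleave(low), deinterleave(high)


print(prefix_bounds("1***"))  # (('10', '00'), ('11', '11'))
```

No extra metadata is needed: the name of the subspace alone is enough to recover its bounding box.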

Build an index with the longest common prefix of keys

[Figure: the binary Z-ordering grid partitioned into the subspaces 000*, 001*, 01**, 1***]

• Index: 000*, 001*, 01**, 1***
• Buckets: allocate per subspace

Reconstruct the boundary info. & check whether it intersects the queried area

Multi-dimensional Range Query

[Figure: the binary Z-ordering grid partitioned into the subspaces 000*, 001*, 01**, 10**, 11**]

• Index: 000*, 001*, 01**, 10**, 11**
• Scan 0010 - 1001 on the index
• Filter: subspace pruning
• Scan the buckets of the subspaces that pass the filter
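The scan-then-filter steps can be sketched in Python as follows. It assumes the query rectangle has corners 0010 and 1001, that is x in 1..2 and y in 0..1, and that keys interleave bits as x, y, x, y; the function names are hypothetical and the index is a plain list.

```python
def interleave(x, y, bits=2):
    """Return the interleaved bit string of x and y, x bit first."""
    xb, yb = format(x, f"0{bits}b"), format(y, f"0{bits}b")
    return "".join(a + b for a, b in zip(xb, yb))


def corner(key):
    """Decode an interleaved bit string back into integer (x, y)."""
    return int(key[0::2], 2), int(key[1::2], 2)


def bounds(prefix):
    """Return ((x_min, y_min), (x_max, y_max)) of a prefix-named subspace."""
    return corner(prefix.replace("*", "0")), corner(prefix.replace("*", "1"))


def range_query_subspaces(index, q_min, q_max, bits=2):
    """Scan the index by Z-range, then prune subspaces outside the query."""
    z_lo, z_hi = interleave(*q_min, bits), interleave(*q_max, bits)
    # Step 1: index entries whose key range overlaps [z_lo, z_hi].
    candidates = [
        p for p in index
        if p.replace("*", "1") >= z_lo and p.replace("*", "0") <= z_hi
    ]
    # Step 2: keep only subspaces whose bounding box intersects the query.
    kept = []
    for p in candidates:
        (x0, y0), (x1, y1) = bounds(p)
        if x0 <= q_max[0] and q_min[0] <= x1 and y0 <= q_max[1] and q_min[1] <= y1:
            kept.append(p)
    return candidates, kept


index = ["000*", "001*", "01**", "10**", "11**"]
print(range_query_subspaces(index, (1, 0), (2, 1)))
# (['001*', '01**', '10**'], ['001*', '10**'])
```

Under these assumptions 01** falls inside the scanned Z-range but is pruned, so only the buckets for 001* and 10** need to be read.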

Variations of Storage Layer

Table Share Model
• Use a single table, maintain bucket boundaries
• Most space efficient

Table per Bucket Model
• Allocate a table per bucket
• Most flexible mapping: one-to-one, one-to-many, many-to-one
• Bucket split is expensive: copy all points to the new buckets

Region per Bucket Model
• Allocate a region per bucket
• Most efficient bucket split
• Requires modification of HBase

Experimental Results: Multi-dimensional Range Query
• Dataset: 400,000,000 points
• Queries: select objects within MD ranges, varying the selectivity
• Cluster size: 16 nodes
• MD-HBase responds 10~100 times faster than the others, and its response time is proportional to selectivity.

[Figure: Response Time (sec), ticks 1 to 1000, vs. Selectivity (%), ticks 0.01 to 10, for MD-HBase, HBase(ZOrder) and MapReduce]

Experimental Results: Insert
• Dataset: spatially skewed data
• MD-HBase shows good scalability without significant overhead.

[Figure: Throughput (records/sec), 0 to 250,000, vs. Number of nodes, 0 to 20, for MD-HBase and HBase(ZOrder)]

Conclusions

• Designed a scalable multi-dimensional data store.
  • Mapping multi-dimension to single dimension
  • Key idea: indexing the longest common prefix of keys
• Demonstrated scalable insert throughput and excellent query performance.
  • Range query: 10-100 times faster than existing technologies
  • Insert: 220K inserts/sec on a 16-node cluster without overhead

CCIndex: A Complemental Clustering Index on Distributed Ordered Tables for Multi-dimensional Range Queries

Y. Zou, J. Liu, S. Wang. NPC’10


Introduction

Motivation
• Building an index in DOTs to support multi-dimensional range queries
• High performance, low space overhead, high reliability

DOT
• Distributed Ordered Table
• BigTable, HBase

Observations
• Usually 3 to 5 replicas in DOTs
• The number of indexes is usually less than 5
• Random read is significantly slower than scan

Basic idea: Complemental Clustering Index
• CCIT: converts slow random reads to fast sequential scans
• CCT: for fast data recovery
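A rough Python sketch of the CCIT idea, under the assumption that a CCIT is a copy of the table clustered by an indexed column, with the complete row stored next to each index entry. The table contents and the names `build_ccit` and `range_scan` are made up for illustration; CCTs and recovery are not modeled.

```python
from bisect import bisect_left, bisect_right

# Base table: rowkey -> row (toy data).
base = {
    "r1": {"name": "beer", "price": 3.0},
    "r2": {"name": "beer", "price": 7.0},
    "r3": {"name": "milk", "price": 2.0},
    "r4": {"name": "milk", "price": 4.5},
}


def build_ccit(table, column):
    """Build a clustered copy of the table ordered by (column value, rowkey).

    Each entry carries the full row, so a range query on `column` never
    has to go back to the base table with random reads.
    """
    entries = [(row[column], rowkey, row) for rowkey, row in table.items()]
    return sorted(entries, key=lambda entry: entry[:2])


def range_scan(ccit, low, high):
    """Return (rowkey, row) for low <= column value <= high in one scan."""
    column_values = [entry[0] for entry in ccit]
    start = bisect_left(column_values, low)
    end = bisect_right(column_values, high)
    return [(rowkey, row) for _, rowkey, row in ccit[start:end]]


price_ccit = build_ccit(base, "price")
print(range_scan(price_ccit, 2.5, 7.0))
```

The price of this layout is one full copy of the data per indexed column, which is why space overhead appears among the challenges.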

Challenges

• Performance
• Reliability
• Space overhead

Performance

• HBase 0.20.1, 16 nodes, 90 million records
• Query optimization based on the region-to-server mapping information

Reliability: Fault tolerance
• Get other index values from CCTs
• Query the CCITs to recover data
• Replicate CCTs

Space overhead

• N: the number of index columns
• X-axis: ratio of record length to length of index columns
• Y-axis: overhead ratio

Conclusions

• Proposed CCIndex to support multi-dimensional range queries in DOTs
• Not suitable for more than 5 index columns
• Write operations are slower than on the original table

Conclusions
• Index for non-rowkey in cloud data management systems
• Solutions
  • Local index + global index
  • Linearization
  • Secondary index
• Key issues
  • Index reliability
  • Query result correctness
  • Index maintenance
  • …

Future work
• Study the architecture of HDFS and HBase in detail
• Test the existing index solutions in Cloud
• Index framework and index structure

References

M. K. Aguilera, W. Golab, and M. A. Shah. A practical scalable distributed B-tree. PVLDB, 1(1):598–609, 2008.
Y. Zou, J. Liu, and S. Wang. CCIndex: A Complemental Clustering Index on Distributed Ordered Tables for Multi-dimensional Range Queries. NPC, 2010.
S. Wu and K.-L. Wu. An indexing framework for efficient retrieval on the cloud. IEEE Data Eng. Bull., vol. 32, pp. 75–82, 2009.
J. Wang, S. Wu, H. Gao, J. Li, and B. C. Ooi. Indexing multi-dimensional data in a cloud system. In SIGMOD, 2010.
S. Wu, D. Jiang, B. C. Ooi, and K.-L. Wu. Efficient B-tree based indexing for cloud data processing. PVLDB, 3(1):1207–1218, 2010.
X. Zhang, J. Ai, Z. Wang, J. Lu, and X. Meng. An efficient multidimensional index for cloud data management. In CloudDB, 2009, pp. 17–24.
S. Nishimura and S. Das. MD-HBase: A Scalable Multi-dimensional Data Infrastructure for Location Aware Services. MDM, 2011.

Thank you
