Index for Cloud Data Management Lab of Web And Mobile Data Management WAMDM Youzhong MA

Page 1: Lab of  Web And Mobile Data Management ( WAMDM ) Youzhong  MA

Index for Cloud Data Management

Lab of Web And Mobile Data Management (WAMDM)

Youzhong MA

Page 2

Outline

• Motivating Applications
• Existing Technologies
• Conclusions & Future work

Page 3

Motivating Application

Big Data in a Private Cloud (Cloud System)

Table: Product

rowkey  name  price  number
1       beer  3.00$  1000
2       beer  7.00$  2500
3       milk  2.00$  1300
4       milk  4.5$   2100

select sum(number) from Product where product.name = ‘beer’ and product.price <= 10$ and product.price >= 5$

Queries on multiple attributes and on non-rowkey columns are quite common!
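
To see why such queries are awkward for a rowkey-ordered store, a minimal sketch follows. It models the Product table as a Python dict keyed by rowkey (a stand-in chosen for illustration, not any real store's API), and the function name `sum_number` is hypothetical. Because name and price are not part of the rowkey, the query can only be answered by visiting every row.

```python
# Hypothetical in-memory stand-in for a rowkey-ordered table (illustration only).
product = {
    1: {"name": "beer", "price": 3.00, "number": 1000},
    2: {"name": "beer", "price": 7.00, "number": 2500},
    3: {"name": "milk", "price": 2.00, "number": 1300},
    4: {"name": "milk", "price": 4.50, "number": 2100},
}

def sum_number(table, name, low, high):
    """Full scan: without an index on name or price, every row is examined."""
    rows_examined = 0
    total = 0
    for rowkey in sorted(table):          # rows are only ordered by rowkey
        row = table[rowkey]
        rows_examined += 1
        if row["name"] == name and low <= row["price"] <= high:
            total += row["number"]
    return total, rows_examined

total, examined = sum_number(product, "beer", 5, 10)
print(total, examined)
```

Only one row matches, yet all four rows are read; an index on the non-rowkey attributes is what removes that cost.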

Page 4

Motivating Application: Mobile Coupon Distribution

[Figure: Mobile Coupon Distributer, users' current locations, coupon]

Distribution Policy
• Area
• # of coupons

Page 5

Motivating Application: Mobile Coupon Distribution

[Figure: many users' current locations, coupons, and the distribution policy (area, # of coupons)]

125,000,000 subscribers in Japan

• Large amounts of data
• High throughput
• System scalability
• Multi-dimensional query
• Nearest neighbors query
• Efficient complex queries

Page 6

Outline

• Motivating Applications
• Existing Technologies
• Conclusions & Future work

Page 7

Existing Technologies

[Figure: chart with axes "Multi-dimensional Queries" and "Scalability"]

• Relational DBs, Spatial DBs
• Commercial products, but expensive
• Open source products
• Key-Value Stores
• What We Want, at a reasonable price

Page 8

Solutions-overview

Columns: Rowkey, Non-rowkey

Single Dimensional Index
• [BigTable, HBase]
• [Point Query, Range Query]
• [Aguilera PVLDB’08] [S. Wu Data Eng’09] [S. Wu PVLDB’10]

Multiple Dimensional Index
• [X. Zhang CloudDB’09] [J. Wang SIGMOD’10] [G. Chen VLDB’11] [Y. Zou NPC’10] [Shoji Nishimura MDM’11]

Local Index + Global Index

NECCAS

Page 9

Efficient B-tree Based Indexing for Cloud Data Processing

S. Wu, D. Jiang, B. C. Ooi, and K.-L. Wu. PVLDB'10

Page 10

Efficient B-tree Based Indexing for Cloud Data Processing

Motivation
• Design a scalable, high-throughput indexing scheme that supports efficient queries over huge volumes of data in the cloud
• Keep maintenance cost low while also supporting parallel search

Page 11

System Architecture

① Local index
② BATON overlay network
③ Publish

Page 12

Challenges
• How to select the local B+-tree nodes to publish in the global index?
• How to organize the global index?
• How to maximize the throughput?

Page 13

Selecting local B+-tree nodes: cost modeling

Query cost
1. Routing cost (search in the global index): $\frac{1}{2}\,\alpha \log_2 N$
2. Local search cost (search in the local index): $\beta\, h(n)$

Update cost: $\frac{1}{2}\,\alpha\, g(n) \log_2 N$

$\alpha$: cost of sending an index message
$\beta$: cost of random I/O
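
As a rough worked example of the cost model, the sketch below reads the slide's formulas as routing cost = ½·α·log2 N, local search cost = β·h(n), and update cost = ½·α·g(n)·log2 N. The symbol names `alpha` and `beta`, the reading of N as the number of overlay nodes, of h(n) as a height, and of g(n) as an update count, and all numeric values are assumptions made for illustration.

```python
import math

def query_cost(alpha, beta, num_nodes, height):
    """Routing through the overlay plus the search inside the local index."""
    routing = 0.5 * alpha * math.log2(num_nodes)   # assumed: 1/2 * alpha * log2 N
    local_search = beta * height                   # assumed: beta * h(n)
    return routing + local_search

def update_cost(alpha, num_nodes, updates):
    """Cost of propagating index updates through the overlay."""
    return 0.5 * alpha * updates * math.log2(num_nodes)  # assumed: 1/2 * alpha * g(n) * log2 N

# Hypothetical values: 16 overlay nodes, message cost 1, random I/O cost 10.
print(query_cost(alpha=1.0, beta=10.0, num_nodes=16, height=2))
print(update_cost(alpha=1.0, num_nodes=16, updates=3))
```

Under this reading, publishing a node with a smaller h(n) cuts local search cost, while a node with a larger g(n) costs more to keep up to date, which is the trade-off the node selection has to balance.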

Page 14

Adaptive indexing strategy

• Index expand
• Index collapse

[Figure: local index]

Page 15

BATON: Balanced Tree Overlay Network

• A distributed tree structure for P2P systems
• Supports range search

Page 16

Index Construction
• Assign a range to each node
• For each node n:
  The range of its left sub-tree is less than that of n
  The range of its right sub-tree is larger than that of n
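
The range invariant above is the same ordering a binary search tree maintains, and it is what lets a range search skip whole subtrees. A minimal single-process sketch, assuming a hypothetical `Node` class with half-open key ranges; it leaves out BATON's routing tables and network hops and shows only how the invariant prunes the search:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Node:
    low: int                       # inclusive lower bound of this node's range
    high: int                      # exclusive upper bound of this node's range
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def range_search(node: Optional[Node], q_low: int, q_high: int,
                 out: List[Tuple[int, int]]) -> None:
    """Collect, in key order, the ranges of nodes overlapping [q_low, q_high)."""
    if node is None:
        return
    if q_low < node.low:           # query reaches below this node: visit left subtree
        range_search(node.left, q_low, q_high, out)
    if q_low < node.high and node.low < q_high:
        out.append((node.low, node.high))
    if q_high > node.high:         # query reaches above this node: visit right subtree
        range_search(node.right, q_low, q_high, out)

# Hypothetical tree whose in-order traversal yields [0,10), [10,20), ..., [60,70).
root = Node(30, 40,
            left=Node(10, 20, left=Node(0, 10), right=Node(20, 30)),
            right=Node(50, 60, left=Node(40, 50), right=Node(60, 70)))

hits: List[Tuple[int, int]] = []
range_search(root, 25, 45, hits)
print(hits)
```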

Page 17

Publish local B+-tree node to BATON

Page 18

Maximizing the throughput
• Eventually consistent model
• Lazy update: if an update does not affect the key range of a local B+-tree, the stale index does not affect the correctness of query processing.
• Eager update: applied to updates in the left-most and right-most nodes
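
One way to picture the lazy/eager split is a check on whether an insert changes the key range that a local index has published. The sketch below rests on assumptions: `LocalIndex`, `published_range`, and the returned strings are hypothetical names, a sorted list stands in for the B+-tree, and "left-most and right-most nodes" is approximated by the minimum and maximum key.

```python
import bisect

class LocalIndex:
    def __init__(self, keys):
        self.keys = sorted(keys)
        # Key range currently advertised in the global index.
        self.published_range = (self.keys[0], self.keys[-1])

    def insert(self, key):
        """Insert a key and report which update policy the change falls under."""
        bisect.insort(self.keys, key)
        new_range = (self.keys[0], self.keys[-1])
        if new_range != self.published_range:
            # The boundary moved: a stale global entry could misroute queries,
            # so the new range is published immediately.
            self.published_range = new_range
            return "eager"
        # The range is unchanged: the stale global entry still routes correctly,
        # so propagation can be deferred.
        return "lazy"

idx = LocalIndex([10, 20, 30])
print(idx.insert(15))   # inside the published range
print(idx.insert(5))    # extends the range on the left
```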

Page 19

Pros and cons

Pros
• Supports efficient point queries and range queries on non-rowkey attributes
• Proposes an adaptive indexing strategy based on the cost model of overlay routing

Cons
• Cannot support multi-dimensional queries

Page 20

Multi-dimensional index

[X.Zhang CloudDB’09]

Page 21

Multi-dimensional index

[J.Wang SIGMOD’10]

[G.Chen VLDB’11]

Page 22

MD-HBase: A Scalable Multi-dimensional Data Infrastructure for Location Aware Services

Shoji Nishimura, Sudipto Das. MDM'11

Page 23

Contributions
• Using linearization to implement a scalable multi-dimensional index structure layered over a range-partitioned key-value store
• Implementing a K-d tree and a Quad tree with this design

Page 24

Ordered Key-Value Stores

[Figure: an index (key00, key11, …, keynn) over buckets of key-value pairs sorted by key (key00 … key0X, key11 … key1Y, …, keynn)]

• Good at 1-D range queries
• But our target is multi-dimensional (longitude, latitude, time)…

Page 25

Naïve Solution: Linearization

[Figure: ordered key-value store with index and sorted buckets]

• Projects n-D space to 1-D space
• Simple, but problematic…
• Apply a Z-ordering curve…

 5  7 13 15
 4  6 12 14
 1  3  9 11
 0  2  8 10
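
The numbering in the grid can be reproduced by interleaving the bits of the two coordinates. In this minimal sketch the helper name `z_value` is hypothetical, and the bit order (x bit first, then y bit, most significant bits first) is inferred from the grid rather than stated on the slide.

```python
def z_value(x: int, y: int, bits: int = 2) -> int:
    """Interleave the bits of x and y: x bit first, most significant bit first."""
    z = 0
    for i in reversed(range(bits)):
        z = (z << 1) | ((x >> i) & 1)
        z = (z << 1) | ((y >> i) & 1)
    return z

# Rows are printed from the top (y = 3) down to the bottom (y = 0), as on the slide.
grid = [[z_value(x, y) for x in range(4)] for y in reversed(range(4))]
for row in grid:
    print(" ".join(f"{z:2d}" for z in row))
```

Neighbouring cells such as 3 and 9 end up far apart on the curve, which is the source of the problem the next slide describes.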

Page 26

Problem: False positive scans

MD-query on linearized space
• Translate an MD-query to a linearized range query.
  Ex. query from 2 to 9.
• Scan the queried linearized range.
• Filter out points outside the queried area.
  Ex. blue-hatched area (4 to 7).
• This requires the boundary information of the original space.

 5  7 13 15
 4  6 12 14
 1  3  9 11
 0  2  8 10
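
The false positives in the "2 to 9" example can be counted directly. This sketch assumes the query rectangle is x in [1, 2], y in [0, 1] (the cells numbered 2, 3, 8, and 9 in the grid) and the same bit order as the grid; `z_decode` is a hypothetical helper name.

```python
def z_decode(z: int, bits: int = 2):
    """Split an interleaved key back into (x, y); x bits sit in the higher slot of each pair."""
    x = y = 0
    for i in reversed(range(bits)):
        x = (x << 1) | ((z >> (2 * i + 1)) & 1)
        y = (y << 1) | ((z >> (2 * i)) & 1)
    return x, y

# Assumed query rectangle whose corner cells map to keys 2 and 9.
X_MIN, X_MAX, Y_MIN, Y_MAX = 1, 2, 0, 1
Z_LOW, Z_HIGH = 2, 9

scanned = list(range(Z_LOW, Z_HIGH + 1))      # every key the linear scan touches
hits, false_positives = [], []
for z in scanned:
    x, y = z_decode(z)
    inside = X_MIN <= x <= X_MAX and Y_MIN <= y <= Y_MAX
    (hits if inside else false_positives).append(z)

print(hits)
print(false_positives)
```

Half of the scanned keys fall outside the rectangle, and recognising them requires decoding each key back to its coordinates.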

Page 27

MD-HBase

Build a Multi-dimensional Index Layer on top of an Ordered Key-Value store

[Figure: layer stack]
• Multi-Dimensional Index (MD-HBase)
• Single Dimensional Index
• Ordered Key-Value Store (ex. BigTable, HBase, …)

Page 28

Space Partition By the K-d tree

Binary Z-ordering space (keys formed by bitwise interleaving of the column and row labels):

11 | 0101 0111 1101 1111
10 | 0100 0110 1100 1110
01 | 0001 0011 1001 1011
00 | 0000 0010 1000 1010
     00   01   10   11

[Figure: the same space partitioned by the K-d tree]

How do we represent these subspaces?

Page 29

Key Idea: The longest common prefix naming scheme

11 | 0101 0111 1101 1111
10 | 0100 0110 1100 1110
01 | 0001 0011 1001 1011
00 | 0000 0010 1000 1010
     00   01   10   11

Subspaces are represented as the longest common prefix of keys, e.g. 000* and 1***.

Remarkable property
• Preserves boundary information of the original space

Example: subspace 1***
• Left-bottom corner: * → 0 gives 1000, i.e. (10, 00)
• Right-top corner: * → 1 gives 1111, i.e. (11, 11)
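
The corner reconstruction shown for 1*** is mechanical: fill the wildcards with 0s and 1s, then de-interleave each key. A minimal sketch, where `prefix_bounds` is a hypothetical helper name and keys are handled as bit strings for readability:

```python
def prefix_bounds(prefix: str):
    """Return ((x_min, y_min), (x_max, y_max)) for a subspace name such as '1***'."""
    low = prefix.replace("*", "0")    # left-bottom corner key
    high = prefix.replace("*", "1")   # right-top corner key

    def split(key: str):
        # Even positions hold the x bits, odd positions the y bits.
        return int(key[0::2], 2), int(key[1::2], 2)

    return split(low), split(high)

print(prefix_bounds("1***"))
print(prefix_bounds("000*"))
```

For 1*** this gives corners (2, 0) and (3, 3), which are the slide's (10, 00) and (11, 11) written in decimal.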

Page 30

Build an index with the longest common prefix of keys

11 | 0101 0111 1101 1111
10 | 0100 0110 1100 1110
01 | 0001 0011 1001 1011
00 | 0000 0010 1000 1010
     00   01   10   11

Subspaces: 000*, 001*, 01**, 1***

Index: 000*, 001*, 01**, 1***

Buckets: allocate per subspace

Page 31

Multi-dimensional Range Query

11 | 0101 0111 1101 1111
10 | 0100 0110 1100 1110
01 | 0001 0011 1001 1011
00 | 0000 0010 1000 1010
     00   01   10   11

Index: 000*, 001*, 01**, 10**, 11**

1. Scan 0010 - 1001 on the index
2. Filter (subspace pruning): reconstruct the boundary info and check whether each subspace intersects the queried area
3. Scan the remaining buckets
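
The index scan and the pruning step can be sketched end to end for the five subspaces on this slide. The query rectangle (x in [1, 2], y in [0, 1], i.e. keys 0010 to 1001) is an assumption chosen to match the scanned key range, and the function names are hypothetical.

```python
INDEX = ["000*", "001*", "01**", "10**", "11**"]   # subspace names, sorted

def key_range(prefix):
    """Smallest and largest key covered by a subspace name."""
    return prefix.replace("*", "0"), prefix.replace("*", "1")

def box(prefix):
    """Bounding box (x_min, y_min, x_max, y_max) rebuilt from the name alone."""
    low, high = key_range(prefix)
    return (int(low[0::2], 2), int(low[1::2], 2),
            int(high[0::2], 2), int(high[1::2], 2))

def index_scan(index, z_low, z_high):
    """Subspaces whose key range overlaps [z_low, z_high] (equal-length bit strings)."""
    return [p for p in index
            if key_range(p)[0] <= z_high and key_range(p)[1] >= z_low]

def prune(prefixes, qx_min, qy_min, qx_max, qy_max):
    """Keep only subspaces whose bounding box intersects the query rectangle."""
    kept = []
    for p in prefixes:
        x_min, y_min, x_max, y_max = box(p)
        if x_min <= qx_max and qx_min <= x_max and y_min <= qy_max and qy_min <= y_max:
            kept.append(p)
    return kept

candidates = index_scan(INDEX, "0010", "1001")
to_scan = prune(candidates, 1, 0, 2, 1)
print(candidates)
print(to_scan)
```

Under these assumptions the subspace 01** lies inside the scanned key range but outside the rectangle, so its bucket is never read.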

Page 32

Variations of Storage Layer

Table Share Model
• Use a single table and maintain bucket boundaries
• Most space efficient

Table per Bucket Model
• Allocate a table per bucket
• Most flexible mapping: one-to-one, one-to-many, many-to-one
• Bucket split is expensive: all points are copied to the new buckets

Region per Bucket Model
• Allocate a region per bucket
• Most efficient bucket split
• Requires modification of HBase

Page 33

Experimental Results: Multi-dimensional Range Query
• Dataset: 400,000,000 points
• Queries: select objects within MD ranges, varying the selectivity
• Cluster size: 16 nodes
• MD-HBase responds 10~100 times faster than the others, and its response time is proportional to selectivity.

[Figure: response time (sec, axis ticks 1 to 1000) versus selectivity (%, 0.01 to 10) for MD-HBase, HBase (ZOrder), and MapReduce]

Page 34

Experimental Results: Insert
• Dataset: spatially skewed data
• MD-HBase shows good scalability without significant overhead.

[Figure: throughput (records/sec, 0 to 250,000) versus number of nodes (0 to 20) for MD-HBase and HBase (ZOrder)]

Page 35

Conclusions

• Designed a scalable multi-dimensional data store.
  Mapping multi-dimension to single dimension
  Key idea: indexing the longest common prefix of keys
• Demonstrated scalable insert throughput and excellent query performance.
  Range query: 10-100 times faster than existing technologies.
  Insert: 220K inserts/sec on a 16-node cluster without overhead

Page 36

CCIndex: A Complemental Clustering Index on Distributed Ordered Tables for Multi-dimensional Range Queries

Y. Zou, J. Liu, S. Wang. NPC’10

Page 37

Introduction

Motivation
• Building indexes in DOTs to support multi-dimensional range queries
• High performance, low space overhead, high reliability

DOT
• Distributed Ordered Table
• BigTable, HBase

Observations
• Usually 3 to 5 replicas in DOTs
• The number of indexes is usually less than 5
• Random read is significantly slower than scan

Page 38

Basic idea: Complemental Clustering Index

• CCIT: converts slow random reads to fast sequential scans
• CCT: for fast data recovery
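
The contrast between a conventional secondary index and a clustering index table can be sketched as below. This is an illustration under assumptions, not CCIndex's actual schema: sorted Python lists stand in for ordered tables, and `base`, `secondary`, `ccit`, and the sample rows are hypothetical. The secondary index stores only rowkeys, so every hit costs a random read into the base table; the clustering table stores the full record under the index value, so a range query becomes one sequential scan.

```python
from bisect import bisect_left, bisect_right

# Base table keyed by rowkey: rowkey -> (name, price).
base = {1: ("beer", 3.0), 2: ("beer", 7.0), 3: ("milk", 2.0), 4: ("milk", 4.5)}

# Secondary index: (price, rowkey) sorted by price; the records live in `base`.
secondary = sorted((price, rk) for rk, (_, price) in base.items())
# Clustering index table: full records sorted by (price, rowkey).
ccit = sorted((price, rk, name) for rk, (name, price) in base.items())

def query_secondary(low, high):
    """Range query on price: index scan, then one random read per hit."""
    start = bisect_left(secondary, (low,))
    end = bisect_right(secondary, (high, float("inf")))
    rows, random_reads = [], 0
    for price, rk in secondary[start:end]:
        name, _ = base[rk]                 # random read into the base table
        rows.append((rk, name, price))
        random_reads += 1
    return rows, random_reads

def query_ccit(low, high):
    """Range query on price: a single contiguous slice, no lookups elsewhere."""
    start = bisect_left(ccit, (low,))
    end = bisect_right(ccit, (high, float("inf")))
    return [(rk, name, price) for price, rk, name in ccit[start:end]]

rows_a, reads = query_secondary(2.5, 5.0)
rows_b = query_ccit(2.5, 5.0)
print(rows_a, reads)
print(rows_b)
```

The price of the scan-only path is that each record is stored once per indexed column, which is why space overhead appears among the challenges.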

Page 39

Challenges

• Performance
• Reliability
• Space overhead

Page 40

Performance

• HBase 0.20.1
• 16 nodes
• 90 million records
• Query optimization based on the region-to-server mapping information

Page 41

Reliability: Fault tolerance
• Get other index values from CCTs
• Query the CCITs to recover data
• Replicate CCTs

Page 42

Space overhead

N: the number of index columns

[Figure: overhead ratio (Y-axis) versus the ratio of record length to index column length (X-axis)]

Page 43

Conclusions

• Proposed CCIndex to support multi-dimensional range queries in DOTs
• Not suitable for more than 5 index columns
• Write operations are slower than on the original table

Page 44

Outline

• Motivating Applications
• Existing Technologies
• Conclusions & Future work

Page 45

Conclusions
• Index for non-rowkey attributes in cloud data management systems

Solutions
• Local index + global index
• Linearization
• Secondary index

Key issues
• Index reliability
• Query result correctness
• Index maintenance
• …

Page 46

Future work
• Study the architecture of HDFS and HBase in detail
• Test the existing index solutions in the cloud
• Index framework and index structure

Page 47

References

M. K. Aguilera, W. Golab, and M. A. Shah. A practical scalable distributed B-tree. PVLDB, 1(1):598–609, 2008.
Y. Zou, J. Liu, S. Wang. CCIndex: A Complemental Clustering Index on Distributed Ordered Tables for Multi-dimensional Range Queries. NPC’10.
S. Wu and K.-L. Wu. An indexing framework for efficient retrieval on the cloud. IEEE Data Eng. Bull., vol. 32, pp. 75–82, 2009.
J. Wang, S. Wu, H. Gao, J. Li, and B. C. Ooi. Indexing multi-dimensional data in a cloud system. In SIGMOD, 2010.
S. Wu, D. Jiang, B. C. Ooi, and K.-L. Wu. Efficient B-tree based indexing for cloud data processing. PVLDB, 3(1):1207–1218, 2010.
X. Zhang, J. Ai, Z. Wang, J. Lu, and X. Meng. An efficient multidimensional index for cloud data management. In CloudDB, 2009, pp. 17–24.
Shoji Nishimura, Sudipto Das. MD-HBase: A Scalable Multi-dimensional Data Infrastructure for Location Aware Services. MDM 2011.

Page 48

Thank you