20
Query Evaluation Techniques for Cluster Database Systems Andrey V. Lepikhov, Leonid B. Sokolinsky South Ural State University Russia 22 September 2010 1 ADBIS 2010

Query Evaluation Techniques for Cluster Database Systems Andrey V. Lepikhov, Leonid B. Sokolinsky South Ural State University Russia 22 September 2010

  • View
    218

  • Download
    0

Embed Size (px)

Citation preview

Query Evaluation Techniques for Cluster Database Systems

Andrey V. Lepikhov, Leonid B. Sokolinsky

South Ural State University

Russia

22 September 2010

1

ADBIS 2010

ADBIS 20102

Outline

22 September 2010

MotivationProblem StatementBackgroundPartial mirroring methodResultsFuture work

Motivation

22 September 20103

Top500• Cluster: 84.8%• MPP: 14.8%• Others: 0.4%

ADBIS 2010

Problem Statement

22 September 20104

Not expensive parallel hardware needs not expensive parallel database management system

Today we have no such chip parallel database management system

ADBIS 2010

Background

22 September 20105 ADBIS 2010

Exchange operator

22 September 2010ADBIS 20106

pE p: port ψ: distributing function

Parallel plan for query Q = RS

22 September 2010ADBIS 20107

Query processing in cluster system

22 September 20108 ADBIS 2010

The problem

22 September 2010ADBIS 20109

Load balancing

Partial mirroring method

Fragmentation strategy Replication strategy

10

Fragmentation strategy

22 September 2010ADBIS 201011

FR

AG

ME

NT

AT

IO

N

Source relation P0

P1

Pn

• Relation is divided into fragments distributed among cluster nodes

• Each fragment is divided into sequence of segments with an equal length

• Segment is the minimal unit of replication

. . .

. . .. . .

. . .

Replication strategy

22 September 2010ADBIS 201012

Disk Di Disk Dj

ρj = 50%

Ri

jiR

Load balancing method

22 September 2010ADBIS 201013

D1 D2

t1

D1 D2

t2

D1 D2

- scheduled to process

D1 D2

t4

- processed

t3

P1

P2P1 P2

P1

P1

P2

P2

- not used

12S

1S 2S

21S

12S

1S 2S

21S

12S

1S 2S

21S

12S

1S 2S

21S

Parallel agent with two input streams

22 September 2010ADBIS 201014

Load balancing algorithm

22 September 2010ADBIS 201015

Parameters of experiments

22 September 2010ADBIS 201016

Speedup versus replication factor

22 September 2010ADBIS 201017

Speedup versus skew factor θ

22 September 2010ADBIS 201018

• 0.68 corresponds to the ”80-20” rule (80 percents of tuples of the relation will be stored in 20 percents of fragments)• 0 corresponds to the uniform distribution

Future Work

22 September 2010ADBIS 201019

To incorporate the proposed technique of parallel query execution into open source PostgreSQL DBMS.

To extend this approach on GRID DBMS for clusters with multicor processors.

22 September 2010ADBIS 201020

Thank you