Efficient Top-k Searching According to User Preferences Based on Fuzzy Functions with Usage of Tree-Oriented Data Structures. Matúš Ondreička, supervised by Prof. Jaroslav Pokorný, Faculty of Mathematics and Physics, Department of Software Engineering, Charles University in Prague, Czech Republic



Page 2: Research outline

Matúš Ondreička, VLDB 2011 PhD Workshop

  • introduction
      • top-k problem, user preferences, fuzzy functions
  • related work
  • technical solutions
      • tree-oriented data structures
          • set of B+-trees
          • multidimensional B+-tree
          • multidimensional B+-tree with lists
      • MD-algorithm, MXT-algorithm
  • experiments, current results
  • motivation of future research

Page 3: Top-k problem

  • top-k searching
      • finding the (few) best k objects described by several attributes
      • i.e. the k objects with the highest rating
      • according to user preferences based on fuzzy functions
  • efficient top-k searching
      • without accessing all the objects
      • with full support for the model of user preferences
          • local preferences
          • global preferences

Page 4: Model of user preferences

  • local preferences
      • objects are preferred according to one attribute
      • if the attribute's domain is continuous, the preference is modeled with a fuzzy function $f^U(x)\colon x^A \to [0, 1]$
      • if the attribute's domain is discrete, each value is rated individually, e.g. ACER := 0.6, APPLE := 1.0, DELL := 0.9, SONY := 0.8
  • global preferences
      • objects are preferred according to several attributes
      • modeled with an aggregation function $@^U(x)\colon (f_1^U(x), \dots, f_m^U(x)) \to [0, 1]$
      • e.g. weighted average:

$$@^U(x) = \frac{w_1 \cdot f_1^U(x) + \dots + w_m \cdot f_m^U(x)}{w_1 + \dots + w_m}$$

[Figure: example fuzzy function $f^U(x)$ over an attribute $x^A$ ranging from 0 € to 1000 €, with ratings between 0 (0%) and 1 (100%).]
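The model of user preferences on this slide can be illustrated with a minimal sketch. The brand ratings are the ones listed above; the linear price function, the attribute names `price` and `brand`, and the weights are assumptions made only for this example, since the shape of the plotted fuzzy function is not recoverable from the slide.

```python
from typing import Dict, Sequence

# Discrete domain: every value is rated individually (ratings from the slide).
BRAND_PREFERENCE: Dict[str, float] = {
    "ACER": 0.6, "APPLE": 1.0, "DELL": 0.9, "SONY": 0.8,
}


def price_preference(price_eur: float) -> float:
    """Hypothetical local preference: cheaper is better, linear on 0..1000 EUR."""
    clamped = min(max(price_eur, 0.0), 1000.0)
    return 1.0 - clamped / 1000.0


def weighted_average(ratings: Sequence[float], weights: Sequence[float]) -> float:
    """Global preference: weighted average of the local ratings."""
    if len(ratings) != len(weights) or not weights:
        raise ValueError("ratings and weights must be non-empty and of equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * r for w, r in zip(weights, ratings)) / total_weight


def rate(obj: dict, weights: Sequence[float] = (2.0, 1.0)) -> float:
    """Rating of one object: local ratings first, then the aggregation."""
    local = [price_preference(obj["price"]),
             BRAND_PREFERENCE.get(obj["brand"], 0.0)]  # unknown brand rated 0
    return weighted_average(local, weights)


example_rating = rate({"price": 500.0, "brand": "DELL"})
```

Because every local rating lies in [0, 1] and the weights are positive, the weighted average also stays in [0, 1] and is monotone in each local rating, which is the property the top-k algorithms rely on.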

Page 5: Motivation and related work

  • XML, multimedia, the Web, etc.
  • relational databases
      • Ilyas, Beskales, Soliman: A survey of top-k query processing techniques in relational database systems. 2008.
      • ranking functions
      • query optimization
  • Fagin's algorithms
      • Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. Journal of Computer and System Sciences 66, 2003.
      • support only monotone ranking functions
      • are based on sorted lists
      • do not support local user preferences
  • basic motivation for our research

Page 6: Usage of B+-tree

  • local user preference
      • given by a fuzzy function that is monotone on each interval
  • moving in the leaf level
      • "ways" in the leaf level
      • moving continuously along all "ways"
      • comparing objects on different "ways"
      • choosing the biggest over all the "ways"
  • obtaining objects
      • during the computation of the algorithm
      • with ratings in descending order
      • by fuzzy function $f^U$

[Figure: B+-tree over attribute values from 0.0 to 1.0 with objects stored in its leaves, the fuzzy function $f^U$ (ratings from 0 to 1) plotted over the leaf level, and the "ways" w1, ..., w5 along the leaf level.]
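The movement along "ways" can be sketched as follows, under several assumptions: the leaf level is modeled as a plain sorted Python list instead of linked B+-tree leaves, the points where the fuzzy function changes direction (`breakpoints`) are known in advance, and the names `walk` and `descending_by_preference` are hypothetical. Each way walks one monotone interval from its better end, and a merge step always takes the biggest rating over all ways.

```python
import bisect
import heapq
from typing import Callable, Iterator, Sequence, Tuple

Leaf = Tuple[float, str]  # (attribute value, object id), sorted by value


def walk(leaves: Sequence[Leaf], f: Callable[[float], float],
         start: int, stop: int, step: int) -> Iterator[Tuple[float, str]]:
    """One "way": move along the leaf level and yield (rating, object)."""
    for i in range(start, stop, step):
        value, obj = leaves[i]
        yield f(value), obj


def descending_by_preference(leaves: Sequence[Leaf],
                             f: Callable[[float], float],
                             breakpoints: Sequence[float]):
    """Yield (rating, object) in descending order of rating.

    Assumes f is monotone between consecutive breakpoints.
    """
    values = [value for value, _ in leaves]
    bounds = [float("-inf"), *breakpoints, float("inf")]
    all_ways = []
    for lo, hi in zip(bounds, bounds[1:]):
        first = bisect.bisect_left(values, lo)
        end = bisect.bisect_left(values, hi)
        if first == end:
            continue  # no leaves on this interval
        if f(values[first]) >= f(values[end - 1]):
            all_ways.append(walk(leaves, f, first, end, 1))            # left to right
        else:
            all_ways.append(walk(leaves, f, end - 1, first - 1, -1))   # right to left
    # Each way is already descending, so merging picks the biggest over all ways.
    return heapq.merge(*all_ways, key=lambda pair: pair[0], reverse=True)


def triangle(x: float) -> float:
    """Hypothetical fuzzy function with its peak at 0.5."""
    return 1.0 - 2.0 * abs(x - 0.5)


leaves = [(0.0, "a"), (0.2, "b"), (0.4, "c"), (0.5, "d"), (0.7, "e"), (0.9, "f")]
order = [obj for _, obj in descending_by_preference(leaves, triangle, [0.5])]
```

The merge is lazy, so an algorithm that needs only the first few objects never touches the rest of the leaf level.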

Page 7: Fagin's algorithms

  • TA (threshold algorithm) and NRA (no random access)
      • search for the best k objects according to a monotone aggregation function @
      • without accessing all objects
  • preconditions
      • a set of objects X with values of m attributes A1, ..., Am
      • objects from the set X are stored in m lists L1, ..., Lm
      • lists contain pairs $(x, a^x)$
      • lists are sorted in descending order
      • monotone aggregation function @
  • multi-user solution
      • lists are based on B+-trees
      • the algorithm can get pairs $(x, f^U(x))$ from a B+-tree sequentially, in descending order according to the user's fuzzy function $f^U(x)$

Example lists, each obtained from the B+-tree of one attribute (A1, A2, A3):

    L1: (x1, 1.0) (x2, 0.8) (x3, 0.6) (x4, 0.4) (x5, 0.2) (x6, 0.0)
    L2: (x3, 1.0) (x4, 0.8) (x6, 0.6) (x1, 0.4) (x2, 0.2) (x5, 0.0)
    L3: (x1, 1.0) (x4, 0.8) (x3, 0.6) (x5, 0.4) (x2, 0.2) (x6, 0.0)
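A minimal sketch of the threshold algorithm over the three example lists follows. It assumes that every object occurs in every list, that the lists already hold ratings in descending order, and that the aggregation function is a plain sum (the slide does not fix one); random access is simulated with dictionaries, and the function name is hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

SortedList = List[Tuple[str, float]]  # pairs (object, rating), descending


def threshold_algorithm(lists: Sequence[SortedList],
                        aggregate: Callable[[List[float]], float],
                        k: int):
    """Return (top-k pairs, number of objects rated) for a monotone aggregate."""
    if k <= 0:
        raise ValueError("k must be positive")
    lookup: List[Dict[str, float]] = [dict(lst) for lst in lists]  # random access
    ratings: Dict[str, float] = {}
    top: List[Tuple[str, float]] = []
    for row in zip(*lists):  # sorted access: one step down in every list
        for obj, _ in row:
            if obj not in ratings:
                ratings[obj] = aggregate([table[obj] for table in lookup])
        # No unseen object can be rated higher than the threshold.
        threshold = aggregate([value for _, value in row])
        top = heapq.nlargest(k, ratings.items(), key=lambda item: item[1])
        if len(top) == k and top[-1][1] >= threshold:
            break
    return top, len(ratings)


L1 = [("x1", 1.0), ("x2", 0.8), ("x3", 0.6), ("x4", 0.4), ("x5", 0.2), ("x6", 0.0)]
L2 = [("x3", 1.0), ("x4", 0.8), ("x6", 0.6), ("x1", 0.4), ("x2", 0.2), ("x5", 0.0)]
L3 = [("x1", 1.0), ("x4", 0.8), ("x3", 0.6), ("x5", 0.4), ("x2", 0.2), ("x6", 0.0)]

best, rated = threshold_algorithm([L1, L2, L3], sum, k=2)
```

With these lists and a sum, the search stops after the third row: the two best objects are x1 and x3, and x5 is never rated.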

Page 8: Multidimensional B-tree

  • an MDB-tree allows indexing a set of objects by m > 1 attributes in one data structure
  • it has m levels; the values of one attribute are stored in each level
  • nodes are B+-trees, whose leaf nodes are linked in two directions

[Figure: example MDB-tree with three levels for the attributes A1, A2 and A3, indexing the objects A to K by their attribute values.]

Page 9: MD-algorithm

  • searches for the best k objects in a multidimensional B-tree (MDB-tree) without getting all the objects
  • principle of the MD-algorithm
      • it searches the MDB-tree with a recursive procedure
      • it uses a temporary list TK of the current best k objects
      • analogously to Fagin's TA-algorithm, it uses the best rating B(S) of a B+-tree S
      • it requires a monotone aggregation function @
  • definition
      • for a B+-tree S in the i-th level of the MDB-tree, $B(S) = @(k_1, \dots, k_{i-1}, 1, \dots, 1)$
  • example: $@(x^{A_1}, x^{A_2}) = x^{A_1} + x^{A_2}$
      • B(S) = 1 + 1 = 2.0
      • B(S) = 0.8 + 1 = 1.8
      • B(S) = 0.8 + 0.7 = 1.5

[Figure: two-level MDB-tree with the objects A to H, annotated with the values of B(S) from the example.]
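The best rating B(S) can be computed with a few lines. This is only a sketch: it reads k_1, ..., k_{i-1} as the values already fixed on the path from the root to S (an interpretation consistent with the example, not stated explicitly on the slide), and the function name `best_rating` is hypothetical.

```python
from typing import Callable, List, Sequence


def best_rating(aggregate: Callable[[List[float]], float],
                fixed: Sequence[float], m: int) -> float:
    """B(S): values fixed on the path to S, padded with the best value 1."""
    if len(fixed) > m:
        raise ValueError("more fixed values than attributes")
    return aggregate(list(fixed) + [1.0] * (m - len(fixed)))


# The example with @ = sum over two attributes.
root_bound = best_rating(sum, [], m=2)        # 1 + 1
second_level = best_rating(sum, [0.8], m=2)   # 0.8 + 1
```

Since @ is monotone, no object stored below S can be rated higher than B(S), which is what allows whole subtrees to be skipped.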

Page 10: Searching the best 3 objects

[Figure: run of the MD-algorithm on a three-level MDB-tree with the B+-trees S1, ..., S10 and the user's fuzzy functions $f_1^U(x)$, $f_2^U(x)$ and $f_3^U(x)$, each with ratings between 0 and 1.]

  • best ratings computed during the search
      • B(S2) = 1.0 + 1 + 1 = 3.0
      • B(S3) = 1.0 + 0.6 + 1 = 2.6
      • B(S) = 1.0 + 0.6 + 0.6 = 2.2
      • B(S) = 1.0 + 0.6 + 0.5 = 2.1
      • B(S) = 1.0 + 0.6 + 0.2 = 1.8
  • temporary list TK

        rank   object   rating
        1st    C        2.2
        2nd    M        2.1
        3rd    Q        2.1
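The recursive search can be sketched as below. This is a simplified illustration, not the author's implementation: the MDB-tree is modeled as nested dictionaries (one level per attribute, lists of object ids at the bottom), the keys are assumed to be already rated by identity preference functions, the aggregation is a sum, and the toy tree is hypothetical data rather than the tree drawn on the slide.

```python
import heapq
from typing import Callable, List, Sequence, Tuple


def md_top_k(root: dict,
             prefs: Sequence[Callable[[float], float]],
             aggregate: Callable[[List[float]], float],
             k: int) -> List[Tuple[str, float]]:
    """Top-k search in a nested-dict stand-in for an MDB-tree."""
    m = len(prefs)
    top: List[Tuple[float, str]] = []  # min-heap playing the role of TK

    def search(node: dict, fixed: List[float], level: int) -> None:
        # Visit keys in descending order of the local preference.
        for key in sorted(node, key=prefs[level], reverse=True):
            path = fixed + [prefs[level](key)]
            bound = aggregate(path + [1.0] * (m - len(path)))  # B(S)
            if len(top) == k and bound <= top[0][0]:
                break  # monotone @: the remaining keys cannot do better
            if level == m - 1:
                for obj in node[key]:  # bound is the exact rating here
                    if len(top) < k:
                        heapq.heappush(top, (bound, obj))
                    elif bound > top[0][0]:
                        heapq.heapreplace(top, (bound, obj))
            else:
                search(node[key], path, level + 1)

    search(root, [], 0)
    return [(obj, rating) for rating, obj in sorted(top, key=lambda t: (-t[0], t[1]))]


def identity(value: float) -> float:
    return value


# Hypothetical toy tree: level 1 key -> level 2 key -> level 3 key -> objects.
tree = {
    1.0: {0.6: {0.6: ["C"], 0.5: ["M", "Q"], 0.2: ["O"]},
          0.0: {1.0: ["A"]}},
    0.8: {1.0: {0.1: ["B"]}},
    0.0: {1.0: {1.0: ["Z"]}},
}

result = md_top_k(tree, [identity] * 3, sum, k=3)
```

In this toy run the objects O, A and Z are never rated, because the best rating of their subtrees is not higher than the worst rating already held in the list of the best three objects.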

Page 11: MXT-algorithm

  • based on the integration of the MD-algorithm and the TA-algorithm
  • uses a new data structure: multidimensional B+-tree with lists
  • first n attributes (nominal)
      • stored and searched in the same way as in the MD-algorithm
  • last m - n attributes (ordinal)
      • stored as groups of m - n Fagin's sorted lists
      • searched by instances of Fagin's TA-algorithm

[Figure: example multidimensional B+-tree with lists for the objects x1, ..., x6: the attributes A1 and A2 form the tree levels, and each B+-tree of the last level refers to a group of sorted lists for A3 and A4, e.g. {x1, 1.0} {x2, 0.8} {x3, 0.5} {x4, 0.4} {x5, 0.2} {x6, 0.0} and {x3, 1.0} {x4, 0.7} {x6, 0.6} {x1, 0.3} {x5, 0.1} {x2, 0.0}.]
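The combination of both algorithms can be sketched in a simplified form. All details here are assumptions for illustration: nested dictionaries stand in for the tree levels of the nominal attributes, each bottom entry is a group of lists that already hold ratings in descending order, nominal preferences are given as dictionaries, the aggregation is a sum, and the flat data and names are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple


def mxt_top_k(root: dict,
              nominal_prefs: Sequence[Dict[str, float]],
              aggregate: Callable[[List[float]], float],
              k: int, m: int) -> List[Tuple[str, float]]:
    """Tree search over nominal levels, TA instances over the list groups."""
    n = len(nominal_prefs)
    top: List[Tuple[float, str]] = []  # min-heap of the current best k objects

    def offer(rating: float, obj: str) -> None:
        if len(top) < k:
            heapq.heappush(top, (rating, obj))
        elif rating > top[0][0]:
            heapq.heapreplace(top, (rating, obj))

    def run_ta(lists: Sequence[List[Tuple[str, float]]], fixed: List[float]) -> None:
        lookup = [dict(lst) for lst in lists]  # simulated random access
        seen = set()
        for row in zip(*lists):
            for obj, _ in row:
                if obj not in seen:
                    seen.add(obj)
                    offer(aggregate(fixed + [t[obj] for t in lookup]), obj)
            threshold = aggregate(fixed + [value for _, value in row])
            if len(top) == k and top[0][0] >= threshold:
                break  # nothing unseen in this group can enter the top k

    def search(node, fixed: List[float], level: int) -> None:
        if level == n:
            run_ta(node, fixed)
            return
        rate = lambda value: nominal_prefs[level].get(value, 0.0)
        for key in sorted(node, key=rate, reverse=True):
            path = fixed + [rate(key)]
            bound = aggregate(path + [1.0] * (m - len(path)))
            if len(top) == k and bound <= top[0][0]:
                break  # monotone aggregation: later keys cannot do better
            search(node[key], path, level + 1)

    search(root, [], 0)
    return [(obj, rating) for rating, obj in sorted(top, key=lambda t: (-t[0], t[1]))]


# Hypothetical data: District and Type are nominal, Area and Price are ordinal.
prefs = [{"Prague 1": 1.0, "Prague 6": 0.5}, {"2+1": 1.0, "1+1": 0.4}]
tree = {
    "Prague 1": {
        "2+1": [[("a", 0.9), ("b", 0.3)], [("b", 0.6), ("a", 0.2)]],
        "1+1": [[("c", 1.0)], [("c", 1.0)]],
    },
    "Prague 6": {
        "2+1": [[("d", 1.0), ("e", 0.1)], [("d", 0.8), ("e", 0.0)]],
    },
}

result = mxt_top_k(tree, prefs, sum, k=2, m=4)
```

Every group of lists is handled by its own TA instance that shares only the list of the current best objects, so the instances are natural units of work if the groups were to be processed concurrently.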

Page 12: An example of results

  • implemented top-k algorithms
      • TA-algorithm, MD-algorithm, MXT-algorithm
      • using lists based on B+-trees
      • implementation in Java
      • data structures have been tested in memory (not on disk)
  • test results
      • the number of obtained objects
  • real data
      • 8 822 flats for rent in Prague
      • ||dom(District)|| = 10, ||dom(Type)|| = 10, ||dom(Area)|| = 229, ||dom(Price)|| = 411
  • real user's preferences
      • the user prefers flats of some types in specific districts, with lower prices and bigger areas

Page 13: Motivation, future research

  • improvements of the performance of the algorithms
      • heuristics that monitor the distribution of the key values in nodes
  • improvement of data structures
      • automatic arrangement of levels in the MDB-tree with lists, management of empty values
  • parallel computing
      • in the MXT-algorithm construction, instances of the TA-algorithm would be computed concurrently
  • different models of user preferences
      • attribute dependencies between several attributes
  • similarity measures
      • finding the k objects most similar to a given object can be a user preference
  • user feedback
      • after running the first top-k query, the user tunes his/her preferences and executes the next top-k query
  • different data models
      • very large data sets: a tree-oriented data structure allows dynamizing the environment while solving a top-k problem
      • data streams: a tree-oriented data structure as a sliding window
      • approximations, uncertain data, heterogeneous data
  • web environment
      • more information resources distributed on the web

Page 14: An application TreeTopK

Page 15: Thank you for your attention!