Efficient Top-k Searching According to User Preferences Based on Fuzzy Functions with Usage of Tree-Oriented Data Structures
Matúš Ondreička
Supervised by Prof. Jaroslav Pokorný
Faculty of Mathematics and Physics, Department of Software Engineering
Charles University in Prague, Czech Republic
Matúš Ondreička, VLDB 2011 PhD Workshop
Research outline
  introduction
    top-k problem, user preferences, fuzzy functions
  related work
  technical solutions
    tree-oriented data structures
      set of B+-trees
      multidimensional B+-tree
      multidimensional B+-tree with lists
    MD-algorithm, MXT-algorithm
  experiments, current results
  motivation for future research
Top-k problem
  top-k searching
    finding the (few) best k objects described by several attributes
    i.e. the k objects with the highest rating
    according to user preferences based on fuzzy functions
  efficient top-k searching
    without accessing all the objects
    with full support for the model of user preferences
      local preferences
      global preferences
Model of user preferences
  local preferences
    objects are preferred according to one attribute
    the attribute's domain is continuous
      modeled with a fuzzy function $f^U(x): x^A \to [0, 1]$
    the attribute's domain is discrete
      each value is rated individually
      ACER := 0.6, APPLE := 1.0, DELL := 0.9, SONY := 0.8
  global preferences
    objects are preferred according to several attributes
    modeled with an aggregation function $@^U(x): (f_1^U(x), \dots, f_m^U(x)) \to [0, 1]$
    e.g. weighted average
      $@^U(x) = \dfrac{w_1 \cdot f_1^U(x) + \dots + w_m \cdot f_m^U(x)}{w_1 + \dots + w_m}$
  [Figure: plot of a fuzzy function $f^U(x)$; vertical axis from 0 (0%) to 1 (100%), horizontal axis $x^A$ from 0 € to 1000 €]
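A minimal sketch of this preference model, added for illustration only. The discrete grades are the ones listed on the slide; the linear shape of the price function, the names `price_grade` and `weighted_average`, the weights, and the example object are assumptions, since the slide does not fix them.

```python
# Discrete local preference: one grade per value (grades taken from the slide).
MAKER_GRADE = {"ACER": 0.6, "APPLE": 1.0, "DELL": 0.9, "SONY": 0.8}


def price_grade(price, worst=1000.0):
    """Continuous local preference f^U on the price attribute.

    Assumption: the grade falls linearly from 1 at 0 EUR to 0 at `worst` EUR.
    """
    return max(0.0, min(1.0, 1.0 - price / worst))


def weighted_average(grades, weights):
    """Global preference @^U: weighted average of the local grades."""
    return sum(w * g for w, g in zip(weights, grades)) / sum(weights)


# Hypothetical object: a DELL offered for 250 EUR, with the price weighted
# twice as much as the maker.
rating = weighted_average(
    [price_grade(250.0), MAKER_GRADE["DELL"]],
    weights=[2.0, 1.0],
)
```

Any other monotone combination of the local grades could replace the weighted average here; it is only the example named on the slide.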
Motivation and related work
  XML, multimedia, the Web, etc.
  relational databases
    Ilyas, Beskales, Soliman: A survey of top-k query processing techniques in relational database systems. 2008.
    ranking functions
    query optimization
  Fagin's algorithms
    Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. Journal of Computer and System Sciences 66, 2003.
    support only monotone ranking functions
    based on sorted lists
    no support for local user preferences
      BASIC MOTIVATION FOR OUR RESEARCH
Usage of B+-tree
  local user preference
    given by a fuzzy function that is monotone on each interval
  moving in the leaf level
    "ways" in the leaf level
    moving continuously along all "ways"
    comparing objects on different "ways"
    choosing the best-rated object over all the "ways"
  obtaining objects
    during the computation of the algorithm
    with ratings in descending order
    according to the fuzzy function fU
  [Figure: B+-tree over attribute values from 0.0 to 1.0 with objects in its leaves, and a fuzzy function fU (values from 0 to 1) plotted over the same domain, with the "ways" w1 to w5 marked]
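The idea of the "ways" can be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation: a sorted Python list stands in for the linked leaf level of the B+-tree, the caller supplies the points where the fuzzy function switches between rising and falling, and all names and data are hypothetical.

```python
import bisect
import heapq


def descending_by_preference(leaf_level, f, breakpoints):
    """Yield (grade, object) pairs in descending order of f.

    leaf_level:  list of (value, object) sorted by value (stand-in for the
                 leaf level of a B+-tree).
    breakpoints: sorted values splitting the domain into intervals on which
                 f is monotone (assumed to be known for the user's function).
    """
    values = [value for value, _ in leaf_level]
    cuts = [0] + [bisect.bisect_left(values, b) for b in breakpoints] + [len(values)]

    def way(indices):
        # One "way": a cursor moving through the leaves in one direction.
        for i in indices:
            value, obj = leaf_level[i]
            yield f(value), obj

    ways = []
    for lo, hi in zip(cuts, cuts[1:]):
        if lo == hi:
            continue  # no stored value falls into this interval
        rising = f(values[lo]) < f(values[hi - 1])
        # Walk each interval from its best end towards its worst end.
        indices = range(hi - 1, lo - 1, -1) if rising else range(lo, hi)
        ways.append(way(indices))

    # Repeatedly take the best head among all ways.
    return heapq.merge(*ways, key=lambda pair: pair[0], reverse=True)


# Hypothetical data: prices with a triangular preference peaking at 400.
leaf_level = [(100, "A"), (300, "B"), (400, "C"), (600, "D"), (900, "E")]


def f(price):
    return max(0.0, 1.0 - abs(price - 400) / 500)


order = [obj for _, obj in descending_by_preference(leaf_level, f, breakpoints=[400])]
```

Because the merge is lazy, a consumer that needs only the first few objects never touches the rest of the leaf level.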
Fagin's algorithms
  TA (threshold algorithm) and NRA (no random access)
    search for the best k objects according to a monotone aggregate function @
    without accessing all objects
  preconditions
    a set of objects X with values of m attributes A1, ..., Am
    objects from the set X are stored in m lists L1, ..., Lm
    lists contain pairs (x, ax)
    lists are sorted in descending order
    monotone aggregation function @
  multi-user solution
    lists are based on B+-trees
    the algorithm can get pairs (x, fU(x)) from a B+-tree sequentially, in descending order according to the user's fuzzy function fU(x)
  example lists
    L1: (x1, 1.0) (x2, 0.8) (x3, 0.6) (x4, 0.4) (x5, 0.2) (x6, 0.0)
    L2: (x3, 1.0) (x4, 0.8) (x6, 0.6) (x1, 0.4) (x2, 0.2) (x5, 0.0)
    L3: (x1, 1.0) (x4, 0.8) (x3, 0.6) (x5, 0.4) (x2, 0.2) (x6, 0.0)
  [Figure: the lists L1, L2, L3 are obtained from B+-trees over the attributes A1, A2, A3, each holding the objects x1 to x6]
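A compact sketch of the threshold algorithm on the three example lists from this slide. The choice of a plain sum as the monotone aggregate @, the value k = 2, and the function name are assumptions made for the illustration; in-memory dictionaries stand in for random access to the lists.

```python
import heapq


def threshold_algorithm(lists, k, agg=sum):
    """Return (top-k list of (object, rating), depth reached).

    lists: m lists of (object, grade) pairs of equal length, each sorted by
           grade in descending order.
    agg:   monotone aggregate function over an iterable of grades.
    """
    lookup = [dict(lst) for lst in lists]  # random access: object -> grade
    seen = {}
    depth = 0
    for depth, row in enumerate(zip(*lists), start=1):
        # Sorted access: one step down in every list.
        for obj, _ in row:
            if obj not in seen:
                seen[obj] = agg(grades[obj] for grades in lookup)
        # No unseen object can be rated above the aggregate of the current row.
        threshold = agg(grade for _, grade in row)
        top = heapq.nlargest(k, seen.items(), key=lambda item: item[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top, depth
    return heapq.nlargest(k, seen.items(), key=lambda item: item[1]), depth


L1 = [("x1", 1.0), ("x2", 0.8), ("x3", 0.6), ("x4", 0.4), ("x5", 0.2), ("x6", 0.0)]
L2 = [("x3", 1.0), ("x4", 0.8), ("x6", 0.6), ("x1", 0.4), ("x2", 0.2), ("x5", 0.0)]
L3 = [("x1", 1.0), ("x4", 0.8), ("x3", 0.6), ("x5", 0.4), ("x2", 0.2), ("x6", 0.0)]

top2, depth = threshold_algorithm([L1, L2, L3], k=2)
```

With these lists the loop stops after the third row of the six, which is the point of the algorithm: the lower halves of the lists are never read by sorted access.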
Multidimensional B-tree
  the MDB-tree allows a set of objects to be indexed by m > 1 attributes in one data structure
  it has m levels; values of one attribute are stored in each level
  nodes are B+-trees whose leaf nodes are linked in both directions
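  example objects
    object  A1   A2   A3
    A       0.0  0.4  0.3
    B       0.0  0.4  0.5
    C       0.0  1.0  0.5
    D       0.0  1.0  0.5
    E       0.5  0.9  1.0
    F       1.0  0.0  0.0
    G       1.0  0.0  0.0
    H       1.0  0.0  0.7
    I       1.0  0.4  0.7
    J       1.0  0.7  0.4
    K       1.0  0.7  0.6
  [Figure: MDB-tree over these objects; the first level holds the A1 keys 0.0, 0.5, 1.0, the second level the A2 keys 0.4, 1.0 / 0.9 / 0.0, 0.4, 0.7, and the third level the A3 keys with the objects A to K]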
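The level structure can be mimicked with nested dictionaries, one nesting level per attribute. This is only a stand-in to show the shape of the index: a real MDB-tree uses a B+-tree with linked leaves at every node, which a plain dictionary does not provide. The object values are those that appear in the slide's figure; the function name is hypothetical.

```python
# Objects A to K with their (A1, A2, A3) values as they appear in the figure.
OBJECTS = {
    "A": (0.0, 0.4, 0.3), "B": (0.0, 0.4, 0.5),
    "C": (0.0, 1.0, 0.5), "D": (0.0, 1.0, 0.5),
    "E": (0.5, 0.9, 1.0),
    "F": (1.0, 0.0, 0.0), "G": (1.0, 0.0, 0.0), "H": (1.0, 0.0, 0.7),
    "I": (1.0, 0.4, 0.7),
    "J": (1.0, 0.7, 0.4), "K": (1.0, 0.7, 0.6),
}


def build_mdb_tree(objects):
    """Index objects by all attributes: level i is keyed by attribute i."""
    tree = {}
    for name, values in objects.items():
        node = tree
        for value in values[:-1]:
            node = node.setdefault(value, {})  # descend, creating inner nodes
        node.setdefault(values[-1], []).append(name)  # last level holds objects
    return tree


tree = build_mdb_tree(OBJECTS)
```

Here `tree[1.0][0.7]` plays the role of the third-level B+-tree that holds J and K.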
MD-algorithm
  searches for the best k objects in a multidimensional B-tree (MDB-tree) without retrieving all the objects
  principle of the MD-algorithm
    searches the MDB-tree with a recursive procedure
    uses a temporary list TK of the k best objects found so far
    analogously to Fagin's TA-algorithm, uses the best rating B(S) of a B+-tree S
    requires a monotone aggregate function @
  definition
    for a B+-tree S in the i-th level of the MDB-tree, $B(S) = @(k_1, \dots, k_{i-1}, 1, \dots, 1)$, where $k_1, \dots, k_{i-1}$ are the keys fixed on the path from the root to S
  example: $@(x^{A_1}, x^{A_2}) = x^{A_1} + x^{A_2}$
    B(S) = 1 + 1 = 2.0
    B(S) = 0.8 + 1 = 1.8
    B(S) = 0.8 + 0.7 = 1.5
  [Figure: two-level MDB-tree with objects A to H, annotated with the B(S) values above]
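The best rating B(S) is simple to compute once the keys on the path to S are known. The sketch below reproduces the three values of the slide's example with @ taken as a plain sum over m = 2 attributes; the function name `best_rating` is hypothetical.

```python
def best_rating(path_keys, m, agg=sum):
    """Upper bound on the rating of any object stored below S.

    path_keys: keys k_1, ..., k_(i-1) fixed on the path from the root to S.
    m:         total number of attributes (levels of the MDB-tree).
    agg:       monotone aggregate function @; unknown grades are replaced
               by the best possible grade 1.
    """
    return agg(list(path_keys) + [1.0] * (m - len(path_keys)))


root_bound = best_rating([], m=2)           # 1 + 1
second_level = best_rating([0.8], m=2)      # 0.8 + 1
full_key = best_rating([0.8, 0.7], m=2)     # 0.8 + 0.7
```

Because @ is monotone, no object below S can be rated higher than this bound, which is what allows whole subtrees to be skipped.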
Searching the best 3 objects
  [Figure: MDB-tree with the B+-trees S1 to S10 over three attributes, with the fuzzy functions f1U(x), f2U(x), f3U(x) (values from 0 to 1) plotted beside the levels]
  best ratings computed during the search
    B(S2) = 1.0 + 1 + 1 = 3.0
    B(S3) = 1.0 + 0.6 + 1 = 2.6
    B(S) = 1.0 + 0.6 + 0.6 = 2.2
    B(S) = 1.0 + 0.6 + 0.5 = 2.1
    B(S) = 1.0 + 0.6 + 0.2 = 1.8
  list TK
    rank  object  rating
    1st   C       2.2
    2nd   M       2.1
    3rd   Q       2.1
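A recursive search in the spirit of this example can be sketched as below. It is a simplified illustration under several assumptions: nested dictionaries keyed by local grades stand in for the MDB-tree, @ is a plain sum, and the small tree is hypothetical, built only so that the first branch follows the bounds 3.0, 2.6, 2.2, 2.1, 1.8 and the result C, M, Q shown on the slide.

```python
def md_search(node, k, m, prefix=(), top=None, agg=sum):
    """Collect the k best (rating, object) pairs from a nested-dict tree.

    node: dict mapping a grade to a child dict, or to a list of objects
          at the last level.
    top:  the list TK, kept sorted by rating in descending order.
    """
    if top is None:
        top = []
    for grade in sorted(node, reverse=True):  # best keys first
        key = prefix + (grade,)
        # Best rating reachable below this key: unknown grades set to 1.
        bound = agg(list(key) + [1.0] * (m - len(key)))
        if len(top) == k and bound <= top[-1][0]:
            break  # later keys are smaller, so they cannot do better
        child = node[grade]
        if len(key) == m:
            top.extend((bound, obj) for obj in child)
            top.sort(key=lambda pair: pair[0], reverse=True)
            del top[k:]
        else:
            md_search(child, k, m, key, top, agg)
    return top


# Hypothetical tree with three levels of grades.
tree = {
    1.0: {0.6: {0.6: ["C"], 0.5: ["M", "Q"], 0.2: ["O"]},
          0.0: {1.0: ["A"]}},
    0.5: {0.4: {1.0: ["E"]}},
}

tk = md_search(tree, k=3, m=3)
best_objects = [obj for _, obj in tk]
```

In this run the objects O, A and E are never appended to TK: their subtrees are cut off as soon as the bound drops to the rating of the third object in TK or below.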
MXT-algorithm
  based on an integration of the MD-algorithm and the TA-algorithm
  uses a new data structure: multidimensional B+-tree with lists
    first n attributes (nominal)
      stored and searched in the same way as in the MD-algorithm
    last m - n attributes (ordinal)
      stored as groups of m - n Fagin's sorted lists
      searched by instances of Fagin's TA-algorithm
  [Figure: multidimensional B+-tree with lists; the upper levels index A1 and A2, and each key of the A2 level points to a group of sorted lists for A3 and A4]
  example objects
    object  A1   A2   A3   A4
    x1      1.0  0.7  1.0  0.3
    x2      1.0  0.7  0.8  0.0
    x3      1.0  0.7  0.5  1.0
    x4      1.0  0.7  0.4  0.7
    x5      1.0  0.7  0.2  0.1
    x6      1.0  0.7  0.0  0.6
  sorted lists of the group A1 = 1.0, A2 = 0.7
    A3: {x1, 1.0} {x2, 0.8} {x3, 0.5} {x4, 0.4} {x5, 0.2} {x6, 0.0}
    A4: {x3, 1.0} {x4, 0.7} {x6, 0.6} {x1, 0.3} {x5, 0.1} {x2, 0.0}
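One possible reading of this combination is sketched below; it is a strong simplification and not the authors' implementation. The assumptions: @ is a plain sum, the nominal levels are flattened into a dictionary keyed by the tuple of nominal grades (instead of a recursive walk through B+-trees), and each group is scanned by a TA-style loop whose threshold is compared against the shared list of the k best objects. The first group uses the two sorted lists from the slide; the second group and all names are hypothetical.

```python
def mxt_search(groups, k):
    """groups: {tuple of nominal grades: [sorted list per ordinal attribute]}.

    Each sorted list holds (object, grade) pairs in descending grade order.
    Returns the k best (rating, object) pairs, ratings being sums of grades.
    """
    top = []  # shared list of the best k, sorted by rating descending
    # Visit groups by their best possible rating (MD-style bound).
    for nominal in sorted(groups, key=sum, reverse=True):
        lists = groups[nominal]
        base = sum(nominal)
        if len(top) == k and base + 1.0 * len(lists) <= top[-1][0]:
            break  # no remaining group can contain a better object
        lookup = [dict(lst) for lst in lists]  # random access inside the group
        seen = set()
        for row in zip(*lists):  # one TA instance per group
            for obj, _ in row:
                if obj not in seen:
                    seen.add(obj)
                    top.append((base + sum(g[obj] for g in lookup), obj))
            top.sort(key=lambda pair: pair[0], reverse=True)
            del top[k:]
            threshold = base + sum(grade for _, grade in row)
            if len(top) == k and top[-1][0] >= threshold:
                break  # the rest of this group cannot improve the result
    return top


groups = {
    (1.0, 0.7): [
        [("x1", 1.0), ("x2", 0.8), ("x3", 0.5), ("x4", 0.4), ("x5", 0.2), ("x6", 0.0)],
        [("x3", 1.0), ("x4", 0.7), ("x6", 0.6), ("x1", 0.3), ("x5", 0.1), ("x2", 0.0)],
    ],
    # Hypothetical second group, present only to show the pruning step.
    (0.6, 0.3): [
        [("y1", 1.0), ("y2", 0.5)],
        [("y2", 0.9), ("y1", 0.2)],
    ],
}

best2 = mxt_search(groups, k=2)
```

With k = 2 the first group is left after three rows of its lists, and the second group is skipped entirely because its bound 0.6 + 0.3 + 1 + 1 = 2.9 is below the rating of the second-best object found so far.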
An example of results
  implemented top-k algorithms
    TA-algorithm, MD-algorithm, MXT-algorithm
    using lists based on B+-trees
    implementation in Java
    data structures have been tested in memory (not on disk)
  test results
    the number of obtained objects
  real data
    8 822 flats for rent in Prague
    ||dom(District)|| = 10
    ||dom(Type)|| = 10
    ||dom(Area)|| = 229
    ||dom(Price)|| = 411
  real user's preferences
    the user prefers flats of some types in specific districts, with lower prices and bigger areas
Motivation, future research
  improvements of the performance of the algorithms
    heuristics to monitor the distribution of the key values in nodes
  improvement of the data structures
    automatic arrangement of levels in the MDB-tree with lists, managing empty values
  parallel computing
    in the construction of the MXT-algorithm, instances of the TA-algorithm would be computed concurrently
  different models of user preferences
    attribute dependencies between several attributes
    similarity measures
      finding the k objects most similar to a given object can be a user preference
    user feedback
      after running the first top-k query, the user tunes his/her preferences and executes the next top-k query
  different data models
    very large data sets
      tree-oriented data structures allow the environment to be made dynamic while solving a top-k problem
    data streams
      tree-oriented data structure as a sliding window
    approximations, uncertain data, heterogeneous data
  web environment
    more information resources distributed on the web
An application TreeTopK
Thank you for your attention!