Similarity Evaluation Techniques for Filtering Problems
Vagan Terziyan, University of Jyvaskyla
Evaluating Distance between Various Domain Objects and Concepts - one of the basic abilities of an intelligent agent
Are these two the same? … No! The difference is equal to 0.234.
Contents
• Goal
• Basic Concepts
• External Similarity Evaluation
• An Example
• Internal Similarity Evaluation
• Conclusions
Reference
Puuronen S., Terziyan V., A Similarity Evaluation Technique for Data Mining with an Ensemble of Classifiers, In: A.M. Tjoa, R.R. Wagner and A. Al-Zobaidie (Eds.), Proc. of the 11th Intern. Workshop on Database and Expert Systems Applications, IEEE CS Press, Los Alamitos, California, 2000, pp. 1155-1159. http://dlib.computer.org/conferen/dexa/0680/pdf/06801155.pdf
Goal
The goal of this research is to develop a simple similarity evaluation technique to be used for social filtering.
The result of social filtering here is a prediction of a customer’s evaluation of a certain product, based on known opinions about this product from other customers.
Basic Concepts: Virtual Training Environment (VTE)
VTE is a quadruple <D, C, S, P>:
• D is the set of goods D1, D2,..., Dn in the VTE;
• C is the set of evaluation marks C1, C2,..., Cm that are used to rank the products;
• S is the set of customers S1, S2,..., Sr who select evaluation marks to rank the products;
• P is the set of semantic predicates that define relationships between D, C, S.
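Basic Concepts: Semantic Predicate P
$$
P(D_i, C_j, S_k) =
\begin{cases}
1, & \text{if the customer } S_k \text{ selects mark } C_j \text{ to evaluate the product } D_i; \\
-1, & \text{if } S_k \text{ refuses to select } C_j \text{ to evaluate } D_i; \\
0, & \text{if } S_k \text{ does not select or refuse to select } C_j \text{ to evaluate } D_i.
\end{cases}
$$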
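As an illustration, the three-valued predicate can be stored sparsely: only explicit votes are recorded and every other triple is read as 0. The sketch below is in Python; the class name `Votes` and its methods are hypothetical names chosen for this example, not part of the original technique. The two recorded votes are taken from the "Reel" table in the example later in this document.

```python
SELECT, REFUSE, ABSTAIN = 1, -1, 0


class Votes:
    """Sparse store for the semantic predicate P(D_i, C_j, S_k)."""

    def __init__(self):
        self._p = {}

    def record(self, product, mark, customer, value):
        if value not in (SELECT, REFUSE, ABSTAIN):
            raise ValueError("P takes only the values 1, -1 and 0")
        self._p[(product, mark, customer)] = value

    def p(self, product, mark, customer):
        # Triples that were never recorded count as "neither selects nor refuses".
        return self._p.get((product, mark, customer), ABSTAIN)


votes = Votes()
votes.record("D1", "C4", "S2", SELECT)  # Wolf selects "Reliable" for the Reel
votes.record("D1", "C2", "S2", REFUSE)  # Wolf refuses "Expensive" for the Reel

print(votes.p("D1", "C4", "S2"), votes.p("D1", "C2", "S2"), votes.p("D1", "C1", "S2"))
# prints: 1 -1 0
```

A dense three-dimensional array would work equally well for small environments; the sparse form is only one possible design choice.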
Problem 1: Deriving External Similarity Values
[Figure: the sets D, C and S with elements Di, Cj and Sk, linked pairwise by the external relations DCi,j, SCk,j and SDk,i.]
External Similarity Values
External Similarity Values (ESV): binary relations DC, SC, and SD between the elements of (sub)sets of D and C; S and C; and S and D.
ESV are based on total support among all the customers for voting for the appropriate connection (or refusal to vote)
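Problem 2: Deriving Internal Similarity Values
[Figure: the sets D, C and S; pairs of elements of the same set are linked by the internal relations DDi’,i’’ (between Di’ and Di’’), CCj’,j’’ (between Cj’ and Cj’’) and SSk’,k’’ (between Sk’ and Sk’’).]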
Internal Similarity Values
Internal Similarity Values (ISV): binary relations between two subsets of D, two subsets of C and two subsets of S.
ISV are based on total support among all the customers for voting for the appropriate connection (or refusal to vote)
Why Do We Need Similarity Values (or a Distance Measure)?
• Distance between products is used to advertise a new product to customers, based on evaluations of already known similar products.
• Distance between evaluations is needed to estimate the evaluation error, e.g. when adaptive filtering technologies are used.
• Distance between customers is useful for weighting the customers, e.g. to integrate their opinions by weighted voting.
Deriving External Relation DC: How well evaluation fits the product
$$DC_{i,j} = CD_{j,i} = \sum_{k=1}^{r} P(D_i, C_j, S_k), \quad \forall D_i \in D,\ \forall C_j \in C$$
[Figure: three customers Sk1, Sk2 and Sk3 each link product Di to evaluation mark Cj, giving DCi,j = 3.]
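The DC relation is a plain sum of votes over customers, which can be sketched in a few lines of Python. The layout `votes[i][k][j]` (one table per product, one row per customer, one column per mark) and the function name `derive_dc` are assumptions made for this sketch; the toy votes are made up so that three customers agree on the first mark.

```python
def derive_dc(votes):
    """DC[i][j]: sum of P(D_i, C_j, S_k) over all customers k.

    votes[i][k][j] holds the vote (1, -1 or 0) of customer k
    on mark j for product i.
    """
    return [[sum(column) for column in zip(*table)] for table in votes]


# Hypothetical toy data: one product, three customers, two marks.
toy_votes = [
    [
        [1, -1],  # customer 1
        [1, 0],   # customer 2
        [1, -1],  # customer 3
    ]
]

print(derive_dc(toy_votes))  # prints: [[3, -2]]
```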
Deriving External Relation SC: Measures customer’s competence in the use of evaluation marks
The value of the relation (Sk, Cj) represents the total support that the customer Sk obtains by selecting (or refusing to select) the mark Cj to evaluate all the products.
$$SC_{k,j} = CS_{j,k} = \sum_{i=1}^{n} DC_{i,j} \cdot P(D_i, C_j, S_k), \quad \forall S_k \in S,\ \forall C_j \in C$$
Example of SC Relation
[Figure: customer Sk is linked to mark Cj through products D1 to D4, with CDj,1 = -3, CDj,2 = 6, CDj,3 = 0 and CDj,4 = 1, giving SCk,j = 4.]
Deriving External Relation SD: Measures customer’s competence in the products
The value of the relation (Sk, Di) represents the total support that the customer Sk receives by selecting (or refusing to select) all the evaluation marks to evaluate the product Di.
$$SD_{k,i} = DS_{i,k} = \sum_{j=1}^{m} DC_{i,j} \cdot P(D_i, C_j, S_k), \quad \forall S_k \in S,\ \forall D_i \in D$$
Example of SD Relation
[Figure: customer Sk is linked to product Di through marks C1 and C2, with CD1,i = -3 and CD2,i = 5, giving SDk,i = 2.]
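The SC and SD definitions both weight a customer's own votes by the total support DC, summing over products for SC and over marks for SD. The following Python sketch shows one way to compute them; the `votes[i][k][j]` layout, the function names and the toy data are assumptions made for illustration only.

```python
def derive_dc(votes):
    """DC[i][j]: sum of P(D_i, C_j, S_k) over all customers k."""
    return [[sum(column) for column in zip(*table)] for table in votes]


def derive_sd(votes, dc):
    """SD[k][i]: sum over marks j of DC[i][j] * P(D_i, C_j, S_k)."""
    customers = range(len(votes[0]))
    return [[sum(d * p for d, p in zip(dc[i], table[k]))
             for i, table in enumerate(votes)]
            for k in customers]


def derive_sc(votes, dc):
    """SC[k][j]: sum over products i of DC[i][j] * P(D_i, C_j, S_k)."""
    customers = range(len(votes[0]))
    marks = range(len(votes[0][0]))
    return [[sum(dc[i][j] * table[k][j] for i, table in enumerate(votes))
             for j in marks]
            for k in customers]


# Hypothetical toy data: two products, two customers, two marks.
toy_votes = [
    [[1, -1], [1, 0]],   # product 1: rows are customers, columns are marks
    [[-1, 1], [1, 1]],   # product 2
]
dc = derive_dc(toy_votes)
sd = derive_sd(toy_votes, dc)
sc = derive_sc(toy_votes, dc)
print(dc)  # prints: [[2, -1], [0, 2]]
print(sd)  # prints: [[3, 2], [2, 2]]
print(sc)  # prints: [[2, 3], [2, 2]]
```

Each customer's row of SD and row of SC add up to the same total, since both are sums of the same products DC[i][j] * P taken in a different order.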
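Normalizing External Relations to the Interval [0,1]
$$\text{normalized value} = \frac{\text{value} - \min(\text{value})}{\max(\text{value}) - \min(\text{value})}$$
$$[DC]_{i,j} = [CD]_{j,i} = \frac{DC_{i,j} + r}{2r}$$
$$[SC]_{k,j} = [CS]_{j,k} = \frac{SC_{k,j} + n(r-2)}{2n(r-1)}$$
$$[SD]_{k,i} = [DS]_{i,k} = \frac{SD_{k,i} + m(r-2)}{2m(r-1)}$$
Square brackets mark the normalized value of a relation;
n is the number of products,
m is the number of evaluation marks,
r is the number of customers.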
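The normalization can be sketched as one min-max helper applied with a different range per relation. The ranges used below are the ones the normalization formulas here are read as implying: [-r, r] for DC, [-n(r-2), nr] for SC and [-m(r-2), mr] for SD. The function names are hypothetical, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction


def normalize(value, lowest, highest):
    """Min-max normalization of value from [lowest, highest] to [0, 1]."""
    return Fraction(value - lowest, highest - lowest)


def normalize_dc(dc, r):
    return normalize(dc, -r, r)


def normalize_sc(sc, n, r):
    # Requires r > 1, otherwise the range collapses.
    return normalize(sc, -n * (r - 2), n * r)


def normalize_sd(sd, m, r):
    # Requires r > 1, otherwise the range collapses.
    return normalize(sd, -m * (r - 2), m * r)


# With r = 4 customers, n = 3 products and m = 5 marks:
print(float(normalize_dc(2, r=4)))   # prints: 0.75
print(normalize_sc(6, n=3, r=4))     # prints: 2/3
print(normalize_sd(8, m=5, r=4))     # prints: 3/5
```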
Competence of a customer
[Figure: a customer is linked to the goods (element Di, conceptual pattern of goods’ features) by competence in the goods, and to the evaluation marks (element Cj, conceptual pattern of evaluation marks’ definitions) by competence in the evaluation marks.]
Customer’s Evaluation: competence quality in products
$$Q^{D}(S_k) = \frac{1}{n} \sum_{i=1}^{n} [SD]_{k,i}$$
where [SD] is the SD relation normalized to the interval [0,1].
Customer’s Evaluation: competence quality in evaluation marks use
$$Q^{C}(S_k) = \frac{1}{m} \sum_{j=1}^{m} [SC]_{k,j}$$
where [SC] is the SC relation normalized to the interval [0,1].
Quality Balance Theorem
$$Q^{D}(S_k) = Q^{C}(S_k)$$
The evaluation of a customer’s competence (ranking, weighting, quality evaluation) does not depend on the competence area, “virtual world of products” or “conceptual world of evaluation marks”, because both competence values are always equal.
Proof
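$$
\begin{aligned}
Q^{D}(S_k) &= \frac{1}{n}\sum_{i=1}^{n} [SD]_{k,i}
 = \frac{1}{n}\sum_{i=1}^{n} \frac{SD_{k,i} + m(r-2)}{2m(r-1)} \\
&= \frac{1}{n}\sum_{i=1}^{n} \frac{\sum_{j=1}^{m}\bigl(DC_{i,j} \cdot P(D_i, C_j, S_k)\bigr) + m(r-2)}{2m(r-1)} = \dots \\
&= \frac{1}{m}\sum_{j=1}^{m} \frac{\sum_{i=1}^{n}\bigl(DC_{i,j} \cdot P(D_i, C_j, S_k)\bigr) + n(r-2)}{2n(r-1)} \\
&= \frac{1}{m}\sum_{j=1}^{m} \frac{SC_{k,j} + n(r-2)}{2n(r-1)}
 = \frac{1}{m}\sum_{j=1}^{m} [SC]_{k,j} = Q^{C}(S_k)
\end{aligned}
$$
Here [SD] and [SC] are the SD and SC relations normalized to the interval [0,1].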
An Example
Let us suppose that four customers have to evaluate three products from a virtual shop using five different evaluation marks.
The customers should define their selection of appropriate marks for every product.
The final goal is to obtain a cooperative evaluation result of all the customers concerning the quality of products.
C set (evaluation marks) in the Example
Evaluation marks Notation
Nicely designed C1
Expensive C2
Easy to use C3
Reliable C4
Safe C5
S (customers) Set in the Example
Customers IDs Notation
Fox S1
Wolf S2
Cat S3
Hare S4
D (products) Set in the Example
D1 - Ultra Cast Spinning Reel
D2 - Nokia Communicator 9110
D3 - iGrafx Process Management Software
Evaluations Made for the Good “Reel”
D1
P(D,C,S) C1 C2 C3 C4 C5
S1 1 -1 -1 0 -1
S2 0+ -1** 0++ 1* -1***
S3 0 0 -1 1 0
S4 1 -1 0 0 1
Customer Wolf (S2) prefers to select the mark Reliable* to evaluate “Reel” and refuses to select Expensive** or Safe***. Wolf neither selects nor refuses the marks Nicely designed+ or Easy to use++ for this evaluation.
Evaluations Made for the Good “Communicator”
D2
P C1 C2 C3 C4 C5
S1 -1 0 -1 0 1
S2 1 -1 -1 0 0
S3 1 -1 0 1 1
S4 -1 0 0 1 0
Evaluations Made for the Good “Software”
D3
P C1 C2 C3 C4 C5
S1 1 0 1 -1 0
S2 0 1 0 -1 1
S3 -1 -1 1 -1 1
S4 -1 -1 1 -1 1
Example: Calculating Value DC3,4
D3
P C1 C2 C3 C4 C5
S1 1 0 1 -1 0
S2 0 1 0 -1 1
S3 -1 -1 1 -1 1
S4 -1 -1 1 -1 1
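$$DC_{i,j} = CD_{j,i} = \sum_{k=1}^{r} P(D_i, C_j, S_k), \quad \forall D_i \in D,\ \forall C_j \in C$$
$$DC_{3,4} = \sum_{k=1}^{4} P(D_3, C_4, S_k) = (-1) + (-1) + (-1) + (-1) = -4$$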
Resulting DC relation
DC C1 C2 C3 C4 C5
D1 2 -3 -2 2 -1
D2 0 -2 -2 2 2
D3 -1 -1 3 -4 3
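The resulting DC relation can be reproduced directly from the three evaluation tables of the example. The Python sketch below copies those tables verbatim; the variable and function names are hypothetical.

```python
# Votes copied from the example tables: rows are S1..S4, columns are C1..C5.
REEL = [
    [1, -1, -1, 0, -1],
    [0, -1, 0, 1, -1],
    [0, 0, -1, 1, 0],
    [1, -1, 0, 0, 1],
]
COMMUNICATOR = [
    [-1, 0, -1, 0, 1],
    [1, -1, -1, 0, 0],
    [1, -1, 0, 1, 1],
    [-1, 0, 0, 1, 0],
]
SOFTWARE = [
    [1, 0, 1, -1, 0],
    [0, 1, 0, -1, 1],
    [-1, -1, 1, -1, 1],
    [-1, -1, 1, -1, 1],
]


def derive_dc(votes):
    """DC[i][j]: column sums of each product's vote table."""
    return [[sum(column) for column in zip(*table)] for table in votes]


dc = derive_dc([REEL, COMMUNICATOR, SOFTWARE])
for name, row in zip(("D1", "D2", "D3"), dc):
    print(name, row)
# prints:
# D1 [2, -3, -2, 2, -1]
# D2 [0, -2, -2, 2, 2]
# D3 [-1, -1, 3, -4, 3]
```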
Normalized and “Thresholded” DC relation
[DC] C1 C2 C3 C4 C5
D1 0.75 0.125 0.25 0.75 0.375
D2 0.5 0.25 0.25 0.75 0.75
D3 0.375 0.375 0.875 0 0.875
[DC] 0.75 C1 C2 C3 C4 C5
D1 1 -1 -1 1 0
D2 0 -1 -1 1 1
D3 0 0 1 -1 1
[Figure: scale from 0 to 1 with marks at 0.25, 0.5 and 0.75, mapped onto the thresholded values -1, 0 and 1.]
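The two DC tables above can be reproduced with a short Python sketch. The thresholding rule used here is an assumption inferred from the tables, not stated explicitly in the source: a normalized value of at least 0.75 maps to 1, a value of at most 0.25 (that is, 1 - 0.75) maps to -1, and anything in between maps to 0. The function names are hypothetical.

```python
from fractions import Fraction

# DC relation from the example (rows D1..D3, columns C1..C5), r = 4 customers.
DC = [
    [2, -3, -2, 2, -1],
    [0, -2, -2, 2, 2],
    [-1, -1, 3, -4, 3],
]
R = 4


def normalize_dc(value, r):
    """Map a DC value from [-r, r] to [0, 1]."""
    return Fraction(value + r, 2 * r)


def threshold(x, t=Fraction(3, 4)):
    """Assumed rule: x >= t gives 1, x <= 1 - t gives -1, otherwise 0."""
    if x >= t:
        return 1
    if x <= 1 - t:
        return -1
    return 0


normalized = [[normalize_dc(v, R) for v in row] for row in DC]
thresholded = [[threshold(x) for x in row] for row in normalized]

print([float(x) for x in normalized[0]])  # prints: [0.75, 0.125, 0.25, 0.75, 0.375]
for row in thresholded:
    print(row)
# prints:
# [1, -1, -1, 1, 0]
# [0, -1, -1, 1, 1]
# [0, 0, 1, -1, 1]
```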
Result of Cooperative Goods Evaluation Based on DC Relation
D1 is nicely designed, reliable, not expensive, but not easy to use
D2 is reliable, safe, not expensive, but not easy to use
D3 is easy to use, safe, but not reliable
An Example: Calculating Value SD1,1
D1
P C1 C2 C3 C4 C5
S1 1 -1 -1 0 -1
S2 0 -1 0 1 -1
S3 0 0 -1 1 0
S4 1 -1 0 0 1
DC C1 C2 C3 C4 C5
D1 2 -3 -2 2 -1
D2 0 -2 -2 2 2
D3 -1 -1 3 -4 3
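$$SD_{k,i} = DS_{i,k} = \sum_{j=1}^{m} DC_{i,j} \cdot P(D_i, C_j, S_k), \quad \forall S_k \in S,\ \forall D_i \in D$$
$$SD_{1,1} = \sum_{j=1}^{5} DC_{1,j} \cdot P(D_1, C_j, S_1) = 2 \cdot 1 + (-3)(-1) + (-2)(-1) + 2 \cdot 0 + (-1)(-1) = 8$$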
An Example: Calculating Value SC4,4
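$$SC_{k,j} = CS_{j,k} = \sum_{i=1}^{n} DC_{i,j} \cdot P(D_i, C_j, S_k), \quad \forall S_k \in S,\ \forall C_j \in C$$
DC C1 C2 C3 C4 C5
D1 2 -3 -2 2 -1
D2 0 -2 -2 2 2
D3 -1 -1 3 -4 3
P(D,C,S4) C1 C2 C3 C4 C5
D1 1 -1 0 0 1
D2 -1 0 0 1 0
D3 -1 -1 1 -1 1
$$SC_{4,4} = \sum_{i=1}^{3} DC_{i,4} \cdot P(D_i, C_4, S_4) = 2 \cdot 0 + 2 \cdot 1 + (-4)(-1) = 6$$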
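Both worked values, SD1,1 and SC4,4, are dot products of a slice of DC with the matching slice of one customer's votes. A minimal Python sketch, with the numbers copied from the example tables and hypothetical variable names:

```python
# DC relation from the example: rows D1..D3, columns C1..C5.
DC = [
    [2, -3, -2, 2, -1],
    [0, -2, -2, 2, 2],
    [-1, -1, 3, -4, 3],
]

# Fox (S1) on product D1, marks C1..C5.
S1_ON_D1 = [1, -1, -1, 0, -1]
# Hare (S4) on mark C4, products D1..D3.
S4_ON_C4 = [0, 1, -1]

# SD(1,1): weight Fox's votes on D1 by the D1 row of DC.
sd_1_1 = sum(d * p for d, p in zip(DC[0], S1_ON_D1))

# SC(4,4): weight Hare's votes on C4 by the C4 column of DC.
dc_column_c4 = [row[3] for row in DC]
sc_4_4 = sum(d * p for d, p in zip(dc_column_c4, S4_ON_C4))

print(sd_1_1, sc_4_4)  # prints: 8 6
```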
Resulting SD and SC relations
SD D1 D2 D3
S1 8 4 6
S2 6 4 6
S3 4 6 12
S4 4 2 12
SC C1 C2 C3 C4 C5
S1 1 3 7 4 3
S2 0 4 2 6 4
S3 1 3 5 8 5
S4 3 4 3 6 2
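The SD and SC tables above allow a quick numerical check of the Quality Balance Theorem. The sketch below assumes, as the proof does, that the competence qualities are averages of the normalized values, with n = 3 products, m = 5 marks and r = 4 customers; the function names are hypothetical.

```python
from fractions import Fraction

CUSTOMERS = ("Fox", "Wolf", "Cat", "Hare")
SD = [[8, 4, 6], [6, 4, 6], [4, 6, 12], [4, 2, 12]]
SC = [[1, 3, 7, 4, 3], [0, 4, 2, 6, 4], [1, 3, 5, 8, 5], [3, 4, 3, 6, 2]]
N, M, R = 3, 5, 4  # products, evaluation marks, customers


def quality_in_products(sd_row):
    """Average of the normalized SD values of one customer."""
    return sum(Fraction(v + M * (R - 2), 2 * M * (R - 1)) for v in sd_row) / N


def quality_in_marks(sc_row):
    """Average of the normalized SC values of one customer."""
    return sum(Fraction(v + N * (R - 2), 2 * N * (R - 1)) for v in sc_row) / M


qualities = []
for name, sd_row, sc_row in zip(CUSTOMERS, SD, SC):
    q_d, q_c = quality_in_products(sd_row), quality_in_marks(sc_row)
    qualities.append((q_d, q_c))
    print(name, q_d, q_c)
# prints:
# Fox 8/15 8/15
# Wolf 23/45 23/45
# Cat 26/45 26/45
# Hare 8/15 8/15
```

Under these assumptions Cat gets the highest competence value and Wolf the lowest, while Fox and Hare tie.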
Normalized and “Thresholded” SD relation
[SD] 0.75 D1 D2 D3
S1 (Fox) 1 -1 1
S2 (Wolf) 1 -1 1
S3 (Cat) -1 1 1
S4 (Hare) -1 -1 1
• Evaluations obtained from the customer Fox should be accepted if he evaluates goods similar to “Reel” or similar to “Software”.
• Fox’s evaluations should be rejected if they concern goods similar to “Communicator”.
• Only evaluations from the customer Cat can be accepted if they concern goods similar to “Communicator”.
• All four customers are expected to give acceptable evaluations concerning “Software”-related goods.
Normalized and “Thresholded” SC relation
[SC] 0.75 C1 C2 C3 C4 C5
S1 (Fox) -1 0 1 1 0
S2 (Wolf) -1 1 -1 1 1
S3 (Cat) -1 0 1 1 1
S4 (Hare) 0 1 0 1 -1
C1 - Nicely designed, C2 - Expensive, C3 - Easy to use, C4 - Reliable, C5 - Safe
• Evaluations obtained from the customer Fox should be accepted if they concern usability (easy to use) or reliability of a good.
• Fox’s evaluations should be rejected if they concern design of goods.
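Deriving Internal Similarity Values
[Figure, panel a) Via one intermediate set: elements A’ and A’’ of set A are linked through set I by the relations A’I and IA’’, giving the internal relation A’A’’^I.]
[Figure, panel b) Via two intermediate sets: elements A’ and A’’ of set A are linked through sets I and J by the relations A’I, IJ and JA’’, giving the internal relation A’A’’^IJ.]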
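Internal Similarity for Customers: Goods-based Similarity
[Figure: customers S’ and S’’ are linked through the set of goods D by the external relations S’D and DS’’, giving the internal relation S’S’’^D.]
For all S’ ∈ S and S’’ ∈ S, the relation S’S’’^D is derived from S’D and DS’’.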
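Internal Similarity for Customers: Evaluation marks-Based Similarity
[Figure: customers S’ and S’’ are linked through the set of evaluation marks C by the external relations S’C and CS’’, giving the internal relation S’S’’^C.]
For all S’ ∈ S and S’’ ∈ S, the relation S’S’’^C is derived from S’C and CS’’.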
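Internal Similarity for Customers: Evaluation marks-Goods-Based Similarity
[Figure: customers S’ and S’’ are linked through the sets of evaluation marks C and goods D by the relations S’C, CD and DS’’, giving the internal relation S’S’’^CD.]
For all S’ ∈ S and S’’ ∈ S, the relation S’S’’^CD is derived from S’C, CD and DS’’.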
Internal Similarity for Evaluation Marks
• Customers-based similarity: C’C’’^S, derived from C’S and SC’’.
• Goods-based similarity: C’C’’^D, derived from C’D and DC’’.
• Goods-customers-based similarity: C’C’’^DS, derived from C’D, DS and SC’’.
Internal Similarity for Goods
• Customers-based similarity: D’D’’^S, derived from D’S and SD’’.
• Evaluation marks-based similarity: D’D’’^C, derived from D’C and CD’’.
• Evaluation marks-customers-based similarity: D’D’’^CS, derived from D’C, CS and SD’’.
Normalized and “Thresholded” DD^C relation
[CD] 0.75 D1 D2 D3
C1 1 0 0
C2 -1 -1 0
C3 -1 -1 1
C4 1 1 -1
C5 0 1 1
[DC] 0.75 C1 C2 C3 C4 C5
D1 1 -1 -1 1 0
D2 0 -1 -1 1 1
D3 0 0 1 -1 1
[DD] 0.75 D1 D2 D3
D1 1 1 -1
D2 1 1 0
D3 -1 0 1
Legend: 1 - similar, 0 - neutral, -1 - different
Conclusion
We discussed methods for deriving the total support of each binary similarity relation. This can be used, for example, to derive the most supported evaluation of the goods and to rank the customers according to their competence.
We also discussed relations between elements taken from the same set: goods, evaluation marks, or customers. This can be used, for example, to divide customers into groups of similar competence relative to the goods evaluation environment.