Similarity Evaluation Techniques for Filtering Problems ? Vagan Terziyan University of Jyvaskyla [email protected]

Similarity Evaluation Techniques for Filtering Problems

??Vagan TerziyanVagan TerziyanUniversity of JyvaskylaUniversity of Jyvaskyla

[email protected]

Evaluating Distance between Various Domain Objects and Concepts - one of the basic abilities of an intelligent agent

Are these two the same?

… No !The difference is

equal to 0.234

Contents

Goal Basic Concepts External Similarity Evaluation An Example Internal Similarity Evaluation Conclusions

Reference

Puuronen S., Terziyan V., A Similarity Evaluation A Similarity Evaluation Technique for Data Mining with an Ensemble of Technique for Data Mining with an Ensemble of ClassifiersClassifiers, In: A.M. Tjoa, R.R. Wagner and A. Al-Zobaidie (Eds.), Proc. of the 11th Intern. Workshop on Database and Expert Systems Applications, IEEE CS Press, Los Alamitos, California, 2000, pp. 1155-1159. http://dlib.computer.org/conferen/dexa/0680/pdf/06801155.pdf

Goal

The goal of this research is to develop simple similarity evaluation technique to be used for social filtering

Result of social filtering here here is prediction of a customer’s evaluation of certain product based on known opinions about this product from other customers

Basic Concepts:Virtual Training Environment (VTE)

VTE is a quadruple:

<D,C,S,P>• D is the set of goods D1, D2,..., Dn in the VTE;

• C is the set of evaluation marks C1, C2,..., Cm , that are used to rank the products;

• S is the set of customers S1, S2,..., Sr , who select evaluation marks to rank the products;

• P is the set of semantic predicates that define relationships between D, C, S

Basic Concepts:Semantic Predicate P

. te D to evalualect C to se

refuseselect or does not ,if S

;aluate D to ev

o select C refuses t,if S

;D product aluate the to ev

ark C selects mstomer S,if the cu

),S,CP(D

ij

k

i

jk

i

jk

kji

0

1

1

Problem 1:Deriving External Similarity Values

DC

S

DiCj

Sk

SDk,i

DCi,j

SCk,j

External Similarity Values

External Similarity Values (ESV): binary relations DC, SC, and SD between the elements of (sub)sets of D and C; S and C; and S and D.

ESV are based on total support among all the customers for voting for the appropriate connection (or refusal to vote)

DC

S

DiCj

Sk

SDk,i

DCi,j

SCk,j

Problem 2:Deriving Internal Similarity Values

D C

S

Di’

SSk’,k’’

DDi’,i’’ CCj’,j’’

Di’’

Cj’

Cj’’

Sk’

Sk’’

Internal Similarity Values

Internal Similarity Values (ISV): binary relations between two subsets of D, two subsets of C and two subsets of S.

ISV are based on total support among all the customers for voting for the appropriate connection (or refusal to vote)

D C

S

Di’

SSk’,k’’


Di’’

Cj’

Cj’’

Sk’

Sk’’

Why we Need Similarity Values (or Distance Measure) ? Distance between products is used to advertise the

customers a new product based on evaluation of already known similar products

distance between evaluations is necessary to estimate evaluation error when necessary, e.g. in the case of adaptive filtering technologies used

distance between customers is useful to evaluate weights of all customers when necessary, e.g. to be able to integrate their opinions by weighted voting.

Deriving External Relation DC:How well evaluation fits the product

DC CD P D C S D D C Ci j j i i j k i jk

r

, , ( , , ), ,

DC

S

DiCj

Sk2

DCi,j=3

Sk1

Sk3

Customers

Products Evaluation marks

Deriving External Relation SC:Measures customer’s competence in the use of evaluation marks

The value of the relation (Sk,Cj) in a way represents the total support that the customer Sk obtains selecting (refusing to select) the mark Cj to evaluate all the products.

SC CS DC P D C S S S C Ck j j k i j i j ki

n

k j, , , ( , , ), ,

Example of SC Relation

DC

SSk

Cj

D2

SCk,j=4

D1

D4

D3

CDj1 = -3

CDj2 = 6

CDj3 = 0

CDj4 = 1

Customers

Products Evaluation marks

Deriving External Relation SD:Measures customer’s competence in the products

The value of the relation (Sk,Di) represents the total support that the agent Sk receives selecting (or refusing to select) all the solutions to solve the problem Di.

SD DS DC P D C S S S D Dk i i k i j i j kj

m

k i, , , ( , , ), ,

Example of SD Relation

DC

SSk

Di

C1

SDk,i=2

C2

CD1i = -3

CD2i = 5

ProductsEvaluation marks

Customers

Normalizing External Relations to the Interval [0,1]

min(value)-max(value)

)valuemin( value=value valuegnormalizin

-

DC CDDC r

ri j j ii j

, ,,

2

SC CSSC n r

n rk j j kk j

, ,, ( )

( )

2

2 1

SD DSSD m r

m rk i i kk i

, ,, ( )

( )

2

2 1

n is the number of products

m is the number of evaluation marks

r is the number of customers

Competence of a customer

Di

Conceptual pattern of goods’ features

Conceptual pattern of evaluation marks definitions

GoodsEvaluation

marks

Cj

Customer

Competence in the goods

Competence in the evaluation marks

Customer’s Evaluation:competence quality in Products

Q Sn

SDDk k i

i

n( ) , 1

Customer’s Evaluation:competence quality in evaluation marks use

Q Sm

SCCk k j

j

m( ) , 1

Quality Balance Theorem

Q S Q SDk

Ck( ) ( )

The evaluation of a customer’s competence (ranking, weighting, quality evaluation) does not depend on the competence area “virtual world of products” or “conceptual world of evaluation marks” because both competence values are always equal.

Proof

Q Sn

SDn

SD m r

m rD

k k ii

nk i

i

n( )

( )

( ),,

1 1 2

2 1

1

2

2 1n

DC P D C S m r

m r

i j i j kj

m

i

n( ( , , )) ( )

( )

,

1

2

2 1m

DC P D C S n r

n r

i j i j ki

n

j

m( ( , , )) ( )

( )

,

...

...

1 2

2 1

1

m

SC n r

n r mSC Q S

k j

j

m

k jj

mC

k,

,

( )

( )( )

An Example

Let us suppose that four customers have to evaluate three products from virtual shop using five different evaluation marks available.

The customers should define their selection of appropriate mark for every product.

The final goal is to obtain a cooperative evaluation result of all the customers concerning the quality of products.

C set (evaluation marks) in the Example

Evaluation marks Notation

Nicely designed C1

Expensive C2

Easy to use C3

Reliable C4

Safe C5

S (customers) Set in the Example

Customers IDs Notation

Fox S1

Wolf S2

Cat S3

Hare S4

D (products) Set in the Example

D2 - Nokia Communicator 9110

D1 - Ultra Cast Spinning Reel

D3 - iGrafx Process Management

Software

Evaluations Made for the Good “Reel”

D1

P(D,C,S) C1 C2 C3 C4 C5

S1 1 -1 -1 0 -1

S2 0+ -1** 0 ++ 1* -1***

S3 0 0 -1 1 0

S4 1 -1 0 0 1Customer Wolf prefers to select mark Reliable* to evaluate “Reel” and it refuses to select Expensive** or Safe***. Wolf does not use or refuse to use the Nicely designed+ or Easy to use++ marks for evaluation.

Evaluations Made for the Good “Communicator”

D2

P C1 C2 C3 C4 C5

S1 -1 0 -1 0 1

S2 1 -1 -1 0 0

S3 1 -1 0 1 1

S4 -1 0 0 1 0

Evaluations Made for the Good “Software”

D3

P C1 C2 C3 C4 C5

S1 1 0 1 -1 0

S2 0 1 0 -1 1

S3 -1 -1 1 -1 1

S4 -1 -1 1 -1 1

Example: Calculating Value DC3,4

D3

P C1 C2 C3 C4 C5

S1 1 0 1 -1 0

S2 0 1 0 -1 1

S3 -1 -1 1 -1 1

S4 -1 -1 1 -1 1

r

kjikjiijji CCDDSCDPCDDC ,),,,(,,

4)1()1()1()1(),,(4

434,3 k

kSCDPDC

Resulting DC relation

DC C1 C2 C3 C4 C5

D1 2 -3 -2 2 -1

D2 0 -2 -2 2 2

D3 -1 -1 3 -4 3

Normalized and “Thresholded” DC relation

[DC] C1 C2 C3 C4 C5

D1 0.75 0.125 0.25 0.75 0.375D2 0.5 0.25 0.25 0.75 0.75D3 0.375 0.375 0.875 0 0.875

[DC] 0.75 C1 C2 C3 C4 C5

D1 1 -1 -1 1 0

D2 0 -1 -1 1 1

D3 0 0 1 -1 1

0 10.50.25 0.75

0 1-1

Result of Cooperative Goods Evaluation Based on DC Relation

D2 is reliable, safe, not expensive,

but not easy to use

D1 is nicely designed, reliable, not

expensive, but not easy to use

D3 is easy to use, safe, but not

reliable

An Example: Calculating Value SD1,1

D1

P C1 C2 C3 C4 C5

S1 1 -1 -1 0 -1S2 0 -1 0 1 -1

S3 0 0 -1 1 0

S4 1 -1 0 0 1

DC C1 C2 C3 C4 C5

D1 2 -3 -2 2 -1

D2 0 -2 -2 2 2

D3 -1 -1 3 -4 3

8)1()1(02)1()2()1()3(12),,(5

11,11,1 j

jj SCDPDCSD

m

jikkjijikiik DDSSSCDPDCDSSD ,),,,(,,,

An Example: Calculating Value SC4,4

n

ijkkjijikjjk CCSSSCDPDCCSSC ,),,,(,,,

DC C1 C2 C3 C4C5

D1 2 -3 -2 2 -1

D2 0 -2 -2 2 2

D3 -1 -1 3 -4 3

P C1 C2 C3 C4C5

D1

S41 -1 0 0 1

D2

S4-1 0 0 1 0

D3

S4-1 -1 1 -1 1

6)1()4(1202),,(3

444,4,4 i

ii SCDPDCSC

Resulting SD and SC relations

SD D1 D2 D3

S1 8 4 6

S2 6 4 6

S3 4 6 12

S4 4 2 12

SC C1 C2 C3 C4 C5

S1 1 3 7 4 3

S2 0 4 2 6 4

S3 1 3 5 8 5

S4 3 4 3 6 2

… or similar to “Software” .

Fox’s evaluations should be rejected ifthey concern goods similar to “Communicator”

Evaluations obtained from thecustomer Fox should be accepted if heevaluates goods similar to “Reels” ...

Normalized and “Thresholded” SD relation

[SD] 0.75 D1 D2 D3

S1 1 -1 1

S2 1 -1 1

S3 -1 1 1

S4 -1 -1 1

FoxWolfCatHare

Only evaluation from the customerCat can be accepted if it concernsgoods similar to “Communicator”

Normalized and “Thresholded” SD relation

[SD] 0.75 D1 D2 D3

S1 1 -1 1

S2 1 -1 1

S3 -1 1 1

S4 -1 -1 1

FoxWolfCatHare

All four customers are expectedto give an acceptable evaluations

concerning “Software” related goods

… or reliability of a good .

Evaluation obtained from the customer Fox should be accepted if it concern usability (easy to use) of a good...

Fox’s evaluations should be rejected

if they concern design of goods

Normalized and “Thresholded” SC relation

[SC]0.75 C1 C2 C3 C4 C5

S1 -1 0 1 1 0

S2 -1 1 -1 1 1

S3 -1 0 1 1 1

S4 0 1 0 1 -1

FoxWolfCatHare

Nicely designed Expensive

Easy to use Reliable Safe

Problem 2:Deriving Internal Similarity Values

D C

S

Di’

SSk’,k’’


Di’’

Cj’

Cj’’

Sk’

Sk’’

Internal Similarity Values

Internal Similarity Values (ISV): binary relations between two subsets of D, two subsets of C and two subsets of S.

ISV are based on total support among all the customers for voting for the appropriate connection (or refusal to vote)

D C

S

Di’

SSk’,k’’


Di’’

Cj’

Cj’’

Sk’

Sk’’

Deriving Internal Similarity Values

Set A Set I

A’

A”

A’I

IA”

A’A”I

A’

A”

a)

Set A

Set I

A’

A”

A’I

JA”

A’A”IJ

A’

A”

b)

Set J

IJ

Via one intermediate set Via two intermediate sets

Internal Similarity for Customers:Goods-based Similarity

D C

SS’S’’D

S’’

S’DS’’

S’D

S S S S S S S D DSD' '' ' '' ' '',

Goods

Customers

Internal Similarity for Customers:Evaluation marks-Based Similarity

D C

SS’S’’C

S’’

S’

CS’’

S’C

S S S S S S S C CSC' '' ' '' ' '',

Evaluation marks

Customers

Internal Similarity for Customers:Evaluation marks-Goods-Based Similarity

D C

SS’S’’CD

S’’

S’DS’’S’C

CD

S S S S S S S C CD DSCD' '' ' '' ' '',

Customers

Evaluation marks

Goods

Internal Similarity for Evaluation Marks

DC

S

C’C’’S

C’’

C’

C’S

SC’’

DC

S

C’C’’D

C’’

C’C’D

DC’’

DC

S

C’C’’DS

C’’

C’C’D

SC’’DS

Customers-based similarity Goods-based similarity

Goods-customers-based similarity

Internal Similarity for Goods

Customers-based similarity Evaluation marks-based similarity

Evaluation marks-customers-based similarity

DC

S

D’D’’S

D’’

D’

D’S

SD’’

DC

S

D’D’’C

D’’

D’ D’C

CD’’

DC

S

D’D’’CS

D’’

D’ D’C

SD’’CS

Normalized and “Thresholded” DDC relation

[CD] 0.75 D1 D2 D3

C1 1 0 0

C2 -1 -1 0

C3 -1 -1 1

C4 1 1 -1

C5 0 1 1

[DC] 0.75 C1 C2 C3 C4 C5

D1 1 -1 -1 1 0

D2 0 -1 -1 1 1

D3 0 0 1 -1 1

[DD] 0.75 D1 D2 D3

D1 1 1 -1

D2 1 1 0

D3 -1 0 1

similar

neutral

different

Conclusion

Discussion was given to methods of deriving the total support of each binary similarity relation. This can be used, for example, to derive the most supported goods evaluation and to rank the customers according to their competence

We also discussed relations between elements taken from the same set: goods, evaluation marks, or customers. This can be used, for example, to divide customers into groups of similar competence relatively to the goods evaluation environment

Documents

Similarity Evaluation Techniques for Filtering Problems ? Vagan Terziyan University of Jyvaskyla [email protected]