
Fuzzy Modeling for Item Recommender Systems Or

A Fuzzy Theoretic Method for Recommender Systems

Azene Zenebe, Anthony F. Norcio

Abstract—Representing item features and user feedback that are subjective, incomplete, imprecise, and vague, and reasoning about their relationships, are major problems in recommender systems. This paper presents a Fuzzy Theoretic Method (FTM) for recommender systems that handles the non-stochastic uncertainty induced by subjectivity, vagueness, and imprecision in the data, the domain knowledge, and the task under consideration. FTM advances methods of fuzzy modeling in recommender systems, and its performance is evaluated empirically through simulations on a benchmark movie dataset. FTM comprises a representation method for item features and for user feedback on those items, and a content-based algorithm built on fuzzy theoretic similarity measures (fuzzy theoretic extensions of the Jaccard index, cosine, proximity, and correlation similarity measures) and on recommendation strategies (the maximum-minimum and weighted-sum fuzzy theoretic strategies). Compared to the baseline Crisp Set based method (CSM), simulation runs of FTM on the movie data show an improvement in precision with comparable recall and F1-measure. The improvement is attributed to the use of fuzzy modeling. In addition, small model sizes and recommendation sizes produce satisfactory recommendation accuracy. A guideline is also provided to help recommender system designers choose, depending on the task, a combination of a fuzzy theoretic recommendation strategy and a fuzzy theoretic similarity measure. From the results we conclude that modeling uncertainty using fuzzy sets and fuzzy logic improves the performance of content-based recommender systems.

Index Terms—recommendation algorithm, uncertainty, fuzzy theory, similarity measure ( if resubmitted in

IEEE Cybernetics …)

Or

Index Terms - Fuzzy system models; Fuzzy inference systems, Recommender Systems, Learning under Uncertainty (if submitted in Journal of Fuzzy Sets and Systems @ http://www.elsevier.com/wps/find/journaldescription.cws_home/505545/authorinstructions)

I. INTRODUCTION

Recommender systems provide users with recommendations of items, and with information that helps them decide which items to procure or look at, based on individual customer preferences [1]. A recommender system generally consists of background data, such as historical ratings of items collected from users before the recommendation begins; input data, such as features of items or a user's ratings of items, used to initiate a recommendation; and an algorithm that combines the two to generate a recommendation [2].

Recommender systems research has advanced considerably since its beginning. However, there are several avenues in which current recommender systems need improvement. As a result, the performance and acceptance of state-of-the-art recommender systems are far from satisfactory. Adomavicius and Tuzhilin [3] suggest various areas of improvement for current recommender systems: better methods for representing user behavior and information about items; more advanced recommendation modeling methods; incorporation of contextual information into the recommendation process; utilization of multi-criteria ratings; development of less intrusive and more flexible recommendation methods; and development of measures of recommender system effectiveness.

A. The Problem

Representing and reasoning about the behavior of customers and the features of items raise a number of challenging issues. Attributes of items and user behavior are subjective, vague, and imprecise. These in turn induce uncertainty in representing and reasoning about items' features, users' behavior, and their relationships. This uncertainty is non-stochastic: it is induced by subjectivity, vagueness, and imprecision in the data, the domain knowledge, and the task under consideration.

In relation to the items, the uncertainty concerns the extent (e.g., low to high) to which the items have certain features. For instance, to what extent does a given movie have drama content, or is it highly dramatic? In relation to user behavior such as interest, the uncertainty concerns how to measure and represent user interest as precisely as possible. In relation to the domain task (recommendation of items), the uncertainty concerns the types of relationships that exist between user behavior and item features, among users in terms of their characteristics, and among items in terms of their features, as well as how to model these relationships. This inherent non-stochastic uncertainty has not been studied in previous recommender systems research [1, 4-8]. Ignoring this type of uncertainty places the modeling in a less realistic setting, and the resulting model does not precisely represent reality.

B. The Proposed Solution

Fuzzy set theory and logic provide a way to quantify the non-stochastic uncertainty that is induced by subjectivity, vagueness, and imprecision [9]. Compared with traditional statistical methods such as the Bayesian method, which handles uncertainty due to randomness, fuzzy theory handles uncertainty due to subjectivity, imprecision, and vagueness and has the following benefits [10, 11]: (i) the membership function in fuzzy theory is deliberately designed to treat vagueness and imprecision in the context of the application, so it is more reliable and accurate to use fuzzy theory to model subjectivity and vagueness in the attributes; (ii) the membership function can be continuous, which is more accurate in representing the attributes of items and user feedback; and (iii) the fuzzy mathematical method is easy to perform once the membership functions of the attributes have been defined. These properties promise to provide a framework for addressing the representation and inference challenges faced in recommender systems research.

Our research is motivated by the “reclusive methods” proposed by Yager [12] as complementary to conventional collaborative methods within a fuzzy set framework. Yager [12] presents a methodology consisting of a collection of justifications and rules for recommendations based on fuzzy sets and fuzzy logic. However, he assumes the availability of a representation scheme for each object that allows the development of an appropriate tool for calculating the similarity relationship over the set of objects. He also assumes the availability of a similarity index between two objects in the range 0 to 1. Furthermore, there is no empirical study to support or refute the effectiveness of using fuzzy modeling for recommender systems.

This research further develops and empirically evaluates a Fuzzy Theoretic Method (FTM) for recommender systems, using movie recommendation as the problem domain. In particular, the proposed approach is content-based: it uses features of items as background data and a user's feedback, such as ratings of items of the same kind, as input. One limitation of the current content-based approach is that it requires sufficient data about the behavior of a user and the features of items; its performance also depends on these data and on how they are represented and reasoned about.

Within the framework of fuzzy modeling theory, we first advance the representational method, recommendation strategies, and similarity measures for item recommender systems by accounting for the induced uncertainty in item features and user feedback, and by combining users' feedback with items' features. That is, we present an alternative representation scheme, inference mechanisms, and an item-item similarity based recommendation algorithm that together constitute FTM, which is intended to improve the performance of recommender systems by modeling the non-stochastic uncertainty discussed above. Second, we empirically assess the effect of FTM on the performance of a movie recommender system. Third, we empirically assess the effect of the variant inference mechanisms (combinations of the recommendation strategies and similarity measures) on the performance of a movie recommender system that uses FTM. Fourth, we study the effect of the training or model size, the recommendation size, and the test size on the performance of the movie recommender system.

C. The Results

Using benchmark movie data from the MovieLens recommender system, first, mean precision, recall, and F1-measure of 55.18%, 37.93%, and 38.08%, respectively, are obtained. For recommender systems where precision is the most important metric [13], FTM provides greater precision than the baseline Crisp Set based method (CSM), whose mean precision, recall, and F1-measure are 48.74%, 39.2%, and 38.32%, respectively. Second, the algorithm is scalable, with time complexity linear in the number of distinct values of the item features, which is very small (e.g., 20 for genres). Empirically, the mean model time is in the range 0.083 to 0.12 seconds across the different similarity measures, and the recommendation time is approximately 0.021 seconds per movie.

Third, a model size (the number of items for which a user initially needs to provide feedback) of 5 to 15 and a recommendation size (the number of items recommended at one time) of 3 to 10 produce satisfactory recommendation accuracy. Fourth, the analysis shows that the testing size has a significant effect on performance: precision increases as the testing size increases, whereas recall decreases. Performance is greater when the testing size is between 41 and 60. Fifth, the results of the analysis of variance show that there are significant differences in recommendation accuracy and latency among the alternative combinations of fuzzy theoretic similarity measures and recommendation strategies.

D. Conclusion and Contributions

From the results we can conclude that modeling uncertainty using fuzzy sets and fuzzy logic improves the performance of content-based recommender systems. The contributions of this paper are manifold:

a. The research provides a representation framework for features of items and users' feedback, based on fuzzy theory, to represent and reason about uncertainty due to subjectivity, imprecision, and vagueness. The research empirically shows that modeling non-stochastic uncertainty using fuzzy theory improves performance.

b. The research provides new algorithms for content-based item recommender systems.

c. Results of an exploratory study using various alternative fuzzy set theoretic similarity measures and fuzzy recommendation strategies for recommender systems are presented.

d. The research provides a guideline that will help recommender system designers choose a combination of a fuzzy theoretic recommendation strategy and a fuzzy theoretic similarity measure. Moreover, the research provides results of a detailed empirical study on the effect of these two factors as well as of model size, recommendation size, and testing size.

The remainder of the paper is organized as follows. Section II presents a review of related literature (see footnote 1). Section III presents the representation methods, inference methods, similarity measures, and algorithms for recommendation of items. Section IV describes the dataset, experimental settings, and evaluation metrics. Section V reports the results of the empirical experiment, followed by the discussion in Section VI. Finally, conclusions and future research directions are presented in Section VII.

1 Throughout the paper, we use the term item as a synonym for product, the term user as a synonym for customer and client, and the term feature as a synonym for attribute, characteristic, and variable.

II. RELATED LITERATURE

A. Recommender Systems

There are various classifications of recommendation methods. The three most widely used methods are content-based, collaborative filtering, and hybrid methods. In content-based recommendation, an item is recommended to a user mainly on the basis of the user's own past actions, such as purchases, queries, and ratings, and the characteristics of the item. In collaborative recommendation, an item is recommended to a user on the basis of the actions of other, similar users, such as their interests, preferences, and ratings. This approach employs a learning technique called collaborative filtering [4, 14-16]. Because of the limitations of content-based and collaborative filtering based recommendation, hybrid approaches have been studied [2]. Collaborative filtering cannot be applied to new items that have not been rated, or to a user who has not been assigned to a group (the new user or “cold start” problem). Content-based recommender systems cannot predict new user behavior, such as a new interest, because they are limited by what a user has seen in the past; they also require careful feature selection, representation, and inference [1, 3, 17].

Based on the sources of data and how these data are used for recommendation, Burke [2] classifies recommendation methods as collaborative, content-based, demographic, utility-based, and knowledge-based. There are also various hybrid methods that combine the single methods. These methods are discussed in detail, along with their pros and cons, in [2]. Moreover, a recent survey of state-of-the-art recommender systems, along with suggestions for improvement, can be found in [3].

[Insert, here content-based on movie recommender systems] – see dissertation


In content-based recommendation, standard machine learning techniques such as clustering, Bayesian networks, and induction learning are applied to form attribute-based models. However, Bayesian networks have been found suitable only for environments in which user preference models do not need to be updated rapidly or frequently, and clustering and decision (association) rule techniques usually produce less personal recommendations and less accurate results than nearest neighbor algorithms [15]. Examples of application systems that employ action-to-items based methods are Amazon Eyes and eBay Personal Shopper.

Several studies of collaborative filtering (CF) algorithms exist [6, 15, 18-21]. Deshpande and Karypis [6] develop item-based Top-N recommendation algorithms of the collaborative type that are faster than traditional user-user collaborative algorithms, with comparable recommendation recall. Moreover, evaluations of these CF algorithms for recommender systems on the MovieLens dataset are reported in terms of precision and F1 measure in [15, 19].

Research efforts that address the representation of, and reasoning about, user behavior and information about items under uncertainty due to subjectivity, incompleteness, imprecision, and vagueness are rare, e.g., [22-25]. These studies focus on the theoretical benefits of fuzzy sets and logic in data mining and on fuzzy clustering techniques, such as [22, 25-31]. They are also less applicable to recommender systems. Yager [12] presents a theoretical framework for constructing recommender systems using fuzzy modeling methods. This paper extends Yager's framework and empirically evaluates the methods for movie recommendation. Nasraoui and Petenes [24] empirically compared the performance of their Web page recommendations, produced by a fuzzy inference engine, with that of collaborative filtering and nearest-profile based Web page recommendation. They concluded that fuzzy methods provide higher coverage at low computational cost, are faster and require much less main memory, and are intuitive in dealing with the natural lack of clear boundaries in user interests.

B. Fuzzy Set and Fuzzy Logic

Fuzzy set theory consists of mathematical approaches that are flexible and well-suited to

handle incomplete information, the un-sharpness of classes of objects or situations, or the

gradualness of preference profiles [32]. Fuzzy set theory and logic provide a way to quantify the

uncertainty due to vagueness and imprecision [9]. Membership functions, a building block of

fuzzy sets, have possibilistic interpretation, which assumes the presence of a property and

compares its strength in relation to other members of the set.

A fuzzy set A in X is characterized by its membership function, which is defined as [33]: $\mu_A(x) : x \in X \rightarrow [0, 1]$, where X is a domain space or universe of discourse. Alternatively, A can be characterized by a set of pairs: $A = \{(x, \mu_A(x)),\ x \in X\}$. Depending on the context in which X is used and the concept to be represented, the fuzzy membership function $\mu_A(x)$ can have different interpretations [34]. As a degree of similarity, it represents the proximity between different pieces of information. For example, the membership of movie x in the fuzzy set of “drama movies” can be estimated by the degree of similarity. As a degree of preference, it represents the intensity of preference in favor of x, or the feasibility of selecting x as a value of X. For instance, a movie rating of 4 out of 5 indicates the degree of a user's satisfaction with, or liking of, x based on certain criteria, say movie attributes such as the content intensity of action, drama, and humor. These two interpretations are used in this research.

• About Fuzzy Logic and how it is used here (one paragraph).


III. FORMALISM OF THE REPRESENTATION AND INFERENCE METHODS

The proposed fuzzy theoretic content-based approach is based on a user's previous feedback, the features of the new items, and the features of the set of items for which the user has provided feedback. The rationale is that users are more likely to be interested in an item, such as a movie, that is similar to items they have experienced and liked. This approach is useful for a new item, such as a movie, with few or no user ratings. It relies solely on one user's previous interest as expressed by ratings. The representation scheme, the inference engine consisting of recommendation strategies and similarity measures, and the algorithm of the proposed method are presented in this section.

A. Items Representation using Fuzzy Set

For an item described by multiple attributes, more than one attribute can be used for recommendation. Moreover, some attributes can be multi-valued, with overlapping or non-mutually exclusive possible values. For example, movies are multi-genre and multi-actor [35]. The values of such multi-valued attributes of an item can be represented more accurately within a fuzzy set framework than within a crisp set framework.

Let an item $I_j$ (j = 1 … M) be defined in the space of an attribute $X = \{x_1, x_2, x_3, \ldots, x_L\}$; then $I_j$ can take multiple values such as $x_1, x_2, \ldots, x_L$. If these values of X can be sorted in decreasing order of their presence in the item $I_j$, expressed by degrees of membership, then the membership function of item $I_j$ to value $x_k$ (k = 1 … L), denoted by $\mu_{x_k}(I_j)$, can be obtained heuristically. Hence, a vector $X_j = \{(x_k, \mu_{x_k}(I_j)),\ k = 1 \ldots L\}$ is formed for $I_j$. $\mu_{x_k}(I_j)$ can be interpreted as the degree of similarity of $I_j$ to a hypothetical (or prototype) pure $x_k$ type of the item, or as the degree of presence of value $x_k$ in item $I_j$.

We use movies as the domain and movie genre as the attribute to formalize and apply the heuristic. According to the work by Cooper-Martin [36] on movie marketing, most movies are selected for pleasure and as an expenditure of time, and users choose movies based on what they like and enjoy. Furthermore, users rely more on subjective movie features such as “funny”, “romantic”, and “scary” (all of which are kinds of movie genres) to select movies than on objective features such as the director, theatre location, and admission price, which are useful but less important.

Genre is considered the major attribute for the representation of movies and for similarity computation for two reasons: movie genres describe the content of movies, and movies are multi-genre [35]; and analysis of descriptions of the main film genres shows that a genre g1 (e.g., action) and a genre g2 (e.g., adventure) overlap in terms of their subject matter and other movie attributes. According to the findings in [36], movies highly liked by users (rated 4 and above) can be grouped into similar categories by subjective features of movies such as genre and MPAA rating.

Given the definition of a movie in the space of genre (G), a movie can have one major genre, denoted by $x_1$, and one or more minor genres $x_2$, $x_3$, and so on, in decreasing order of their degrees of presence in the movie. The degree of membership of movie $I_j$ (j = 1 … M) in genre $x_k$ (k = 1 … N) is denoted by $\mu_{x_k}(I_j)$, which can be obtained heuristically from domain analysis. Hence, for $I_j$ we can form a vector $G_j = \{(x_k, \mu_{x_k}(I_j)),\ k = 1 \ldots N\}$. Since the membership function in fuzzy set theory is deliberately designed to treat vagueness and imprecision in the context of the application [10], we take the following heuristic approach to determine the degree of genre presence in movies:

Step 1: Sort $x_k$ in descending order of $\mu_{x_k}(I_j)$. In IMDb (www.imdb.com; see footnote 2) the genres of movie $I_j$ are presented in order of significance. For example, the movie ‘King Kong (2005)’ has Action as its major genre, Adventure as the 1st minor genre, Drama as the 2nd minor, Fantasy as the 3rd minor, and Thriller as the 4th minor genre.

Step 2: Assign higher degrees of membership to the more important genres of a movie. For instance,

If $I_j$ has only one genre, then $\mu_{x_1}(I_j) = 1$ and $\mu_{x_k}(I_j) = 0$ for all k = 2 to N.

If $I_j$ has two genres, then $\mu_{x_1}(I_j) = 1$, $\mu_{x_2}(I_j) = 0.7$, and $\mu_{x_k}(I_j) = 0$ for all k = 3 to N.

If $I_j$ has three genres, then $\mu_{x_1}(I_j) = 1.0$, $\mu_{x_2}(I_j) = 0.50$, $\mu_{x_3}(I_j) = 0.20$, and $\mu_{x_k}(I_j) = 0$ for all k = 4 to N.

Based on the heuristics illustrated for a movie, the possibility for item $I_j$ to take different values of X varies, and the membership function should meet the following four criteria: 1) it assigns a higher degree of membership to major values than to minor values; 2) it assigns 0 to values that are not associated with the item; 3) degrees of membership are normalized to the range [0,1]; and 4) the same value of X at the same rank position in different items has different degrees of membership if the numbers of values of X associated with the items differ. We represent this type of heuristic with a Gaussian-like fuzzy set membership function, as shown in (1).

$$\mu_{x_k}(I_j) = \frac{r_k}{2^{\sqrt{\alpha\,(r_k - 1)\,|L_j|}}} \tag{1}$$

2 “IMDb History,” vol. 2004: Internet Movie Database Inc., n.d. http://www.imdb.com/


where $|L_j|$ is the number of values of X associated with $I_j$, $r_k$ ($1 \le r_k \le |L_j|$) is the rank position of value $x_k$, and $\alpha > 1$ is a parameter used as a threshold to control the difference between consecutive values of X in $I_j$. Moreover, $\alpha$ is the only parameter that needs to be determined for the items under consideration.

The reciprocal function representation, defined as 1/k for k = 1 to N, is also investigated. It does not consider the total number of different genres in a movie, so it gives the same degree of membership to the same genre at the same rank position regardless of how many genres the movie has. We also consider a uniform distribution of genre membership in a movie: a movie with multiple genres is assumed to have an equal degree of presence for all occurring genres, represented by 1 or 0. This is the baseline Crisp Set representation of movies in the space of genres used in the empirical evaluation.
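The difference between the three candidate representations can be made concrete with a minimal Python sketch. It assumes that equation (1) reads $\mu = r_k / 2^{\sqrt{\alpha (r_k - 1) |L_j|}}$, a reading that reproduces the membership values quoted in this section for α = 1.2; the function names are illustrative and are not part of the original method.

```python
import math


def gaussian_like(rank, n_values, alpha=1.2):
    """Assumed reading of equation (1): depends on rank and on |L_j|."""
    return rank / 2 ** math.sqrt(alpha * (rank - 1) * n_values)


def reciprocal(rank, n_values):
    """Reciprocal representation 1/k; n_values is deliberately ignored."""
    return 1.0 / rank


def crisp(rank, n_values):
    """Crisp Set baseline: every genre present in the movie gets 1."""
    return 1.0


# Membership of the second-ranked genre in a 2-genre and a 5-genre movie.
for n_values in (2, 5):
    row = [round(f(2, n_values), 3) for f in (gaussian_like, reciprocal, crisp)]
    print(n_values, row)
```

Only the Gaussian-like function separates the two movies (criterion 4): the second-ranked genre receives a lower degree in the five-genre movie than in the two-genre movie, while the reciprocal and crisp representations return the same value in both cases.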

The type of function that is suitable depends on the application context; however, in certain cases the meaning captured by fuzzy sets is not very sensitive to variations in the shape. In practice, triangular, trapezoidal, Gaussian, S-function, Γ-function, and exponential-like functions are the most commonly used membership functions. Moreover, in practice, a suitable shape for the membership function is assumed a priori, and its parameters are determined by a domain expert or by machine learning techniques [37]. Therefore, the membership function (1) is used. Other membership functions with characteristics similar to (1) can also be considered.

For example, with α set to 1.2 (after various trials), the movie ‘King Kong (2005)’ is represented in terms of genres, for $|L_j| = 5$, as $X_j$ = {(Action, 1), (Adventure, 0.366), (Drama, 0.272), (Fantasy, 0.211), (Thriller, 0.168)}. Furthermore, for a user 7, Table 1 shows the representation of a sample of the rated movies. The validity and reliability of the representation and inference mechanism are further investigated in Section V using the movie dataset.
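The worked example above can be reproduced with a short Python sketch. It is an illustration under one assumption: that equation (1) has the form $\mu_{x_k}(I_j) = r_k / 2^{\sqrt{\alpha (r_k - 1) |L_j|}}$. The helper names `genre_membership` and `represent` are hypothetical.

```python
import math


def genre_membership(rank, n_genres, alpha=1.2):
    """Assumed form of equation (1) for the genre at 1-based `rank`."""
    if not 1 <= rank <= n_genres:
        raise ValueError("rank must lie between 1 and n_genres")
    return rank / 2 ** math.sqrt(alpha * (rank - 1) * n_genres)


def represent(genres_in_order, alpha=1.2):
    """Map genres, listed from major to minor, to rounded membership degrees."""
    n_genres = len(genres_in_order)
    return {
        genre: round(genre_membership(rank, n_genres, alpha), 3)
        for rank, genre in enumerate(genres_in_order, start=1)
    }


king_kong = represent(["Action", "Adventure", "Drama", "Fantasy", "Thriller"])
print(king_kong)
```

Genres that do not occur in the movie are simply absent from the returned dictionary, which corresponds to a membership degree of 0.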

Table 1: Membership degrees of movies to genres for a user 7

                        Gj (vector for jth movie)
Movie (Ij)   Rating   xj1     xj2     xj3     …   xj15    xj16
56           4        0.683   1.000   0.438   …   0.000   0.000
79           5        1.000   0.000   0.000   …   0.467   0.000
89           3        0.683   0.000   0.000   …   0.000   1.000
..           ..       ..      ..      ..      …   ..      ..
254          2        0.438   0.000   1.000   …   0.000   0.000
1016         1        0.000   0.000   1.000   …   0.000   0.000

The representation scheme can be extended to recommender systems based on a combination of multiple attributes. For example, one can use movie genre, describing the content of movies, as the first attribute and actresses/actors as the second attribute. The actors in a movie can be represented by a vector $A = \{a_1, a_2, \ldots, a_K\}$ for K actors. The degree of role or importance of an actor or actress $a_k$ in a movie $m_j$ can be represented by a degree of membership in the fuzzy set degree of role or importance. That is, $A_j = \{(a_k, \mu_{a_k}(m_j)),\ k = 1 \ldots K\}$, where $\mu_{a_k}(m_j)$ can be determined heuristically. Only a single attribute of an item and a user's feedback on the items are considered in this paper.

The representation scheme presented for movies can be generalized and applied to any item with similar characteristics, i.e., multi-valued attributes that can be ordered. A few examples are music, TV shows, and books. The heuristic that leads to the membership function in (1) is based on our analysis of the movie dataset, the literature on movies [36], and a limited experiment on α. This heuristic assumes, for instance, that two genres will not have an equal degree of presence in a movie. This assumption requires further research. Moreover, future research should study various membership functions and find an optimal α through evolutionary computing.

B. User Feedback Representation using Fuzzy Set

User rating is the most widely used form of user feedback in recommender systems. It is a proxy variable for measuring a user's degree of interest in an item. In popular collaborative algorithms (e.g., [15], [6]) user ratings are represented and interpreted as binary values: liked or disliked. On a 5-point scale, ratings above 3 are considered liked. However, user rating is intrinsically imprecise, because a user may give different ratings to the same item at different times and in different situations, owing to the difficulty of distinguishing between ratings 4 and 5, or 1 and 2. Moreover, the same rating, say 4 on a scale of 5, given by two users does not necessarily imply equal degrees of interest in an item. For a pessimistic rater, a rating of 4 may mean strongly liked, but for an optimistic rater it may mean somewhat liked. Is the difference between ratings 3 and 4 the same as the difference between 4 and 5? All of these contribute to fuzziness that arises from human thinking processes rather than from the randomness associated with experiments.

Therefore, user interest based on user rating is treated as a fuzzy variable, and its uncertainty is represented using a possibility distribution function ($\pi$). $\pi$ is defined to be numerically equal to the membership function [10]. Let the fuzzy variable degree of interest in an item (DI) take the fuzzy values Strongly Liked (SL), Liked (L), Indifferent (I), Disliked (D), and Strongly Disliked (SD), and let it be associated with the user rating (R), expressed on a continuum from a minimum value (Min) to a maximum value (Max). Then the proposition ‘a user has strongly liked an item I’ has the possibility distribution function $\pi_R(I) = \mu_{SL}(R = r)$, for r between Min and Max. Under this interpretation, or semantics, of the fuzzy variable DI, user rating is represented and reasoned about using the possibility distribution function by treating the rating as a fuzzy number. For instance, a rating of 4 on a 5-point scale, which refers to strongly liked, is represented in terms of its possibility distribution function values $\{\mu_{SL}(R=4)/4,\ \mu_{L}(R=4)/4,\ \ldots,\ \mu_{SD}(R=4)/4\}$. Without loss of generality, a half triangular fuzzy number, which is the simplest model of an uncertain quantity, is used to represent the degree of positive experience a user has in relation to an item. Equation 2 defines the half triangular fuzzy number membership function for a user rating $r \in [\mathit{Min}, \mathit{Max}]$ on item $I_i$, where A is a fuzzy set on DI:

µA(Ii) =(r - Min)/(Max – Min) (2)

As a result, the set of items liked by a user, denoted by E, is defined as {Ii : µA(Ii) > 0.5}, i.e., {Ii : r > (Min+Max)/2}. Because of current practice and the available datasets, the range of ratings is restricted to 1 to 5. Extending the possible range of ratings, e.g., from -100 to 100, and representing them within the possibility theory framework are expected to produce more accurate measurement and representation of user interest in an item. These effects in turn are expected to improve the accuracy of recommender systems, which needs further study.
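Equation (2) and the definition of the liked set E translate directly into code. The following Python sketch is illustrative only: the function names are hypothetical, and the sample ratings are the five (movie, rating) pairs listed for user 7 in Table 1.

```python
def liking_degree(rating, r_min=1, r_max=5):
    """Equation (2): half triangular membership of a rating in the fuzzy set A."""
    if not r_min <= rating <= r_max:
        raise ValueError("rating outside [r_min, r_max]")
    return (rating - r_min) / (r_max - r_min)


def liked_set(ratings, r_min=1, r_max=5):
    """The set E: items whose membership exceeds 0.5, with that membership."""
    degrees = {
        item: liking_degree(rating, r_min, r_max)
        for item, rating in ratings.items()
    }
    return {item: mu for item, mu in degrees.items() if mu > 0.5}


# Movie id -> rating for user 7, taken from Table 1.
ratings_user7 = {56: 4, 79: 5, 89: 3, 254: 2, 1016: 1}
print(liked_set(ratings_user7))
```

On the 1 to 5 scale a rating of 3 maps to exactly 0.5 and is therefore excluded from E, which matches the condition r > (Min+Max)/2. The same functions work unchanged for a wider scale, for example `liking_degree(0, -100, 100)`.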

C. Inference Engine and Algorithm

Based on the representation scheme defined for items and for user feedback on these items, the inference engine, consisting of the recommendation strategies and the similarity measures, is defined.

1) Recommendation Strategies

There are different alternatives for inference and for making recommendations within the fuzzy and possibility theory framework [12]. The two alternatives considered here take into account both user feedback, such as previous user ratings, and the similarity between previously liked items and a new item.

a) Weighted-Sum recommendation method

For each targeted item $I_j$, calculate the weighted-sum recommendation confidence as:

$$R_1(I_j) = \sum_{I_k \in E} \mu_E(I_k)\, S(I_k, I_j) \tag{3}$$

E is the set of items positively experienced (liked) by the user, and $\mu_E(I_k)$ is the membership of the item $I_k$ in the fuzzy set E. Furthermore, $S(I_k, I_j)$ is the similarity between $I_j$ and $I_k$, computed using the techniques defined next. A normalized evaluation score for each $I_j$'s recommendation confidence is obtained using $NR_1(I_j) = R_1(I_j) / \max_k [R_1(I_k)]$.

b) Maximum-Min or Maximum-Product recommendation methods

For each targeted item $I_j$, calculate the max-min recommendation confidence as:

$$R_2(I_j) = \max_{I_k \in E} \left[ S(I_j, I_k) \wedge \mu_E(I_k) \right]$$

where $\wedge$ is substituted by the min operator, which results in:

$$R_2(I_j) = \max_{I_k \in E} \left[ \min\left( S(I_j, I_k),\ \mu_E(I_k) \right) \right] \tag{4}$$

A normalized evaluation score can be obtained by dividing each $I_j$'s recommendation confidence by the maximum of the recommendation confidence scores. Finally, the resulting recommendation confidence score $R_k(I_j)$ is a degree of support for recommending $I_j$, and can be interpreted as how much the recommender system assumes this user will like the item, among the set of items that the user has not experienced.
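The two recommendation strategies can be sketched in a few lines of Python. The sketch assumes that equation (3) is the sum over liked items of $\mu_E(I_k)\,S(I_k, I_j)$ and that equation (4) is the maximum over liked items of $\min(S(I_j, I_k), \mu_E(I_k))$. The similarity values and the candidate labels "A" and "B" below are made up for illustration; they are not results from the paper.

```python
def weighted_sum(similarity_to_liked, liked):
    """Equation (3): similarities to liked items, weighted by mu_E."""
    return sum(liked[k] * s for k, s in similarity_to_liked.items())


def max_min(similarity_to_liked, liked):
    """Equation (4): best liked item, each capped by min(similarity, mu_E)."""
    return max(min(s, liked[k]) for k, s in similarity_to_liked.items())


def normalize(scores):
    """Divide every confidence score by the largest one."""
    top = max(scores.values())
    return {j: (v / top if top > 0 else 0.0) for j, v in scores.items()}


# mu_E for two liked movies (ratings 4 and 5 on a 1-5 scale under equation (2)).
liked = {56: 0.75, 79: 1.0}

# Hypothetical similarities S(I_k, I_j) between two unseen movies and E.
similarities = {
    "A": {56: 0.8, 79: 0.3},
    "B": {56: 0.2, 79: 0.6},
}

r1 = {j: weighted_sum(s, liked) for j, s in similarities.items()}
r2 = {j: max_min(s, liked) for j, s in similarities.items()}
print(normalize(r1))
print(normalize(r2))
```

The weighted sum accumulates evidence from every liked item, whereas max-min is driven by the single best-matching liked item, so the two strategies can rank candidates differently on other inputs even though they agree here.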

2) Fuzzy Theoretic Similarity Measures

One of the most important issues in recommender systems research is computing similarity between users and between objects (items, events, etc.). This in turn depends heavily on the appropriateness and accuracy of the methods of representation.

In the fuzzy set and possibility framework, the similarity of users or items is computed from the membership functions of the fuzzy sets associated with the users' or items' features. Similarity is studied and applied in taxonomy, psychology, and statistics. Similarity is subjective and context dependent. Set-theoretic, proximity-based, and logic-based measures are the three classes of similarity measures [38]. Based on the results of the study by Cross and Sudkamp [38], those measures that are relevant to item recommendation are adapted.

$I_j$ and $I_k$ are defined as $\{(x_i, \mu_{x_i}(I_j)),\ i = 1 \ldots N\}$ and $\{(x_i, \mu_{x_i}(I_k)),\ i = 1 \ldots N\}$; they represent the possibility distributions of the items $I_j$ and $I_k$ in the space $X$ with cardinality $N$. A similarity measure between $I_j$ and $I_k$ is denoted by $S(I_k, I_j)$. The similarity measures that are considered are: Fuzzy Set Theoretic in (5), Fuzzy Theoretic Cosine in (6), Fuzzy Theoretic Proximity (Minkowski's distance based) in (7) and Fuzzy Theoretic Correlation-like in (8).

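$$S_1(I_k, I_j) = \frac{|I_k \cap I_j|}{|I_k \cup I_j|} = \frac{\sum_i \min\left\{\mu_{x_i}(I_k),\ \mu_{x_i}(I_j)\right\}}{\sum_i \max\left\{\mu_{x_i}(I_k),\ \mu_{x_i}(I_j)\right\}} \tag{5}$$

$$S_2(I_k, I_j) = \frac{\sum_i \mu_{x_i}(I_k) \cdot \mu_{x_i}(I_j)}{\sqrt{\sum_i \left(\mu_{x_i}(I_k)\right)^2}\ \sqrt{\sum_i \left(\mu_{x_i}(I_j)\right)^2}} \tag{6}$$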

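$$S_3(I_k, I_j) = 1 - \frac{d_2(I_k, I_j)}{\sum_i \max\left\{\mu_{x_i}(I_k),\ \mu_{x_i}(I_j)\right\}} \tag{7}$$

Where

$$d_2(I_k, I_j) = \left( \sum_{i=1}^{N} \left( \mu_{x_i}(I_k) - \mu_{x_i}(I_j) \right)^2 \right)^{1/2}$$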

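$$S_4(I_k, I_j) = 1 - \frac{4}{Z_{I_k} + Z_{I_j}}\, d_2^2(I_k, I_j) \tag{8}$$

Where

$$Z_{I_k} = \sum_i \left( 2 \cdot \mu_{x_i}(I_k) - 1 \right)^2, \qquad Z_{I_j} = \sum_i \left( 2 \cdot \mu_{x_i}(I_j) - 1 \right)^2$$

and $d_2(I_k, I_j)$ is the distance used in (7).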

As a baseline for assessing the effect of the fuzzy representation scheme and recommendation strategy, a similarity measure on item features (e.g. genre) represented as crisp sets, used together with user feedback (e.g. rating), is defined in (9). It is called the Crisp Set Based Similarity Measure (CSM). In (9), $\mu_{x_i}(I_k)$ is 1 if the feature value $x_i$ exists in $I_k$; otherwise $\mu_{x_i}(I_k)$ is 0. (5) and (9) are identical in all respects except for the representation scheme. Moreover, in (5) and (9) the intersection operator is replaced by minimum, the union operator is replaced by maximum, and the cardinality measure $|\cdot|$ is replaced by the sum of the membership degrees.

$$S_5(I_k, I_j) = \frac{|I_k \cap I_j|}{|I_k \cup I_j|} = \frac{\sum_i \min\left\{\mu_{x_i}(I_k),\ \mu_{x_i}(I_j)\right\}}{\sum_i \max\left\{\mu_{x_i}(I_k),\ \mu_{x_i}(I_j)\right\}} \tag{9}$$

For example, the movies I1 = Copycat (1995): Crime/Mystery/Thriller/Drama and I2 = Grudge (2004): Horror/Thriller/Mystery, with their crisp (I1 and I2) and fuzzy (G1 and G2) representations, are shown in Table 2. Using the crisp set theoretic (9), crisp cosine, and crisp distance measures, the similarities between I1 and I2 are 0.50, 0.58, and 0.73 (|1 – distance|), respectively. Finally, using the fuzzy theoretic similarity measures (5), (6), (7), and (8), the similarity coefficients are 0.24, 0.25, 0.55 and 0.18. The difference in the representation leads to the variation in the similarity measures between I1 and I2. The effect of these differences in similarity measures on recommender systems' accuracy is evaluated in Section V.

Table 2: Movies Representation in Space of Genres

Movie | Crime | Horror | Mystery | Thriller | Drama
I1 | 1 | 0 | 1 | 1 | 1
G1 | 1 | 0 | 0.44 | 0.35 | 0.29
I2 | 0 | 1 | 1 | 1 | 0
G2 | 0 | 1 | 0.41 | 0.47 | 0
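The fuzzy rows of Table 2 can be used to check two of the reported coefficients. The following is a minimal sketch, assuming that (5) is the ratio of summed minima to summed maxima and that (6) is the usual cosine applied to membership vectors; the function names are illustrative and are not part of the original implementation.

```python
import math

# Fuzzy genre memberships from Table 2
# (order: Crime, Horror, Mystery, Thriller, Drama).
G1 = [1.0, 0.0, 0.44, 0.35, 0.29]   # Copycat
G2 = [0.0, 1.0, 0.41, 0.47, 0.0]    # Grudge


def fuzzy_jaccard(a, b):
    """Fuzzy set-theoretic similarity: sum of minima over sum of maxima."""
    den = sum(max(x, y) for x, y in zip(a, b))
    return sum(min(x, y) for x, y in zip(a, b)) / den if den else 0.0


def fuzzy_cosine(a, b):
    """Cosine similarity computed on the membership vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


print(round(fuzzy_jaccard(G1, G2), 2))  # 0.24, the value reported for (5)
print(round(fuzzy_cosine(G1, G2), 2))   # 0.25, the value reported for (6)
```

Both functions guard against all-zero vectors, which is an added safety choice rather than something stated in the text.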

3) Algorithm and Computational Complexity

The general algorithm is presented in Figure 1. First, the algorithm takes the items for which the user has expressed a positive degree of interest or preference; it then finds similar items that the user has not yet known, experienced or possessed. Using the representation schemes, the various similarity measures and the inference strategies, several algorithms are designed and implemented. The evaluation of these algorithms for a simulated movie recommender system is presented in Section V.

i) Compute Recommendation Scores for targeted items
// Let Ij and Ik be defined as {(xi, µxi(Ij)), i = 1…N} and {(xi, µxi(Ik)), i = 1…N}
// Input: user feedbacks (RI) = Vector of <item id, attributes' values xi, feedbacks on the item>
// Input: Items available for recommendation (PI) = Vector of <item id, attributes' values>
// Input: training size (R), potential number of items for recommendation (P)
// Output: Degree of memberships of an item to values of an attribute
// Output: Degree of a user's interest in an item as recommendation confidence score
// Output: NRCS = <user id, item id, recommendation score, normalized score>
// E = Recent items with positive feedback by a user, defined as: {Ii : r > (Min+Max)/2}
// where A is the set Strongly Liked (SL) or Liked (L).
For each user do {
    E ← Select R items with positive feedback from RI
    // For R items with positive feedback from RI
    For k = 1 to R do {
        Ik ← Load the kth item from E
        For ∀xi, compute µxi(Ik)    // Fuzzification of attribute's content of an item using (1)
        Compute µA(Ik)              // Fuzzification of user feedback (rating) on an item using (2)
    } next item
    // For all targeted items available for recommendation compute confidence scores
    For j = 1 to P do {
        Ij ← Load the jth item from PI
        For ∀xi, compute µxi(Ij)
        For k = 1 to R do {
            compute S(Ik, Ij)       // similarity between two items using any of (5) to (9)
            RCS(Ij) ← Compute Recommendation Score (µA(Ik), S(Ik, Ij)) using (3) or (4)
        }
    } next targeted item
    // Compute Normalized recommendation scores
    For ∀j, maxNR ← Maximum{RCS(Ij)}
    For j = 1 to P do { NRCS(Ij) ← RCS(Ij)/maxNR }
} next user

ii) TOP-N Recommendation
// Input: Normalized Recommendation Confidence Scores (NRCS)
// Output: Top-n items
For each user
    Sort NRCS by normalized recommendation score in descending order
    Select top n items
    Recommend the top n items

Figure 1: Algorithm for Item-Item Similarity Based Recommender System
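The steps of Figure 1 for a single user can be sketched in a few lines. This is an illustrative sketch only: it assumes the fuzzy set-theoretic similarity (sum of minima over sum of maxima) as the similarity measure, takes membership vectors as already fuzzified inputs, and uses hypothetical names (`recommend`, `liked`, `candidates`) that do not come from the original Java implementation.

```python
def fuzzy_jaccard(a, b):
    """Fuzzy set-theoretic similarity: sum of minima over sum of maxima."""
    den = sum(max(x, y) for x, y in zip(a, b))
    return sum(min(x, y) for x, y in zip(a, b)) / den if den else 0.0


def recommend(liked, candidates, strategy="weighted_sum", top_n=3):
    """Score, normalize and rank candidate items for one user.

    liked:      list of (membership_vector, mu_E) pairs for the liked set E
    candidates: dict mapping item id -> membership_vector
    """
    scores = {}
    for item_id, vec in candidates.items():
        # (similarity to liked item, membership of liked item in E)
        pairs = [(fuzzy_jaccard(liked_vec, vec), mu) for liked_vec, mu in liked]
        if strategy == "weighted_sum":
            scores[item_id] = sum(mu * s for s, mu in pairs)
        elif strategy == "max_min":
            scores[item_id] = max((min(s, mu) for s, mu in pairs), default=0.0)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    # Normalize by the largest score, then keep the top-n items.
    best = max(scores.values(), default=0.0)
    normalized = {i: (s / best if best > 0 else 0.0) for i, s in scores.items()}
    ranked = sorted(normalized.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]


# Hypothetical user who fully likes one movie with the G1 profile of Table 2.
liked = [([1.0, 0.0, 0.44, 0.35, 0.29], 1.0)]
candidates = {
    "same_profile": [1.0, 0.0, 0.44, 0.35, 0.29],
    "other_profile": [0.0, 1.0, 0.41, 0.47, 0.0],
}
top = recommend(liked, candidates, strategy="max_min", top_n=2)
print(top[0][0])  # same_profile
```

Passing a different similarity function in place of `fuzzy_jaccard` would give the other variants compared in the evaluation.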


Let M be the total number of items for which feedback is available from a user, e.g. 20 to 737 rated movies. Let N be the total number of unique values of the feature considered, e.g. 20 genres. Then, to compute the recommendation score of a single item (say a movie) for a user, the computational complexity of the algorithm is in the order of O(MN), because the target item is compared with each of the M rated items over N feature values. This computational complexity is greatly reduced compared with those of both user-user and item-based collaborative filtering algorithms, which have upper-bound computational complexities in the order of $O(n^2 m)$ and $O(m^2 n)$, respectively, for m items rated by n different users [6]. Moreover, since M*N is lower than m*n, the proposed algorithm is scalable and its time complexity is less than that of traditional collaborative algorithms.

IV. EVALUATIONS

The performance of the proposed method and algorithm is evaluated using movies as the items. We select movies as the test domain because movies have multi-valued attributes with vague and subjective features. For example, movies have multiple genres, multiple actors, etc. [35]. Moreover, the following features of movies support inference based on movie genres and users' past movie watching behavior [36]: (i) as experiential products, movies are mostly selected for pleasure and as a way of spending time, so consumers choose movies based on what they have liked and enjoyed before; and (ii) consumers use subjective features such as “funny”, “romantic” and “scary” (which reflect movie genres) to select movies more than objective features such as the director, actresses/actors, theatre location, and admission price.

Simulated recommender systems for movies are designed and developed to assess the effectiveness of the proposed methods. The simulation system works as follows:

a. For each user, it randomly splits the ratings dataset into a training set and a test set.

b. Using the training set, it computes the recommendation confidence score for each item in the test set using the different similarity measures and recommendation strategies (Figure 1).

c. For each user, using the movies in the test set, it computes the Top-N recommendations and the recommendation accuracy: precision, recall, and F1-measure. Moreover, the computational times for learning the user's preference and for presenting the recommendations are recorded.

d. Using different random assignments of the movies to the test and training sets, 10 different runs are executed to reduce sensitivity to sampling bias, and the average results are reported.
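Steps a and d of the procedure above can be sketched as follows. This is a simplified sketch under stated assumptions: the split is a plain random shuffle of one user's ratings, `evaluate` is a hypothetical callback standing in for steps b and c, and the fixed seed is only there to make the illustration repeatable.

```python
import random


def split_ratings(ratings, train_size, rng):
    """Randomly split one user's (item_id, rating) pairs into train and test."""
    shuffled = list(ratings)
    rng.shuffle(shuffled)
    return shuffled[:train_size], shuffled[train_size:]


def mean_over_runs(ratings, train_size, evaluate, runs=10, seed=0):
    """Average evaluate(train, test) over several random splits."""
    rng = random.Random(seed)
    results = []
    for _ in range(runs):
        train, test = split_ratings(ratings, train_size, rng)
        results.append(evaluate(train, test))
    return sum(results) / len(results)


# Hypothetical user with 30 rated movies; ratings cycle through 1..5.
ratings = [(f"movie{i}", 1 + i % 5) for i in range(30)]


def liked_share(train, test):
    """Toy evaluation: share of test movies rated 4 or 5."""
    return sum(r >= 4 for _, r in test) / len(test)


print(mean_over_runs(ratings, train_size=20, evaluate=liked_share))
```

Averaging over ten splits mirrors the reported protocol; any accuracy metric can be plugged in through `evaluate`.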

A. Dataset and Preprocessing

The benchmark dataset from MovieLens at the University of Minnesota

(http://movielens.umn.edu), which has been widely used in recommendation research, is

employed in this study. The dataset includes movie attributes, user ratings, and simple user

demographic information. The dataset consists of 100,000 ratings (1-5) from 943 users on 1682

movies; and each user has rated at least 20 and at most 737 movies. In the dataset, movies are

described with: movie id, movie title, release date, video release date, IMDb URL, and 20 genres

including action, adventure, animation, children's, comedy, crime, documentary, drama, fantasy,

film-noir, horror, musical, mystery, romance, sci-fi, thriller, war, western, family, and unknown.

Genres in the MovieLens dataset are represented with binary values, which do not reflect the

true content of movies in the genre space. Therefore, we use the proposed representation scheme

by incorporating information about movie genres from the Internet Movie Database (imdb.com),

which is a large database consisting of comprehensive information about past, present and

upcoming movies.


B. Evaluation Metrics

Accuracy metrics and latency are computed to measure the performance of the proposed method.

1) Accuracy metrics

Accuracy is a commonly used metric for a recommender system, based on user tasks or goals [13]. The accuracy metrics include predictive and recommendation accuracy measures. Predictive accuracy measures such as mean absolute error, mean square error and percentage of correct predictions are found to be less appropriate when the user task is to find ‘good’ items and when the granularity of the true value is small, because predicting a 4 as a 5 or a 3 as a 2 makes no difference to the user [39], [13]. Instead, recommendation accuracy metrics including recall, precision and F1-measure are more appropriate. Approximations to the true precision and recall are computed using movies for which ratings are provided and held out for testing. This approach to measuring performance is widely used in recommender systems research, e.g., [6], [19].

Precision measures the proportion of the recommendations made that are correct. Recall reflects the coverage or hit rate of the recommendations. The F1-measure is a single metric that combines precision and recall. They are defined as:

Precision = (number of movies in the Top-N movies with rating ≥ 4) / N

Recall = (number of movies in the Top-N movies with rating ≥ 4) / M

F1-Measure = (2*precision*recall) / (precision + recall)

where M is the total number of movies rated as interesting (rating ≥ 4) in the test cases, and N is the total number of movies recommended.
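The three definitions above translate directly into code. The sketch below assumes a ranked list of item ids and a dictionary of held-out test ratings; the function name and the sample data are hypothetical.

```python
def top_n_accuracy(ranked_items, test_ratings, n, liked_threshold=4):
    """Return (precision, recall, F1) of the top-n items of a ranked list."""
    top = ranked_items[:n]
    # Recommended movies that the user actually rated as interesting.
    hits = sum(1 for item in top if test_ratings.get(item, 0) >= liked_threshold)
    # All interesting movies in the test cases (M in the text).
    relevant = sum(1 for r in test_ratings.values() if r >= liked_threshold)
    precision = hits / len(top) if top else 0.0
    recall = hits / relevant if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


ranked = ["a", "b", "c", "d", "e"]                       # best score first
test_ratings = {"a": 5, "b": 3, "c": 4, "d": 2, "e": 5}  # held-out ratings
precision, recall, f1 = top_n_accuracy(ranked, test_ratings, n=3)
print(precision, recall, f1)
```

In this toy case two of the three recommended movies are liked, and two of the three liked movies are recovered, so all three metrics equal 2/3.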

2) Computational Time

Computational time refers to the time the system takes to provide a list of ‘good’ movies for a user. It consists of model time and recommendation time. Model time is the time to predict the degree of a user's interest in a movie. Recommendation time is the time to find the top-N ‘good’ movies for a user. Both are measured in milliseconds.


C. Experimental Design

The evaluation metrics defined above (precision, recall, F1-measure, model time and recommendation time) are the dependent variables. Table 3 presents the independent variables. For each of the dependent variables, a 2 x 5 x 6 x 12 factorial design is conducted. In this design the fixed factors are recommendation strategy, similarity measure, training size (number of training cases) and recommendation size (value of N for Top-N). The effect of testing size is also studied as a covariate. Moreover, normalization is applied to the recommendation scores; all statistical tests are done at the 5% level; and movies with ratings of 4 and 5 are considered as movies liked by users.

Table 3: Independent Factors

Independent Factor | Description | Possible values
Recommendation strategy | Fuzzy reasoning as recommendation strategies | Weighted-Sum or Maximum-Min
Similarity measure | Analogical reasoning using fuzzy set and possibility theory | Crisp Set-theoretic (see footnote 3), Fuzzy Set-Theoretic, Cosine, Proximity-based, and Correlation-like
Model size: the number of training cases (the K most liked movies by a user) | Movies rated by users | 5, 10, 15, 20, 25, 30
Recommendation size: Top-N | Total number of items to be recommended | 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50
Number of testing cases (number of items considered for recommendation) | Movies assumed not seen or experienced by a user | Total number of rated movies minus total number of movies considered for training
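The 2 x 5 x 6 x 12 design can be enumerated directly from the factor levels in Table 3. The sketch below uses hypothetical string labels for the strategies and similarity measures; only the level counts and numeric values come from the table.

```python
from itertools import product

# Factor levels as listed in Table 3 (labels are illustrative).
strategies = ["weighted_sum", "max_min"]
similarities = ["crisp_set", "fuzzy_set", "cosine", "proximity", "correlation_like"]
training_sizes = [5, 10, 15, 20, 25, 30]
recommendation_sizes = [3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50]

# Every combination of the four fixed factors is one experimental condition.
conditions = list(product(strategies, similarities, training_sizes, recommendation_sizes))
print(len(conditions))  # 720
```

Each of the 720 conditions would then be run on every random split, with testing size recorded alongside as the covariate.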

Simulation implementations and simulation runs are performed on a workstation with an Intel Pentium(R) 4 processor running at 2.66 GHz, 504 megabytes of memory, and the Windows operating system. The Java 2 programming language, the Sun ONE Studio 4 CE integrated development environment (IDE) for Java technology and Microsoft Access 2002 are used as software tools for the implementation and simulation runs of the movie recommender system.

3 This is used as a baseline to assess the effect of fuzzy representation and reasoning.

V. RESULTS

After the results of the reliability and validity tests, the mean performance of the proposed algorithms is presented first, along with the results of a comparative analysis between the Fuzzy Theoretic Method (FTM), which includes the Fuzzy Set-Theoretic, Cosine, Proximity-based, and Correlation-like similarity measures, and the baseline Crisp Set similarity-based method (CSM) defined in (9). Second, the results of the analysis of the effect of the different factors (similarity measure, recommendation strategy, training size, recommendation size and testing size) on the recommendation performance are presented. Finally, the results of a reasonable comparison between the performance of the proposed method and that of other methods that have used the MovieLens dataset are presented.

A. Validity and Reliability Analysis

The FTM uses movies recently liked by the user; its output depends on the degree of similarity between the targeted movie and those rated movies, as well as on the aggregation strategy. The similarities are based on the movies' degrees of membership to genres, under the assumption that users prefer movies with similar genres. Figure 2 visually presents the distribution of the genres by the three preference categories for User 7 (see sample data in Table 1). It shows that User 7 likes drama, romance and adventure; dislikes thriller; and is indifferent to comedy, action and crime.

[Figure 2 plot: mean degree of genre membership (vertical axis, 0.0 to 0.5) by preference category (disliked, liked, indifferent) for Drama, Comedy, Action, Thriller, Romance, Adventure, Animation and Crime]

Figure 2: Genre Preferences distribution for User 7: <age=57, Male, Administrator>

Based on the representation of movies' genres using (1), as indicated in Figure 2, we check the soundness of the representation by verifying the following assertion: users prefer movies of specific genres (e.g. Drama, Comedy and Family), with some degree of membership, over others (e.g. Action, Horror and Adventure). First, movies are grouped into three groups: group 1 consisting of movies with ratings 1 and 2 (labeled as disliked), group 2 consisting of movies with ratings 4 and 5 (labeled as liked), and group 3 consisting of movies with rating 3 (labeled as indifferent). Second, a one-way ANOVA is run. It shows that there are significant differences in the average degree of membership of movies to the different genres among the movies that users liked, disliked and were indifferent to, with the exception of Animation, Documentary, Musical and Science Fiction. Had genres occurred at random with respect to users' interest, the results would have been non-significant. The non-significant difference in the four genres is due to their rare occurrence in the movies database. Therefore, the representation scheme supports the assertion that users could prefer one or more genres with varying grades.

Since recommendations are made using recommendation scores, another way of checking the reliability of the representation and inference procedure is to test the correlation between user ratings and the recommendation confidence scores computed using (3) and (4) for each user. Using Spearman's rank correlation, significant bivariate relationships are found. This validates the soundness of the inference procedure.
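The rank correlation check can be illustrated with a small, dependency-free sketch. It computes Spearman's coefficient as the Pearson correlation of average ranks; the significance test used in the study is not reproduced, and the ratings and scores below are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1   # mean of 1-based positions start..end
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0                       # undefined when one side is constant
    return cov / math.sqrt(var_x * var_y)


ratings = [5, 4, 4, 2, 1]                    # hypothetical user ratings
scores = [0.95, 0.70, 0.60, 0.30, 0.10]      # hypothetical confidence scores
print(spearman(ratings, scores))
```

Average ranks are used because a 1 to 5 rating scale produces many ties, which a naive ranking would order arbitrarily.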

B. Overall Performance

The means of the recommendation accuracies by recommendation strategy and similarity measure are presented in Figure 3 (a) and (b). The maximum mean precision, recall and F1 measure for FTM are around 55%, 38%, and 38%, compared to 49%, 39% and 38%, respectively, for the CSM (Crisp Set Similarity based) approach. Figure 4 shows the average precision at different recall levels. Overall, an unsteady decline in precision is observed as recall increases, and the decline is consistent across all similarity measures. The Fuzzy Set and Cosine based approaches have better precision for a fixed recall.

[Figure 3 (a) plot: mean precision, recall and F1 (vertical axis, 0.00 to 0.60) for the Crisp Set, Fuzzy Set, Cosine, Proximity and Correlation similarity measures]

Figure 3 (a): Average/Mean recommendation accuracies by similarity measures for Weighted-Sum Recommendation Strategy

[Figure 3 (b) plot: mean precision, recall and F1-measure (vertical axis, 0.00 to 0.60) for the Crisp Set, Cosine, Fuzzy Set, Proximity and Correlation similarity measures]

Figure 3 (b): Average/Mean recommendation accuracies by similarity measures for Max-Min Recommendation Strategy

Figure 4: Average Precision by Recall for Weighted-Sum Approach

C. Effect of Recommendation Strategies

Figure 5 shows the effect of the recommendation strategy on performance. Precision is not affected by the recommendation strategy, but for recall and the F1 measure the Max-Min recommendation strategy yields improved results. Therefore, depending on the task, recommender system designers can choose the recommendation strategy that produces the best performance. For instance, the Weighted-Sum recommendation strategy would be a choice for the task of finding the top 10 “good” items because it results in higher precision with lower latency, and the Maximum-Minimum aggregation strategy would be a choice for the task of finding all “good” items because it results in higher recall with lower latency. Further analysis is performed in order to identify whether these differences are significant, and the results are presented in the following subsections.

Figure 5 (i) : Mean recommendation precision

Figure 5 (ii): Mean recall

Figure 5 (iii): Mean F1 measure

D. Comparison between Fuzzy-Theoretic Approaches and the baseline Crisp Set-Theoretic Approach

A pair-wise multiple comparisons test using the Bonferroni test (see footnote 4) is conducted between the mean performance of the Crisp Set theoretic approach and the varieties of Fuzzy theoretic approaches. The results are summarized as follows.

i. With the weighted-sum recommendation strategy, the average precision and F1-measure of the movie recommender systems driven by the fuzzy theoretic similarity measures are greater than those of the system driven by the crisp set-theoretic similarity measure (with the exception of the proximity similarity measure, whose average F1-measure is not significantly different). Moreover, there is no statistically significant difference between the two groups in their average recall, except in the case of the proximity-based similarity measure, whose average recall is lower.

ii. With the maximum-minimum recommendation strategy, the average precision of the movie recommender systems driven by the fuzzy-theoretic similarity measures is in every case greater than that of the system driven by the crisp set-theoretic similarity measure. Furthermore, their average recall is in every case less than that of the crisp-driven system. Finally, except for the cosine similarity measure, for which there is no significant difference, their average F1-measures are less than that of the crisp-driven system.

4 The Bonferroni test, based on Student's t statistic, adjusts the observed significance level for the fact that multiple comparisons are made. It is a commonly used multiple comparison test, and is more powerful for a small number of pairs [SPSS PC User Manual].

In summary, the fuzzy set theoretic measure and the fuzzy-based cosine, proximity and correlation similarity measures result in a greater mean precision than the crisp set theoretic similarity measure under both recommendation strategies. For recall, only comparable results are obtained under both recommendation strategies. For the F1-measure, better results are obtained with the weighted-sum recommendation strategy, except when the proximity similarity measure is used. With the maximum-minimum recommendation strategy, comparable results are obtained for the F1-measure.

E. Comparison among the Fuzzy-Theoretic Approaches

Furthermore, the performance differences among the fuzzy theoretic similarity based approaches are analyzed using the Bonferroni test. The results show significant differences in recommendation accuracy among the alternative combinations of fuzzy theoretic similarity measures and recommendation strategies. For precision, the fuzzy set theoretic and cosine similarity measures perform better for both inference strategies. For recall and the F1-measure, the correlation and cosine similarity measures perform better with the maximum-minimum inference strategy. In all cases, the distance-based similarity measure performs poorly. The ordered performance by inference strategy and similarity measure on the different metrics is presented in Table 4. For precision, Cosine and Fuzzy Set based are the better choices. For recall, Fuzzy Set based is the better choice when the recommendation strategy is weighted-sum, but Cosine is the better choice when the recommendation strategy is Max-min. For the F1 measure, Correlation-like and Cosine based are the better choices when the recommendation strategy is weighted-sum, but Cosine is the better choice when the recommendation strategy is Max-min.

Table 4: Summary of ordered performance by the inference strategies and similarity measures

Metric | Inference strategy | Order for movie recommendation
Precision | Weighted-Sum | Prox < Cos=FS=Corr
Precision | Maximum-Minimum | Prox < Corr=Cos=FS
Recall | Weighted-Sum | Prox < Cos < FS; FS=Corr; Cos=Corr
Recall | Maximum-Minimum | FS < Prox < Corr < Cos
F1-Measure | Weighted-Sum | Prox < Cos < Corr=FS
F1-Measure | Maximum-Minimum | Prox=FS < Corr < Cos

F. Model Size and Recommendation Size Sensitivity Analysis

Since similar patterns are obtained with both recommendation strategies, only the results from the weighted-sum recommendation strategy are presented. Overall, as shown in Figure 6 (i), (ii) and (iii), precision does not increase as the training size increases, and recall and the F1-measure do not increase significantly as the training size increases. The reason is that, once a sufficient number of items that indicate the preference of a user has been gathered, additional feedback on items improves neither the model of the user's preference nor the recommendation accuracies. Pair-wise mean comparison tests show no significant difference in performance by training size. A similar pattern is also reported in [6]. These results are consistent across all similarity measures and both recommendation strategies. Furthermore, overall, a training size of 5 to 15 appears to produce satisfactory recommendation accuracy.

Figure 6 (i): Means plot of recommendation accuracy metrics for Precision

Figure 6 (ii): Means plot of recommendation accuracy metrics for Recall

Figure 6 (iii): Means plot of recommendation accuracy metrics for F1-measure

Overall, as shown in Figure 7 (i), precision does not increase significantly as top-N increases. Hence a fuzzy theoretic based recommender system does not need to recommend a large number of movies to achieve better precision. However, recall increases as the number of recommended items increases. The results are consistent across all similarity measures and both recommendation strategies. Furthermore, overall, 3 to 10 recommendations appear to produce satisfactory recommendation accuracy. Moreover, the change in precision for the Crisp-based approach is greater than that for the FTM. This indicates that the former approach is more sensitive to recommendation size.


Figure 7 (i): Means plot of Precision for different similarity measures by top-N

Figure 7 (ii): Means plot of Recall for different similarity measures by top-N


Figure 7 (iii): Means plot of F1-measure for different similarity measures by top-N

G. Effect of Testing Size

Testing size is the actual number of items that are considered for recommendation, and from which a few relevant items will be recommended to a user. Given the large number of prospective items in a database, should a recommender system consider all or only a subset of these items to come up with a few top interesting items? Does the size affect the accuracy of the recommendation?

The univariate analysis shows that testing size has a significant effect on precision, recall and the F1-measure. As presented in Figure 8, for a marginal recommendation size, precision increases as testing size increases, whereas recall decreases as testing size increases. Approximately optimal performance seems to be obtained when the testing size is between 21 and 60. Further analysis is left for future research.


Figure 8: Mean recommendation accuracies by testing size

H. Computation Time

The performance of the simulated recommender system is analyzed in terms of its model time and recommendation time. The mean model time using the different similarity measures is in the range of 0.03 to 0.12 seconds for both the weighted-sum and the maximum-minimum recommendation methods. For top-10, the FTM mean recommendation time is 0.21 seconds for 10 movies, or approximately 0.021 seconds per item. Moreover, analysis of variance shows significant variations (sig. = 0.00) in mean model time by similarity measure, as shown in Figure 9. The mean model time of the Crisp-based method is lower than that of the FTM because the FTM has additional fuzzification and inference processes.


Figure 9: Means of Model Time for Weighted-Sum Recommendation Approach

I. Comparison with Related Conventional Approaches

Creating an equivalent basis for a fair comparison between the results of the proposed approach and the results of reported studies on movie recommendation using the same MovieLens dataset is not straightforward. For example, a comparison with collaborative filtering algorithms is unfair because our approach uses an additional item feature. Moreover, the experimental set-ups are different because they are not standardized. Notwithstanding these limitations, we present the results of approximate comparisons. We are not claiming to prove that our approach in its present state is superior to existing approaches. Rather, we are attempting to show the potential of this approach for recommender systems and to provide a framework for future research in this area.

Among other related successful studies using MovieLens, most reported results for top-5, top-10, top-15, and top-20 with model sizes of 20 to 50, as in [21], [15], [6], [19]. Results using a model size of 20 and a recommendation size of 10 are optimal in those studies. These results are summarized in Table 5 along with the equivalent results of the proposed FTM methods.

Probabilistic memory-based collaborative filtering [20] is another approach to modeling

uncertainties due to stochastic nature of preference. Evaluations of the approach showed that on

EACHMOVIE (ratings on movies) dataset, the mean precision, recall, and F1 measure of top-10

are 66%, 51%, and 57% respectively; and on JESTER (ratings on jokes) dataset, the mean

precision, recall, and F1 measure of top-10 are 40%, 47%, and 43% respectively. Other studies

that used both EACHMOVIE and MOVIELENS reported greater performance on EACHMOVIE

[40],[6]. For example, Deshpande & Karypis [6] reported 40% of top-10 hit rate for

EACHMOVIE compared to 26% for MovieLens dataset. Therefore, we expect that the

performance of the proposed algorithms should be better than probabilistic CF approaches on

these datasets. However, this assertion needs to be empirically verified in future research.

As indicated in Table 5, the results for the proposed approach show a greater average precision, comparable recall and a greater F1 measure. This improvement is attributed to the representation scheme and inference strategy used in the FTM.


Table 5: Summary of Performance Comparison

Approach | Recommendation size | Model size | Performance metric | Result
Proposed herein (FTM best results) | Top-10 | 20 | Precision | 52.7%
Proposed herein (FTM best results) | Top-10 | 20 | Recall | 28.4%
Proposed herein (FTM best results) | Top-10 | 20 | F1-measure | 31.6%
The Baseline (CSM) | Top-10 | 20 | Precision | 44.1%
The Baseline (CSM) | Top-10 | 20 | Recall | 32.91%
The Baseline (CSM) | Top-10 | 20 | F1-measure | 34.58%
Item-Based CF [6] | Top-10 | 20 | Recall | 27%
User-Based CF [6] | Top-10 | 20 | Recall | 28%
User-Based CF [4] | Top-10 | 50 | F1-measure | 23%
User-based (belief distribution and nearest-neighbor algorithm) [19] | Top-15 | 3:1 split | Precision | 23%

VI. DISCUSSION

The results of the proposed method are discussed with respect to the features used in the model and their representation, the recommendation approach, the similarity measure, the model size, the recommendation size, and the testing size, along with their effects on performance.

Recommender systems based on the baseline crisp set theoretic similarity measure (9), which use genres and ratings represented as crisp sets, result in lower precision than recommender systems based on the fuzzy theoretic similarity measures, which use genres and ratings represented as fuzzy sets. For FTM, the precision increases by approximately 19.50% (from 44.1% to 52.7%, Table 5). Precision is improved because the inclusion of the item feature with the fuzzy theoretic representation and inference mechanism reduces the noise in the recommended items.
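The relative improvement quoted above follows directly from the two Top-10 precision values in Table 5; the short derivation below only restates that arithmetic.

```latex
\[
\frac{52.7 - 44.1}{44.1} = \frac{8.6}{44.1} \approx 0.195 = 19.5\%
\]
```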

Recall is not improved, and further investigation of the other factors reveals that training size is a moderating factor (see Figure 6 (ii)). For training sizes of 10 to 25, the FTM recalls are greater than that of CSM. Further research is needed to find out why the recall for CSM is larger at a training size of 30, and whether this continues for training sizes above 30. Moreover, compared to precision, recall is highly affected by Top-N, testing size and training size (see Figures 5, 6 and 7).

In recommender systems, the focus is on finding ways to improve precision rather than recall because the system can only recommend a limited number of items, usually 5 to 20. Therefore, careful modeling of the features of items and of user behavior using fuzzy theory has improved the precision with comparable recall.

Discussion about approach and similarity measures: {follow here!!} w.r.t other methods

Overall, the max-min recommendation strategy produces better performance in terms of recall

and F1 measure. For precision, there is no significance difference between the two approaches.

Why?

Overall, the Cosine and Fuzzy Set similarity measures produce better performance. The result

is consistence with previous results. Why FS and cosine???

The movie-by-genre matrix that is used to compute similarity is sparse. Nearly 50%, 34%, 13%, 3%, 0.7% and 0.3% of the movies in the data set have 1, 2, 3, 4, 5 or 6 genres, respectively. With a vector of 20 genres per movie, most of the values are zero. This sparsity mostly affects the proximity and correlation-like similarity measures. The cosine and fuzzy set based similarity measures are much less affected, for the reasons given below.
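
The degree of sparsity implied by these figures can be estimated directly. The sketch below uses only the percentages and the vector length of 20 quoted above; because the percentages are rounded and sum to slightly more than 100%, they are renormalized first, which is an assumption of this sketch.

```python
# Share of movies having k genres, as quoted in the text (rounded values).
genre_count_share = {1: 0.50, 2: 0.34, 3: 0.13, 4: 0.03, 5: 0.007, 6: 0.003}
NUM_GENRES = 20  # length of the genre vector stated in the text

# Renormalize because the rounded shares do not sum to exactly 1.
total_share = sum(genre_count_share.values())
mean_genres = sum(k * s for k, s in genre_count_share.items()) / total_share

density = mean_genres / NUM_GENRES  # expected fraction of nonzero entries
print(f"Mean genres per movie: {mean_genres:.2f}")
print(f"Nonzero entries: {density:.1%}, zero entries: {1 - density:.1%}")
```

On these figures a movie has fewer than two genres on average, so roughly nine out of ten entries in each genre vector are zero.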

In the case of the proximity measure, the similarity coefficient can be nonzero even when there is no common element between two objects. It is appropriate in applications where we want to know the "nearest" fuzzy value even when the system does not find any fuzzy value with a non-empty intersection. This means that fuzzy sets with a non-empty intersection can have the same similarity as fuzzy sets with an empty intersection. The sparsity of the movie-by-genre matrix therefore has an undesirable effect on the distance-based similarity measures, and the experimental results support this. The correlation-like measure produces the next worst performance because it has characteristics similar to those of the proximity measure. Unlike the distance-based measure, however, it is a weighted squared Euclidean distance.
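
This behavior can be reproduced on sparse genre vectors with a small sketch. The proximity measure is assumed here to be one minus the normalized Hamming distance, and the fuzzy set measure is assumed to be the fuzzy Jaccard index (sum of minima over sum of maxima); both are illustrative stand-ins and may differ from the exact FTM definitions. The three movies are hypothetical.

```python
NUM_GENRES = 20  # length of the genre vector stated in the text


def genre_vector(memberships):
    """Build a genre membership vector from {genre index: degree}."""
    vector = [0.0] * NUM_GENRES
    for index, degree in memberships.items():
        vector[index] = degree
    return vector


def proximity_similarity(a, b):
    """Assumed form: 1 minus the mean absolute difference of memberships."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def fuzzy_jaccard(a, b):
    """Assumed form: cardinality of the intersection over that of the union."""
    union = sum(max(x, y) for x, y in zip(a, b))
    return sum(min(x, y) for x, y in zip(a, b)) / union if union else 0.0


movie_a = genre_vector({0: 1.0})
movie_b = genre_vector({5: 1.0})                   # shares no genre with movie_a
movie_c = genre_vector({0: 1.0, 1: 1.0, 2: 1.0})   # shares one genre with movie_a

for name, other in (("disjoint", movie_b), ("overlapping", movie_c)):
    print(name,
          round(proximity_similarity(movie_a, other), 3),
          round(fuzzy_jaccard(movie_a, other), 3))
```

Under these assumptions the disjoint pair and the overlapping pair receive the same proximity score, because the many shared zeros dominate the distance, while the fuzzy Jaccard index separates them.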

The cosine measure normalizes the inner product of the two vectors by their magnitudes, so it depends on the direction of the vectors rather than on their lengths. This measure captures a scale-invariant understanding of similarity and ignores the effect of proportional size differences. Furthermore, the fuzzy set based similarity measure is founded on set-theoretic operations on fuzzy sets, fuzzy set cardinality and the commonality of attributes. The sparsity of the movie-by-genre matrix therefore has less effect on the cosine and fuzzy set based similarity measures, and the experimental results support this.
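
The scale invariance of the cosine measure, and its insensitivity to shared zeros, can be seen in a minimal sketch. The standard vector cosine is used here as an assumed stand-in for its fuzzy theoretic extension, and the membership vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Inner product of a and b divided by the product of their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


strong = [1.0, 0.5, 0.0, 0.0]           # hypothetical genre memberships
weak = [0.5 * x for x in strong]        # same genres, proportionally weaker
disjoint = [0.0, 0.0, 1.0, 0.5]         # no genre in common with `strong`
padded = strong + [0.0] * 16            # same vectors with extra zero entries
padded_weak = weak + [0.0] * 16

print(cosine_similarity(strong, weak))
print(cosine_similarity(strong, disjoint))
print(cosine_similarity(padded, padded_weak))
```

Proportional vectors score approximately 1, vectors with no genre in common score 0, and padding both vectors with additional zeros leaves the score unchanged, which is the property that makes the measure robust to sparsity.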

The sensitivity analysis of model size, recommendation size and testing size shows that lower values produce satisfactory accuracy, which is consistent with previous results such as [1, 6, 15, 41]. This in turn supports the scalability of the algorithm for online recommendation.

Potential applications of the proposed methodology include recommender systems, information filtering, tutoring systems, and web mining. The successful application of the proposed method to a movie recommender system makes its extension to the recommendation of other items such as music, jokes and books in e-commerce applications highly promising. The application of the proposed method to information filtering, such as news filtering, is also

promising. Related work such as [18, 39, 42] can be extended to handle uncertainty due to imprecision and vagueness using the proposed methods and algorithms. For instance, the representation of users' news interests, the content of news and their relationships, as well as the computation of similarity and recommendation scores, can be done in the same way as for the movie recommender system.

VII. CONCLUSION AND FUTURE RESEARCH

This research develops a fuzzy theoretic methodology for recommender systems. Using actual

data on movies, the results of the simulation study provide empirical evidence that supports the

effectiveness of FTM in terms of higher recommendation accuracy, lower model size and

recommendation size, and lower latency – lower model time and recommendation time. Hence,

integrating user and item features, and using fuzzy and possibility theories as the foundation for

representing and reasoning about uncertainty contribute to the effectiveness and efficiency of the

proposed method for recommender systems.

Furthermore, the study shows that there are significant differences in recommendation accuracy among the alternative combinations of fuzzy theoretic similarity measures and recommendation strategies. Hence, depending on the task, recommender system designers can choose the combination of recommendation strategy and similarity measure that would produce optimal performance. A guideline is provided. Moreover, the study shows the effect of model size, recommendation size and test size on the recommendation accuracy and latency of an FTM-based recommender system. Roughly speaking, the performance of FTM is better than that of collaborative filtering. However, compared to collaborative filtering approaches that rely only on user feedback such as ratings, the proposed approach requires item features and domain knowledge. A promising development is the automatic identification of these features, e.g., movie

genres by exploiting audio-visual cues presented in the movies.

Further studies are planned to extend this approach in several directions. First, the inclusion of additional attributes, e.g., actors/actresses and directors for movies, is expected to improve the performance of the system. Second, a genetic algorithm can be used for learning and tuning the parameters of the membership function defined in (1), as well as for finding a better membership function. Third, the FTM approach will be tested with additional datasets and application domains to assess the generalizability of the results.

References

[1] J. B. Schafer, J. A. Konstan, and J. Riedl, "Electronic Commerce Recommender Applications," Journal of Data Mining and Knowledge Discovery, vol. 5, no.1/2, pp. 115-152, 2001.

[2] R. Burke, "Hybrid Recommender Systems: Survey and Experiments," User modeling and user-adapted interaction, vol. 12, no.4, pp. 331-370, 2002.

[3] G. Adomavicius and A. Tuzhilin, "Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no.6, pp. 734-749, 2005.

[4] B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Riedl, "Analysis of Recommender Algorithms for E-Commerce," Proc. ACM E-Commerce 2000 Conference, pp. 158-167, 2000.

[5] B. Mobasher, R. Cooley, and J. Srivastava, "Automatic Personalization Based on Web Usage Mining," Communications of the ACM, vol. 43, no.8, pp. 142-151, 2000.

[6] M. Deshpande and G. Karypis, "Item-Based Top-N Recommendation algorithms," ACM Transactions on Information Systems, vol. 22, no.1, pp. 143-177, 2004.

[7] R. Mukherjee, P. S. Dutta, and S. Sen, "MOVIES2GO: A new approach to online movie recommendation," Proc. International Joint Conference on Artificial Intelligence (IJCAI) Workshop on Intelligent Techniques for Web Personalization, 2001.

[8] R. Mukherjee, N. Sajja, and S. Sen, "A Movie Recommendation System - An Application of Voting Theory in User Modeling," User modeling and user-adapted interaction, vol. 13, no.1-2, pp. 5-33, 2003.

[9] L. A. Zadeh, "Fuzzy Logic, Neural Networks, and Soft Computing," Comm. ACM, vol. 37, no.3, pp. 77-84, 1994.

[10] S.-M. Hsu, C. Wu, and T.-W. Tien, "A Fuzzy Mathematical Approach for Measuring Multi-facet Consumer Involvement in the Product Category Evaluation," Marketing Research On-Line, vol. 3, pp. 1-19, 1998.

[11] P. Smets, "Theories of Uncertainty," in Handbook of Fuzzy Computation, E. Ruspini, P. P. Bonissone, and W. Pedrycz, Eds. Philadelphia, PA: Institute of Physics Publishing, 1999.

[12] R. R. Yager, "Fuzzy logic methods in recommender systems," Fuzzy Sets and Systems, vol. 136, pp. 133-149, 2003.

[13] J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl, "Evaluating Collaborative Filtering Recommender Systems," ACM Transactions on Information Systems, vol. 22, no.1, pp. 5-53, 2004.

[14] K.-W. Cheung, J. T. Kwok, M. H. Law, and K.-C. Tsui, "Mining customer product ratings for personalized marketing," Decision Support Systems, vol. 35, pp. 231-243, 2003.

[15] B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Riedl, "Item-based collaborative filtering recommendation algorithms," Proc. 10th International World Wide Web Conference (WWW10), pp. 285-295, 2001.

[16] G. Linden, B. Smith, and J. York, "Amazon.com recommendations: item-to-item collaborative filtering," IEEE Internet Computing, vol. 7, no.1, pp. 76-80, 2003.

[17] I. Zukerman and D. W. Albrecht, "Predictive Statistical Models for User Modeling," User Modeling and User-Adapted Interaction, vol. 11, pp. 5-18, 2001.

[18] J. A. Konstan, D. Maltz, J. Herlocker, L. Gordon, and J. Riedl, "GroupLens: applying collaborative filtering to Usenet news," Communications of the ACM, vol. 40, no.3, pp. 77-87, 1997.

[19] M. R. McLaughlin and J. L. Herlocker, "A Collaborative Filtering Algorithm and Evaluation Metric that Accurately Model the User Experience," Proc. 27th Annual ACM Conference on Research and Development in Information Retrieval, pp. 329-336, 2004.

[20] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H.-P. Kriegel, "Probabilistic Memory-Based Collaborative Filtering," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no.1, pp. 56-69, 2004.

[21] G. Karypis, "Evaluation of Item-Based Top-N Recommendation algorithms," Proc. 10th Conference of Information and Knowledge Management, 2001.

[22] T.-H. Hsu, K.-M. Chu, and H.-C. Chan, "The Fuzzy Clustering on Market Segment," Proc. The Ninth IEEE International Conference on Fuzzy Systems, pp. 621-626, 2000.

[23] U. Kaymak, "Fuzzy target selection using RFM variables," Proc. Joint 9th IFSA World Congress and 20th NAFIPS International Conference, pp. 1038-1043, 2001.

[24] O. Nasraoui and C. Petenes, "Combining Web Usage Mining and Fuzzy Inference for Website Personalization," Proc. Fifth WebKDD Workshop: Web mining as a Premise to effective and Intelligent Web Applications(WebKDD'2003), in conjunction with the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 37-45, 2003.

[25] M. Ozer, "User segmentation of online music services using fuzzy clustering," Omega: The International Journal of Management Science, vol. 29, no.2, pp. 193-206, 2001.

[26] S. Mitra, S. K. Pal, and P. Mitra, "Data Mining in Soft Computing Framework: A Survey," IEEE Transactions on Neural Networks, vol. 13, no.1, pp. 3-14, 2002.

[27] J. Basak and M. Kumar, "E-commerce and soft computing: scopes of research," IETE Technical Review, vol. 18, no.4, pp. 283-293, 2001.

[28] G. Meghabghab, "Mining User's Web Searching Skills through Fuzzy Cognitive State Map," 2001.

[29] M. Setnes and U. Kaymak, "Fuzzy modeling of client preference from large data sets: an application to target selection in direct marketing," IEEE Transactions on Fuzzy Systems, vol. 9, no.1, pp. 153-163, 2001.

[30] J. M. Sousa, U. Kaymak, and S. Madeira, "A comparative study of fuzzy target selection methods in direct marketing," Proc. 2002 IEEE International Conference on Fuzzy Systems, pp. 1251-1256, 2002.

[31] R. R. Yager, "Targeted E-commerce Marketing Using Fuzzy Intelligent Agents," IEEE Intelligent Systems, vol. 15, no.6, pp. 42-45, 2000.

[32] D. Dubois and H. Prade, "General Introduction," in Fundamentals of Fuzzy Sets, The Handbooks of Fuzzy Sets, H. Prade, Ed. Boston: Kluwer Academic Publishers, 2000, pp. 21-124.

[33] L. A. Zadeh, "Fuzzy Sets," Information and Control, vol. 8, pp. 338-353, 1965.

[34] T. Bilgiç and I. B. Turksen, "Measurement of Membership Functions: Theoretical and Empirical Work," in Fundamentals of Fuzzy Sets, Handbook of Fuzzy Sets and Systems, D. Dubois and H. Prade, Eds. Boston: Kluwer, 2000, pp. 195-232.

[35] R. Altman, Film/Genre. London: British Film Institute, 1999.

[36] E. Cooper-Martin, "Consumers and Movies: Some Findings on Experiential Products," Advances in Consumer Research, vol. 18, pp. 372-378, 1991.

[37] W. Pedrycz and F. Gomide, An Introduction to Fuzzy Sets. Cambridge, Massachusetts: The MIT Press, 1998.

[38] V. V. Cross and T. A. Sudkamp, Similarity and Compatibility in Fuzzy Set Theory: Assessment and Applications. Heidelberg; New York: Physica-Verlag, 2002.

[39] D. Billsus and M. J. Pazzani, "User Modeling for Adaptive News Access," User modeling and user-adapted interaction, vol. 10, no.2-3, pp. 147-180, 2000.

[40] D. O'Sullivan, B. Smyth, and D. Wilson, "Preserving Recommender Accuracy and Diversity in Sparse Datasets," International Journal on Artificial Intelligence Tools, vol. 13, no.1, pp. 219-235, 2004.

[41] A. I. Schein, A. Popescul, L. H. Ungar, and D. M. Pennock, "Methods and metrics for cold-start recommendations," Proc. 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2002), pp. 253 - 260, 2002.

[42] L. Ardissono, L. Console, and I. Torre, "On the application of personalization techniques to news servers on the www," in Proc. AI*IA 99, Lecture Notes in Computer Science. Springer Verlag, 2000.

Appendix

Table 4.3: Notation and Definition of Variables

Variable Name                   Description                                                             Possible Value(s)
------------------------------  ----------------------------------------------------------------------  ----------------------------------------------------
Approachid                      Recommendation method, which refers to the aggregation function used   Weighted-Sum or Max-Min
Similarityid                    Similarity measure                                                      Crisp Set, Cosine, Fuzzy Set, Proximity, Correlation
Userid                          User identification number                                              1 to 943
trainingsize or notrainingcase  Number of movies used as training cases                                 5, 10, 15, 20, 25 and 30
Topn                            Number of movies (N) to be recommended                                  3, 4, 5, 10, 20, …, 50
testingsize                     Number of movies used as testing cases                                  3 to 539
recomtime                       Time required for N recommendations                                     >0
precision45                     Precision as defined in Chapter 3                                       Between 0 and 1
recall45                        Recall as defined in Chapter 3                                          Between 0 and 1
f1ratio                         F1-measure as defined in Chapter 3                                      Between 0 and 1

FTM

CSM

Weighted-Sum

similarityid   Precision    Recall     F1-Measure
-------------  -----------  ---------  -----------
Crisp Set      0.48739784   0.320742   0.33334666
Cosine         0.55005312   0.316626   0.33781755
Fuzzy Set      0.55176256   0.321070   0.34300966
Proximity      0.53925867   0.298921   0.32270392
Correlation    0.54910064   0.318104   0.34346133

Maximum-Sum

similarityid   Precision    Recall     F1-Measure
-------------  -----------  ---------  -----------
Crisp Set      0.48366128   0.392206   0.38324557
Cosine         0.54998004   0.379255   0.38075219
Fuzzy Set      0.54798302   0.350619   0.37108524
Proximity      0.54347867   0.356230   0.37097099
Correlation    0.55098015   0.371140   0.37473446

Statistics for Top 10 and Training Size 20

approachid    similarityid  Statistic       precision45  recall45  f1ratio
------------  ------------  --------------  -----------  --------  ----------
Weighted-Sum  Crisp Set     Mean            .44090414    .238442   .27026395
                            Std. Deviation  .273553416   .2312662  .163949833
                            N               704          704       633
              Cosine        Mean            .52528494    .227816   .26415138
                            Std. Deviation  .260475409   .2241420  .148228242
              Fuzzy Set
              Proximity
              Correlation
Maximum-Sum   Crisp Set
              Cosine
              Fuzzy Set
              Proximity
              Correlation
Total                       N               3582         3582      3390
