The candidate confirms that the work submitted is their own and the appropriate credit has been given where reference has been made to the work of others.

I understand that failure to attribute material which is obtained from another source may be considered as plagiarism.

(Signature of student)

Assessing Personalisation Techniques on Anonymous Users in Major E-Commerce Systems

John Schaer

Computing

Session (e.g. 2005/2008)

Chapter 1

Introduction

This introductory chapter sets out the aims, objectives and scope of the project and what will be completed and delivered. The project schedule is also included.

1.1 Problem Statement

Personalisation features and techniques are only fully applicable to users who can be clearly defined

and compared by web-based systems and, although features exist that can operate on anonymous

users, they are not as efficient or integrated as their registered counterparts.

1.2 Aim

The aim of this project is to determine the extent to which personalisation techniques are applicable in

popular web-based systems operating on anonymous users. Using this information, enhancements to

these systems can be proposed.

1.3 Objectives

The objectives of the project are to:

• Identify the different personalisation techniques that exist for anonymous users.

• Determine the extent of applicable personalisation from a justified selection of web-based e-commerce systems.

• Evaluate the quality of the current customisation techniques.

• Suggest why, where possible, customisation techniques are not being used in the selected Web-based systems and suggest enhancements that can be made.

• Discuss the feasibility and practicality of using Web 2.0 features as a possible part of these

enhancements.

1.4 Minimum Requirements

The minimum requirements are:

• A report researching and evaluating personalisation techniques that have been integrated in a selected sample of web-based e-commerce systems, considering their performance on anonymous users. This will then be supplemented by an informed and valid in-depth discussion of the feasibility of using Web 2.0 to complement, or even in lieu of, current personalisation systems.

The possible extensions are:

• A technical software implementation of a discussed Web 2.0 feature, with an appropriate

justification of its usability and feasibility in terms of providing personalisation for anonymous users.

• An evaluation of the software implementation.

1.5 Consideration of Past Projects Within the School of Computing

There has been a range of past projects that have implemented personalisation techniques in web-based systems, and one past project that has evaluated and assessed personalisation techniques in current systems. However, none of these projects has considered anonymous users and how successful these techniques are for them. With the growth of e-commerce web-based systems and of anonymous users, this is a worthwhile and substantial problem to examine as the basis of a project.

1.6 Project Schedule

The final project schedule is shown overleaf (Fig. 1.1). It is the fourth iteration of the project

schedule. The original project schedule and the second and third iterations are included in appendices

B, C and D.

The original project schedule changed only slightly to become the second iteration after the Mid-Project Report was returned. This was to accommodate changes suggested by the assessor, notably to provide a possible software implementation. Although the inclusion of a software implementation might seem likely to change a project schedule drastically, the original schedule was perhaps too generous in its time allocations for the second half (Semester 2) of the project. The original schedule was followed and on track for the first half (Semester 1) of the project.

The second iteration was altered after difficulties in quantifying an assessment scheme that would justify the scope of the project. This involved focusing the project to include just e-commerce web-based systems. The third iteration reflected comments made in the project progress meeting.

The project schedule is reflected upon in Appendix A.

Chapter 2

Background Research

To understand the problems that are inherent with anonymous users and personalisation techniques in

web-based e-commerce systems it first must be understood what a personalisation technique is; how it

works, how it is implemented and why it has been implemented. This is important in order to properly

identify what personalisation techniques have been implemented in the web-based systems being

assessed and what limitations these techniques have.

The following research aims to explain this, as well as how anonymous users are defined in the literature, what research has been carried out in the personalisation field and what assessment criteria have been proposed.

2.1 Introduction to Personalisation

‘The personalisation technology is fast evolving; its use spreads quickly. In the years to come all Web

applications will embed personalisation components and this philosophy will be part and parcel of

everyday life tasks’ [20].

Personalisation of web-based systems refers to the provision of personalised content and

recommendations provided from the system to the user. It has enabled ‘mass customisation’ to

individual customers direct from the vendor and it can range from simple, subtle changes of the

content and presentation of a webpage to predicting and anticipating the needs and goals of a user

[23].

Personalisation and customisation systems have evolved significantly from check-box user applied

personalisation to more integral and intuitive techniques and methods [20]. Every user visiting a

website leaves a trace of their activities in a log file, showing the pages they have visited and their

click patterns [33]. Analysis of such data provides the system with information on how to personalise

or customise the system to cater for a particular type of user. This analysis can then be applied to the system through its content, structure or presentation, or any combination of these. It is because of these advances in the field that personalisation is able to occur intuitively for users who provide little to no personal information or preferences: anonymous users.

As Markellou et al. [20] describe, content personalisation typically produces additional information

for the user (for example, in an E-commerce context, other items/products they may be interested in),

structure personalisation changes the link structure of documents and web pages in a website

(providing or removing relevant links) and presentation personalisation changes the layout and format

of the system produced pages (for example, for use on PDAs or mobile phones). Thus the analysis of

data is only as useful as the subsequent delivery of information to the user [12]. If the user cannot

interpret the recommended information, as reliable as it may be, then it is useless.

2.2 Impetus for the Implementation of Personalisation

Survanashi et al. [34] write:

‘Understanding behaviours, characteristics and preferences of users is essential for satisfactory user

experience. This can help companies retain current customers, attract new customers, improve cross

marketing and sales...’

Through a better understanding of customers and their goals and intentions, web-based systems (particularly e-commerce systems) can better tailor their efforts to meet these needs, as well as going beyond them through marketing such as sales promotions and advertisements [28]. A highly personalised and customisable system also allows for a particular competitive advantage for e-commerce based systems [15][12], offering a better user experience [19] and, again, encouraging repeat visits and repeat buying from customers.

Kim [18] details that there is a correlation between the advent of social networking sites (such as Facebook and MySpace) and their provision of consumers to e-commerce sites. They are complementing the trustworthiness of recommender systems: whereas previously a person would be more influenced by a personal recommendation from someone they knew, they are now more prepared to believe a system’s recommendations.

Personalisation can also prove hugely beneficial for the owners of the systems. Particularly in an e-commerce domain, it allows for the quick testing and integration of new products and new marketing campaigns [19]. It is especially beneficial for web-based systems to try to incorporate personalisation for anonymous users, as it may encourage them to register and use the site again in future [34].

This is critically discussed and considered further in the development of the project’s methodology.

2.3 Knowledge Discovery

Knowledge Discovery is the process of extracting potentially valuable patterns from data to

retrieve behavioural content [31]. Web mining methods are an example of knowledge discovery.

Knowledge Discovery is complex, iterative and highly interactive [17].

The Knowledge Discovery Process is split into five main steps, as described by Schulz and Hahsler [11]. First is the selection of data: whether to use existing transaction logs or to collate data especially for the purpose of generating recommendations. The selected data is then pre-processed and cleaned of noise and inconsistencies before being transformed into a format suitable for data mining. The data can then be mined for significant, interesting patterns, after which these are interpreted in the context of the system they will be used for. Finally, the results are formatted and presented in an appropriate way (e.g. in a recommendation system).
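
The steps described above can be illustrated with a minimal Python sketch. It is not taken from the cited work: the record fields (type, user, page), the function names and the sample data are all hypothetical, and interpretation and presentation are folded into a single final step for brevity.

```python
from collections import Counter

def select(raw_log):
    """Step 1, selection: keep only the records relevant to the analysis."""
    return [r for r in raw_log if r.get("type") == "page_view"]

def preprocess(records):
    """Step 2, cleaning: drop records with missing fields."""
    return [r for r in records if r.get("user") and r.get("page")]

def transform(records):
    """Step 3, transformation: group page views into one list per user."""
    sessions = {}
    for r in records:
        sessions.setdefault(r["user"], []).append(r["page"])
    return sessions

def mine(sessions):
    """Step 4, data mining: count each page once per user who viewed it."""
    counts = Counter()
    for pages in sessions.values():
        counts.update(set(pages))
    return counts

def present(counts, top_n=2):
    """Step 5, interpretation and presentation: return the top pages."""
    return [page for page, _ in counts.most_common(top_n)]

# Hypothetical log records, invented for illustration only.
raw_log = [
    {"type": "page_view", "user": "u1", "page": "/shoes"},
    {"type": "page_view", "user": "u1", "page": "/socks"},
    {"type": "image", "user": "u1", "page": "/logo.png"},
    {"type": "page_view", "user": "u2", "page": "/shoes"},
    {"type": "page_view", "user": None, "page": "/hats"},
]
recommendations = present(mine(transform(preprocess(select(raw_log)))), top_n=1)
```

Each function corresponds to one step, so a real system could replace the trivial counting in mine() with a proper mining method without disturbing the other stages.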

2.3.1 Web Mining

Web mining covers three knowledge discovery domains (Web Content Mining, Web Structure Mining and Web Usage Mining), which Hay et al. [15] describe as follows:

• Web Content Mining

Web Content Mining extracts knowledge from the content of documents and their descriptions.

• Web Structure Mining

Web Structure Mining extracts knowledge from the organisation, structure and linking of webpages.

• Web Usage Mining

Web Usage Mining (WUM) extracts knowledge from the usage patterns of webpages. The user leaves a trace of their activity and their click patterns in a log file made automatically by websites [27]. This behaviour is then compared to an already constructed offline user access model [32]. Through this process, the system is able to implement varying degrees of personalisation and customisation techniques by identifying the user’s individual needs and preferences.

However, particularly for leading e-commerce web-based systems, this generates a huge amount of data, the analysis of which is hard to do by hand [13].

2.4 Data

The construction and maintenance of datasets bring with them formidable challenges. Datasets are not only intrinsically large and expected to grow even further, but they also contain missing entries. For example, customers do not view all webpages and they do not purchase all items [26]. The growth of the system also means that the predictions have to be constantly updated, else they will have a very short lifetime [36]. Over a long period of time, therefore, existing user patterns that have become invalid must be pruned so they do not taint future patterns [21]. There are four sources of data that can be used in mining users’ interactions to build recommendation systems: user entry data, web server logs and cookies [25], and application server logs [19].

2.4.1 User Entry Data

User entry data is explicit information provided by the user in a direct way, such as online forms or

questionnaires [25]. These can include the likes of personal user preferences and interests, but

typically contain information such as name, address, etc, to be used as part of the registration process

for users.

2.4.2 Web Server Logs

A web server log is automatically generated by a server when a user visits the web pages of its

website [25]. A server log can contain information such as the IP address of the user, what webpages

the user accessed, the time the user visited the website and the overall time the user spent at the

website [25].

Although most servers generate web server logs, they should not be the only source of data used for web mining. The limits of web server logs are explained by Kohavi [19]. Foremost, web server logs do not capture events critical to both e-commerce and web mining, such as ‘add to cart’ and ‘change quantity’. Capturing such events could yield valuable customer data: even if a shopping cart is discarded before the final purchase stage, it still reveals useful user information and preferences. Web server logs also do not capture form information that the user has filled in on webpages, which, again, would be very useful in web mining. Finally, most of the data in web server logs is redundant to web mining. For example, they contain the requests for every image on every page, which is of no use to web mining as every user has to request those images, so no differentiation or preferences can be derived.
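
As an illustration of the redundancy described above, the following Python sketch filters image and other static-asset requests out of a web server log before mining. It assumes a simplified Common Log Format; the regular expression, the list of file suffixes and the sample lines are assumptions made for this example rather than details from the cited work.

```python
import re

# Simplified Common Log Format: host, two ignored fields, timestamp, request, status, size.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)
# Assumed list of static-asset suffixes that carry no preference information.
STATIC_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")

def page_requests(lines):
    """Yield (host, time, path) for successful page requests, skipping static assets."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip malformed lines
        path = match.group("path").split("?")[0].lower()
        if match.group("status") != "200" or path.endswith(STATIC_SUFFIXES):
            continue
        yield match.group("host"), match.group("time"), match.group("path")

# Hypothetical log lines, invented for illustration only.
sample = [
    '10.0.0.1 - - [10/Oct/2008:13:55:36 +0000] "GET /products/42 HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2008:13:55:37 +0000] "GET /img/logo.gif HTTP/1.0" 200 512',
    '10.0.0.2 - - [10/Oct/2008:13:56:01 +0000] "GET /basket HTTP/1.0" 200 1042',
]
kept = [path for _, _, path in page_requests(sample)]
```

Only the two page requests survive the filter. Events such as ‘add to cart’ still cannot be recovered from these lines, which is the limitation that logging at the application layer addresses.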

2.4.3 Application Server Logs

Application server logs (logging at the application layer) offer solutions to many of the problems raised by web server logs. Again, as Kohavi explains [19], they can capture critical events and web forms that web server logs cannot, and they eliminate redundant trivial data. As they are set on the application layer, which controls and identifies user sessions and registration as well as user logins and logouts, all this information can be logged.

2.4.4 Cookies

Cookies are small text files that are also generated automatically by a server each time a user visits the web pages of its website [25]. A cookie attributes to each user a unique identification code, rather than relying on an IP address, so that the user can be identified each time they visit the website [25].
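
A minimal sketch of this identification step, using Python’s standard http.cookies module, is given below. The cookie name visitor_id, the one-year lifetime and the function name are hypothetical choices made for this example.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"  # hypothetical cookie name

def identify_visitor(cookie_header):
    """Return (visitor_id, set_cookie_header_or_None) for one incoming request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    if COOKIE_NAME in cookie:
        return cookie[COOKIE_NAME].value, None  # returning visitor, nothing to set
    visitor_id = uuid.uuid4().hex  # random identifier, not derived from the IP address
    response = SimpleCookie()
    response[COOKIE_NAME] = visitor_id
    response[COOKIE_NAME]["max-age"] = 60 * 60 * 24 * 365  # assumed lifetime of about a year
    return visitor_id, response.output(header="Set-Cookie:")

# First visit: no cookie is sent, so an identifier is issued.
first_id, header = identify_visitor("")
# Later visit: the browser sends the cookie back and the same identifier is recovered.
second_id, second_header = identify_visitor(f"{COOKIE_NAME}={first_id}")
```

The identifier says nothing about who the user is; it only allows separate visits from the same browser to be linked, which is why a cookie-identified user can still be treated as anonymous.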

2.5 Users

The primary aim of customisation and personalisation is to automatically tailor the system to the user’s needs without inhibiting the user any more than necessary. The use of user surveys and questionnaires is both intrusive and labour intensive [28][25]; information gathered in this way is referred to as explicit user information. Explicit profiling typically involves generic user information such as date of birth, address, etc., as well as dynamic, changing information such as favourite television programme [34]. As such, web-based systems have to extract the users’ preferences and aims from their interactions with the data [25]. Typically, web users are hesitant to provide explicit personal data, from both an effort and a privacy viewpoint [20].

If the identity of the user of the system is unknown, then the system cannot construct an individual purchase history [10], as it has no reference to the user. The user may well have visited the system before, but the system is unable to differentiate one anonymous user from another. As such, the only information available regarding anonymous users is their navigational activity and interactions with the webpages. Thus, every action of the user must be used and considered as accurately as possible [32].

2.6 Modelling Users’ Patterns

2.6.1 Clustering

Clustering is a method either for grouping together similar patterns found in users’ browsing sessions or for grouping the web pages of a website into sets that are accessed together [4]. This creates a neighbourhood of user models, whereby similarities between users can be weighed and matched against the attributes of their neighbours [25][18].
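
The cited works do not prescribe a particular algorithm here, so the following Python sketch uses a deliberately simple one as an assumption: greedy ‘leader’ clustering of sessions by Jaccard similarity of the pages visited. The threshold of 0.5 and the session data are invented for illustration.

```python
def jaccard(a, b):
    """Similarity of two page sets: shared pages divided by all pages seen."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_sessions(sessions, threshold=0.5):
    """Greedy leader clustering: join the first cluster whose leader is similar enough."""
    clusters = []  # each cluster is a list of session ids; the first id is its leader
    for sid, pages in sessions.items():
        for cluster in clusters:
            if jaccard(pages, sessions[cluster[0]]) >= threshold:
                cluster.append(sid)
                break
        else:
            clusters.append([sid])  # no similar leader found, so start a new cluster
    return clusters

# Hypothetical browsing sessions: session id -> set of pages visited.
sessions = {
    "s1": {"/shoes", "/socks", "/laces"},
    "s2": {"/shoes", "/socks"},
    "s3": {"/tv", "/hdmi-cable"},
    "s4": {"/tv", "/hdmi-cable", "/speakers"},
}
clusters = cluster_sessions(sessions)
```

A new anonymous session can then be compared against the cluster leaders in the same way, and pages popular within the best-matching cluster can be suggested.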

2.6.2 Association Rules

Association rules aim at discovering all frequent patterns among transactions [9]. Such a rule refers to a sequential data pattern, typically the user accessing a set of pages in a given order.
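
The idea can be illustrated with the two standard measures used to select rules, support and confidence. The Python sketch below is a simplification under stated assumptions: it considers only pairs of pages, ignores the order in which the pages were accessed, and uses invented thresholds and transactions.

```python
from itertools import combinations

def association_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for page pairs."""
    n = len(transactions)

    def support(items):
        # Fraction of transactions that contain every page in items.
        return sum(1 for t in transactions if items <= t) / n

    pages = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(pages, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue  # the pair is not frequent enough
        for x, y in ((a, b), (b, a)):
            confidence = pair_support / support({x})
            if confidence >= min_confidence:
                rules.append((x, y, pair_support, confidence))
    return rules

# Hypothetical transactions: the set of pages accessed in each session.
transactions = [
    {"/camera", "/memory-card"},
    {"/camera", "/memory-card", "/tripod"},
    {"/camera", "/tripod"},
    {"/memory-card"},
]
rules = association_rules(transactions)
```

With this data the only rule found is that sessions viewing /tripod also view /camera (support 0.5, confidence 1.0). The reverse rule is rejected because only two of the three /camera sessions include /tripod.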

2.7 The Web Environment

The World Wide Web has many inherent problems which can impede the process of data mining on its domain. Baeza-Yates and Ribeiro-Neto [4] address some of these problems. Firstly, the very nature of the Web, a volatile, unstructured and dynamic environment spanning many different computers, architectures and platforms, poses obvious mining challenges. The data within varies greatly in quality, structure and even format (for example, Chinese and Japanese alphabets). Yet it is this sheer scope of the World Wide Web that brings many potential customers for e-commerce systems to exploit.

2.8 Web 2.0

Web 2.0 is a term that has been coined to describe a set of concepts that promote and encourage user participation and collaboration, aiding both information retrieval and creation [24], using the platform of the World Wide Web. Popular features, or tools, that typically describe or are associated with Web 2.0 are wikis, blogs, tagging, virals, etc. However, the phrase is often extended to include tools that not only afford greater interactivity between user and system but also seek to close the gap between them, such as shared media (e.g. Flash video files embedded in the content of webpages) and RSS feeds [24]. It has become a popular buzzword to describe the omnipresent nature of the World Wide Web in popular culture.

2.8.1 Tagging

Tagging is a form of user-created content [24]. It enables users to describe content, from products for sale on an e-commerce website to articles written by someone on a blog, with succinct keywords. The purpose of tagging is to help categorise information, for the user’s own benefit or for that of other users, as well as to comment on information by adding such content.

2.8 Recommender Systems

Recommender systems are increasingly becoming a standard across retailer web sites [11], as previously observed customer behaviour is the most important feature in determining future customer behaviour [10]. Their goal is not necessarily to recommend products that the user needs but rather to boost sales of products, or to benefit the system in some way [8]. A recommender system is an anonymous, implicit system which uses observed user behaviour to provide links to related webpages without significant human effort and interference [10].

2.8.1 Collaborative Filtering

Yang and Parthasarathy [36] define collaborative filtering as follows:

‘Collaborative filtering uses consumer specific historical data to classify users into different

groups such that each group has similar interests or buying patterns.’

After these classifications have been constructed, the system can predict a user’s possible future interests by referencing the group into which they have been classified. As such, collaborative filtering can find relationships between items that have no content similarities but are linked by the users accessing them [13]. Thus, the recommendations are collected implicitly from the users, as they are not expected to give input. Rather, the users’ interests are inferred from the web logs they have created [34][12].

By its nature, collaborative filtering is not accurately applicable to completely anonymous users, as they cannot be classified or compared to these constructed groups based on preferences. The system can only infer preferences for anonymous users, which may or may not be accurate. Approaches that have been developed to combat this, such as offline clustering of page accesses, have thus far proved expensive and processing-intensive to implement [36]. In the context of recommendation systems, collaborative filtering is applied in an item-to-item sense; products are recommended to users based on other users’ experience [30].
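
The item-to-item idea can be sketched in Python as follows. This is an illustrative simplification rather than the method of any cited system: it assumes binary purchase data, measures item similarity with the cosine of the sets of buyers, and uses invented users and products.

```python
from collections import defaultdict
from math import sqrt

def item_similarities(baskets):
    """Cosine similarity between items, based on which users bought both."""
    buyers = defaultdict(set)
    for user, items in baskets.items():
        for item in items:
            buyers[item].add(user)
    sims = defaultdict(dict)
    for a in buyers:
        for b in buyers:
            if a != b:
                shared = len(buyers[a] & buyers[b])
                if shared:
                    sims[a][b] = shared / sqrt(len(buyers[a]) * len(buyers[b]))
    return sims

def recommend(current_items, sims, top_n=3):
    """Score unseen items by their summed similarity to the items in the current session."""
    scores = defaultdict(float)
    for item in current_items:
        for other, sim in sims.get(item, {}).items():
            if other not in current_items:
                scores[other] += sim
    # Highest score first; ties are broken alphabetically for a stable result.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]

# Hypothetical purchase histories of known users.
baskets = {
    "u1": {"camera", "memory card"},
    "u2": {"camera", "memory card", "tripod"},
    "u3": {"camera", "tripod"},
    "u4": {"kettle"},
}
# An anonymous visitor who has so far only looked at a memory card.
suggestions = recommend({"memory card"}, item_similarities(baskets))
```

Because recommend() needs only the items in the current session, it can serve an anonymous visitor; the similarity table itself, however, still has to be built from the histories of other users.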

Collaborative filtering suffers from several significant problems, notably that of sparsity [6]. If an item has only a few ratings then it will not be used, as its correlation with other items cannot be calculated [6]. Such sparsity is particularly likely to occur with less popular items or ones that are new to the system and thus have yet to be rated. It can also occur if there are simply more items than there are users who can rate them [15].

As systems grow, so do the complexity and scalability demands of the recommendation systems. There is a trade-off between keeping user details up to date and keeping response times low [14] so as not to affect the users of the systems. Also, an increased number of users and products means that the system has to examine more and more users and products [30][13]. Failure to keep up to date, particularly in an e-commerce web-based system where business and consumer patterns are likely to change dynamically and quickly, can lead to incorrect and invalid recommendations to the users [33].

2.8.2 Preference Based Recommendation

An alternative to collaborative filtering is preference-based recommendation, whereby the user is explicitly prompted to express their preferences so that the system can retrieve results and recommendations based on these preferences [30]. Whilst this method can operate successfully on even a large set of volatile attributes without suffering from the aforementioned latency, sparsity or scalability problems, it is intrinsically dependent on input from the user. To build a successful preference-based model, the user would have to input a large amount of detailed, correct information [30]. From an HCI perspective especially, it may be hard to prompt the user into giving such information: although the user knows what they want, they may not be able to input this succinctly into the system without ambiguity or confusion. Also, if the information is dynamic and likely to change over time, then the user has to remember to change their preferences accordingly, else the information becomes useless for recommendations.

There is also the issue of whether users can be bothered to rate items. Recommender systems are dependent on user input; it is integral to the success of the recommendations. Yet, as Bergholz succinctly writes [6], ‘people like to review items they like, they don’t bother with the rest. But that gives an incomplete picture to the system’. This is, of course, arguable: if a user has a particular gripe with a product, they are even more likely to review it than they would be if they were entirely satisfied with it. A user is less likely to bother reviewing a middling item, but the general apathy of users towards reviewing items (and thus contributing to the system) is a valid point to consider.

The major benefit of recommender systems comes from the fact that they can recommend items/products to the consumer without actually having had to analyse the item/product being recommended [6]; they can do so on the strength of either other users’ ratings or the analysis of learned usage patterns. Of course, therein lies their major weakness also: not only is it questionable how reliable such ratings are, but such systems are also prone to attacks such as the insertion of false information to deliberately manipulate the ratings, known as shilling [36][30]. The aim of such attacks is to bias the system’s output [22], perhaps to the manipulator’s advantage (e.g. on auction systems such as Ebay.com). It has been proven possible to attack a system successfully without knowing much about either the system itself or its users.

2.8.3 Recommendations: The Banana Problem

Herlocker [16] states that a system can retrieve obvious and popular recommendations that are accurate with reasonable coverage, but are in fact impractical and almost useless. This is because the items are so popular and so obvious that, for the vast majority of users, they seem to be accurate because most people like them or have bought them. This is commonly referred to as ‘the banana problem’ (or the ‘Harry Potter’ problem), as it is the equivalent of a system recommending bananas for a user’s grocery list. The system will think this is a good recommendation because it will be able to match a current user’s shopping to that of other (past) users in the system, and most of these other users will have bought bananas. So bananas seem like a good recommendation. However, most customers buy bananas and the ones who don’t typically have a good reason for not buying them. So suggesting bananas to a user who doesn’t have bananas on their list isn’t a good idea, because they’ve already made a concrete decision not to buy them; it is not that they haven’t heard of them before.

Although such recommendations are impractical, Herlocker notes that they do have a use, as they produce confidence in the system, albeit superficially. If a user is recommended an item that they would probably have bought (e.g. a banana), or were going to buy, they think that the system is accurate. However, the system may just as well recommend the most popular items, without comparing users to each other, because it will have the same results.
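
The effect can be demonstrated with a small Python sketch that contrasts raw co-occurrence counts with lift, a measure that discounts an item’s overall popularity. Lift is introduced here only as an illustrative assumption, not as a technique proposed by Herlocker, and the shopping baskets are invented.

```python
def cooccurrence_and_lift(baskets, target):
    """For each other item, return (co-occurrence count with target, lift).

    Assumes that target appears in at least one basket.
    """
    n = len(baskets)
    with_target = [b for b in baskets if target in b]
    results = {}
    for item in set().union(*baskets) - {target}:
        both = sum(1 for b in with_target if item in b)
        item_rate = sum(1 for b in baskets if item in b) / n
        # Lift above 1: the item appears with target more often than its popularity predicts.
        lift = (both / len(with_target)) / item_rate
        results[item] = (both, lift)
    return results

# Hypothetical shopping baskets: nearly everyone buys bananas.
baskets = [
    {"bananas", "pasta", "pesto"},
    {"bananas", "pasta", "pesto"},
    {"bananas", "pasta"},
    {"bananas", "milk"},
    {"bananas", "bread"},
    {"milk", "bread"},
]
stats = cooccurrence_and_lift(baskets, "pasta")
by_count = max(stats, key=lambda i: stats[i][0])  # ranking by raw co-occurrence
by_lift = max(stats, key=lambda i: stats[i][1])   # ranking by lift
```

For a shopper with pasta in their basket, ranking by raw count recommends bananas, which is exactly what a plain list of the most popular items would have produced, whereas ranking by lift recommends pesto.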

2.8.4 Recommendations: Novelty

Herlocker [16, 4] also introduces the concept of novel recommendations and stresses their importance. An item is considered novel if the user is not aware of it and would not have become aware of it were it not for the recommendation provided by the system. Although novelty is important (it could be argued that the whole point of recommendation systems is to provide users with items that they did not know about and that match their needs and wants), its achievement and success are difficult to measure, as they differ entirely from user to user.

2.9 Direct Recommendation and Pervasive Recommendation

A direct recommendation system responds to the user’s direct interaction with the system with a direct response to their request [12], e.g. with a recommendation. Pervasive recommendation systems produce advertising and marketing. The marketing produced in a system (e.g. banner ads in a webpage) is personalised to the current user [12].

2.10 Evaluation

As recommender systems use complex algorithms and heuristics, they are difficult to evaluate.

Evaluation methods typically only evaluate how well the patterns from datasets are discovered,

rather than the usefulness of the recommendations themselves. As such, their performance can

be measured against their impact on the system that they are supporting [11]. This can be, for

example, the additional sales that the implementation of a recommendation system generated.

They can also be evaluated from an HCI perspective for the presentation of the recommendations

generated [11]. Evaluation methods are discussed in more detail in the Methodology chapter of the

report.

2.11 Previously Developed Tools and Techniques

Although a variety of approaches exist in the literature attempting to provide personalisation to anonymous users [15][14][25][32], these approaches are limited in several respects. The different techniques developed are applied on a variety of different websites, so comparison of the results is difficult [28]. Furthermore, as only the user traces are used, there is no way of knowing to what extent the techniques were successful [28]. One such method is the use of Markov chains, which use prior categorisation of web pages to model clickstreams as a transition matrix between page categories [14]. Another approach is the use of the Open Profiling Standard (OPS) [25][20]. This is where a user builds a profile offline which can be integrated into the Internet browser they are using. These profiles can be accessed by each website the user visits, providing instant access to reliable information (specified by the user themselves) regarding the user. The Feature Matrices (FM) model, detailed in [32], can adapt to user behaviour in a real-time environment, as well as consider partial navigation patterns. The INSITE tool, described in [15], uses on-the-fly customisation, rather than offline models, against the users’ navigational trails.
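
The Markov chain approach mentioned above can be illustrated with a short Python sketch that estimates a first-order transition matrix between page categories from observed clickstreams. The category names and clickstreams are hypothetical, and the details of the model in [14] may well differ.

```python
from collections import defaultdict

def transition_matrix(clickstreams):
    """Estimate P(next category | current category) from observed clickstreams."""
    counts = defaultdict(lambda: defaultdict(int))
    for stream in clickstreams:
        for current, nxt in zip(stream, stream[1:]):
            counts[current][nxt] += 1
    matrix = {}
    for current, following in counts.items():
        total = sum(following.values())
        matrix[current] = {nxt: c / total for nxt, c in following.items()}
    return matrix

def most_likely_next(matrix, current):
    """Return the most probable next category, or None if current was never left."""
    following = matrix.get(current)
    return max(following, key=following.get) if following else None

# Hypothetical clickstreams, each a sequence of page categories.
clickstreams = [
    ["home", "cameras", "accessories", "basket"],
    ["home", "cameras", "accessories"],
    ["home", "cameras", "reviews"],
    ["home", "offers"],
]
matrix = transition_matrix(clickstreams)
prediction = most_likely_next(matrix, "cameras")
```

Such a model needs no knowledge of who the user is: the prediction for an anonymous visitor currently in the ‘cameras’ category depends only on that category.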

One significant problem raised by existing approaches is the inability to monitor and record how long a user spends looking at a webpage [15]. This can be estimated by calculating the time difference between two consecutive requests from a user [13], and such data could form interesting patterns, as it could be deduced that the longer a user looks at a page, the more interested the user is in that information. Of course, this is still an assumption in itself, as the user may have left the computer with that page open, or there may simply be more information on that webpage than on other pages on average. Regardless, if such data could be interpreted successfully, it would be a useful contributing factor in web mining.
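
The calculation of viewing time from consecutive requests can be sketched in Python as follows. The 30-minute cut-off used to discard implausibly long gaps (for example, when the user has left the computer) is an assumption made for this example, as are the timestamps and pages.

```python
from datetime import datetime, timedelta

def dwell_times(requests, timeout=timedelta(minutes=30)):
    """Return [(page, seconds)] from a time-ordered list of (timestamp, page) for one user."""
    times = []
    for (t1, page), (t2, _) in zip(requests, requests[1:]):
        gap = t2 - t1
        if gap <= timeout:  # longer gaps are treated as the user having left
            times.append((page, gap.total_seconds()))
    return times

# Hypothetical requests from a single visitor.
requests = [
    (datetime(2008, 3, 1, 10, 0, 0), "/home"),
    (datetime(2008, 3, 1, 10, 0, 20), "/cameras"),
    (datetime(2008, 3, 1, 10, 3, 20), "/cameras/42"),
    (datetime(2008, 3, 1, 11, 30, 0), "/basket"),
]
times = dwell_times(requests)
```

Note that the final page of a visit never receives a viewing time, because there is no subsequent request to measure against; this is one reason why the resulting figures remain estimates.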

The very nature of the Internet raises problems also. The Internet is inherently a changing dynamic

environment and techniques for personalisation have to adapt accordingly, by changing

the user model used to provide users with personalisation and recommendations [34].

Chapter 3

Designing an Appropriate Assessment Criteria

Designing a methodology that will provide an assessment of personalisation techniques with regard to anonymous users requires consideration of many important factors that will affect the quality (and subsequent assessment) of those techniques. These include why anonymous users exist and how they are identified, the extent to which they can be ‘used’ by a system, why e-commerce systems have been selected for assessment and why these systems may use personalisation. These must be considered, else the quality of the system cannot be determined.

3.1 Defining Anonymous Users

As the background reading has highlighted, an anonymous user is one who offers no data (that can be

turned into information) to the system. In relation to this project and assessing personalisation

systems, the truest definition of an anonymous user cannot be used as there will be no data

whatsoever for the system to use. Also, different works found in the reading use different meanings

for both anonymous users and implicit information.

Thus, for the purpose of this study an anonymous user will be defined as one that offers minimal data

to the system. The data will be implicit and passive rather than explicit, e.g. the anonymous user will

not create a profile or enter personal information or preferences. The system will have to assume these

preferences and interests from the (minimal) data supplied.

As such, it would be unfair to assess the quality of an e-commerce system’s personalisation for anonymous users by comparison with that for a ‘known user’, as the personalisation will be completely different, both in terms of what is supplied and the quality of it. A known user will not only have had previous interaction with the system but will have explicitly provided data too. Thus they can expect not only informed recommendations of a higher quality, but also different personalisation techniques.

3.2 Customer attitudes

Privacy is a notable factor: as [26] states, the more the user gives to a system, the more the system can give back to them in the form of a personalised and unique experience. However, the more hesitant and concerned the user is about submitting personal information, the less likely it is that they will receive an intuitive and useful experience from the system.

A particular privacy concern arises from clickstream data, from retrieving information automatically from a user’s IP address, etc. It is obvious that these may cause concern for users who feel that what they have been doing and where they have been doing it is being recorded [26]. There are personalisation techniques that automatically use a user’s IP address to provide personalised content to a user, but these are typically associated with nefarious websites or pervasive marketing (“Find Singles in your area!”).

The reasons why a user may be concerned with privacy are numerous. Foremost is the issue of

advertising. A particularly unwelcome consequence of the advent of the World Wide Web is more

intuitive marketing. With users being bombarded with advertising whenever they browse webpages

or send an email, it is no surprise that a user may be hesitant to give even something like

their email address in order to avoid this hassle, especially if they are fearful of tailor-made targeted

marketing that exploits all of their personal data. They may also fear that, even if this information is not

used by the web-based system that they are using, it may be passed on to another organisation. So, with

respect to this project’s methodology, a site that uses IP addresses or pervasive intuitive marketing will

not be regarded as showing a sign of quality personalisation, as these are seldom beneficial to the user.

Such concerns are relevant to the preparation of a methodology as they can restrain the

implementation of personalisation techniques. For example, systems will shy away from using tools

that users think are intrusive or suspicious, e.g. using an IP address to provide instant location

information, and thus should not be ‘penalised’ for not using them. It also potentially means that

systems have to be more adaptable with the potentially little information that they can receive from

users.

3.3 E-commerce Systems

Although the background reading has shown that personalisation features can be applied across a

variety of different web-based systems such as search engines, social networking websites,

information retrieval websites, etc, e-commerce has been chosen as the focus of the evaluative study

for several reasons. Foremost, the different types of web-based systems must be considered in relation

to anonymous users.

Simply, an anonymous user cannot use a social networking website for its intended purpose without

contributing explicit information to the system, as the purpose of such websites is to build and show

(explicit) relationships between different users. So they obviously cannot be used for this study.

Secondly, e-commerce systems have the strongest incentive to offer personalisation to their users.

Although this point is arguable, it must be stressed that the primary purpose of an e-commerce system

is to make a profit. It can do so either by retaining users so that they visit and use the site more often

or by persuading users to buy more products. Both of these can be achieved by successful

personalisation and are explained in further detail below (see Section 3.4, Identification of System’s Objectives).

Finally, the implementation of personalisation techniques has become a set standard in e-commerce

systems. Successful e-commerce systems are expected to at least have a basic personalisation

interface, whereas not all search engines or information retrieval systems, for example, necessarily

need to deploy such techniques to achieve their primary objectives. The ubiquity of

Amazon.com as an example of successful personalisation in the background reading

may be taken as corroborating evidence of this.

3.3.1 Impetus for E-commerce Personalisation

The background reading has touched upon the importance and effectiveness of personalisation

especially in regards to e-commerce systems. The reasons for e-commerce systems utilising and

implementing these techniques would provide an indication of quality; does the personalisation

achieve the systems objectives and goals, and does it fulfil the user objectives and goals?

3.4 Identification of System’s Objectives

Three main reasons can be identified as to why an e-commerce system would personalise. These are

summarised below:

1.) It draws attention to the business and its products and services, so that it will hopefully turn

browsers into buyers and gain consumer loyalty

2.) It implants a message; impacting upon the user in such a way that the user will remember the

business

3.) It persuades and cross sells; how personalisation can convince a user to consider another

product and convince a user to use their business over another.

It is, however, difficult to infer from these objectives how effective the personalisation implemented

actually is in achieving them. Evaluating the personalisation techniques of a business from the

perspective of how the personalisation has increased sales is outside the scope of a project such as this,

simply because it would require a mass of sales data from e-commerce businesses. Even then, the sales data

may not necessarily prove that a user bought an item purely on the strength of a provided

recommendation. Likewise, evaluating how a system has implanted a message or persuaded users

would require intensive studies of users; and it would be difficult for users to confidently then say that

they bought an item because of just the persuasion of the system’s personalisation.

Thus, from an assessment perspective, the success of a system achieving its (identifiable) objectives is

difficult to evaluate.

3.5 Identification of User’s Objectives

A user may be using an e-commerce system for several reasons [16]:

1.) To buy, or look at, a particular product

2.) To compare prices

3.) To read/write user reviews of a product

4.) To receive recommendations from the system

5.) To browse with no clear intent

6.) To find a mixture of new and old items

When applied to the assessment of personalisation techniques, studying user’s objectives may prove

as inconclusive as the achievement of the system’s objectives. Users may not be able to explicitly

state whether they definitely would buy a product just because of a recommendation from the system.

For example, the user may have heard of the item that the system is recommending and in such a case

the system is just prompting the user to buy it; they may very well have been intending to buy the

item. Likewise, if a user buys a product because of another user’s review of that product, can it be said

that the system itself has provided the recommendation? So, in such examples, can the system be said

to have supplied a ‘quality’ recommendation?

Thus, with respect to a user’s objectives, a system will find it even more difficult to associate an

anonymous user with a given task as the system has no prior information about that user and has no

indication of what the user might be doing. A user may use a system at first just to ‘test the water’;

they may be evaluating it through using it and, if they find the system effective, they become a

registered (perhaps paying) user. Therefore the assessment of personalisation techniques cannot be

made through the identification of user’s tasks.

It would be similarly difficult to recreate a particular user objective as a means of assessing a system;

how would a ‘comparing prices’ session clearly differentiate from a ‘find a mixture of new and old

items’ session?

3.6 Consideration of other relevant methodologies and approaches

There is no evidence in the literature of evaluations focused entirely on anonymous users. As such

there is no predefined standardised metric that can be used across a range of e-commerce systems for

measuring quality in regards to anonymous users. However, there are evaluations of personalisation

systems which can be considered to justify and complement the chosen methodology for this study.

3.6.1 Herlocker

Herlocker’s approach to evaluating recommender systems is particularly exhaustive and raises many

important issues. Foremost, it discusses the fact that different algorithms are developed and

implemented across different domains for different reasons, e.g. a large dataset will use a different

algorithm to one that has a small dataset, a system with more users than items will use a different

algorithm to one where the reverse is true, etc. [16].

This is a contributing reason as to why this study is focusing on ‘major e-commerce systems’ as it can

be assumed that the system’s datasets and the algorithms used on them will be similar, if not the same.

Another point raised is how a user who just intends to browse a system will prioritise the interface and

usability of the system rather than necessarily just the recommendations offered (which will still be

useful). The usefulness of the system, it can be argued, must be a priority in an evaluation of a system

anyway – a system cannot just depend on accuracy of recommendations.

As discussed in the background reading, novelty is considered “one of the most valued characteristics

of systems recommendations” [16] and refers to items the user wasn’t aware of and probably wouldn’t

know about if it weren’t for the system. However, its quality and effectiveness cannot be measured in

any qualitative way as how obscure an item is depends entirely upon the user. It would be even harder

still for anonymous users as the system is at the disadvantage of not entirely knowing the user’s

preferences, buying history, etc. The quality of such a recommendation would rely on explicit

prompting of the user; “was this useful?” and as such cannot be used with regards to anonymous

users.

3.6.2 Wu

Wu’s methodology is considerably less comprehensive than Herlocker’s evaluative techniques. The

methodology [35] awards ‘weights’ for implementation of both explicit and implicit personalisation

techniques and these are tallied up to give each system a score and an indication of the amount of

personalisation implemented. Although it does consider implicit (and thus anonymous user)

personalisation, this is only in conjunction with explicit personalisation and it does not measure the

accuracy of the personalisations: just the ‘presence’ of them.

It is useful, however, in its identification and classification of personalisation techniques that can be

implemented in systems.
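
The weighted tally described above can be illustrated with a minimal sketch. The feature names and weights below are hypothetical, chosen only for illustration; they are not taken from [35]. The sketch also shows the limitation noted above: the score reflects only which features are present, not how accurate they are.

```python
# Hypothetical feature names and weights (assumptions for illustration only;
# these are not the features or weights used in [35]).
FEATURE_WEIGHTS = {
    "recently_viewed_items": 1.0,          # implicit
    "recommendations_from_browsing": 2.0,  # implicit
    "user_profile_preferences": 2.0,       # explicit
    "item_ratings": 1.5,                   # explicit
}


def presence_score(features_present, weights=FEATURE_WEIGHTS):
    """Sum the weights of the features that a system implements.

    Only the presence of a feature counts; its accuracy is not measured.
    """
    return sum(weight for name, weight in weights.items()
               if name in features_present)


# A hypothetical system offering only the two implicit features.
system_a = {"recently_viewed_items", "recommendations_from_browsing"}
print(presence_score(system_a))  # 3.0
```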

3.7 Conclusion

Thus, given these concerns, systems need to be more adaptable with the potentially little information

that they can receive from the users. Users are apprehensive that their actions are being watched and

assumptions are being made about their habits and interests.

Chapter 4

Developed Methodology

Considering the aforementioned possible inhibitions upon critically assessing the quality of

personalisation features on anonymous users, a justified methodology has been devised. The

following chapter discusses the different iterations taken towards developing this methodology. A

rapid prototyping approach was adopted for each iteration to test whether it would be feasible and

practical as an assessment technique.

4.1 Initial Decisions

Considering the background reading and the possible inhibitions to an assessment such as this, the

following decisions were to be implemented in the final methodology. Foremost, the method of using

a ‘check-list’ of possible personalisation features that can be used on anonymous users and then

seeing if the systems had these features implemented was not going to be used [5] as this ignores

context and makes the assessment of the systems difficult; especially as the absence of a feature is not

necessarily indicative of a system’s quality.

It was also decided that the system’s personalisation features would not be given ‘weights’ to indicate

and measure quality [35]. The reasoning for this, again, is that a more qualitative approach to

measuring quality would be developed.

The first iteration of the assessment criteria developed the idea of creating a ‘session’, or walkthrough

that would be used on each system to be assessed. As the sample session would be the same, a

consistent testing method would have been used for each assessment (and subsequent evaluation).

The session was intended to create random user activity, e.g. the user goes to the main page, searches

for a product, reads the product information, clicks a recommendation, etc.

However, this proved impractical for several reasons. Foremost, it was difficult to create a sampling

session (which would be the exact same) that could be used across a variety of different e-commerce

web-based systems. For example, the devised sample would have involved the user searching, then

clicking the top recommended item, then returning to the homepage to see if there was customised

content, etc. However, as the session progressed it would have been harder to recreate as the content

on the pages always differed, e.g. if different recommendations were retrieved, then different links

would be followed, etc. Likewise, if a product is not found in a search, then the session cannot be

replicated across different systems. This would have been an obvious problem if the systems

evaluated sold completely different products from one another. As such, it was decided that the final

iteration of the methodology would not compare the results of one system to another.

Also, it was difficult to describe the session recreated and the results at each stage in a critical way,

without it being tedious.

Obviously, however, assuming the role of a ‘user’ had to be done in order to assess the system’s

personalisation in regards to users, but it was decided at an early stage in the project that an exact

session of user activity would not be used across all the systems to be tested.

4.1.2 Selection of Systems to be Assessed

It was stated early in the project that the assessment would be made upon a variety of different e-

commerce systems. However, when adopting the aforementioned rapid prototyping approach to

testing the devised assessment criteria, it quickly became clear that most major e-commerce systems

had similar, if not the same, personalisation features that operated on anonymous users. The only

difference was the content returned and recommended to users and changes in the presentation of this

content.

If a system has, therefore, been classified as a major e-commerce web-based system, then it would be

difficult to accurately and confidently suppose why its personalisation features in regards to

anonymous users are not as good as another business’s personalisation features. Any explanation offered

would be an unsupported assumption: either that one business was not as big as another,

or that the business had decided not to implement these features. Also, a comparison

between the different systems would not provide more justification for the assessment criteria as the

assessment criteria have been designed and justified in the previous section, based on research and

background reading. It would demonstrate that the assessment criteria can be applied to different e-

commerce systems, but the point of the assessment criteria is to see the extent to which anonymous

users can be offered personalisation; not see how it differs between different systems.

The intention of the report was to see if anonymous users could be offered quality personalisation

and, therefore, the decision has been made to apply the assessment criteria to a well-regarded

successful major e-commerce system that offers personalisation: Amazon [2]. The assessment criteria have

been applied to Amazon.co.uk.

This choice has been reflected upon, critically, in the evaluation of this report.

4.2 Methodology

A framework was developed that classified possible personalisation features into the following

categories. Thus, rather than looking for particular techniques (the implementation of which could

differ between systems), the assessment will look for the implementation of each category. These

categories will be discovered and assessed by browsing through the system, but not by using a

predefined session. These categories and why they are being assessed are described in detail below:

4.2.1 Control

Control refers to whether the user has command over the system; whether they can choose to

interrupt, continue/expand (e.g. “See more recommendations”) or terminate the recommendation. This

has been distinguished as ‘implicit’ data, rather than ‘explicit’, as the user is not giving the system any

information that the system can interpret for personalisation uses. If the user were to comment on a

personalisation (e.g. “This Was/Wasn’t Useful”), then this would be “explicit”, as the system can

infer preference from this. Rather, the user is controlling the usability of the interface.

4.2.2 Content

Content refers to the accuracy of the recommendations that the system provides to the user. This has

been used as the primary indicator of the system’s quality and the quality will be measured by the

precision of the returned recommendations to the user.

Precision (used here as the measure of accuracy), in a personalisation context, is typically measured by:

Precision = number of relevant recommendations retrieved / total number of recommendations retrieved

The relevance of an item is difficult to measure as it requires the justification and personal opinion of

the user of the system. Whether a recommended item is relevant differs from user to user. Thus

for the purposes of this assessment a clear and justified definition of ‘relevance’ must be given.

The relevance of a returned recommendation will be based on a comparison between the product

details of each item recommended and the original product.

Although relevance is typically also measured by recall, this is completely impractical in

this context [16], as it would involve judging the relevance of every item in the entire catalogue of

a system.
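
As a minimal sketch of the precision measure defined above, the function below takes one relevance judgement per retrieved recommendation. How each judgement is reached (the comparison of product details) is left to the assessor and is not modelled here; the function name and input format are assumptions made for illustration. The example uses the counts reported for the DVD session in Section 5.3.

```python
def precision(relevance_judgements):
    """Fraction of the retrieved recommendations that were judged relevant.

    relevance_judgements: one boolean per recommendation retrieved,
    True where the assessor judged the item relevant.
    """
    judgements = list(relevance_judgements)
    if not judgements:
        raise ValueError("precision is undefined when nothing is retrieved")
    return sum(judgements) / len(judgements)


# Counts reported for the DVD session in Section 5.3:
# 14 relevant items out of 43 retrieved.
dvd_session = [True] * 14 + [False] * 29
print(round(precision(dvd_session), 3))  # 0.326
```

Recall is deliberately omitted, for the reason given above: it would need a relevance judgement for every item in the catalogue.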

4.2.3 Interface

This refers to the usability of the system and its integration of personalisation features into its

interface. One such example would be “Items you Recently Viewed”, “The Page you Made”, etc.

The above categories have therefore been used to develop the assessment of web-based systems, and

the assessments of the selected systems are as follows.

4.2.4 Coverage/Scope

Coverage typically refers to how much of the system is being searched when recommendations are

being retrieved to users. Using Amazon.com for the assessment criteria allows the ‘novelty’ of an

item to be assessed, as each item in Amazon has a sales rank. This sales rank will be used to gauge the

popularity, and thus the ‘novelty’, of recommendations retrieved by the system.

The average of the sales ranks from the recommendations returned by the system will be taken in an

effort to attempt to measure the possible coverage of the system.
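
The averaging step can be sketched as follows. The sales ranks in the example are hypothetical values, not data from the assessment. The median is shown alongside the mean only as a comparison, because a single very high rank can dominate the mean; it is not part of the method described above.

```python
from statistics import mean, median


def sales_rank_summary(sales_ranks):
    """Summarise the sales ranks of the recommendations retrieved.

    A lower rank indicates a more popular item, so a higher average
    suggests that less obvious (more 'novel') items were retrieved.
    """
    ranks = list(sales_ranks)
    if not ranks:
        raise ValueError("no recommendations were retrieved")
    return {"mean": mean(ranks), "median": median(ranks)}


# Hypothetical ranks: three popular items and one obscure item.
example_ranks = [120, 450, 980, 150000]
print(sales_rank_summary(example_ranks))
```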

Chapter 5

Implementation of Assessment Criteria

5.1 Overview of the Testing Session

The system was tested with a cleared internet cache and all cookies from previous (prototyping)

sessions were cleared. The system assumed that an entirely new user was using it and this user had not

provided any personal preferences or information to the system. No explicit information (e.g.

preferences, ratings) was given to the system. The session only involved searching for products

(which is not explicit, as it does not state personal information or preference) and selecting

recommended products from the system.

5.2 Control

During the session, it was possible to alter and modify the browsing history in order to refine

subsequent recommendations that the system was going to offer based on the user’s past activity, or

to amend any mistakes the user may make; e.g. if the user clicks on the wrong item by mistake, the

personalisation of the system can still be made to work in their favour.

5.3 Content

The calculations and results of the precision of the system’s recommended items are as follows.

The first set of recommendations was provided after viewing only one item (a DVD). The precision

was:

14 (relevant items returned) / 43 (all items returned) ≈ 0.326

The cache was cleared and a new session was created. A different item (software) was viewed and the

precision was:


This was repeated and another new item (sports backpack) was viewed. The precision was:


This was repeated one final time with another new item (jewellery):

Fig 5.1 Fig 5.2


The reason this process of finding the recommendations for just one item was repeated so many times

was to see if recommendations depended on the type of item looked at. It was assumed that more

popular items would retrieve more recommendations (after the user just looking at one item) and the

system did so after the user viewed the DVD and software (popular types of item). The accuracy was

fairly high for these items; especially considering the user had only viewed one item.

The jewellery was used as it was an item that few people shopped for (it had a high sales rank;

compared to the low, popular sales ranks of the DVD and software). Although it received a perfect

precision score, this was because it only retrieved two items.

The assumption was made after these early results that the precision of items may increase as the

browsing history of the anonymous user becomes more filled.

The second sets of recommendations were provided after viewing 5 different items in the same

session:


This was repeated again with 5 different items:


This was finally repeated with 5 different items, but items all of the same category (DVDs):


The average of the precision under these conditions had not changed much from the initial testing

with only one item viewed. The precision of the items retrieved was of a similar accuracy. For the

third session, items all from the same (popular) category were used in an effort to see whether the

system would find more accurate recommendations if the items had been rated by many other users.

However, this only resulted in a small increase of accuracy.

Finally, precision of recommendations provided after a session of viewing 15 different items:


This was again, a similar figure to the past sessions. It seems fair to conclude that the accuracy of

recommendations returned to anonymous users would always be fairly low. Without the user

explicitly rating items and/or the recommendations of the system, the system can only reach low

levels of accuracy.

However, although the accuracy did not increase the more the user built up a browsing session, the

accuracy is still impressive for a system that is operating on an anonymous user that it knows little

about.

5.4 Interface

The system was able to create a personalised page for the anonymous user with the ‘Page you Made’

option; as shown in Fig 5.3

The user is able to view all recent items as well as, as previously mentioned, edit them to their liking.

The recommendations are also available on this page (Fig 5.4), and the user can see more of them if they

so choose.

The interface also personalises the main ‘home’ page. It features personalised links to

‘Customers with similar searches purchased’, ‘Recommended Items based on your

browsing history’ and ‘You recently looked at’ personalised links.

Thus, the interface is very intuitive and adaptable, even for anonymous users.

5.5 Coverage / Scope

The results of the coverage/scope of the system are included in Appendix E.

The average rank was 30541. Although this average has been affected by occurrences of

items with very high sales ranks, even a cursory glance at the returned data shows that

Amazon retrieved a mixture of both obvious and non-obvious items; suggesting that it was successful

in retrieving novel results for anonymous users.

5.6 Conclusion

In conclusion, therefore, Amazon was successful to an extent, when being assessed for its

personalisation techniques with regards to anonymous users.

Fig 5.3

Fig 5.4

It offered an intuitive, highly usable and adaptable interface, offering control to the user, as well as

reasonable coverage of its catalogue of items when retrieving recommendations.

However, the quality of the system did not improve significantly even when the anonymous user

provided more information to the system. For example, the same quality of items (ranked by their

accuracy and relevance) was retrieved even when the user offered more inferred preferences (by

looking at more items). This suggests that collaborative filtering definitely has a limit in regards to the

quality it can offer to anonymous users.

Chapter 6

Feasibility of Web 2.0 in E-commerce

6.1 Justification for Proposal of Web 2.0 Features

The reason for considering Web 2.0 features to complement traditional personalisation and

collaborative filtering techniques arose from a question posed by Schafer [29]; whether tagging can be

used, naturally, in conjunction with collaborative filtering methods to aid personalisation to users.

Tagging, like collaborative filtering, is used to add ‘content’ to already existing information to

potentially benefit other users.

It should be prefaced, though, that tagging is not a form of personalisation. As defined earlier in this

report, personalised content is content that differs from user to user based on the actions and inferred

preferences of that user. The following discussion of tagging, therefore, seeks to find the feasibility of

tagging as a means of replacing collaborative filtering in an effort to achieve the same end state by a different

method. Collaborative filtering is used to present the user with content that has been recommended

and, it will be argued, tagging does the same.

Collaborative filtering tools do not add new content to information, but they do add quality and

relevance to the information in regards to the user that the tools are operating for; by carefully

filtering and ordering information for the user in an automated way. Tagging can achieve similar

results, albeit very differently.

When a user tags information, it is not just a keyword that is suggested to be associated with that

information; it is a form of collaborative filtering. Tags are often opinions and descriptions that would

not be found by typical content descriptors such as keywords. Rather, like traditional collaborative

filtering, tagging enables users to match other users with informed and relevant recommendations. For

example, if a user finds a particular tag useful, then they can use that tag to find other

recommendations that are relevant to them; this relevancy and quality is determined by the tag and the

effect that the tag has on the user.

Unlike collaborative filtering, the end-user does not need to comment on the quality of

recommendations in front of them to refine the quality of the personalisation that they can receive.

Rather they can choose their recommendations directly, but do so via the system.

Tagging involves the user directly and does not require the user to explicitly provide information to

the system; all the user needs to do is select a tag and they will receive other recommendations

relevant to that tag and thus relevant to their personal preferences.

Tagging, therefore, can be applied to the focus of this project as it is ideally suited to anonymous

users. Obviously, creating tags requires ‘explicit’ information from the users who supply them. However, anonymous users can

still enjoy the benefits of other users’ tags without providing information, such as preferences,

themselves. It is feasible, therefore, that the concept of tagging can be applied to e-commerce web-

based systems.

6.2 Impetus for E-commerce

‘User Created Content is now beginning to move mainstream’ [24]

Web 2.0 features, especially the likes of tagging, are typically affiliated with social networking sites

and blogs. However, the reasons for implementing them in an e-commerce domain can be clearly

identified.

Foremost, they are easy and cheap to implement. Unlike collaborative filtering which requires

extensive user datasets and highly refined and tuned algorithms, Web 2.0 boasts accessibility and

interactivity; hence its inclusion in the likes of user created blogs and wikis. Web 2.0 can be used by

anyone; from both a technical and end-user usability viewpoint (it could be argued that this is the

whole point of the 2.0 phenomenon).

The tasks of an e-commerce system can easily be complemented by Web 2.0 features, both to improve its

core objective of selling products/services and to encourage and foster social interaction around

the business [24]. Product descriptions can be developed by anything from embedded media in webpages

through to wikis, with each product having a separate wiki page that is not only more detailed than

the generic manufacturer product description, but also one that encourages user thoughts and opinions

[1]. The use of RSS feeds can provide users with instant access to an updated product from a more

mobile location. Likewise, the interaction between customer and service [24] can be improved by

enabling and encouraging user discussions [3] or tagging.

6.2.1 Case Study: Amazon.com

As previously noted in the assessment, Amazon.com has become the first major e-commerce system to fully

adopt and embrace Web 2.0 and it shall be used as an example of a successful system that has done so.

Amazon’s wiki, Amapedia [3], offers community-based interaction and sharing, as users of Amazon

can build and contribute to detailed product related wikis. Amazon refers to it as “collaborative

structured tagging” [3] as users must follow guidelines when creating the content.

Amazon has also launched Askville [2], a social community where knowledge can be shared between users

as they help each other by asking and answering questions.

6.3 Technical Solution of Tagging Limitations in a Mock E-commerce System

A technical implementation of tagging has been developed in a mock e-commerce system that does

not involve moderation. This is found at http://www.staggerlee.co.uk/FYPROJ/home.html and is

described in this section.

The website was created in HTML and uses PHP and MySQL. First, the tag database was

created by using PHP to connect to MySQL and run the query shown in Fig 6.1.

This created the tag database. tag_id was the primary key and set to auto increment; each time a new

tag was entered into the database the tag_id value incremented by one to ensure that every tag had a

unique id associated with it. ‘name’ was the name of the tag that the users of the system entered into

the database themselves. ‘url’ identified the product that the tag was associated with.

Fig 6.2 displays the code fragment used for the user to insert a tag into the database; an empty

form field appears on the HTML page, into which the user can enter a tag. When the user clicks

‘submit’, this runs the MySQL query and inserts the ‘name’ (what the user has just specified) into

the database.
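
The code in Figs 6.1 and 6.2 is not reproduced in the text, so the following is only a minimal sketch of the table and insert step described above. It uses Python’s built-in sqlite3 module as a stand-in for the PHP and MySQL of the actual site. The column names follow the description; the table name `tags`, the helper names and the example product pages are assumptions.

```python
import sqlite3


def create_tag_table(conn):
    """Create the tag table: an auto-incrementing id, the tag text, the product URL."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS tags (
            tag_id INTEGER PRIMARY KEY AUTOINCREMENT,
            name   TEXT NOT NULL,
            url    TEXT NOT NULL
        )
        """
    )


def insert_tag(conn, name, url):
    """Store a user-supplied tag for a product page and return its new tag_id.

    The values are bound as parameters rather than pasted into the SQL
    string, so text typed into the form cannot alter the query.
    """
    cursor = conn.execute(
        "INSERT INTO tags (name, url) VALUES (?, ?)", (name.strip(), url)
    )
    conn.commit()
    return cursor.lastrowid


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
create_tag_table(conn)
first_id = insert_tag(conn, "police", "product1.html")     # hypothetical page
second_id = insert_tag(conn, "thriller", "product1.html")
print(first_id, second_id)  # 1 2
```

Binding the form input as a parameter matters here because, as the section notes, the tags are not moderated: any text a user submits reaches the database.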

Fig 6.3 shows how the tags are retrieved from the database and displayed on the HTML page for the

user to see. The query selects the tags from the database, and the PHP while loop steps through the

rows that are returned, so that the tags can be output on the webpage one by one.

Fig 6.1

Fig 6.2

Fig 6.3

The tags are retrieved and displayed on the webpage as HTML links. When the user selects a link, the

unique tag_id from the database table is used as a variable to be passed into the next page (the next

page being whatever tag the user has clicked). This page can then retrieve all products tagged with

that tag by using the (passed) tag_id.
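
The retrieval flow described above can be sketched in the same way, again with Python’s sqlite3 module standing in for the PHP and MySQL of the actual site. Because each row of the tag table has its own tag_id, the sketch assumes that “all products tagged with that tag” means every row sharing the clicked tag’s name; this is one plausible reading, not a detail confirmed by the text. The table name `tags`, the page name `tag.php` and the sample rows are hypothetical.

```python
import sqlite3
from html import escape
from urllib.parse import urlencode


def tag_links(conn, page="tag.php"):
    """Render every stored tag as an HTML link that passes its tag_id."""
    rows = conn.execute("SELECT tag_id, name FROM tags ORDER BY tag_id").fetchall()
    links = []
    for tag_id, name in rows:
        query = urlencode({"tag_id": tag_id})
        # escape() stops tag text typed by users being read as HTML markup.
        links.append('<a href="{}?{}">{}</a>'.format(page, query, escape(name)))
    return links


def products_for_tag(conn, tag_id):
    """Return the product URLs that carry the same tag name as the given tag_id."""
    row = conn.execute("SELECT name FROM tags WHERE tag_id = ?", (tag_id,)).fetchone()
    if row is None:
        return []  # unknown tag_id: nothing to show
    rows = conn.execute(
        "SELECT DISTINCT url FROM tags WHERE name = ? ORDER BY url", (row[0],)
    ).fetchall()
    return [url for (url,) in rows]


# Hypothetical sample data in a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tags (tag_id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, url TEXT)"
)
conn.executemany(
    "INSERT INTO tags (name, url) VALUES (?, ?)",
    [("police", "product1.html"), ("thriller", "product1.html"), ("police", "product2.html")],
)

print(tag_links(conn)[0])         # <a href="tag.php?tag_id=1">police</a>
print(products_for_tag(conn, 1))  # ['product1.html', 'product2.html']
```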

6.3.1 Identifiable Limitations to an E-commerce Tagging System

Like most tagging implementations, e-commerce will also suffer from the inherent problems

associated with tags. Web 2.0 explains these problems and limitations as stemming from the very

definition of user created content.

User created content (UCC) is typically created outside of the domain that it is being applied on [24].

It is non-professional content being applied to a professional context. This can have obvious

repercussions for the quality and trustworthiness of the content being added. The users

adding the tags are not being governed and are not restricted by a standardised metric. Therefore,

anything can be added as a tag. Therein lie both the benefits and limitations of tags; users can add

very specific descriptions that collaborative filtering will typically miss, and these very specific tags

can greatly benefit another user who thinks that this tag is relevant. However, users can also add tags

that are, essentially, useless. This is displayed in Fig 6.4.

Fig 6.4

Fig 6.5 is an example of tagging. Some users may find tags such as ‘police’ or ‘cyberspace terrorism’

useful because these are tags that are descriptive of the product that would typically not be found in

the content description of the product, and they enable users to find other items with these tags.

However, how many users are likely to find the tag ‘malfeasance’ practical and something that they

were looking for?

[24] offers a solution to this inherent problem by proposing several moderation possibilities: pre-production moderation, post-production moderation and peer-based moderation.

Pre-production moderation involves all content that is submitted by a user being examined by a moderator before it appears on the system. This is obviously labour intensive because, especially for a system like Amazon, every single tag has to be checked by a moderator. Either a standardised metric would have to be created for the moderator to check the tag against to see if it is relevant, or the moderator would have to use their own discretion to determine the relevance.

Post-production moderation means that content is posted immediately, but it can be changed or

deleted afterwards. This, again, suffers the same problems as above.

Peer-based moderation again means that content is posted immediately, but it is subject to the scrutiny of other users of the system; if they think the tag is irrelevant, it can be removed or changed. Again, this would mean that a standard would have to be set for tags to adhere to (which defeats the whole point of the uniqueness of tags), and the users would then have to be trusted to adhere to this metric. Especially for a successful e-commerce business, this empowers the users more than the business is likely to feel comfortable with, and it also relies on users not taking an apathetic stance towards moderating the tags.

Thus, the above suggested solutions cannot be applied to an e-commerce domain.

6.4 Conclusion

Implementing tagging in e-commerce web-based systems is feasible from both a usability and a technical perspective. Tagging is easy to implement and easy to maintain, as demonstrated by the technical mock-up website created. Users have become familiar with the concept and how it works, so tagging is very user-friendly and accessible.

Although tagging would suffer from the ‘cold start’ problem (i.e. when it is first implemented into a

system there are no tags until users begin populating the system) collaborative filtering suffers from

this also, as it needs time and user interaction to build datasets that it can compare to one another.

However, tagging takes considerably longer to ‘get started’; collaborative filtering is an automated

process that the system does without the direct need of the user, whereas tagging depends entirely

upon users explicitly populating the database.

This problem is even more apparent when large e-commerce systems are considered. A system such

as Amazon.com has thousands upon thousands of different products. To make the most of the

potential of tagging, from a recommendation perspective, the majority of products would have to be

tagged.

Likewise, as mentioned previously, tagging is typically associated with the likes of blogs and personal webpages, not huge profit-driven organisations. Although some are confident that tagging will be adopted into the mainstream [24], it is doubtful that this will happen without some form of either automatic moderation or standardised metric. To introduce either, however, departs from the nature of tagging.


Considering the above discussions, therefore, the feasibility of tagging being implemented successfully into an e-commerce system is doubtful, as is the likelihood of it replacing traditional collaborative filtering techniques.


Chapter 7

Evaluation

7.1 Establishing Criteria

To evaluate both the assessment scheme developed for personalisation systems and the feasibility of using Web 2.0 as a means of replacing collaborative filtering, the following criteria, and the justification for using them, have been considered. The evaluation has been split into two sections: evaluation of the assessment scheme proposed by the project and evaluation of the feasibility of Web 2.0 in an e-commerce context.

7.2 Evaluation of Devised Assessment Scheme

7.2.1 Assuming the Role of an Anonymous User

Assessing a system for its effectiveness upon anonymous users will inevitably mean that the

assessment will involve recreating the actions of an anonymous user in the system. As the definition

of anonymous user was considered and justified for the methodology of the project, the same

definition was applied to the assessment criteria.

Apart from clearing the cache information and the cookies from the browser used to assess the

system, there was little else that could be done to assume the role of an anonymous user. The

information passed into the system was always implicit, as justified and defined in the project’s

methodology.

The study could perhaps have been extended to multiple anonymous users by conducting a group testing session. However, as noted in the project’s methodology, not only is it difficult to assign possible tasks to anonymous users, but the system also cannot identify these tasks and use them to any advantage. It would have been similarly difficult to properly inform users of the assessment scheme used in this methodology, and difficult to ensure that this scheme was followed.

As such, conducting a group session of anonymous users would have probably gained the same

results, although it would have enabled a qualitative assessment of data.

7.2.2 Sample of Systems Tested


Initially, it was decided that a sample of systems was to be tested. However, as the project progressed

and the methodology was justified and clearly defined, it became evident that this sample of different

systems was not needed.

However, as only one e-commerce system was tested, the results of the assessment cannot be taken as a definitive answer to the problem defined in the initial problem statement. Although a justification for this was argued in the methodology, one assessment is not indicative of major e-commerce businesses and their personalisation techniques in general.

7.2.3 Assessment of Personalisation – Interface and Control

The assessment of the system identified the personalisation techniques that affected the interface of the system, but did not provide a clear measurement of why or how these interface changes were helpful. Informed usability criteria should have been devised to measure the features implemented, rather than just identifying the features.

7.2.4 Assessment of Personalisation – Content

The assessment measured the recommendations provided by the system to the user; this was used as an indication of the quality of the collaborative filtering techniques employed by the system and was measured using precision.

The background reading presented many different quality measuring criteria that have been used on personalisation techniques, but they were only applicable to users who provided explicit information to the system, for example users who rated the recommendations that the system provided them with. However, as anonymous users could only provide implicit information, the relevance of the items returned had to be determined without telling the system whether they were relevant or not.

The problem with this approach is obvious: relevance was defined by matching keywords in the original item’s description to keywords found in the recommended items’ descriptions. Preference and user opinion were not (and could not be) used.

The quality of content cannot be measured scientifically; only the user can decide whether a

recommendation is good or not. However, an informed, relevant attempt to measure accuracy had to

be made and precision was the most viable assessment technique in these circumstances.
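The keyword-based precision measure described above might be sketched as follows. This is an illustration under assumptions, not the procedure actually followed in the study: the stop-word list, the rule that one shared keyword is enough to count as relevant, and the example descriptions are all hypothetical. Only the general idea (relevance decided by matching keywords between descriptions, precision as relevant items over items returned) is taken from the text.

```python
import re

# Hypothetical stop-word list; the report does not specify one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "on", "with"}


def keywords(description):
    """Lower-case a description and return its distinct non-stop-word tokens."""
    return set(re.findall(r"[a-z0-9]+", description.lower())) - STOP_WORDS


def is_relevant(original, recommended, min_shared=1):
    """Assumed rule: relevant if at least min_shared keywords are shared."""
    return len(keywords(original) & keywords(recommended)) >= min_shared


def precision(original, recommendations, min_shared=1):
    """Precision = relevant recommendations / all recommendations returned."""
    if not recommendations:
        return 0.0
    relevant = sum(is_relevant(original, r, min_shared) for r in recommendations)
    return relevant / len(recommendations)


# Made-up descriptions, for illustration only.
original = "A thriller about police and cyberspace terrorism"
recommended = [
    "Police procedural set in London",
    "A cookbook of Italian recipes",
    "Terrorism and the modern state",
    "Gardening for beginners",
]
score = precision(original, recommended)
print(score)  # 2 of the 4 recommendations share a keyword, so 0.5
```

The sketch also makes the limitation visible: a recommendation the user would value but which shares no keywords with the original description is counted as irrelevant.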

7.2.5 Assessment of Personalisation – Coverage

The scope of the system’s recommendations was measured by using Amazon’s sales rankings. The

average was calculated from the range of sales ranks and this was used to argue whether Amazon had

suitable scope and whether obvious or novel items were returned to the user.
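The average referred to above can be reproduced from the sales ranks listed in Appendix E. The short sketch below only restates that arithmetic; the variable names are illustrative.

```python
# Sales ranks of the 39 items listed in Appendix E, in item order.
sales_ranks = [
    408, 866, 2546, 2260, 3268, 5054, 9391, 13222, 5064, 11264,
    1299, 188731, 288, 7381, 184, 165, 20031, 1202, 4104, 492010,
    574, 2193, 3281, 31890, 22282, 9467, 871, 2595, 2213, 3703,
    30148, 10679, 12896, 230937, 1195, 31118, 23235, 2626, 475,
]

total = sum(sales_ranks)
mean_rank = total / len(sales_ranks)

# Range of the ranks: the best (smallest) and worst (largest) values observed.
best_rank = min(sales_ranks)
worst_rank = max(sales_ranks)

print(total, len(sales_ranks), int(mean_rank))  # 1191116 39 30541
print(best_rank, worst_rank)                    # 165 492010
```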


Again, like content, whether an item is novel or not depends entirely on the user. However, the method used in this study does prove indicative of whether an item is novel or not, as it can be argued that an obscure item will have a poor sales rank (a numerically large value) and a very popular (obvious) item will have a good sales rank (a numerically small value). Thus, there is a correlation between the two.

However, sales ranks can only be used for Amazon.com, as it was found in the rapid prototyping stage that most other major e-commerce businesses do not disclose that information. Therefore, the assessment criteria devised cannot be applied to other systems in this respect, and they do not offer a suitable alternative to using sales ranks.

7.3 Justification of Discussion of Web 2.0

The question of the feasibility of using Web 2.0 in major e-commerce systems was explicitly raised in

the literature [29] and this is why it was included in the body of the report. Likewise, as discussed

previously, Web 2.0 features can be found in current existing e-commerce systems and the

background reading suggested that the techniques and their implementation are increasingly becoming

‘mainstream’.

Unlike the sources for the assessment criteria, the sources and readings available that critically discussed Web 2.0 were scarce. It could be argued that, although the discussion of the feasibility of Web 2.0 was critical in identifying the tools and techniques and assessing their effectiveness, it may not have been fully justified and informed, as few reading resources were available.

7.4 Evaluation of Software Produced

The software produced will be evaluated according to its usability, technical feasibility and relevance

to the main discussion of Web 2.0.

7.4.1 Usability of Mock-up

From a usability perspective, the mock-up was successful as it followed the standard ‘guidelines’ of implementing tagging. The user could easily submit their own tags and the tags of other products were retrieved automatically. It was not, however, assessed against established and informed usability criteria.

Although the moderation of tags was discussed and considered, the mock-up did not offer a working solution to this inherent problem.

7.4.2 Technical Feasibility


The technical feasibility of the mock-up was discussed. It was argued that, as tagging involves PHP and MySQL, it was perfectly reasonable that a major e-commerce system could adopt it. However, the mock-up did not take into consideration the size and scope of major e-commerce systems.

Unlike the mock-up, a major e-commerce system would have thousands of different products, a number that would be growing daily. This would mean that if tagging were implemented, the tagging database would have to support this and, with potentially hundreds of tags for one product, the efficiency of retrieving tags could worsen over time. Likewise, although the conclusion on feasibility did identify the ‘cold-start’ problem, it did not identify the fact that users would have to keep tagging constantly as more and more new products were added to the system.

7.5 Evaluation of Project

As evidenced by the different project schedule iterations, the project suffered many different setbacks. The most obvious were a failure to properly commit to a devised set of assessment criteria and a difficulty in quantifying a scheme that was justifiable and relevant to the problem and scope of the project.


References

[1] www.Amapedia.amazon.com [Accessed April 2008]

[2] www.Amazon.com [Accessed April 2008]

[3] www.askville.amazon.com [Accessed April 2008]

[4] R Baeza-Yates and Berthier Ribeiro-Neto. Modern Information Retrieval. Harlow :

Addison-Wesley Longman, 1999.

[5] C Bell, Personalisation in E-Commerce, School of Computing, Leeds , 2004

[6] A Bergholz. Coping with sparsity in a recommender system. In WEBKDD 2002 : Mining

Web data for discovering usage patterns and profiles : 4th International workshop, pages

86–99, Edmonton, Canada, 2002.

[7] C Delong, P Desikan, and J Srivastava. User : User-sensitive expert recommendations.

In PKDD 2000 : Principles of Data Mining and Knowledge Discovery : 4th European

Conference, pages 77–95, Lyon, France, 2000.

[8] C Delong, P Desikan, and J Srivastava. User : User-sensitive expert recommendations.

In PKDD 2000 : Principles of Data Mining and Knowledge Discovery : 4th European

Conference, pages 77–95, Lyon, France, 2000.

[9] E Frias-Martinez and V Karamcheti. A customisable behaviour model for temporal prediction

of web user sequences. In WEBKDD 2002 : Mining Web data for discovering usage

patterns and profiles : 4th International workshop, pages 66–85, Edmonton, Canada, 2002.

[10] A Geyer-Schulz, M Hahsler, and M Jahn. A customer purchase incidence model applied

to recommender services. In WEBKDD 2001 : Mining web log data across all customers

touch points : Third International Workshop, pages 25–47, CA, USA, 2001.

[11] A Geyer-Schulz and M Hahsler. Comparing two recommender algorithms with the help of

recommendations by peers. In WEBKDD 2002 : Mining Web data for discovering usage


patterns and profiles : 4th International workshop, pages 137–158, Edmonton, Canada,

2002.

[12] Ana Gil and Francisco Garcia. E-commerce recommenders: powerful tools for e-business.

Crossroads, 10(2):6–6, 2003.

[13] M Grcac, D Mladenic, B Fortuna, and M Grobelnik. Data sparsity issues in the collaborative

filtering framework. In PKDD 2000 : Principles of Data Mining and Knowledge

Discovery : 4th European Conference, pages 58–76, Lyon, France, 2000.

[14] Y Hafri, C Djerabo, P Stanchev, and B Bachimont. A markovian approach for web user

profiling and clustering. In Advances in knowledge discovery and data mining : 7th Pacific-

Asia Conference, PAKDD 2003, pages 191–202, Seoul, Korea, 2003.

[15] B Hay, G Wets, and K Vanhoof. Web usage mining by means of multidimensional sequence

alignment methods. In WEBKDD 2002 : Mining Web data for discovering usage patterns

and profiles : 4th International workshop, pages 50–65, Edmonton, Canada, 2002.

[16] J Herlocker. Evaluating Collaborative Filtering Recommender Systems. In ACM Transactions on

Information Systems, Vol 22. No.1, pages 5-53, New York, NY USA 2004. ACM

[17] J Hipp, U Guntzer, and G Nakhaeizadeh. Data mining of association rules and the process

of knowledge discovery in. In Advances in data mining: applications in E-commerce,

medicine, and knowledge management, pages 15–36. Springer, 2003.

[18] Young Kim. Impact of social influence in e-commerce decision making. In Proceedings

of the ninth international conference on Electronic Commerce, pages 293–302, MN, USA,

2007.

[19] Ron Kohavi. Mining e-commerce data: the good, the bad, and the ugly. In KDD ’01:

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery

and data mining, pages 8–13, New York, NY, USA, 2001. ACM.

[20] P Markellou, M Rigou, and S Sirmakessis. Mining for web personalisation. In Web Mining

: applications and techniques, pages 27–49. Hershey, PA : Idea Group Publishing, 2005.

[21] F Masseglia, P Poncelet, and M Teisseire. Web usage mining: How to efficiently manage


new transactions and new clients. In PKDD 2000 : Principles of Data Mining and

Knowledge Discovery : 4th European Conference, pages 530–535, Lyon, France, 2000.

[22] B Mobasher. Analysis and detection of segment-focused attacks against collaborative

recommendation. In PKDD 2000 : Principles of Data Mining and Knowledge Discovery :

4th European Conference, pages 96–118, Lyon, France, 2000.

[23] Bamshad Mobasher, Robert Cooley, and Jaideep Srivastava. Automatic personalization

based on web usage mining. Commun. ACM, 43(8):142–151, 2000.

[24] OECD. Participative Web and User-Created Content: Web 2.0, Wikis and Social Networking, 2007.

[25] P Perner and G Fiss. Intelligent e-marketing with web mining, personalisation and user-adapted

interfaces. In Advances in data mining: applications in E-commerce, medicine, and knowledge

management, pages 37–52. Springer, 2003.

[26] J Riedl. Personalization and privacy. In IEEE Internet Computing, pages 29–31, 2001.

[27] A Rosien and J Heer. Automatic categorisation of web pages and user clustering with

mixtures of hidden markov models. In WEBKDD 2002 : Mining Web data for discovering

usage patterns and profiles : 4th International workshop, pages 35–49, Edmonton, Canada, 2002.

[28] A Rosien and J Heer. Lumberjack : Intelligent discovery and analysis of web user traffic

composition. In WEBKDD 2002 : Mining Web data for discovering usage patterns and

profiles : 4th International workshop, pages 1–16, Edmonton, Canada, 2002.

[29] J Schafer. Collaborative Filtering Recommender Systems in The Adaptive Web, LNCS 4321.

Pages 291-324, 2007

[30] V Schickel-Zuber and B Faltings. Overcoming incomplete user models in recommendation

systems. In PKDD 2000 : Principles of Data Mining and Knowledge Discovery : 4th

European Conference, pages 39–57, Lyon, France, 2000.

[31] Antony Scime. Web mining : applications and techniques. Hershey PA, 2005.

[32] A Shahabi and F Banaei-Kashani. Framework for efficient and anonymous web usage

mining based on client-side tracking. In WEBKDD 2001 : Mining web log data across all


customers touch points, Third International Workshop, pages 113–144, CA, USA, 2001.

[33] Y Shen, Q Yang, Z Zhang, and H Lu. Mining the customer’s up-to-moment preferences

for e-commerce recommendation. In Advances in knowledge discovery and data mining :

7th Pacific-Asia Conference, PAKDD 2003, pages 165–177, Seoul, Korea, 2003.

[34] B Suryavanshi, S Nematollaah, and S Mudur. Adaptive web usage profiling. In PKDD

2000 : Principles of Data Mining and Knowledge Discovery : 4th European Conference,

pages 119–135, Lyon, France, 2000.

[35] Wu D. A Framework For Classifying Personalisation Scheme Used on e-Commerce Websites, in

Proceedings of the 36th Hawaii International Conference on System Sciences Vol 5 Issue 1. 2002.

[36] H Yang and S Parthasarathy. On the use of constrained associations for web log mining.

In WEBKDD 2002 : Mining Web data for discovering usage patterns and profiles : 4th

International workshop, pages 100–118, Edmonton, Canada, 2002.


Appendix A

Personal Reflection

This section is a personal reflection on the project experience, and I hope that other students may find it useful if they are producing a similar project to mine, or even just producing a project that is an evaluative one.

Perhaps the most obvious issue that comes to mind from the course of my project is that of producing a software implementation for an evaluative project. As my project was developing in the early stages I thought that the evaluative aspect of my project would suffice, and I was quite hesitant to focus my study on producing a software implementation, having suffered problems in the past with modules which were solely based upon producing a piece of software.

I was therefore quite worried when my assessor suggested that I should consider a software implementation to support my project. I was concerned that the research and production of this software would take more time than the main focus of my study, and I did not think that I could produce software that was relevant or ‘new’ to the field of personalisation. However, on further reflection, a software implementation greatly supported my solution. As mentioned in the ‘Project Schedule’ of this report, I had great difficulty specifically stating what my assessment criteria were, both in terms of what I would be doing to evaluate the personalisation systems and in terms of how many systems I would need to evaluate to justify a project such as this. I spent too much time trying to build ‘project-worthy’ assessment criteria rather than thinking of other directions that my project could take. Eventually, I realised that a discussion of Web 2.0 was not only relevant to my project solution, but could also be used to produce a relevant software implementation. I would advise future students, therefore, to consider producing some piece of software, not only because it is generally expected of students studying computing, but also because arguments that have been proposed or hypothesised in a report carry greater weight if they can be seen in a working software example.

I would also advise students that if they are producing a project that deals with a field that has had a great amount of research already (as I have done with regards to personalisation) then they need to find a legitimate and concise reason as to why they are doing the project in this field, in terms of identifying a problem that is yet to be solved and producing a solution that is appropriate. The same applies to projects that are solely (or mainly) concerned with assessing something that already exists. During my project experience, although I thought that the problem that I identified was worthy of both research and evaluation, I found it difficult to produce proper assessment criteria, as there had been a great deal of work committed towards evaluating personalisation with registered users and ‘explicit’ data, but little assessment of anonymous users and ‘implicit’ data. This made it harder for me not only to build assessment criteria but also to justify the assessment criteria that I had developed.

A more general piece of advice that all students can take heed of is not to underestimate the time it takes to write up a project such as this. During my project I made notes on each section (Methodology, Evaluation, etc) and thought that I would quickly and easily be able to expand these notes into the full written up sections. However, it takes considerable time and effort to construct readable and coherent sentences from snippets of notes, especially when the notes have been made several months prior to the write up and the context is muddled!

As with all projects (whether they are worth 10 or 40 credits) time management is absolutely essential.


Appendix E

Item No Sales Rank

Item 1 408

Item 2 866

Item 3 2546

Item 4 2260

Item 5 3268

Item 6 5054

Item 7 9391

Item 8 13222

Item 9 5064

Item 10 11264

Item 11 1299

Item 12 188731

Item 13 288

Item 14 7381

Item 15 184

Item 16 165

Item 17 20031

Item 18 1202

Item 19 4104

Item 20 492010

Item 21 574

Item 22 2193

Item 23 3281

Item 24 31890

Item 25 22282

Item 26 9467

Item 27 871

Item 28 2595

Item 29 2213

Item 30 3703

Item 31 30148

Item 32 10679


Item 33 12896

Item 34 230937

Item 35 1195

Item 36 31118

Item 37 23235

Item 38 2626

Item 39 475

Mean sales rank: 1,191,116 / 39 ≈ 30,541
