The candidate confirms that the work submitted is their own and the appropriate credit has been given where reference has been made to the work of others.
I understand that failure to attribute material which is obtained from another source may be considered as plagiarism.
(Signature of student)
Assessing Personalisation Techniques on Anonymous Users in Major E-Commerce Systems
John Schaer
Computing
Session (e.g. 2005/2008)
Chapter 1
Introduction
This introductory chapter defines the aims, objectives and scope of the project and what
will be completed and delivered. The project schedule is also included.
1.1 Problem Statement
Personalisation features and techniques are only fully applicable to users who can be clearly defined
and compared by web-based systems and, although features exist that can operate on anonymous
users, they are not as efficient or integrated as their registered counterparts.
1.2 Aim
The aim of this project is to determine the extent to which personalisation techniques are applicable in
popular web-based systems operating on anonymous users. Using this information, enhancements to
these systems can be proposed.
1.3 Objectives
The objectives of the project are to:
• Identify the different personalisation techniques that exist for anonymous users.
• Determine the extent of applicable personalisation from a justified selection of web-based e-
commerce systems.
• Evaluate the quality of the current customisation techniques.
• Suggest why, where possible, customisation techniques are not being used in the selected Web-
based systems and suggest enhancements that can be made.
• Discuss the feasibility and practicality of using Web 2.0 features as a possible part of these
enhancements.
1.4 Minimum Requirements
The minimum requirements are:
• A report researching and evaluating personalisation techniques that have been integrated in a
selected sample of web-based e-commerce systems, considering their performance on anonymous
users. This will then be supplemented by an informed and valid in-depth discussion of the feasibility
of using Web 2.0 to complement, or even in lieu of, current personalisation systems.
The possible extensions are:
• A technical software implementation of a discussed Web 2.0 feature, with an appropriate
justification of its usability and feasibility in terms of providing personalisation for anonymous users.
• An evaluation of the software implementation.
1.5 Consideration of Past Projects Within the School of Computing
There has been a range of past projects that have implemented personalisation techniques in web-based
systems, and one past project that has evaluated and assessed personalisation techniques in
current systems. However, none of these projects has considered anonymous users and
how successful these techniques are for them. With the growth of e-commerce web-based systems and of
anonymous users, this is a worthwhile and significant problem to examine as the basis of a project.
1.6 Project Schedule
The final project schedule is shown overleaf (Fig. 1.1). It is the fourth iteration of the project
schedule. The original project schedule and the second and third iterations are included in appendices
B, C and D.
The original project schedule changed only slightly to become the second iteration after the Mid-
Project Report was returned. This was to accommodate changes suggested by the assessor; notably to
provide a possible software implementation. Although the inclusion of a software implementation
seems as if it would drastically change a project schedule, it should be noted that the original schedule
was perhaps too generous in its original time allocations for the second half (Semester 2) of
the project. The original schedule was followed and on track for the first half (Semester 1) of the
project.
The second iteration was altered after difficulties in quantifying an assessment scheme that would
justify the scope of the project. This involved narrowing the focus of the project to e-commerce
web-based systems only. The third iteration reflected comments made in the project progress meeting.
The project schedule is reflected upon in Appendix A.
Chapter 2
Background Research
To understand the problems inherent in applying personalisation techniques to anonymous users of
web-based e-commerce systems, it must first be understood what a personalisation technique is: how it
works, how it is implemented and why it has been implemented. This is important in order to properly
identify which personalisation techniques have been implemented in the web-based systems being
assessed and what limitations these techniques have.
The following research aims to explain this, as well as how anonymous users are defined in the
reading, what research has been carried out in the personalisation field and what assessment criteria
have been proposed.
2.1 Introduction to Personalisation
‘The personalisation technology is fast evolving; its use spreads quickly. In the years to come all Web
applications will embed personalisation components and this philosophy will be part and parcel of
everyday life tasks’ [20].
Personalisation of web-based systems refers to the provision of personalised content and
recommendations by the system to the user. It has enabled ‘mass customisation’ for
individual customers direct from the vendor, and it can range from simple, subtle changes to the
content and presentation of a webpage to predicting and anticipating the needs and goals of a user
[23].
Personalisation and customisation systems have evolved significantly, from check-box personalisation
applied by the user to more integral and intuitive techniques and methods [20]. Every user visiting a
website leaves a trace of their activities in a log file, showing the pages they have visited and their
click patterns [33]. Analysis of such data tells the system how to personalise or customise itself to
cater for a particular type of user, and the results can then be applied to the system's content,
structure and/or presentation. It is because of these advances in the field that personalisation can
occur intuitively for users who provide little to no personal information or preferences: anonymous
users.
As Markellou et al. [20] describe, content personalisation typically produces additional information
for the user (for example, in an E-commerce context, other items/products they may be interested in),
structure personalisation changes the link structure of documents and web pages in a website
(providing or removing relevant links) and presentation personalisation changes the layout and format
of the system-produced pages (for example, for use on PDAs or mobile phones). However, the analysis of
data is only as useful as the subsequent delivery of information to the user [12]: if the user cannot
interpret the recommended information, however reliable it may be, then it is useless.
2.2 Impetus for the Implementation of Personalisation
Survanashi et al. [34] write:
‘Understanding behaviours, characteristics and preferences of users is essential for satisfactory user
experience. This can help companies retain current customers, attract new customers, improve cross
marketing and sales...’
Through a better understanding of customers and their goals and intentions, web-based systems
(particularly e-commerce systems) can better tailor their efforts to meet these needs as well as going
beyond, through marketing such as sales promotions and advertisements [28]. A highly personalised
and customisable system also offers a particular competitive advantage for e-commerce systems
[15][12], offering a better user experience [19] and, again, encouraging repeat visits and repeat
buying from customers.
Kim [18] reports a correlation between the advent of social networking sites (such as
Facebook and MySpace) and the consumers they direct to e-commerce sites. These sites also
complement the trustworthiness of recommender systems: whereas a person was previously more
influenced by a personal recommendation from someone they knew, people are now more prepared to
believe a system's recommendations.
Personalisation can also prove hugely beneficial for the owners of the systems. Particularly in an
e-commerce domain, it allows new products and new marketing campaigns to be tested and introduced
quickly [19]. It is especially beneficial for web-based systems to incorporate personalisation for
anonymous users, as it may encourage them to register and return to the site in future [34].
This is critically discussed and considered further in the development of the project’s methodology.
2.3 Knowledge Discovery
Knowledge Discovery is the process of extracting potentially valuable patterns from data to
retrieve behavioural content [31]. Web mining methods are an example of knowledge discovery.
Knowledge Discovery is complex, iterative and highly interactive [17].
The Knowledge Discovery Process is split into 5 main steps, as described by Schulz and Hahsler [11].
First is the selection of data: whether to use existing transaction logs or to collate data specifically
for the purpose of generating recommendations. The selected data is then pre-processed and cleaned of
noise and inconsistencies before being transformed into a format suitable for data mining. The data can
then be mined for significant, interesting patterns, after which these are interpreted in the context of
the system they will be used for. Finally, the results are formatted and presented in an appropriate
way (e.g. through a recommendation system).
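The steps above can be sketched as a chain of small functions. This is a minimal illustration only. It assumes a toy log of page-view dictionaries. The record fields (`type`, `session`, `page`), the function names and the pair-counting mining step are hypothetical choices, not the method of [11]. Interpretation and presentation are combined into one final step.

```python
from collections import Counter
from itertools import combinations


def select(raw_log):
    # Step 1 (selection): keep only page-view records from an existing log.
    return [r for r in raw_log if r.get("type") == "pageview"]


def preprocess(records):
    # Step 2 (pre-processing/cleaning): discard noisy records missing a session or page.
    return [r for r in records if r.get("session") and r.get("page")]


def transform(records):
    # Step 3 (transformation): one set of visited pages per session.
    sessions = {}
    for r in records:
        sessions.setdefault(r["session"], set()).add(r["page"])
    return list(sessions.values())


def mine(transactions, min_count=2):
    # Step 4 (data mining): page pairs viewed together in >= min_count sessions.
    counts = Counter()
    for pages in transactions:
        counts.update(combinations(sorted(pages), 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}


def interpret(patterns):
    # Step 5 (interpretation and presentation): phrase patterns as recommendations.
    return [f"Visitors to {a} also viewed {b} ({n} sessions)"
            for (a, b), n in sorted(patterns.items())]


def knowledge_discovery(raw_log):
    return interpret(mine(transform(preprocess(select(raw_log)))))


if __name__ == "__main__":
    log = [
        {"type": "pageview", "session": "s1", "page": "/tents"},
        {"type": "pageview", "session": "s1", "page": "/stoves"},
        {"type": "pageview", "session": "s2", "page": "/tents"},
        {"type": "pageview", "session": "s2", "page": "/stoves"},
        {"type": "image", "session": "s2", "page": "/logo.png"},
        {"type": "pageview", "session": None, "page": "/tents"},
    ]
    for line in knowledge_discovery(log):
        print(line)
```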
2.3.1 Web Mining
Web mining refers to three knowledge discovery domains, Web Content Mining, Web Structure
Mining and Web Usage Mining, as Hay et al. [15] describe:
• Web Content Mining
Web Content Mining extracts knowledge from the content of documents and their descriptions.
• Web Structure Mining
Web Structure Mining extracts knowledge from the organisation, structure and linking of webpages.
• Web Usage Mining
Web Usage Mining (WUM) extracts knowledge from the usage patterns of webpages. The user leaves
a trace of their activity and their click patterns in a log file made automatically by websites [27]. This
behaviour is then compared to an already constructed offline user access model [32]. Through this
process, the system is able to implement varying degrees of personalisation and customisation
by identifying the user’s individual needs and preferences.
However, for leading e-commerce web-based systems in particular, this generates a huge amount of
data, which is hard to analyse by hand [13].
2.4 Data
The construction and maintenance of datasets bring formidable challenges. Datasets are not
only intrinsically large and expected to grow even further, but they also contain missing
entries: for example, customers do not view all webpages and they do not purchase all items [26]. The
growth of the system also means that the predictions have to be constantly updated, otherwise they will
have a very short lifetime [36]. Over a long period of time, therefore, existing user patterns that have
become invalid must be pruned so that they do not taint future patterns [21]. Four sources of data
can be used in mining users’ interactions to build recommendation systems: user entry data,
web server logs and cookie logs [25], and application server logs [19].
2.4.1 User Entry Data
User entry data is explicit information provided by the user in a direct way, such as through online
forms or questionnaires [25]. It can include personal preferences and interests, but typically contains
information such as name and address, to be used as part of the user registration process.
2.4.2 Web Server Logs
A web server log is automatically generated by a server when a user visits the web pages of its
website [25]. A server log can contain information such as the IP address of the user, what webpages
the user accessed, the time the user visited the website and the overall time the user spent at the
website [25].
Although most servers generate web server logs, these should not be the only source of data used for
web mining. The limits of web server logs are explained by Kohavi [19]. First, web server logs
do not capture events critical to both e-commerce and web mining, such as ‘add to cart’ and ‘change
quantity’. Capturing such events could yield valuable customer data: even a shopping cart that is
abandoned before the final purchase stage reveals valuable information about the user’s preferences.
Web server logs also do not capture the form information that the user has filled in on webpages,
which, again, would be very useful in web mining. Finally, most of the data in web server logs is
redundant for web mining. For example, they contain the requests for every image on every page,
which are of no use to web mining: every user has to request those images, so no differentiation or
preferences can be derived from them.
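The redundancy point can be illustrated with a sketch that parses lines in the NCSA Common Log Format, a widely used web server log format, and discards requests for images and other static resources. The regular expression and the list of suffixes are assumptions for illustration. Real logs vary in format and would need a more robust parser.

```python
import re

# Common Log Format: host ident authuser [date] "method path protocol" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)
# Assumed list of static resources that every visitor requests.
STATIC_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def page_requests(lines):
    """Yield (host, time, path) for page requests, skipping static resources."""
    for line in lines:
        match = CLF_PATTERN.match(line)
        if match is None:
            continue  # malformed line: treat as noise
        path = match.group("path")
        if path.lower().split("?")[0].endswith(STATIC_SUFFIXES):
            continue  # every visitor requests these, so they reveal no preference
        yield match.group("host"), match.group("time"), path
```

Nothing in such lines records an ‘add to cart’ event, which is the limitation Kohavi describes.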
2.4.3 Application Server Logs
Application Server Logs (logging at the application layer) solve many of the problems
raised by web server logs. Again, as Kohavi explains [19], they can capture the critical events and
web forms that web server logs cannot, and they eliminate redundant, trivial data. Furthermore, as they
are set at the application layer, which controls and identifies user sessions and registration as well
as user logins and logouts, all of this information can be logged.
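The following is a minimal sketch of application-layer logging. It assumes a hypothetical JSON-lines schema with `session`, `event` and free-form detail fields. Because the application itself logs the events, an abandoned cart can be recovered directly. That would not be possible from a web server log.

```python
import io
import json
import time


def log_event(stream, session_id, event, **details):
    """Write one application-level event as a JSON line and return it."""
    record = {"ts": time.time(), "session": session_id, "event": event, **details}
    stream.write(json.dumps(record) + "\n")
    return record


def read_events(stream_text):
    """Parse a JSON-lines event log back into a list of dictionaries."""
    return [json.loads(line) for line in stream_text.splitlines() if line]


def abandoned_carts(records):
    """Sessions that added items to the cart but never logged a 'purchase'."""
    added = {r["session"] for r in records if r["event"] == "add_to_cart"}
    bought = {r["session"] for r in records if r["event"] == "purchase"}
    return added - bought


if __name__ == "__main__":
    log = io.StringIO()  # stands in for a log file
    log_event(log, "anon-1", "add_to_cart", sku="TENT-01", quantity=1)
    log_event(log, "anon-1", "change_quantity", sku="TENT-01", quantity=2)
    log_event(log, "anon-2", "add_to_cart", sku="STOVE-07", quantity=1)
    log_event(log, "anon-2", "purchase", order_total=39.99)
    print(abandoned_carts(read_events(log.getvalue())))  # {'anon-1'}
```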
2.4.4 Cookies
Cookies are small text files that a server automatically generates when a user visits the web pages
of its website, and that the user’s browser then stores [25]. Cookies assign each user a unique
identification code, rather than relying on an IP address, so that the user can be identified each
time they visit the website [25].
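The sketch below shows how a server might recognise a returning anonymous visitor using Python's standard `http.cookies` module. The cookie name `visitor_id` and the one-year lifetime are hypothetical. The function returns the `Set-Cookie` header line and does not assume any particular web framework.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"  # hypothetical cookie name
ONE_YEAR = 60 * 60 * 24 * 365  # assumed lifetime in seconds


def identify_visitor(cookie_header):
    """Return (visitor_id, set_cookie_line_or_None) for an incoming Cookie header."""
    cookie = SimpleCookie()
    if cookie_header:
        cookie.load(cookie_header)
    if COOKIE_NAME in cookie:
        return cookie[COOKIE_NAME].value, None  # returning visitor: reuse the id
    visitor_id = uuid.uuid4().hex  # new visitor: issue a random identifier
    outgoing = SimpleCookie()
    outgoing[COOKIE_NAME] = visitor_id
    outgoing[COOKIE_NAME]["max-age"] = ONE_YEAR
    outgoing[COOKIE_NAME]["path"] = "/"
    return visitor_id, outgoing.output(header="Set-Cookie:")


if __name__ == "__main__":
    first_id, header = identify_visitor(None)
    print(header)
    print(identify_visitor(f"{COOKIE_NAME}={first_id}"))
```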
2.5 Users
The primary aim of customisation and personalisation is to automatically tailor the system to the user’s
needs without inconveniencing the user any more than necessary. User surveys and questionnaires
are both intrusive and labour intensive [28][25]; the information they collect is referred to as explicit
user information. Explicit profiling typically involves generic user information such as date of birth
and address, as well as dynamic, changing information such as a favourite television programme [34].
Typically, web users are hesitant to provide explicit personal data, from both an effort and a privacy
viewpoint [20]. As such, web-based systems have to infer users’ preferences and aims from their
interactions with the system [25].
If the identity of the user of the system is unknown, then the system cannot construct an individual
purchase history [10], as it has no reference to the user. The user may well have visited the system
before, but the system is unable to differentiate one anonymous user from another. As such, the only
information available regarding anonymous users is their navigational activity and interactions with
the webpages. Thus, every action of the user must be used and considered as accurately as possible
[32].
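An anonymous user's navigational activity is all the system has, so a common first step is to split that activity into sessions. The sketch below assumes that each click already carries a visitor identifier (for example from a cookie). It also assumes a 30-minute inactivity timeout. Both the timeout and the data layout are assumptions and are not prescribed by the sources cited above.

```python
from datetime import datetime, timedelta

# Assumption: 30 minutes of inactivity ends a session (a common heuristic).
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionise(clicks, timeout=SESSION_TIMEOUT):
    """Group (visitor_id, timestamp, page) clicks into (visitor_id, [pages]) sessions."""
    sessions = []
    last_seen = {}  # visitor_id -> (time of previous click, index of open session)
    for visitor, ts, page in sorted(clicks, key=lambda c: (c[0], c[1])):
        previous = last_seen.get(visitor)
        if previous is None or ts - previous[0] > timeout:
            sessions.append((visitor, [page]))  # start a new session
            index = len(sessions) - 1
        else:
            index = previous[1]
            sessions[index][1].append(page)  # continue the open session
        last_seen[visitor] = (ts, index)
    return sessions


if __name__ == "__main__":
    t0 = datetime(2008, 1, 1, 12, 0)
    clicks = [
        ("a", t0, "/home"),
        ("a", t0 + timedelta(minutes=5), "/tents"),
        ("a", t0 + timedelta(hours=2), "/stoves"),
        ("b", t0, "/home"),
    ]
    for visitor, pages in sessionise(clicks):
        print(visitor, pages)
```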
2.6 Modelling Users’ Patterns
2.6.1 Clustering
Clustering is a method for grouping together either similar patterns found in users’ browsing
sessions or the web pages of a website that are accessed together [4]. This creates a neighbourhood
of user models, whereby a user can be compared with, and matched to, neighbouring users with
similar attributes [25][18].
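To illustrate clustering of anonymous sessions, the sketch below runs plain k-means over binary page-visit vectors. k-means is one common clustering algorithm and is used here for simplicity; the cited works may use other methods. The page names, the choice of k and the deterministic seeding are hypothetical.

```python
def to_vector(pages, vocabulary):
    """Binary vector: 1.0 if the session visited the page, else 0.0."""
    return [1.0 if page in pages else 0.0 for page in vocabulary]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_cluster(vector, centroids):
    """Index of the centroid closest to the vector."""
    return min(range(len(centroids)), key=lambda c: squared_distance(vector, centroids[c]))


def kmeans(vectors, k, iterations=10):
    """Plain k-means, seeded with the first k vectors so results are reproducible."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = []
    for _ in range(iterations):
        # Assignment step: each session joins its nearest centroid.
        assignment = [nearest_cluster(v, centroids) for v in vectors]
        # Update step: move each centroid to the mean of its member sessions.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


if __name__ == "__main__":
    sessions = [{"tents", "stoves"}, {"books", "dvds"},
                {"tents", "stoves", "maps"}, {"books", "dvds", "cds"}]
    vocab = sorted(set().union(*sessions))
    labels, centres = kmeans([to_vector(s, vocab) for s in sessions], k=2)
    print(labels)  # [0, 1, 0, 1]
    # A new anonymous visitor who has only viewed tents is matched to cluster 0.
    print(nearest_cluster(to_vector({"tents"}, vocab), centres))  # 0
```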
2.6.2 Association Rules
Association rules aim to discover all frequent patterns among transactions [9]. In web usage mining, a
pattern is typically a set of pages that users access together; sequential variants also take into
account the order in which the pages are accessed.
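The support and confidence measures behind association rules can be shown with a brute-force count. This sketch only considers rules between single pages. It ignores page order, so it finds association rules, not sequential patterns. Its thresholds are arbitrary. Algorithms such as Apriori exist to do this efficiently for larger itemsets.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for rules lhs -> rhs between single pages."""
    n = len(transactions)
    item_counts = Counter(page for t in transactions for page in set(t))
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(set(t)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of sessions containing both pages
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)
```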
2.7 The Web Environment
The World Wide Web has many inherent problems which can impede the process of data mining in its
domain. Baeza-Yates and Ribeiro-Neto [4] discuss some of these problems. Firstly, the very nature of the
Web, a volatile, unstructured and dynamic environment spanning many different computers,
architectures and platforms, poses obvious mining challenges. The data it contains varies greatly in
quality, structure and even format (for example, Chinese and Japanese scripts). Yet it is this sheer
scope of the World Wide Web that brings many potential customers for e-commerce systems to
exploit.
2.8 Web 2.0
Web 2.0 is a term that has been coined to describe the advent of concepts that promote and encourage
user participation and collaboration, aiding both information retrieval and creation [24], using the
platform of the World Wide Web. Popular features, or tools, that typically describe or are associated
with Web 2.0 are wikis, blogs, tagging, virals, etc. However, the phrase is often extended to include
tools that not only afford greater interactivity between user and system but also seek to close the gap
between them, such as shared media (e.g. embedded Flash video files as part of the content of
webpages), RSS feeds, etc. [24]. It has become a popular buzz-word to describe the omnipresent
nature of the World Wide Web in popular culture.
2.8.1 Tagging
Tagging is a form of user-created content [24]. It enables users to describe content, from products for
sale on an e-commerce website to articles written on a blog, with succinct keywords. The
purpose of tagging is to help categorise information for the user’s own benefit or for other users, as
well as to comment on information by adding such content.
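The following is a minimal sketch of how tags might be stored. It would let visitors, including anonymous ones, browse products by the keywords other users have applied. The class and its normalisation rule (trim and lower-case) are hypothetical.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical tag store: counts how often each tag is applied to each item."""

    def __init__(self):
        self._items_by_tag = defaultdict(set)
        self._tag_counts = defaultdict(lambda: defaultdict(int))  # item -> tag -> count

    @staticmethod
    def _normalise(tag):
        return tag.strip().lower()  # so "Camping" and "camping " are the same tag

    def add(self, item, tag):
        tag = self._normalise(tag)
        self._items_by_tag[tag].add(item)
        self._tag_counts[item][tag] += 1

    def items_tagged(self, tag):
        return sorted(self._items_by_tag.get(self._normalise(tag), set()))

    def top_tags(self, item, n=3):
        counts = self._tag_counts.get(item, {})
        return sorted(counts, key=lambda t: (-counts[t], t))[:n]
```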
2.8 Recommender Systems
Recommender systems are increasingly becoming standard across retailer web sites [11], as
previously observed customer behaviour is the most important feature for determining future customer
behaviour [10]. Their goal is not necessarily to recommend products that the user needs, but rather to
boost sales of products or to benefit the system in some other way [8]. A recommender system is an
anonymous, implicit system that uses observed user behaviour to provide links to related webpages,
without significant human effort or interference [10].
2.8.1 Collaborative Filtering
Yang and Parthasarathy [36] define collaborative filtering as follows:
‘Collaborative filtering uses consumer specific historical data to classify users into different
groups such that each group has similar interests or buying patterns.’
After these classifications have been constructed, the system can predict a user’s possible future
interests by referencing the group into which they have been classified. As such, collaborative
filtering can find relationships between items that have no content similarities but are linked through
the users accessing them [13]. The data behind the recommendations is therefore collected implicitly,
as users are not expected to give input; rather, their interests are inferred from the web logs they have
created [34][12].
By its nature, collaborative filtering is not accurately applicable to completely anonymous users, as
they cannot be classified or compared to these constructed groups based on preferences. The system
can only infer preferences from anonymous users, which may or may not be accurate. Approaches
that have been developed to combat this, such as offline clustering of page accesses, have so far
proved expensive and computationally demanding to implement [36]. In the context of recommendation
systems, collaborative filtering is applied in an item-to-item sense: products are recommended to
users based on other users’ experiences [30].
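The item-to-item approach suits anonymous users because recommendations are computed from the items in the current session alone. The sketch below uses cosine similarity over the sets of customers who bought each item. This is one standard similarity measure, assumed here for illustration, and the baskets are invented.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(baskets):
    """Cosine similarity between items, each represented by the set of users who bought it."""
    buyers = defaultdict(set)
    for user, items in baskets.items():
        for item in items:
            buyers[item].add(user)
    sims = defaultdict(dict)
    for a in buyers:
        for b in buyers:
            if a != b:
                overlap = len(buyers[a] & buyers[b])
                if overlap:
                    sims[a][b] = overlap / sqrt(len(buyers[a]) * len(buyers[b]))
    return sims


def recommend(current_items, sims, n=3):
    """Score unseen items by summed similarity to the items in the current session."""
    scores = defaultdict(float)
    for item in current_items:
        for other, s in sims.get(item, {}).items():
            if other not in current_items:
                scores[other] += s
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]
```

Comparing every pair of items is quadratic in the size of the catalogue, which is one face of the scalability problem raised in this section.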
Collaborative Filtering suffers from many significant problems, notably that of sparsity [6]. If an item has
only a few ratings then it will not be used, as its correlation with other items cannot be calculated
[6]. Such sparsity is particularly likely to occur with less popular items or ones that are new to the
system and thus have yet to be rated. It can also occur if there are simply more items than there are
users who can rate them [15].
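The sparsity problem can be made concrete, because a correlation between two items needs users who rated both. The sketch assumes ratings stored as `{user: rating}` dictionaries per item, with a hypothetical minimum of two co-raters. Under these assumptions, a new item with a single rating yields no correlation at all.

```python
from math import sqrt


def pearson(ratings_a, ratings_b, min_common=2):
    """Pearson correlation between two items over the users who rated both.

    Returns None when too few users rated both items (the sparsity problem)
    or when one item's ratings do not vary.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < min_common:
        return None
    xs = [ratings_a[u] for u in common]
    ys = [ratings_b[u] for u in common]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def density(ratings_by_item, n_users):
    """Fraction of the user-item matrix that actually contains a rating."""
    n_ratings = sum(len(r) for r in ratings_by_item.values())
    return n_ratings / (len(ratings_by_item) * n_users)
```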
As systems grow, so do the complexity and scalability demands of their recommendation systems. There
is a trade-off between keeping user details up to date and keeping response times low [14], so as not
to affect the users of the systems. Also, an increased number of users and products means that the
system has to examine ever more users and products [30][13]. Failure to keep up, particularly in an
e-commerce web-based system where business and consumer patterns are likely to change dynamically
and quickly, can lead to incorrect and invalid recommendations to the users [33].
2.8.2 Preference Based Recommendation
An alternative to Collaborative Filtering is Preference-Based Recommendation whereby the user is
explicitly prompted to express their preferences so that the system can retrieve results and
recommendations based on these preferences [30]. Whilst this method can operate successfully on
even a large set of volatile attributes without suffering from the aforementioned latency, sparsity or
scalability problems, it is intrinsically based on input from the user. To build a successful
preference-based model, the user would have to input a large amount of detailed, correct information
[30]. From an HCI perspective especially, it may be hard to prompt the user into giving such
information: although the user knows what they want, how can they input this information into the
system succinctly and without ambiguity or confusion? Also, if the information is dynamic and
likely to change over time, then the user has to remember to update their preferences accordingly,
otherwise the information becomes useless for recommendations.
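By contrast, a preference-based recommender only ranks products against what the user has explicitly asked for. The sketch below assumes preferences expressed as one predicate per product attribute, a hypothetical representation. With no preferences entered, it can recommend nothing, which is the weakness described above for anonymous users.

```python
def preference_score(product, preferences):
    """Fraction of the user's stated preferences that a product satisfies.

    preferences maps an attribute name to a predicate, e.g. {"price": lambda p: p <= 100}.
    """
    if not preferences:
        return 0.0
    met = sum(1 for attr, wanted in preferences.items()
              if attr in product and wanted(product[attr]))
    return met / len(preferences)


def recommend_by_preference(catalogue, preferences, n=3):
    """Rank products by preference score, dropping those that meet none of them."""
    scored = [(preference_score(p, preferences), p["name"]) for p in catalogue]
    ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
    return [name for score, name in ranked if score > 0][:n]
```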
There is also the issue of whether users can be bothered to rate items. Recommender systems are
dependent on user input, which is integral to the success of their recommendations. Yet, as Bergholz
succinctly writes [6], ‘people like to review items they like, they don’t bother with the rest. But that
gives an incomplete picture to the system’. This is, of course, arguable: if a user has a particular
gripe with a product, they are even more likely to review it than they would be if they were entirely
satisfied with it. It is less likely that a user would bother to review a middling item, but the
general apathy of users towards reviewing items (and thus contributing to the system) is a valid point to
consider.
The major benefit of recommender systems comes from the fact that they can recommend
items/products to the consumer without having to analyse the item/product being
recommended [6]; they can do so on the strength of either other users’ ratings or the analysis of
learned usage patterns. Of course, therein also lies their major weakness: not only is it questionable
how reliable such ratings are, but such systems are also prone to attacks such as the insertion of false
information to deliberately manipulate the ratings, known as shilling [36][30]. The aim of such attacks
is to bias the system’s output [22], perhaps to the manipulator’s advantage (e.g. on auction systems
such as eBay.com). It has been shown to be possible to successfully attack a system without knowing
much about either the system itself or its users.
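A small worked example shows why shilling is effective against simple aggregates. The ratings below are invented: five genuine ratings averaging 2, followed by three injected 5-star ratings. The median is included only to show that some aggregates move less. It is not presented as a defence proposed in the cited works.

```python
from statistics import mean, median

genuine = [2, 2, 3, 2, 1]  # invented genuine ratings for one item
shills = [5, 5, 5]  # fabricated ratings inserted by an attacker
attacked = genuine + shills

print(mean(genuine), mean(attacked))      # 2 3.125
print(median(genuine), median(attacked))  # 2 2.5
```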
2.8.3 Recommendations: The Banana Problem
Herlocker [16] states that a system can retrieve obvious and popular recommendations that are
accurate with reasonable coverage, but are in fact impractical and almost useless. This is because the
items are so popular and so obvious that, for the vast majority of users, they seem to be accurate because
most people like them or have bought them. This is commonly referred to as ‘the banana problem’ (or
the ‘Harry Potter’ problem), as it is the equivalent of a system recommending bananas for a user’s
grocery list. The system will think this is a good recommendation because it will be able to match the
current user’s shopping to other (past) users in the system, and most of these other users will have
bought bananas. So bananas seem like a good recommendation. However, most customers buy
bananas, and the ones who do not typically have a good reason for not buying them. So suggesting
bananas to a user who does not have bananas on their list is not a good idea, because they have already
made a concrete decision not to buy them; they are not leaving them out because they have never
heard of them.
Although such recommendations are impractical, Herlocker notes that obvious recommendations do have
some value to the user, as they build confidence in the system, albeit superficially. If a user is
recommended an item that they would probably have bought (e.g. a banana), or were going to buy,
they think that the system is accurate. However, the system may just as well recommend the most
popular items without comparing users to each other, because it would achieve the same result.
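The sketch below contrasts the popularity baseline described above with lift. Lift is a standard association-rule measure: it compares how often an item is bought alongside a given item with how often it is bought overall. It is introduced here as one common way to discount obvious items, not as a technique from [16]. The baskets are invented.

```python
from collections import Counter


def popularity_baseline(baskets, basket, n=2):
    """Recommend the most purchased items that the current basket lacks."""
    counts = Counter(item for b in baskets for item in b)
    ranked = sorted(counts, key=lambda i: (-counts[i], i))
    return [i for i in ranked if i not in basket][:n]


def lift(baskets, a, b):
    """P(b | a) / P(b): values near 1 mean that buying a says little about b."""
    with_a = [bk for bk in baskets if a in bk]
    p_b = sum(1 for bk in baskets if b in bk) / len(baskets)
    if not with_a or p_b == 0:
        return 0.0
    p_b_given_a = sum(1 for bk in with_a if b in bk) / len(with_a)
    return p_b_given_a / p_b


if __name__ == "__main__":
    baskets = [
        {"bananas", "bread"},
        {"bananas", "milk"},
        {"bananas", "pasta", "sauce"},
        {"bananas", "pasta", "sauce"},
        {"bread", "milk"},
    ]
    print(popularity_baseline(baskets, {"pasta"}))  # ['bananas', 'bread']
    print(lift(baskets, "pasta", "bananas"), lift(baskets, "pasta", "sauce"))
```

With these baskets, bananas top the popularity list for a pasta buyer, but sauce has the higher lift.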
2.8.4 Recommendations: Novelty
Herlocker [16, 4] also introduces the concept of novel recommendations and stresses their importance.
An item is considered novel if the user is not aware of it and would not have become aware of it
were it not for the recommendation provided by the system. Although novelty is important (it could be
argued that the whole point of recommendation systems is to provide users with items that they did
not know about and that match their needs and wants), its achievement is difficult to measure, as it
differs entirely from user to user.
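Novelty depends on what each user already knows, so offline studies often fall back on popularity-based proxies. One such proxy scores an item by its self-information: -log2 of the share of baskets containing it, so rarer items count as more novel. The sketch below uses invented baskets. It is an assumption-laden stand-in, because it cannot tell whether a particular user already knew the item.

```python
from math import log2


def self_information(item, baskets):
    """-log2 of the share of baskets containing the item; rarer items score higher."""
    share = sum(1 for basket in baskets if item in basket) / len(baskets)
    # An item that nobody has bought has unbounded self-information.
    return float("inf") if share == 0 else -log2(share)


def mean_novelty(recommendations, baskets):
    """Average self-information of a recommendation list (a popularity-based proxy)."""
    return sum(self_information(i, baskets) for i in recommendations) / len(recommendations)
```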
2.9 Direct Recommendation and Pervasive Recommendation
A direct recommendation system responds to the user’s direct interaction with the system with
a direct response to their request [12], e.g. with a recommendation. Pervasive recommendation
systems produce advertising and marketing: the marketing produced in a system (e.g. banner
ads in a webpage) is personalised to the current user [12].
2.10 Evaluation
As recommender systems use complex algorithms and heuristics, they are difficult to evaluate.
Evaluation methods typically only evaluate how well the patterns from datasets are discovered,
rather than the usefulness of the recommendations themselves. Instead, their performance can
be measured by their impact on the system that they are supporting [11]; this can be, for
example, the additional sales that the implementation of a recommendation system generated.
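Both kinds of evaluation can be expressed as simple calculations. The sketch below uses hypothetical function names and invented figures. It computes precision and recall at k for one ranked list against the items later bought. It also computes the relative uplift in sales per visit between visitors shown recommendations and a control group.

```python
def precision_recall_at_k(recommended, purchased, k):
    """Offline check of a ranked list against the items the user later bought."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(purchased))
    precision = hits / k if k else 0.0
    recall = hits / len(purchased) if purchased else 0.0
    return precision, recall


def relative_uplift(control_sales, control_visits, treatment_sales, treatment_visits):
    """Relative change in sales per visit when recommendations are shown."""
    control_rate = control_sales / control_visits
    treatment_rate = treatment_sales / treatment_visits
    return (treatment_rate - control_rate) / control_rate
```

For example, 60 sales from 1,000 visits with recommendations against 50 from 1,000 without gives an uplift of about 0.2 (20%).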
They can also be evaluated from a HCI perspective for the presentation of the recommendations
generated [11]. Evaluation methods are discussed in more detail in the Methodology chapter of the
report.
2.11 Previously Developed Tools and Techniques
Although a variety of approaches exist in the literature attempting to provide personalisation to
anonymous users [15][14][25][32], these approaches are limited in several respects. The different
techniques developed have been applied to a variety of different websites, so comparison of the
results is difficult [28]. Furthermore, as only the user traces are used, there is no way of knowing
to what extent the techniques were successful [28]. One such method is the use of Markov
chains, which use a prior categorisation of web pages to model clickstreams as a transition matrix
between page categories [14]. Another approach is the use of the Open Profiling Standard (OPS)
[25][20]. This is where a user builds a profile offline which can be integrated into the Internet
browser they are using. These profiles can be accessed by each website the user visits, providing
instant access to reliable information (specified by the users themselves) about the user. [32]
details the Feature Matrices (FM) model, which can adapt to user behaviour in a real-time
environment as well as consider partial navigation patterns. [15] describes the development of
a tool called INSITE, which uses on-the-fly customisation, rather than offline models, against
the user’s navigational trails.
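A first-order Markov chain over page categories, in the spirit of the approach attributed to [14] above, can be sketched as follows. The categories and clickstreams are invented. This generic version is not the specific model of [14].

```python
from collections import Counter, defaultdict


def transition_matrix(clickstreams):
    """First-order transition probabilities between page categories."""
    counts = defaultdict(Counter)
    for stream in clickstreams:
        for current, following in zip(stream, stream[1:]):
            counts[current][following] += 1
    matrix = {}
    for state, row in counts.items():
        total = sum(row.values())
        matrix[state] = {nxt: c / total for nxt, c in row.items()}
    return matrix


def predict_next(matrix, category):
    """Most probable next category (ties broken alphabetically), or None if never left."""
    row = matrix.get(category)
    if not row:
        return None
    return max(sorted(row), key=row.get)
```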
One significant problem raised by existing approaches is the inability to monitor and record
how long a user spends looking at a webpage [15]. This can be estimated by calculating the
time difference between two consecutive requests from a user [13], and such data could form interesting
patterns, as it could be deduced that the longer a user looks at a page, the more interested
the user is in that information. Of course, this is still an assumption in itself, as the user may
have left the computer with that page open, or there may simply be more information on that
webpage than on other pages on average, for example. Regardless, if such data could be interpreted
successfully, then it would be a useful contributing factor in web mining.
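The estimate from consecutive requests, together with the caveat about pages left open, can be sketched as below. The ten-minute cap is an arbitrary assumption standing in for the ‘user left the computer’ case. Real systems would need to tune or learn it.

```python
from datetime import datetime, timedelta

# Assumption: gaps longer than this mean the page was left open, not read.
MAX_DWELL = timedelta(minutes=10)


def dwell_times(session, max_dwell=MAX_DWELL):
    """Estimate time spent on each page from consecutive requests.

    session is a time-ordered list of (timestamp, page) pairs. The last page has
    no following request, so its dwell time is unknown and it is omitted.
    """
    estimates = []
    for (start, page), (end, _next_page) in zip(session, session[1:]):
        gap = end - start
        if gap <= max_dwell:
            estimates.append((page, gap))
    return estimates


if __name__ == "__main__":
    t0 = datetime(2008, 3, 1, 10, 0)
    visit = [
        (t0, "/home"),
        (t0 + timedelta(seconds=20), "/tents"),
        (t0 + timedelta(minutes=3, seconds=20), "/stoves"),
        (t0 + timedelta(hours=1), "/checkout"),
    ]
    for page, gap in dwell_times(visit):
        print(page, gap.total_seconds())
```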
The very nature of the Internet also raises problems. The Internet is an inherently dynamic, changing
environment, and techniques for personalisation have to adapt accordingly by changing
the user model used to provide users with personalisation and recommendations [34].
Chapter 3
Designing Appropriate Assessment Criteria
Designing a methodology that will provide an assessment of personalisation techniques with regard to
anonymous users requires consideration of many important factors that will affect the quality (and
subsequent assessment) of personalisation techniques in response to anonymous users. These include
why anonymous users exist (i.e. the issue of identification), the extent to which they can be ‘used’ by a
system, why e-commerce systems have been selected for assessment and why these systems may use
personalisation. These must be considered, otherwise the quality of the system cannot be determined.
3.1 Defining Anonymous Users
As the background reading has highlighted, an anonymous user is one who offers no data (that can be
turned into information) to the system. In relation to this project and assessing personalisation
systems, the truest definition of an anonymous user cannot be used as there will be no data
whatsoever for the system to use. Also, different works found in the reading use different meanings
for both anonymous users and implicit information.
Thus, for the purpose of this study an anonymous user will be defined as one that offers minimal data
to the system. The data will be implicit and passive rather than explicit, e.g. the anonymous user will
not create a profile or enter personal information or preferences. The system will have to assume these
preferences and interests from the (minimal) data supplied.
As such, it would be unfair to assess the quality of an e-commerce system in comparison to a ‘known
user’ as the personalisation will be completely different; both in terms of what is supplied and the
quality of it. A known user will not only have had previous interaction with the system but will have
explicitly provided data too. Thus they can not only expect informed recommendations that are of a
higher quality, but also different personalisation techniques too.
3.2 Customer attitudes
Privacy is a notable factor because, as [26] states, the more a user gives to a system, the more the system
can give back to them in the form of a personalised and unique experience. However, the more
hesitant and concerned the user is about submitting personal information, the less likely it is
that they will receive an intuitive and useful experience from the system.
A particular privacy concern arises from clickstream data, information retrieved automatically from
a user’s IP address, etc. Understandably, these may concern users who feel that what they have been
doing, and where they have been doing it, is being recorded [26]. There are personalisation
techniques that automatically use a user’s IP address to provide personalised content, but
these are typically associated with nefarious websites or pervasive marketing (“Find Singles in your
area!”).
The reasons why a user may be concerned about privacy are numerous. Foremost is the issue of
advertising. A particularly unwelcome consequence of the advent of the World Wide Web is more
intuitive marketing. With users bombarded with advertising at every step, from merely browsing
webpages through to sending an email, it is no surprise that a user may be hesitant to give even just
their email address, to avoid this hassle; especially if they fear specific, tailor-made, targeted
marketing that exploits all of their personal data. Alternatively, they may fear that, even if the
web-based system they are using does not use this information itself, it may be passed on to another
organisation. So, in respect of this project’s methodology, a site that uses IP addresses or pervasive
intuitive marketing will not be regarded as a sign of quality personalisation, as such techniques are
seldom welcomed by the user.
Such concerns are relevant to the preparation of a methodology as they can restrain the implementation of personalisation techniques. For example, systems will shy away from tools that users think are intrusive or suspicious, e.g. using an IP address to provide instant location information, and thus should not be ‘penalised’ for not using them. It also means that systems may have to make better use of the little information they can receive from users.
3.3 E-commerce Systems
Although the background reading has shown that personalisation features can be applied across a variety of web-based systems, such as search engines, social networking websites and information retrieval websites, e-commerce has been chosen as the focus of the evaluative study for several reasons. Foremost, the different types of web-based system must be considered in relation to anonymous users.
Simply, an anonymous user cannot use a social networking website for its intended purpose without contributing explicit information to the system, as the purpose of such websites is to build and show (explicit) relationships between different users. Social networking websites therefore cannot be used for this study.
Secondly, e-commerce systems have the strongest incentive to offer personalisation to their users. Although this point is arguable, it must be stressed that the primary purpose of an e-commerce system is to make a profit. It can do so either by retaining users, so that they visit and use the site more often, or by persuading users to buy more products. Both can be achieved through successful personalisation and are explained in further detail below (Identification of System’s Objectives).
Finally, the implementation of personalisation techniques has become standard in e-commerce systems. Successful e-commerce systems are expected to have at least a basic personalisation interface, whereas not all search engines or information retrieval systems, for example, need to deploy such techniques to achieve their primary objectives. The ubiquity of Amazon.com as an example of successful personalisation in the background reading corroborates this.
3.3.1 Impetus for E-commerce Personalisation
The background reading has touched upon the importance and effectiveness of personalisation, especially in e-commerce systems. The reasons why e-commerce systems implement these techniques provide an indication of quality: does the personalisation achieve the system’s objectives and goals, and does it fulfil the user’s objectives and goals?
3.4 Identification of System’s Objectives
Three main reasons can be identified as to why an e-commerce system would personalise. These are summarised below:
1.) It draws attention to the business and its products and services, so that it will hopefully turn browsers into buyers and gain consumer loyalty
2.) It implants a message, impacting upon the user in such a way that the user will remember the business
3.) It persuades and cross-sells: personalisation can convince a user to consider another product and to use this business over another.
It is, however, difficult to infer how effective the implemented personalisation actually is in achieving these objectives. Evaluating a business’s personalisation techniques by how much they have increased sales is beyond the scope of a project such as this, as it would require a large volume of sales data from e-commerce businesses. Even then, the sales data would not necessarily prove that a user bought an item purely on the strength of a recommendation. Likewise, evaluating how a system has implanted a message or persuaded users would require intensive user studies, and even then users could not confidently say that they bought an item purely because of the system’s personalisation.
Thus, from an assessment perspective, whether a system achieves its (identifiable) objectives is difficult to evaluate.
3.5 Identification of User’s Objectives
A user may be using an e-commerce system for several reasons [16]:
1.) To buy, or look at, a particular product
2.) To compare prices
3.) To read/write user reviews of a product
4.) To receive recommendations from the system
5.) To browse with no clear intent
6.) To find a mixture of new and old items
When applied to the assessment of personalisation techniques, studying users’ objectives may prove as inconclusive as studying the achievement of the system’s objectives. Users may not be able to state explicitly whether they would have bought a product because of a recommendation from the system. For example, the user may already have heard of the item that the system is recommending; in such a case the system is merely prompting the user, who may well have been intending to buy the item anyway. Likewise, if a user buys a product because of another user’s review of that product, can it be said that the system itself has provided the recommendation? In such cases, can the system be said to have supplied a ‘quality’ recommendation?
Moreover, a system will find it even more difficult to associate an anonymous user with a given task, as it has no prior information about that user and no indication of what the user might be doing. A user may use a system at first just to ‘test the water’, evaluating it by using it and, if they find it effective, becoming a registered (perhaps paying) user. Therefore the assessment of personalisation techniques cannot be made through the identification of users’ tasks.
It would be similarly difficult to recreate a particular user objective as a means of assessing a system: how would a ‘comparing prices’ session be clearly differentiated from a ‘find a mixture of new and old items’ session?
3.6 Consideration of other relevant methodologies and approaches
No evaluations focused entirely on anonymous users were found in the literature. As such, there is no predefined, standardised metric that can be used across a range of e-commerce systems for measuring quality with regard to anonymous users. However, there are evaluations of personalisation systems which can be considered to justify and complement the chosen methodology for this study.
3.6.1 Herlocker
Herlocker’s approach to evaluating recommender systems is particularly exhaustive and raises many important issues. Foremost, it discusses the fact that different algorithms are developed and implemented across different domains for different reasons, e.g. a system with a large dataset will use a different algorithm from one with a small dataset, and a system with more users than items will use a different algorithm from one with more items than users [16].
This is one reason why this study focuses on ‘major e-commerce systems’, as it can be assumed that such systems’ datasets, and the algorithms used on them, will be similar, if not the same.
Another point raised is that a user who just intends to browse a system will prioritise its interface and usability over the recommendations offered (although these will still be useful). It can be argued that usefulness must be a priority in any evaluation of a system; a system cannot depend on the accuracy of its recommendations alone.
As discussed in the background reading, novelty is considered “one of the most valued characteristics of systems recommendations” [16] and refers to items the user was not aware of and probably would not have found without the system. However, its quality and effectiveness cannot be measured directly, as how obscure an item is depends entirely upon the user. This is harder still for anonymous users, as the system does not know the user’s preferences, buying history, etc. Judging the quality of such a recommendation would rely on explicitly prompting the user (“Was this useful?”), and so cannot be done for anonymous users.
3.6.2 Wu
Wu’s methodology is considerably less comprehensive than Herlocker’s evaluative techniques. The methodology [35] awards ‘weights’ for the implementation of both explicit and implicit personalisation techniques, and these are tallied to give each system a score and an indication of the amount of personalisation implemented. Although it does consider implicit (and thus anonymous-user) personalisation, it does so only in conjunction with explicit personalisation, and it does not measure the accuracy of the personalisation, only its presence.
It is useful, however, in its identification and classification of personalisation techniques that can be implemented in systems.
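A minimal sketch of a presence-based tally in the spirit of Wu’s method is given below. The technique names and weights are hypothetical placeholders rather than the values used in [35]; what the sketch shows is that the score depends only on which techniques are present, not on how accurate their recommendations are.

```python
# Hypothetical technique names and weights; [35] defines its own.
WEIGHTS = {
    "explicit:user_profile": 3,
    "explicit:ratings": 2,
    "implicit:recently_viewed": 1,
    "implicit:history_recommendations": 2,
}


def presence_score(implemented, weights=WEIGHTS):
    """Sum the weights of every technique a system implements.

    Only presence counts: an accurate and an inaccurate implementation of
    the same technique earn the same weight. Unknown names contribute 0.
    """
    return sum(weights.get(name, 0) for name in implemented)


def implicit_score(implemented, weights=WEIGHTS):
    """The part of the score an anonymous user could actually benefit from."""
    return sum(weights.get(name, 0) for name in implemented
               if name.startswith("implicit:"))


if __name__ == "__main__":
    system = {"explicit:ratings", "implicit:recently_viewed"}
    print(presence_score(system), implicit_score(system))  # prints: 3 1
```

Two systems with the same feature set receive the same score however good their recommendations are, which is the limitation noted above.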
3.7 Conclusion
Given these concerns, systems need to make the most of the potentially limited information they can receive from users, who are apprehensive that their actions are being watched and that assumptions are being made about their habits and interests.
Chapter 4
Developed Methodology
Given the obstacles described above to critically assessing the quality of personalisation features for anonymous users, a justified methodology has been devised. The following chapter discusses the iterations taken towards developing this methodology. A rapid prototyping approach was adopted for each iteration to test whether it would be feasible and practical as an assessment technique.
4.1 Initial Decisions
Considering the background reading and the possible obstacles to an assessment such as this, the following decisions were made for the final methodology. Foremost, the method of using a ‘check-list’ of possible personalisation features for anonymous users and then checking whether each system implemented them [5] was rejected, as this ignores context and makes the assessment of the systems difficult, especially as the absence of a feature does not necessarily indicate a system’s quality.
It was also decided that the systems’ personalisation features would not be given ‘weights’ to indicate and measure quality [35]. The reason, again, is that a more qualitative approach to measuring quality was to be developed.
The first iteration of the assessment criteria developed the idea of creating a ‘session’, or walkthrough, that would be used on each system to be assessed. Because the session would be the same for every system, the testing method (and subsequent evaluation) would be consistent. The session was intended to simulate ordinary user activity, e.g. the user goes to the main page, searches for a product, reads the product information, clicks a recommendation, etc.
However, this proved impractical for several reasons. Foremost, it was difficult to create an identical sample session that could be used across a variety of e-commerce web-based systems. For example, the devised sample would have involved the user searching, then clicking the top recommended item, then returning to the homepage to see if there was customised content, etc. However, the further the session progressed, the harder it became to recreate, as the content on the pages always differed: if different recommendations were retrieved, then different links would be followed, and so on. Likewise, if a product was not found in a search, the session could not be replicated across systems. This would have been an obvious problem if the systems evaluated sold completely different products from one another. As such, it was decided that the final iteration of the methodology would not compare the results of one system with another.
Also, it was difficult to describe the recreated session, and the results at each stage, critically without the account becoming tedious.
Assuming the role of a ‘user’ was still necessary in order to assess the system’s personalisation from a user’s perspective, but it was decided at an early stage in the project that an exact session of user activity would not be used across all the systems to be tested.
4.1.2 Selection of Systems to be Assessed
It was stated early in the project that the assessment would be made upon a variety of different e-commerce systems. However, when the aforementioned rapid prototyping approach was used to test the devised assessment criteria, it quickly became clear that most major e-commerce systems had similar, if not identical, personalisation features operating on anonymous users. The only differences were the content returned and recommended to users and the presentation of this content.
Therefore, if a system has been classified as a major e-commerce web-based system, it would be difficult to say accurately and confidently why its personalisation features for anonymous users are not as good as another business’s. Any explanation offered would be an unfounded assumption, e.g. that one business is not as big as another or that it has decided not to implement these features. Also, a comparison between the systems would not provide more justification for the assessment criteria, as the criteria have been designed and justified from research and background reading. It would demonstrate that the assessment criteria can be applied to different e-commerce systems, but the point of the assessment criteria is to see the extent to which anonymous users can be offered personalisation, not how this differs between systems.
The intention of the report was to see whether anonymous users could be offered quality personalisation and, therefore, the decision was made to apply the assessment criteria to a well-regarded, successful major e-commerce system that offers personalisation: Amazon [2]. The assessment criteria have been applied to Amazon.co.uk.
This choice is critically reflected upon in the evaluation of this report.
4.2 Methodology
A framework was developed that classifies possible personalisation features into the following categories. Thus, rather than looking for particular techniques (the implementation of which could differ between systems), the assessment looks for implementations within each category. Features in these categories will be discovered and assessed by browsing through the system, but not by following a predefined session. The categories, and the reasons for assessing them, are described below:
4.2.1 Control
Control refers to whether the user has command over the system: whether they can choose to interrupt, continue/expand (e.g. “See more recommendations”) or terminate the recommendations. This has been classed as ‘implicit’ rather than ‘explicit’ data, as the user is not giving the system any information that it can interpret for personalisation. If the user were to comment on a personalisation (e.g. “This Was/Wasn’t Useful”), this would be ‘explicit’, as the system can infer preference from it. Here, the user is simply controlling the usability of the interface.
4.2.2 Content
Content refers to the accuracy of the recommendations that the system provides to the user. This has
been used as the primary indicator of the system’s quality and the quality will be measured by the
precision of the returned recommendations to the user.
Precision, used here as the measure of accuracy in a personalisation context, is typically calculated as:
$$\text{Precision} = \frac{\text{number of relevant recommendations retrieved}}{\text{total number of recommendations retrieved}}$$
The relevance of an item is difficult to measure, as it requires the judgement and personal opinion of the user of the system; whether a recommended item is relevant differs from user to user. Thus, for the purposes of this assessment, a clear and justified definition of ‘relevance’ must be given.
The relevance of a returned recommendation will be based on a comparison between the product details of each recommended item and those of the original product.
Although precision is typically measured alongside recall, measuring recall is impractical in this context [16], as it would involve judging whether each item in a system’s entire catalogue is relevant.
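One way to make the product-detail comparison repeatable is sketched below. The rule in `is_relevant` (same category plus at least one shared descriptive detail) is a hypothetical stand-in for the judgement described above, and the `Product` fields are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    title: str
    category: str
    # Hypothetical descriptive details, e.g. genre, brand or author.
    details: set = field(default_factory=set)


def is_relevant(recommended: Product, original: Product) -> bool:
    """Hypothetical rule: same category and at least one shared detail."""
    return (recommended.category == original.category
            and bool(recommended.details & original.details))


def precision(recommended: list, original: Product) -> float:
    """Relevant recommendations retrieved / all recommendations retrieved."""
    if not recommended:
        # Precision is undefined when nothing is retrieved; report 0.0 here.
        return 0.0
    relevant = sum(is_relevant(item, original) for item in recommended)
    return relevant / len(recommended)
```

Recall cannot be computed in the same way: its denominator would be the number of relevant items in the whole catalogue, which is exactly what is impractical to establish.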
4.2.3 Interface
This refers to the usability of the system and the integration of personalisation features into its interface, e.g. “Items you Recently Viewed”, “The Page you Made”, etc.
These categories, together with coverage/scope below, have been used to develop the assessment of web-based systems; the assessment of the selected system follows in the next chapter.
4.2.4 Coverage/Scope
Coverage typically refers to how much of the system’s catalogue is drawn upon when recommendations are retrieved for users. Using Amazon.com for the assessment allows the ‘novelty’ of an item to be assessed, as each item on Amazon has a sales rank. This sales rank will be used to gauge the popularity, and thus the ‘novelty’, of the recommendations retrieved by the system.
The mean sales rank of the recommendations returned by the system will be taken as an indication of the system’s possible coverage.
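The calculation is simple enough to sketch directly. Pairing the mean with the median is an addition here, not part of the original method; the input would be the sales ranks read off each recommended item’s page.

```python
from statistics import mean, median


def coverage_summary(sales_ranks):
    """Summarise the sales ranks of the recommendations retrieved.

    A higher rank means a less popular item, so a higher mean suggests more
    novel recommendations. The median is reported alongside the mean
    because a few very obscure items can pull the mean up sharply.
    """
    if not sales_ranks:
        raise ValueError("no sales ranks supplied")
    return {"mean": mean(sales_ranks), "median": median(sales_ranks)}
```

With toy ranks such as `[10, 20, 30, 1000]`, the single obscure item lifts the mean to 265 while the median stays at 25, which is the kind of skew the results chapter notes.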
Chapter 5
Implementation of Assessment Criteria
5.1 Overview of the Testing Session
The system was tested with the browser cache cleared and all cookies from previous (prototyping) sessions deleted. The system therefore treated the tester as an entirely new user who had not provided any personal preferences or information. No explicit information (e.g. preferences, ratings) was given to the system. The session only involved searching for products (which is not explicit, as it does not reveal personal information or preferences) and selecting products recommended by the system.
5.2 Control
During the session, it was possible to edit the browsing history, either to refine the subsequent recommendations that the system would offer based on the user’s past activity, or to amend mistakes: for example, if the user clicks on the wrong item by mistake, it can be removed so that the system’s personalisation still works in their favour.
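Why editing the history matters can be seen in a small sketch. It assumes a simple item-to-item co-view scheme with made-up data and hypothetical item names; it is not Amazon’s actual algorithm.

```python
from collections import Counter

# Hypothetical co-view data: for each item, how many sessions also viewed
# each other item. A real system would derive this from large usage logs.
CO_VIEWS = {
    "dvd_a": {"dvd_b": 5, "dvd_c": 2},
    "tent": {"stove": 4, "dvd_c": 1},
}


class AnonymousSession:
    """Browsing history held for an anonymous (cookie-only) visitor."""

    def __init__(self):
        self.history = []

    def view(self, item):
        if item not in self.history:
            self.history.append(item)

    def remove(self, item):
        """Delete an item from the history, e.g. one clicked by mistake."""
        if item in self.history:
            self.history.remove(item)

    def recommend(self, co_views, k=3):
        """Rank unseen items by their summed co-view counts with the history."""
        scores = Counter()
        for seen in self.history:
            for other, count in co_views.get(seen, {}).items():
                if other not in self.history:
                    scores[other] += count
        return [item for item, _ in scores.most_common(k)]


if __name__ == "__main__":
    session = AnonymousSession()
    session.view("dvd_a")
    session.view("tent")                # a mistaken click
    print(session.recommend(CO_VIEWS))  # ['dvd_b', 'stove', 'dvd_c']
    session.remove("tent")              # the user edits their history
    print(session.recommend(CO_VIEWS))  # ['dvd_b', 'dvd_c']
```

Removing the mistaken item drops its contribution entirely, so the unrelated "stove" recommendation disappears.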
5.3 Content
The calculations and results for the precision of the system’s recommended items are as follows (precision values are rounded to three decimal places).
The first set of recommendations was provided after viewing only one item (a DVD). The precision was:
14 (relevant items returned) / 43 (all items returned) = 0.326
The cache was cleared and a new session was created. A different item (software) was viewed and the precision was:
16 (relevant items returned) / 50 (all items returned) = 0.320
This was repeated and another new item (sports backpack) was viewed. The precision was:
9 (relevant items returned) / 34 (all items returned) = 0.265
This was repeated one final time with another new item (jewellery):
2 (relevant items returned) / 2 (all items returned) = 1.000
Fig 5.1 Fig 5.2
This single-item process was repeated several times to see whether recommendations depended on the type of item viewed. It was expected that more popular items would retrieve more recommendations after a single view, and this was the case for the DVD and the software (popular types of item). Precision was reasonable for these items, especially considering the user had viewed only one item.
The jewellery was chosen because few people shop for it: it had a high sales rank, compared with the low (popular) sales ranks of the DVD and software. Although it received a perfect precision score, this was only because just two items were retrieved.
After these early results, it was hypothesised that precision might increase as the anonymous user’s browsing history grew.
The second set of recommendations was provided after viewing five different items in the same session:
61 (relevant items returned) / 189 (all items returned) = 0.323
This was repeated with five different items:
74 (relevant items returned) / 197 (all items returned) = 0.376
This was finally repeated with five different items, all from the same category (DVDs):
66 (relevant items returned) / 162 (all items returned) = 0.407
Average precision under these conditions had changed little from the initial single-item tests. For the third session, items all from the same (popular) category were used to see whether the system would find more accurate recommendations if the items had been rated by many other users. However, this resulted in only a small increase in precision.
Finally, the precision of recommendations provided after a session of viewing 15 different items was:
92 (relevant items returned) / 246 (all items returned) = 0.374
This was, again, similar to the previous sessions. It seems fair to conclude that the precision of recommendations returned to anonymous users will remain fairly low: without the user explicitly rating items and/or the system’s recommendations, the system can reach only modest levels of accuracy.
However, although precision did not increase as the user built up a browsing session, it is still impressive for a system operating on an anonymous user about whom it knows little.
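The reported counts can be re-checked with a few lines of code. The labels are descriptive only; the counts are exactly those given above, and the unweighted mean over the five-item sessions is an extra summary not stated in the text.

```python
# (relevant, retrieved) counts as reported in this section.
RESULTS = {
    "1 item: DVD": (14, 43),
    "1 item: software": (16, 50),
    "1 item: sports backpack": (9, 34),
    "1 item: jewellery": (2, 2),
    "5 items: run 1": (61, 189),
    "5 items: run 2": (74, 197),
    "5 items: DVDs only": (66, 162),
    "15 items": (92, 246),
}


def precision(relevant, retrieved):
    """Relevant items returned / all items returned."""
    return relevant / retrieved


def mean_precision(labels):
    """Unweighted mean of per-session precision for the given sessions."""
    values = [precision(*RESULTS[label]) for label in labels]
    return sum(values) / len(values)


if __name__ == "__main__":
    for label, (rel, ret) in RESULTS.items():
        print(f"{label:<24}{rel:>4} /{ret:>4} = {precision(rel, ret):.3f}")
    five = ["5 items: run 1", "5 items: run 2", "5 items: DVDs only"]
    print(f"Mean precision, five-item sessions: {mean_precision(five):.3f}")
```

The five-item mean comes out at about 0.369, close to the single-item DVD and software figures, consistent with the observation that precision did not rise with a longer history.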
5.4 Interface
The system was able to create a personalised page for the anonymous user with the ‘Page you Made’ option, as shown in Fig 5.3.
The user is able to view all recently viewed items and, as previously mentioned, edit them to their liking. Recommendations are also available on this page (Fig 5.4), and the user can see more of them if they choose.
The interface also personalises the main ‘home’ page, which features personalised links such as ‘Customers with similar searches purchased’, ‘Recommended Items based on your browsing history’ and ‘You recently looked at’.
Thus, the interface is intuitive and adaptable, even for anonymous users.
5.5 Coverage / Scope
The results of the coverage/scope assessment are included in Appendix E.
The mean sales rank was 30541. Although this mean has been inflated by items with very high sales ranks, even a cursory glance at the returned data shows that Amazon retrieved a mixture of both obvious and non-obvious items, suggesting that it was successful in retrieving novel results for anonymous users.
5.6 Conclusion
In conclusion, Amazon was successful, to an extent, when its personalisation techniques were assessed with regard to anonymous users.
Fig 5.3
Fig 5.4
It offered an intuitive, highly usable and adaptable interface, giving control to the user, as well as reasonable coverage of its catalogue when retrieving recommendations.
However, the quality of the recommendations did not improve significantly even when the anonymous user provided more information to the system. For example, recommendations of similar precision were retrieved even when the user supplied more inferred preferences (by looking at more items). This suggests that collaborative filtering has a limit to the quality it can offer anonymous users.
Chapter 6
Feasibility of Web 2.0 in E-commerce
6.1 Justification for Proposal of Web 2.0 Features
The reason for considering Web 2.0 features to complement traditional personalisation and collaborative filtering techniques arose from a question posed by Schafer [29]: whether tagging can be used, naturally, in conjunction with collaborative filtering methods to aid personalisation. Tagging, like collaborative filtering, is used to add ‘content’ to existing information to potentially benefit other users.
It should be noted, though, that tagging is not a form of personalisation. As defined earlier in this report, personalised content is content that differs from user to user based on the actions and inferred preferences of that user. The following discussion therefore assesses the feasibility of tagging as a replacement for collaborative filtering: a different method of achieving the same end. Collaborative filtering is used to present the user with recommended content and, it will be argued, tagging does the same.
Collaborative filtering tools do not add new content to information, but they do add quality and relevance for the user they operate on behalf of, by automatically filtering and ordering information. Tagging can achieve similar results, albeit very differently.
When a user tags information, it is not just a keyword that is suggested to be associated with that
information; it is a form of collaborative filtering. Tags are often opinions and descriptions that would
not be found by typical content descriptors such as keywords. Rather, like traditional collaborative
filtering, tagging enables users to match other users with informed and relevant recommendations. For
example, if a user finds a particular tag useful, then they can use that tag to find other
recommendations that are relevant to them; this relevancy and quality is determined by the tag and the
effect that the tag has on the user.
Unlike with collaborative filtering, the end user does not need to comment on the quality of the recommendations in front of them to refine the personalisation they receive. Rather, they can choose their recommendations directly, though still via the system.
Tagging involves the user directly and does not require them to provide information explicitly to the system: all the user needs to do is select a tag, and they will receive other recommendations relevant to that tag and thus to their personal preferences.
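The tag-selection step described above amounts to a lookup in an inverted index built from other users’ tags. The sketch below is illustrative; the class name and the ranking by number of distinct taggers are assumptions, not something the text prescribes.

```python
from collections import defaultdict


class TagIndex:
    """Inverted index from tag to items, built from registered users' tags."""

    def __init__(self):
        # tag -> item -> set of users who applied that tag to that item
        self._index = defaultdict(lambda: defaultdict(set))

    def add(self, user, item, tag):
        # Normalise case/whitespace so 'Heist ' and 'heist' count as one tag.
        self._index[tag.strip().lower()][item].add(user)

    def items_for(self, tag):
        """Items carrying the tag, most widely applied first (ties by name).

        Needs no information about the person asking, so it works
        equally well for anonymous users.
        """
        taggers = self._index.get(tag.strip().lower(), {})
        return sorted(taggers, key=lambda item: (-len(taggers[item]), item))
```

Registered users supply the tags through `add`; an anonymous visitor only ever calls `items_for`.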
Tagging can therefore be applied to the focus of this project, as it is well suited to anonymous users. Tagging does, of course, require ‘explicit’ information from the users who create the tags. However, anonymous users can still enjoy the benefits of other users’ tags without providing information, such as preferences, themselves. It is feasible, therefore, that the concept of tagging can be applied to e-commerce web-based systems.
6.2 Impetus for E-commerce
‘User Created Content is now beginning to move mainstream’ [24]
Web 2.0 features, especially the likes of tagging, are typically affiliated with social networking sites
and blogs. However, the reasons for implementing them in an e-commerce domain can be clearly
identified.
Foremost, they are easy and cheap to implement. Unlike collaborative filtering which requires
extensive user datasets and highly refined and tuned algorithms, Web 2.0 boasts accessibility and
interactivity; hence its inclusion in the likes of user created blogs and wikis. Web 2.0 can be used by
anyone; from both a technical and end-user usability viewpoint (it could be argued that this is the
whole point of the 2.0 phenomenon).
The tasks of an e-commerce system can easily be complemented by Web 2.0 features, both to improve its core objective of selling products/services and to encourage and foster social interaction around the business [24]. Product descriptions can be developed through anything from media embedded in webpages to wikis, with each product having a separate wiki page that is not only more detailed than the generic manufacturer’s product description but also encourages user thoughts and opinions [1]. RSS feeds can give users instant access to updated product information from more mobile locations. Likewise, the interaction between customer and service [24] can be improved by enabling and encouraging user discussions [3] or tagging.
6.2.1 Case Study: Amazon.com
As previously noted in the assessment, Amazon.com has become the first major e-commerce system to fully adopt and embrace Web 2.0, and it shall be used as an example of a successful system that has done so.
Amazon’s wiki, Amapedia [3], offers community-based interaction and sharing, as users of Amazon can build and contribute to detailed product-related wikis. Amazon refers to it as “collaborative structured tagging” [3], as users must follow guidelines when creating the content.
Amazon has also launched Askville [2], a social community where knowledge can be shared between users as they help each other by asking and answering questions.
6.3 Technical Solution of Tagging Limitations in a Mock E-commerce System
A technical implementation of tagging has been developed in a mock e-commerce system that does
not involve moderation. This is found at http://www.staggerlee.co.uk/FYPROJ/home.html and is
described in this section.
The website was created in HTML and uses PHP and MySQL. Foremost, the tag table was created by using PHP to connect to the MySQL database and run the query shown in Fig 6.1.
This created the tag table. tag_id was the primary key and was set to auto-increment: each time a new tag was entered into the database, the tag_id value was incremented by one, ensuring that every tag had a unique id. ‘name’ was the name of the tag, entered into the database by the users of the system themselves. ‘url’ identified the product that the tag was associated with.
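The exact query in Fig 6.1 is not reproduced in the text, so the sketch below reconstructs an equivalent table from the column descriptions, using Python’s built-in `sqlite3` module as a stand-in for PHP and MySQL. Column names follow the text; the column types are assumptions.

```python
import sqlite3

# Columns as described in the text; the types are assumptions.
# MySQL would declare the key as "tag_id INT AUTO_INCREMENT PRIMARY KEY".
SCHEMA = """
CREATE TABLE IF NOT EXISTS tags (
    tag_id INTEGER PRIMARY KEY AUTOINCREMENT, -- new id for every tag row
    name   TEXT NOT NULL,                     -- tag text entered by a user
    url    TEXT NOT NULL                      -- product the tag belongs to
)
"""


def create_tag_table(conn):
    conn.execute(SCHEMA)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_tag_table(conn)
    for name in ("heist", "heist"):
        cur = conn.execute("INSERT INTO tags (name, url) VALUES (?, ?)",
                           (name, "/product/1"))
        print(cur.lastrowid)  # each insert receives the next tag_id: 1, then 2
```

Note that each row pairs one tag with one product, so the same tag text applied to two products produces two rows with different tag_id values.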
Fig 6.2 displays the code fragment used for the user to insert a tag into the database: an empty form field appears on the HTML page, into which the user can enter tagging information. When the user clicks ‘submit’, this runs the MySQL query and inserts the ‘name’ (what the user has just specified) into the database.
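The insert step is where user input meets the database, so it is worth sketching with a parameterised query. The sketch again uses Python’s `sqlite3` in place of PHP and MySQL (in PHP, PDO prepared statements serve the same purpose); the helper name `add_tag` and the blank-tag check are assumptions.

```python
import sqlite3

# Minimal table so the sketch runs on its own (types are assumptions).
SCHEMA = ("CREATE TABLE IF NOT EXISTS tags (tag_id INTEGER PRIMARY KEY "
          "AUTOINCREMENT, name TEXT NOT NULL, url TEXT NOT NULL)")


def add_tag(conn, name, url):
    """Insert a user-submitted tag and return its tag_id (None if blank).

    The ? placeholders pass the user's text to the database as data, never
    as SQL, so input such as "x'); DROP TABLE tags; --" is stored verbatim
    instead of being executed.
    """
    name = name.strip()
    if not name:
        return None
    cur = conn.execute("INSERT INTO tags (name, url) VALUES (?, ?)", (name, url))
    conn.commit()
    return cur.lastrowid
```

Building the query by pasting the form value into the SQL string would let a tag rewrite the query itself, which is why placeholders are used here.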
Fig 6.3 shows how the tags are retrieved from the database and displayed on the HTML page for the user to see. The query selects the tags from the database, and a PHP while loop iterates over the returned rows so that the tags can be output on the webpage one by one.
The tags are displayed on the webpage as HTML links. When the user selects a link, the unique tag_id from the database table is passed as a variable to the next page (the page for the tag the user has clicked). This page can then retrieve all products tagged with that tag by using the passed tag_id.
Fig 6.1 Fig 6.2 Fig 6.3
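Because each row pairs one tag with one product, a tag_id on its own identifies a single row. One way (assumed here, not taken from the figures) to find every product with the clicked tag is to look up the tag’s name and then select all rows sharing it. The sketch uses `sqlite3` as a stand-in, and the function names are hypothetical.

```python
import sqlite3

# Table as described in the text (column types are assumptions).
SCHEMA = ("CREATE TABLE IF NOT EXISTS tags (tag_id INTEGER PRIMARY KEY "
          "AUTOINCREMENT, name TEXT NOT NULL, url TEXT NOT NULL)")


def tag_links(conn):
    """One (tag_id, name) pair per distinct tag name, to render as links."""
    return conn.execute(
        "SELECT MIN(tag_id), name FROM tags GROUP BY name ORDER BY name"
    ).fetchall()


def products_for_tag_id(conn, tag_id):
    """All products sharing the tag name of the row identified by tag_id."""
    row = conn.execute("SELECT name FROM tags WHERE tag_id = ?",
                       (tag_id,)).fetchone()
    if row is None:
        return []  # unknown or tampered tag_id passed between pages
    rows = conn.execute(
        "SELECT DISTINCT url FROM tags WHERE name = ? ORDER BY url", (row[0],)
    ).fetchall()
    return [url for (url,) in rows]
```

Grouping by name in `tag_links` also avoids showing the same tag text as several separate links.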
6.3.1 Identifiable Limitations to an E-commerce Tagging System
Like most tagging implementations, e-commerce tagging will also suffer from the problems inherent in tags. Within Web 2.0, these problems and limitations are explained as stemming from the very definition of user created content.
User created content (UCC) is typically created outside of the domain in which it is being applied [24]. It is non-professional content applied in a professional context, which can have obvious repercussions for the quality and trustworthiness of the content being added. The users adding the tags are not governed or restricted by a standardised metric, so anything can be added as a tag. Therein lie both the benefits and the limitations of tags: users can add very specific descriptions that collaborative filtering will typically miss, and these specific tags can greatly benefit another user who finds them relevant. However, users can also add tags that are, essentially, useless. This is displayed in Fig 6.4.
Fig 6.4
Fig 6.5 is an example of tagging. Some users may find tags such as ‘police’ or ‘cyberspace terrorism’
useful because they describe the product in ways that would typically not appear in its content
description, and they enable users to find other items with the same tags. However, how many users
are likely to find the tag ‘malfeasance’ practical, or something that they were looking for?
[24] offers a solution to this inherent problem by proposing three moderation options: pre-production
moderation, post-production moderation and peer-based moderation.
Pre-production moderation involves all content submitted by a user being examined by a moderator
before it appears on the system. This is obviously labour intensive, especially for a system like
Amazon, where every single tag would have to be checked by a moderator. It would also require either
a standardised metric against which the moderator checks each tag for relevance, or the moderator
using their own discretion to determine relevance.
Post-production moderation means that content is posted immediately, but it can be changed or
deleted afterwards. This again suffers from the same problems as above.
Peer-based moderation also means that content is posted immediately, but it is subject to the scrutiny
of other users of the system, and if they think a tag is irrelevant, it can be removed or changed.
Again, this would require a standard for tags to adhere to (which defeats the whole point of the
uniqueness of tags), and users would then have to be trusted to adhere to this metric. Especially for a
successful e-commerce business, this empowers users more than the business is likely to be comfortable
with, and it also assumes that users will not take an apathetic stance towards moderating the tags.
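The three options differ mainly in when a newly submitted tag becomes visible and who is able to remove it. The sketch is a minimal illustration under stated assumptions. The `Tag` record and the `FLAG_THRESHOLD` value are hypothetical, since neither [24] nor this project specifies how approvals or user flags would be recorded.

```python
from dataclasses import dataclass
from enum import Enum


class Moderation(Enum):
    PRE = "pre-production"    # a moderator approves a tag before it is shown
    POST = "post-production"  # shown at once; a moderator may delete it later
    PEER = "peer-based"       # shown at once; other users' flags can remove it


@dataclass
class Tag:
    name: str
    approved: bool = False  # set by a moderator (pre-production)
    deleted: bool = False   # set by a moderator (post-production)
    flags: int = 0          # "irrelevant" reports from other users (peer-based)


# Hypothetical: how many user flags hide a tag under peer-based moderation.
FLAG_THRESHOLD = 3


def is_visible(tag: Tag, mode: Moderation) -> bool:
    """Decide whether a submitted tag is shown under the given moderation scheme."""
    if mode is Moderation.PRE:
        # Nothing appears until a moderator has checked it.
        return tag.approved
    if mode is Moderation.POST:
        # Visible immediately, unless a moderator has since removed it.
        return not tag.deleted
    # Peer-based: visible immediately until enough users flag it as irrelevant.
    return tag.flags < FLAG_THRESHOLD


if __name__ == "__main__":
    tag = Tag("malfeasance")
    print([is_visible(tag, m) for m in Moderation])  # [False, True, True]
    tag.flags = FLAG_THRESHOLD
    print(is_visible(tag, Moderation.PEER))          # False
```

Under all three schemes, a moderator or a group of users must still decide what counts as irrelevant. This is the standardised-metric problem identified above.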
Thus, the above suggested solutions cannot be applied to an e-commerce domain.
6.4 Conclusion
From both a usability and a technical perspective, tagging is feasible to implement in e-commerce
web-based systems. As demonstrated by the technical mock-up website, tagging is easy to implement and
easy to maintain. Users have become familiar with the concept and how it works, so tagging is very
user-friendly and accessible.
Although tagging would suffer from the ‘cold start’ problem (i.e. when it is first implemented into a
system there are no tags until users begin populating it), collaborative filtering also suffers from
this, as it needs time and user interaction to build datasets that it can compare with one another.
However, tagging takes considerably longer to ‘get started’: collaborative filtering is an automated
process that the system carries out without the direct involvement of the user, whereas tagging depends
entirely on users explicitly populating the database.
This problem is even more apparent when large e-commerce systems are considered. A system such
as Amazon.com has thousands upon thousands of different products. To make the most of the
potential of tagging, from a recommendation perspective, the majority of products would have to be
tagged.
Likewise, as mentioned previously, tagging is typically associated with the likes of blogs and personal
webpages, not huge profit-driven organisations. Although some are confident that tagging will be
adopted into the mainstream [24], it is doubtful that this will happen without some form of either
automatic moderation or a standardised metric. Either of these, however, alters the nature of tagging.
Considering the above discussions, therefore, the feasibility of tagging being implemented
successfully into an e-commerce system is doubtful, as is the likelihood of it replacing traditional
collaborative filtering techniques.
Chapter 7
Evaluation
7.1 Establishing Criteria
To establish suitable criteria for evaluating both the assessment scheme developed for personalisation
systems and the feasibility of using Web 2.0 as a means of replacing collaborative filtering, the
following criteria, and the justification for each, have been considered. The evaluation is split
into two sections: evaluation of the assessment scheme proposed by the project, and evaluation of the
feasibility of Web 2.0 in an e-commerce context.
7.2 Evaluation of Devised Assessment Scheme
7.2.1 Assuming the Role of an Anonymous User
Assessing a system for its effectiveness upon anonymous users will inevitably mean that the
assessment will involve recreating the actions of an anonymous user in the system. As the definition
of an anonymous user was considered and justified in the project’s methodology, the same
definition was applied to the assessment criteria.
Apart from clearing the cache information and the cookies from the browser used to assess the
system, there was little else that could be done to assume the role of an anonymous user. The
information passed into the system was always implicit, as justified and defined in the project’s
methodology.
The study could perhaps have been extended to multiple anonymous users by conducting a group testing
session. However, as noted in the project’s methodology, not only is it difficult to assign tasks to
anonymous users, but the system cannot identify these tasks and use them to any advantage. It would
also have been difficult to properly inform users of the assessment scheme used in this methodology
and to ensure that the scheme was followed.
As such, a group session of anonymous users would probably have produced the same results, although
it would have enabled a qualitative assessment of the data.
7.2.2 Sample of Systems Tested
Initially, it was decided that a sample of systems was to be tested. However, as the project progressed
and the methodology was justified and clearly defined, it became evident that this sample of different
systems was not needed.
However, the results of the assessment cannot be taken as definitively solving the problem defined in
the initial problem statement, as only one e-commerce system was tested. Although this choice was
justified in the methodology, a single assessment is not representative of major e-commerce businesses
and their personalisation techniques.
7.2.3 Assessment of Personalisation – Interface and Control
The assessment of the system identified the personalisation techniques that affected the interface of
the system, but did not clearly measure why or how these interface changes were helpful. An informed
set of usability criteria should have been devised to measure the features implemented, rather than
simply identifying them.
7.2.4 Assessment of Personalisation – Content
The assessment measured the recommendations provided by the system to the user; this was used as an
indication of the quality of the collaborative filtering techniques employed by the system and was
measured using precision.
The background reading presented many different quality criteria that have been used to measure
personalisation techniques, but they were only applicable to users who provided explicit information
to the system, for example users who rated the recommendations that the system provided. As anonymous
users could only provide implicit information, however, the relevance of the items returned had to be
determined without telling the system whether they were relevant or not.
The problem with this approach is obvious: relevance was defined by matching keywords in the original
item’s description to keywords found in the recommended items’ descriptions. Preference and user
opinion were not (and could not be) used.
The quality of content cannot be measured scientifically; only the user can decide whether a
recommendation is good or not. However, an informed, relevant attempt to measure accuracy had to
be made and precision was the most viable assessment technique in these circumstances.
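Precision here is the fraction of returned recommendations that are judged relevant. The following is a minimal sketch of the keyword-matching approach described above. It assumes a simple tokeniser, a small hypothetical stop-word list and a hypothetical `min_shared` threshold, because the project does not state exactly how keywords were extracted or how many had to match. The example descriptions are invented for illustration and are not data from the study.

```python
import re

# Hypothetical stop-word list; the project does not specify one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "with", "by"}


def keywords(description: str) -> set[str]:
    """Lower-case a description, split it into words and drop stop words."""
    words = re.findall(r"[a-z0-9]+", description.lower())
    return {w for w in words if w not in STOP_WORDS}


def is_relevant(original: str, recommended: str, min_shared: int = 1) -> bool:
    """Count a recommendation as relevant if its description shares at least
    `min_shared` keywords with the original item's description."""
    return len(keywords(original) & keywords(recommended)) >= min_shared


def precision(original: str, recommendations: list[str], min_shared: int = 1) -> float:
    """Precision = relevant recommendations / all recommendations returned."""
    if not recommendations:
        return 0.0
    hits = sum(is_relevant(original, r, min_shared) for r in recommendations)
    return hits / len(recommendations)


if __name__ == "__main__":
    original = "Cyberspace terrorism thriller about police hackers"
    recommended = [
        "A police procedural thriller",
        "Gardening for beginners",
        "Hackers and cyberspace security",
        "Cookbook of Italian recipes",
    ]
    print(precision(original, recommended))  # 0.5
```

The weakness noted above is visible here. Relevance is decided by word overlap alone, so a recommendation the user would value but whose description uses different words counts as a miss.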
7.2.5 Assessment of Personalisation – Coverage
The scope of the system’s recommendations was measured using Amazon’s sales rankings. The mean sales
rank of the recommended items was calculated, and this was used to argue whether Amazon had suitable
scope and whether obvious or novel items were returned to the user.
Again, as with content, whether an item is novel depends entirely on the user. However, the method
used in this study is indicative of whether an item is novel, as it can be argued that an obscure item
will have a poor (numerically high) sales rank and a very popular (obvious) item will have a strong
(numerically low) sales rank. Thus, there is a correlation between the two.
However, sales ranks can only be used for Amazon.com, as it was found in the rapid prototyping stage
that most other major e-commerce businesses do not disclose that information. Therefore, this part of
the assessment criteria cannot be applied to other systems, and the criteria do not offer a suitable
alternative to using sales ranks.
7.3 Justification of Discussion of Web 2.0
The question of the feasibility of using Web 2.0 in major e-commerce systems was explicitly raised in
the literature [29] and this is why it was included in the body of the report. Likewise, as discussed
previously, Web 2.0 features can be found in current existing e-commerce systems and the
background reading suggested that the techniques and their implementation are increasingly becoming
‘mainstream’.
Unlike the assessment criteria, the sources and readings available that critically discussed Web 2.0
were scarce. Although the discussion and conclusion on the feasibility of Web 2.0 were critical in
identifying the tools and techniques and assessing their effectiveness, it could be argued that,
because few reading resources were available, they may not have been fully justified and informed.
7.4 Evaluation of Software Produced
The software produced will be evaluated according to its usability, technical feasibility and relevance
to the main discussion of Web 2.0.
7.4.1 Usability of Mock-up
From a usability perspective, the mock-up was successful as it followed the standard ‘guidelines’ of
implementing tagging. The user could easily submit their own tags and the tags of other products were
retrieved automatically. It was not, however, assessed against an established and informed set of
usability criteria.
Although the moderation of tags was discussed and considered, the mock-up did not offer a working
solution to this inherent problem.
7.4.2 Technical Feasibility
The technical feasibility of the mock-up was discussed. It was argued that, as tagging involves PHP
and MySQL, it was perfectly reasonable that a major e-commerce system could adopt it. However,
the mock-up did not take into consideration the size and scope of major e-commerce systems.
Unlike the mock-up, a major e-commerce system would have thousands of different products, with more
being added daily. If tagging were implemented, the tagging database would have to support this and,
with potentially hundreds of tags per product, the efficiency of retrieving tags could worsen over
time. Likewise, although the conclusion on feasibility did identify the ‘cold-start’ problem, it did
not identify the fact that users would have to tag constantly as more and more new products were added
to the system.
7.5 Evaluation of Project
As evidenced by the different iterations of the project schedule, the project suffered many setbacks,
the most obvious being a failure to commit properly to a devised set of assessment criteria, and a
difficulty in quantifying a scheme that was justifiable and relevant to the problem and scope of the
project.
References
[1] www.Amapedia.amazon.com [Accessed April 2008]
[2] www.Amazon.com [Accessed April 2008]
[3] www.askville.amazon.com [Accessed April 2008]
[4] R Baeza-Yates and Berthier Ribeiro-Neto. Modern Information Retrieval. Harlow :
Addison-Wesley Longman, 1999.
[5] C Bell. Personalisation in E-Commerce. School of Computing, Leeds, 2004.
[6] A Bergholz. Coping with sparsity in a recommender system. In WEBKDD 2002 : Mining
Web data for discovering usage patterns and profiles : 4th International workshop, pages
86–99, Edmonton, Canada, 2002.
[7] C Delong, P Desikan, and J Srivastava. User : User-sensitive expert recommendations.
In PKDD 2000 : Principles of Data Mining and Knowledge Discovery : 4th European
Conference, pages 77–95, Lyon, France, 2000.
[8] C Delong, P Desikan, and J Srivastava. User : User-sensitive expert recommendations.
In PKDD 2000 : Principles of Data Mining and Knowledge Discovery : 4th European
Conference, pages 77–95, Lyon, France, 2000.
[9] E Frias-Martinez and V Karamcheti. A customisable behaviour model for temporal prediction
of web user sequences. In WEBKDD 2002 : Mining Web data for discovering usage
patterns and profiles : 4th International workshop, pages 66–85, Edmonton, Canada, 2002.
[10] A Geyer-Schulz, M Hahsler, and M Jahn. A customer purchase incidence model applied
to recommender services. In WEBKDD 2001 : Mining web log data across all customers
touch points : Third International Workshop, pages 25–47, CA, USA, 2001.
[11] A Geyer-Schulz and M Hahsler. Comparing two recommender algorithms with the help of
recommendations by peers. In WEBKDD 2002 : Mining Web data for discovering usage
patterns and profiles : 4th International workshop, pages 137–158, Edmonton, Canada,
2002.
[12] Ana Gil and Francisco Garcia. E-commerce recommenders: powerful tools for e-business.
Crossroads, 10(2):6–6, 2003.
[13] M Grcac, D Mladenic, B Fortuna, and M Grobelnik. Data sparsity issues in the collaborative
filtering framework. In PKDD 2000 : Principles of Data Mining and Knowledge
Discovery : 4th European Conference, pages 58–76, Lyon, France, 2000.
[14] Y Hafri, C Djerabo, P Stanchev, and B Bachimont. A markovian approach for web user
profiling and clustering. In Advances in knowledge discovery and data mining : 7th Pacific-
Asia Conference, PAKDD 2003, pages 191–202, Seoul, Korea, 2003.
[15] B Hay, G Wets, and K Vanhoof. Web usage mining by means of multidimensional sequence
alignment methods. In WEBKDD 2002 : Mining Web data for discovering usage patterns
and profiles : 4th International workshop, pages 50–65, Edmonton, Canada, 2002.
[16] J Herlocker. Evaluating Collaborative Filtering Recommender Systems. In ACM Transactions on
Information Systems, Vol 22. No.1, pages 5-53, New York, NY USA 2004. ACM
[17] J Hipp, U Guntzer, and G Nakhaeizadeh. Data mining of association rules and the process
of knowledge discovery in. In Advances in data mining: applications in E-commerce,
medicine, and knowledge management, pages 15–36. Springer, 2003.
[18] Young Kim. Impact of social influence in e-commerce decision making. In Proceedings
of the ninth international conference on Electronic Commerce, pages 293–302, MN, USA,
2007.
[19] Ron Kohavi. Mining e-commerce data: the good, the bad, and the ugly. In KDD ’01:
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery
and data mining, pages 8–13, New York, NY, USA, 2001. ACM.
[20] P Markellou, M Rigou, and S Sirmakessis. Mining for web personalisation. In Web Mining
: applications and techniques, pages 27–49. Hershey, PA : Idea Group Publishing, 2005.
[21] F Masseglia, P Poncelet, and M Teisseire. Web usage mining: How to efficiently manage
new transactions and new clients. In PKDD 2000 : Principles of Data Mining and
Knowledge Discovery : 4th European Conference, pages 530–535, Lyon, France, 2000.
[22] B Mobasher. Analysis and detection of segment-focused attacks against collaborative
recommendation. In PKDD 2000 : Principles of Data Mining and Knowledge Discovery :
4th European Conference, pages 96–118, Lyon, France, 2000.
[23] Bamshad Mobasher, Robert Cooley, and Jaideep Srivastava. Automatic personalization
based on web usage mining. Commun. ACM, 43(8):142–151, 2000.
[24] OECD, Participate Web and User-Created Content: Web 2.0, Wikis and Social Networking, 2007
[25] P Perner and G Fiss. Intelligent e-marketing with web mining, personalisation and user-adapted
interfaces. In Advances in data mining: applications in E-commerce, medicine, and knowledge
management, pages 37–52. Springer, 2003.
[26] J Riedl. Personalisation and privacy. In IEEE Internet Computing, pages 29–31, 2001.
[27] A Rosien and J Heer. Automatic categorisation of web pages and user clustering with
mixtures of hidden markov models. In WEBKDD 2002 : Mining Web data for discovering
usage patterns and profiles : 4th International workshop, pages 35–49, Edmonton, Canada, 2002.
[28] A Rosien and J Heer. Lumberjack : Intelligent discovery and analysis of web user traffic
composition. In WEBKDD 2002 : Mining Web data for discovering usage patterns and
profiles : 4th International workshop, pages 1–16, Edmonton, Canada, 2002.
[29] J Schafer. Collaborative Filtering Recommender Systems in The Adaptive Web, LNCS 4321.
Pages 291-324, 2007
[30] V Schickel-Zuber and B Faltings. Overcoming incomplete user models in recommendation
systems. In PKDD 2000 : Principles of Data Mining and Knowledge Discovery : 4th
European Conference, pages 39–57, Lyon, France, 2000.
[31] Antony Scime. Web mining : applications and techniques. Hershey PA, 2005.
[32] A Shahabi and F Banaei-Kashani. Framework for efficient and anonymous web usage
mining based on client-side tracking. In WEBKDD 2001 : Mining web log data across all
customers touch points, Third International Workshop, pages 113–144, CA, USA, 2001.
[33] Y Shen, Q Yang, Z Zhang, and H Lu. Mining the customer’s up-to-moment preferences
for e-commerce recommendation. In Advances in knowledge discovery and data mining :
7th Pacific-Asia Conference, PAKDD 2003, pages 165–177, Seoul, Korea, 2003.
[34] B Suryavanshi, S Nematollaah, and S Mudur. Adaptive web usage profiling. In PKDD
2000 : Principles of Data Mining and Knowledge Discovery : 4th European Conference,
pages 119–135, Lyon, France, 2000.
[35] D Wu. A framework for classifying personalisation scheme used on e-commerce websites. In
Proceedings of the 36th Hawaii International Conference on System Sciences, Vol 5, Issue 1, 2002.
[36] H Yang and S Parthasarathy. On the use of constrained associations for web log mining.
In WEBKDD 2002 : Mining Web data for discovering usage patterns and profiles : 4th
International workshop, pages 100–118, Edmonton, Canada, 2002.
Appendix A
Personal Reflection
This section is a personal reflection on the project experience, and I hope that other students may find it useful if they are producing a project similar to mine, or any project that is primarily evaluative.
Perhaps the most obvious issue that came to mind during the course of my project was whether to produce a software implementation for an evaluative project. In the early stages of my project, I thought that the evaluative aspect would suffice, and I was quite hesitant to focus my study on producing a software implementation, having suffered problems in the past with modules that were based solely on producing a piece of software.
I was therefore quite worried when my assessor suggested that I should consider a software implementation to support my project. I was concerned that researching and producing this software would take more time than the main focus of the study, and I did not think that I could produce software that was relevant or ‘new’ to the field of personalisation. On further reflection, however, a software implementation greatly supported my solution. As mentioned in the ‘Project Schedule’ of this report, I had great difficulty stating precisely what my assessment criteria were, both in terms of what I would be doing to evaluate the personalisation systems and how many systems I would need to evaluate to justify a project such as this. I spent too much time trying to build a ‘project-worthy’ set of assessment criteria rather than considering other directions that my project could take. Eventually, I realised that a discussion of Web 2.0 was not only relevant to my project solution, but could also be used to produce a relevant software implementation. I would therefore advise future students to consider producing some sort of software. This is partly because it is generally expected of students studying computing, but mainly because arguments proposed or hypothesised in a report carry greater weight if they can be seen in a working software example.
I would also advise students working in a field that has already received a great deal of research (as personalisation has) to find a legitimate and concise reason for doing the project in that field. This means identifying a problem that is yet to be solved and producing an appropriate solution. The same applies to projects whose sole or main focus is assessing something that already exists. During my project, although I thought that the problem I had identified was worthy of both research and evaluation, I found it difficult to produce a proper set of assessment criteria. A great deal of work had been committed to evaluating personalisation with registered users and ‘explicit’ data, but there had been little assessment of anonymous users and ‘implicit’ data. This made it harder not only to build the assessment criteria but also to justify the criteria that I had developed.
A more general piece of advice that all students can heed is not to underestimate the time it takes to write up a project such as this. During my project I made notes on each section (Methodology, Evaluation, etc.) and thought that I would quickly and easily be able to expand these notes into full written sections. However, it takes considerable time and effort to construct readable and coherent sentences from snippets of notes, especially when the notes were made several months before the write-up and the context has become muddled!
As with all projects (whether they are worth 10 or 40 credits) time management is absolutely essential.
Appendix E
Item No Sales Rank
Item 1 408
Item 2 866
Item 3 2546
Item 4 2260
Item 5 3268
Item 6 5054
Item 7 9391
Item 8 13222
Item 9 5064
Item 10 11264
Item 11 1299
Item 12 188731
Item 13 288
Item 14 7381
Item 15 184
Item 16 165
Item 17 20031
Item 18 1202
Item 19 4104
Item 20 492010
Item 21 574
Item 22 2193
Item 23 3281
Item 24 31890
Item 25 22282
Item 26 9467
Item 27 871
Item 28 2595
Item 29 2213
Item 30 3703
Item 31 30148
Item 32 10679
Item 33 12896
Item 34 230937
Item 35 1195
Item 36 31118
Item 37 23235
Item 38 2626
Item 39 475
Mean sales rank: 1 191 116 / 39 ≈ 30 541
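The arithmetic on the last line of Appendix E can be checked with Python's standard `statistics` module. The sketch only re-derives the figure from the table above. Three items (ranks 188 731, 230 937 and 492 010) are far larger than the rest, so the mean sits well above most individual ranks. For comparison, the median is printed alongside; it is an addition and was not used in the original assessment.

```python
from statistics import mean, median

# Sales ranks of the 39 recommended items, copied from the table above.
SALES_RANKS = [
    408, 866, 2546, 2260, 3268, 5054, 9391, 13222, 5064, 11264,
    1299, 188731, 288, 7381, 184, 165, 20031, 1202, 4104, 492010,
    574, 2193, 3281, 31890, 22282, 9467, 871, 2595, 2213, 3703,
    30148, 10679, 12896, 230937, 1195, 31118, 23235, 2626, 475,
]

if __name__ == "__main__":
    total = sum(SALES_RANKS)
    # Mean rounded to a whole rank, as in the appendix.
    print(f"{total} / {len(SALES_RANKS)} = {mean(SALES_RANKS):.0f}")  # 1191116 / 39 = 30541
    # Middle value of the 39 sorted ranks; less affected by the few very large ranks.
    print(f"median = {median(SALES_RANKS)}")                          # median = 3703
```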