
A comparison of usability evaluation methods for evaluating e-commerce websites

Layla Hasan a*, Anne Morris b and Steve Probets b

a Department of Management Information Systems, Zarqa University, Zarqa, Jordan; b Department of Information Science, Loughborough University, Loughborough LE11 3TU, UK

(Received 7 October 2009; final version received 6 June 2011)

The importance of evaluating the usability of e-commerce websites is well recognised. User testing and heuristic evaluation methods are commonly used to evaluate the usability of such sites, but just how effective are these for identifying specific problems? This article describes an evaluation of these methods by comparing the number, severity and type of usability problems identified by each one. The cost of employing these methods is also considered. The findings highlight the number and severity level of 44 specific usability problem areas which were uniquely identified by either user testing or heuristic evaluation methods, common problems that were identified by both methods, and problems that were missed by each method. The results show that user testing uniquely identified major problems related to four specific areas and minor problems related to one area. Conversely, the heuristic evaluation uniquely identified minor problems in eight specific areas and major problems in three areas.

Keywords: comparison; usability; evaluation methods; user testing; heuristic evaluation; e-commerce websites

1. Introduction

Usability is one of the most important characteristics of any user interface and measures how easy the interface is to use (Nielsen 2003). Usability has been defined as ‘the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use’ (ISO 9241–11 1998). A variety of usability evaluation methods (UEMs) have been developed to evaluate human interaction with a product; these are aimed at identifying issues or areas of improvement of the interaction in order to increase usability (Gray and Salzman 1998).

UEMs have been categorised differently by different authors. For example, Nielsen and Mack (1994) classified UEMs into four general categories: automatic, empirical, formal and informal, while Gray and Salzman (1998) used two categories to describe methods: analytic and empirical. Another approach is to categorise them by how the usability problems are identified, for example by users, evaluators or tools:

. User-based UEMs: This category includes a set of methods that involves users. These methods aim to record users’ performance while interacting with an interface and/or users’ preferences or satisfaction with the interface being tested.

. Evaluator-based UEMs: This category includes usability methods that involve evaluators in the process of identifying usability problems. These methods were called usability inspection methods (Nielsen and Mack 1994). Examples of common usability methods related to this category are: heuristic evaluation, pluralistic walkthroughs and consistency inspections.

. Tool-based UEMs: This category involves software tools and models in the process of identifying usability problems. The software tools automatically collect statistics regarding the detailed use of an interface (for example web analytics) and the models provide measurements of user performance without actually involving users (for example Goals, Operators, Methods and Selection Rules (GOMS)) (Preece et al. 2002).

User-based and evaluator-based approaches have been frequently used to evaluate the usability of websites, including e-commerce websites (Kantner and Rosenbaum 1997, Barnard and Wesson 2003, 2004, Freeman and Hyland 2003, Chen and Macredie 2005). User testing involves observing a number of users performing a pre-defined list of tasks to identify the usability problems they encounter during their interaction (Brinck et al. 2001), while heuristic evaluation involves having a number of expert evaluators assess the user interface and judge whether it conforms to a set of usability principles (namely ‘heuristics’) (Nielsen and Mack 1994).

*Corresponding author. Email: [email protected]

Behaviour & Information Technology

Vol. 31, No. 7, July 2012, 707–737

ISSN 0144-929X print/ISSN 1362-3001 online

© 2012 Taylor & Francis

http://dx.doi.org/10.1080/0144929X.2011.596996

http://www.tandfonline.com


The research presented in this article forms part of a wider investigation into user-based, evaluator-based and tool-based methods for evaluating the usability of e-commerce sites. The overall intention is to develop a methodological framework outlining how each of these methods could be used in the most effective manner. However, for the purposes of this article only the comparison between user- and evaluator-based methods will be discussed.

Previous researchers have compared different types of UEMs. Various studies have, for example, compared the effectiveness of user testing and heuristic evaluation methods in evaluating different types of user interfaces such as commercial web sites (Tan et al. 2009), a hotel website (Molich and Dumas 2008), a web-based software program (Fu et al. 2002), a universal brokerage platform (Law and Hvannberg 2002), software user interfaces (Simeral and Branaghan 1997), 3D educational software and a 3D map (Bach and Scapin 2010), an office application’s drawing editor (Cockton and Woolrych 2001), a novel information retrieval interface (Doubleday et al. 1997), and an interactive telephone-based interface (Desurvire et al. 1991). Additionally, other researchers have compared the effectiveness of three or four usability methods which have included user testing and heuristic evaluation methods. Desurvire et al. (1992a,b), for example, compared heuristic evaluation and the cognitive walkthrough evaluation methods when testing a telephone-based interface, Jeffries et al. (1991) evaluated a user interface (UI) for a software product using four different techniques (heuristic evaluation, software guidelines, cognitive walkthroughs and usability testing), and Nielsen and Phillips (1993) compared usability testing to heuristic evaluation and formal GOMS when evaluating two user interface designs.

These studies have provided useful findings regarding which of these two approaches was more effective in identifying the largest number of usability problems and which cost the least to employ. A few studies provided some examples of the usability problems identified by these methods. However, previous research offers little detail about the benefits and drawbacks of each method with respect to the identification of specific types of problems. The research described here aims to address this gap and provides specific benefits to e-commerce vendors in a developing country (Jordan) where the research has been undertaken. Jordan, like other developing countries, faces significant challenges that affect the development and diffusion of e-commerce in this country. Examples of these challenges include lack of payment systems, lack of trust, the high cost of personal computers (PCs), the high cost of connecting to the Internet, cultural resistance and an absence of legislation and regulations that govern e-commerce transactions (Obeidat 2001, Sahawneh 2002).

An awareness of the type of usability problems that can be identified by user testing and heuristic evaluation methods would encourage e-commerce companies to employ UEMs in order to improve the usability of their websites. It would also aid e-commerce companies in taking appropriate decisions regarding which usability method to apply and how to apply it in order to improve part or the overall usability of their e-commerce website. Therefore, this would help e-commerce companies in Jordan to survive and grow in the challenging environment.

This article reviews related work, presents the research aims and objectives and describes the methods used. Additionally, further sections outline the main results, discuss the results in light of the literature review and present some conclusions.

2. Related work

There has been a lack of research on the attributes (standard criteria) on which UEMs can be evaluated and compared. However, the work of Hartson et al. (2001) gives a general overview of attributes and UEM performance measures that can be used in studies to evaluate and compare UEMs. The criteria or UEM performance measures include: thoroughness (the proportion of real problems found on an interface), validity, reliability, effectiveness, cost effectiveness and downstream utility. Hartson et al. (2001) referred to 18 comparative UEM studies and found that 14 only used ‘thoroughness’ as a criterion for comparison and that the remaining four studies also used ‘validity’ as a criterion. Most of the former computed the ‘thoroughness’ criterion using a raw count of the usability problems encountered by the UEMs rather than a count of the real problems encountered by real users in a real work context.
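To make these performance measures concrete, the following is a minimal sketch (in Python, with placeholder counts rather than data from this study) of how thoroughness and validity are usually computed; effectiveness is often taken as the product of the two.

def thoroughness(real_found: int, real_existing: int) -> float:
    # Proportion of the real problems present in the interface that the UEM found.
    return real_found / real_existing

def validity(real_found: int, total_reported: int) -> float:
    # Proportion of the problems reported by the UEM that are real (i.e. not false alarms).
    return real_found / total_reported

def effectiveness(real_found: int, real_existing: int, total_reported: int) -> float:
    # Commonly computed as thoroughness multiplied by validity.
    return thoroughness(real_found, real_existing) * validity(real_found, total_reported)

# Illustrative numbers only: a UEM that reports 25 problems, 18 of which are real,
# against an interface assumed to contain 30 real problems.
print(thoroughness(18, 30), validity(18, 25), effectiveness(18, 30, 25))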

Later, Blandford et al. (2008) reported a comparison of eight analytical UEMs, including heuristic evaluation, when evaluating a robotic arm interface. The findings were compared against empirical data (video data) that was available for the interface. The focus of the study was on the scope (content) of the methods rather than any other criterion such as ‘thoroughness’. Five categories of usability issues were identified: system design, user misconceptions, conceptual fit between user and system, physical issues and contextual ones. The results showed that all the analytical methods, except heuristic evaluation, identified usability issues that belonged to only one or two of the identified categories. In this research, the findings of earlier studies comparing user testing and heuristic evaluation methods were grouped under three headings: the number of usability problems found, the cost of employing each method and the types of usability problems found.

2.1. Number of usability problems

Most of the studies which compared user testing and heuristic evaluation methods found that the latter uniquely identified a larger number of usability problems compared to the former (Desurvire et al. 1992a,b, Doubleday et al. 1997, Fu et al. 2002). Conversely, the results of the study conducted by Bach and Scapin (2010), which compared the effectiveness of user testing, document-based inspection and expert inspection methods, showed that despite the fact that the document-based inspection method allowed the identification of a larger diversity of usability problems and a slightly higher proportion of usability problems compared to user testing and expert inspection, there was no significant difference between the effectiveness of the document-based inspection and user testing methods. Despite differences in results, none of the above studies specified the distribution of usability problems identified by user testing and heuristic evaluation methods in terms of their seriousness, i.e. major and minor problems. The studies that have investigated this issue appear to have found conflicting results. Jeffries et al. (1991) found that heuristic evaluation identified a larger number of serious and minor problems in comparison to user testing, although the design of the study was criticised by Gray and Salzman (1998) for having too few participants. By contrast, Law and Hvannberg (2002), who compared the effectiveness of usability testing and heuristic evaluation, found that the latter identified a large number of minor usability problems compared to user testing. However, user testing was better at uniquely identifying major problems. Tan et al. (2009) also compared the efficiency and effectiveness of user testing and heuristic evaluation but provided different results regarding the severity level of problems identified by these methods. Three severity levels were used (severe, medium and mild). The results showed that the two methods identified similar respective proportions of usability problems of the severe, medium and mild types.

Some studies claim that heuristic evaluation misidentifies some usability problems by identifying issues that, if implemented/corrected in the evaluated design, would not improve its usability (Jeffries and Desurvire 1992, Simeral and Branaghan 1997, Bailey 2001). These issues have been called false alarms or false positives. Producing false alarms is regarded by Simeral and Branaghan (1997) as one of the weaknesses of heuristic evaluation, and Cockton and Woolrych (2001) argue that such false positives should not be addressed through the redesign of systems.

Law and Hvannberg (2002) tried to find evidence regarding the claim of false alarms being made in heuristic evaluation. In their study, they raised questions regarding whether the minor problems that were not confirmed by user testing represented false alarms or whether the participants were just unable to identify them. However, the researchers could not confirm or conclude that the heuristic evaluation method produced false alarms or misidentified usability problems.

The findings of Molich and Dumas (2008) were in contrast to all the studies reviewed above; they found no empirical difference between the results obtained from usability testing and expert reviews in terms of the number and seriousness of problems identified. Expert reviewers reported a few more serious or critical usability issues than usability testers although the experts reported fewer minor problems. However, the design of this study is worth considering as this might have played a role in the results that were achieved. Molich and Dumas (2008) carried out a comparative usability evaluation using 17 teams of experienced usability specialists who independently evaluated the usability of a hotel website. Nine of the 17 teams employed usability testing and eight employed expert reviews. Each of the 17 teams received a test scenario which specifically included three tasks and four areas that were most important to consider in the evaluation. Also, each team was asked to report a maximum of 50 usability comments using a standard reporting format with a specific classification of problem categories (i.e. minor problem, serious problem, critical problem) to classify problems found by each team. Therefore, these issues might have limited the number of problems identified by the heuristic evaluation teams as they concentrated on specific issues and areas on the tested site. Also, the limited number of comments requested from each team might have made them cautious and reticent about producing a large number of comments.

Hornbaek (2010) indicated that although problem counting is a common approach and used by researchers to compare UEMs, it has several limitations. For example, this approach does not differentiate between potential and real usability problems; different kinds of problems (for example with regard to generality, type, aspects of user interface covered or clarity) are given equal weight when counted; finding a large number of usability problems is not necessarily an indication of the quality of a UEM as this could depend on their severity and whether the software can be changed to remove them. Therefore, Hornbaek (2010) provided three suggestions in order to extend the comparisons of UEMs beyond the problem counting approach: combining problem counting with some form of analysis of the type of problem; identifying methods that help identify aspects of design that can be improved and that enable evaluators to suggest solutions and how they can be achieved; and obtaining evaluators’ comments on and satisfaction with the UEMs under evaluation.

Cockton and Woolrych (2001) also criticised the assessment of usability inspection methods (e.g. the heuristic evaluation method) that focused only on calculating simple percentages of the usability problems that they identified. They conducted an evaluation of the heuristic evaluation method by comparing predictions of usability problems identified by 99 analysts with actual problems identified by user testing. To assess the effectiveness of heuristic evaluation, the researchers used an advanced analysis which classified problems in three ways: by impact (severe, nuisance and minor); by frequency (high, medium and low); and by the effort required to discover the problems (perceivable, actionable and constructable (i.e. problems that required several interaction steps to be discovered)). The results showed that heuristic evaluation missed a large number of severe problems and problems that occurred very frequently. The results also showed that the heuristic evaluation missed relatively more constructable problems (80%) than were successfully identified (7%). Furthermore, the results showed that 65% of the problem predictions by the heuristic evaluators were false alarms where the users did not consider them to be problems.

2.2. Cost of employing usability evaluation methods

A few studies have compared the cost of undertaking user testing and heuristic evaluation. The findings showed that, in general, heuristic evaluation was less costly (in terms of design and analysis) in comparison to user testing methods (Nielsen and Phillips 1993, Doubleday et al. 1997, Simeral and Branaghan 1997, Law and Hvannberg 2002). However, these studies did not mention the cost of correcting usability problems that might be undertaken after conducting user testing or heuristic evaluation methods. This issue was discussed by Jeffries and Desurvire (1992). They indicated that heuristic evaluation had high costs that occur after the evaluation since heuristic evaluation often identified a large number of problems, most of them minor.

2.3. Content of usability problems

Earlier studies have been found in the literature that describe and compare the type of usability problems identified by user testing and heuristic evaluation methods. The studies that were found varied in their description of these problems; some were general and others were specific and detailed. Research that described usability problems in general terms found that the user testing method identified more problems related to user performance (Simeral and Branaghan 1997), whereas heuristic evaluation found more problems related to interface features (Desurvire et al. 1991, Nielsen 1992, Nielsen and Phillips 1993, Doubleday et al. 1997, Nielsen 2003). Problems related to interface quality were not identified in user testing (Simeral and Branaghan 1997).

Other studies provided more detail regarding the characteristics of usability problems identified by user testing and heuristic evaluation methods (Doubleday et al. 1997, Fu et al. 2002, Law and Hvannberg 2002). These studies showed that the user testing method was more effective in picking up usability problems related to a lack of clear feedback and poor help facilities (Doubleday et al. 1997, Fu et al. 2002). User studies were also helpful in identifying functionality and learnability problems (Doubleday et al. 1997, Fu et al. 2002, Law and Hvannberg 2002) as well as those concerned with navigation and excessive use of complex terminology (technical jargon) (Law and Hvannberg 2002). In contrast, these studies also showed that the heuristic evaluation method was more effective in identifying problems related to the appearance or layout of an interface (i.e. the use of flash graphics that distract the attention), inconsistency problems with the interface and slow response time of the interface to display results (Doubleday et al. 1997, Fu et al. 2002, Bach and Scapin 2010). An example of one such study is that undertaken by Bach and Scapin (2010). They evaluated 3D applications and identified 35 classes describing the profile of usability problems identified by user testing, document-based inspection and expert inspection methods. They found that user testing efficiently identified problems that required a particular state of interaction to be detectable (how to achieve a task, for example) whereas the document-based inspection method used was better at identifying directly observable problems, especially those related to learning and basic usability (i.e. the efficiency of certain commands or functions and their utility). On the other hand, the expert inspection method used was more efficient in identifying problems related to consistency issues.

Only a few studies, however, have highlighted the types of specific usability problems identified by user testing and heuristic evaluation methods while evaluating websites. One such study, by Mariage and Vanderdonckt (2000), evaluated an electronic newspaper. Mariage and Vanderdonckt (2000)’s study reported examples of the usability problems that were uniquely identified by user testing and missed by heuristic evaluation. Examples include inappropriate choice of font size, the use of an inappropriate format for links, and consistency problems. Mariage and Vanderdonckt (2000)’s study also reported problems that were identified by heuristic evaluation and confirmed by user testing, such as: home page layout that was regarded as being too long; navigation problems that were related to the use of images and buttons that lacked clarity so that users did not know which were clickable; and a lack of navigational support. However, Mariage and Vanderdonckt (2000)’s study did not report examples related to the usability problems that were uniquely identified by heuristic evaluation and missed by user testing.

Tan et al. (2009), who compared user testing and heuristic evaluation by evaluating four commercial websites, also classified usability problems by their types. They identified seven categories of problems and classified the usability problems identified by the user testing and heuristic evaluation methods with regard to these seven categories. The categories included: navigation, compatibility, information content, layout organisation and structure, usability and availability of tools, common look and feel, and security and privacy. The results showed that the user testing and heuristic evaluation methods were equally effective in identifying the different usability problems related to the seven categories with the exception of two: compatibility, and security and privacy issues. The user testing did not identify these two issues. The researchers also found that heuristic evaluation identified more problems in these seven categories compared to user testing. However, Tan et al. (2009)’s study did not provide details regarding specific types of problems related to the seven categories.

It is worth mentioning that the reliability of usability methods has been discussed with respect to the ‘evaluator effect’. This occurs when multiple evaluators evaluating the same interface using the same usability method identify different types of usability problems (Hertzum and Jacobsen 2001). The available research shows that the evaluator effect exists in both formal and informal UEMs, including user testing and heuristic evaluation. For example, Hertzum and Jacobsen (2001) reviewed 11 studies which used 3 UEMs: think-aloud, cognitive walkthrough and heuristic evaluation methods. They found that the evaluator effect existed with respect to detection of usability problems as well as assessment of problem severity regardless of: the experience of evaluators (novice and experienced evaluators), the severity of usability problems (minor and major/severe problems) and the complexity of systems (simple and complex systems).

The literature outlined above indicates that there has been a lack of research that compared issues identified by user testing and heuristic evaluation methods when specifically evaluating e-commerce websites in order to investigate detailed types of specific usability problems that could be uniquely identified, missed or commonly identified by these methods.

3. Aims and objectives

The aim of this article was to compare and contrast user testing and heuristic evaluation methods for identifying specific usability problems on e-commerce websites. The objectives for this research were:

. To identify the main usability problem areas in e-commerce websites using user-based evaluation methods;

. To identify the main usability problem areas in e-commerce websites using the evaluator-based heuristic method;

. To determine which methods were best for evaluating each usability problem area.

4. Methodology

This article is based on a multiple-case study (comparative design) where the same experiments were employed in each case. Twenty-seven e-commerce companies in Jordan were identified from five electronic Jordanian and Arab directories and a Google search. These companies were contacted and three of them agreed to participate. Company 1 specialises in manufacturing Islamic clothing for women. It was founded in Jordan in 1987 although its online outlet has only been in operation since 2000. Company 2 also specialises in manufacturing Islamic clothing for women and is based in Jordan. Its online shopping store was launched in 2004. Launched in 2005, Company 3 provides an online store that sells a wide variety of crafts and handmade products made in Jordan and Palestine by individuals and organisations. The URLs for the chosen sites were not included for confidentiality reasons.

The method compared the usability problems identified by user testing with a heuristic evaluation of the sites by experts. Hence, the intention was to employ the same methods (i.e. the same usability testing sessions involving the same users and the same heuristic evaluators) in each case. This number of cases was considered appropriate within the time and resources available for the research. The customers of these websites were both local and world-wide. This study focused on investigating the usability of these websites from the point of view of local (Jordanian) customers.

In order to employ the user testing method, a task scenario was developed for each of the three websites (see Appendix 1). This involved some typical tasks suggested for e-commerce websites in earlier studies (e.g. Brinck et al. 2001, Kuniavsky 2003) such as finding products (tasks 1 and 4); finding information (tasks 8 and 9); purchasing products (tasks 2 and 4); using the site’s search engine (tasks 6 and 10); and dealing with complaints (task 7). Tasks 3 and 5 were included to reflect the need to offer facilities for customers to change their orders and/or user profile. Both Brinck et al. (2001) and Kuniavsky (2003) suggested avoiding the use of terms in the tasks that matched the screen terms so, rather than asking a direct question about the contact details of the e-commerce company, other questions were used as part of the task.

Researchers vary with respect to the number of users they recommend as optimum for user testing. For example, Brinck et al. (2001) suggested recruiting 8–10 users, Rubin (1994) also suggested the use of at least 8 participants, but Nielsen (2006) recommended having 20 users. In this research, 20 users were recruited.

To identify typical users for user testing, an email was sent to each of the studied companies asking them to provide information about their current and prospective users (such as demographic information, experience using the computer and experience using the Internet). Based on the answers from the three companies, a matrix of users’ characteristics was designed. Two sources were used to recruit the participants for the user testing: an advertisement and an email broadcast. A screening questionnaire was developed and then used to select 20 suitable participants from the respondents in order to match the matrix of the users’ characteristics as closely as possible. The two characteristics that were matched were gender and experience of using the Internet. Other characteristics were not totally matched because of the lack of availability of suitable participants.

Data were gathered from each user testing session using screen capture software (Camtasia), with three post-test questionnaires (one post-test questionnaire was given to each user after completing the tasks for each site to get his/her feedback). Observation of the users working through the tasks, in addition to taking comments from the users while interacting with each site, was also carried out. The time for each task was determined beforehand and checked throughout the pilot study. It was estimated that the user testing session would take 3 h. Each participant was given 30 min to evaluate each site. For each session, the order of the three websites that were evaluated was counterbalanced to avoid bias. It is worth mentioning that participants stopped the evaluation before entering personal details; this represents a non-typical situation, and the absence of risk may have affected user actions.
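The article does not specify the exact counterbalancing scheme; a minimal sketch of one common approach (cycling through all six possible orderings of the three sites across the 20 participants) is shown below.

from itertools import cycle, permutations

sites = ["Site 1", "Site 2", "Site 3"]
orderings = cycle(permutations(sites))           # the 6 possible presentation orders, repeated
schedule = [next(orderings) for _ in range(20)]  # one ordering per participant

for participant, order in enumerate(schedule, start=1):
    print(f"Participant {participant:2d}: {' -> '.join(order)}")

Because 20 is not a multiple of six, some orderings appear four times and others three; a fully balanced design would require a multiple of six participants.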

In addition, a set of comprehensive heuristics, specific to e-commerce websites, was developed based on an extensive review of the literature. The developed heuristics were organised into five major categories: architecture and navigation, content, accessibility and customer service, design, and purchasing process. Table 1 displays the categories and the subcategories of the developed heuristics. Appendix 2 displays the categories, the subcategories and the references of the developed heuristics, while Appendix 3 displays the developed heuristics and their explanations.

In this research, there were both time and resource limitations regarding recruiting ideal evaluators with double specialisms, i.e. who have experience in both usability issues and e-commerce sites, to perform the heuristic evaluation as suggested by earlier research (Nielsen 1992). At the present time, the Human–Computer Interaction (HCI) field is new to Jordan as an area for study in universities and therefore it is unusual to find people with graduate degrees in this area. The target experts, therefore, were people who had extensive design experience in e-commerce websites, as suggested by Stone et al. (2005) in cases when it was impossible to find experts with the ideal experience.

Table 1. Categories and subcategories of the developed heuristics.

Architecture and navigation: Consistency; navigation support; internal search; working links; resourceful links; no orphan pages; logical structure of site; simple navigation menu.
Content: Up-to-date information; relevant information; accurate information; grammatical accuracy; information about the company; information about the products.
Accessibility and customer service: Easy to find and access website; contact us information; help/customer service; compatibility; foreign language and currency support.
Design: Aesthetic design; appropriate use of images; appropriate choice of fonts and colours; appropriate page design.
Purchasing process: Easy order process; ordering information; delivery information; order/delivery status provision; alternative methods of ordering/payment/delivery are available; reasonable confidence in security and privacy.



Extensive experience in this research was defined as more than 10 years.

An investigation into companies in Jordan using electronic Jordanian and Arab directories and a Google search resulted in identifying 17 companies which were developing and designing e-commerce sites. All these companies were contacted by email asking them to recommend a web expert who had at least 10 years’ experience in designing e-commerce sites. Only five companies agreed and therefore five web experts participated in this research as heuristic evaluators. The recommended number of evaluators who should participate in a heuristic evaluation is between three and five (Nielsen and Mack 1994, Pickard 2007). Gray and Salzman’s (1998) work indicates that up to 10 participants may be required to compute inferential statistics, but as the purpose of this study was primarily to get an indication of the relative merits of the two approaches, and as there was a lack of evaluators who are usability specialists in Jordan, five evaluators were considered appropriate.

Each of the five web experts evaluated the three e-commerce websites thoroughly in three different sessions. The heuristic sessions followed a similar procedure. Nielsen’s (1994) procedure for heuristic evaluation was adopted. He recommended that the evaluators visit the interface under investigation at least twice; the first to get a feel for the flow of the interface and the general scope of the system, the second to allow the evaluator to focus on specific interface elements while knowing how they fit the larger whole. Consequently, the web experts in this research were asked to visit each website twice. At the beginning of each session, the web expert was asked to explore the website under investigation for 15 min and then to try buying an item from the site (aborting at the final stage). After the exploration, the heuristic guidelines (Appendix 3) were given to him/her to be used as guidelines while evaluating each website. These experts were asked to write down their comments concerning whether or not the website complied with each heuristic principle and to make any additional comments.

The data were analysed to determine which methods identified each usability problem area. The analysis was undertaken in two stages. The first stage aimed to identify a list of common usability problems by analysing each method separately. The user testing method was analysed by examining: performance data; in-session observation notes; notes taken from reviewing the 60 Camtasia sessions; users’ comments noted during the test; and quantitative and qualitative data from the post-test questionnaires. The heuristic evaluation method was analysed by examining the heuristic evaluators’ comments obtained during the 15 sessions. The problems were classified according to the comprehensive heuristics developed specifically for e-commerce websites.

The main objective of the second stage of analysis was to generate a list of standardised usability problem themes and sub-themes to facilitate comparison among the various methods. Problem themes and sub-themes were identified from the common usability problem areas which were generated by each method. These were then used to classify the problems which had been identified. The list was generated gradually, starting from an analysis of the first method used in the user testing (the performance data). Then, after an analysis of the aforementioned methods, new problem themes and/or sub-themes were added to the list from problems that were not covered in the standardised themes. Ten themes were finally identified. The list of the problem themes and sub-themes that were generated from the analysis of all the methods is shown in Appendix 4.
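Once every problem is expressed against the standardised themes and sub-themes, comparing the two methods reduces to simple set operations; a minimal sketch follows (the problem identifiers are purely illustrative, not the study’s actual sub-themes).

user_testing = {"navigation/misleading links", "purchasing/required fields", "design/page layout"}
heuristic_evaluation = {"navigation/misleading links", "content/out-of-date information", "design/page layout"}

common = user_testing & heuristic_evaluation      # identified by both methods
unique_ut = user_testing - heuristic_evaluation   # uniquely identified by user testing
unique_he = heuristic_evaluation - user_testing   # uniquely identified by heuristic evaluation

print(sorted(common), sorted(unique_ut), sorted(unique_he), sep="\n")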

Gray and Salzman (1998) defined threats to the validity of experimental studies within the context of HCI research. They provided recommendations for addressing different types of validity that are most relevant to HCI research, such as internal validity and causal validity.

These recommendations were considered in this research in order to ensure its validity. The internal validity of this research concerned instrumentation, selection and setting. The researcher/experimenter was not assigned to different UEMs and identified usability problems. Despite the fact that the researcher was involved in the collection of data and played the role of observer in the user testing sessions and heuristic evaluation sessions, the web experts in the heuristic evaluation sessions identified the usability problems themselves. The researcher only reported the results of the experts. Furthermore, the categorisation of usability problems identified by each method was not the basis for categorising the usability problems obtained from the other methods. Each method was analysed separately, then problems that were identified by each method were compared to generate the problem themes and sub-themes, which were generated gradually.

The selection issue was also considered while recruiting participants in the user testing and heuristic evaluation methods. The characteristics of the participants in the user testing were based on the companies’ profiles of their users. Also, the web experts who participated in the heuristic evaluation all had approximately similar experience (i.e. more than 10 years) in designing e-commerce sites. Users with specific usability needs (e.g. users with visual or cognitive impairment) were not specifically considered when recruiting participants.



The ‘setting’ issue was also considered in this research, where all the participants in the user testing performed the testing in the same location under the same conditions and all followed the same procedure. All the experts in the heuristic evaluation performed the inspection under the same conditions and followed the same procedure. Even though every web expert evaluated the sites in his/her office in his/her company, similar conditions existed in each company.

Causal construct validity was also taken into consideration in this research. This section describes how each method was used, while these methods represent the usability methods that were identified and described in the literature. The problem of interactions was avoided since the participants in the user testing were not the same as those who carried out the heuristic evaluation.

To measure the inter-rater reliability, or the extent of agreement between raters (heuristic evaluators), Kappa (k) statistics were used. A specific SPSS macro (mkappasc.sps) was downloaded to calculate Kappa (k) for multiple raters. The overall Kappa (k) for all the usability problems identified by the evaluators was 0.69, which, according to Altman (1991), indicates ‘good’ agreement among the evaluators.
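As an equivalent illustration (not the SPSS macro the authors used), a minimal Python sketch of a multi-rater kappa, here Fleiss’ kappa as one common choice and with a hypothetical detection matrix, is:

import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Rows = candidate usability problems, columns = the five evaluators,
# cell = 1 if that evaluator reported the problem, 0 otherwise (illustrative data only).
detections = np.array([
    [1, 1, 1, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [1, 0, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0],
])

counts, _ = aggregate_raters(detections)   # counts of each rating category per problem
kappa = fleiss_kappa(counts, method='fleiss')
print(f"Fleiss' kappa = {kappa:.2f}")      # interpret against Altman's (1991) bands, e.g. 0.61-0.80 = good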

5. Results

This section presents the results obtained from the analysis of the user testing and heuristic evaluation methods. It presents an overview of the characteristics of the participants in the user testing. This is followed by reviewing the costs of employing user testing and heuristic evaluation methods. Then, the number of usability problems identified by the methods is presented, before presenting the specific usability problem areas that were identified by these methods.

5.1. Participants’ characteristics

The majority (17) of the participants had more than three years’ experience using computers, while only three participants had less than this. Half of the participants (10) had more than three years’ experience using the Internet and the other half had less than three years’ experience. Only four participants reported having used the Internet for purchasing. These results were not surprising given that the user testing was conducted in a developing country (Jordan). The number of e-commerce users in Jordan was estimated to be 198,000 in 2008 (3.42% of the total population) (Arab Advisors Group 2008). Therefore, the sample of this research, which includes a large number of novice users in terms of their experience in purchasing from the Internet, is a representative sample of the users in Jordan. All participants reported that they had not explored the three sites prior to the usability testing.

5.2. Comparative costs

The cost of employing the user testing and heuristic evaluation was estimated in terms of time spent designing and analysing each of these methods. The approximate time specifically related to the time spent conducting each method, including: time for setting up and designing the research tools, and collecting and analysing data. This section reviews the approximate time for each method. It should be noted that the times for the collection and analysis of data given in Table 2 represent the average time taken per site.

5.2.1. User testing method

The approximate time taken to design and analyse the user testing method was 326 h (see Table 2). This included:

. Setup and design of research tools: A total of 136 h were spent recruiting typical users (16 h) and designing users’ tasks and questionnaires (pre-test and post-test questionnaires) (120 h).

. Time spent collecting data: A total of 20 h were spent in users’ sessions observing users, taking notes and in distributing and collecting the questionnaires; each session took approximately 1 h.

. Time spent analysing data: A total of 170 h were spent transcribing the observation data and users’ comments and in writing up the usability problems (90 h). A further 80 h were spent statistically analysing the performance data and questionnaires.

5.2.2. Heuristic evaluation method

It is worth mentioning that the context of this research influenced the estimation of time spent conducting the heuristic evaluation method. As mentioned in Section 4, the research was conducted in a developing country (Jordan) where there was a limitation regarding recruiting evaluators who have experience in usability issues.

Table 2. Comparative costs for the user testing and heuristic evaluation methods.

                                      User testing (hours)   Heuristic evaluation (hours)
Setup and design of research tools             136                        128
Collecting data                                 20                         15
Analysing data                                 170                         80
Total time                                     326                        223



The evaluators involved in the inspection did not have enough experience to create comprehensive heuristic guidelines; these were therefore created by the authors. The evaluators also had limited experience in describing usability problems in the format required for usability reports; one of the authors therefore observed all inspection sessions. The time spent by the researchers in creating the heuristics, transcribing the web expert comments and writing up the usability problems was therefore taken into account in the estimation. However, the time spent in creating the heuristics could be discounted should the heuristics be used in further e-commerce website evaluations. The approximate time taken to design and analyse the heuristic evaluation method was 223 h (Table 2). This included:

. Setup and design of research tools: A total of 128 h were spent recruiting web experts (8 h) and creating the heuristic guidelines (120 h) that were used by the web experts.

. Time spent collecting data: A total of 15 h were spent taking detailed notes from the five web experts who participated in the study over five sessions; each session took approximately 3 h.

. Time spent analysing data: A total of 80 h were spent transcribing the web experts’ comments and writing out the usability problems.

It is worth mentioning, however, that one hour of an expert’s time might be more valuable than one hour of a user’s time.

5.3. Number of usability problems

A total of 243 usability problems were identified by the user testing and heuristic evaluation methods on the three websites. Figure 1 shows the distribution of these problems by each method and also shows the proportion of common problems identified by both methods. This figure shows that the heuristic evaluation was more effective than user testing in terms of identifying a larger proportion of problems. However, the number of problems identified is insufficient to judge the effectiveness of each method. Therefore, the problems identified by each method were classified by their severity: major and minor. Major problems included problems where a user made a mistake/error and was unable to recover and complete the task within the time limit which was assigned for each task and confirmed by the pilot test. Minor problems included both problems where a user made a mistake/error but was able to recover and complete the task in the allotted time (the time limit which was assigned for each task), and difficulties faced by the user while performing the required tasks which were noted by the observer. Major and minor problems generated by user testing were identified by referring to the performance data, the observation notes, the notes generated from reviewing the 60 Camtasia files and users’ comments, and the post-test satisfaction questionnaire. Minor and major problems generated by the heuristic evaluation were identified by matching each identified problem with the severity rating obtained from the web experts.
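A minimal sketch of this severity rule as applied to the user-testing data (the function and its arguments are illustrative, not an instrument used in the study):

from typing import Optional

def classify_severity(made_error: bool, recovered_in_time: bool, observed_difficulty: bool) -> Optional[str]:
    # An error the user could not recover from within the task's time limit counts as major.
    if made_error and not recovered_in_time:
        return "major"
    # A recoverable error, or a difficulty noted by the observer, counts as minor.
    if made_error or observed_difficulty:
        return "minor"
    return None  # no usability problem recorded for this task instance

print(classify_severity(made_error=True, recovered_in_time=False, observed_difficulty=False))  # major
print(classify_severity(made_error=False, recovered_in_time=True, observed_difficulty=True))   # minor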

An analysis of usability problems by level of severity showed that heuristic evaluation was effective in uniquely identifying a large number of minor usability problems while the user testing was effective in uniquely identifying major problems. Table 3 shows the number of problems, by severity level, that were identified by these methods. Interestingly, the common problems identified by these methods were divided into two categories: the first included common problems where there was an agreement regarding the severity level of these problems between the user testing and heuristic evaluation methods, while the second included problems where there was no agreement between the two methods concerning the problems’ severity level. The four classes of problems (found by user testing, heuristic evaluation, common agreed and common not agreed) are mutually exclusive. For example, Table 3 shows that user testing identified 31 minor usability problems: 8 of them were uniquely identified by this method, while 23 were commonly identified by both user testing and the heuristic evaluation method; of these 23 problems, there was an agreement regarding the severity rating of 14 of the problems.

Figure 1. Distribution of usability problems identified by the two methods.




Table 3 shows that there was agreement between these two methods regarding the severity level of 23 problems, while there was no agreement for 21 problems. The 21 problems included problems which were identified as major by user testing while they were identified as minor by heuristic evaluators and vice versa. This could be explained by the difference in experience between the users involved in the user testing and the web experts, the subjectivity in severity coding activities, and the fact that web experts might have difficulty assuming the role of a user. Section 5.4 provides examples regarding types of problems identified by users and web experts.

Table 3. Distribution of usability problems identified by the two methods by severity.

Usability method                                                         Minor problems   Major problems   Total
User testing                                                                    8               17           25
Heuristic evaluation                                                           148               26          174
Common problems (agreed severity): user testing and heuristic evaluation        14                9           23
Common problems (not agreed severity): as rated by user testing                  9               12           21
Common problems (not agreed severity): as rated by heuristic evaluation         12                9
Total number of problems                                                                                     243

Figure 2. Distribution of usability problems identified by the two methods by number and types of problem.

5.4. Usability problem areas

This section reviews the problems identified by the user testing and heuristic evaluation methods employed in this research. It uses the problem themes that were generated from the analysis to explain which methods were able to identify usability problems related to each problem theme. Figure 2 shows the distribution of usability problems that were uniquely and commonly identified by the two methods with respect to the number of usability problems related to the 10 problem themes. Figure 2 shows that the heuristic evaluation method was more effective in identifying a large number of problems compared to user testing with respect to all the problem themes, with the exception of one: purchasing process problems. In this problem theme, user testing identified a larger number of problems. This figure also shows that user testing uniquely identified problems related to four problem themes. These included: navigation, design, the purchasing process, and accessibility and customer service.

The following subsections review, with regard to each problem theme, the effectiveness of each usability method in identifying each problem sub-theme in terms of the number of problems identified and their severity level. Problems common to these methods, and problems missed by these methods, are also highlighted.

5.4.1. Navigation problems

The results show that the sites had major navigational usability problems related to three specific areas:

. Misleading links, including links with names that did not meet users’ expectations as the name of the link did not match the content of its destination page.



An example of this problem was identified on site 2. This related to the ‘go’ link located on the shipping page (Figure 3). Users expected this link to provide them with a hint regarding the information they had to enter in the ‘Redeem a Gift Certificate’ field. However, this link did not provide any additional information. Instead, it displayed a message box that asked users to enter their gift certificate number in the required field.

. Links that were not obvious, such as links that were not situated in obvious locations on the sites.

. A weak navigation support problem, for instance pages without a navigation menu.

The user testing method was more effective compared to the heuristic evaluation in uniquely identifying both major problems related to the first two areas and minor problems related to the third area. However, the results show that the heuristic evaluation was more effective compared with the user testing in uniquely identifying major problems related to the third area and minor problems related to the first two areas. Appendix 5 shows the distribution of the number of specific navigation problems identified on the three sites and their severity level. The heuristic evaluators also uniquely identified other minor problems related to two areas: pages with broken links and orphan pages (i.e. pages that did not have any links) on the sites. Kappa (k) for the navigational usability problems identified by the evaluators was 0.65, which, according to Altman (1991), shows ‘good’ agreement between evaluators.

5.4.2. Internal search problems

Both the user testing and heuristic evaluation methods identified four common usability problems with the internal search facilities of sites 1 and 2; two were major and the others were minor (Appendix 5). The two major problems were related to inaccurate results provided by the search facilities of these sites. The other two minor problems were related to the limited options provided by the search interface (i.e. users could not search the site by product type and product name concurrently).

Figure 3. Go link and the message displayed after clicking it on Site 2.



However, the heuristic evaluators also identified a further two minor usability problems, which were not identified by user testing. The first problem related to inaccurate results that were provided by the internal search facility of one subsection of site 3. The other related to the non-obvious position of the internal search facility in site 1. The heuristic evaluators indicated that most users expected to see an internal search box at the top of the home page whereas it was actually located under the left-hand navigation menu (see Figure 4). There was ‘good’ agreement among the evaluators about the internal search usability problems identified on the sites (Altman 1991) (k = 0.79).

5.4.3. Architecture problems

The user testing and heuristic evaluation methods identified one common major usability problem on site 3 in this category. This problem was that the architecture and categorisation of this site’s information and products was neither simple nor straightforward (Appendix 5). However, the heuristic evaluators also identified three further minor problems. Two problems concerned sites 2 and 3 and the order of the items on their menus, which were considered to be illogical. The third problem, specific to site 3, was that evaluators found the categorisation of the menu items illogical. Kappa (k) for the architectural usability problems identified by the evaluators was 0.76, thus showing ‘good’ agreement between them (according to Altman (1991)).

5.4.4. Content problems

The results show that the user testing method did not uniquely identify problems related to this area while the heuristic evaluators uniquely identified a total of 32 problems. Eight common problems were identified by the two methods. These related to three specific content problems: some content was irrelevant on all sites; there was some inaccurate information on sites 1 and 2, as these sites had product pages which displayed out of stock products; and some product information was missing, as none of the sites displayed the stock level of their products on their product pages. However, there was no agreement between the two methods regarding the level of severity of these common problems; the two methods agreed on the severity of only one major problem out of eight common problems (see Appendix 5). Furthermore, the heuristic evaluators uniquely identified 28 additional problems related to these three specific areas. It is worth noting that the heuristic evaluators also uniquely identified four additional minor problems relating to inaccurate grammar and missing information about the companies. Evaluators' agreement regarding content usability problems on the sites was *moderate*, according to Altman (1991) (k = 0.60).

Figure 4. Basic and advanced internal searches on Site 1.

5.4.5. Design problems

In this area, only two major problems were uniquely identified by user testing (Appendix 5), which related to inappropriate page design. For example, the login page on site 2, which was intended for both current and new users, was not designed clearly. It was divided into two parts, the left part to be completed by current users and the right by new users (Figure 5).

Observation showed that all users entered their information in the current users' fields instead of the fields for new users. The six common problems that were identified by the two methods were related to inappropriate page design, misleading images and inappropriate choice of fonts and colours throughout a site. The two methods agreed on the severity level of only three of these problems (Appendix 5). The heuristic evaluators uniquely identified an additional 17 problems related to the three areas that included the commonly identified problems, and they also uniquely identified 21 other minor design problems which were related to broken images, missing alternative text, inappropriate page titles, inappropriate quality of images and inappropriate headings. There was *good* agreement among the evaluators about the design usability problems present on the sites, according to Altman (1991) (k = 0.76).

Figure 5. Login page on Site 2.



5.4.6. Purchasing process problems

This was the only area where user testing identified a larger number of usability problems compared to the heuristic evaluators (Appendix 5). The user testing uniquely identified nine purchasing process problems while the heuristic evaluators identified only seven. A total of five problems were commonly identified by both methods. Seven of the problems identified by user testing were major while the other two were minor. The major problems were related to:

. Users had difficulty in distinguishing between required and non-required fields on the three sites (one problem per site).

. Users had difficulty in knowing which link to select to update information. For example, on site 1, users did not recognise that they had to click on the 'update order' link located on the shopping cart page to confirm a shopping cart update.

. Information that was expected was not displayed after adding a product to the cart. This problem was found on site 3. Site 3 did not display any confirmation message after users had added products to their cart. No new page was displayed because the product page had, in the top menu, a link that was required to complete the order after users had added products to their cart. This link was named 'complete order'. It was observed that most users clicked more than once on the 'Add to Cart' link (Figure 6).

The two minor problems identified by user testing related to: information that was expected but was not displayed after adding a product to the cart on site 1; and users having difficulty in knowing what information was required for some fields. The latter problem was identified on site 2, where it was observed that most users faced it during the purchasing process.

The seven problems that were uniquely identified by heuristic evaluators regarding the purchasing process were major, as indicated by the evaluators. These included:

. It was not easy to log on to site 1. This problem related to the fact that site 1 used both an account number and an email for logging on to the site. It could be inconvenient, as well as problematic, for users to remember their account details.

. No confirmation was required if users deleted an item from their cart (three problems were identified on the three sites; one problem per site).

. A long registration page was identified on site 1. The registration form had many fields which had to be filled in by the users.

. Compulsory registration was required of users in order to proceed with the purchasing process (two problems were identified on sites 1 and 2).

Regarding the common problems that were identified by the two methods, the two methods agreed on the severity level of only one of them. This was a major problem that was identified on site 3. This was a session problem as users had to enter their information for each transaction during the same session because the site did not save their information. The other four common problems included:

. Users had difficulty in knowing what information was required for some fields.

. Some fields that were required were illogical. For example, the registration page on site 1 included 'state/province' fields. These fields were required even if the selected country had no states or provinces.

. The ordering process of site 1 was long and users found that the 'checkout' link was displayed twice on two successive pages.

The evaluators showed *good* agreement in the problems identified in the purchasing process (Altman 1991) (k = 0.69).

5.4.7. Security and privacy problems

In this area, as shown in Figure 1 and Appendix 5, the user testing method did not identify any problems. By contrast, the heuristic evaluators identified one major problem related to this area. They reported that site 3 did not indicate that it was secure nor that it protected users' privacy, because no security guarantee or privacy policy statement was displayed.

5.4.8. Accessibility and customer service problems

The user testing uniquely identified four problems (three major and one minor) in this area, the heuristic evaluators uniquely identified eight minor problems, and two minor problems were commonly identified by both methods (Appendix 5). The three major problems identified by the user testing related to it being difficult to find help/customer support information. For example, observation showed that users did not know where to find shipping information. The minor problem identified by the user testing concerned the lack of information displayed on the frequently asked questions (FAQ) page of site 3.



The eight minor problems that were identified by the heuristic evaluators related to:

. Sites 2 and 3 did not support more than one currency.

. Sites 2 and 3 had a problem regarding the lack of a customer feedback form.

. Site 3 did not have a help/customer support section.

. The heuristic guidelines included a sub-category regarding the ease of finding and accessing the site from search engines. The heuristic evaluators used only a Google search to check this sub-category due to time limitations. They found that it was not easy to find sites 2 and 3 from Google.

. The difficulty in finding customer support information was only identified for site 2 (in contrast to the user testing, which identified this problem on all three sites). However, the heuristic evaluators were able to indicate the reason for this problem more clearly; they stated it was due to the navigation and content problems previously identified on the site.

Figure 6. Product page on Site 3.

The common minor problem that was identified by the two methods related to the fact that sites 1 and 2 did not support Arabic. Most users considered the unavailability of an Arabic interface as a usability problem. It should be noted, however, that there were some variations between the expert evaluators in the types of accessibility and customer service usability problems identified, k = 0.22.

5.4.9. Inconsistency problems

All the problems (22) that were identified on the three sites in this area were minor (Appendix 5). There was only one common inconsistency problem that was identified by the two methods on one site; the Arabic and English interfaces on site 3 were inconsistent. Conversely, the heuristic evaluators identified a total of 21 inconsistency problems on all sites. These problems included inconsistent position of the navigation menu on site 1, inconsistent colours and page layout alignment on site 2, and inconsistent page layout, font colour and style, link colours, terminology, content, menu items, design, page heading and sentence format on site 3. Some variations occurred between the expert evaluators when identifying inconsistency usability problems, as shown by the Kappa (k) statistic (k = 0.22).

5.4.10. Missing capabilities

The user testing method did not uniquely identify any problem related to missing capabilities on the three sites. However, it identified only one minor problem, which was also identified by the heuristic evaluators, related to missing capabilities of the sites: site 3 did not have an internal search facility (Appendix 5).

The heuristic evaluators, however, uniquely identified a large number of problems (19) regarding missing capabilities on the three sites. They indicated that sites 1 and 2 did not have alternative methods of delivery, did not have links to useful external resources and did not have a site map. Furthermore, they stated that site 1 did not display the content of its shopping cart on its top menu, did not support delivery to another address, did not display information about delivery, and its navigation menu did not give a clear indication of the current page on display, while site 2 did not have alternative methods of ordering. The evaluators indicated that site 3 did not have an internal search facility or a customer service section; also, this site did not display information regarding either payment options or cancelling an order. Not all the expert evaluators found the same missing capabilities, as indicated by the Kappa (k) statistic (k = 0.21).

6. Discussion

This section compares the findings obtained from previous research, which compared the effectiveness of the user testing and heuristic evaluation methods, with the results of this research. The comparison is presented under four headings: the number of usability problems, the number of minor and major usability problems, the cost of employing each method and, finally, the content of the usability problems that were identified.

6.1. Number of usability problems

Similar to previous research comparing user testing and heuristic evaluation techniques (Desurvire et al. 1992a,b, Doubleday et al. 1997, Fu et al. 2002, Law and Hvannberg 2002), this research found that the heuristic evaluation method identified a larger number of problems than the user testing. This is not surprising given the processes used by the user testing and heuristic evaluation methods to identify usability problems, as mentioned by Tan et al. (2009). For example, user testing focused on identifying usability problems that users faced while performing only specific tasks while interacting with an interface, whereas the heuristic evaluators were expected to explore the whole of the interface under inspection without being limited to specific tasks.

6.2. Number of minor and major usability problems

The results of this research revealed that heuristic evaluation was more effective than the user testing in uniquely identifying minor problems, whereas user testing was more effective than the heuristic evaluation in uniquely identifying major problems. This is in agreement with the results obtained by earlier research (Law and Hvannberg 2002). These results stress the value of these two evaluation methods as they are complementary; in other words, each of these methods is capable of identifying usability problems which the other method would be unlikely to identify.

Earlier research also showed the percentages of usability problems that were commonly identified by user testing and heuristic evaluation methods (Fu et al. 2002, Law and Hvannberg 2002, Tan et al. 2009). However, these studies, unlike the research described in this article, did not investigate or compare the severity of the problems that were identified by the two methods. The results of the present research showed that the severity of only 23 of the 44 problems was rated the same by both methods. This provides evidence to support the claim raised in the literature that heuristic evaluators cannot play the role of users and cannot judge the severity of usability problems in an interface for actual users. This might support the claim raised earlier by some researchers (Cockton and Woolrych 2001) that heuristic evaluators produce false positives which do not need to be addressed through the redesign of the interfaces.

6.3. Cost of employing UEMs

Earlier studies reported that, in terms of time spent, the user testing method incurred a higher cost compared to heuristic evaluation methods (Jeffries et al. 1991, Doubleday et al. 1997, Law and Hvannberg 2002, Molich and Dumas 2008). The results of this research concur with the previous studies. However, the amount of time taken for both methods was greater in this research compared to earlier studies. The main reason for this difference is likely to be because, unlike previous studies, this research specified and included the time spent on setup and design, and on data collection and analysis; see Table 4 for a comparison. Further, additional time in this study might have resulted from the use of less experienced usability specialists; previous studies used usability specialists conversant with HCI.

Presumably, the time that is shown in Table 4 depends on the number of users and evaluators who participated in the user testing and heuristic evaluation methods. However, there is a limitation in this table, which previous studies also did not explicitly report: the distinction between the fixed and variable costs of employing the user testing and heuristic evaluation methods. Fixed cost relates to the time spent designing and setting up the methods, regardless of the number of users or evaluators involved in the testing, while variable cost relates to the time spent conducting the methods and collecting and analysing the resulting data; this depends mainly on the number of users and evaluators involved in the testing.
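To make the fixed and variable cost distinction concrete, the following is a minimal sketch in Python of the cost model described above. The per-participant rates are an assumption introduced here for illustration: they simply divide the collection and analysis hours reported for this research in Table 4 by the number of participants, i.e. they assume those hours scale linearly with the number of users or evaluators.

def total_hours(fixed_hours, hours_per_participant, n_participants):
    # Simple cost model: fixed setup/design time plus a variable
    # component that grows with the number of participants.
    return fixed_hours + hours_per_participant * n_participants

# Figures taken from the 'This research' row of Table 4; treating collection
# and analysis time as linear in the number of participants is an assumption.
user_testing_hours = total_hours(136, (20 + 170) / 20, 20)  # 326.0 h for 20 users
heuristic_hours = total_hours(128, (15 + 104) / 5, 5)       # 247.0 h for 5 evaluators

Under this linear reading, one additional test user would add roughly 9.5 h and one additional expert evaluator roughly 23.8 h, although the article does not report marginal costs directly.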

6.4. Content of usability problems

This research addressed the gaps noted in the literature regarding two issues. The following subsections discuss how these gaps were addressed by referring to the results of this research.

6.4.1. Comparisons between the results

This research explained the effectiveness of user testing and heuristic evaluation methods in identifying 44 specific usability problems that could be found on an e-commerce website. These problems are related to 10 usability problem themes which were identified in this research, as shown in Appendix 4.

Despite the fact that the results of this research involved providing more detailed descriptions of usability problems that were uniquely identified by the user testing and heuristic evaluation methods compared to the previous research, it was found that there was agreement between most of the results of this research and the results of the previous studies.

Table 4. Cost of employing usability evaluation methods.

Jeffries et al. (1991): user testing 199 h (spent on analysis; six subjects participated in the study); heuristic evaluation 35 h (15 h learning the method and becoming familiar with the interface under investigation and 20 h on analysis; four usability specialists conducted the method).

Law and Hvannberg (2002): user testing 200 h (spent on the design and application of the method; ten subjects participated in the study); heuristic evaluation 9 h (spent on the design and conduct of the method by two evaluators).

Doubleday et al. (1997): user testing 125 h (25 h conducting 20 users' sessions, 25 h of evaluator time supporting the users' sessions and 75 h of statistical analysis); heuristic evaluation 33.5 h (6.25 h of five experts' time in the evaluation, 6.25 h of evaluators' time taking notes and 21 h transcribing the experts' comments and analysis).

This research: user testing 326 h (136 h setup and design, 20 h collecting data from 20 users' sessions and 170 h analysing data); heuristic evaluation 247 h (128 h setup and design, 15 h collecting data from the five web experts and 104 h analysing data).



Table 5 summarises examples of the usability problems that were identified uniquely by the user testing and heuristic evaluation methods, as reported in earlier research.

A general overview of the problems that were uniquely identified by the user testing and heuristic evaluation methods in this research revealed that user testing identified problems which influenced the performance of the users while attempting to carry out the purchasing tasks on the sites, as indicated by earlier research. Also, a general overview of the problems that were identified by the heuristic evaluators in this research revealed that this method identified problems related to improvements to interface features and quality, as indicated in the earlier research.

Furthermore, the other problems that were identified by user testing in the earlier research related specifically to: a lack of feedback and help facilities, navigation problems, the use of complex terms, inappropriate choice of font size and few consistency problems; these were also confirmed by the results of this research. Specific examples of problems identified in this research were discussed under Sections 5.4.1, 5.4.4, 5.4.5, 5.4.6, 5.4.8 and 5.4.9. Also, the problems that were identified by the heuristic evaluators in the previous research, which included inconsistency, appearance and layout problems, and security and privacy issues, were confirmed by the results of this research, as discussed in Section 5.4.

It is worth mentioning, however, that not all of the results of the earlier research were confirmed by this research. For example, Doubleday et al.'s study (1997) pointed out that only the heuristic evaluators uniquely identified one problem related to the slow response time of an interface in displaying results. In this research, a similar problem, caused by the display of a large number of images, was identified by both the user testing and heuristic evaluators. The apparent difference in the results could relate to the fact that in Doubleday et al.'s study the usability problems that were identified by the user testing were based only on the quantitative data obtained mainly from observation and performance data. In this research, however, the slow response problem identified by user testing was identified from qualitative data obtained from the satisfaction questionnaire and not from the performance data. This suggests the importance of using various methods to identify different types of problems. Differences were also found between the results of Mariage and Vanderdonckt's (2000) study and this research. Mariage and Vanderdonckt (2000) noted that problems related to the inappropriate choice of font size, the use of an inappropriate format and consistency problems were identified by user testing but missed by the heuristic evaluators. Similar problems in the current research were correctly identified using both methods. A possible explanation of the difference is that, unlike Mariage and Vanderdonckt's (2000) study, these elements were included in the list of heuristics for use by the evaluators. This suggests that the results of the heuristic evaluation method depend, in part, on the heuristic guidelines that are used by the evaluators.

6.4.2. Types and severity levels of usability problems

A few studies have described the type, number and severity of problems identified by the user testing and heuristic evaluation methods; see, for example, the work of Tan et al. (2009). They identified seven types or categories of problems but they did not provide examples to illustrate the effectiveness of the two UEMs in identifying specific usability problems in terms of their severity level.

Table 5. Examples of content of usability problems that were uniquely identified by user testing and heuristic evaluation methods.

Characteristics of usability problems that were identified by user testing:
. Related to user performance
. Related to a lack of clear feedback and poor help facilities
. Related to functionality and learnability problems
. Related to navigation
. Related to excessive use of complex terminology (technical jargon)
. Related to inappropriate choice of font size
. Related to the use of an inappropriate format for links
. Few consistency problems
References: Simeral and Branaghan (1997), Jeffries et al. (1991), Doubleday et al. (1997), Fu et al. (2002), Law and Hvannberg (2002), Mariage and Vanderdonckt (2000), Bach and Scapin (2010)

Characteristics of usability problems that were identified by heuristic evaluation:
. Related to interface features and interface quality
. Related to the appearance or layout of an interface
. Inconsistency problems with the interface
. Related to slow response time of the interface in displaying results
. Related to compatibility
. Related to security and privacy issues
References: Nielsen and Phillips (1993), Doubleday et al. (1997), Nielsen (1992), Law and Hvannberg (2002), Simeral and Branaghan (1997), Fu et al. (2002), Tan et al. (2009), Bach and Scapin (2010)



The research described here has attempted to illustrate the effectiveness of both methods in identifying the number and severity of the 44 specific usability problems which were uniquely identified by either user testing or heuristic evaluation, those that were commonly identified by both methods or those that were missed by each method.

The results, as discussed in Section 5.4, showed that most of the problems that were uniquely identified by user testing were major ones which prevented real users from interacting with and purchasing products from e-commerce sites. This method identified major problems related to four specific areas and minor problems related to one area. Conversely, most of the problems that were uniquely identified by the heuristic evaluators were minor; these could be used to improve different aspects of an e-commerce site. This method identified minor problems in eight areas and major problems in four.

By considering the type of problems identified by the user testing and heuristic evaluation methods, it is obvious that the profiles of the participants in the user testing almost certainly had a significant impact on the findings. For example, the lower level of experience of the users might explain why they did not notice many usability problems, such as some design problems and security and privacy problems (as presented in Section 5.4).

It is worth mentioning that the idea of illustrating the advantages and disadvantages of the two UEMs to obtain effective evaluation of the usability of e-commerce websites could have a particular value to the field of e-commerce website evaluation. However, the fact that this research was conducted in a developing country (i.e. Jordan) may limit the generalisation of the results. It is possible that users and/or heuristic evaluators in other, more developed countries may identify different types of problems based on their greater experience.

Therefore, the results of this research have a particular value for Jordan and other developing countries. The results will aid e-commerce companies in these countries to take appropriate decisions regarding which usability method to apply and how to apply it in order to improve part, or all, of the usability of their e-commerce websites. Improving the usability of e-commerce websites will help to obtain the advantages of e-commerce in this challenging environment.

7. Conclusion

The research described here had some limitations. For example, convenience sampling was used for the selection of the websites, based on availability rather than criteria based on the existence of a range of usability problems. The types of problems covered may not, therefore, be representative of all e-commerce websites.

Despite this limitation, the outcomes of this research concur with those of others which compared user testing and heuristic evaluation methods (Doubleday et al. 1997, Desurvire et al. 1992a,b, Fu et al. 2002, Law and Hvannberg 2002). For example, the heuristic evaluation method uniquely identified a large number of usability problems, although most of these were minor, compared to the user testing method, which identified fewer problems but more major ones. Additionally, however, this research addressed the gap noted in the literature with regard to the lack of identification of specific types of problems found by user testing and heuristic evaluation methods on e-commerce websites; see Tan et al. (2009). The findings illustrated the effectiveness of these methods in identifying major and minor usability problems related to: navigation, internal search facilities, architecture, content, design, the purchasing process, security and privacy, accessibility and customer service, consistency and missing capabilities.

The results also showed the benefits of user testing regarding its accuracy and uniqueness in identifying major specific usability problems which prevented real users from interacting with and purchasing products from e-commerce sites. These problems related to four areas: navigation, design, the purchasing process, and accessibility and customer service. The small number of problems identified by this method added value because of the low cost that is required to rectify these problems by making a few changes to the design of the sites. However, the results also suggested drawbacks with this method. These related to its high cost in terms of the time spent, and its failure (or inability) to identify the specific minor usability problems, related to eight problem areas, which did not relate directly to users' performance or satisfaction.

By contrast, the results showed the benefits of the heuristic evaluation method related to its unique capability for identifying a large number of specific minor usability problems that could be used to improve different aspects of the sites. These problems related to eight areas including: navigation, internal search, the site architecture, the content, the design, accessibility and customer service, inconsistency and missing functions. The heuristic evaluators also uniquely identified major security and privacy problems. Furthermore, the results stressed that, although this method required a long time to recruit appropriate web experts with relevant knowledge and experience, to create heuristic guidelines, and to collect and analyse the data, this was less than the time spent conducting the user testing method. However, the results emphasised that the heuristic evaluators could not play the role of real users and could not predict actual problems users might face while interacting with the sites; as a result, they did not identify major problems related to four specific usability problem areas, for example. The common problems where the severity level was not agreed between the heuristic evaluators and user testing represented further evidence of the heuristic evaluators' inability to predict users' actions while interacting with the sites. The inability of heuristic evaluators to accurately predict the severity of problems may lead to a high cost in redesigning a site, if this method alone were used to identify severe problems that need rectifying.

This research suggests that employing either user testing or heuristic evaluation would be useful in identifying specific usability problem areas, but there were problems that were uniquely identified and problems that were missed by each method. Therefore, in order to obtain a comprehensive identification of usability problems, user testing and heuristic evaluation methods should be used together so that they complement each other in the types of problem they identify.

It is worth mentioning that, despite the fact that this research was conducted in Jordan, it is likely that the results can be generalised to other countries, because many of the details concerning the specific usability problems that may be experienced by users could result in higher costs if the sites were redesigned unnecessarily. However, further research could be undertaken to verify this. Further research could also be undertaken to determine if, and how, the number of users and evaluators employed in e-commerce website evaluations affects the number and type of problems identified.

References

Altman, D.G., 1991. Practical statistics for medical research. London: Chapman and Hall.

Arab Advisors Group, 2008. Over 29,000 households share ADSL subscriptions in Jordan: the percentage of Jordanian households connected to ADSL is 11.7% of total households [online]. Available from: http://www.arabadvisors.com/Pressers/presser-021208.htm [Accessed 11 August 2009].

Bach, C. and Scapin, D.L., 2010. Comparing inspections and user testing for the evaluation of virtual environments. International Journal of Human–Computer Interaction, 26 (8), 786–824.

Bailey, B., 2001. Heuristic evaluations vs. usability testing, part I [online]. Available from: http://webusability.com/article_heuristic_evaluation_part1_2_2001.htm [Accessed 12 January 2009].

Barnard, L. and Wesson, J., 2003. Usability issues for e-commerce in South Africa: an empirical investigation. In: Proceedings of SAICSIT, 17–19 September, Fourways, Gauteng. South Africa: South African Institute for Computer Scientists and Information Technologists, 258–267.

Barnard, L. and Wesson, J., 2004. A trust model for e-commerce in South Africa. In: Proceedings of SAICSIT, 4–6 October, Cape Town. South Africa: South African Institute for Computer Scientists and Information Technologists, 23–32.

Barnes, S. and Vidgen, R., 2002. An integrative approach to the assessment of e-commerce quality. Journal of Electronic Commerce Research, 3 (3), 114–127.

Basu, A., 2002. Context-driven assessment of commercial web sites. In: Proceedings of the 36th Hawaii international conference on system sciences (HICSS'03), 7–10 January, Hawaii, USA.

Blandford, A., et al., 2008. Scoping usability evaluation methods: a case study. Human Computer Interaction Journal, 23 (3), 278–327.

Brinck, T., Gergle, D., and Wood, S.D., 2001. Usability for the web: designing websites that work. San Francisco, CA, USA: Morgan Kaufmann Publishers.

Cao, M., Zhang, Q., and Seydel, J., 2005. B2C e-commerce web site quality: an empirical examination. Industrial Management & Data Systems, 105 (5), 645–661.

Chen, S. and Macredie, R., 2005. The assessment of usability of electronic shopping: a heuristic evaluation. International Journal of Information Management, 25, 516–532.

Cockton, G. and Woolrych, A., 2001. Understanding inspection methods: lessons from an assessment of heuristic evaluation. In: A. Blandford and J. Vanderdonckt, eds. People and computers XV. Berlin: Springer-Verlag, 171–192.

Delone, W. and Mclean, E., 2003. The DeLone and McLean model of information systems success: a ten-year update. Journal of Management Information Systems, 19 (4), 9–30.

De Marsico, M. and Levialdi, S., 2004. Evaluating web sites: exploring user's expectations. International Journal of Human Computer Studies, 60, 381–416.

Desurvire, H., Kondziela, J., and Atwood, M., 1992a. What is gained and lost when using methods other than empirical testing. In: Posters and short talks of the 1992 SIGCHI conference on human factors in computing systems, 3–7 May, Monterey, California, USA. New York, NY, USA: ACM (Association for Computing Machinery), 125–126.

Desurvire, H.W., Kondziela, J.M., and Atwood, M.E., 1992b. What is gained and lost when using evaluation methods other than empirical testing. In: A. Monk, D. Diaper, and M.D. Harrison, eds. People and computers VII. Cambridge: Cambridge University Press, 89–102.

Desurvire, H., Lawrence, D., and Atwood, M., 1991. Empiricism versus judgement: comparing user interface evaluation methods on a new telephone-based interface. SIGCHI Bulletin, 23 (4), 58–59.

Doubleday, A., Ryan, M., Springett, M., and Sutcliffe, A., 1997. A comparison of usability techniques for evaluating design. In: Proceedings of the 2nd conference on designing interactive systems, 18–20 August, Amsterdam, The Netherlands. New York, USA: ACM Press, 101–110.

Fisher, J., Craig, A., and Bentley, J., 2002. Evaluating small business web sites – understanding users. In: Proceedings of ECIS 2002, 6–8 June, Poland. Princeton, New Jersey, USA: Citeseer, 667–675.

Freeman, M.B. and Hyland, P., 2003. Australian online supermarket usability. Technical report [online]. Decision Systems Lab, University of Wollongong. Available from: http://www.dsl.uow.edu.au/publications/techreports/freeman03australia.pdf [Accessed 11 October 2007].



Fu, L., Salvendy, G., and Turley, L., 2002. Effectiveness of user testing and heuristic evaluation as a function of performance classification. Behaviour & Information Technology, 21 (2), 137–143.

Gonzalez, F.J.M. and Palacios, T.M.B., 2004. Quantitative evaluation of commercial web sites: an empirical study of Spanish firms. International Journal of Information Management, 24, 313–328.

Gray, W. and Salzman, C., 1998. Damaged merchandise? A review of experiments that compare usability evaluation methods. Human–Computer Interaction, 13, 203–261.

Hartson, H.R., Andre, T.S., and Williges, R.C., 2001. Evaluating usability evaluation methods. International Journal of Human–Computer Interaction, 13 (4), 373–410.

Hertzum, M. and Jacobsen, N.E., 2001. The evaluator effect: a chilling fact about usability evaluation methods. International Journal of Human–Computer Interaction, 13 (4), 421–443.

Hornbaek, K., 2010. Dogmas in the assessment of usability evaluation methods. Behaviour & Information Technology, 29 (1), 97–111.

Huang, W., et al., 2006. Categorizing web features and functions to evaluate commercial web sites. Industrial Management & Data Systems, 106 (4), 523–539.

Hung, W. and McQueen, R.J., 2004. Developing an evaluation instrument for e-commerce websites from the first-time buyer's viewpoint. Electronic Journal of Information Systems Evaluation, 7 (1), 31–42.

ISO 9241-11, 1998. International standard first edition. Ergonomic requirements for office work with visual display terminals (VDTs), Part 11: Guidance on usability [online]. Available from: http://www.idemployee.id.tue.nl/g.w.m.rauterberg/lecturenotes/ISO9241part11.pdf [Accessed 3 April 2007].

Jeffries, R. and Desurvire, H., 1992. Usability testing vs. heuristic evaluation: was there a contest? ACM SIGCHI Bulletin, 24 (4), 39–41.

Jeffries, R., et al., 1991. User interface evaluation in the real world: a comparison of four techniques. In: Proceedings of the ACM CHI'91 conference, 28 April–2 May, New Orleans, LA, USA. New York, NY, USA: ACM, 119–124.

Kantner, L. and Rosenbaum, S., 1997. Usability studies of WWW sites: heuristic evaluation vs. laboratory testing. In: ACM 15th international conference on computer documentation, 19–22 October, Salt Lake City, Utah, USA. New York, NY, USA: ACM (Association for Computing Machinery), 153–160.

Kuniavsky, M., 2003. Observing the user experience: a practitioner's guide to user research. San Francisco, CA; London: Morgan Kaufmann.

Law, L. and Hvannberg, E., 2002. Complementarity and convergence of heuristic evaluation and usability test: a case study of Universal Brokerage Platform. In: ACM international conference proceeding series, 3, proceedings of the second Nordic conference on human–computer interaction, 19–23 October, Aarhus, Denmark. New York, NY, USA: ACM Press, 71–80.

Liu, C. and Arnett, K., 2000. Exploring the factors associated with web site success in the context of electronic commerce. Information and Management, 38, 23–33.

Mariage, C. and Vanderdonckt, J., 2000. A comparative usability study of electronic newspapers. In: Proceedings of the international workshop on tools for working with guidelines TFWWG'2000 [online]. Available from: http://www.isys.ucl.ac.be/bchi/ [Accessed 20 January 2008].

Molich, R. and Dumas, J., 2008. Comparative usability evaluation (CUE-4). Behaviour & Information Technology, 27 (3), 263–281.

Molla, A. and Licker, S.P., 2001. E-commerce systems success: an attempt to extend and respecify the DeLone and McLean model of IS success. Journal of Electronic Commerce Research, 2 (4), 131–141.

Nielsen, J., 1992. Finding usability problems through heuristic evaluation. In: Proceedings of the SIGCHI conference on human factors in computing systems, 3–7 May, Monterey, California, USA. New York, NY, USA: ACM (Association for Computing Machinery), 373–380.

Nielsen, J., 1996. Original top ten mistakes in web design [online]. Available from: http://www.useit.com/alertbox/9605a.html [Accessed 10 January 2007].

Nielsen, J., 2000. Designing web usability: the practice of simplicity. Indianapolis, Indiana, USA: New Riders Publishing.

Nielsen, J., 2003. Usability 101: introduction to usability. Useit.com [online]. Available from: http://www.useit.com/alertbox/20030825.html [Accessed 14 February 2006].

Nielsen, J., 2006. Quantitative studies: how many users to test? Useit.com [online]. Available from: http://www.useit.com/alertbox/quantitative_testing.html [Accessed 20 November 2007].

Nielsen, J. and Mack, R.L., eds., 1994. Usability inspection methods. New York, NY: John Wiley & Sons.

Nielsen, J. and Phillips, V., 1993. Estimating the relative usability of two interfaces: heuristic, formal, and empirical methods compared. In: Proceedings of the INTERACT '93 and CHI '93 conference on human factors in computing systems, 24–29 April, Amsterdam, The Netherlands. New York, NY, USA: ACM, 214–221.

Obeidat, M., 2001. Consumer protection and electronic commerce in Jordan (an exploratory study). In: Proceedings of the public voice in emerging market economies conference, Dubai, UAE [online]. Available from: http://www.thepublicvoice.org/events/dubai01/presentations/html/m_obeidat/m.obeidatpaper.html [Accessed 12 January 2007].

Oppenheim, C. and Ward, L., 2006. Evaluation of websites for B2C e-commerce. Aslib Proceedings: New Information Perspectives, 58 (3), 237–260.

Pickard, A., 2007. Research methods in information. London: Facet.

Preece, J., Sharp, H., and Rogers, Y., 2002. Interaction design: beyond human–computer interaction. New York, NY, USA: John Wiley & Sons, Inc.

Rubin, J., 1994. Handbook of usability testing: how to plan, design, and conduct effective tests. New York, NY, USA: John Wiley & Sons, Inc.

Sahawneh, M., 2002. E-commerce: the Jordanian experience. Amman, Jordan: Royal Scientific Society.

Sartzetaki, M., et al., 2003. An approach for usability evaluation of e-commerce sites based on design patterns and heuristics criteria [online]. Available from: http://www.softlab.ntua.gr/~retal/papers/conferences/HCICrete/Maria_EVAL/HCI2003_DEPTH_camera.pdf [Accessed 10 December 2006].

Shneiderman, B., 1998. Designing the user interface: strategies for effective human–computer interaction. 3rd ed. Menlo Park, CA, USA: Addison Wesley.

Simeral, E. and Branaghan, R., 1997. A comparative analysis of heuristic and usability evaluation methods. STC Proceedings [online]. Available from: http://www.stc.org/confproceed/1997/PDFs/0140.PDF [Accessed 20 April 2008].



Singh, M. and Fisher, J., 1999. Electronic commerce issues: a discussion of two exploratory studies. In: Proceedings of CollEcTeR, 29 November, Wellington, New Zealand.

Singh, S. and Kotze, P., 2002. Towards a framework for e-commerce usability. In: Proceedings of the annual SAICSIT conference, 16–18 September, Port Elizabeth, South Africa. Republic of South Africa: South African Institute for Computer Scientists and Information Technologists, 2–10.

Srivihok, A., 2000. An assessment tool for electronic commerce: end user evaluation of web commerce sites [online]. Available from: http://www.singstat.gov.sg/statsres/conferences/ecommerce/r313.pdf [Accessed 10 October 2006].

Stone, D., et al., 2005. User interface design and evaluation. California: The Open University, Morgan Kaufmann.

Sutcliffe, A., 2002. Assessing the reliability of heuristic evaluation for website attractiveness and usability. In: Proceedings of the 35th Hawaii international conference on system sciences (HICSS), 7–10 January, Big Island, HI, USA. Washington, DC: IEEE Computer Society.

Tan, W., Liu, D., and Bishu, R., 2009. Web evaluation: heuristic evaluation vs. user testing. International Journal of Industrial Ergonomics, 39, 621–627.

Tilson, R., et al., 1998. Factors and principles affecting the usability of four e-commerce sites. In: Proceedings of the 4th conference on human factors and the web (CHFW), 5 June, AT&T Labs, USA.

Van der Merwe, R. and Bekker, J., 2003. A framework and methodology for evaluating e-commerce websites. Internet Research: Electronic Networking Applications and Policy, 13 (5), 330–341.

Webb, H. and Webb, L., 2004. SiteQual: an integrated measure of web site quality. Journal of Enterprise Information Management, 17 (6), 430–440.

Zhang, P. and von Dran, G., 2001. Expectations and ranking of website quality features: results of two studies on user perceptions. In: Proceedings of the 34th Hawaii international conference on system sciences, 3–6 January, Maui, Hawaii, USA. Washington, DC: IEEE Computer Society, 1–10.



Appendix 1. Task scenarios for the three websites

Task 1
Website 1: Find 'Jilbab JS7107' Jilbab, Size: Large, Price = $79.99, Colour: Brown.
Website 2: Find 'Jilbab with Pants 3106511' Jilbab, Product#: 3106511, Price = $98.99, Size: 2, Colour: Ivory2.
Website 3: Find 'Traditional White Cotton Thobe' Dress, ID = edt-ds-001, Price = $475.00, small size.

Task 2
Website 1: Purchase two quantities of this Jilbab.
Website 2: Purchase two quantities of this Jilbab.
Website 3: Purchase two quantities of this dress.

Task 3
Website 1: Change the quantity of the purchased Jilbab from two to one and complete the purchasing process.
Website 2: Change the quantity of the purchased Jilbab from two to one and complete the purchasing process.
Website 3: Change the quantity of the purchased dress from two to one and complete the purchasing process.

Task 4
Website 1: Find 'Ameera AH7103' Hijab, Price = $7.99, then add it to your shopping cart and complete the purchasing process.
Website 2: Find 'Chiffon Hijab 100S152' Hijab, Product#: 100S152, Price = $14.99, Colour: LightSteelBlue1, then add it to your shopping cart and complete the purchasing process.
Website 3: Find 'Almond Oil', ID = gf-oil-051, Size = 40 ML, Price = $3.5, then add it to your shopping cart and complete the purchasing process.

Task 5
Website 1: Change your shipping address that you have just entered during the purchasing process and complete the purchasing process.
Website 2: Change your shipping address that you have just entered during the purchasing process and complete the purchasing process.
Website 3: Change your delivery address that you have just entered during the purchasing process and complete the purchasing process.

Task 6
Website 1: Get a list of all the pins which can be purchased from this website, then from the list find the price of the 'plastic pins'.
Website 2: Get a list of all the shirts which can be purchased from this website, then from the list find the price of 'Shirt 9502237' Shirt, Product #: 9502237.
Website 3: Get a list of all ceramic items which are displayed in the TurathCom online catalog, then from the list find the price of 'rustic ceramic Jar', Code: raf-1.

Task 7
Website 1: Suppose that you bought a product from this website and would like to complain that it took several months to get to you. Find out how you would do this.
Website 2: Suppose that you bought a product from this website and would like to complain that it took several months to get to you. Find out how you would do this.
Website 3: Suppose that you bought a product from this website and would like to complain that it took several months to get to you. Find out how you would do this.

Task 8
Website 1: Find out how long it will take to receive your order after purchasing it from this website.
Website 2: Find out how long it will take to receive your order after purchasing it from this website.
Website 3: Find out how long it will take to receive your order after purchasing it from this website.

Task 9
Website 1: What is the price of 'Skirt SK6103' Skirt?
Website 2: What is the price of 'Bent Al-Noor Dress 5002002' Dress, Product#: 5002002?
Website 3: What is the price of 'Those Were The Days' Book?

Task 10
Website 1: Get a list of all the items which can be purchased from this website with size XXLarge.
Website 2: Get a list of all the items which can be purchased from this website with prices between $150 and $1000.
Website 3: Find out the types of services that TurathCom offers.



Appendix 2. Categories, subcategories and references of the developed heuristics

Architecture and navigation
Consistency: Oppenheim and Ward (2006), Nielsen (1996), Brinck et al. (2001), Preece et al. (2002), Van der Merwe and Bekker (2003), Webb and Webb (2004), Basu (2002), Sutcliffe (2002), Fisher et al. (2002), Chen and Macredie (2005), Shneiderman (1998)
Navigation support: Oppenheim and Ward (2006), Brinck et al. (2001), Singh and Kotze (2002), Gonzalez and Palacios (2004), Nielsen (2000), Nielsen (1996), Preece et al. (2002), Cao et al. (2005), Van der Merwe and Bekker (2003), Gonzalez and Palacios (2004), Webb and Webb (2004), Zhang and von Dran (2001), Barnes and Vidgen (2002), Singh and Fisher (1999), Srivihok (2000), Basu (2002), Molla and Licker (2001), Sutcliffe (2002), Fisher et al. (2002), De Marsico and Levialdi (2004), Hung and McQueen (2004)
Internal search: Oppenheim and Ward (2006), Cao et al. (2005), Brinck et al. (2001), Nielsen (1996), Tilson et al. (1998), Huang et al. (2006), Van der Merwe and Bekker (2003), Barnard and Wesson (2004), Zhang and von Dran (2001), Liu and Arnett (2000), Basu (2002), Molla and Licker (2001), De Marsico and Levialdi (2004), Hung and McQueen (2004)
Working links: Oppenheim and Ward (2006), Van der Merwe and Bekker (2003), Singh and Fisher (1999), Fisher et al. (2002), Huang et al. (2006), Gonzalez and Palacios (2004)
Resourceful links: Oppenheim and Ward (2006), Huang et al. (2006), Gonzalez and Palacios (2004)
No orphan pages: Gonzalez and Palacios (2004), Nielsen (1996), Preece et al. (2002), Van der Merwe and Bekker (2003), Chen and Macredie (2005), Zhang and von Dran (2001)
Logical structure of site: Van der Merwe and Bekker (2003), Brinck et al. (2001), Tilson et al. (1998), Oppenheim and Ward (2006), Cao et al. (2005), Barnard and Wesson (2004), Webb and Webb (2004), Srivihok (2000), Liu and Arnett (2000), Molla and Licker (2001), Chen and Macredie (2005), Hung and McQueen (2004), Preece et al. (2002), Basu (2002)
Simple navigation menu: Tilson et al. (1998), Oppenheim and Ward (2006), Cao et al. (2005), Barnard and Wesson (2004), Preece et al. (2002), Basu (2002)

Content
Up-to-date information: Cao et al. (2005), Nielsen (2000), Nielsen (1996), Oppenheim and Ward (2006), Van der Merwe and Bekker (2003), Barnard and Wesson (2004), Webb and Webb (2004), Gonzalez and Palacios (2004), Barnes and Vidgen (2002), Singh and Fisher (1999), Srivihok (2000), Molla and Licker (2001), Hung and McQueen (2004), Zhang and von Dran (2001), Huang et al. (2006)
Relevant information: Oppenheim and Ward (2006), Cao et al. (2005), Brinck et al. (2001), Nielsen (2000), Preece et al. (2002), Van der Merwe and Bekker (2003), Webb and Webb (2004), Gonzalez and Palacios (2004), Barnes and Vidgen (2002), Singh and Fisher (1999), Liu and Arnett (2000), Molla and Licker (2001), Delone and Mclean (2003), Sutcliffe (2002), Fisher et al. (2002), De Marsico and Levialdi (2004), Zhang and von Dran (2001)
Accurate information: Oppenheim and Ward (2006), Cao et al. (2005), Van der Merwe and Bekker (2003), Barnard and Wesson (2004), Webb and Webb (2004), Barnes and Vidgen (2002), Singh and Fisher (1999), Srivihok (2000), Liu and Arnett (2000), Molla and Licker (2001), Zhang and von Dran (2001)
Grammatical accuracy: Oppenheim and Ward (2006), Van der Merwe and Bekker (2003), Singh and Fisher (1999)
Information about the company: Oppenheim and Ward (2006), Huang et al. (2006), Van der Merwe and Bekker (2003), Gonzalez and Palacios (2004), Barnard and Wesson (2003), Basu (2002), Sutcliffe (2002), Hung and McQueen (2004)
Information about the products: Oppenheim and Ward (2006), Cao et al. (2005), Tilson et al. (1998), Huang et al. (2006), Van der Merwe and Bekker (2003), Gonzalez and Palacios (2004), Barnard and Wesson (2004), Zhang and von Dran (2001), Liu and Arnett (2000), Basu (2002), Sartzetaki et al. (2003), Hung and McQueen (2004)

Accessibility and customer service
Easy to find and access website: Oppenheim and Ward (2006), Brinck et al. (2001), Singh and Kotze (2002), Nielsen (2000), Nielsen (1996), Preece et al. (2002), Cao et al. (2005), Van der Merwe and Bekker (2003), Gonzalez and Palacios (2004), Singh and Fisher (1999), Srivihok (2000), Liu and Arnett (2000), Molla and Licker (2001), Delone and Mclean (2003), Fisher et al. (2002), Hung and McQueen (2004), Webb and Webb (2004), Basu (2002), Shneiderman (1998)
Contact us information: Oppenheim and Ward (2006), Cao et al. (2005), Huang et al. (2006), Singh and Kotze (2002), Van der Merwe and Bekker (2003), Gonzalez and Palacios (2004), Barnard and Wesson (2004), Webb and Webb (2004), Barnes and Vidgen (2002), Srivihok (2000), Liu and Arnett (2000), Basu (2002), Molla and Licker (2001), Delone and Mclean (2003), Sartzetaki et al. (2003), Chen and Macredie (2005), Hung and McQueen (2004), Zhang and von Dran (2001)
Help/customer service: Webb and Webb (2004), Barnes and Vidgen (2002), Srivihok (2000), Liu and Arnett (2000), Basu (2002)
Compatibility: Brinck et al. (2001), Van der Merwe and Bekker (2003), Basu (2002), Zhang and von Dran (2001)
Foreign language and currency support: Oppenheim and Ward (2006), Van der Merwe and Bekker (2003), Basu (2002), De Marsico and Levialdi (2004)

Design
Aesthetic design: Van der Merwe and Bekker (2003), Brinck et al. (2001), Singh and Kotze (2002), Preece et al. (2002), Barnard and Wesson (2004), Webb and Webb (2004), Barnes and Vidgen (2002), Singh and Fisher (1999), Liu and Arnett (2000), Basu (2002), Molla and Licker (2001), Sutcliffe (2002), Fisher et al. (2002), Chen and Macredie (2005), Zhang and von Dran (2001)
Appropriate use of images: Oppenheim and Ward (2006), Cao et al. (2005), Van der Merwe and Bekker (2003), Barnard and Wesson (2004), Singh and Fisher (1999), Basu (2002), Fisher et al. (2002)
Appropriate choice of fonts and colours: Oppenheim and Ward (2006), Van der Merwe and Bekker (2003), Singh and Fisher (1999), Basu (2002), Sutcliffe (2002), Fisher et al. (2002), Shneiderman (1998)
Appropriate page design: Nielsen (1996), Preece et al. (2002), Brinck et al. (2001), Oppenheim and Ward (2006), Van der Merwe and Bekker (2003), Singh and Fisher (1999), Basu (2002), Molla and Licker (2001), Fisher et al. (2002), Shneiderman (1998)

Purchasing process
Easy order process: Van der Merwe and Bekker (2003), Oppenheim and Ward (2006), Barnard and Wesson (2004), Singh and Fisher (1999), Liu and Arnett (2000), Molla and Licker (2001), De Marsico and Levialdi (2004), Shneiderman (1998)
Ordering information: Oppenheim and Ward (2006), Tilson et al. (1998), Singh and Kotze (2002), Van der Merwe and Bekker (2003), Gonzalez and Palacios (2004), Barnard and Wesson (2004), Basu (2002), Sartzetaki et al. (2003), De Marsico and Levialdi (2004), Hung and McQueen (2004)
Delivery information: Oppenheim and Ward (2006), Tilson et al. (1998), Singh and Kotze (2002), Van der Merwe and Bekker (2003), Gonzalez and Palacios (2004), De Marsico and Levialdi (2004)
Order/delivery status provision: Oppenheim and Ward (2006), Huang et al. (2006), Van der Merwe and Bekker (2003), Gonzalez and Palacios (2004), Barnard and Wesson (2004), Webb and Webb (2004), Liu and Arnett (2000), Basu (2002), Molla and Licker (2001), De Marsico and Levialdi (2004), Hung and McQueen (2004)
Alternative methods of ordering/payment/delivery are available: Oppenheim and Ward (2006), Van der Merwe and Bekker (2003), Basu (2002), Molla and Licker (2001), De Marsico and Levialdi (2004)

Security and privacy: Oppenheim and Ward (2006), Brinck et al. (2001), Tilson et al. (1998), Huang et al. (2006), Cao et al. (2005), Van der Merwe and Bekker (2003), Barnard and Wesson (2004), Webb and Webb (2004), Zhang and von Dran (2001), Barnes and Vidgen (2002), Srivihok (2000), Liu and Arnett (2000), Basu (2002), Molla and Licker (2001), Delone and Mclean (2003), Sutcliffe (2002), Chen and Macredie (2005), De Marsico and Levialdi (2004), Hung and McQueen (2004)



Appendix 3. Heuristic guidelines and their explanations

Heuristic Explanation

Architecture and navigationConsistency Page layout or style is consistent throughout the website, e.g. justification of text, font

types, font sizes, position of the navigation menu in each page. Colours are consistentand provide consistent look and feel for navigation and information design, e.g. fontcolours, background colours, use of standard link colours (standard blue link colourshould be used for unvisited pages and purple or red colours for visited pages).Terminology/terms throughout the website are consistent. Content is consistentamong different languages interfaces

Navigation support Navigational links are obvious in each page so that users can explore and find their wayaround the site and navigate easily, e.g. index, or site map, or navigation bar or tableof contents

Internal search Internal search is effective, e.g. fast; accurate; provides useful, concise and clear resultswhich are easy for interpreting

Working links Links are discernible, working properly and not misleading so that the user knows whatto expect from the destination page, e.g. links are obvious, no broken links, linknames match page names

Resourceful links The site is informative and resourceful, e.g. it has links to external useful resourcesNo orphan pages The site has no dead-end pages, e.g. it is easy and obvious to go to the home page from

any sub-page of the site. Pages have a clear indication of their position within the siteLogical structure of site The structure of the site is simple and straightforward, related information is grouped

together, categorisation of products is helpful. Architecture is not too deep so that thenumber of clicks to reach goals is not too large

Simple navigation menu Navigation menu is simple and straightforward, the menu choices are ordered logicallyso it is easy to understand the website

Content

Up-to-date information: The information is up to date, current and often updated; the date of the last update is displayed, and the user is informed when new information is added.

Relevant information: The information is sufficient and relevant to user needs, e.g. content is concise and non-repetitive, terminology/terms are clear and unambiguous, and there are no 'under construction' pages.

Accurate information: The information is precise, e.g. product measurements, total price, services, etc.

Grammatical accuracy: Content is free from grammatical errors, e.g. no spelling errors, no grammar errors, and punctuation is accurate.

Information about the company: Basic facts about the company or a company overview are displayed, e.g. year founded, type of business, purpose of its website, etc.

Information about the products: Adequate information about the products is displayed, e.g. description, photographs, availability, prices, etc.

Accessibility and customer service

Easy to find and access website: The site is easily identifiable and accessible from search engines; the URL is domain related, not complex, and easy to remember. Download time of the pages is appropriate.

Contact us information: Useful information to enable easy communication with the company is displayed, e.g. FAQ; contact details (e.g. name, physical address, telephone number, fax number, email details); a customer feedback form to submit customers' comments.

Help/customer service: The help/customer service is easy to find and has a clear and distinct layout; searching for and navigating within help/customer service is easy; the amount of information is sufficient, concise and designed to answer the specific questions users will have in a specific context.

Compatibility: The site works with different browsers and on different monitor resolutions.

Foreign language and currency support: The site's content is displayed in different languages and uses more than one currency.

Design

Aesthetic design: The site is attractive and appealing so that it impresses the potential customer.

Appropriate use of images: The quality of images is adequate, there are no broken images, and images contribute to the understanding and navigation of the site; alternative text is used for images, and image size is appropriate so that it has minimal effect on loading time.

Appropriate choice of fonts and colours: Font types are appropriate and easy to read. The choice of colours for both fonts and background is appropriate, and the combination of background and font colours is appropriate.

Appropriate page design: Pages are uncluttered, headings are clear, page margins are sufficient, and there are minimal or no long pages with excessive white space that force scrolling, particularly on the home page of the website; the page title is appropriate, describing the company name and the contents of each page.


Purchasing process

Easy order process: Registration on the site is easy, changing customer information is easy, logging on to the site is easy, the ordering process is easy, and changing the contents of the shopping cart (adding, deleting or editing) is easy, obvious and accurate.

Ordering information: Complete information about ordering is displayed and can be accessed easily, e.g. how to order, payment options, cancelling an order, return and refund policy, and terms and conditions.

Delivery information: Information about delivery and dispatch of an order is displayed, e.g. delivery times, delivery costs, delivery areas, delivery address options (the ability to deliver the order to another address), delivery options, and problems (e.g. non-delivery, late delivery, incorrect delivery address, etc.).

Order/delivery status provision: The company informs the customer about order status, e.g. by sending a confirmation email to the customer after an order is placed, by sending a dispatch confirmation email to the customer when the order is sent out, and by using an online order tracking system.

Alternative methods of ordering/payment/delivery are available: Alternative methods of ordering (e.g. online, email, phone, fax, etc.), payment (e.g. credit card (Visa, Visa Electron, MasterCard, American Express, etc.), cash on delivery, cheque by post, bank transfer, etc.) and delivery (standard, express, etc.) are supported so that the user can select the method that suits him/her.

Security and privacy: The site uses secure socket layer (SSL) or recognised secure payment methods, and information about the security guarantee and privacy policy is clearly displayed.
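The 'Working links' and 'Security and privacy' heuristics above describe checks that the evaluators performed manually. As a purely editorial illustration, the minimal Python sketch below shows how parts of these checks could be probed automatically; it is not part of the authors' method, and the start URL, the User-Agent string and the ten-link limit are hypothetical choices made for the example.

# Illustrative sketch only (editorial addition): an automated probe for the
# "Working links" and "Security and privacy" heuristics. The start URL and
# the ten-link limit are hypothetical, not part of the study's procedure.
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> element on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def check_links(start_url, limit=10):
    """Fetches a page and reports links that fail to load (broken links)."""
    if not start_url.startswith("https://"):
        print("Note: page is not served over HTTPS (security heuristic).")
    page = urlopen(Request(start_url, headers={"User-Agent": "link-check"}))
    collector = LinkCollector()
    collector.feed(page.read().decode("utf-8", errors="replace"))
    for href in collector.links[:limit]:
        target = urljoin(start_url, href)  # resolve relative links
        try:
            status = urlopen(Request(target, headers={"User-Agent": "link-check"})).status
            print("OK  ", status, target)
        except (HTTPError, URLError, ValueError) as err:
            print("FAIL", err, target)


if __name__ == "__main__":
    check_links("https://www.example.com/")  # hypothetical start URL

In practice a dedicated link checker or crawler would be used; the sketch only illustrates the nature of the check, not a replacement for expert judgement of whether link names match page content.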


Appendix 4. Usability problem themes and sub-themes together with their descriptions

Navigation

Misleading links: The destination page, which was opened by the link, was not expected by users because the link name did not match the content of the destination page.

Links were not obvious: A link was not situated in an obvious enough location on a page to be recognised by users.

Broken links: The site had pages with broken links.

Weak navigation support: A page did not have a navigation menu or links to other pages in the site.

Orphan pages: The site had dead-end pages that did not have any links.

Internal search

Inaccurate results: The results of the internal search were inaccurate.

Limited options: The internal search facility had limited options for searching the site.

Position was not obvious: The internal search facility was not situated in a location obvious enough to be identified by users.

Architecture

Architecture was not simple: The architecture or organisation of the site was not simple or straightforward enough to find information or products.

Order of menu items was not logical: Menu items were not ordered in a logical way. For example, the home page item was not situated at the top of the menu items.

Categorisation of menu items was not logical: Menu items were not categorised in a logical way. For example, three different menu items opened the same page.

Content

Irrelevant content: The content of a page was not clear to users because the page displayed an unclear message, had repetitive content, or had empty content (i.e. the page was under construction).

Inaccurate information: The site displayed inaccurate information. For example, it displayed out-of-stock products or gave an inaccurate description of some products.

Grammatical accuracy problems: The site's content was not free from errors. For example, it had spelling errors or grammatical errors, or its punctuation was inaccurate.

Missing information about the company: Basic facts about the company were not displayed, for example year founded, type of business, purpose of its website, etc.

Missing information about the products: Adequate information about the products was not displayed, such as availability/stock indication, fabric, representative (large) images, length and width of some products, or a size guide.

Design

Misleading image: An image did not function as users expected. For example, it did not have a link when it suggested to users that it had one.

Inappropriate page design: A page did not clearly represent its content, or it had an inappropriate design, such as being long and/or displaying a large number of images, being cluttered, or having inappropriate headings.

Unaesthetic design: The site did not have an aesthetically pleasing or attractive interface.

Inappropriate quality of image: The site had images of poor quality in which the displayed text was not clear/readable.

Missing alternative text: The site did not use alternative text for its images (an illustrative automated check is given at the end of this appendix).

Broken images: The site had some broken images on some pages (i.e. images were not displayed).

Inappropriate choice of fonts and colours: The site used an inappropriate font size (i.e. a small size), an inappropriate font style (i.e. a bold font style for many sentences on the same page), or an inappropriate combination of background and link colours.

Inappropriate page titles: The site's pages had inappropriate page titles that did not describe the content of the pages and did not include the company name.

Purchasing process

Difficulty in knowing what was required for some fields: The site had pages with some entry fields where the information to be entered was not clear to users.

Difficulty in distinguishing between required and non-required fields: The site had pages with some entry fields where there was no clear distinction between required and non-required fields.

Difficulty in knowing what links needed to be clicked: The site had pages with information that could be updated. Links had to be clicked in order to confirm the update, but the links did not reveal that users had to click them to update the information.


Long ordering process: The ordering process included more than one page with similar content, which increased the number of steps required to purchase from the site.

Session problem: The site had a session problem in that it did not save users' information, so users had to enter their information for each transaction during the same session.

Not easy to log on to the site: The site required users to log on using their account number instead of their password, which was not easy to remember.

No confirmation was required if users deleted an item from their shopping cart: The site did not display a warning message to users before deleting an item from their cart.

Long registration page: The registration page had a large number of required fields to be filled in by users.

Compulsory registration: The site required users to register with the site in order to proceed with the checkout process.

Required fields were not logical: The site had pages with some entry fields where the required fields were not logical.

Expected information was not displayed after adding products to cart: The site did not display expected information (i.e. confirmation) after users had added products to their cart.

Security and privacy

Not reflecting confidence in security and privacy: The site did not display either a security guarantee or a privacy policy statement.

Accessibility and customer service

Not easy to find help/customer support information: The site did not display the help/customer service information in a location obvious enough to be noticed and accessed by users.

Not supporting more than one language: The site did not display its content in languages other than English.

Not supporting more than one currency: The site did not display the prices of its products or other charges in currencies other than dollars ($).

Inappropriate information provided within a help section/customer service: Some pages that displayed help/customer service information had inappropriate content that did not match users' needs or expectations.

Not supporting the sending of comments from customers: The site did not have a feedback form to facilitate the sending of comments from users.

Not easy to find and access the site from search engines: The site was not found in the first 10 pages of the search engine's (Google) results.

Inconsistency

Inconsistent page layout or style/colours/terminology/content: The site's design, layout or content was inconsistent throughout the site. For example, the content of the Arabic and English interfaces was inconsistent.

Missing capabilities

Missing functions/information: The site did not have some functions (i.e. an internal search facility) or did not display adequate information.
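Two of the design sub-areas above, 'Missing alternative text' and 'Inappropriate page titles', lend themselves to simple automated inspection. The sketch below is an editorial illustration only, assuming pages are parsed with Python's standard html.parser; the sample HTML and the 'ACME' company name are hypothetical.

# Illustrative sketch only (editorial addition): flags <img> elements without
# alt text and inspects the page <title>, for the "Missing alternative text"
# and "Inappropriate page titles" sub-areas.
from html.parser import HTMLParser


class AltAndTitleChecker(HTMLParser):
    """Counts images lacking alt text and records the page title."""

    def __init__(self):
        super().__init__()
        self.images_missing_alt = 0
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Treats an empty alt attribute as missing -- a simplification, since
        # empty alt text is legitimate for purely decorative images.
        if tag == "img" and not attrs.get("alt"):
            self.images_missing_alt += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical sample page; a real evaluation would feed the live HTML.
sample = "<html><head><title>Products</title></head><body><img src='a.jpg'></body></html>"
checker = AltAndTitleChecker()
checker.feed(sample)
print("Images without alt text:", checker.images_missing_alt)
print("Page title:", checker.title or "(missing)")
print("Title mentions company name:", "ACME" in checker.title)  # 'ACME' is a placeholder

Such a script can only flag candidate pages; judging whether a title adequately describes the page content, as the sub-area requires, remains a manual step.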


Appendix 5. Distribution of specific usability problems identified by user testing and heuristic evaluation methods by the number of problems and severity level

Each problem below is followed by seven counts, in this order: user testing (minor problems, major problems); heuristic evaluation (minor problems, major problems); common problems with agreed severity (minor problems, major problems); common problems whose severity was not agreed. (A short sketch showing how these columns can be tallied follows the table.)

Navigation problems
Misleading link: 0, 5, 14, 0, 1, 0, 2
Link was not obvious: 1, 2, 13, 1, 0, 2, 6
Weak navigation support: 1, 1, 0, 5, 1, 1, 0
Broken link: 0, 0, 3, 0, 3, 0, 0
Orphan pages: 0, 0, 7, 0, 1, 0, 0

Internal search problems
Inaccurate results: 0, 0, 1, 0, 0, 2, 0
Limited options: 0, 0, 0, 0, 2, 0, 0
Not obvious position: 0, 0, 1, 0, 0, 0, 0

Architecture problems
Architecture was complicated: 0, 0, 0, 0, 0, 1, 0
Illogical order of menu items: 0, 0, 2, 0, 0, 0, 0
Illogical categorisation of menu items: 0, 0, 1, 0, 0, 0, 0

Content problems
Irrelevant content: 0, 0, 16, 1, 0, 1, 2
Inaccurate information: 0, 0, 0, 1, 0, 0, 2
Grammatical accuracy problems: 0, 0, 2, 0, 0, 0, 0
Missing information about the products: 0, 0, 10, 0, 0, 0, 3
Missing information about the company: 0, 0, 2, 0, 0, 0, 0

Design problems
Misleading images: 0, 0, 5, 0, 1, 0, 1
Inappropriate page design: 0, 2, 8, 0, 0, 1, 2
Unaesthetic design: 0, 0, 3, 0, 0, 0, 0
Inappropriate quality of image: 0, 0, 1, 0, 0, 0, 0
Missing alternative text: 0, 0, 4, 0, 0, 0, 0
Broken images: 0, 0, 10, 0, 0, 0, 0
Inappropriate choice of fonts and colours: 0, 0, 4, 0, 1, 0, 0
Inappropriate page titles: 0, 0, 3, 0, 0, 0, 0

Purchasing process problems
Difficulty in knowing what was required for some fields: 1, 0, 0, 0, 0, 0, 1
Difficulty in distinguishing between required and non-required fields: 0, 3, 0, 0, 0, 0, 0
Difficulty in knowing what links needed to be clicked: 0, 3, 0, 0, 0, 0, 0
Long ordering process: 0, 0, 0, 0, 0, 0, 1
Session problem: 0, 0, 0, 0, 0, 1, 0
Not easy to log on to the site: 0, 0, 0, 1, 0, 0, 0
No confirmation was required if users deleted an item from their shopping cart: 0, 0, 0, 3, 0, 0, 0
Long registration page: 0, 0, 0, 1, 0, 0, 0
Compulsory registration: 0, 0, 0, 2, 0, 0, 0
Illogical required fields: 0, 0, 0, 0, 0, 0, 2
Expected information was not displayed after adding products to cart: 1, 1, 0, 0, 0, 0, 0

Security and privacy problems
Lack of confidence in security and privacy: 0, 0, 0, 1, 0, 0, 0

Accessibility and customer service problems
Not easy to find help/customer support information: 0, 3, 1, 0, 0, 0, 0
Not supporting more than one language: 0, 0, 0, 0, 2, 0, 0
Not supporting more than one currency: 0, 0, 2, 0, 0, 0, 0
Inappropriate information provided within a help section/customer service: 1, 0, 1, 0, 0, 0, 0


Not supporting the sending of comments from customers: 0, 0, 2, 0, 0, 0, 0
Not easy to find and access the site from search engines: 0, 0, 2, 0, 0, 0, 0

Inconsistency problems
Inconsistent page layout or style/colours/terminology/content: 0, 0, 21, 0, 1, 0, 0

Missing capabilities
Missing functions/information: 0, 0, 19, 0, 1, 0, 0
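As a purely editorial illustration of how the seven columns above can be read, the following minimal Python sketch tallies the counts for two rows copied from the table; the dictionary layout is an assumption made for the example, not the authors' data format.

# Illustrative sketch only (editorial addition): tallying the Appendix 5 columns.
# Column order per row: (user testing minor, user testing major,
#                        heuristic evaluation minor, heuristic evaluation major,
#                        common minor, common major, severity not agreed)
rows = {
    "Misleading link": (0, 5, 14, 0, 1, 0, 2),       # copied from the table above
    "Link was not obvious": (1, 2, 13, 1, 0, 2, 6),  # copied from the table above
}

labels = ["UT minor", "UT major", "HE minor", "HE major",
          "common minor", "common major", "not agreed"]

totals = [0] * len(labels)
for counts in rows.values():
    for i, value in enumerate(counts):
        totals[i] += value

for label, total in zip(labels, totals):
    print(f"{label}: {total}")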

