‘Televaluation’ of clinical information systems: an ...people.dbmi.columbia.edu/cimino/Publications/2001... · ‘Televaluation’ of clinical information systems: an integrative

International Journal of Medical Informatics 61 (2001) 45–70

‘Televaluation’ of clinical information systems: anintegrative approach to assessing Web-based systems

Andre W. Kushniruk a,b,*, Camille Patel b, Vimla L. Patel b, James J. Cimino c

a Department of Mathematics and Statistics, York Uni6ersity, 4700 Keele Street, Toronto, Ontario, Canada M3J 1P3b Cogniti6e Studies in Medicine: Centre for Medical Education, McGill Uni6ersity, Montreal, Quebec, Canada

c Department of Medical Informatics, Columbia Uni6ersity, New York, USA

Received 7 July 2000; received in revised form 1 November 2000; accepted 4 December 2000

Abstract

The World Wide Web provides an unprecedented opportunity for widespread access to health-care applications byboth patients and providers. The development of new methods for assessing the effectiveness and usability of thesesystems is becoming a critical issue. This paper describes the distance evaluation (i.e. ‘televaluation’) of emergingWeb-based information technologies. In health informatics evaluation, there is a need for application of new ideasand methods from the fields of cognitive science and usability engineering. A framework is presented for conductingevaluations of health-care information technologies that integrates a number of methods, ranging from deploymentof on-line questionnaires (and Web-based forms) to remote video-based usability testing of user interactions withclinical information systems. Examples illustrating application of these techniques are presented for the assessment ofa patient clinical information system (PatCIS), as well as an evaluation of use of Web-based clinical guidelines. Issuesin designing, prototyping and iteratively refining evaluation components are discussed, along with description of a‘virtual’ usability laboratory. © 2001 Elsevier Science Ireland Ltd. All rights reserved.

Keywords: Cognitive evaluation; Patient clinical information systems; Technology assessment; Telemedicine; World Wide Web

www.elsevier.com/locate/ijmedinf

1. Introduction

With the increasingly widespread use of theWorld Wide Web for health applications, thedevelopment of effective methods for evaluat-ing their effectiveness and usability is becom-

ing critical. Designing evaluations in healthinformatics can be difficult but becomes evenmore challenging when it involves evaluationfrom a distance. Given the increasingly dis-tributed nature of current systems, this mayinclude assessment of information resourcesaccessed from various locations ranging fromthe clinic to the home. In this paper, wedescribe a framework for designing evalua-tions directed at assessing use of Web-based

* Corresponding author. Tel.: +1-416-7362100 ext. 33980;fax: +1-416-7365757.

E-mail address: [email protected] (A.W. Kush-niruk).

1386-5056/01/$ - see front matter © 2001 Elsevier Science Ireland Ltd. All rights reserved.

PII: S1 386 -5056 (00 )00133 -7

A.W. Kushniruk et al. / International Journal of Medical Informatics 61 (2001) 45–7046

information systems by both patients andproviders. Illustrations will be provided in thecontext of the assessment of several systems,including PatCIS, a patient clinical informa-tion system (developed at Columbia Univer-sity) as well as an assessment of the use ofWeb-based clinical guidelines by physicians.

The effects of changes being brought aboutby emerging technology, including Internet-based information resources for patients,must be considered in relation to patientunderstanding and provider therapeuticgoals. In order to develop individualized,context-sensitive information systems thatwill end up being useful for physicians andpatients, we must be able to evaluate howthat information is understood, who is tryingto understand it, and what problems occur inits comprehension and application. We havebeen involved in cognitive studies of reason-ing and comprehension of medical informa-tion by both lay people and health-careprofessionals. Currently, this work has beenextended to evaluation of the use of Web-based information resources to assess theirusability and impact on health care. We areapplying approaches to evaluation that bor-row from advances in a number of areas,including cognitive science and the emergingfield of usability engineering [1]. The objec-tive is to develop a coherent framework forunderstanding the effects of advances inhealth-care information technology, in partic-ular telemedicine.

The evaluation of Internet-based technolo-gies, especially those designed for use by awide range of users (i.e. both health profes-sionals and lay people), raises a number ofissues that go beyond conventional evalua-tions typically undertaken in medical infor-matics. These include: (1) determining theextent to which the end user’s view of thesystem differs from that of the designers andhow potential ‘mismatches’ influence system

effectiveness, especially as the range of usersbecomes increasingly varied; (2) going be-yond assessment of user satisfaction to con-sideration of how the user’s interaction withthe system changes over time and how use ofsuch technology changes the interaction be-tween patients and health-care professionals;(3) characterizing the impact of systems onreasoning and thought processes as a resultof continued use; and (4) identifying technicaland methodological issues for performingevaluations remotely.

A number of studies have investigated theuse of Internet-based information systems byhealth-care providers and described develop-ment of evaluation instruments (typicallyquestionnaires) for assessing their use of in-formation technology [2,3]. However, an in-depth understanding of the effects of use ofthe WWW in providing providers and pa-tients with access to their medical data neces-sitates new approaches to evaluation. Ashealth care rapidly moves towards wide-spread distribution of medical informationvia the WWW, such an understanding willbecome increasingly important. Over the pastdecade, we have worked on developing anintegrated set of methods for evaluating theuse and impact of information systems inhealth care [4]. This has included develop-ment of techniques for laboratory study ofend users (in particular, video-based usabilitytesting) that we are extending to the natural-istic study of technology use in settings suchas clinics and doctor–patient interviews. Ourwork in the evaluation of systems has beenmotivated by a number of considerations in-cluding the observation that the majority ofevaluations in medical informatics have fo-cused around measurement of outcomes (e.g.the effect of the use of a computer system onnumber of pre-specified outcome measures)with far less work being conducted on exam-ining the effects of systems on the critical

A.W. Kushniruk et al. / International Journal of Medical Informatics 61 (2001) 45–70 47

processes involved in health-care decision-making and reasoning. In addition, conven-tional assessment methods (such as exclusiveuse of questionnaires, or feedback forms forWeb-based applications) have a number oflimitations, including the reliance on users’recall of their previous experiences in usingsystems. It has been our experience that newapproaches to evaluation are required to un-derstand more fully the impact of the WWWas a new medium for access to health-careinformation. To address these issues, we haveconducted a number of system evaluations inhealth care based on principles of cognitivetask analysis.

Cognitive task analysis aims to character-ize the decision-making of subjects of varyinglevels of expertise as they perform tasks in-volving information processing, for example,a physician entering a diagnosis into a com-puterized patient record system, or a patientinterpreting information provided by ahealth-related Web site. In our laboratory-based studies, we have video- and audio-taped users’ interactions with a variety ofcomputer systems. The approach typically in-volves having subjects (e.g. patients or physi-cians) ‘think aloud’ as they interact with thesystem under study. The studies typically in-volve video recording of all user interactions(including all computer screens) and codingof the resulting video- and audio recordingsusing methods that we have adapted fromboth the study of medical reasoning and thestudy of human–computer interaction [4].Based on this approach, we have been able tocharacterize the effects of use of health infor-mation systems on physicians’ decision-mak-ing and interaction with patients. Forexample, we have found that small changes inthe structure of information presented in in-formation systems can strongly guide andalter physicians’ basic reasoning and decisionmaking strategies, as well as their requests for

information when interviewing patients [5].Related work we have conducted has in-volved a study of lay reasoning of patients asthey interact with information technologiesover the WWW. This has included a combi-nation of methods, including video record-ings of user–system interactions andfollow-up interviews with end users.

In this paper, we describe our recent exten-sion of this line of work to the developmentof assessment methods for evaluating health-care systems at a distance, i.e. ‘televaluation’of emerging health-care information tech-nologies. It is argued that as the technologywe assess changes (invariably becoming moreadvanced), so must our methods for trackinguse of these systems in order to assess theirimpact. In addition, we feel that it is criticalto evaluate technology in the user’s real-lifesetting, in the context of other competingcognitive tasks and processes, in order toassess its full impact on decision-making andreasoning. The overall approach described inthis paper attempts to integrate the advan-tages of a number of techniques, includingstrategic scripting and presentation of on-linequestionnaires and Web-based forms to as-sess use of systems as close to the point ofcare as possible. In addition, we have ex-tended our video-based usability testing andrecording of end users to application acrossthe WWW, with the objective of movingusability engineering and assessment from thelaboratory to the real-world.

2. Evaluation objectives

In this paper, we describe work on thedevelopment and integration of evaluationinstruments specifically designed for assessinginteraction with Web-based information sys-tems in health care. We will also describe aworking model for conducting ‘televalua-


tions’ from a remote site. As in conductingother types of evaluations in medical infor-matics [6], the first step involves determiningthe objectives of the evaluation. Much of ourcurrent work focuses around evaluation ofeffects of Web-based clinical information sys-tems designed to provide access to medicaldata and advice (based on guidelines) to bothpatients and providers. Specific questions weseek to answer regarding our investigation ofthese types of Web-based clinical informationsystems include the following:� How do providers’ and patients’ percep-

tions of use of technology change overtime as they begin to interact and useWeb-based information systems?

� How does previous experience in usingcomputers and expectations about usinginformation technology affect actual useof such systems over time?

� Is the content and functionality providedby the system of value to users for theirdecision-making and provision of im-proved health care?

� What problems do providers and patientshave in comprehending, understandingand applying information provided on-lineover the WWW?

� How does the use of such technology af-fect the patients’ interaction with theirhealth-care providers?

� How does the use of information systemsaffect what patients do regarding dailymanagement of their condition, in terms ofreasoning, decisions and actions?

� What are the current limitations of Web-based information systems in providinguser-specific information and how canthey be improved from the point of viewof usability (i.e. their efficiency, effective-ness and enjoyability)?To address these questions, we have been

involved in refining a number of evaluationinstruments for obtaining baseline user profi-

les and then specifically addressing the issuesraised above. This has also involved workingwith system design teams in refining systemcontent, based on feedback from our evalua-tions in an iterative design cycle (driven byour formative evaluations). For example, wehave iteratively modified the user interface ofpatient record systems (during the systemdesign phase) based on the results of usabilitytesting involving analysis of real user–systeminteractions. We have recently extended thisto development of on-line questionnaires andforms scripted to appear at the point of usein order to capture context-sensitive informa-tion about the use of systems, which can beused in refining and improving these systems.The complexity of designing evaluations thataddress the above issues requires an inte-grated approach to both data collection andanalysis. By emphasizing a methodologicalapproach that continually takes advantage ofthe latest technological advances and a the-ory-based approach from cognitive science,evaluations can be designed that go beyondcollection of conventional usage statistics andquestionnaires.

Cognitive science provides theoreticalframeworks from which our methods in thestudy in medicine have been drawn [7]. Cog-nitive science is an interdisciplinary endeavorthat builds on several areas of research, in-cluding cognitive psychology, computer sci-ence, linguistics, anthropology and artificialintelligence. Our initial work in applying cog-nitive science to the study of medicine fo-cussed on characterizing the nature ofexpertise in medical reasoning and decision-making and has included an examination oftext comprehension and the representation ofknowledge by health-care workers of varyinglevels of expertise [8]. We have extended thiswork to the study of reasoning by patients [9]as well as extending the methodological ap-proach taken to examination of the effects of


use of information technologies on health-care decision making and reasoning. Methodsthat we have borrowed from cognitive psy-chology and incorporate in much of ourwork include a collection of ‘think aloud’protocols from subjects as they interact withsystems to solve complex tasks. We haveemployed a number of methods for analyzingthis type of data (consisting of both audioand video recordings) using principled ap-proaches to the analysis of process data fromcognitive psychology and also the study ofhuman discourse [10]. In borrowing fromcognitive science, we emphasize a study ofthe processes involved in using systems, tocomplement more traditional outcome basedmeasures and results from questionnairebased data. Such an integrated approach willbe needed to address the critical issues inevaluation raised above.

3. Methodological components

The evaluations that we have developedand are currently employing involve severalmethods of on-line data collection (e.g.recording of patients’ usage of Web-basedsystems accessed from home), telephone in-terviews with users and in-depth video analy-sis of subjects using the system under study[6]. We feel that a multi-method approach isoften needed in order to adequately addressthe multiple objectives typical of our evalua-tions. From our prior evaluation experiences,we have found that individual methods alonecan provide valuable information (e.g. log-ging of user interactions, questionnaires, orinterviews, etc.), but in order to gain anin-depth understanding underlying the use ofa system, more than one complementarymethod may be required [11]. Fig. 1 depictsan overview of the evaluation methods thatwe are using, and these are described below.

3.1. On-line questionnaires in the study ofthe use of Web-based systems

For assessing the usability of Web-basedsystems, deployment of on-line question-naires has been one of the most widely usedapproaches in evaluation of system use.However, questionnaires have a number oflimitations, including their reliance on users’recall of their interaction with a system whenpresented with questionnaires after a dura-tion of time, and the low usage of ‘voluntary’feedback forms, designed to provide Web-sitedevelopers with suggestions and comments.In this section, we describe the work that wehave done in developing and extending appli-cation of on-line questionnaires and on-lineWeb-based forms to overcome some of theselimitations for two objectives: (1) to collectbaseline information about usage of systems,and then to present follow-up questionnaires,timed to appear over the Web over at keypoints during the period of an evaluation(e.g. after a specified number of logins, orinvocations of a system function by a user);and (2) to design and deploy short forms/questionnaires to collect very specific usageinformation (timed to appear as close to thetime of use of a system as possible, e.g. at

Fig. 1. ‘Televaluation’ of Web-based information sys-tems, depicting evaluation methods employed for dis-tance evaluation of Web-based information systems.


invocation of on-line guidelines at the pointof care).

As an example, for our study of the use ofPatCIS (a patient information system avail-able over the WWW that allows patients toview and enter their own patient data overthe Web [12]), several on-line questionnaireswere developed using HTML (the standardhypertext markup language for developingWeb applications). PatCIS was designed tohandle presentation of questionnaires to sub-jects at login by invoking a function thatdetermines if the user (i.e. the patient) hasany outstanding questionnaires to fill out(which are presented to the user after loginbut prior to entering PatCIS, as described inRef. [12]). The baseline questionnaire scales(presented to all users on their initial login)focus on assessing several major aspects ofpatient care and patient interaction with clin-ical information systems (see Fig. 2, for ex-amples of the scales described below),including the following:� User Demographics: Items regarding de-

mographics of users of the system understudy include standard questions abouttheir age, sex, and educational back-ground. This questionnaire is typicallyfilled out at the beginning of the evalua-tion (upon first login by the user to thesystem being evaluated) to obtain basicdemographic information.

� Relationship with Health Care Providers:This questionnaire contains items to assesspatients’ interaction with health-care pro-fessionals, including who they interactwith and how often. Subsequent to aninitial baseline presentation of this ques-tionnaire, patients beginning to use an in-formation system may also be askedwhether they perceive their interactionwith health-care professionals as changing(using a five-point scale ranging from ‘nochange’ to ‘considerable change’).

� Expectations About System Use: The ob-jective of this questionnaire is to assessusers’ expectations about the computersystem under study, prior to actually inter-acting with the system. For example, thistype of questionnaire was presented to pa-tient users at the beginning of our study ofPatCIS to assess the extent to which pa-tients believed that use of the systemwould affect how they manage their illness(on a five-point scale, ranging from ‘con-siderable change’ to ‘no change’), as wellas how willing they were to use the system.A modified version of the questionnairewas presented to subjects later in the studyto determine how subjects’ expectationscompare to their perceived experience. Inaddition, we are investigating to what ex-tent the results of this questionnaire canpredict (along with the results of the ques-tionnaire on computer literacy, describedbelow) how subjects may fare in interact-ing with innovative health-caretechnology.

� Prior Computer Experience: The accurateassessment of prior computer experience isan important aspect in understandingusers’ interactions with innovative infor-mation systems. Questions regarding prioruse of computers are typically given at thebeginning of our evaluations to determinethe subject’s level of computer literacy.For example, in our study of PatCIS,analyses were conducted to see how pa-tients’ level of prior computer experiencepredicted the use and problems encoun-tered in learning and using patient clinicalinformation systems. Scales were designedto assess how often the patient uses acomputer, when he/she first started using acomputer, and what type of computer sys-tems (e.g. IBM-compatible systems run-ning Windows, or Mac) and programs(e.g. word processing, email, etc.) are typi-


Fig. 2. Excerpt from background questionnaire to assess users’ computer experience and demographics.

cally used. The questionnaire also containsan item asking patients to indicate howcomputer literate they perceive themselveswith regard to computer use (on a five-point scale). This questionnaire is filled outonce at the beginning of the evaluation, to

obtain baseline data on patients’ computerexperience and literacy (a portion of thisquestionnaire is given in Fig. 2).In addition to providing questionnaires

over the Web to obtain baseline informationabout users at the start of a study, we have


also deployed questionnaires to users at dif-ferent key points throughout the course ofstudy of their interactions with health-caresystems. For example, in our evaluation ofPatCIS, approximately 9 months from thestart of the study, we emailed all subjectswho had signed up to participate a question-naire (Fig. 3 illustrates several of the scales inthis questionnaire) regarding their use of thesystem. Email was employed in this study assome of the subjects were no longer regularusers of the system, but all could be reachedby email. As will be described, the resultsfrom this type of questionnaire can provideinsight into explaining the pattern of usagethat may be obtained from the logging ofuser interactions. The questionnaire items in-clude the following:� Use of System Functions: Subjects are

asked to indicate which functions theyused and which they had not (along with atext entry box for them to indicate anycomments). In addition, subjects wereasked to indicate how often they used thesystem and from what locations.

� Usability Problems: The intent of thisquestionnaire is to obtain informationabout problems users may be encounteringin interacting with an information system(i.e. problems in performing tasks that thesystem is designed to accomplish and gen-eral usability issues). The questions arebased on standard usability scales for as-sessing user interfaces [13] and also con-tain text-entry boxes where subjects canenter their responses to the following ques-tions: What features do you like? Whatfeatures don’t you like? What new featureswould you like? Were the reminders, alertsand guidelines provided helpful? Whatother types of on-line resources do youthink would be useful? Users are also ableto enter text-entry boxes to elaborate onany of the features and issues raised by the

questions. Our prior work has indicatedthat questionnaire data on usability alonemay not provide sufficient information fordetermining how specifically to improvesystem and interface design [6]. Therefore,the results of this type of questionnaireshould ideally also be complemented byother assessment instruments described be-low. In our current studies, we typicallyemail the questionnaire to all participantsin a study, including those who do not endup adopting use of a system, in whichcase, it is important to obtain informationabout why long-term use of a system is notadopted.In our most recent work, we are using a

variation of on-line questionnaires by pre-senting users with short Web-based formsthat are triggered to appear at the time ofspecific interactions with systems. The objec-tive is to assess the usage of system featuresand to assess how successful information sys-tems are in meeting user needs in the contextof specific uses of the system. For example, aswill be described in detail in Section 4 onexamples of our evaluations, we are currentlycollecting data on physicians’ use of on-lineclinical guidelines at the point of care. As abaseline, we are collecting usage statistics (us-ing a tracking system we have developed thatis described below, known as CHECK-POINT) on which guidelines are accessedand how often. While this is invaluable infor-mation, it does not tell us if the physiciansare finding the guidelines useful, or whethersuch guidelines are applicable to their partic-ular clinical situation. To address this, wehave also developed brief questionnaires (i.e.deployed as Web-based forms) that appear atthe time a user (e.g. a physician or patient)invokes a particular feature of a system overthe Web (these forms are scripted to appearusing a Perl program that triggers their invo-cation). As an example (described in detail


below), we wished to determine in whichsituational contexts physicians access clinical

guidelines from a particular site, i.e. guideli-nes from the American College of Physi-

Fig. 3. Excerpt from questionnaire about usability of PatCIS (Patient Clinical Information System).


cians–American Society for InternalMedicine (ACP–ASM). When physicians se-lect ACP–ASM guidelines to view over theWeb, a brief questionnaire immediately ap-pears prompting them to select (from amenu) their location and context of systemuse (e.g. currently with a patient or not).After submitting the form, they are then redi-rected to the guideline. As will be describedlater, we are currently using this approach tocollect detailed information about particularuser interactions (correlated with other infor-mation such as records of browsing) thatwould be otherwise impossible to obtain orinfer from usage logs.

3.2. On-line commenting facility

In our evaluation of systems such as Pat-CIS, end users (e.g. patients or physicians)are able to email any comments or concernsto the evaluators, and email is used by theevaluators to send messages to any patientsubjects. Typically, subjects, at any timewhile using the system, can evoke a ‘mail-to’function to directly send comments to theevaluators, as they occur to the subjects whileusing the system. Although subjects may beencouraged to send this type of comment, itsuse is typically infrequent (as is the filling outof voluntary ‘feedback forms’, which areavailable at most web sites). Thus, users alsoneed to be probed for their comments (usingquestionnaires and interviews) as well as hav-ing their interactions tracked, as describedbelow.

3.3. Tracking system use

In our studies of end-user interactions withhealth-care systems, we typically create logfiles capturing information about usage ofsystem features that are automaticallyrecorded for all subjects’ interactions with the

system under study. The information includesa record of functions accessed by the patient,buttons pressed, and time spent in each func-tion. Previous evaluation of Web-based clini-cal information systems (designed for use byphysicians) has shown that for the purposesof obtaining feedback for system improve-ment, such an analysis of log files can providea rich source of data, particularly in deter-mining which functions of a system may ormay not be getting accessed by end users, asdescribed by Cimino and Socratous [11]. Inaddition to constituting valuable data on howthe system is being used, the log files are alsoused by the system for automatic selectionand presentation of function-specific ques-tionnaires and forms. For example, subjectswhose log indicates that they are frequentusers of a particular PatCIS function, such asits educational component, will be presentedwith on-line questionnaires for assessing useof that function in more detail.

In our recent work at McGill University,we have developed a system known asCHECKPOINT, which allows us to trackusers interactions with systems and Web sitesthat are located at remote locations. Forexample, using CHECKPOINT in Montreal,we are currently tracking use of on-lineguidelines (located on a server in Philadel-phia), which are being accessed by physiciansin New York City. To accomplish this, users’selection of web pages located at the remote(target site) are redirected to our server inMontreal, so that we can obtain a completelog of the users’ interactions with the site inPhiladelphia (i.e. a record of pages accessedand length of browsing). CHECKPOINTacts as what is known as a gateway, whichmeans that it works as an intermediary be-tween the Web browser and the Web server.If every link that is followed by a user firstgoes through CHECKPOINT, it can keep a


Fig. 4. Framework for tracking Web browsing remotely. Numbers designate the order of the processing steps intracking a user’s browsing.

log file of every site visited as well as a dateand timestamp for each.

For example, as shown in Fig. 4, to startCHECKPOINT, an initial link must point toit. For example, as shown in the figure, whenthe user accesses a page with the address:http://www.somewhere.com/, CHECK-POINT is called using the syntax: http://www.checkpoint.com/checkpoint.cgi?url =www.somewhere.com (steps 1 and 2 in Fig. 4)where www.checkpoint.com is the location ofCHECKPOINT. At this point, CHECK-POINT proceeds to request the page the userwants from www.somewhere.com (step 3 inFig. 4). It then takes the HTML source codefrom this page and locates all the hyperlinkswithin the page. Once located, it changes allthe links to the above form (i.e. redirected

through CHECKPOINT) to ensure that thenext link followed also passes throughCHECKPOINT. At this point, the user’sWeb browser is still waiting for the page tobe returned and CHECKPOINT returns themodified HTML source code ofwww.somewhere.com (steps 5 and 6 in Fig.4), which should be indistinguishable fromthe actual page, except that the addresses ofthe links are different.

CHECKPOINT, which is written in Perl,has been designed to be extensible, and weare currently using it in conjunction withother methods, described below, includingpresentation and scripting of on-line forms(also scripted in Perl), which appear at thepoint of use (e.g. at the time physicians actu-ally choose to access guidelines over the


Web). Our use of a remote tracking systemcan be distinguished from other tracking sys-tems (e.g. Lamprey, developed at Stanford[14]) by its integration and merging of outputlogs (from the tracking system) with resultsfrom other data collection methods in orderto obtain a more complete and integratedpicture of user interactions, which goes be-yond recording of only user mouse clicks andpages browsed. For example, the logs fromtracking users interacting with on-line infor-mation resources can be timestamped andwritten to a file interleaved with the question-naire results from the presentation of Web-based forms at point of use (designed toprobe the user about their interactions withthe resource). Thus, a more complete recordof the context of a user’s comments can beobtained in terms of the interactions andbrowsing pattern that took place around thetime the user fills out a Web-based form. Wehave found that the server that runsCHECKPOINT can allow for a large num-ber of concurrent users and that the system isreliable (with little down-time).

3.4. Telephone inter6iews

In a number of our studies we have alsoconducted periodic telephone interviews withsubjects regarding their use of the system,usability issues, cognitive and lifestyle issuesand suggestions for improvements or refine-ments of the system. The approach taken isthat of a semi-structured interview (with in-terview probes being designed prior to phon-ing end users), which may be conducted atseveral points in the evaluation study. Wetypically audio record the interviews (whichlast from 15 to 30 min) and transcribe themfor analysis, which is based on methods de-scribed by Patel and colleagues, for a princi-pled analysis of discourse from telephoneinterviews [15]. For example, in our study of

PatCIS, we asked patient users of the systemto elaborate on their experience with the sys-tem, including discussing if they had encoun-tered problems using the system or hadsuggestions for improvements. During the in-terviews, the subjects were encouraged toelaborate on their thoughts regarding theirexperiences with the system and the inter-views were transcribed and coded to obtaindetails from the users’ perspective on prob-lems or concerns. These interviews have beenextremely useful in shedding light on usagepatterns that were obtained from pure track-ing of user interactions, as will be describedin a subsequent section.

3.5. Video-based usability testing

For a subset of subjects who volunteer tobe recorded and are representative of typicalend users of the system, we also typicallyemploy in-depth cognitive usability testingmethods to assess user interactions with sys-tems (taking place in a ‘laboratory’ settingwhere patients are asked to interact with asystem). This involves asking subjects to in-teract with the system under study (e.g. Pat-CIS) to perform typical tasks (e.g. data entryor review, access of information). Subjectsare instructed to ‘think aloud’ while doing so,and complete video recordings are made ofcomputer screens, as well as audio recordingof their verbalizations. This allows re-searchers to obtain a complete record of theprocess of the use of the computer systemunder study. In our original work, therecordings were made using a PC–Video con-verter, with the video input to the VCRfeeding in from the converter (i.e. the com-puter screens) and the audio input to theVCR feeding in from a microphone for pick-ing up the subjects’ verbalizations and com-ments as they interact with the system understudy [6].


Using computer-aided methods for videoanalysis that we have developed and refined,problems encountered by patients while theyinteract with the system can be identifiedfrom such data [6] [10]. The approach takento analyzing the video data involves the fol-lowing: (1) transcribing all audio recordedverbalizations into a word-processing format;(2) linking the transcriptions in the word-pro-cessing file to the corresponding video se-quences using a video annotation softwarepackage (and cable to connect the computerto the VCR) known as CVideo, which allowstranscripts on a computer (opened in a word-processing file) to be linked to the corre-sponding frames on the video of theinteraction; and (3) coding the correspondingsections of the transcriptions by ‘timestamp-ing’ events on the video tape (e.g. the useropening of a window, or selecting from amenu, or a specific user problem, such as aninability to enter data in a field) with thecorresponding transcriptions of subjects’ ver-balizations. The annotated transcripts can beused to characterize user interactions in de-tail, including tabulation of the type andfrequency of user problems coded for, andidentified from, watching the video recordingof the user’s interactions [16].

We have recently extended our video-basedusability testing from the laboratory settingto application over the WWW from real set-tings. This has involved remotely videorecording users interacting with medical ap-plications at distant sites, through use ofNetMeeting, a collaborative Web tool thatallows for remote login and meeting capabili-ties. By logging on to user applications atremote locations using NetMeeting, at ourevaluation site in Montreal, we can observethe computer screens of subjects interactingwith information systems at a distance (thebandwidth requirements are low, since we donot employ all features of NetMeeting —

only the capability of recording screens re-motely). By concurrently converting the com-puter screens to a movie format, usingsoftware such as Lotus ScreenCam, we areable to obtain a complete on-line recordingof the users’ interactions at the remote site inreal time. The corresponding audio record-ings of subjects’ verbalizations (e.g. subjects‘thinking aloud’ while interacting with a sys-tem, or discourse involved in collaborativeprojects) can be obtained using a microphoneplugged into the user’s computer (or alter-atively by using a speaker phone to hear thesubject’s verbalizations and a microphone atthe evaluation site to merge the audio withthe video recording of the computer screens).Instructions may also be given to subjectsusing the phone system, or on an audio chan-nel using the computer, allowing completeusability testing to be conducted remotely.

In order to capture not only computerscreens at the remote site, but also behavioralactions (e.g. the end user shifting in his chairor visibly concentrating on the screen), adigital camera mounted on the computer atthe end user’s site can also be employed tocapture these actions over the Web. To illus-trate our use of these methods, an example isprovided in the next section of a usabilitystudy that we conducted to examine the pro-cesses involved in use of a computer-sup-ported tools for encoding clinical guidelinesinto a computer-based representation (therecording was conducted at McGill, whilesubjects interacted with the system at theHarvard site in Boston).

4. Examples of televaluation

In this section of the paper, we provideexamples of the methods and approaches de-scribed above. The first study involves thedistance evaluation of a patient clinical infor-


mation system, known as PatCIS. This sys-tem, which was developed at Columbia Uni-versity, provides patients with customizedviews of their own medical records via theWWW. PatCIS allows patients with chronicillnesses, such as diabetes and asthma, toenter their own health data and receive ad-vice about the management of their illnessfrom home [12]. The second example focuseson the evaluation of the use and developmentof computer-based clinical guidelines thatphysicians can access over the WWW. Bothexamples illustrate a number of the methodsand techniques described above in the previ-ous section on methodology.

4.1. Example 1: remote e6aluation of apatient clinical information system (PatCIS)

For our initial study of PatCIS, recruit-ment letters were mailed to over 200 physi-cians. Permission forms were returned by 11physicians, who suggested 11 patients thatmight be interested in participating over theyear. Letters were sent to these patients,telling them how to register via the WorldWide Web. Eight of these patients respondedand were enrolled. The interactions of thesephysicians were studied in depth over a pe-riod of a year, using several methods de-scribed below.

4.1.1. System usageIn order to assess how often and what

components of PatCIS were accessed by thepatient subjects, monthly logs of all patientinteractions were automatically generated, asdescribed in Section 3.3. The patients whoparticipated in using PatCIS logged on atotal of 243 times during the course of thestudy. The first patient session was in April1999; the pilot phase was continued throughthe end of February 2000 with a total of 31patient months of use.

Of the 243 total login attempts, 33 initiallyfailed due to incorrect passwords or SecurIDcodes. There were no attempts to login withan illegal ID, and all initially unsuccessfulattempts were immediately followed by a suc-cessful login. Fourteen sessions involved asuccessful login, but no subsequent activity.Over the period of the study, interactionswith the system remained relatively dis-tributed among the eight subjects.

For security reasons, a ‘Logout’ function isincluded to allow patients to use PatCIS inpublic places (such as a library) and end theirsession without closing the browser applica-tion. They used this function 122 times anddid not use it 74 times.

4.1.2. Function usageThe most frequently used function was the

review of laboratory data, which was done atleast once in 140 (71%) of the sessions. The‘Laboratory’ sub-button shows a list of pan-els (CBC, Chem7, etc.) and allows the user toselect a panel for detailed display. Users se-lected this function 270 times and examineddetails 340 times. The ‘Laboratory Detail’sub-button shows all the panel details as asingle list, rather than requiring the interme-diate panel list. This option was selected 69times. Selecting a specific test produces asummary of results for that test. Patientsused this function 129 times (114 from Labo-ratory and 15 from Laboratory Detail). ‘Re-ports’ was the next most often used function(40 times). Patients selected a variety reportheaders to obtain details, including radiology(24 times), cardiology (17 times), and pathol-ogy (10 times). ‘PFT’ (Pulmonary FunctionTest) was selected 22 times and microbiologyresults eight times.

The other functions were used more spar-ingly than the review of data function. Forexample, vital signs were entered 31 times,and diabetes information (blood sugars and/


or insulin doses) was entered 14 times. Thesedata were also rarely reviewed (26 and 18times, respectively). Educational functions,providing passive links to other Web sites,were used 25 times (Diabetes 11 times, Geri-atrics eight times, Home Medical Guide fourtimes and Aging twice). Advice functions,which use data taken from the patient’srecord as input to active guideline programs,were used 16 times (Cholesterol 15 times andMammography once).

4.1.3. On-line questionnaires and inter6iewsIn order to investigate further the issues

underlying these usage patterns and to assesspatients’ subjective experience in using Pat-

CIS, subjects received a questionnaire, whichwas emailed to them 9 months after theirinitial logins. The subjects were also asked torespond to questionnaire items (on a five-point scale) related to aspects of human–computer interaction, to assess the following:� Willingness to enter data into PatCIS� Willingness to review their own health

data over the WWW� Perceptions of their interactions with

health-care providers.Table 1 provides examples of several of the

questions used, along with responses fromthe first four patients who participated in thestudy. The subjects were also asked to re-spond to questionnaire items (on a five-point

Table 1Examples of questionnaire items and responses from four current PatCIS users (deployed 9 months after the user’sfirst login)

User 3User 2User 1Question User 4

Once a month orNeverSeveral timesOnce a month or lessHow often do you usePatCIS? a week less

For what purposes do you Review data Review data Review data Review datause PatCIS?

Education Education AdviceAdvice

Definitely agreeI find PatCIS useful Definitely agree Unsure UnsureAgreeUnsureAgreeI am willing to enter my Definitely agree

own data into my recordusing the WWW

Definitely agreeI am willing to review my Definitely agreeAgreeAgreeown health informationusing the WWW

Definitely agree UnsureAgreePatCIS has improved my Definitely agreeinteractions with healthprofessionals

Agree Definitely agreeDefinitely agreePatCIS has improved my Disagreeunderstanding of healthand illness

PatCIS has changed how Definitely agree Definitely agreeDisagreeUnsuremy health care ismanaged

‘Occasionally, it is veryHave you had problems in None ‘Sending e-mailfrom the site’difficult to access, typicallyusing PatCIS?

in evening’


scale) related to aspects of human–computer,to assess the following:� Ability to understand graphs and tables� Clarity of screen sequences� Usefulness of help and information

buttons� Learnability of the system� Usefulness of linkages to other sites� System reliability and speed.

Results to date indicate that all eight usersfound that information presented was pre-sented on the screen in a way that was easyto read, graphs and tables were comprehensi-ble and that overall the system was reliable.However, from our preliminary analysis, ar-eas where user responses indicated that thesystem might use improvement included im-proving error and system messages, as well asstreamlining the sequence of screens.

Table 2 provides the overall impressions ofPatCIS from four patients who have used thesystem for at least 9 months. The most posi-tive ratings came from subject 4, who, fromexamination of the baseline questionnaire,had the most extensive computer back-ground. In addition, analysis of log files indi-cated that this subject used the variousfeatures of the system most extensively. Incontrast, user 3 had the least computer expe-rience and reported considerable trouble inattempting to login and access PatCIS (thissubject stated that he had trouble in using theSecure ID card for gaining access to PatCIS).

In addition to logging user accesses andemailing usability questionnaires, interviewswere conducted with users to assess for exam-ple, why they find PatCIS useful or not useful(these interviews were conducted over thetelephone, and user’s responses to questionsand their ‘thinking aloud’ were audio-tapedand transcribed). We also contacted patientswho enrolled in the study who did not use thesystem (according to our log files) in order todetermine why. Analysis of interviews indi-

cates that users may have not been fullyaware of all of the more advanced features ofPatCIS (such as the advice function), whichmay have been the reason for the low usageof some of these functions, as describedabove (despite the questionnaire results indi-cating a high degree of willingness to use thesystem). In addition, patients indicated thatthe review of data function was very useful inallowing them to follow their own resultsover time and that this improved communi-cation with their physicians (consistent withthe logging data indicating that this was themost frequently used function).

Using the multi-method approach de-scribed above, we are working on identifyingproblems and issues of those patients whohave adopted use of the system. We are nowfollowing up with more detailed targeted us-ability testing of potentially problematic ar-eas based on this pilot data (e.g. sequencingof screens, usefulness of error messages, needfor user training and relatively low usage ofdata entry and advice facilities).

4.2. Example 2: e6aluation of the use andgeneration of Web-based clinical guidelines

We have employed a number of our meth-ods described above to assess the use ofclinical guidelines by physicians (in a clinicalpractice setting) using the WWW. The ap-proach taken involved the tracking of use ofguidelines (from ACPOnline) as physiciansuse a Web-based CPR system. The purposeof this work is to characterize how guidelinesare actually being used by physicians in clini-cal practice, as well as their effectiveness.Questions to be answered included the fol-lowing: (a) Why do physicians access Web-based clinical guidelines? (b) Are patientspresent when they are accessed? (c) Do physi-cians find the guidelines useful and applica-ble, and, if not, what are the reasons? In the


Tab

le2

Ove

rall

impr

essi

onof

Pat

CIS

offo

urus

ers

afte

ra

9m

onth

peri

od

Use

r4

Use

r2

Use

r3

Use

r1

‘Ife

elas

thou

ghI

amin

ave

ry‘P

atC

ISm

akes

itm

uch

easi

erto

‘Ifin

dit

very

good

for

my

purp

oses

.‘O

vera

llpr

ogra

mis

exce

llent

.W

ould

bebe

tter

ifyo

uco

uld

By

and

larg

e,I

have

noid

eato

uniq

uepo

siti

on,

Ikn

oww

hat

my

acti

vely

part

icip

ate

inm

yw

ife’

sac

cess

itov

erIn

tern

etpr

oble

m(s

)ar

e,an

dit

’sju

sta

med

ical

care

byal

low

ing

me

toim

prov

eit

.H

owev

er,

Ire

mai

nm

atte

rof

tim

e.’

conf

used

abou

tge

ttin

git

from

aE

xplo

rer

asth

atis

wha

tcl

osel

ym

onit

orhe

rla

bre

sult

s.It

rem

ote

com

pute

r.’

mos

tpe

ople

have

.C

ould

also

allo

ws

me

tom

ore

pres

ent

apr

oble

mw

ith

inte

llige

ntly

inte

ract

wit

hhe

rdo

ctor

s.’

peop

lew

hoha

veol

dm

achi

nes

wit

hlim

ited

mem

ory.

’


area of evidence-based medicine, a consider-able number of papers have appeared de-scribing both the advantages [17] andlimitations of clinical guidelines, particularlyin the context of actual clinical practice.However, few studies have reported hard em-pirical data on how guidelines are actuallyused at the point of care, due to difficulties inobtaining this type of information. In theexample described below, we employed ourmethods in an attempt to identify the follow-ing: (1) the purpose of use of guidelines byphysicians in a clinical practice setting; (2)the patient setting in which the guidelineswere applied; (3) the quality of informationprovided by the guidelines as assessed by thephysician; and (4) the specific reasons forreduced effectiveness or applicability ofguidelines. Ideally, this information will allowus to determine how guidelines (and theirpresentation methods) may be enhanced inorder to improve their usability at the pointof care.

4.2.1. Data collection using triggered on-lineforms

Usage tracking and data collection werebased on methods described in Section 3.Specifically, user interactions with ACPOn-line were recorded during accesses to evi-dence-based guidelines by physicians fromColumbia University. Physicians were able toaccess a reference page directly from a Web-based computerized patient record system.From the reference page, they could select forviewing on-line guidelines for specific diseasesfrom ACPOnline via the WWW (illustratedin Fig. 5a). The user’s selections were cap-tured using the CHECKPOINT tracking sys-tem (described in detail above) developed forremotely tracking use of Web-based informa-tion systems.

After clicking on a particular guideline, theusers were presented with a Web-based form

(illustrated in Fig. 5b) asking them to indi-cate their intended purpose for accessing theguideline (e.g. ‘To help me care for a specificpatient’, ‘For general knowledge/education’,‘Other’), as well as the clinical context inwhich the guideline was accessed (e.g. ‘I amseeing the patient now in an out-patient set-ting’, ‘I am seeing the patient now in anin-patient setting’, ‘I saw the patient previ-ously’, ‘Other’). Once this information wasentered, the users submitted the form by se-lecting a button at the bottom of the form.The guideline was then presented to theusers, and their interaction with the guideline(i.e. browsing at the ACP site where theguideline resides) was tracked.

Upon exiting the guideline, a second Web-based form was presented to users (illustratedin Fig. 5c) asking them to indicate the qualityof information they received from the guide-line (e.g. ‘I got all of the information Iwanted from the guideline’, ‘I got some of theinformation I wanted from the guideline’, ‘Idid not get the information I wanted fromthe guideline’). Furthermore, users who didnot find the information they wanted fromthe guideline were asked to indicate the rea-son for this (e.g. ‘The guideline did not applyto the specific patient I was considering’, ‘Icould not find the information relevant towhat I needed to know’, ‘The guideline maybe incorrect’, ‘The guideline is not current’,‘Other’). The users exited the system uponsubmission of this form.

4.2.2. Results: the use of on-line clinicalguidelines at the point of care

The responses of physicians indicating thepurpose of accessing guidelines is shown inTable 3. The purpose of use was character-ized as being for (1) general knowledge/edu-cation, (2) care for a specific patient or (3)other reasons. The percentage of uses fallinginto each of these categories is shown in


Fig

.5.

(a)

Res

ourc

epa

gesh

owin

glin

ksto

clin

ical

guid

elin

esav

aila

ble

from

wit

hin

aco

mpu

ter-

base

dpa

tien

tre

cord

syst

em.


Fig

.5.

(b)

For

mto

asse

ssa

clin

icia

n’s

reas

onfo

rac

cess

ing

agu

idel

ine

(whi

chap

pear

sw

hen

the

user

sele

cts

agu

idel

ine

from

the

reso

urce

page

).


Fig

.5.

(c)

For

mto

asse

sssu

cces

sin

obta

inin

gde

sire

din

form

atio

nfr

omth

ecl

inic

algu

idel

ine

acce

ssed

(whi

chap

pear

son

exit

from

the

guid

elin

e).


Table 3Results from an analysis of use of on-line guidelines byphysicians at the point of care

Purpose of use PercentageFrequency

General 128 81knowledge/education

22Care for a specific 14patient

5Other 8100158Total

Patient setting9Seeing patient now as an 41

out-patient6Seeing patient now as an 27

in-patient32Saw patient previously 7

10022Total

Quality of information19Got all information 46

wanted10Got some of the the 25

information wanted12 29Did not get the

information wantedTotal 10041

Reason for not getting desired informationGuideline did not apply 6 50

5 42Could not find relevantinformation

1Guideline may be 8incorrect

Total 10012

was characterized as being (1) in the presenceof the patient, in an out-patient setting, (2) inthe presence of the patient, in an in-patientsetting, or (3) for a patient seen previously.The percentage of uses in each of the patient-care settings is shown in Table 3. Specifically,the use of guidelines in each of the three mainpatient-care settings was fairly even. Further-more, roughly 68% of physicians using theguidelines for the care of a specific patientdid so in the presence of that patient. There-fore, it appears that when physicians do ac-cess guidelines for the care of a specificpatient, it is done most frequently at the timeof care.

Upon exiting ACPOnline, physicians wereasked to characterize the quality of the infor-mation that they received from the guideli-nes. Their responses are shown in Table 3.Specifically, physicians indicated whetherthey (1) obtained all of the information theywanted, (2) obtained some of the informationthey wanted, or (3) did not obtain the infor-mation they wanted. The percentages of re-sponses falling into each of these categories isshown in Table 3. Roughly half of the physi-cians did find the information they were seek-ing in the guidelines, while the other halfobtained some or none of the informationthey required. However, this does seem toimply that there are indeed problems or bar-riers that may be reducing the effectivenesswith which physicians are able to applyguidelines in clinical practice.

Physicians who indicated they did not re-ceive the information they wanted from theguidelines were asked to indicate the reasonfor this. Their responses are shown in Table3. Specifically, the reason for not obtainingthe desired information was characterized asbeing because (1) the guideline did not applyto the specific patient they were considering,(2) they were unable to find the relevantinformation, or (3) the guideline may have

Table 3. Specifically, it appears that the ma-jority of physicians accessing these guidelineson-line are doing so for general knowledge oreducational purposes. Thus, physicians wereusing the guidelines for the care of a specificpatient only a fraction of the time.

Physicians who indicated that they wereaccessing the guideline for the care of a spe-cific patient were asked to indicate the pa-tient-care setting in which this was done. Theresponses indicating the patient-care settingare shown in Table 3. The patient-care setting


been incorrect. The percentage of responsesin each of the categories is shown in Table 3.The results indicate that half of the physi-cians did not find the information they wereseeking because the guideline did not applyto the specific patient they were considering.Furthermore, 42% of the physicians were un-able to find information relevant to what theyneeded to know, while 8% indicated that theguideline may be incorrect. Therefore, thesefindings support the implication that the cur-rent applicability and effectiveness of guideli-nes in clinical practice may be issues thatneed to be addressed.

To investigate further the patterns emerg-ing from the data we are collecting, as de-scribed above, we plan to extend our currentanalyses to more clinical settings and agreater number of physicians. We will con-tinue to track user interactions usingCHECKPOINT and present users with Web-based forms that are timed to appear uponinvocation of guidelines. It should be notedthat baseline tracking of user interactionswith the guidelines (for a period prior to theadditional presentation of the forms) is im-portant in order to assess any affect thatcompleting the forms may have on the users’normal interactions. For example, in thestudy described above, we found no differ-ence in the frequency of accesses to guidelinesprior to and after the forms were added. Infuture work, we will complement the collec-tion of data using tracking and forms toinclude telephone interviews with physiciansusing the guidelines who indicate that theyare willing to discuss their interactions. Wehave found in our other studies that suchinterviews can provide information that canhelp to explain the patterns of behaviorrecorded during tracking, as well as the re-sults obtained from presentation of Web-based forms at point of use.

4.2.3. Distance usability testing of theprocess of guideline encoding

In a related study, we are also examiningthe processes involved in the generation ofon-line guidelines using computer tools. Weare using the method described earlier in thispaper for conducting distance usability test-ing, using remote recording of computerscreens via NetMeeting and collection of au-dio recordings of subjects ‘thinking aloud’.Specifically, we are studying physicians (at asite at Harvard) who are using computertools to encode paper-based clinical guideli-nes into a computer-based knowledge repre-sentation known as GLIF [18]. By encodingguidelines in this format, the guidelines canbe deployed over the WWW and integratedinto health-care systems, such as computer-ized patient records. The evaluation proce-dure has involved video-recording thesubjects as they take paper-based guidelines(e.g. for treating thyroid problems) and con-vert them into GLIF, using a special graphi-cal editing program developed at Harvard.All computer screens are video-recorded atthe McGill site (using NetMeeting to view thesubject’s interactions with the system at theremote Harvard site), with the audio inputcoming from the subject’s verbalizations(recorded over a speaker phone at the McGillsite). The data have been coded to identifycategories of problems (e.g. inability of thesubject to encode a step in the paper guideli-nes into computer format, or a need forfurther information about the clinical contextof the guidelines).

Through our preliminary work in this area,described above, we have been able to char-acterize the steps taken in modeling guideli-nes using GLIF and identify potentialdifficulties encountered in the task of encod-ing guidelines. For example, difficulties wereencountered by the subjects in deciding whichknowledge representation was most appropri-


ate for (1) modeling the procedural knowl-edge required for the GLIF developers toencode the domain knowledge, and (2) mod-eling the general informational statements tobe placed in the body of the guidelines. Theapproach taken has led to identification ofboth generic aspects of the process of guide-line encoding as well as specific problems indoing this task. Regarding the guideline re-source material used by the subject in carry-ing out this task, key backgroundinformation (e.g. regarding normal ranges)was found to be needed in order to encodethe guideline. The information from our eval-uation of the physicians’ interaction with thissystem is currently being used for feedbackinto the re-design and improvement of com-puter tools used for encoding guidelines, aswell as for iteratively improving the underly-ing guideline GLIF representation and guide-line content.

5. Discussion

In this paper, we have described our ap-proach to evaluation, including examples ofan assessment of a patient clinical informa-tion system (PatCIS), as well as an evaluationof physicians’ use of clinical guidelines overthe WWW. In designing such evaluations, weadopt a multi-method approach in order toassess a variety of related questions (rangingfrom assessment of user expectations to anal-ysis of specific interface problems). ‘Televalu-ation’ methods, involving evaluation from adistance, have a number of distinct advan-tages, including the ease in creating and ad-ministering evaluation instruments via theweb. However, previous work in assessing theuse of systems by physicians and patients[4,6] has indicated that questionnaires alonemay need to be supplemented by other tech-niques, such as interviews with patients

(which can be conducted via telephone com-munication), and the use of in-depth usabilitytesting methods, which can now be con-ducted over the WWW. In this way, thelimitations of any one method can be offsetby the advantages of another. For example,questionnaires and on-line forms may tell uswhat users think they may be doing whileinteracting with a system, but automaticallylogged data of user interactions may providemore detail on what they actually do (whichis often not the same [4]). In addition, in-depth usability studies (involving videorecording) can provide some insight into thereasoning and decision-making of subjects asthey interact with systems to solve complextasks.

An important aspect of the distance evalu-ation of Web-based information systems hasbeen the development of effective on-linequestionnaires. The use of HTML and a vari-ety of editors that generate HTML codegreatly facilitates this process. Prototypemock-ups of questionnaires can be rapidlytested as they would appear to end users, byusing web browsers to display the HTML-coded questionnaires. In the context of ourcurrent work, we have extended a number ofour methods for iterative testing and refine-ment of computer systems [6] to the testing ofquestionnaire content and display. This hasinvolved observing pilot subjects interactingwith prototype versions of the questionnaires,while ‘thinking aloud’ (verbalizing theirthoughts) regarding the questionnaire’s lay-out and content. This information can beused to improve the design of the question-naires prior to deployment over the Web.One of the most critical stages in our evalua-tion studies is selecting from the multiplepotential methods an approach that will meetthe overall objectives of the evaluation.Given the multiple objectives of our studies,several complementary methods have been


included in the design of the evaluation toachieve a broad assessment of factors such assystem usability. An important considerationin administering particular components ofour evaluations is whether the evaluation in-strument (e.g. a particular questionnaire) isto be administered once, at the beginning ofthe study (to obtain baseline data) or atseveral points throughout the entire period ofthe study to assess change. Analysis of on-line data collected from patients’ interactionswith systems on the World Wide Web isgreatly facilitated by the fact that electronicdata collection can be automated. For exam-ple, CGI scripts can be written to processdata automatically (e.g. perform summarystatistics) as they are entered (e.g. via on-linequestionnaires) and trigger the presentationof on-line forms automatically to assess as-pects of usability at the actual point of care.

At McGill University, we are currently in-volved in developing a ‘virtual’ televaluationlaboratory. At the core of this facility issoftware described in this paper for remotelytracking end users of systems. Our system atthe McGill site essentially acts as a server forconducting evaluation of system use at dis-tant sites. In addition, all data collected fromour studies goes to the McGill site, and mul-tiple sources of information (e.g. buttonsclicked and results from questionnaires andon-line forms) are automatically merged atthe McGill site. At the remote site, our ap-proach involves simply linking applicationsto our virtual laboratory (e.g. rerouting endusers’ inputs to McGill). At the McGill site,all the script programs that process informa-tion (e.g. that present on-line questionnaires,log end user browsing, etc.) reside on ourserver. This architecture is transparent to theusers interacting with the system and has theadvantage of being able to interface withsystems being evaluated at remote sites bysimply linking to those sites through our

server. It has been our experience that theinitial investment in the development of avirtual evaluation laboratory (including thepurchase of server hardware and develop-ment of tracking programs) can be offset byusing the laboratory in the evaluation of anumber of health-care information systems,involving only a minor modification of thelaboratory’s software to accommodate a di-verse range of assessments. Our experience insuccessfully conducting distance evaluationssuggests that major development efforts in-volving health-care information systemscould, and perhaps should, be subjected tosome form of evaluation even if the facilitiesare not readily available at the actual site ofsystem deployment. We plan to extend ourcurrent laboratory to include equipment usedfor conducting high-bandwidth teleconferenc-ing methods in order to conduct remote stud-ies of aspects of system use involvingcollaborative team decision making.

In conclusion, we feel that an approach toevaluation in medical informatics that takesinto account both technological advances inthe systems that we are evaluating, as well asmethods and theoretical frameworks from ar-eas such as cognitive science, promises toprovide us with detailed information on sys-tem use. This information will prove to beinvaluable in the iterative process of design,evaluation and re-design that our complexsystems will inevitably require.

Acknowledgements

The studies reported in this paper wereconducted at McGill University under directcontractual and sub-contractual agreementfrom the funding agencies. The authorswould like to thank Dr. E.H. Shortliffe forhis suggestions and comments in the develop-ment of our web-tracking component


CHECKPOINT. This work was supported inpart by a National Information Infrastruc-ture contract N01-LM-6-3542 from the Na-tional Library of Medicine, a grant from theCanadian Centres of Excellence networkHEALNet and a grant from the AmericanCollege of Physicians–American Society forInternal Medicine (ACP–ASM).

References

[1] J. Nielsen, Usability Engineering, Academic Press,New York, 1993.

[2] R. Kittredge, U. Rabbani, F. Melanson, G.O.Barnett, Experiences in deployment of a Web-based CIS for referring physicians, in: D.R. Masys(Ed.), Proceedings of the 1997 Fall AMIA, Hanley& Belfus, Philadelphia, PA, 1997, pp. 320–324.

[3] R.D. Cork, W.M. Detmer, C.P. Friedman, Devel-opment and initial validation of an instrument tomeasure physicians’ use of, knowledge about andattitudes toward computers, JAMIA 5 (1998)164–176.

[4] A.W. Kushniruk, V.L. Patel, Cognitive evaluationof decision making processes and assessement ofinformation technology in medicine, Int. J. Med.Inform. 51 (1998) 83–90.

[5] Patel V.L., Kushniruk A.K., Yang S., Yale J.F.Impact of a computer-based patient record systemon data collection, knowledge organization, andreasoning, JAMIA 7 (6) (2000) 569–585.

[6] A.W. Kushniruk, V.L. Patel, J.J. Cimino, Usabil-ity testing in medical informatics, in: D.R. Masys(Ed.), Proceedings of the 1997 Fall AMIA, Hanley& Belfus, Philadelphia, PA, 1997, pp. 218–222.

[7] V.L. Patel, D.R. Kaufman, Science and practice:A case for medical informatics as a local science ofdesign, JAMIA 6 (1998) 489–492.

[8] V.L. Patel, J.F. Arocha, D.R. Kaufman, Diagnos-tic reasoning and medical expertise, The Psychol-ogy of Learning and Motivation 31 (1994)

187–252.[9] Patel, VL, Arocha, JF, Kushniruk, AW.. EMR:

Re-engineering the organization of health infor-mation. Methods of Information in Medicine (inpress).

[10] K.A. Ericsson, H.A. Simon, Protocol Analysis:Verbal Reports as Data, MIT Press, London,1993.

[11] J.J. Cimino, S.A. Socratous, Just tell me what youwant!: The promise and perils of rapid prototyp-ing with the World Wide Web, in: J.J. Cimino(Ed.), Proceedings of the 1996 Fall AMIA, Hanley& Belfus, Philadelphia, PA, 1996, pp. 719–723.

[12] J.J. Cimino, S. Sengupta, P.D. Clayton, V.L.Patel, A.W. Kushniruk, X. Huang, Architecturefor a Web-based clinical information system thatkeeps the design open and the access closed, in: C.Chute (Ed.), Proceedings of the 1998 Fall AMIA,Hanley & Belfus, Philadelphia, PA, 1998, pp.121–125.

[13] B. Schneiderman, Designing the User Interface,Addison-Wesley, New York, 1992.

[14] R.M. Felciano, R.B. Altman, Lamprey: Trackingusers on the World Wide Web, in: J.J. Cimino(Ed.), Proceedings of the 1996 Fall AMIA, Hanley& Belfus, Philadelphia, PA, 1996, pp. 757–761.

[15] E.H. Shortliffe, V.L. Patel, J.J. Cimino, G.O. Bar-net, R.A. Greenes, A study of collaborationamong medical informatics research laboratories,AI Med. 12 (2) (1998) 97–123.

[16] A.W. Kushniruk, V.L. Patel, J.J. Cimino, R.A.Barrows, Cognitive evaluation of the user inter-face and vocabulary of an outpatient informationsystem, in: J.J. Cimino (Ed.), Proceedings of the1996 Fall AMIA, Hanley & Belfus Inc, Philadel-phia, PA, 1996, pp. 22–26.

[17] C.J. McDonald, Medical heuristics: The silent ad-judicators of clinical practice, Ann. Intern. Med.124 (1) (1996) 56–62.

[18] L. Ohno-Machado, J.H. Gennari, S.N. Murphy,N.L. Jain, S.W. Tu, D.E. Oliver, E. Pattison-Gor-don, R.A. Greenes, E.H. Shortliffe, G.O. Barnett,The guideline interchange format: A model forrepresenting guidelines, JAMIA 5 (4) (1998) 357–372.

.

Documents

‘Televaluation’ of clinical information systems: an ...people.dbmi.columbia.edu/cimino/Publications/2001... · ‘Televaluation’ of clinical information systems: an integrative