Patient Education and Counseling 84 (2011) 359–367

The feasibility of a multi-format Web-based assessment of physicians' communication skills

Sara Kim a,*, Douglas M. Brock a, Brian J. Hess b, Eric S. Holmboe b, Thomas H. Gallagher a, Rebecca S. Lipner b, Kathleen M. Mazor c

a School of Medicine, University of Washington, Seattle, USA
b American Board of Internal Medicine, Philadelphia, USA
c UMass Medical School, Meyers Primary Care Institute, Worcester, USA

A R T I C L E I N F O

Article history:

Received 1 December 2010

Received in revised form 29 March 2011

Accepted 1 April 2011

Keywords:

Internet

Assessment

Communication skills

Validation

A B S T R A C T

Objective: Little is known about the best approaches and format for measuring physicians' communication skills in an online environment. This study examines the reliability and validity of scores from two Web-based communication skill assessment formats.

Methods: We created two online communication skill assessment formats: (a) MCQ, consisting of video-based multiple-choice questions; (b) multi-format, including video-based multiple-choice questions with rationales, Likert-type scales, and free-text responses of what physicians would say to a patient. We randomized 100 general internists between the two test formats. Peer and patient ratings collected via the American Board of Internal Medicine (ABIM) served as validity sources.

Results: Seventy-seven internists completed the tests (MCQ: 38; multi-format: 39). The adjusted reliability was 0.74 for both formats. Excellent communicators, identified by their peer and patient ratings, performed slightly better on both tests than adequate communicators, though this difference was not statistically significant. Physicians in both groups rated the test format as innovative (4.2 out of 5.0).

Conclusion: The acceptable reliability and participants' overall positive experiences point to the value of ongoing research into rigorous Web-based communication skills assessment.

Practice implications: With efficient and reliable scoring, the Web offers an important way to measure and potentially enhance physicians' communication skills.

© 2011 Elsevier Ireland Ltd. All rights reserved.


1. Introduction

Medical specialty boards are increasingly interested in exploring the Web as a platform for assessing physicians' communication skills. However, little is known about the best approaches and format for measuring physicians' communication skills in an online environment. Common assessment methods reported in the literature for examining physicians' communication skills include the use of standardized patients (SPs) and video clips [1]. SPs play a critical role in assessing physicians' recognition of depression or HIV risks in patients, based on audiotaped encounters with unannounced standardized patients in clinics [2,3]. Interactions with standardized patients connected via video-conferencing have been reported to be effective in assessing physicians' skills in disclosing medical errors [4]. While SPs are effective for measuring communication skills, they carry important practical considerations: the logistics and continuing costs associated with recruiting and training new standardized patients require ongoing support, whereas video- or paper-based alternatives primarily incur only upfront development costs [5,6]. We need other ways of measuring communication skills that are valid, reliable, acceptable to users, and cost effective, and that have demonstrable educational impact [7]. One option is asking physicians to respond to video clips.

* Corresponding author at: Instructional Design and Technology, David Geffen School of Medicine, University of California Los Angeles, Box 957381, 700 Westwood Plaza, Room 1220, Los Angeles, CA 90095-7381, USA. Tel.: +1 310 206 0572. E-mail address: [email protected] (S. Kim).

doi:10.1016/j.pec.2011.04.003

Video clips are widely used in assessing physicians' communication skills [8-10] and may be a more efficient and cost-effective method of assessing these skills. In one study, physicians were videotaped during their actual patient encounters before and after completing a computer module that presented video examples of poor and adequate communication between surgeons and cancer patients [11]. Results showed that physicians' communication skills in real patient care settings improved after interacting with the computer module. However, patient satisfaction with physicians' communication skills did not change. In a computer-based video exam, medical students were asked to type short essay answers following a video clip [12]. The results pointed to deficiencies in students' decision-making skills. In another study [13], a 20-item video test was administered to physicians, who were prompted to voice record their answers to the question, "What would you say next to this patient?" Physicians who had been identified as "at-risk" for communication difficulties scored lower on the test than the rest of the subjects. Lastly, in an online video-based tool targeting communication skills involving patients' behavior change, medical students were first prompted to type what they would say to the patient after viewing a video clip [14]. Subsequently, they were asked to select from multiple-choice questions the statement that best matched their open-ended comment. Results showed that students tended to select the most exemplary patient-centered statement from the multiple-choice list, which contradicted their free-text comments that predominantly reflected doctor-centered approaches. This raised questions about whether the use of multiple-choice questions alone is sufficient for assessing physicians' communication skills.

The Web environment offers a platform for measuring communication skills, but few studies have compared different online test formats in the domain of communication skill assessment. We report a pilot online assessment of the communication skills of physicians enrolled in the American Board of Internal Medicine (ABIM) Maintenance of Certification (MOC) program. Our study was conceived based on multiple factors: (a) online tools presented a logical method for investigation, but the existing literature yielded little guidance for establishing validity evidence; (b) MOC can potentially reach thousands of practicing physicians, requiring feasible and reliable online tools; and (c) positive user experiences can be generated through realistic and engaging online communication skills exercises [15].

Our study compared two different online video-based Web assessment tools: (a) a multiple-choice question format and (b) a multi-format version including multiple-choice questions augmented with physician rationale input, Likert-type scales, and open-ended questions. Currently, most maintenance of certification computer-based tests rely on a multiple-choice question format. However, this testing format may be poorly suited for assessing physicians' skills in some key areas, particularly communication [16-18]. Novel online assessment strategies may allow more reliable and valid tests of physicians' communication skills. Yet few studies have compared newer online testing formats for assessing physicians' communication skills with a standard approach such as the multiple-choice format.

The feasibility of our study was evaluated at three levels: (a) the reliability of each test based on item scoring method; (b) physician evaluation of online test-taking experiences; and (c) external data sources, including peer and patient ratings of the physicians' communication skills collected as part of MOC. Our goal was to apply the key results from our study to inform the future design and validation of online communication skill assessment modules.

2. Methods

2.1. Development of video-based test items

The source video materials for the online test content were developed in 1999 by a panel of communication skills experts nationally recruited by the ABIM. The experts included physician educators and researchers with expertise in physical examination and communication skills. The ABIM provided the complete set of video clips along with script materials that contained correct answers and rationales. The professionally produced videos consisted of sets of (a) opening scene clips (i.e., item stems) depicting an interaction between a physician and a patient, and (b) multiple-choice video-response options. Each response option illustrated how the opening scene might continue. Using available research evidence, the panel chose a set of clinical communication skills and behaviors that highlighted key components of effective communication [19-21]. This set of skills and behaviors formed the basis for the creation of a series of videos. Sample topics of communication skill domains included the following: delivering bad news to a patient, prioritizing a patient's multiple concerns, establishing an empathic connection with a distressed patient, encouraging a patient to talk about domestic violence, facilitating behavioral change discussions with a defensive patient, identifying barriers to patient adherence, and handling an unethical request from a patient.

We created two test formats using the items: multiple-choice question (MCQ) and multi-format versions. The 18-item MCQ version included only video-based multiple-choice questions. The multi-format version included a total of 20 questions: video-based multiple-choice questions with open-ended rationales (n = 9), Likert-type scales (n = 6), and free-text questions (n = 5). Table 1 summarizes the test formats and scoring methods, which are described in detail below.

2.1.1. Multiple-choice question (MCQ) test

Response format. This response format is based on the more traditional approach, where an opening scene video is followed by 4-5 multiple-choice response videos (Fig. 1). The main goal of this test format was to assess a physician's recognition of exemplary patient-centered communication. Each question, covering one of the communication skill domains listed above, consisted of (a) a case video clip illustrating the opening of a physician-patient interaction, (b) multiple-choice video items showing different approaches to continuing the initial interaction, and (c) an optional interactive feature, called the sorter feature, to assist the physicians' decision-making process as they viewed video items, compared the different communication approaches portrayed, and selected the clip that best demonstrated a clinician's patient-centered communication skill (Fig. 2). After submitting their answer, physicians received feedback on the correctness of their answer (Fig. 3).

Scoring method. Responses were scored based on the pre-determined correct answers provided by the ABIM's expert panel. A score of '1' was assigned for the correct response and '0' for all other responses. Correct responses were summed to yield an overall score.

2.1.2. Multi-format test

2.1.2.1. Multiple-choice questions (MCQ) + rationale

Response format. This response format added a feature to the multiple-choice questions: an open-ended field that prompted physicians to type their rationale for selecting a multiple-choice response. The identical video materials included in the MCQ version were used in this format. Physicians received feedback on their open-ended responses in the form of a comparison between an expert's rationale and their own response.

Scoring method. The expert panel's pre-determined answers guided the scoring of MCQs as above (1 for correct; 0 for incorrect). Open-ended rationales were scored by a team of two researchers and two trained staff members using a coding scheme, which was developed after careful review of recommended patient-centered communication skills such as the Kalamazoo Consensus Statement [19], Canadian doctor/patient questionnaires [20], and Common Ground Ratings [21]. For example, for the question assessing a physician's delivery of bad news to a patient, we applied the following codes: opens with warning, uses direct/honest/transparent communication, elicits patient questions, and uses pauses/silence. For each question, we assigned a global score of 1 for complete presence, 0.5 for incomplete presence, and 0 for the absence of key descriptive elements in physician responses. Mean inter-rater reliabilities across four raters ranged from kappa = 0.45 to 1.0, with an average kappa = 0.60.
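To make the roll-up from coded elements to the 0 / 0.5 / 1 global score concrete, below is a minimal sketch. The code labels come from the bad-news example above; the rule that some-but-not-all detected codes maps to 0.5 ("incomplete presence") is our reading, since the paper defines the anchors but not the exact cut.

```python
# Codes the raters looked for in a "delivering bad news" rationale (from the text).
BAD_NEWS_CODES = {
    "opens with warning",
    "uses direct/honest/transparent communication",
    "elicits patient questions",
    "uses pauses/silence",
}

def global_score(detected, expected=BAD_NEWS_CODES):
    """1 = complete presence, 0.5 = incomplete presence, 0 = absence
    of key descriptive elements (roll-up rule assumed, not specified)."""
    hits = detected & expected
    if hits == expected:
        return 1.0
    return 0.5 if hits else 0.0

print(global_score({"opens with warning", "uses pauses/silence"}))  # -> 0.5
```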

Table 1
Description of test versions, question type, scoring method, and reliability evidence.

Test version | Question type | Format description | Pros and cons of item format | Scoring method | Adjusted reliability
MCQ | Multiple-choice questions (n = 18) | (a) A video opening of a physician-patient interaction, (b) multiple-choice video items showing different approaches to continuing the initial interaction, and (c) a sorter feature to order video items from worse to better. | Pro: automatic scoring based on experts' keyed answers. Con: the multiple-choice format taps into recognition of exemplar demonstrations of communication skills. | A score of '1' for the correct response and '0' for an incorrect selection. | 0.74 (n = 11)
Multi-format | Multiple-choice questions + user rationale (n = 9) | Same format as above, plus an open-ended question that prompted users to type a rationale for selecting their multiple-choice response. | Pro: the open-ended question elicits physicians' rationale beyond recognition skills. Con: scoring is time and labor intensive. | Sum of (a) MCQ response correctness (0, 1) and (b) global rationale score (0, 0.5, 1). | 0.74 (n = 17)
Multi-format | Likert scale (n = 6) | (1) An opening video and (2) individual video responses that were dragged and dropped along a five-point scale (very ineffective, ineffective, adequate, effective, and very effective). | Pro: a novel application of a widely used testing format that can be easily designed. Con: no established scoring methods to apply. | Sum of (a) '1' for users' highest rated video items matching the experts' choices and (b) '1' for users' lowest rated videos matching the experts' choices. |
Multi-format | Open-ended questions (n = 5) | An opening scene video followed by a question to type what the user would say to the patient. | Pro: simulates a physician-patient interaction. Con: scoring is time and labor intensive. | A score of 1 for explicit presence, 0.5 for implicit presence, and 0 for the absence of key descriptive elements in the user responses. |

Fig. 1. Multiple-choice question. The large panel plays an opening video scene. Multiple video options (A-D) are played by double clicking on the icons, for physicians to determine the best scenario for continuing the opening scene.

2.1.2.2. Likert-type scale questions

Response format. We developed a Likert-type question format with the goal of investigating whether a novel application of this widely used format would be feasible in a Web environment employing a stimulus video and Likert-type video options (Fig. 4). After viewing an opening video, a physician viewed individual video items and dragged them along a five-point scale (1 = very ineffective, 2 = ineffective, 3 = adequate, 4 = effective, and 5 = very effective). Physicians could place more than one video item under the same Likert-type category.

Scoring method. We explored different methods for scoring the Likert-type responses but report only on the final method that we applied. To generate expert responses, we invited five experts to rate the Likert-type questions, including two local faculty members who teach communication skills and three team members (EH, TG, KM). Concordance was determined by comparing expert responses with physician responses in two ways: one measure for responses receiving the highest expert ratings (4 = effective or 5 = very effective) and one measure for videos receiving the lowest expert ratings (1 = very ineffective or 2 = ineffective). We awarded two separate scores to physician responses: [1] a score of '1' if a physician's most highly rated video items matched the experts' choice; and [2] a score of '1' if the physician's lowest rated videos matched the experts' choices. The sum of the two was the final score for each question.
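To make the concordance scoring concrete, here is a minimal sketch, assuming ratings are dictionaries keyed by video-option ID. The expert "high" (4-5) and "low" (1-2) bands follow the text; the overlap rule for what counts as "matching" is our assumption.

```python
def score_likert_item(phys, expert):
    """One point if the physician's top-rated clip(s) fall in the experts'
    effective/very effective band, one point if the physician's lowest-rated
    clip(s) fall in the experts' ineffective/very ineffective band (0-2)."""
    expert_high = {v for v, r in expert.items() if r >= 4}
    expert_low = {v for v, r in expert.items() if r <= 2}
    best, worst = max(phys.values()), min(phys.values())
    phys_top = {v for v, r in phys.items() if r == best}
    phys_bottom = {v for v, r in phys.items() if r == worst}
    return int(bool(phys_top & expert_high)) + int(bool(phys_bottom & expert_low))

# Example: agreement on both the best and the worst clip scores 2.
expert = {"A": 5, "B": 4, "C": 2, "D": 1}
physician = {"A": 4, "B": 3, "C": 2, "D": 2}
print(score_likert_item(physician, expert))  # -> 2
```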

Fig. 2. Multiple-choice question sorter feature. As the physician views the individual video responses, they can drag and drop individual icons into the video sorting area before submitting their answer in the Best Choice box.

Fig. 3. Multiple-choice question feedback screen. Physician: (a) receiving feedback on the selected answer, (b) comparing their rationale with the expert's rationale, (c) providing a level of agreement with the expert's rationale, and (d) explaining the reasons behind their agreement or disagreement.

2.1.2.3. Free text questions

Response format. This item included an opening scene video followed by a question asking physicians to type what they would say to the patient. The goal was to directly examine physicians' communication skills as they formulated a response to the portrayed patient. After submitting their answer, physicians were asked to view a video clip of what the experts deemed exemplary communication skills, along with the rationale associated with the clip.

Scoring method. Similar to the method applied to scoring rationales, we developed and applied a coding structure for scoring physicians' open-ended responses. For example, for the question addressing a physician's discussion of a patient's behavioral change, we applied the following codes: elicits patient perspective and states concerns in a non-judgmental manner. A global score was awarded to each question: 1 for complete presence, 0.5 for incomplete presence, and 0 for the absence of key descriptive elements in the responses. Mean inter-rater reliabilities across four raters ranged from kappa = 0.50 to 0.95, with an average kappa = 0.80.
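For reference, one common way to obtain a single inter-rater figure for four raters is to average pairwise Cohen's kappas, as sketched below on hypothetical global scores; the paper does not state whether its averages were computed this way.

```python
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

# Hypothetical global scores (0, 0.5, 1) from four raters over six responses.
ratings = {
    "rater1": [1, 0.5, 0, 1, 1, 0.5],
    "rater2": [1, 0.5, 0, 1, 0.5, 0.5],
    "rater3": [1, 0, 0, 1, 1, 0.5],
    "rater4": [1, 0.5, 0, 0.5, 1, 0.5],
}

# Average Cohen's kappa over all rater pairs for one question.
pairs = list(combinations(ratings.values(), 2))
mean_kappa = sum(cohen_kappa_score(a, b) for a, b in pairs) / len(pairs)
print(f"mean pairwise kappa = {mean_kappa:.2f}")
```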

Both versions included a review screen (Fig. 5) at the end of the test, which presented the learning objectives associated with test items and automatically scored multiple-choice responses. The tests were developed using Adobe Flash, which supports various interactivity features using multimedia, animation, and video in the Web environment. A MySQL database captured and stored responses, which the investigators accessed via customized reports.

Fig. 4. Likert-scale screen. Physician: (a) viewing the case video, (b) viewing the individual video items, and (c) dragging and dropping icons along the very ineffective to very effective scale.

Fig. 5. Review screen. At the completion of the test, the physician receives feedback on the multiple-choice questions and can return to the screen to view each test item.

Before launching the online test versions, we conducted a usability test with eight local internal medicine physicians to identify any major technical difficulties and barriers to successfully completing the test [22-24]. This resulted in the discovery of 36 usability issues and four technical bugs, mostly associated with the sorting feature and Likert-type scales. Usability problems were resolved following debriefing of the key issues with the developers.

2.2. Study subjects

A total of 300 internal medicine physicians enrolled in the ABIM MOC program were invited to participate in this pilot study. We selected physicians whose certificates were due to expire in December 2010 or 2011 (i.e., who had one or two years remaining before their 10-year certificates would expire). Inclusion criteria were: (a) general internists enrolled in MOC who had completed the peer and patient module in the past year, or were close to completing the module; (b) a proportional representation of men and women to ensure consistency with the population of physicians enrolled in MOC; (c) physicians who reported, via ABIM's Practice Characteristics Survey, spending at least 70% of their professional time providing patient care; and (d) a cross-section of physicians working in solo, small group, and large group practice settings, including office or ambulatory, hospital, intensive care, and emergency department settings. The study participants were offered an honorarium of $200 and 10 MOC points for completing the assigned online communication skill assessment. The University of Washington Institutional Review Board approved the study.

After 100 physicians agreed to participate, we randomly assigned 50 to the MCQ version and the other 50 to the multi-format version. Analyses of the recruited subjects' demographic and practice characteristics, collected from the ABIM Practice Characteristics Survey, showed that the two study groups were equivalent on most characteristics (Table 2). The Practice Characteristics Survey contained questions about physicians' practice when they entered MOC; the survey is updated every 18 months. During the study period of May to August 2009, 77 of the 100 recruited physicians completed the tests (MCQ, n = 38; multi-format, n = 39). We received no responses from 23 participants after multiple attempts to reach them by email. We correlated physicians' test performance with data collected via the ABIM peer and patient module, which included patient assessment of physicians' communication skills and peer assessment of clinical performance [25]. This elective module involves confidential, anonymous surveys of the physicians' peers and patients. The physician distributes the surveys consecutively to patients seen in the office, with a minimum of 25 surveys required for the module. For the peer assessment, the physician simply sends the surveys to their peers, with no required number. Table 3 presents results from peer (n = 768) and patient (n = 1910) surveys administered by ABIM as part of the MOC process. There were no statistically significant differences in rated physician competencies between the two groups. Lastly, we used data on physicians' birth country and medical school country to examine performance differences between US/Canada-born and internationally born study participants.

Table 2
Demographic and practice profile of physician subjects (MCQ: n = 38; multi-format: n = 39).

Category | MCQ | Multi-format | p-Value*
Female (%) | 47.4 | 41.0 | >0.05
Geographic distribution (%) | | | >0.05
  Northeast | 13.2 | 33.3 |
  South | 42.1 | 25.6 |
  Midwest | 13.2 | 23.1 |
  West | 28.9 | 17.9 |
  Puerto Rico | 2.6 | 0 |
Birth country (%) | | | >0.05
  US/Canada | 55.3 | 56.4 |
  Internationally born | 44.7 | 43.6 |
Medical school location (%) | | | >0.05
  US/Canada | 65.8 | 56.4 |
  International | 34.2 | 43.6 |
Practice type (%) | | |
  Solo and group practice | 52.6 | 56.4 | >0.05
  Single specialty | 55.3 | 59.0 | >0.05
  Multi-specialty | 39.5 | 39.5 | >0.05
  Academic practice | 7.9 | 5.1 | >0.05
  Residency teaching clinic | 5.3 | 7.7 | >0.05
Percent of total time spent in | | |
  Patient care (average month) | 9.7 | 9.3 | >0.05
  Medical teaching (average month) | 3.6 | 3.0 | >0.05
  Care of patients in office/ambulatory settings | 38.4 | 36.5 | >0.05
  Care of patients in hospital settings | 33.6 | 28.9 | >0.05
Percent of physician's patients that have | | |
  Medicare and Medicaid as their primary sources of insurance coverage | 16.4 | 22.4 | >0.05
  Private insurance as their primary sources of insurance coverage | 23.6 | 25.0 | >0.05
  Self-pay or are uninsured | 11.0 | 16.7 | >0.05

* p-values based on statistical tests of the frequency distribution of demographic and practice profiles between the MCQ and multi-format test groups.

Data analyses using SPSS (Statistical Package for the Social Sciences) included descriptive statistics reporting means and standard deviations of test scores, as well as correlations between test scores and external measures.

3. Results

3.1. MCQ version

The physicians' mean score on the MCQ test was 9 out of 18 (50% correct; SD = 2.8; 95% CI = 8.0-9.9). The initial Cronbach alpha reliability was 0.54. After removing seven items with item-total correlations below 0.10 or negative, the adjusted reliability improved to 0.74 based on 11 items. The recalculated mean was 5.3 out of 11 (48% correct; SD = 2.8; 95% CI = 4.4-6.2). We found no statistically significant association between test performance and physicians' birth country or medical school country.
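A minimal sketch of this reliability screen on hypothetical 0/1 response data: Cronbach's alpha before and after a one-pass removal of items with corrected item-total correlations below 0.10. Whether the paper's removal was one-pass or iterative is not specified, and the simulated data below are illustrative only.

```python
import numpy as np

def cronbach_alpha(X):
    """Cronbach's alpha for an examinees-by-items score matrix."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

def corrected_item_total(X):
    """Correlation of each item with the total score of the remaining items."""
    X = np.asarray(X, dtype=float)
    return np.array([
        np.corrcoef(X[:, j], np.delete(X, j, axis=1).sum(axis=1))[0, 1]
        for j in range(X.shape[1])
    ])

# Hypothetical 0/1 responses: 38 physicians x 18 items, driven by one latent trait.
rng = np.random.default_rng(0)
ability = rng.normal(size=(38, 1))
X = ((0.8 * ability + rng.normal(size=(38, 18))) > 0).astype(int)

keep = corrected_item_total(X) >= 0.10  # one-pass screen at the 0.10 cut-off
print(cronbach_alpha(X), cronbach_alpha(X[:, keep]))
```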

3.2. Multi-format

The mean score on the multi-format test was 19.2 out of a possible 34 (56%; SD = 4.8; 95% CI = 17.7-20.8). The overall alpha reliability was 0.70 based on 20 scored items (9 MCQ + rationale, 6 Likert-type scales, 5 open-ended questions). When one MCQ item and two Likert items with item-total correlations <0.10 were removed, the reliability improved to 0.74 based on 17 items. The recalculated mean score was 14.5 out of 28 (52%; SD = 4.4; 95% CI = 13.0-15.9). MCQ-rationale questions correlated moderately with both the Likert-type questions (r = 0.39, p < 0.05) and the open-ended questions (r = 0.59, p < 0.001). A moderate correlation was also found between the Likert-type and open-ended question formats (r = 0.42, p < 0.05). In addition, we found moderate correlations between test scores and physicians' birth country (r = 0.49, p = 0.002) as well as medical school country (r = 0.52, p = 0.001); that is, physicians who were born and completed medical school training in the US/Canada scored higher than international participants.

Performance on the nine multiple-choice items common to both versions did not differ significantly between the MCQ group (52% correct) and the multi-format group (50% correct).

3.3. Peer and patient ratings

The overall ratings, collected through ABIM's peer and patient module, were high (Table 3). Physicians in both the MCQ and multi-format groups received average ratings of 8.0 on the 9-point peer rating scale (SD, MCQ: 0.44; multi-format: 0.33). Physicians in both groups received an average of 4.8 (scale: 1-5) on patient ratings (SD, MCQ: 0.13; multi-format: 0.14). To examine further patterns in test performance between "excellent" and "adequate" communicators, we categorized as excellent those physicians receiving 4.8 and above across all 10 patient survey items and above 8.0 on the 5 peer survey items pertinent to the Web assessment domains (respect, integrity, psychosocial aspects of illness, compassion, responsibility). We found slight differences in the adjusted total scores between excellent and adequate communicators: MCQ: 6.3 (n = 10) vs. 5.0 (n = 28); multi-format: 16.2 (n = 9) vs. 15.6 (n = 30). However, neither comparison was statistically significant. Differences in peer and patient ratings based on physicians' country of origin and medical school location were also not statistically significant.
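As a concrete reading of this categorization rule, the sketch below flags a physician as "excellent" when every patient item averages at least 4.8 and every relevant peer item averages above 8.0; the column names and values are hypothetical stand-ins for the ABIM survey data.

```python
import pandas as pd

# Hypothetical per-physician summaries of the survey items.
df = pd.DataFrame({
    "patient_item_mean_min": [4.9, 4.7, 4.8],  # lowest mean across the 10 patient items
    "peer_item_mean_min": [8.2, 7.9, 8.1],     # lowest mean across the 5 relevant peer items
})

# Excellent = 4.8+ on every patient item AND above 8.0 on every relevant peer item.
df["excellent"] = (df["patient_item_mean_min"] >= 4.8) & (df["peer_item_mean_min"] > 8.0)
print(df)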

Table 3
Comparison of peer and patient ratings of physicians enrolled in ABIM maintenance of certification: MCQ (multiple-choice questions) and multi-format (mean ratings, standard deviation).

ABIM peer and patient survey items | MCQ (n = 38) | Multi-format (n = 39) | p-Value

Peer ratings (n = 768):
Average number of peers who would recommend the physician to a friend or family | 9.7 (.91) | 9.8 (.49) |
Average number of peers who would not recommend them to a friend or family | 2.2 (1.3) | 0 | N/A
Physician competence in (scale: 1-9):
Respect | 8.1 (.51) | 8.0 (.38) | 0.34
Medical knowledge | 7.9 (.43) | 7.8 (.41) | 0.54
Ambulatory care skills | 8.0 (.42) | 8.0 (.34) | 0.96
Integrity | 8.2 (.50) | 8.2 (.35) | 0.83
Psychosocial aspects of illness | 7.8 (.56) | 7.8 (.43) | 0.89
Management of multiple complex problems | 7.9 (.47) | 7.9 (.35) | 0.66
Compassion | 8.0 (.56) | 8.0 (.47) | 0.84
Responsibility | 8.1 (.46) | 8.1 (.37) | 0.92
Management of hospitalized patients | 8.0 (.44) | 7.8 (.41) | 0.09
Problem-solving | 7.9 (.42) | 7.9 (.37) | 0.60
Average no. of peer raters (range) | 10 (9-10) | 10 (9-10) |

Patient ratings (n = 1910):
Physician competence in (scale: 1-5):
Telling you everything | 4.9 (.12) | 4.9 (.14) | 0.69
Greeting you warmly | 4.9 (.14) | 4.9 (.13) | 0.55
Treating you like you're on the same level | 4.9 (.13) | 4.9 (.13) | 0.62
Letting you tell your story | 4.8 (.15) | 4.8 (.15) | 0.44
Showing interest in you as a person | 4.9 (.14) | 4.9 (.14) | 0.97
Warning you during the physical exam about what he/she is going to do | 4.8 (.14) | 4.8 (.18) | 0.40
Discussing options with you | 4.9 (.15) | 4.8 (.16) | 0.68
Encouraging you to ask questions | 4.7 (.16) | 4.7 (.21) | 0.99
Explaining what you need to know about your problems | 4.8 (.16) | 4.8 (.17) | 0.43
Using words you can understand | 4.8 (.16) | 4.7 (.19) | 0.26
Average no. of patient raters (range) | 25 (24-25) | 25 (24-25) |

Table 4
User evaluation data (n = 77); mean (SD), scale: 1-5.

User evaluation items | MCQ (n = 38) | Multi-format (n = 39)
It was easy to use the multiple-choice items | 4.3 (1.2) | 4.2 (0.8)
The overall test format was innovative | 4.2 (1.2) | 4.2 (0.8)
The quality of the video was excellent | 4.0 (1.3) | 4.2 (0.8)
The overall test format was engaging | 4.0 (1.3) | 4.1 (0.8)
It was easy to use the Review screen | 3.9 (1.3) | 4.1 (0.9)
The Review screen was valuable | 3.7 (1.4) | 3.9 (0.9)
The sorting feature for organizing video clips was helpful | 3.6 (1.3) | 3.7 (1.2)
I recommend this test format for future Maintenance of Certification | 3.1 (1.4) | 3.1 (1.3)
The test format provided a good assessment of communication skill | 3.0 (1.2) | 3.3 (1.2)
It was easy to use the Likert rating format | N/A | 3.9 (1.0)
Typing a rationale for my decisions was a useful testing experience | N/A | 3.4 (1.2)
Comparing my response with an expert's response was helpful | N/A | 3.4 (1.2)
Typing my responses was a useful testing experience | N/A | 3.1 (1.1)

3.4. Evaluation results

Overall, physicians' mean ratings of the test-taking experience were positive (Table 4). When comparing the ratings of the 11 evaluation items common to the MCQ and multi-format versions, we found no statistical differences in mean ratings. For questions unique to the multi-format version, physicians rated the Likert-type scale question format the highest (3.9), but in general, typing their answers or comparing their answers with experts' answers received somewhat neutral ratings (3.1-3.4). No items were rated below 3.0. We also analyzed data on physicians' use of the optional sorter feature. In both test versions, 50% of the subjects made use of the feature throughout the test, while 20% did not use the feature at all.

4. Discussion and conclusion

4.1. Discussion

In this pilot study, we provided preliminary findings related to two online assessment formats targeting physicians' patient-centered communication skills. The acceptable reliability (0.74) of both test formats after removing poor items, as well as the relatively positive ratings of physicians' test-taking experiences, point to the value of ongoing research into developing a rigorous multi-format Web-based communication skill assessment tool. Our initial findings have several implications for future research. What constitutes communication skills that can be validly assessed in an online environment needs further refinement. Although we included open-ended questions as a way to better understand physicians' underlying reasoning, most multiple-choice and Likert-type scale items assessed physicians' recognition of exemplary and poor demonstrations of communication skills. While recognition skills in identifying excellence in portrayed communication are important, their predictive value for a physician's actual competence in handling a wide variety of patient-centered communication scenarios is unknown.

Several limitations should be noted. First, the item writing was limited to samples of physician behaviors portrayed in video materials that were 10 years old. Second, subject recruitment fell short by 23 of the 100 targeted physicians, resulting in insufficient information on test item performance. Third, we identified seven items in the MCQ test and three items in the multi-format version with poor item-total correlations. Unlike a knowledge test that measures conceptual understanding, the communication skill domain is more context dependent and sensitive to both the verbal and non-verbal cues demonstrated by physicians. We speculate that physicians in our study regarded choices other than the pre-determined correct answers as acceptable, based on how they interpreted the context of the provider-patient interactions and the appropriateness of the verbal and non-verbal behaviors portrayed by providers in the video clips. It is for this precise reason that assessing physicians' communication skills in a video-based assessment will continue to pose scoring challenges. Fourth, given the high ratings and little variance in the physicians' communication skills as measured by their peer and patient ratings, we suspect a selection bias in the type of physicians who volunteered for our study. This may partially explain why we failed to generate meaningful correlations between physicians' test scores and peer/patient ratings. Fifth, we did not link our study results to more direct measures of physicians' actual communication skills based on observations of their patient encounters. As a result, our assessment versions cannot serve as a diagnostic tool for identifying physicians at risk due to poor communication skills [13]. Lastly, the different test formats in the two assessment versions made it difficult to directly compare physicians' performance between the versions. The multi-format test included more items than the MCQ test, potentially inflating its relative internal consistency. Applying the Spearman-Brown prophecy formula under an assumption of 30 items in each test version, the internal consistency of the MCQ test improves from 0.74 to 0.89, and that of the multi-format test from 0.74 to 0.83 (see the sketch below).
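These lengthening figures can be verified directly with the Spearman-Brown prophecy formula, rho* = k * rho / (1 + (k - 1) * rho), where k is the ratio of new to old test length:

```python
def spearman_brown(alpha, n_items, target_items):
    """Predicted reliability when a test is lengthened from n_items to
    target_items: rho* = k*rho / (1 + (k - 1)*rho), with k the length ratio."""
    k = target_items / n_items
    return k * alpha / (1 + (k - 1) * alpha)

print(round(spearman_brown(0.74, 11, 30), 2))  # MCQ: 0.89
print(round(spearman_brown(0.74, 17, 30), 2))  # multi-format: 0.83
```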

Several issues remain for future work. First, further exploration is needed of the moderate correlations we found between test scores and both place of birth and place of medical school training among study participants in the multi-format version. We believe the multi-format items, such as the Likert-type scale and free-text questions, captured more nuanced demonstrations of one's communication skills. It is therefore plausible that physicians born and trained in the US and Canada performed better than their international counterparts on this format. Whether test items beyond multiple-choice questions can differentiate native vs. non-native English speakers' communication skills is a question worth pursuing. Second, alternate methods for scoring Likert-type scale items should be explored using the ratings submitted by the experts we recruited for this study. Given the neutral ratings physicians provided regarding their experiences in typing open-ended rationales, the question of how to tap into physicians' reasoning remains unanswered. Third, it may be feasible in a future study to examine physicians' communication skills assessed in an online environment broken down by pre-identified excellent and adequate communicators based on their peer and patient ratings. This may lead to understanding whether Web-based communication assessment validates physicians' real-life communication skills as reported by peers and patients. Fourth, based on the preliminary psychometric evidence established in the present study, a future study can target the concurrent validity of the Web-based communication assessment using external measures such as unannounced standardized patient visits.

Lastly, scoring open-ended responses can be time and labor intensive; we estimate that each of the four team members spent approximately 10 h scoring physicians' open-ended responses. This amounts to a total of 40 h of coding, or approximately an hour of coding per subject in the multi-format test. Mazor et al. [13] asked clinicians to voice record their responses into a microphone when answering the question of what they would say to the patient next. This method proved effective in differentiating physicians with varying degrees of communication skill in real patient settings. Capturing voice recordings may represent a more authentic way to understand a physician's real-life communication style. However, it still begs the question of how physicians' recorded responses can be efficiently scored. One potential solution may be to ask physicians to self-score their open-ended responses against experts' responses as a way to generate quantifiable scores.

4.2. Conclusion

As the ACGME (Accreditation Council for Graduate Medical Education) competency of "interpersonal communication" skills is incorporated into physicians' maintenance of certification requirements, the need for effective communication skill assessment tools remains high. Future development and validation of online assessment tools need to be informed by the best format and approach for establishing validity evidence.

4.3. Practice implications

Our pilot study has demonstrated the feasibility of an online assessment of physicians' communication skills. Web-based assessment of communication skills is a promising area that merits further exploration, especially around how best to score free-text responses. With efficient and reliable scoring, the Web offers an important way to measure and enhance physicians' communication skills.

Acknowledgements

This project was funded by the ABIM Foundation. The authors wish to thank Ms. Kathryn M. Ross at ABIM for her valuable assistance with subject recruitment and other key data collections. The authors would also like to acknowledge Ms. Odawni Palmer and Ms. Carolyn Prouty, both in the Department of Internal Medicine at the University of Washington, for their project management support. The following content experts provided their time and comments in reviewing the online communication assessment tools: Dr. Lynne Robins, Department of Medical Education and Biomedical Informatics, and Dr. Rick Arnold, Department of Internal Medicine, both at the University of Washington. Lastly, the development team members at the University of Washington TIER (Technology Innovations in Education and Research) in the School of Nursing are recognized for their creative and innovative work: Mr. Alan Gojdics, Mr. David Hughes, Mr. David Jones, and Ms. Ashley Bond.

References

[1] Duffy FD, Gordon GH, Whelan G, Cole-Kelly K, Frankel R, Buffone N, et al., Participants in the American Academy on Physician and Patient's Conference on Education and Evaluation of Competence in Communication and Interpersonal Skills. Assessing competence in communication and interpersonal skills: the Kalamazoo II report. Acad Med 2004;79:495-507.


[2] Carney PA, Dietrich AJ, Eliassen MS, Owen M, Badger LW. Recognizing and managing depression in primary care: a standardized patient study. J Fam Pract 1999;48:965-72.
[3] Carney PA, Ward DH. Using unannounced standardized patients to assess the HIV preventive practices of family nurse practitioners and family physicians. Nurse Pract 1998;23:56-8, 63, 67-8.
[4] Chan DK, Gallagher TH, Reznick R, Levinson W. How surgeons disclose medical errors to patients: a study using standardized patients. Surgery 2005;138:851-8.
[5] Van der Vleuten C, Swanson D. Assessment of clinical skills with standardized patients: state of the art. Teach Learn Med 1990;2:58-76.
[6] Wallace P. Following the threads of an innovation: the history of standardized patients in medical education. Caduceus 1997;13:5-28.
[7] Holmboe ES, Sherbino J, Long DM, Swing SR, Frank JR. The role of assessment in competency-based medical education. Med Teach 2010;32:676-82.
[8] Fertleman C, Gibbs J, Eisen S. Video improved role play for teaching communication skills. Med Educ 2005;39:1155-6.
[9] Humphris GM, Kaney S. Assessing the development of communication skills in undergraduate medical students. Med Educ 2001;35:225-31.
[10] Roter DL, Larson S, Shinitzky H, Chernoff R, Serwint JR, Adamo G, et al. Use of an innovative video feedback technique to enhance communication skills training. Med Educ 2004;38:145-57.
[11] Hulsman RL, Ros WJ, Winnubst JA, Bensing JM. The effectiveness of a computer-assisted instruction programme on communication skills of medical specialists in oncology. Med Educ 2002;36:125-34.
[12] Hulsman RL, Mollema ED, Hoos AM, de Haes JC, Donnison-Speijer JD. Assessment of medical communication skills by computer: assessment method and student experiences. Med Educ 2004;38:813-24.
[13] Mazor KM, Haley HL, Sullivan K, Quirk ME. The video-based test of communication skills: description, development, and preliminary findings. Teach Learn Med 2007;19:162-7.
[14] Kim S, Spielberg F, Mauksch L, Farber S, Duong C, Fitch W, et al. Comparing narrative and multiple-choice formats in online communication skill assessment. Med Educ 2009;43:533-41.
[15] Kim S, Brock D, Prouty CD, Odegard PS, Shannon SE, Robins L, et al. A web-based team-oriented medical error communication assessment tool: development, preliminary reliability, validity, and user ratings. Teach Learn Med 2011;23:68-77.
[16] Schuwirth LW, van der Vleuten CP. Different written assessment methods: what can be said about their strengths and weaknesses? Med Educ 2004;38:974-9.
[17] Norman G, Swanson D, Case S. Conceptual and methodology issues in studies comparing assessment formats, issues in comparing item formats. Teach Learn Med 1996;8:208-16.
[18] Ward WC. A comparison of free-response and multiple-choice forms of verbal aptitude tests. Appl Psychol Meas 1982;6:1-11.
[19] Makoul G. Essential elements of communication in medical encounters: the Kalamazoo consensus statement. Acad Med 2001;76:390-3.
[20] Campbell C, Lockyer J, Laidlaw T, Macleod H. Assessment of a matched-pair instrument to examine doctor-patient communication skills in practising doctors. Med Educ 2007;41:123-9.
[21] Lang F, McCord R, Harvill L, Anderson DS. Communication assessment using the common ground instrument: psychometric properties. Fam Med 2004;36:189-98.
[22] Nielsen J. Estimating the number of subjects needed for a thinking aloud test. Int J Hum-Comput Stud 1994;41:385-97.
[23] Nielsen J. Applying discount usability engineering. IEEE Software 1995;12:98-100.
[24] Virzi RA. Refining the test phase of usability evaluation: how many subjects is enough? Hum Factors 1992;34:457-68.
[25] Lipner RS, Blank LL, Leas BF, Fortna GS. The value of patient and peer ratings in recertification. Acad Med 2002;77:S64-6.