Performance-Based Testing to Measure Ophthalmic Skills Using Computer Simulation
Authors
John T. LiVecchi, MD, Assistant Clinical Professor, Drexel University College of Medicine and University of Central Florida College of Medicine; Director of Oculoplastic Surgery, St. Luke's Cataract & Laser Institute
William Ehlers, MD, Associate Professor, University of Connecticut Health Center, University of Connecticut
Lynn Anderson, PhD, Chief Executive Officer, Joint Commission on Allied Health Personnel in Ophthalmology (JCAHPO)
Overview
JCAHPO is a non-profit, non-governmental organization that provides certification of ophthalmic medical assistants and performs other educational and credentialing services. JCAHPO is governed by a Board of Directors composed of representatives from participating ophthalmic organizations and a public member. (April 2011)
The authors have no financial interest in the subject matter of this poster.
Abstract

Purpose
To investigate the validity and reliability of an interactive computer-based simulation, and to test a computer-automated scoring algorithm intended to replace hands-on clinical skill testing with live observers, by assessing ophthalmic technicians' knowledge and performance of clinical skills.
Design
Validity and reliability study of videotaped ophthalmic technicians' performance of computer simulations of 12 clinical skills.
Participants
50 JCAHPO candidates: Certified Ophthalmic Technician (COT®) or Certified Ophthalmic Medical Technologist (COMT®).
Methods
Tests were conducted to evaluate ophthalmic technicians' knowledge and ability to perform 12 ophthalmic skills using high-fidelity computer simulations in July 2003 and again in August 2010. Performance checklists covering technique and task results were developed based on best practices. A scoring rationale was established to evaluate performance using weighted scores and computer-adapted algorithms. Candidate performance was evaluated by a computer-automated scoring system and by expert evaluation of video-computer recordings of the skills tests. Inter-rater reliability of the instruments was investigated by comparing the computer scoring with the ratings of two ophthalmic professional raters on each process step and result. Computer and rater agreement for a given step was required to be statistically significant by Chi-square analysis or to reach 90% agreement or higher.
Results
Of 80 process steps evaluated in seven COT skills, 71% were found to be in agreement (statistically significant by Chi-square or meeting the 90% agreement criterion) and 29% were found to be suspect. Similarly, of 86 process steps evaluated in five COMT skills, 75% were in agreement and 25% were suspect. Given the high degree of agreement between the raters and the computer scoring, inter-rater reliability was judged to be high.
Conclusions
Our results suggest that computer performance scoring is a valid and reliable scoring system. This research found a high level of correspondence between human scoring and computer-automated scoring systems.
Tasks Performed
• Keratometry
• Lensometry
• Tonometry
• Ocular Motility
• Visual Fields
• Retinoscopy
• Refinement
• Versions and Ductions
• Pupil Assessment
• Manual Lensometry with Prism
• Ocular Motility with Prism
• Photography with Fluorescein Angiography
Simulation Design
• Standardized skill checklists were created based on best practices.
• Multiple scenarios were created for each skill and were randomly administered.
• Interactive arrows allow candidates to manipulate simulated equipment.
• Fidelity (realistic & reliable) analysis assessed the degree to which the test simulation required the same behaviors as those required by the task. Necessary fidelity allows a person to:
  - Manipulate the simulation
  - Clearly understand where they are in the performance
  - Demonstrate capability on evaluative criteria
Simulation Test Design Challenges
Important considerations in the development of the simulation scoring included:
• Accurate presentation of the skill through simulation
• Presentation of correct alternative procedures
• Presentation of incorrect alternative procedures:
  1. Not performing the step correctly
  2. Performing the steps out of order
  3. Arriving at the wrong answer even if the correct process is used
• Scoring: differentiating exploration from intentional performance
• Validation of all aspects of the simulation to ensure successful candidate navigation, usability, and fidelity
• Candidate tutorial training to ensure confident interaction with simulated equipment and tasks on the performance test
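One of the failure modes above, performing steps out of order, can be checked mechanically against a set of ordering constraints. The following is a minimal sketch of such a check; the step names and constraint pairs are hypothetical and not taken from the actual JCAHPO checklists.

```python
# Sketch of step-order checking. Step names and the ordering
# constraints below are illustrative assumptions, not JCAHPO data.

# Each (earlier, later) pair must hold whenever both steps occur.
ORDER_CONSTRAINTS = [
    ("focus_eyepiece", "record_reading"),
    ("position_patient", "record_reading"),
]

def order_violations(performed_steps):
    """Return the constraint pairs violated by the recorded step sequence."""
    index = {step: i for i, step in enumerate(performed_steps)}
    violations = []
    for earlier, later in ORDER_CONSTRAINTS:
        if earlier in index and later in index and index[earlier] > index[later]:
            violations.append((earlier, later))
    return violations

print(order_violations(["position_patient", "focus_eyepiece", "record_reading"]))  # []
print(order_violations(["record_reading", "focus_eyepiece"]))
# [('focus_eyepiece', 'record_reading')]
```

Steps not mentioned in any constraint can occur in any order, which mirrors the poster's note that some steps may be completed in any order and still yield a satisfactory process.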
Test Design, Simulation Scoring, and Rating
• Candidate performance was evaluated on technique and results for each of the 12 ophthalmic tasks.
• Procedural checklists were developed for all tasks based on best practices. Subject matter experts, including ophthalmologists and certified ophthalmic technician job incumbents, determined criteria for judging correct completion of each procedural step and whether steps were completed in an acceptable process order. (In some cases, a procedural step could be completed in any order and still yield a satisfactory process.)
• Each step on the performance checklists was analyzed to determine its importance, and a weighted point value was assigned for scoring. These weighted checklists were then used by the raters and the computer for scoring.
• The values ranged from 6 points, for a step considered important but having little impact on satisfactory performance, to 21 points, for a step considered critical to satisfactorily completing the skill. A cut score was established for passing the skill performance.
• Using the computer, candidates were tested on all skills. Candidate performance was scored by the computer, and a video-computer recording was created for evaluation by live rater observation.
• Computer-automated scoring has a high correlation with live rater observation scoring.1,2
• The results were compared to determine the agreement between computer scoring and the scoring of professional raters using the same checklists.
• The accuracy of the skills test results was also evaluated. Each task's results were compared to professional standards for performing the skill for each scenario presented within the simulation.
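The weighted-checklist scoring described above can be sketched as follows. The step names, individual weights, and cut score here are hypothetical; the poster reports only that weights ranged from 6 to 21 points and that a cut score was set for each skill.

```python
# Minimal sketch of weighted checklist scoring, with hypothetical
# step names, weights, and cut score (not the actual JCAHPO values).

CHECKLIST = {                  # step -> weighted point value
    "focus_eyepiece": 13,
    "instruct_patient": 13,
    "position_patient": 6,     # important, little impact on the outcome
    "record_reading": 21,      # critical to completing the skill
}
CUT_SCORE = 40                 # hypothetical passing threshold

def score_performance(completed_steps):
    """Sum the weights of correctly completed steps; pass if >= cut score."""
    total = sum(CHECKLIST[s] for s in completed_steps if s in CHECKLIST)
    return total, total >= CUT_SCORE

total, passed = score_performance(["focus_eyepiece", "instruct_patient", "record_reading"])
print(total, passed)  # 47 True
```

Because both the raters and the computer applied the same weighted checklist, a function like this is the shared scoring rule; only the source of the "completed steps" judgment differs.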
Validity Analysis
• Computer simulation validity measures included content, user, and scoring validity.
• Measurement of the candidate's ability to accurately complete the task was based on performance checklists.
• To ensure that computer scoring and rater scoring were performed on the same candidate performance, each candidate's performance of a computer simulation skill was recorded on video for viewing by the observers.
• The scoring of the simulations was validated by comparing the candidate's scores on each skill with job-incumbent professionals' assessments of the candidate's performance.
• The raters were asked to evaluate whether the candidate performed each step correctly and whether the order of performing the steps was acceptable given the criteria presented in the checklist.
• The computer scoring, based on the criteria specified in the scoring checklists, was compared to ophthalmic professionals' judgments using the same checklists.
Data Analysis
• Test validity was high, with candidate pass rates over 80% on the various individual tasks.
• Candidates were surveyed on their perceptions of the simulation's accuracy in portraying the clinical skills they perform in daily job performance.
• The inter-rater reliability of the instruments was analyzed by comparing the computer scoring of the candidates to the ratings of the two ophthalmic professionals using the same checklist, at a 95% confidence level.
• Scores generated by the computer and scores generated by each rater were entered into a database, as exhibited in Table 1 (Slide 9). A representative sample task (keratometry) is displayed.
• The scores for a test's overall process steps and the accuracy of results were compared.
• The decision rule used to determine the raters' score to be compared with the computer score was as follows:
  - Scores of both raters had to agree with each other for a process step for a given candidate to be included in the analysis.
  - If the two raters did not agree, a third rater evaluated the process for the final analysis.
• Table 2 (Slide 10) indicates representative results for inter-rater reliability for three tasks, with agreement between the computer scoring and the rater scoring.
• Chi-square and percentage-of-agreement analyses were used to determine statistical significance.
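The rater-consensus rule and the percent-of-agreement (Po) check above can be sketched as follows. The Po >= 0.90 threshold and the "Acceptable"/"Suspect" labels follow the poster; the consensus helper and the example event data are illustrative.

```python
# Sketch of the decision rule and percent-of-agreement analysis.
# The 0.90 threshold and labels follow the poster; the example
# scores below are illustrative, not actual study data.

def consensus(rater1, rater2, rater3=None):
    """Two raters must agree; otherwise a third rater's evaluation decides."""
    if rater1 == rater2:
        return rater1
    return rater3

def percent_agreement(computer_scores, rater_scores):
    """Po = events where computer and rater consensus agree / all events."""
    agree = sum(c == r for c, r in zip(computer_scores, rater_scores))
    return agree / len(computer_scores)

def decision(po):
    """Classify a process step by its percent of agreement."""
    return "Acceptable" if po >= 0.90 else "Suspect"

# Example: computer and raters agree on 10 of 11 events for a step.
po = percent_agreement([1] * 10 + [0], [1] * 11)
print(round(po, 3), decision(po))  # 0.909 Acceptable
```

A step could also be classified as acceptable via a statistically significant Chi-square result, the alternative criterion mentioned above; that test is omitted from this sketch.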
Data Comparison of Computer Scoring and Rater Scoring
Test         Process                   |       Candidate 1        |       Candidate 2
                                       | Computer Rater 1 Rater 2 | Computer Rater 1 Rater 2
Keratometry  Focus the eyepiece        |    13      13      13    |    13      13      13
             Instruct patient          |    13      13      13    |    13      13      13
             Total Process Score       |    74      80      80    |    80      80      80
             Total Results             |   Pass    Pass    Pass   |   Pass    Pass    Pass
             Vertical Power Results    |   Fail    Fail    Fail   |   Pass    Pass    Pass
             Vertical Axis Results     |   Pass    Pass    Pass   |   Pass    Pass    Pass
             Horizontal Power Results  |   Pass    Pass    Pass   |   Pass    Pass    Pass
             Horizontal Axis Results   |   Pass    Pass    Pass   |   Pass    Pass    Pass

Table 1
Test             Process                             Decision    Reason    Rater Agree  All Events   Po
Keratometry      Focus eyepiece                      Acceptable  chi2 sig       10          11      1.000
                 Position keratometer                Acceptable  po=1           11          11      1.000
                 Position patient                    Not rated                   0          11      0.000
                 Record the horizontal drum reading  Suspect     po<.9           7          11      0.857
Lensometry       Focus eyepiece                      Suspect     po<.9          10          12      0.800
Ocular Motility  Instruct patient                    Acceptable  chi2 sig       24          24      0.958
                 Cover-Uncover test                  Acceptable  chi2 sig       17          24      0.941

Po = percent of agreement

Table 2: Agreement Between the Computer Scoring and the Rater Scoring
Results

Validity
• 90% of the candidates reported that the COT simulation accurately portrayed the clinical skills they perform for daily job performance.
• 89% of the candidates reported that the COMT simulation accurately portrayed the clinical skills they perform for daily job performance.
• The same scoring checklist was used by both the computer and the raters to judge candidate performance, assuring consistent and objective measurement rather than subjective judgment regarding candidate skills.

Reliability
• Of 80 process steps evaluated in seven COT skills, 71% were found to be in agreement (statistically significant by Chi-square or meeting the 90% agreement criterion) and 29% were found to be suspect.
• Of 86 process steps evaluated in five COMT skills, 75% were in agreement and 25% were suspect.
• Given the high degree of agreement between the raters and the computer scoring, inter-rater reliability was judged to be high.
Discussion and Conclusions

Discussion
Computer simulations are now commonly used for education and entertainment. The key to incorporating new technologies to improve skills assessment is to formally incorporate automated scoring of the individual performance steps identified in a checklist developed by subject matter experts, with each step weighted by importance and, where necessary, by performance of the steps in the correct order. High-fidelity computer simulations, with objective analysis of the correct completion of checklist steps and determination of accurate test results, can provide accurate assessment of ophthalmic technicians' clinical skills.
Conclusion
This comparative analysis demonstrates a high level of correspondence between human scoring and computer-automated scoring systems. Our results suggest that computer performance scoring is a valid and reliable system for assessing the clinical skills of ophthalmic technicians. This research further supports the finding that computer simulation testing improves performance-based assessment by standardizing the examination and reducing observer bias. These findings are useful for evaluating and improving the training and certification of ophthalmic technicians.
References
1. Williamson, D. M., Mislevy, R. J., & Bejar, I. I. (2006). Automated Scoring of Complex Tasks in Computer-Based Testing: An Introduction. Mahwah, NJ: Lawrence Erlbaum Associates.
2. Yang, Y., Buckendahl, C. W., Juszkiewicz, P. J., & Bhola, D. S. (2002). A review of strategies for validating computer automated scoring. Applied Measurement in Education, 15(4), 391.