Performance-Based Testing to Measure Ophthalmic Skills Using Computer Simulation
Authors
John T. LiVecchi, MD, Assistant Clinical Professor, Drexel University College of Medicine and University of Central Florida College of Medicine; Director of Oculoplastic Surgery, St. Luke's Cataract & Laser Institute
William Ehlers, MD, Associate Professor, University of Connecticut Health Center, University of Connecticut
Lynn Anderson, PhD, Chief Executive Officer, Joint Commission on Allied Health Personnel in Ophthalmology (JCAHPO)
Overview
JCAHPO is a non-profit, non-governmental organization that provides certification of ophthalmic medical assistants and performs other educational and credentialing services. JCAHPO is governed by a Board of Directors composed of representatives from participating ophthalmic organizations and a public member. (April 2011)
The authors have no financial interest in the subject matter of this poster.
Abstract

Purpose
To investigate the validity and reliability of an interactive computer-based simulation, and to test a computer-automated scoring algorithm intended to replace hands-on clinical skill testing with live observers, by assessing ophthalmic technicians' knowledge and performance of clinical skills.
Design
Validity and reliability study of videotaped ophthalmic technicians' performance of computer simulations of 12 clinical skills.
Participants
50 JCAHPO candidates: Certified Ophthalmic Technician (COT®) or Certified Ophthalmic Medical Technologist (COMT®).
Methods
Tests were conducted to evaluate ophthalmic technicians' knowledge and ability to perform 12 ophthalmic skills using high-fidelity computer simulations in July 2003 and again in August 2010. Performance checklists covering technique and task results were developed based on best practices. A scoring rationale was established to evaluate performance using weighted scores and computer-adapted algorithms. Candidate performance was evaluated by a computer-automated scoring system and by expert evaluation of video-computer recordings of the skills tests. Inter-rater reliability of the instruments was investigated by comparing the computer scoring with the ratings of two ophthalmic professional raters on each process step and result. Computer and rater agreement for a given step was required to be statistically significant by Chi-square analysis or to reach 90% agreement or higher.
Results
Of 80 process steps evaluated in seven COT skills, 71% were found to be in agreement (statistically significant by Chi-square or meeting the 90% agreement criterion) and 29% were found to be suspect. Similarly, of 86 process steps evaluated in five COMT skills, 75% were in agreement and 25% were suspect. Given the high degree of agreement between the raters and the computer scoring, inter-rater reliability was judged to be high.
Conclusions
Our results suggest that computer performance scoring is a valid and reliable scoring system. This research found a high level of correspondence between human scoring and computer-automated scoring systems.
Tasks Performed
• Keratometry
• Lensometry
• Tonometry
• Ocular Motility
• Visual Fields
• Retinoscopy
• Refinement
• Versions and Ductions
• Pupil Assessment
• Manual Lensometry with Prism
• Ocular Motility with Prism
• Photography with Fluorescein Angiography
Simulation Design
• Standardized skill checklists were created based on best practices.
• Multiple scenarios were created for each skill and were randomly administered.
• Interactive arrows allow candidates to manipulate simulated equipment.
• Fidelity (realistic & reliable) analysis assessed the degree to which the test simulation required the same behaviors as those required by the task. Necessary fidelity allows a person to:
  - Manipulate the simulation
  - Clearly understand where they are in the performance
  - Demonstrate capability on evaluative criteria
Simulation Test Design Challenges
Important considerations in the development of the simulation scoring included:
• Accurate presentation of the skill through simulation
• Presentation of correct alternative procedures
• Presentation of incorrect alternative procedures:
  1. Not performing the step correctly
  2. Performing the steps out of order
  3. Arriving at the wrong answer even if the correct process is used
• Scoring: differentiating exploration from intentional performance
• Validation of all aspects of the simulation to ensure successful candidate navigation, usability, and fidelity
• Candidate tutorial training to ensure confident interaction with simulated equipment and tasks on the performance test
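One of the failure modes above, performing steps out of order, can be checked mechanically against a set of ordering constraints. The following is a minimal sketch of such a check; the step names and constraint pairs are hypothetical and not taken from the actual JCAHPO checklists.

```python
# Sketch of step-order checking. Step names and the ordering
# constraints below are illustrative assumptions, not JCAHPO data.

# Each (earlier, later) pair must hold whenever both steps occur.
ORDER_CONSTRAINTS = [
    ("focus_eyepiece", "record_reading"),
    ("position_patient", "record_reading"),
]

def order_violations(performed_steps):
    """Return the constraint pairs violated by the recorded step sequence."""
    index = {step: i for i, step in enumerate(performed_steps)}
    violations = []
    for earlier, later in ORDER_CONSTRAINTS:
        if earlier in index and later in index and index[earlier] > index[later]:
            violations.append((earlier, later))
    return violations

print(order_violations(["position_patient", "focus_eyepiece", "record_reading"]))  # []
print(order_violations(["record_reading", "focus_eyepiece"]))
# [('focus_eyepiece', 'record_reading')]
```

Steps not mentioned in any constraint can occur in any order, which mirrors the poster's note that some steps may be completed in any order and still yield a satisfactory process.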
Test Design, Simulation Scoring, and Rating
• Candidate performance was evaluated on technique and results for each of the 12 ophthalmic tasks.
• Procedural checklists were developed for all tasks based on best practices. Subject matter experts, including ophthalmologists and certified ophthalmic technician job incumbents, determined criteria for judging correct completion of each procedural step and whether steps were completed in an acceptable process order. (In some cases, a procedural step could be completed in any order and still yield a satisfactory process.)
• Each step on the performance checklists was analyzed to determine its importance, and a weighted point value was assigned for scoring. These weighted checklists were then used by the raters and the computer for scoring.
• The values ranged from 6 points, for a step considered important but having little impact on satisfactory performance, to 21 points, for a step considered critical to satisfactorily completing the skill. A cut score was established for passing the skill performance.
• Using the computer, candidates were tested on all skills. Candidate performance was scored by the computer, and a video-computer recording was created for evaluation by live rater observation.
• Computer-automated scoring has a high correlation with live rater observation scoring.1,2
• The results were compared to determine the agreement between computer scoring and the scoring of professional raters using the same checklists.
• The accuracy of the skills test results was also evaluated. Each task's results were compared to professional standards for performing the skill for each scenario presented within the simulation.
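The weighted-checklist scoring described above can be sketched as follows. The step names, individual weights, and cut score here are hypothetical; the poster reports only that weights ranged from 6 to 21 points and that a cut score was set for each skill.

```python
# Minimal sketch of weighted checklist scoring, with hypothetical
# step names, weights, and cut score (not the actual JCAHPO values).

CHECKLIST = {                  # step -> weighted point value
    "focus_eyepiece": 13,
    "instruct_patient": 13,
    "position_patient": 6,     # important, little impact on the outcome
    "record_reading": 21,      # critical to completing the skill
}
CUT_SCORE = 40                 # hypothetical passing threshold

def score_performance(completed_steps):
    """Sum the weights of correctly completed steps; pass if >= cut score."""
    total = sum(CHECKLIST[s] for s in completed_steps if s in CHECKLIST)
    return total, total >= CUT_SCORE

total, passed = score_performance(["focus_eyepiece", "instruct_patient", "record_reading"])
print(total, passed)  # 47 True
```

Because both the raters and the computer applied the same weighted checklist, a function like this is the shared scoring rule; only the source of the "completed steps" judgment differs.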
Validity Analysis
• Computer simulation validity measures included content, user, and scoring validity.
• Measurement of the candidate's ability to accurately complete the task was based on performance checklists.
• To ensure that computer scoring and rater scoring were performed on the same candidate performance, each candidate's performance of a computer simulation skill was recorded on video for viewing by the observers.
• The scoring of the simulations was validated by comparing the candidate's scores on each skill with job-incumbent professionals' assessments of the candidate's performance.
• The raters were asked to evaluate whether the candidate performed each step correctly and whether the order of performing the steps was acceptable given the criteria presented in the checklist.
• The computer scoring, based on the criteria specified in the scoring checklists, was compared to ophthalmic professionals' judgments using the same checklists.
Data Analysis
• Test validity was high, with candidate pass rates over 80% on the various individual tasks.
• Candidates were surveyed on their perceptions of the simulation's accuracy in portraying the clinical skills they perform in daily job performance.
• The inter-rater reliability of the instruments was analyzed by comparing the computer scoring of the candidates to the ratings of the two ophthalmic professionals using the same checklist, at a 95% confidence level.
• Scores generated by the computer and scores generated by each rater were entered into a database, as exhibited in Table 1 (Slide 9). A representative sample task (keratometry) is displayed.
• The scores for a test's overall process steps and the accuracy of results were compared.
• The decision rule used to determine the raters' score to be compared with the computer score was as follows:
  - Scores of both raters had to agree with each other for a process step for a given candidate to be included in the analysis.
  - If the two raters did not agree, a third rater evaluated the process for the final analysis.
• Table 2 (Slide 10) indicates representative results for inter-rater reliability for three tasks, with agreement between the computer scoring and the rater scoring.
• Chi-square and percentage-of-agreement analyses were used to determine statistical significance.
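The rater-consensus rule and the percent-of-agreement (Po) check above can be sketched as follows. The Po >= 0.90 threshold and the "Acceptable"/"Suspect" labels follow the poster; the consensus helper and the example event data are illustrative.

```python
# Sketch of the decision rule and percent-of-agreement analysis.
# The 0.90 threshold and labels follow the poster; the example
# scores below are illustrative, not actual study data.

def consensus(rater1, rater2, rater3=None):
    """Two raters must agree; otherwise a third rater's evaluation decides."""
    if rater1 == rater2:
        return rater1
    return rater3

def percent_agreement(computer_scores, rater_scores):
    """Po = events where computer and rater consensus agree / all events."""
    agree = sum(c == r for c, r in zip(computer_scores, rater_scores))
    return agree / len(computer_scores)

def decision(po):
    """Classify a process step by its percent of agreement."""
    return "Acceptable" if po >= 0.90 else "Suspect"

# Example: computer and raters agree on 10 of 11 events for a step.
po = percent_agreement([1] * 10 + [0], [1] * 11)
print(round(po, 3), decision(po))  # 0.909 Acceptable
```

A step could also be classified as acceptable via a statistically significant Chi-square result, the alternative criterion mentioned above; that test is omitted from this sketch.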
Data Comparison of Computer Scoring and Rater Scoring
Test         Process                   |       Candidate 1        |       Candidate 2
                                       | Computer Rater 1 Rater 2 | Computer Rater 1 Rater 2
Keratometry  Focus the eyepiece        |    13      13      13    |    13      13      13
             Instruct patient          |    13      13      13    |    13      13      13
             Total Process Score       |    74      80      80    |    80      80      80
             Total Results             |   Pass    Pass    Pass   |   Pass    Pass    Pass
             Vertical Power Results    |   Fail    Fail    Fail   |   Pass    Pass    Pass
             Vertical Axis Results     |   Pass    Pass    Pass   |   Pass    Pass    Pass
             Horizontal Power Results  |   Pass    Pass    Pass   |   Pass    Pass    Pass
             Horizontal Axis Results   |   Pass    Pass    Pass   |   Pass    Pass    Pass

Table 1
Test             Process                             Decision    Reason    Rater Agree  All Events   Po
Keratometry      Focus eyepiece                      Acceptable  chi2 sig       10          11      1.000
                 Position keratometer                Acceptable  po=1           11          11      1.000
                 Position patient                    Not rated                   0          11      0.000
                 Record the horizontal drum reading  Suspect     po<.9           7          11      0.857
Lensometry       Focus eyepiece                      Suspect     po<.9          10          12      0.800
Ocular Motility  Instruct patient                    Acceptable  chi2 sig       24          24      0.958
                 Cover-Uncover test                  Acceptable  chi2 sig       17          24      0.941

Po = percent of agreement

Table 2: Agreement Between the Computer Scoring and the Rater Scoring
Results

Validity
• 90% of the candidates reported that the COT simulation accurately portrayed the clinical skills they perform for daily job performance.
• 89% of the candidates reported that the COMT simulation accurately portrayed the clinical skills they perform for daily job performance.
• The same scoring checklist was used by both the computer and the raters to judge candidate performance, assuring consistent and objective measurement rather than subjective judgment regarding candidate skills.

Reliability
• Of 80 process steps evaluated in seven COT skills, 71% were found to be in agreement (statistically significant by Chi-square or meeting the 90% agreement criterion) and 29% were found to be suspect.
• Of 86 process steps evaluated in five COMT skills, 75% were in agreement and 25% were suspect.
• Given the high degree of agreement between the raters and the computer scoring, inter-rater reliability was judged to be high.
Discussion and Conclusions

Discussion
Computer simulations are now commonly used for education and entertainment. The key to incorporating new technologies to improve skills assessment is to formally incorporate automated scoring of the individual performance steps identified in a checklist developed by subject matter experts, with each step weighted by importance and, where necessary, by performance of the steps in the correct order. High-fidelity computer simulations, with objective analysis of the correct completion of checklist steps and determination of accurate test results, can provide accurate assessment of ophthalmic technicians' clinical skills.
Conclusion
This comparative analysis demonstrates a high level of correspondence between human scoring and computer-automated scoring systems. Our results suggest that computer performance scoring is a valid and reliable system for assessing the clinical skills of ophthalmic technicians. This research further supports the finding that computer simulation testing improves performance-based assessment by standardizing the examination and reducing observer bias. These findings are useful for evaluating and improving the training and certification of ophthalmic technicians.
References
1. Williamson, D. M., Mislevy, R. J., & Bejar, I. I. (2006). Automated Scoring of Complex Tasks in Computer-Based Testing: An Introduction. Mahwah, NJ: Lawrence Erlbaum Associates.
2. Yang, Y., Buckendahl, C. W., Juszkiewicz, P. J., & Bhola, D. S. (2002). A review of strategies for validating computer automated scoring. Applied Measurement in Education, 15(4), 391.