Kendon Conrad, Barth Riley
University of Illinois at Chicago
Michael L. Dennis
Chestnut Health Systems
Overview
Global Appraisal of Individual Needs (GAIN)
Benefits of Computerized Adaptive Testing
CAT: How Does It Work?
Examples of CAT in Clinical Assessment
  Triage of persons around treatment decisions, using start and stop rules
  Content balancing over multiple clinical dimensions
  Identification of persons with atypical symptom presentations
The GAIN
Comprehensive biopsychosocial instrument designed for intake into substance abuse treatment
Provides DSM-IV diagnoses on five axes
Also supports treatment planning, outcome monitoring and program evaluation
Versions range from a 2-5 minute screener and a 20-30 minute quick version to a 1-2 hour full version
Over 103 scales, 1000 created variables, and a text-based narrative report
The biggest problem is how long it takes
The Benefits of Computerized Adaptive Testing
General and Targeted Measures
Generalized
  Heavy response burden
  Lack specificity
Targeted
  Floor and ceiling effects
  Limited content validity
  Don’t “talk with each other.”
Tailoring Outcome Measurement
[Diagram: Instrument A, Instrument B and Instrument C; Item Bank; the CAT selects items from the item bank and administers them.]
Benefits of CAT & Item Banking
CAT
Item Bank
Respondent Burden
Tailoring/ Specificity
Coverage of content domains
Floor and ceiling effects
CAT vs. Short Forms
CAT has been found to be superior to “short forms” of tests, yielding more precise measures.
CAT: What Is It and How Does It Work?
Computerized Adaptive Testing
[Figure: Typical Pattern of Responses. Labels: Middle Difficulty, Increased Difficulty, Decreased Difficulty; Correct, Incorrect; band of +/- 1 Std. Error.]
Score is calculated and the next best item is selected based on item difficulty
Item Selection
There are several methods for selecting items during a CAT.
The most common method is to find the item that provides the most information given the current estimate of the measure.
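A minimal sketch of maximum-information item selection, assuming a dichotomous Rasch model, in which the information an item provides at the current measure is p(1 - p). The item identifiers and difficulties below are hypothetical and do not come from the GAIN item bank.

```python
import math

def rasch_probability(theta, difficulty):
    """Probability of endorsing an item under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def item_information(theta, difficulty):
    """Information a Rasch item provides at measure theta: p * (1 - p)."""
    p = rasch_probability(theta, difficulty)
    return p * (1.0 - p)

def select_next_item(theta, item_bank, administered):
    """Return the id of the unused item with the most information at theta."""
    candidates = [item_id for item_id in item_bank if item_id not in administered]
    if not candidates:
        return None
    return max(candidates, key=lambda item_id: item_information(theta, item_bank[item_id]))

# Hypothetical item bank: item id -> difficulty in logits.
bank = {"item_a": -1.5, "item_b": -0.4, "item_c": 0.3, "item_d": 1.8}
next_item = select_next_item(0.2, bank, administered={"item_b"})
```

Under the Rasch model the most informative item is simply the unused item whose difficulty lies closest to the current measure; other models would weight items differently.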
Item Selection cont.
Item selection can also take into account the types or domains of items to be represented in the CAT session.
Example: items necessary for a DSM-IV diagnosis
Stop Rules
The stop rule, which determines when the item administration process of the CAT ends, can be based on:
  Measurement precision
  Number of items administered
  Test-taking time
  Some combination of the above
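The criteria above can be combined into one check, sketched below. The function name and all three default thresholds are assumptions chosen for illustration; a real CAT would set them from its own precision and burden requirements.

```python
def should_stop(standard_error, items_administered, elapsed_seconds,
                max_se=0.35, max_items=16, max_seconds=600.0):
    """Combined stop rule: end the CAT as soon as any one criterion is met.

    All thresholds are illustrative defaults, not values from a published CAT.
    """
    precise_enough = standard_error <= max_se        # measurement precision
    enough_items = items_administered >= max_items   # number of items administered
    out_of_time = elapsed_seconds >= max_seconds     # test-taking time
    return precise_enough or enough_items or out_of_time
```

Combining the criteria with "or" guarantees that the session ends even when the target precision cannot be reached with the items available.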
Item Bank Size
The more items there are in an item bank, the more likely it is that items that are tailored to an individual’s level on the measured variable will be available.
Typically, item banks consist of hundreds of items.
The number of items will likely depend on:
  The number of constructs or domains being assessed
  Whether one wishes to estimate a measure or classify persons into groups
CAT for Clinical Assessment
The application of CAT to clinical research and assessment raises several new measurement issues:
  Triage of persons around treatment decisions, using start and stop rules
  Content balancing over multiple clinical dimensions
  Identification of persons with atypical presentation of symptoms
Example 1: Triage of Individuals to Support Clinical Decision-Making
Classifying Persons Using CAT
CAT is typically used to estimate a measure.
Few studies have examined the use of CAT to place persons into diagnostic groups.
For placing persons into diagnostic groups, it is desirable to vary the level of measurement precision depending on the category in which the person is placed.
Current CAT procedures do not allow one to vary measurement precision during the CAT session.
Triage of individuals to support clinical decision making
Strategy: Use screener measures to set the value of the initial measure, and use variable stop rules designed to maximize precision and efficiency in identifying persons with low, medium or high symptom severity
Implications: Taking into account the initial location and/or the precision around decision points can further improve the efficiency of assessment without hurting precision for decision making
Clinical Decision Making
To facilitate clinical diagnoses, it would be desirable for a CAT to:
  Classify patients by symptom severity
  Maximize measurement precision within the area of the measure that is most critical for decision making
  Use previously collected information to increase the efficiency of the CAT
Study
We examined the ability of CAT to place persons into low, moderate and high levels of substance abuse and substance dependency.
The Substance Problem Scale (SPS) is a 16-item instrument that measures recency of substance use. Example item: “When was the last time you used alcohol or other drugs weekly?”
Defining Cut Points
Cut points can be established by examining where persons with different levels of severity fall onto the measurement continuum.
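Once cut points are fixed, placing a measure into a severity group is a simple lookup, sketched below. The cut point values (-1.0 and 1.0 logits) are placeholders chosen for illustration and are not the cut points used in the study.

```python
from bisect import bisect_right

# Hypothetical cut points in logits separating low / moderate / high severity.
CUT_POINTS = (-1.0, 1.0)
LABELS = ("low", "moderate", "high")

def classify(measure, cut_points=CUT_POINTS, labels=LABELS):
    """Return the severity label for a measure; a measure exactly on a cut
    point is placed in the higher group."""
    return labels[bisect_right(cut_points, measure)]
```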
The Start Rules
Random: randomly select an item with difficulty calibrations between -0.5 and 0.5 logits (average level of difficulty).
Screener: Select an item that has a difficulty level that most closely approximates the respondent’s measure on a previously administered screener (SDScr).
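The two start rules can be sketched as follows. The item bank, its item ids and the screener measure of 1.2 logits are hypothetical; only the -0.5 to 0.5 logit window comes from the slide.

```python
import random

def random_start(item_bank, low=-0.5, high=0.5, rng=random):
    """Random start rule: pick any item with a difficulty between low and high logits."""
    eligible = [item_id for item_id, d in item_bank.items() if low <= d <= high]
    return rng.choice(eligible) if eligible else None

def screener_start(item_bank, screener_measure):
    """Screener start rule: pick the item whose difficulty is closest to the
    respondent's measure on a previously administered screener."""
    return min(item_bank, key=lambda item_id: abs(item_bank[item_id] - screener_measure))

# Hypothetical item bank: item id -> difficulty in logits.
bank = {"q1": -2.0, "q2": -0.3, "q3": 0.4, "q4": 1.6}
first_random = random_start(bank, rng=random.Random(0))
first_screened = screener_start(bank, screener_measure=1.2)
```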
The Variable Stop Rule
Stop rules for the CAT were defined in terms of maximum standard error of measurement for the low, mid and high range of substance abuse severity.
The mid range stop rule was set to SE=0.35 for all simulations.
Low and High range SE ranged from SE=0.5 to 0.75 logits.
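A sketch of a variable stop rule of this kind. The mid range SE of 0.35 and the outer SE of 0.5 to 0.75 come from the slide; the boundaries of the mid range (-1.0 to 1.0 logits here) and the function names are assumptions for illustration.

```python
def stop_threshold(measure, mid_low=-1.0, mid_high=1.0, mid_se=0.35, outer_se=0.5):
    """Maximum acceptable standard error for the range the measure falls in.

    mid_low and mid_high are hypothetical boundaries of the mid range;
    outer_se can be set anywhere from 0.5 to 0.75 logits.
    """
    return mid_se if mid_low <= measure <= mid_high else outer_se

def variable_stop(measure, standard_error, **range_settings):
    """True when the current SE meets the stop rule for the current range."""
    return standard_error <= stop_threshold(measure, **range_settings)
```

With these settings an SE of 0.4 ends the test for a respondent at 2.0 logits (high range) but not for one at 0.0 logits (mid range), where more items are needed.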
CAT Standard Error
Middle range, where decisions are made and precision is controlled
High and low ranges, where there is little impact on clinical decisions and precision is allowed to vary more
The Item Selection Algorithm
[Flowchart]
1. Start rule using screener
2. Select item
3. Administer item
4. Re-estimate measure and SE
5. Measure in high range? Yes: apply the high range stop rule. No: in mid range? Yes: apply the mid range stop rule. No: apply the low range stop rule.
6. Stop rule met? Yes: end test. No: return to step 2.
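The loop can be sketched end to end as below, assuming a dichotomous Rasch model and Newton-Raphson maximum likelihood estimation. This is an illustration and not the authors' implementation: the item bank, the mid range boundaries, the clamping bound and the simulated respondent are all hypothetical.

```python
import math

def prob(theta, b):
    """Rasch probability of a "yes" to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate(responses, bank, theta, bound=4.0):
    """Newton-Raphson ML estimate of the measure and its SE.

    The estimate is clamped to +/- bound logits (an assumption) so that
    all-"yes" or all-"no" patterns still return a finite value.
    """
    for _ in range(25):
        ps = [prob(theta, bank[i]) for i in responses]
        info = sum(p * (1.0 - p) for p in ps)
        residual = sum(x - p for x, p in zip(responses.values(), ps))
        step = max(-1.0, min(1.0, residual / info))
        theta = max(-bound, min(bound, theta + step))
        if abs(step) < 1e-4:
            break
    info = sum(prob(theta, bank[i]) * (1.0 - prob(theta, bank[i])) for i in responses)
    return theta, 1.0 / math.sqrt(info)

def run_cat(bank, answer, screener_measure, mid=(-1.0, 1.0), mid_se=0.35, outer_se=0.5):
    """Run one CAT session; returns (measure, SE, number of items used)."""
    theta, se, responses = screener_measure, float("inf"), {}
    while len(responses) < len(bank):
        # Select: under the Rasch model the closest unused item is the most informative.
        item = min((i for i in bank if i not in responses),
                   key=lambda i: abs(bank[i] - theta))
        responses[item] = answer(item)                 # administer
        theta, se = estimate(responses, bank, theta)   # re-estimate measure and SE
        # Pick the stop rule for the range the current measure falls in.
        limit = mid_se if mid[0] <= theta <= mid[1] else outer_se
        if se <= limit:
            break
    return theta, se, len(responses)

# Hypothetical 61-item bank, difficulties from -3 to 3 logits.
bank = {f"item_{k}": -3.0 + 0.1 * k for k in range(61)}
# Simulated respondent who says "yes" (1) to every item easier than 0.5 logits.
measure, se, n_items = run_cat(bank, lambda i: int(bank[i] < 0.5), screener_measure=0.0)
```

The session also ends when the bank is exhausted, so the returned SE should be checked against the stop rule before the measure is used for classification.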
Results
Screener starting rule improved efficiency of the CAT by approximately 7 percent compared to standard CAT procedures.
Variable stop rules improved efficiency by 15 to 38 percent, depending on definition of the mid range of severity, compared to standard stopping rules.
Results
Pre-calibration and variable stop rules resulted in accurate and efficient estimation of substance abuse severity.
The screener start rule had only a small effect on classification precision.
Next Step: Refining the Algorithm
Method A: Highest Precision in Medium Range of Severity
  Low: lenient stop rule
  Medium: stringent stop rule
  High: lenient stop rule
Method B: Highest Precision Immediately Around Cut Points
  Low, Medium, High: lenient stop rule within each range
  Stringent stop rule around each of the two cut points
Example 2: Content Balancing over Multiple Dimensions
Measuring Multiple Dimensions
Strategy: Use of content balancing methods in combination with conventional item selection procedures to ensure selection of items from each substantive domain
Implications: Assessment of an individual’s clinical profile can be conducted both efficiently and comprehensively at both the total and subscale level.
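One simple way to combine content balancing with conventional item selection is sketched below: first choose the domain with the fewest items administered so far, then choose the most informative remaining item within that domain. This is only one of several possible balancing schemes and is not necessarily the one used in the study; the item ids and difficulties are hypothetical.

```python
def select_balanced(theta, bank, administered):
    """Pick the next item with simple content balancing.

    bank maps item id -> (domain, difficulty in logits).
    """
    remaining = [i for i in bank if i not in administered]
    if not remaining:
        return None
    # Count the items already administered in each domain.
    counts = {domain: 0 for domain, _ in bank.values()}
    for item_id in administered:
        counts[bank[item_id][0]] += 1
    # Least-covered domain that still has items left (ties broken by name).
    open_domains = {bank[i][0] for i in remaining}
    target = min(open_domains, key=lambda d: (counts[d], d))
    # Within that domain, take the item closest to the current measure,
    # which is the most informative item under the Rasch model.
    pool = [i for i in remaining if bank[i][0] == target]
    return min(pool, key=lambda i: abs(bank[i][1] - theta))

# Hypothetical items; domain names follow the IMDS subscales.
bank = {
    "dep1": ("depression", -0.9),
    "dep2": ("depression", -1.1),
    "anx1": ("anxiety", -0.5),
    "tra1": ("trauma", 0.8),
    "hs1": ("homicidal_suicidal", 1.5),
}
chosen = select_balanced(-1.0, bank, administered={"dep1", "anx1"})
```

Here unbalanced selection would pick "dep2", the item closest to -1.0 logits, while the balanced rule picks an item from a domain that has not yet been covered.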
Internal Mental Distress Scale
The IMDS consists of the following subscales:
  Depression Symptom Scale
  Anxiety/Fear Symptom Scale
  Traumatic Distress Scale
  Homicidal/Suicidal Scale
IMDS also has 4 general somatic items as part of the total scale score.
Clinicians want estimates of overall severity and of severity in each of the subscale areas.
Internal Mental Distress Scale by Content Area
[Figure: IMDS Subscale Item Calibrations. Axis: logits, -3 to 3. Content areas: H/S, Trauma, Anxiety, Depression, Somatic.]
Example: No Content Balancing
[Figure sequence: six slides, each plotting item calibrations on a logit axis from -3 to 3.]

Person Measure  Depression  H/S  Anxiety  Trauma  Slide annotation
-1.38                                             All Screener Items Administered
-1.00           2           1    1        1       Think other people don’t understand you: “Yes”
-0.68           3           1    1        1       Lost interest in things: “Yes”
-0.95           3           1    2        1       Thoughts people taking advantage of me: “No”
-1.19           4           1    2        1       Shyness: “No”
-0.96           4           1    3        1       Have to repeat action over and over: “Yes”
Results
When the CAT was continued to 13 items, no homicidal/suicidal or trauma items were administered during the session, except for the screener items.
Mixed precision on the subscales
No Content Balancing

              CAT Measures
Full Rasch    Depression  Hom./Suicide  Anxiety  Trauma  IMDS*
Depression    .87         .18           .73      .15     .89
Hom/Suicide   .28         .64           .41      .10     .52
Anxiety       .65         .21           .90      .22     .86
Trauma        .31         .10           .43      .60     .57
IMDS          .78         .24           .80      .29     .95
Screener      .66         .34           .78      .26     .91

Low Precision Estimates
IMDS by Content Area
[Figure: IMDS Screener Items. Axis: logits, -3 to 3. Content areas: Suicidal, Trauma, Anxiety, Depression, Somatic.]
M1C2 Suicidal thoughts
M2A Traumatic memories give distress
M1D10 Thoughts should be punished
M1B1 Trapped, lonely, sad, depressed
M1A2 Sleep trouble
Answers N, Y, Y, N, N and estimate overall Rasch severity as IMDS= -0.5 logit
IMDS Subscale Calibrations
[Figure: Axis: logits, -3 to 3. Subscales: Depression, Anxiety, Trauma, Suicidal. Marked: five screener items. Annotation: change in rank order of severity.]
IMDS Subscale Item Calibrations
[Figure: Axis: logits, -3 to 3. Subscales: Depression, Anxiety, Trauma, Suicidal. Marked: five screener items.]
Re-estimating IMDS
[Figure: Axis: logits, -3 to 3. Content areas: Suicidal, Trauma, Anxiety, Depression, Somatic. Marked: revised estimate; five screener items.]
Content Balancing: CAT to Full IMDS

              CAT Measures
Full Rasch    Depression  Hom./Suicide  Anxiety  Trauma  IMDS*
Depression    0.89        0.39          0.72     0.54    0.86
Hom/Suicide   0.37        0.97          0.38     0.36    0.56
Anxiety       0.72        0.43          0.89     0.59    0.86
Trauma        0.53        0.41          0.56     0.91    0.77
IMDS          0.82        0.49          0.78     0.70    0.95
Screener      0.79        0.57          0.77     0.71    0.92

Subscale & Total now have good precision
Example 3: Identifying Persons with Atypical Symptom Presentations
Overview
Strategy: Rasch person fit statistics can identify persons with atypical clinical presentations in a computerized adaptive testing context
Implications: Clients sometimes endorse severe clinical symptoms that are not reflected by overall scores on standard assessments. Using statistics that can identify persons with such an atypical presentation has important clinical implications.
Rasch Fit Statistics
Both infit and outfit follow a chi-square distribution where the high scores are of primary concern
Infit or “Randomness”: More changes between yes/no than would be expected based on overall severity.
  Low: almost too perfect fit
  High: more transitions than expected
Outfit or “Atypicalness”: Focuses more on the tail ends of the group of answers. Used to detect unexpected outlying, off-target responses. Outlier sensitive.
  Low: almost too perfect fit
  High: endorsed high-severity items, but not the precursor (e.g., easier) items
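A sketch of how the two mean-square statistics are commonly computed for one person under the dichotomous Rasch model: outfit is the unweighted mean of the squared standardized residuals, and infit is the sum of squared residuals divided by the sum of the model variances. The difficulties and response patterns below are hypothetical.

```python
import math

def fit_statistics(theta, difficulties, responses):
    """Return (infit, outfit) mean squares for one person's 0/1 responses."""
    ps = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
    variances = [p * (1.0 - p) for p in ps]
    squared_residuals = [(x - p) ** 2 for x, p in zip(responses, ps)]
    # Outfit: mean of squared standardized residuals, so off-target items
    # (small variance) can dominate.
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(ps)
    # Infit: information-weighted, so items near the person's measure dominate.
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit

# Hypothetical five-item test ordered from easiest to hardest (logits).
difficulties = [-3.0, -1.0, 0.0, 1.0, 3.0]
expected_pattern = fit_statistics(0.0, difficulties, [1, 1, 1, 0, 0])
# Same person, but with an unexpected "no" on the easiest item.
outlier_pattern = fit_statistics(0.0, difficulties, [0, 1, 1, 0, 0])
```

A single unexpected response to a far off-target item raises outfit much more than infit, which is why outfit is the more outlier-sensitive of the two.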
Problems with Fit
Responses by Severity
Low             High    Randomness  Atypicalness
111 11111100000 0000    0.3         0.5
111 10101100010 0000    0.6         1.0
111 11101010000 0000    1.0         1.0
111 00001110000 0000    0.9         1.3
011 11111110000 0000    3.8         1.0
111 11111100000 0001    3.8         1.0
101 01010101010 1010    4.0         2.3
000 00000000011 1111    12.6        4.3
Clinical Implications of Misfit
Misfit in the context of clinical assessment can reflect:
  Difficulty understanding the assessment
  Cross-cultural effects
  Differential effects of treatment on some symptoms but not others
Our analyses indicate that there are subgroups who endorse severe symptoms without endorsement of milder symptoms.
Example: atypical suicide profile
Example: Atypical Suicide
Depression is regarded as the major risk factor for suicide.
However, there is a less common profile characterized by suicide-related symptoms in the absence of depressive symptoms.
This profile can be identified through the use of fit statistics (atypicalness).
Response pattern: Depression 000000000000, Suicide 11111
Atypical Suicide
                                                 Atypical Group
Variable                                         Low (N=176)  Medium (N=546)  High (N=471)  Mean/Total %
Past Yr Suicide
  Any Symptoms                                   100.0%       100.0%          100.0%        100.0%
  Suicidal Thoughts                              100.0%       100.0%          99.8%         99.9%
  Got gun to carry out plan**                    17.0%        21.1%           53.5%         33.3%
  Had a plan to commit suicide**                 23.9%        28.0%           68.8%         43.5%
  Attempted Suicide**                            19.9%        25.5%           61.8%         39.0%
Past Yr Depression
  Feeling Tired/No Energy (a)**                  100.0%       79.0%           55.0%         71.0%
  Moving/Talking Slower (b)**                    59.2%        33.7%           27.0%         33.6%
  No Energy/Losing Interest in Friends/Work (c)**  98.1%      72.6%           61.4%         73.6%
Note: (a) Based on a total number of cases = 662; (b) total number of cases = 561; (c) total number of cases = 531
Fit Statistics in CAT
Fit statistics such as infit and outfit become less sensitive to atypical response patterns as the number of items is reduced.
Since CAT usually administers items that the respondent has a 50% probability of endorsing, either a “yes” or a “no” response to a dichotomous question is equally likely, and therefore, consistent with the Rasch model.
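The point can be checked with the squared standardized residual, the quantity that infit and outfit are built from. The sketch below assumes the dichotomous Rasch model; the comparison probability of 0.9 is an arbitrary illustration.

```python
def squared_standardized_residual(response, p):
    """Squared standardized residual of a 0/1 response with endorsement probability p."""
    return (response - p) ** 2 / (p * (1.0 - p))

# On-target item (p = 0.5): "yes" and "no" are equally consistent with the model.
on_target_yes = squared_standardized_residual(1, 0.5)
on_target_no = squared_standardized_residual(0, 0.5)

# Off-target item (p = 0.9): an unexpected "no" stands out, a "yes" does not.
off_target_yes = squared_standardized_residual(1, 0.9)
off_target_no = squared_standardized_residual(0, 0.9)
```

At p = 0.5 both responses give a value of exactly 1, the expected value under the model, so a CAT that keeps items on target gives the fit statistics little to detect.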
Randomness by Number of Items
Number of Items   Randomness Categories
                  < 0.75   0.75-1.33   > 1.33
16                23.6%    58.2%       18.2%
12                28.2%    55.6%       16.2%
8                 35.2%    52.8%       12.0%
4                 51.1%    44.0%       4.9%
Atypicalness by Number of Items
Number of Items   Atypicalness Categories
                  < 0.75   0.75-1.33   > 1.33
16                30.2%    48.1%       21.7%
12                34.3%    51.1%       14.6%
8                 38.4%    53.2%       8.4%
4                 58.2%    40.0%       1.8%
Next Steps: Alternatives to Infit and Outfit
Several measures and procedures for detecting misfit have been developed specifically for use with short tests and/or CAT. These include:
  Adjustment of critical values for fit statistics
  Statistical process control procedures
  Modified t, modified H and modified Z statistics (Dimitrov and Smith, 2006)
Potential of CAT in Clinical Practice
Reduce respondent burden
Reduce staff resources
Reduce data fragmentation
Streamline complex assessment procedures
Assist in clinical decision making
Identify persons with atypical profiles
Future Research
How do we put it all together?
Much of the research in the area of CAT has used computer simulation. There is a need to test working CAT systems in clinical practice.
Contact Information
A copy of this presentation will be at: www.chestnut.org/li/posters
For information on this method and a paper on it, please contact Barth Riley at [email protected]
For information on the GAIN, please contact Michael Dennis at [email protected] or see www.chestnut.org/li/gain