Kendon Conrad Barth Riley University of Illinois at Chicago Michael L. Dennis Chestnut Health Systems


Kendon Conrad and Barth Riley
University of Illinois at Chicago

Michael L. Dennis
Chestnut Health Systems


Overview

Global Appraisal of Individual Needs (GAIN)

Benefits of Computerized Adaptive Testing (CAT)

CAT: How Does It Work?

Examples of CAT in Clinical Assessment:

Triage of persons around treatment decisions using start and stop rules

Content balancing over multiple clinical dimensions

Identification of persons with atypical symptom presentations


The GAIN

Comprehensive biopsychosocial instrument designed for intake into substance abuse treatment.

Provides DSM-IV diagnoses on all 5 axes.

Also supports treatment planning, outcome monitoring, and program evaluation.

Versions range from a 2-5 minute screener and a 20-30 minute quick version to a 1-2 hour full version.

Over 103 scales, 1000 created variables, and a text-based narrative report.

The biggest problem is how long it takes.


The Benefits of Computerized Adaptive Testing


General and Targeted Measures

Generalized measures: heavy response burden; lack specificity.

Targeted measures: floor and ceiling effects; limited content validity; don’t “talk with each other.”


Tailoring Outcome Measurement

[Figure: Instruments A, B, and C feed a common item bank. The CAT selects items from the item bank and administers them one at a time.]


Benefits of CAT & Item Banking

[Figure: grid comparing CAT and Item Bank on respondent burden, tailoring/specificity, coverage of content domains, and floor and ceiling effects.]


CAT vs. Short Forms

CAT has been found to be superior to “short forms” of tests, yielding more precise measures.


CAT: What Is It and How Does It Work?


Computerized Adaptive Testing

[Figure: typical pattern of responses in a CAT. Testing starts at middle difficulty; a correct response leads to an item of increased difficulty, and an incorrect response leads to an item of decreased difficulty. After each response the score is calculated and the next best item is selected based on item difficulty. The estimate is shown with +/- 1 standard error.]


Item Selection

There are several methods for selecting items during a CAT.

The most common method is to find the item that provides the most information given the current estimate of the measure.
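A minimal sketch of maximum-information item selection, assuming a dichotomous Rasch model (the slides use Rasch measures later on, but this slide does not name a model). The item bank, item names, and function names are hypothetical.

```python
import math


def rasch_probability(theta, difficulty):
    """Probability of endorsing an item under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def item_information(theta, difficulty):
    """Fisher information of a Rasch item at theta: p * (1 - p)."""
    p = rasch_probability(theta, difficulty)
    return p * (1.0 - p)


def select_next_item(theta, item_bank, administered):
    """Return the unused item that is most informative at the current estimate."""
    candidates = [name for name in item_bank if name not in administered]
    if not candidates:
        return None
    return max(candidates, key=lambda name: item_information(theta, item_bank[name]))


# Hypothetical item bank: item name -> difficulty calibration in logits.
bank = {"easy": -2.0, "middle": 0.1, "hard": 1.5}
first = select_next_item(0.0, bank, administered=set())
second = select_next_item(0.0, bank, administered={first})
```

Under the Rasch model, information peaks where the item difficulty equals the current estimate, so this rule amounts to picking the unused item closest to the person's current measure.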



Item Selection cont.

Item selection can also take into account the types or domains of items to be represented in the CAT session.

Example: items necessary for a DSM-IV diagnosis.


Stop Rules

The stop rule, which determines when the item administration process of the CAT ends, can be based on:

Measurement precision

Number of items administered

Test-taking time

Some combination of the above
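A combined stop rule can be sketched as a single check. The function name and the default limits below are illustrative assumptions, not values taken from the presentation.

```python
def stop_reason(se, n_items, elapsed_seconds,
                max_se=0.35, max_items=16, max_seconds=600.0):
    """Return why the CAT should stop, or None if testing should continue.

    All three limits are hypothetical defaults chosen for illustration.
    """
    if se <= max_se:
        return "precision"   # standard error is small enough
    if n_items >= max_items:
        return "items"       # item budget used up
    if elapsed_seconds >= max_seconds:
        return "time"        # time budget used up
    return None
```

Checking precision first means a session that reaches the target standard error is always reported as stopping for precision, even if it also hit the item or time limit on the same step.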


Item Bank Size

The more items there are in an item bank, the more likely it is that items that are tailored to an individual’s level on the measured variable will be available.

Typically, item banks consist of hundreds of items.

The number of items will likely depend on:

The number of constructs or domains being assessed.

Whether one wishes to estimate a measure or classify persons into groups.


CAT for Clinical Assessment

The application of CAT to clinical research and assessment raises several new measurement issues:

Triage of persons around treatment decisions using start and stop rules

Content Balancing over multiple clinical dimensions

Identification of persons with atypical presentation of symptoms


Example 1: Triage of Individuals to Support Clinical Decision-Making


Classifying Persons Using CAT

CAT is typically used to estimate a measure.

Few studies have examined the use of CAT to place persons into diagnostic groups.

For placing persons into diagnostic groups, it is desirable to vary the level of measurement precision depending on the category in which the person is placed.

Current CAT procedures do not allow one to vary measurement precision during the CAT session.


Triage of individuals to support clinical decision making

Strategy: Use screener measures to set the value of the initial measure, and use variable stop rules designed to maximize precision and efficiency in identifying persons with low, medium, or high symptom severity.

Implications: Taking into account the initial location and/or the precision around decision points can further improve the efficiency of assessment without hurting precision for decision making.


Clinical Decision Making

To facilitate clinical diagnoses, it would be desirable for a CAT to:

Classify patients by symptom severity.

Maximize measurement precision within the area of the measure that is most critical for decision making.

Use previously collected information to increase the efficiency of the CAT.


Study

We examined the ability of CAT to place persons into low, moderate and high levels of substance abuse and substance dependency.

The Substance Problem Scale (SPS) is a 16-item instrument that measures recency of substance use.

Example item: “When was the last time you used alcohol or other drugs weekly?”


Defining Cut Points

Cut points can be established by examining where persons with different levels of severity fall onto the measurement continuum.


The Start Rules

Random: Randomly select an item with a difficulty calibration between -0.5 and 0.5 logits (average level of difficulty).

Screener: Select an item that has a difficulty level that most closely approximates the respondent’s measure on a previously administered screener (SDScr).
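The two start rules can be sketched as follows. The item bank, item names, and function names are hypothetical; only the -0.5 to 0.5 logit window comes from the slide.

```python
import random


def random_start(item_bank, low=-0.5, high=0.5, rng=None):
    """Random start rule: pick any item of roughly average difficulty."""
    rng = rng or random.Random()
    eligible = [name for name, b in item_bank.items() if low <= b <= high]
    if not eligible:
        raise ValueError("no item calibrated inside the start window")
    return rng.choice(eligible)


def screener_start(item_bank, screener_measure):
    """Screener start rule: pick the item closest to the screener measure."""
    return min(item_bank, key=lambda name: abs(item_bank[name] - screener_measure))


# Hypothetical item bank: item name -> difficulty calibration in logits.
bank = {"a": -1.2, "b": -0.3, "c": 0.4, "d": 1.1}
random_pick = random_start(bank, rng=random.Random(0))
screener_pick = screener_start(bank, screener_measure=0.9)
```

The screener rule starts the session near where the respondent is already believed to be, which is why it can save items relative to starting everyone at average difficulty.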


The Variable Stop Rule

Stop rules for the CAT were defined in terms of maximum standard error of measurement for the low, mid and high range of substance abuse severity.

The mid range stop rule was set to SE=0.35 for all simulations.

The low- and high-range stop rules were varied from SE = 0.5 to SE = 0.75 logits.
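A variable stop rule of this kind might look like the sketch below. The SE limits follow the slide (0.35 in the mid range, 0.5 as one of the outer-range settings); the cut points and function names are hypothetical, since the actual cut points are not given here.

```python
def se_limit(measure, low_cut, high_cut, mid_se=0.35, outer_se=0.5):
    """Maximum allowed standard error for the range the measure falls in."""
    if low_cut <= measure <= high_cut:
        return mid_se      # mid range: decisions are made here, so be strict
    return outer_se        # low or high range: a looser limit is acceptable


def stop_rule_met(measure, se, low_cut, high_cut, mid_se=0.35, outer_se=0.5):
    """True when the current standard error satisfies the range-specific limit."""
    return se <= se_limit(measure, low_cut, high_cut, mid_se, outer_se)


# Hypothetical cut points (logits) separating low, mid, and high severity.
LOW_CUT, HIGH_CUT = -1.0, 1.0
```

With these settings, a respondent estimated at 1.8 logits with SE = 0.45 can stop, while a respondent at 0.2 logits with the same SE keeps answering items.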


CAT Standard Error

[Figure: CAT standard error across the severity continuum. In the middle range, where decisions are made, precision is controlled. In the high and low ranges, where there is little impact on clinical decisions, precision is allowed to vary more.]


The Item Selection Algorithm

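[Flowchart: Start rule using screener. Select item. Administer item. Re-estimate measure and SE. If the measure is in the high range, apply the high-range stop rule; otherwise, if it is in the mid range, apply the mid-range stop rule; otherwise apply the low-range stop rule. If the stop rule is met, end the test; if not, select the next item.]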
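The loop in the flowchart can be sketched end to end. This is not the authors' implementation: it assumes a dichotomous Rasch model, swaps in an expected a posteriori (EAP) estimate on a grid with a normal prior centered on the screener measure, and uses a hypothetical item bank, cut points, and simulated respondent.

```python
import math

GRID = [g / 10.0 for g in range(-40, 41)]  # candidate measures, -4 to 4 logits


def p_yes(theta, b):
    """Rasch probability of a "yes" for measure theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate(responses, prior_mean, prior_sd=1.5):
    """EAP measure and posterior SD (used as the SE) from (difficulty, answer) pairs."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * ((t - prior_mean) / prior_sd) ** 2)
        for b, yes in responses:
            p = p_yes(t, b)
            w *= p if yes else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(bank, answer, screener_measure, low_cut, high_cut,
            mid_se=0.35, outer_se=0.5):
    """Run one CAT session; returns (measure, SE, number of items administered)."""
    theta, se = screener_measure, float("inf")   # start rule using screener
    responses, remaining = [], dict(bank)
    while remaining:
        # Select the unused item closest to the current measure (max information).
        name = min(remaining, key=lambda n: abs(remaining[n] - theta))
        b = remaining.pop(name)
        responses.append((b, answer(name)))       # administer item
        theta, se = estimate(responses, prior_mean=screener_measure)
        # Choose the stop rule for the range the measure currently falls in.
        limit = mid_se if low_cut <= theta <= high_cut else outer_se
        if se <= limit:
            break                                 # stop rule met: end test
    return theta, se, len(responses)


# Hypothetical bank of 61 items spaced 0.1 logits apart, and a simulated
# respondent who endorses every item easier than 2.0 logits.
bank = {f"item{k}": -3.0 + 0.1 * k for k in range(61)}
theta, se, n_items = run_cat(bank, lambda name: bank[name] < 2.0,
                             screener_measure=0.0, low_cut=-1.0, high_cut=1.0)
```

The session ends either when the range-specific SE limit is reached or when the bank is exhausted, so a bank that is too small for the chosen limits shows up as a session that uses every item.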


Results

Screener starting rule improved efficiency of the CAT by approximately 7 percent compared to standard CAT procedures.

Variable stop rules improved efficiency by 15 to 38 percent, depending on definition of the mid range of severity, compared to standard stopping rules.


Results

Pre-calibration and variable stop rules resulted in accurate and efficient estimation of substance abuse severity.

The screener start rule had only a small effect on classification precision.


Next Step: Refining the Algorithm

[Figure: two ways of assigning stop rules across low, medium, and high severity.]

Method A: Highest precision in the medium range of severity (lenient stop rule in the low and high ranges, stringent stop rule in the medium range).

Method B: Highest precision immediately around the cut points (stringent stop rules at the two cut points, lenient stop rules elsewhere).


Example 2: Content Balancing over Multiple Dimensions


Measuring Multiple Dimensions

Strategy: Use of content balancing methods in combination with conventional item selection procedures to ensure selection of items from each substantive domain

Implications: Assessment of an individual’s clinical profile can be conducted both efficiently and comprehensively at both the total and subscale level.
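One simple way to combine content balancing with conventional selection is sketched below. The balancing rule (serve the least-represented domain first) is an assumption for illustration; the slides do not say which content balancing method was used. Item names and calibrations are hypothetical.

```python
def select_balanced_item(theta, bank, administered):
    """Pick the next item from the least-represented domain.

    bank maps item name -> (domain, difficulty in logits). Within the chosen
    domain, the item closest to theta is selected (most informative under
    a Rasch model).
    """
    available = [name for name in bank if name not in administered]
    if not available:
        return None
    counts = {}
    for name in administered:
        domain = bank[name][0]
        counts[domain] = counts.get(domain, 0) + 1
    # Domains that still have unused items, in a fixed order so ties are stable.
    open_domains = sorted({bank[name][0] for name in available})
    target = min(open_domains, key=lambda d: counts.get(d, 0))
    pool = [name for name in available if bank[name][0] == target]
    return min(pool, key=lambda name: abs(bank[name][1] - theta))


# Hypothetical item bank.
bank = {
    "dep1": ("depression", -1.0),
    "dep2": ("depression", -0.8),
    "anx1": ("anxiety", -0.5),
    "tra1": ("trauma", 0.9),
    "hs1": ("homicidal_suicidal", 1.6),
}
pick_1 = select_balanced_item(-1.0, bank, administered={"dep1"})
pick_2 = select_balanced_item(-1.0, bank, administered={"dep1", "anx1"})
```

Without the balancing step, a respondent at -1.0 logits would be given dep2 next, because it is the closest item; with balancing, the under-represented domains are served first even though their items are further from the current measure.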


Internal Mental Distress Scale

The IMDS consists of the following subscales:

Depression Symptom Scale

Anxiety/Fear Symptom Scale

Traumatic Distress Scale

Homicidal/Suicidal Scale

The IMDS also has 4 general somatic items as part of the total scale score.

Clinicians want estimates of overall severity and of severity in each of the subscale areas.


Internal Mental Distress Scale by Content Area

[Figure: IMDS subscale item calibrations, in logits (-3 to 3), by content area: homicidal/suicidal (H/S), trauma, anxiety, depression, somatic.]


Example: No Content Balancing

One CAT session, step by step. The person measure is the estimate (in logits) after the response shown; the subscale columns give the count shown for each subscale at that step.

Step  Item administered                        Response  Person measure  Depression  H/S  Anxiety  Trauma
1     All screener items                                 -1.38
2     Think other people don’t understand you  Yes       -1.00           2           1    1        1
3     Lost interest in things                  Yes       -0.68           3           1    1        1
4     Thoughts people taking advantage of me   No        -0.95           3           1    2        1
5     Shyness                                  No        -1.19           4           1    2        1
6     Have to repeat action over and over      Yes       -0.96           4           1    3        1


Results

Continuing the CAT to 13 items:

Except for screener items, no homicidal/suicidal or trauma items were administered during the CAT session.

Precision on the subscales was mixed.


CAT Measures: No Content Balancing

Rows are full Rasch measures; columns are CAT measures.

Full Rasch    Depression  Hom./Suicide  Anxiety  Trauma  IMDS*
Depression    .87         .18           .73      .15     .89
Hom/Suicide   .28         .64           .41      .10     .52
Anxiety       .65         .21           .90      .22     .86
Trauma        .31         .10           .43      .60     .57
IMDS          .78         .24           .80      .29     .95
Screener      .66         .34           .78      .26     .91

Low Precision Estimates


IMDS by Content Area


IMDS Screener Items

[Figure: IMDS screener items located on the logit scale (-3 to 3) by content area: suicidal, trauma, anxiety, depression, somatic.]

Screener items:

M1C2 Suicidal thoughts

M2A Traumatic memories give distress

M1D10 Thoughts should be punished

M1B1 Trapped, lonely, sad, depressed

M1A2 Sleep trouble

Answers of N, Y, Y, N, N give an estimated overall Rasch severity of IMDS = -0.5 logits.


IMDS Subscale Calibrations

[Figure: IMDS subscale calibrations, in logits (-3 to 3), for the depression, anxiety, trauma, and suicidal subscales. Annotations: five screener items; change in rank order of severity.]


IMDS Subscale Item Calibrations

[Figure: IMDS subscale item calibrations, in logits (-3 to 3), for the depression, anxiety, trauma, and suicidal subscales. Annotation: five screener items.]


Re-estimating IMDS

[Figure: revised IMDS estimate on the logit scale (-3 to 3), shown with the five screener items by content area: suicidal, trauma, anxiety, depression, somatic.]


CAT Measures: Content Balancing, CAT to Full IMDS

Rows are full Rasch measures; columns are CAT measures.

Full Rasch    Depression  Hom./Suicide  Anxiety  Trauma  IMDS*
Depression    0.89        0.39          0.72     0.54    0.86
Hom/Suicide   0.37        0.97          0.38     0.36    0.56
Anxiety       0.72        0.43          0.89     0.59    0.86
Trauma        0.53        0.41          0.56     0.91    0.77
IMDS          0.82        0.49          0.78     0.70    0.95
Screener      0.79        0.57          0.77     0.71    0.92

Subscale & Total now have good precision


Example 3: Identifying Persons with Atypical Symptom Presentations


Overview

Strategy: Rasch person fit statistics can identify persons with atypical clinical presentations in a computerized adaptive testing context

Implications: Clients sometimes endorse severe clinical symptoms that are not reflected by overall scores on standard assessments. Using statistics that can identify persons with such an atypical presentation has important clinical implications.


Rasch Fit Statistics

Both infit and outfit are based on chi-square statistics, and high values are of primary concern.

Infit or “Randomness”: More changes between yes/no than would be expected based on overall severity.
Low: almost too perfect fit
High: more transitions than expected

Outfit or “Atypicalness”: Focuses more on the tail ends (group of answers). Used to detect unexpected outlying, off-target responses. Outlier sensitive.
Low: almost too perfect fit
High: endorsed high severity items, but not the precursor items (e.g., easier items)
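The two statistics can be sketched in their usual mean-square form for dichotomous Rasch data: outfit is the plain average of squared standardized residuals, and infit is the same sum weighted by item information. In this illustration the person measure is held fixed rather than estimated, and the five item difficulties and both response patterns are hypothetical.

```python
import math


def fit_statistics(theta, difficulties, responses):
    """Return (infit, outfit) mean squares for one person's 0/1 responses."""
    squared_residuals, variances = [], []
    for b, x in zip(difficulties, responses):
        p = 1.0 / (1.0 + math.exp(-(theta - b)))   # Rasch expected response
        squared_residuals.append((x - p) ** 2)
        variances.append(p * (1.0 - p))            # model variance of the response
    # Infit: information-weighted, so on-target items dominate.
    infit = sum(squared_residuals) / sum(variances)
    # Outfit: unweighted mean of z^2, so off-target surprises dominate.
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(variances)
    return infit, outfit


# Hypothetical items ordered from low to high severity (logits).
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
# Expected pattern: endorses the milder items only.
typical_infit, typical_outfit = fit_statistics(0.0, difficulties, [1, 1, 1, 0, 0])
# Atypical pattern: endorses the severe items but not the precursor items.
atypical_infit, atypical_outfit = fit_statistics(0.0, difficulties, [0, 0, 0, 1, 1])
```

For the atypical pattern both statistics are well above 1, and outfit is the larger of the two because the surprising answers sit on the items furthest from the person's measure.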


Problems with Fit

Responses by Severity
(Low ... High)          Atypicalness  Randomness
111 11111100000 0000    0.3           0.5
111 10101100010 0000    0.6           1.0
111 11101010000 0000    1.0           1.0
111 00001110000 0000    0.9           1.3
011 11111110000 0000    3.8           1.0
111 11111100000 0001    3.8           1.0
101 01010101010 1010    4.0           2.3
000 00000000011 1111    12.6          4.3


Clinical Implications of Misfit

Misfit in the context of clinical assessment can reflect:

Difficulty understanding the assessment

Cross-cultural effects

Differential effects of treatment on some symptoms but not others

Our analyses indicate that there are subgroups who endorse severe symptoms without endorsing milder symptoms.

Example: atypical suicide profile


Example: Atypical Suicide

Depression is regarded as the major risk factor for suicide.

However, there is a less common profile characterized by suicide-related symptoms in the absence of depressive symptoms.

This profile can be identified through the use of fit statistics (atypicalness).

Response pattern: 000000000000 (depression items), 11111 (suicide items)


Atypical Suicide

Atypical Group

Variable                                           Low (N=176)  Medium (N=546)  High (N=471)  Mean/Total %
Past Yr Suicide
  Any Symptoms                                     100.0%       100.0%          100.0%        100.0%
  Suicidal Thoughts                                100.0%       100.0%          99.8%         99.9%
  Got gun to carry out plan**                      17.0%        21.1%           53.5%         33.3%
  Had a plan to commit suicide**                   23.9%        28.0%           68.8%         43.5%
  Attempted Suicide**                              19.9%        25.5%           61.8%         39.0%
Past Yr Depression
  Feeling Tired/No Energy (a)**                    100.0%       79.0%           55.0%         71.0%
  Moving/Talking Slower (b)**                      59.2%        33.7%           27.0%         33.6%
  No Energy/Losing Interest in Friends/Work (c)**  98.1%        72.6%           61.4%         73.6%

Note: (a) Based on a total number of cases = 662. (b) Total number of cases = 561. (c) Total number of cases = 531.


Fit Statistics in CAT

Fit statistics such as infit and outfit become less sensitive to atypical response patterns as the number of items is reduced.

Since CAT usually administers items that the respondent has a 50% probability of endorsing, either a “yes” or a “no” response to a dichotomous question is equally likely, and therefore, consistent with the Rasch model.
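The point about on-target items can be made concrete with the squared standardized residual that feeds the fit statistics. The notation below (person measure \theta_n, item difficulty b_i, response x_{ni}) is assumed for illustration and follows the usual dichotomous Rasch form.

```latex
% Rasch probability of a "yes" from person n on item i
P_{ni} = \frac{e^{\theta_n - b_i}}{1 + e^{\theta_n - b_i}}

% Squared standardized residual for response x_{ni} \in \{0, 1\}
z_{ni}^2 = \frac{(x_{ni} - P_{ni})^2}{P_{ni}\,(1 - P_{ni})}

% On-target item: P_{ni} = 0.5
x_{ni} = 1:\quad z_{ni}^2 = \frac{(1 - 0.5)^2}{0.25} = 1
\qquad
x_{ni} = 0:\quad z_{ni}^2 = \frac{(0 - 0.5)^2}{0.25} = 1
```

Either answer to a perfectly targeted item contributes exactly 1, the expected value under the model, so a session made up of well-targeted items carries little information about misfit.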


Randomness by Number of Items

                  Randomness Categories
Number of Items   < 0.75   0.75-1.33   > 1.33
16                23.6%    58.2%       18.2%
12                28.2%    55.6%       16.2%
8                 35.2%    52.8%       12.0%
4                 51.1%    44.0%       4.9%


Atypicalness by Number of Items

                  Atypicalness Categories
Number of Items   < 0.75   0.75-1.33   > 1.33
16                30.2%    48.1%       21.7%
12                34.3%    51.1%       14.6%
8                 38.4%    53.2%       8.4%
4                 58.2%    40.0%       1.8%


Next Steps: Alternatives to Infit and Outfit

Several measures/procedures for detecting misfit have been developed specifically for use with short tests and/or CAT. These include:

Adjustment of critical values for fit statistics

Statistical process control procedures

Modified t, modified H, and modified Z statistics (Dimitrov and Smith, 2006).


Potential of CAT in Clinical Practice

Reduce respondent burden

Reduce staff resources

Reduce data fragmentation

Streamline complex assessment procedures

Assist in clinical decision making

Identify persons with atypical profiles


Future Research

How do we put it all together?

Much of the research in the area of CAT has used computer simulation. There is a need to test working CAT systems in clinical practice.


Contact Information

A copy of this presentation will be at: www.chestnut.org/li/posters

For information on this method and a paper on it, please contact Barth Riley at [email protected]

For information on the GAIN, please contact Michael Dennis at [email protected] or see www.chestnut.org/li/gain