
Impact of Expert Data Review and Centralized Scoring on Test … · 2015. 11. 6. · Central review by expert data monitors can improve test-retest reliability of the ADAS-Cog and


Impact of Expert Data Review and Centralized Scoring on Test-Retest Reliability of the ADAS-Cog

Alexandra S. Atkins, Ph.D.¹, Ioan Stroescu, Ph.D.¹, Vicki G. Davis, Dr.PH.¹, Tiffany Williamson, BA¹, Kelley Boyd, BS¹, Lyn Harper Mozley, PhD², Tiffini Voss, MD², Kerry Budd McMahon², and Richard S.E. Keefe, Ph.D.¹,³

¹NeuroCog Trials (Durham, NC), ²Merck & Co (Kenilworth, NJ), ³Duke University (Durham, NC)

METHODS

• Data included 3 pretreatment ADAS-Cog 12 assessments for 155 randomized subjects with a diagnosis of mild-moderate AD. Tests were administered and scored by trained, certified raters at investigational sites. To qualify for rater training and certification, potential raters were required to submit credentials for review and approval. Once approved, raters completed all training and certification activities prior to testing.

• Following each ADAS-Cog 12 administration in the trial, source documents and video media were submitted for central review by expert data monitors. Following review, raters received feedback regarding scoring and administration performance. Raters with numerous or egregious errors were remediated to improve future testing.

• Test-retest reliability of site-reported and central scores was assessed with intraclass correlation coefficients (ICCs) across 2 pretreatment test-retest intervals. Discrepancies between site-reported and central scores were described using the average sum of absolute differences (ASAD).

• The impact of central scoring on rates of missing data was also examined by comparing the rates of missing data for site-reported and central scores.

CONCLUSIONS

Central review by expert data monitors can improve test-retest reliability of the ADAS-Cog and substantially reduce the rate of missing data in clinical trials for mild-moderate AD. These findings suggest that including central data review within a trial can improve the integrity of data collection for cognitive endpoints.

REFERENCES

(1) Connor & Sabbagh (2008). J Alzheimers Dis. 15(3):461–464.
(2) Schafer et al. (2011). Curr Alzheimer Res. 8(4):3736.

DISCLOSURES

AS Atkins is a full-time employee of NeuroCog Trials, Durham, NC, USA, and has received support from the National Institute of Mental Health. I Stroescu, V Davis, T Williamson and K Boyd are employees of NeuroCog Trials. L Harper Mozley, T Voss and K Budd McMahon are employees of Merck & Co, Inc., Kenilworth, NJ, USA.

RSE Keefe currently or in the past 3 years has received investigator-initiated research funding support from the Department of Veterans Affairs, Feinstein Institute for Medical Research, GlaxoSmithKline, National Institute of Mental Health, Novartis, Psychogenics, Research Foundation for Mental Hygiene, Inc., and the Singapore National Medical Research Council. He currently or in the past 3 years has received honoraria, served as a consultant, or advisory board member for Abbvie, Akebia, Amgen, Asubio, AviNeuro/ChemRar, BiolineRx, Biogen Idec, Biomarin, Boehringer-Ingelheim, Eli Lilly, EnVivo/FORUM, GW Pharmaceuticals, Janssen, Lundbeck, Merck, Minerva Neurosciences, Inc., Mitsubishi, Novartis, NY State Office of Mental Health, Otsuka, Pfizer, Reviva, Roche, Sanofi/Aventis, Shire, Sunovion, Takeda, Targacept, and the University of Texas South West Medical Center. Dr. Keefe receives royalties from the BACS testing battery, the MATRICS Battery (BACS Symbol Coding) and the Virtual Reality Functional Capacity Assessment Tool (VRFCAT). He is also a shareholder in NeuroCog Trials and Sengenix.

RESULTS

ADAS-Cog Intraclass Correlation Coefficients

Comparison of Site-Reported and Central Scores ADAS-Cog Total at Baseline*

Average Sum of Absolute Differences (ASAD): For each subtest, the absolute difference between the site-reported score and the central score is calculated. The absolute difference is then summed across all subtests for each rater. Then the average sum of absolute differences is calculated across all raters. The lower the ASAD, the greater the average agreement between raters’ scores and the reference standard. An ASAD of 0 indicates perfect agreement.
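The ASAD definition above can be sketched in a few lines of Python. This is only an illustration of the stated procedure, not the authors' code: the function names and the example scores are hypothetical, and each (site, central) pair is assumed to hold one rater's subtest scores for a single assessment.

```python
from statistics import mean


def sum_abs_diff(site_items, central_items):
    """Sum of absolute subtest differences for one rater's assessment."""
    if len(site_items) != len(central_items):
        raise ValueError("site and central score lists must have equal length")
    return sum(abs(s - c) for s, c in zip(site_items, central_items))


def asad(pairs):
    """Average the per-rater sums over all (site, central) score pairs."""
    return mean(sum_abs_diff(site, central) for site, central in pairs)


# Hypothetical scores: three assessments, four subtests each.
example = [
    ([5, 3, 0, 2], [5, 3, 0, 2]),  # identical scores, sum of differences 0
    ([6, 2, 1, 4], [5, 2, 1, 2]),  # differences 1 and 2, sum 3
    ([4, 4, 2, 0], [4, 1, 2, 0]),  # one difference of 3, sum 3
]
asad_value = asad(example)
print(asad_value)
```

An ASAD of 0 is returned only when every site score matches the central score, which corresponds to the perfect-agreement case described above.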

ICC (N=68)*

                  Screening to Run-In    Run-In to Baseline
                  (~25 days apart)       (~14 days apart)
Site              0.78                   0.80
Central Review    0.83                   0.84

*Sample size is based on the number of assessments with total scores from both the Site and Central Review at all 3 pretreatment visits.

ICC

                  Screening to Run-In    Run-In to Baseline
                  (~25 days apart)       (~14 days apart)
                  (N=85)*                (N=96)*
Site              0.78                   0.75
Central Review    0.83                   0.81

*Sample size is based on the number of assessments with total scores from both the Site and Central Review at the 2 visits being compared.
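For readers who want to reproduce a test-retest ICC of the kind tabulated above, the following is a minimal sketch. The poster does not state which ICC form was used, so the choice here of a two-way random-effects, absolute-agreement, single-measurement ICC (often written ICC(2,1)) is an assumption, as are the function name and the example scores.

```python
def icc_2_1(scores):
    """ICC(2,1) for a table with one row per subject and one column per visit.

    Assumed form: two-way random effects, absolute agreement, single measure.
    """
    n = len(scores)
    k = len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 subjects and 2 visits, no gaps")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, visits, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subject mean square
    msc = ss_cols / (k - 1)               # between-visit mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical total scores at two visits for three subjects.
identical = icc_2_1([[10, 10], [20, 20], [30, 30]])
shifted = icc_2_1([[10, 12], [20, 22], [30, 32]])
print(identical, shifted)
```

Under this form, a constant shift between visits lowers the coefficient slightly even when subjects keep the same rank order, because absolute agreement penalizes systematic differences between visits.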

[Figure: Number of Invalid, Missing, Recovered and Corrected Data Points by ADAS-Cog Item*. Series: Invalid, Missing, Recovered, Corrected. Count axis: 0 to 40.]

• ICCs for both test-retest intervals were improved for central scores compared to site-reported scores:

– Using all available data (Panel A), Screening to Run-In ICCs were .78 for site-reported scores and improved to .83 for central scores.

– Run-In to Baseline ICCs were .75 for site-reported scores and improved to .81 for central scores.

• Missing or invalid data were present in 28% (44/155) of site-reported baseline assessments and impacted the total score in 23% (35/155) of tests. The baseline ASAD for the remaining 120 assessments was 2.2 (SD=3.55).

• Total ADAS-Cog scores were recovered by central reviewers for all but 1 assessment. An additional 8 assessments were deemed invalid due to egregious errors in administration detected during central review, resulting in a final missing/invalid data rate of 6% (9/155).
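The missing-data percentages reported above follow directly from the stated counts. The short check below recomputes them; all counts come from the poster, while the variable names are illustrative and the final count is assumed to be the 1 unrecovered assessment plus the 8 assessments invalidated during central review, as the reported 9/155 implies.

```python
N_BASELINE = 155  # baseline assessments reported on the poster


def pct(count, total=N_BASELINE):
    """Return count/total as a whole-number percentage."""
    return round(100 * count / total)


site_missing_or_invalid = 44   # site-reported assessments with missing/invalid data
site_total_affected = 35       # assessments whose total score was impacted
unrecovered_after_review = 1   # total score not recovered by central review
invalid_after_review = 8       # invalidated for egregious administration errors

site_rate = pct(site_missing_or_invalid)
affected_rate = pct(site_total_affected)
asad_sample = N_BASELINE - site_total_affected
final_missing = unrecovered_after_review + invalid_after_review
final_rate = pct(final_missing)

print(site_rate, affected_rate, asad_sample, final_missing, final_rate)
```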

BACKGROUND

The Alzheimer's Disease Assessment Scale Cognitive Subscale (ADAS-Cog) is the primary neurocognitive outcome measure in many clinical trials for mild-moderate Alzheimer’s disease (AD). Despite wide use, variations in administration and scoring are well-documented (1), and may reduce reliability and sensitivity of the measure (2).

Clinical trials devote considerable effort and resources to rater training and quality assurance of ADAS-Cog endpoints. To examine the efficacy of these efforts in improving ADAS-Cog reliability, we evaluated the impact of central data review on ADAS-Cog test-retest reliability in an ongoing, multicenter, placebo-controlled treatment trial (NCT01852110).

CONTACT INFORMATION

For more information, please contact Dr. Atkins: [email protected], (919) 401-4642, www.neurocogtrials.com

[Figure: ADAS-Cog Item Discrepancies at Baseline (N=155). x-axis: ADAS-Cog Item; y-axis: Score Discrepancies (0% to 30%). Bar labels as extracted, without item names: 25%, 23%, 18%, 18%, 14%, 14%, 13%, 13%, 12%, 12%, 10%, 8%, 7%, 6%, 6%, 6%, 6%, 6%, 6%, 5%, 5%, 4%, 3%, 2%, 1%, 1%, 1%.]

[Figure: Histograms, y-axis: Frequency. Panels: Rater – ADAS-Cog Total Score (N=123); Central Review – ADAS-Cog Total Score (N=146); Absolute Difference – ADAS-Cog Total Score (N=119); Sum of Absolute Difference – ADAS-Cog Items (N=110).]