

Introduction
o Daily diaries are used in clinical trials and observational studies to record patients' health-related quality of life (HRQOL) over a time-limited recall period (eg, 24 hours) across multiple days (1)
o They are useful for symptomatic conditions, such as pain, that vary from day to day
o They are applicable when recall may be affected by day-to-day state (eg, mood or health state)
o The burden of daily reporting can lead to missing data and biased interpretation
o Typically, methodologists set a minimum number of non-missing responses (days) needed to calculate an average score (eg, 4-5 days to generate a weekly average)
o Longitudinal study factors may influence the amount of missing data: 1) patients with more severe symptoms may be more likely to miss diary entries (Table 1), and 2) missing data tend to increase as a trial progresses (2,3)
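A minimal sketch of the minimum-days rule described above. The function name `weekly_mean`, the use of `None` for a missed diary entry and the default threshold of 4 days are illustrative assumptions, not a scoring rule taken from this poster.

```python
from statistics import mean
from typing import Optional, Sequence


def weekly_mean(daily_scores: Sequence[Optional[float]], min_days: int = 4) -> Optional[float]:
    """Average the completed daily scores; None marks a missed diary entry.

    Returns None (no weekly score) when fewer than `min_days` days were completed.
    """
    observed = [s for s in daily_scores if s is not None]
    if len(observed) < min_days:
        return None
    return mean(observed)


week = [6, 7, None, 5, None, 8, 6]     # 5 of 7 diary days completed
print(weekly_mean(week, min_days=4))   # 6.4
print(weekly_mean(week, min_days=6))   # None
```

Raising `min_days` trades completeness of the sample for stability of each participant's average, which is the trade-off the rest of this study examines.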

Griffiths P¹, Floden E², Doll H¹, Morris M², Hudgens S²

¹Clinical Outcomes Solutions, Folkestone, UK; ²Clinical Outcomes Solutions, Tucson, AZ, USA.

Results
o Reliability bias increased as the proportion of missing data increased
o Bias was higher for MNAR than for MAR
o When participants had up to 4 days of missing data:
   o MNAR or MAR: reliability estimates were less biased than complete case analysis (CCA)
o When participants had 5 days of missing data:
   o MAR: reliability estimates were comparable to CCA
   o MNAR: reliability estimates were more biased than CCA
o For loss of power relative to the fully observed data, missing data analysis (MDA) under MNAR and MAR mostly performed better than CCA (Table 2)
o Over 90% of MDA samples reached the critical threshold (≥0.70) with 4 days missing
o Lower 95% CI bounds were mostly <0.70 for all missingness scenarios with >1 day missing

Conclusions
• For a simulation based on real clinical trial pain scores (ICC approx. 0.8), reliability was either comparable to or less biased than complete case analysis when weekly averages were calculated from all available data, irrespective of how many days were present
• When at least 3 days of data are present, using all available data to create participant mean scores gives more reliable estimates
• As few as two days of data could be justified in the right context
   o For example, when only 10% or 20% of the population have missing data
• When 50% of the sample had missing data, most MDA point estimates achieved acceptable reliability (≥0.70), but the lower 95% CI of the ICC was poor (<0.70)
   o This was an extreme case (50% of the population with missing data)

Implications
• Calculating a weekly average pain score from all available data maintains power compared with complete case analysis, without adversely affecting instrument reliability
• Using all available patient data is more in line with intention-to-treat principles

Psychometric Properties in the Face of Missing Data
A Simulation Study Assessing the Effect of Missing Data on Test-Retest Reliability in Diary Studies

Methods
Simulation of test-retest data
o A correlation matrix was designed from real trial data of pain recorded on an 11-point NRS
o 1000 datasets of N = 100
o Time 1 data: each “participant” had 7 days of integer pain scores on a 0–10 scale
o Time 2 data: a second timepoint was simulated by adding a random value (mean = 1, SD = 1) to each day of each participant's score
o All scores were rounded to the nearest integer
o Mean (SD) ICC over 1000 datasets = 0.82 (0.024)
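The simulation steps above can be sketched as follows. The poster does not report its trial-derived correlation matrix, so this sketch assumes a hypothetical exchangeable structure (every pair of days correlated at `rho` = 0.7) with a hypothetical day-level mean of 5 and SD of 2; clipping scores to the 0–10 range is also an assumption. Only the Time 2 step (add a draw with mean 1 and SD 1 to each day, then round) follows the description directly.

```python
import math
import random
from typing import List, Tuple


def to_nrs(x: float) -> int:
    """Round to the nearest integer and clip to the 0-10 NRS range (clipping is an assumption)."""
    return min(10, max(0, round(x)))


def simulate_participant(rng: random.Random, rho: float = 0.7, mu: float = 5.0,
                         sd: float = 2.0, n_days: int = 7) -> Tuple[List[int], List[int]]:
    """Return (time1, time2): n_days integer pain scores at each timepoint.

    Days share a participant-level factor, giving an exchangeable correlation of
    `rho` between any two days before rounding. rho, mu and sd are hypothetical
    stand-ins for the trial-derived correlation matrix.
    """
    shared = rng.gauss(0.0, 1.0)
    time1 = [to_nrs(mu + sd * (math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)))
             for _ in range(n_days)]
    # Time 2: add an independent draw with mean 1 and SD 1 to each day's score, then round.
    time2 = [to_nrs(x + rng.gauss(1.0, 1.0)) for x in time1]
    return time1, time2


def simulate_dataset(n: int = 100, seed: int = 0, **kwargs) -> List[Tuple[List[int], List[int]]]:
    """One simulated dataset of n participants; the poster used 1000 datasets of N = 100."""
    rng = random.Random(seed)
    return [simulate_participant(rng, **kwargs) for _ in range(n)]


data = simulate_dataset(n=100, seed=1)
print(len(data), len(data[0][0]), len(data[0][1]))   # 100 7 7
```

Because the Time 2 shift has a non-zero mean, an absolute-agreement ICC will be somewhat lower than a consistency ICC on the same data.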

Missingness
o Missingness was created in the dataset under MCAR, MAR and MNAR mechanisms (described in Table 1)
o 10%, 20%, 30%, 40% and 50% of the sample were assigned missing data according to the missingness mechanism
o 1, 2, 3, 4 and 5 days of data were deleted according to the missingness mechanism
o Deleting all 7 days (equivalent to complete case analysis) was also used as a comparator
o Thus, a new dataset was created for each scenario (eg, 10% of the population missing 1 day, 10% missing 2 days, etc.)
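A sketch of the scenario grid and of MCAR deletion as described in Table 1 (participants and days chosen with equal probability). The names `apply_mcar`, `PROPORTIONS` and `DAYS_DELETED` are hypothetical, and the sketch deletes days from a single week of diary data; the poster does not say whether deletion was applied at Time 1, Time 2 or both.

```python
import random
from typing import List, Optional

Week = List[Optional[int]]   # 7 daily scores; None marks a deleted day


def apply_mcar(weeks: List[Week], prop: float, n_days: int, rng: random.Random) -> List[Week]:
    """Delete n_days diary days for a randomly chosen fraction `prop` of participants.

    Participants are picked with equal probability and, within each chosen week,
    every day is equally likely to be deleted (MCAR). n_days=7 removes the whole
    week, which is equivalent to excluding those participants as a complete case
    analysis would.
    """
    out = [list(w) for w in weeks]                      # copy: keep the fully observed data intact
    n_affected = round(prop * len(out))
    for i in rng.sample(range(len(out)), n_affected):
        for d in rng.sample(range(len(out[i])), n_days):
            out[i][d] = None
    return out


# Scenario grid: 5 proportions x (1-5 days deleted, plus the 7-day CCA comparator)
PROPORTIONS = [0.10, 0.20, 0.30, 0.40, 0.50]
DAYS_DELETED = [1, 2, 3, 4, 5, 7]
SCENARIOS = [(p, k) for p in PROPORTIONS for k in DAYS_DELETED]

rng = random.Random(0)
weeks = [[5] * 7 for _ in range(100)]
missing = apply_mcar(weeks, prop=0.2, n_days=3, rng=rng)
print(len(SCENARIOS), sum(w.count(None) == 3 for w in missing))   # 30 20
```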

Testing reliability in weekly mean scores
o A reliability index (the intraclass correlation coefficient [ICC], Box 1) and its associated confidence interval (CI) were averaged over all samples for the fully observed data and for all missingness scenarios:
   1. For each missing data scenario, the difference between the fully observed data ICC and the missing data scenario ICC was calculated
   2. Equivalence of the ICC between the fully observed data and each missingness condition was tested at margins of 0.05 and 0.10 ICC units (Figure 1)
   3. The number of samples achieving a reliability estimate of ≥0.70 was counted (Table 2)
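Steps 1–3 can be summarised per scenario as below, assuming a per-sample ICC has already been computed for the fully observed data and for the matching missing-data dataset. The poster does not state which equivalence procedure it used; purely as an assumption, this sketch declares equivalence when the central 90% of sample-level ICC differences (5th to 95th percentile) lies within ±margin. The function name `summarise_scenario` is hypothetical.

```python
from statistics import mean, quantiles
from typing import Dict, Sequence, Tuple


def summarise_scenario(full_iccs: Sequence[float], scenario_iccs: Sequence[float],
                       margins: Tuple[float, ...] = (0.05, 0.10),
                       threshold: float = 0.70) -> Dict[str, object]:
    """Compare paired per-sample ICCs (fully observed vs one missingness scenario)."""
    # Step 1: per-sample difference, fully observed ICC minus scenario ICC
    diffs = [f - s for f, s in zip(full_iccs, scenario_iccs)]
    # Step 2 (assumed criterion): equivalent if the central 90% of differences lies within +/- margin
    cuts = quantiles(diffs, n=20)          # cut points at the 5th, 10th, ..., 95th percentiles
    lo, hi = cuts[0], cuts[-1]
    equivalent = {m: -m < lo and hi < m for m in margins}
    # Step 3: number of samples whose ICC reaches the reliability threshold
    n_reaching = sum(s >= threshold for s in scenario_iccs)
    return {"mean_diff": mean(diffs), "equivalent": equivalent, "n_reaching": n_reaching}


full = [0.82 + 0.01 * (i % 5 - 2) for i in range(100)]   # toy ICCs around 0.82
scen = [f - 0.07 for f in full]                          # toy scenario ICCs, 0.07 lower
summary = summarise_scenario(full, scen)
print(summary["equivalent"])   # {0.05: False, 0.1: True}
```

A confidence-interval-based test on the mean difference would be a stricter alternative; which one matches the poster's analysis is not stated.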

Background
o Previously, we assessed the bias of mean diary scores in the presence of missing observations (4):
   o Complete case analysis (CCA) had larger bias than missing data analysis:
      o For more than 50% of CCA samples, the estimate bias was larger than 0.50 standard deviations (SD) of the true mean
   o Missing data analysis (MDA), using estimates derived from all available records, led to less bias:
      o For all MDA samples, the estimate bias was smaller than 0.20 SDs of the true mean
   o The false positive rate (Type 1 error) for group comparisons was controlled for both CCA and MDA, regardless of the number of missing diary days
   o Compared with fully observed data, power to detect a difference between group mean scores (1 - Type 2 error) was reduced for CCA (>15% lower) but not for MDA (all mechanisms: <2% lower)
o Therefore, excluding patients with missing data:
   o reduces power compared with using all available data, and
   o leads to larger biases in score estimates
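To make the CCA versus MDA comparison concrete, the sketch below computes the bias of each group mean for one dataset, expressed in SD units. Taking the SD of the fully observed participant means as the scaling SD is an assumption, as is the function name `standardised_bias`. The toy data are deliberately extreme (each participant reports the same score every day), which is why the MDA bias comes out exactly zero; they show the direction of the effect, not the magnitudes reported in (4).

```python
from statistics import mean, stdev
from typing import Optional, Sequence, Tuple


def standardised_bias(full_weeks: Sequence[Sequence[float]],
                      observed_weeks: Sequence[Sequence[Optional[float]]]) -> Tuple[float, float]:
    """Bias of the CCA and MDA group means, in SD units of the fully observed participant means."""
    true_means = [mean(w) for w in full_weeks]
    true_mean, true_sd = mean(true_means), stdev(true_means)
    # CCA: only participants with every day recorded contribute.
    cca = [mean(w) for w in observed_weeks if None not in w]
    # MDA: each participant contributes the mean of the days they did record.
    mda = [mean([x for x in w if x is not None]) for w in observed_weeks
           if any(x is not None for x in w)]
    return (mean(cca) - true_mean) / true_sd, (mean(mda) - true_mean) / true_sd


full = [[2] * 7, [4] * 7, [6] * 7, [8] * 7]
observed = [[2] * 7, [4] * 7, [6] * 7, [8, 8, 8, 8, None, None, None]]   # most severe participant misses 3 days
cca_bias, mda_bias = standardised_bias(full, observed)
print(round(cca_bias, 2), round(mda_bias, 2))   # -0.39 0.0
```

Dropping the one severe participant pulls the CCA mean downwards, which is the kind of bias the Background describes when more severe patients are more likely to miss entries.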

Mechanism | Description | Simulation technique
Missing Completely at Random (MCAR) | Not related to health status (eg, forgot diary) | 1. Participants chosen at random to receive missing data; 2. Days selected for deletion had equal weighting
Missing at Random (MAR) | Related to other (observed) data (eg, health status gets worse immediately prior to the missing event) | 1. Participants chosen to receive missing data weighted by severity; 2. Days selected for deletion weighted by the next most recent score
Missing Not at Random (MNAR) | Related to the unobserved (ie, missing) variable (eg, missing due to severity: patient too ill to complete) | 1. Participants chosen to receive missing data weighted by severity; 2. Days selected for deletion weighted by the severity of each day's score

Table 1 Missingness Mechanisms
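The MAR and MNAR techniques in Table 1 can be sketched with weighted sampling without replacement. Several details are assumptions, not taken from the poster: participant severity is the mean weekly score, weights are score + 1 so that a score of 0 can still be chosen, and "the next most recent score" is read as the preceding day's score, with day 1 (which has no preceding entry) given weight 1. The helper names `weighted_sample` and `apply_weighted_missingness` are hypothetical.

```python
import random
from typing import List, Optional, Sequence

Week = List[Optional[int]]


def weighted_sample(rng: random.Random, n: int, weights: Sequence[float], k: int) -> List[int]:
    """Draw k distinct indices from range(n); each draw is proportional to the remaining weights."""
    pool, w = list(range(n)), list(weights)
    chosen = []
    for _ in range(k):
        j = rng.choices(range(len(pool)), weights=w)[0]
        chosen.append(pool.pop(j))
        w.pop(j)
    return chosen


def apply_weighted_missingness(weeks: Sequence[Sequence[int]], prop: float, n_days: int,
                               mechanism: str, rng: random.Random) -> List[Week]:
    """Delete n_days per selected participant under 'MAR' or 'MNAR' (Table 1 techniques)."""
    if mechanism not in ("MAR", "MNAR"):
        raise ValueError(f"unknown mechanism: {mechanism}")
    out: List[Week] = [list(w) for w in weeks]
    # Step 1 (MAR and MNAR): participants weighted by severity, taken here as the mean weekly score.
    severity = [sum(w) / len(w) + 1 for w in weeks]
    for i in weighted_sample(rng, len(weeks), severity, round(prop * len(weeks))):
        week = weeks[i]
        if mechanism == "MNAR":
            # Step 2, MNAR: weight by the day's own score, which then goes missing.
            day_w = [s + 1 for s in week]
        else:
            # Step 2, MAR: weight by the preceding day's (observed) score; day 1 gets weight 1.
            day_w = [1] + [s + 1 for s in week[:-1]]
        for d in weighted_sample(rng, len(week), day_w, n_days):
            out[i][d] = None
    return out


rng = random.Random(42)
weeks = [[0] * 7 for _ in range(500)] + [[10] * 7 for _ in range(500)]
mnar = apply_weighted_missingness(weeks, prop=0.1, n_days=3, mechanism="MNAR", rng=rng)
severe_hit = sum(1 for i, w in enumerate(mnar) if None in w and weeks[i][0] == 10)
print(severe_hit)   # expected to be well above 50 of the 100 affected participants
```

The only difference between the two mechanisms in this sketch is whether a day's deletion weight comes from its own (then unobserved) score or from the observed day before it.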

Objectives
o To understand the impact of the type of missingness (MCAR, MAR and MNAR) on reliability in a simulated test-retest study using a pain numeric rating scale (NRS). Specifically, we assess:
   1. The bias of reliability estimates for complete case analysis versus missing data analysis (with missingness assigned under three mechanisms)
   2. The loss of power of reliability estimates relative to the fully observed data, for both complete case analysis and missing data analysis

Figure 2 ICCs of 100 Individual Samples Compared with Fully Observed Data and MCAR
[Figure panels: Missing Not at Random; Missing at Random. Rows: 2 Days Missing, 4 Days Missing, 5 Days Missing]

Figure 3 Complete Case Analysis

Days missing | MAR estimate | MAR lower CI | MNAR estimate | MNAR lower CI
Fully observed data | 100% | 88.4% | 100% | 88.4%
1 day | 100% | 81.2% | 100% | 82.5%
2 days | 100% | 69.1% | 99.9% | 68.3%
3 days | 99.6% | 50.4% | 99.4% | 42.3%
4 days | 97.1% | 27.0% | 92.9% | 12.5%
5 days | 88.3% | 11.6% | 52.8% | 1.1%
7 days (complete case analysis) | 57.6% | 7.5% | 57.6% | 7.5%

Table 2 Proportion of Samples Achieving ≥0.70 Reliability (50% Missing Scenario)

Waterfall plots showing reliability (ICC):
• BLUE LINE: fully observed data
• RED LINE: MCAR samples
• BARS: individual samples

Data show that reliability estimates for MAR and MNAR data with 4 days missing are less biased than CCA, and that MAR data with 5 days missing are similar to CCA

Poster presented at the International Society for Quality of Life Research 25th Annual Conference, 24th–27th October 2018, Dublin, Ireland

References:
(1) Coons SJ, Eremenco S, Lundy JJ, O’Donohoe P, O’Gorman H, & Malizia W (2015). Capturing patient-reported outcome (PRO) data electronically: the past, present, and promise of ePRO measurement in clinical trials. The Patient - Patient-Centered Outcomes Research, 8(4), 301-309.
(2) Fairclough DL (2010). Design and analysis of quality of life studies in clinical trials. Chapman and Hall/CRC.
(3) Seitz C, Lanius V, Lippert S, Gerlinger C, Haberland C, Oehmke F, & Tinneberg HR (2018). Patterns of missing data in the use of the endometriosis symptom diary. BMC Women's Health, 18(1), 88.
(4) Griffiths P, Floden E, & Hudgens S (2017). Scoring and Interpretation of Daily Diary Data in the Presence of Non-ignorable Missing Data. International Society for Quality of Life Research.
(5) Bell ML, & Fairclough DL (2014). Practical and statistical issues in missing data for longitudinal patient-reported outcomes. Statistical Methods in Medical Research, 23(5), 440-459.
(6) McGraw KO, & Wong SP (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1(1), 30.

Figure 1 Mean ICC Differences from Fully Observed Data
[Figure panels: Missing at Random; Missing Not at Random. Y axis: average ICC difference from fully observed data]

X Axis – Samples ordered by reliability (ICC) from lowest to highest

Box 1 ICC(2,1) Formula for Agreement (6)

$$\mathrm{ICC} = \frac{\sigma_B^2}{\sigma_B^2 + \sigma_W^2 + \sigma_e^2}$$

where $\sigma_B^2$ = between-subjects variance, $\sigma_W^2$ = within-subjects variance and $\sigma_e^2$ = error variance
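A sketch of ICC(2,1) computed from two-way ANOVA mean squares, with the variance components mapped onto Box 1. For test-retest data, `scores` would be an n × 2 matrix of Time 1 and Time 2 weekly means. Reading $\sigma_W^2$ as the variance between measurement occasions (the column component of McGraw and Wong's two-way model) is an assumption about the Box 1 notation; negative component estimates are not truncated at zero. The function name `icc_2_1` is hypothetical.

```python
from statistics import mean
from typing import Sequence


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `scores` is n subjects x k occasions (k = 2 for a test-retest design).
    """
    n, k = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]
    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_r = ss_rows / (n - 1)
    ms_c = ss_cols / (k - 1)
    ms_e = ss_err / ((n - 1) * (k - 1))
    # Variance components in the Box 1 notation
    var_b = (ms_r - ms_e) / k    # sigma_B^2: between subjects
    var_w = (ms_c - ms_e) / n    # sigma_W^2: between occasions (assumed reading)
    var_e = ms_e                 # sigma_e^2: residual error
    return var_b / (var_b + var_w + var_e)


example = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
           [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_2_1(example), 2))   # 0.29
```

Multiplying numerator and denominator by k shows this ratio equals the familiar form (MSR − MSE) / (MSR + (k − 1)MSE + k(MSC − MSE)/n), so a systematic Time 2 shift (MSC) lowers agreement even when subjects keep their rank order.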