Data Issues: Quality and Analysis
By Ziyad Mahfoud, Ph.D.
Associate Professor of Biostatistics, Department of Public Health
Slide 2
Sample Size and Power
The sample size of the study is usually based on several factors. These include the study design, the type of main outcome, the expected differences between cases and controls with regard to the main outcome, the significance level (usually set at 5%) and the power of the study (usually set somewhere between 80% and 90%). The power of the study is its probability of detecting significant differences/associations when those differences/associations really exist. If the power of the study drops, the study will have a diminished ability to detect significant associations, even when they exist.
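The sketch below shows how these factors combine. It uses the textbook normal-approximation formula for comparing two independent proportions, with a two-sided test and equal numbers of cases and controls. It is a minimal sketch and not the study's actual calculation. It assumes a binary characteristic measured from the blood samples and ignores the matching, which would call for a different formula in a matched analysis. The proportions 0.30 and 0.40 are hypothetical, and `sample_size_per_group` is an illustrative name.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(p_controls: float, p_cases: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of cases (and of controls) needed to detect a
    difference in a proportion between the two groups.

    Normal approximation, two-sided test, 1:1 groups; matching ignored.
    """
    z = NormalDist()                     # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value, two-sided test
    z_beta = z.inv_cdf(power)            # quantile matching desired power
    p_bar = (p_controls + p_cases) / 2   # pooled proportion under "no difference"
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_controls * (1 - p_controls)
                                 + p_cases * (1 - p_cases))) ** 2
    return ceil(numerator / (p_cases - p_controls) ** 2)


# Hypothetical: 30% of controls vs 40% of cases carry the characteristic.
for target_power in (0.80, 0.90):
    print(target_power, sample_size_per_group(0.30, 0.40, power=target_power))
```

Raising the power, or expecting a smaller difference between cases and controls, both increase the required sample size.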
Slide 3
The most common reason for a drop in the power of the study is the inability to reach the sample size needed for the analysis. Not reaching an analyzable number of patients could be due to:
- Recruitment problems
- Data quality problems
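To see how a shortfall in analyzable patients erodes power, the normal-approximation formula for comparing two independent proportions can be solved for power at a given number per group. This is a hypothetical sketch. It assumes 1:1 groups, a two-sided test at 5%, no adjustment for matching, and made-up proportions of 0.30 vs 0.40. `achieved_power` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def achieved_power(n_per_group: int, p_controls: float, p_cases: float,
                   alpha: float = 0.05) -> float:
    """Approximate power of a two-sided comparison of two proportions
    with n_per_group cases and n_per_group controls."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    p_bar = (p_controls + p_cases) / 2
    # Per-observation SD terms under "no difference" and under the alternative.
    sd_null = sqrt(2 * p_bar * (1 - p_bar))
    sd_alt = sqrt(p_controls * (1 - p_controls) + p_cases * (1 - p_cases))
    diff = abs(p_cases - p_controls)
    return z.cdf((diff * sqrt(n_per_group) - z_alpha * sd_null) / sd_alt)


# Hypothetical: about 356 per group were planned for 80% power,
# but fewer participants turn out to be analyzable.
for n in (356, 250, 150):
    print(n, round(achieved_power(n, 0.30, 0.40), 2))
```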
Slide 4
Recruitment Problem 1: The number of recruited people
- We will keep track of the number of recruited patients from each of the 5 sites.
- These numbers will be compared to those expected at the start of the study.
- If recruitment is slow, this will be noted and the PIs will be informed so that corrective measures can be taken.
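A minimal sketch of the comparison against expected numbers follows. It uses the expected yearly recruitment listed for each site in this presentation. It assumes recruitment is expected to accrue evenly over the year. The 80% threshold, the counts and the function name `slow_sites` are hypothetical and are not study rules or study data.

```python
# Expected number recruited per year at each site (from the study's table).
EXPECTED_PER_YEAR = {
    "Jordan": 350,
    "Kingdom of Saudi Arabia": 100,
    "Lebanon": 300,
    "Qatar": 130,
    "United Arab Emirates": 250,
}


def slow_sites(recruited: dict[str, int], months_elapsed: int,
               threshold: float = 0.8) -> dict[str, float]:
    """Return sites below `threshold` of their pro-rated target, mapped to
    the fraction of the target reached (hypothetical 0.8 cut-off)."""
    if months_elapsed <= 0:
        raise ValueError("months_elapsed must be positive")
    flagged = {}
    for site, per_year in EXPECTED_PER_YEAR.items():
        target = per_year * months_elapsed / 12  # even accrual assumed
        fraction = recruited.get(site, 0) / target
        if fraction < threshold:
            flagged[site] = round(fraction, 2)
    return flagged


# Hypothetical counts after 6 months (not study data).
counts = {"Jordan": 170, "Kingdom of Saudi Arabia": 30, "Lebanon": 160,
          "Qatar": 60, "United Arab Emirates": 90}
print(slow_sites(counts, months_elapsed=6))
```

The flagged sites are the ones whose PIs would be contacted about corrective measures.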
Slide 5
Expected number recruited per year

Site                          Expected number/year
Jordan                        350
Kingdom of Saudi Arabia       100
Lebanon                       300
Qatar                         130
United Arab Emirates          250
Slide 6
Recruitment Problem 2: The type of recruited people
- Ineligible patients: Patients should be at least 18 years old, report Arab as their self-reported ancestry, and have RA diagnosed according to the American College of Rheumatology (ACR) criteria.
- Controls will be recruited simultaneously with the cases. That is, we will NOT wait until case recruitment ends before starting on the controls.
- After recruitment, controls will be matched to cases based on age, gender and self-reported ancestry.
- To avoid recruiting too many unnecessary controls, each center will be given track sheets on which the age, gender and self-reported ancestry of every recruited person (cases and controls) will be recorded. This will allow each center to see which of its recruited cases and controls match each other. The center will then know which types of controls are still missing and need to be recruited.
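The track-sheet logic can be sketched as a simple greedy 1:1 match on gender, self-reported ancestry and age. This is an illustrative sketch, not the study's matching procedure. The ±5-year age window, the ancestry labels, the patient numbers and the names `Person` and `match_controls` are all hypothetical. A greedy match is also not guaranteed to be optimal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    pid: str       # hypothetical patient number
    age: int
    gender: str
    ancestry: str  # self-reported ancestry (hypothetical coding)


def match_controls(cases: list[Person], controls: list[Person],
                   max_age_gap: int = 5):
    """Greedy 1:1 matching of controls to cases on gender, self-reported
    ancestry and age (within max_age_gap years).

    Returns (pairs, unmatched_cases); the unmatched cases show which kinds
    of controls a center still needs to recruit.
    """
    available = list(controls)  # controls not yet used in a pair
    pairs, unmatched = [], []
    for case in sorted(cases, key=lambda p: p.age):
        candidates = [c for c in available
                      if c.gender == case.gender
                      and c.ancestry == case.ancestry
                      and abs(c.age - case.age) <= max_age_gap]
        if candidates:
            # Use the eligible control closest in age to this case.
            best = min(candidates, key=lambda c: abs(c.age - case.age))
            available.remove(best)
            pairs.append((case.pid, best.pid))
        else:
            unmatched.append(case)
    return pairs, unmatched


# Hypothetical track-sheet entries (not study data).
cases = [Person("C1", 45, "F", "Lebanese"), Person("C2", 60, "M", "Jordanian"),
         Person("C3", 30, "F", "Qatari")]
controls = [Person("K1", 47, "F", "Lebanese"), Person("K2", 52, "M", "Jordanian"),
            Person("K3", 33, "F", "Qatari")]

pairs, unmatched = match_controls(cases, controls)
print(pairs)
for p in unmatched:
    print(f"Still needed: {p.gender} control, ancestry {p.ancestry}, age {p.age} +/- 5")
```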
Slide 7
Data Quality Problem 1: Missing data
- Missing data, especially on the main outcome, result in the participant not contributing to the main analysis of the study. For example, a participant's blood samples may be missing.
- Missing data on the variables used for matching (age, gender and self-reported ancestry) make it impossible to find a match. That patient will therefore not contribute to the study analysis.
- Missing data on other covariates, for example cigarette smoking status, diminish our ability to find associations between those variables and the main outcome.
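The three consequences above can be checked automatically on the collected records. The sketch below is hypothetical. Its field names (`pid`, `blood_sample`, `smoking`, and so on) and its use of `None` to mark a missing item are assumptions for illustration and do not reflect the study's actual data dictionary.

```python
MATCHING_VARS = ("age", "gender", "ancestry")  # hypothetical field names


def audit_missing(records: list[dict]) -> dict:
    """Group participants by the consequence of their missing data.

    Each record is a dict; None marks a missing item.
    """
    report = {"no_blood": [], "unmatchable": [], "missing_covariates": {}}
    for r in records:
        pid = r["pid"]
        if r.get("blood_sample") is None:
            # No main-outcome data: excluded from the main analysis.
            report["no_blood"].append(pid)
        if any(r.get(v) is None for v in MATCHING_VARS):
            # Cannot be matched, so cannot contribute to the analysis.
            report["unmatchable"].append(pid)
        other = [k for k, v in r.items()
                 if v is None and k not in MATCHING_VARS and k != "blood_sample"]
        if other:
            # Still analyzable, but weakens covariate associations.
            report["missing_covariates"][pid] = other
    return report


# Hypothetical records (not study data).
records = [
    {"pid": "P1", "age": 50, "gender": "F", "ancestry": "Arab",
     "blood_sample": "B1", "smoking": "never"},
    {"pid": "P2", "age": 41, "gender": "M", "ancestry": "Arab",
     "blood_sample": None, "smoking": "current"},
    {"pid": "P3", "age": 63, "gender": None, "ancestry": "Arab",
     "blood_sample": "B3", "smoking": "former"},
    {"pid": "P4", "age": 38, "gender": "F", "ancestry": "Arab",
     "blood_sample": "B4", "smoking": None},
]
print(audit_missing(records))
```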
Slide 8
Main outcome: Blood
- Make sure that you get blood from each patient.
- Make sure that you get the required amount of blood for each sample.
- Make sure that the blood is well preserved and shipped.
Slide 9
Data Quality Problem 2: Patient numbers/labels
Each participant has:
- A demographic and clinical questionnaire
- An ancestry questionnaire
- Blood samples
Labels have to be present (or entered) on all three, and they must be exactly the same. Missing labels, or labels that do not match for the same patient, make the data unusable for analysis.
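Because labels must match exactly across the three sources, a set comparison quickly shows which patients are usable and where each problem label is missing. This is a minimal sketch. The label format (for example "JO-001") and the function name `check_labels` are hypothetical. Note that a mistyped label such as "JO-02" appears as a separate patient who is missing from the other sources.

```python
def check_labels(demographic_ids, ancestry_ids, blood_ids):
    """Compare patient labels across the three data sources.

    Returns the labels present in all three (usable for analysis) and,
    for every other label, the sources in which it is missing.
    """
    sources = {
        "demographic/clinical questionnaire": set(demographic_ids),
        "ancestry questionnaire": set(ancestry_ids),
        "blood samples": set(blood_ids),
    }
    all_ids = set().union(*sources.values())
    complete = set.intersection(*sources.values())
    problems = {
        pid: sorted(name for name, ids in sources.items() if pid not in ids)
        for pid in sorted(all_ids - complete)
    }
    return sorted(complete), problems


# Hypothetical labels; "JO-02" is a mistyped "JO-002" on a blood tube.
usable, problems = check_labels(
    ["JO-001", "JO-002", "LB-010"],
    ["JO-001", "JO-002", "LB-010"],
    ["JO-001", "JO-02", "LB-010"],
)
print(usable)
print(problems)
```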
Slide 10
Quality control starts with you, since the data are collected through interviews and not through self-reports by patients. Choosing eligible patients and controls, interviewing them, and collecting and storing the blood are all under your control. In general, interviews yield few missing data; missing items arise, for example, when a patient refuses to answer a certain question. When obtaining informed consent from potential participants (cases and controls), you have to be very clear that the consent covers both the questionnaire and the blood sample. This will minimize later refusals by participants to give blood.
Slide 11
What are we doing to help you?
- We are using SurveyGizmo for data collection/entry. The program does not allow you to skip very important questions, such as those on eligibility criteria and the patient number. Moreover, in some instances it will not allow you to enter out-of-range answers, for example for age.
- We will check the data periodically to track patient recruitment and look for the quality control issues discussed here.
- Quarterly newsletters will be sent to all investigators and coordinators with updates.
- Monthly Skype calls will be held to discuss progress and any other issues that arise.
- Recruitment or quality control issues will be reported to the PI and the co-PI of the concerned center so that corrective measures can be taken.
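The sketch below mimics the same kinds of checks offline: required questions, age range checks, and the stated patient eligibility criteria (age at least 18, Arab self-reported ancestry, RA by ACR criteria). It is an illustration only and does not use or describe SurveyGizmo's own validation features. The field names, the plausible-age bounds of 0 to 110, and the function name `validate` are hypothetical.

```python
REQUIRED_FIELDS = ("patient_number", "age", "gender", "self_reported_ancestry")
PLAUSIBLE_AGE = (0, 110)  # hypothetical range check, not a study rule


def validate(record: dict, is_case: bool) -> list[str]:
    """Return a list of problems found in one questionnaire record."""
    problems = []
    required = REQUIRED_FIELDS + (("acr_criteria_met",) if is_case else ())
    for field in required:
        if record.get(field) is None:
            problems.append(f"missing {field}")
    age = record.get("age")
    if age is not None and not PLAUSIBLE_AGE[0] <= age <= PLAUSIBLE_AGE[1]:
        problems.append(f"age out of range: {age}")
    if is_case:
        # Eligibility criteria stated for patients (cases).
        if age is not None and age < 18:
            problems.append("ineligible: younger than 18")
        ancestry = record.get("self_reported_ancestry")
        if ancestry is not None and ancestry != "Arab":
            problems.append("ineligible: self-reported ancestry is not Arab")
        if record.get("acr_criteria_met") is False:
            problems.append("ineligible: RA not diagnosed by ACR criteria")
    return problems


good_case = {"patient_number": "LB-010", "age": 52, "gender": "F",
             "self_reported_ancestry": "Arab", "acr_criteria_met": True}
print(validate(good_case, is_case=True))
print(validate({**good_case, "age": 16}, is_case=True))
```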