Management of Data in Clinical Trials, Second Edition, by Eleanor McFadden. Copyright © 2007 John Wiley & Sons, Inc.

DATA DEFINITION, FORMS, AND DATABASE DESIGN

The identification of the data items to be collected for the trial, the subsequent design of the data collection instruments, and the design of the computer database are critical to the success of a trial. These three activities are interrelated and are best done in the order listed. That is, first comes the identification of the data items to be collected for the trial, then the design of the data collection forms, and then the design and setup of the trial database. It is always useful to look at the data collected for other, similar trials before designing new forms. If there are existing forms that can be used for the new trial, this eliminates a lot of extra work, so it is worthwhile consulting with colleagues to see what forms they have. If existing forms are used as the starting point, it is important that the forms be modified if necessary to collect only the data items which are relevant to this particular trial. The data collection instruments, whether paper or electronic, should be available prior to entry of the first patient on the trial. A trial should not be activated without data collection instruments in place, regardless of how urgent it is to start accrual.

DEFINING DATA ITEMS TO BE COLLECTED

There are different types of data that may need to be collected for a trial, and during the planning phase of a study it is important to think through all the requirements for the trial. For example, along with collecting the research data, it may also be necessary to collect data to (a) help with the administration of the trial and (b) document compliance with regulations and Good Clinical Practice (GCP).

Identification Data

When forms are submitted for a trial, it is essential that they be linked to the appropriate patient and also to the correct trial. Often one Coordinating Center is responsible for the collection of data on many clinical trials, not just one. A form must therefore have space for recording sufficient information for correct identification. This should include a patient identifier and, unless it is preprinted on the form, a trial identifier. It may also be useful to collect the name of the institution that submits the data in case there are errors in other identification data. Knowing the institution often allows errors to be detected, and the institution can then be contacted to correct the erroneous data.
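The linkage check described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the roster structure, the identifier formats, and the names `FormHeader`, `ROSTER`, and `check_header` are hypothetical and not taken from any particular system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormHeader:
    """Identification data recorded at the top of every submitted form."""
    trial_id: str
    patient_id: str
    institution: str


# Hypothetical registration roster: (trial_id, patient_id) -> institution.
ROSTER = {
    ("T001", "1001"): "General Hospital",
    ("T001", "1002"): "University Clinic",
}


def check_header(header, roster=ROSTER):
    """Return a list of identification problems found on a submitted form."""
    problems = []
    registered = roster.get((header.trial_id, header.patient_id))
    if registered is None:
        problems.append("patient not registered on this trial")
    elif registered != header.institution:
        # The institution name acts as a cross-check on the other identifiers.
        problems.append(
            f"institution mismatch: form says {header.institution!r}, "
            f"roster says {registered!r}"
        )
    return problems
```

A mismatch does not say which identifier is wrong, only that the form needs to go back to the submitting institution for correction.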

Research Data

Research data provide the information that is analyzed to answer the questions being asked in the study objectives. The identification of these data items should be done during the protocol development phase and should involve all members of the trial team so that they all have the opportunity to input ideas from their perspective.

It is always tempting to collect data items “just in case” they turn out to be interesting when the data are analyzed. However, collecting too much data can be detrimental because, as the volume of required data increases, the quality of data recorded on case report forms can decrease. It is therefore important to limit data collection to those items which are truly necessary (a) to answer the trial objectives and (b) to manage the trial. In assessing the data requirements, the team should distinguish between data needed for the clinical care of the patient and the data needed to answer the research objectives. Data collection for the trial should be limited to the research data. For example, to ensure that the patient has appropriate clinical care, it may be important to know the exact time when medication was given, but this level of detail may not be important when analyzing the data for the trial. The information will be recorded in the patient’s medical record, but there is no need to collect it on the case report form. In many trials, only a fraction of the clinically relevant data will end up as part of the trial database.

Omissions of critical data at this stage are almost impossible to overcome once the trial is underway, since it is hard to collect data retrospectively. If specific information was not recorded in the medical record at the time the patient was seen, it is lost to the trial. The data would therefore be missing for every patient entered before the omission was detected and the data collection forms were amended to include the data. It is even worse if the missing data are not discovered until the final analysis is done, because then the data will be missing for every patient.

To avoid this, during the development stages the statistician and principal investigator should prepare a preliminary analysis plan outlining the information that will be included in the final report of the trial. The other members of the trial team can review this and provide input. Once this has been done, it will be easier to identify the data items that need to be available for eventual analysis. At minimum, this will usually include key dates of events such as date of entry on the trial, information about patient safety while on the trial, and information on the study endpoints for each patient.

Administrative Data

In addition to the data required to answer the research questions, it is usually necessary to collect administrative data to help with the management of the trial. The amount of administrative data depends to a great degree on the complexity of the trial structure. In a small, single institution trial, much less information is needed than in a large multicenter trial where data and materials are being shipped to various locations.

During the trial, the Coordinating Center must be able to communicate with participants for a variety of reasons, such as dissemination of new versions of the protocol, requests for missing/overdue data, or responses to data queries. This means that the Coordinating Center must be able to link a patient to a specific institution and maintain a roster of contact details for that institution. This could include name, address, telephone and fax numbers, and names, titles, and email addresses for key trial personnel at that institution.

If the trial involves shipping of materials to a Reference Center, then a tracking system is needed to record what materials were sent, when they were sent, when they were received, and, if relevant, when they were returned to the institution. Other administrative data depend on the specifics of the trial, but could include information about monitors assigned to each site, dates of monitor visits, or drug supply levels at each site.
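A tracking record of this kind might be modeled as follows. This is a minimal sketch, not a prescribed design: the field names, the `Shipment` class, and the 14-day overdue threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Shipment:
    """One shipment of trial materials to a Reference Center."""
    patient_id: str
    material: str
    sent_on: date
    received_on: Optional[date] = None
    returned_on: Optional[date] = None  # only relevant if materials go back

    def status(self) -> str:
        if self.returned_on is not None:
            return "returned"
        if self.received_on is not None:
            return "received"
        return "in transit"


def overdue(shipments, today, max_days=14):
    """Shipments sent more than max_days ago with no recorded receipt.

    The 14-day default is an arbitrary illustrative threshold.
    """
    return [
        s for s in shipments
        if s.received_on is None and (today - s.sent_on).days > max_days
    ]
```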

Regulatory Data

Most trials will call for maintenance of some documentation that shows compliance with regulatory requirements. The level of detail will depend on the type of trial. A registration trial of an investigational agent sponsored by a pharmaceutical company will require more than an academic trial using drugs that have already been approved. The types of documentation needed could include Ethics Committee approvals for the protocol and all amendments, copies of patient consent forms, and qualifications of personnel at the participating sites. In some instances, instead of collecting copies of all documents, it may be acceptable to record that the institution confirmed that the documents exist. Any requirements for collection/submission of regulatory documents should be defined prior to the start of the trial, along with any requirements for providing updates throughout the course of the trial. More information about regulatory requirements can be found in Chapter 8.

Reference Center Data

If trial materials are being reviewed at a Reference Center, it is important to define the data that will be generated as a result of the review and to decide how the data will be communicated to the participating institutions and to the Coordinating Center. There may also be administrative or quality control data that need to be collected at the Reference Center(s), such as (a) inventories of materials received and dates of receipt and (b) results of routine checks on the accuracy of the techniques being used for review. The Coordinating Center would be responsible for making sure that all these requirements are defined before the trial activates. For example, if X-rays are reviewed at a central Reference Center, and data resulting from these reviews are entered into a local database at the Reference Center, with periodic electronic transfer of the data to the Coordinating Center, it is important that the key identification data items be included in the records and that they be in the same format as used in the Coordinating Center database. The Coordinating Center should be told of the structure, format, and meaning of all the data items. Any changes to the Reference Center database during the study should also be conveyed to the Coordinating Center and, ideally, be discussed and agreed on by both Centers prior to implementation. If the Reference Center adds a data item in the middle of the electronic record and does not tell the Coordinating Center, there will be problems with interpretation of the electronic data transferred to the Coordinating Center.
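One way to catch an unannounced change to the transferred record is to check each incoming record against the layout the two Centers agreed on. The sketch below assumes a simple comma-separated transfer file; the field names in `AGREED_FIELDS` and the function `parse_record` are hypothetical.

```python
# Field order as documented and agreed between the two Centers
# (hypothetical names, for illustration only).
AGREED_FIELDS = ["trial_id", "patient_id", "review_date", "result_code"]


def parse_record(line, fields=AGREED_FIELDS, sep=","):
    """Split one transferred record and reject it if the layout has changed."""
    values = line.rstrip("\n").split(sep)
    if len(values) != len(fields):
        raise ValueError(
            f"expected {len(fields)} fields, got {len(values)}: "
            "record layout may have changed at the Reference Center"
        )
    return dict(zip(fields, values))
```

A field-count check only detects added or dropped items. A reordering or a change in the meaning of an existing item would pass unnoticed, which is why changes still need to be communicated and agreed before implementation.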

DESIGN OF CASE REPORT FORMS

Once the data items have been defined, case report forms (CRFs) are developed. There are different terminologies used for forms. Sometimes all pages together in a book for one patient are called the CRF, sometimes each type of page is called a CRF and the patients have multiple CRFs for the trial. The latter terminology is used for examples in this text. Forms can be paper or electronic, and many of the design issues apply to both. The first part of this section deals primarily with design of paper forms, and issues specific to electronic forms are highlighted at the end of the section.

A case report form is a printed or electronic document that is designed to collect required research, administrative, and/or regulatory data for a clinical trial. The measurement and recording of the trial data are perhaps the most critical steps in the overall data management process, and it is therefore important that the CRFs be designed with clarity and ease of use in mind. The design of CRFs has a direct impact on the quality of the data collected for a trial, so it is worthwhile to take time over the design and development of the forms and to develop a layout that is user-friendly. Case report forms should always be available before a trial is activated. Activating a trial without the CRFs in place is likely to result in a trial with incomplete and inconsistent data. Therefore the urgency to activate a trial should always be balanced by the need to have these important tools in place. It is recommended that, whenever possible, forms be piloted by some of the prospective trial participants prior to activation of the trial. The piloting can be done by completing the proposed forms using historical data from appropriate medical records available at the sites. This process allows the eventual users of the forms to have meaningful input into the design, and piloting the use of the forms can often identify problems that can be corrected prior to implementation of the forms in real time.

The data collection forms should be concise and collect only the necessary data. When designing forms for a trial, thought should be given to the following.

Content and Organization of Case Report Forms

The ultimate objective of the CRF is to collect data that answer the trial’s objectives. Once the required data items have been identified, it is necessary to decide how to organize these data items on the forms. When designing forms, keep in mind that it is not always best to minimize the number of forms by trying to fit as much as possible onto one page. It may be better to have more forms, each with a small amount of data. When deciding on the forms that are needed for a trial, ask yourself:

1. When will data be available?
2. Where will the data be collected?
3. Who will be completing the forms?

As a first step, the timing of collection of the different items should be established. For example, identify all the data that are to be collected at the time that the patient is entered on the study. This normally includes data on the patient’s past medical history, data that confirm the patient’s eligibility, and results of baseline tests required by the protocol. Other relevant time points could be the data collected during all or part of the protocol treatment (it could be each cycle, for example), the data collected when a patient completes treatment, and any data collected as follow-up after the treatment ends. All of these are logical divisions and can help in deciding which data items belong on which forms.

Besides the timing of the data collection, it is also useful to identify where the data will be collected and by whom. These are also logical divisions that can help to decide which data should be collected on which form. For example, there may be baseline data gathered from the medical record by a Clinical Research Associate and other baseline data completed by a medical specialist such as a surgeon. Even though all the data are collected at the same time point, it would be more efficient to have two different forms for recording the information: one for the Clinical Research Associate and one for the surgeon. This allows the two people to complete their parts of the data collection requirements in parallel, rather than one person having to wait for the other to complete their section of the form before passing it on. Likewise, if some of the data are available in the cardiology department and some in the physical therapy department, two separate forms may work best.

These steps will help to organize the data items into logical groups of items, and individual case report forms can be created using these groups.

Format of Questions and Coding Conventions

The purpose of case report forms is to collect complete and unambiguous data for the clinical trial and to ensure standardization and consistency of data across participating sites. Their format should be designed with three functions in mind:

• Recording data in a paper form
• Data entry into the computer
• Data retrieval for analysis

If data are to be entered electronically at the sites, then the first bullet point will only apply if forms are completed on paper at the site before the data are entered.

The person recording data on the form (paper or electronic) should be able to answer the questions and record the answers in an unambiguous and efficient way, minimizing any possibility of misinterpretation or transcription errors; the person who is entering data into the computer should be able to transcribe values from the paper form to the keyboard with minimum effort in following the flow of data and entering the data values; the person who is analyzing the data must be able to interface the data and the statistical software with a minimum of data conversion.

There are several ways to phrase questions on case report forms, but there are conflicting ideas on which is most effective in collecting complete, accurate data. Since the goals of different trials vary and the environments in which data are collected can be very different, it is recommended that the designer review other forms that have been used in similar clinical trials and adapt a format that suits the particular trial environment. If several forms are developed for a trial, the most important criterion is to use a consistent design across forms so that the users can become familiar with the format used. Page layouts should be similar across forms, and the headers of the pages should be laid out in the same way, collecting the same identification information. Coding conventions should also be consistent for all data items. For example, y = yes and n = no for all “yes/no” answers.

The questions asked on forms can call for different types of responses. Some may be text strings, such as the name of the treating physician; others may be numeric values reflecting results of laboratory tests; still others may depend on patient characteristics and require the person completing the form to select the appropriate answer from a list of possible choices.

Lengthy text strings are usually restricted to information that will not be analyzed directly. For example, a form could require the completion of a coded variable asking whether the patient was taking antibiotics and then, if in the affirmative, ask for the names of the antibiotics to be written on the form. In the analysis, the statistician can easily use the coded variable indicating that the patient was taking antibiotics, but unless the text string with the antibiotic names was subsequently translated into codes, it would not be easy to do any analysis on the actual drugs taken. Spelling differences alone would cause difficulties in translating text strings. If it is important to know the actual drugs, then there are online dictionary coding systems available for translation of drug names into consistent codes. If text strings are collected, there needs to be adequate space to allow the handwritten text to be entered in one or more lines on the form. It is very frustrating to be expected to write something in a space that is just not big enough. Asking the user to put each character in a separate box can also be frustrating and can lead to incomplete information if there are not enough boxes.

When collecting numeric values, the important considerations are to provide the correct number of boxes for the answer; to preprint any required decimal points, commas, or other punctuation; and, when relevant, to specify the units to be used in recording the data. For example, in collecting the patient’s weight, the formats shown in Example 3.1 would be possible options.

If format 1 or 3 is used, the instructions for the form should indicate when the user should round the value up or down to the nearest whole number.
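The need for an explicit rounding instruction also applies when values are rounded by software, because tools do not all round the same way. As a small illustration (the helper name `round_half_up` is hypothetical, and half-up is only one possible rule a trial might adopt), Python's built-in `round` sends exact halves to the nearest even number, while the `decimal` module can apply the more familiar half-up rule:

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value: str) -> int:
    """Round a recorded value to the nearest whole number, halves going up.

    The value is passed as text so that it is rounded exactly as written
    on the form, without binary floating-point effects.
    """
    return int(Decimal(value).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


print(round_half_up("70.5"))  # 71
print(round(70.5))            # 70 (built-in round sends halves to even)
```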

A common type of question on a form is one where the user has to select the correct answer from a list of given values. The following are some possible formats for these types of questions:

1. Multiple Choice Format. For a specific question, all possible answers are displayed on the form, and the user has to circle/check the correct answer (Example 3.2). This format is usually used if optical scanning is going to be used to convert the answers into electronic format, but can also be used when the data are keyed, with the operator keying a value for the relevant answer.

Example 3.1.

1. ___ lb
2. ___ lb ___ oz
3. ___ kg
4. ___.___ kg


If a multiple choice format is used, the instructions for answering should be consistent across all questions, and the user should be asked either to circle the correct answers or to check a box beside the correct answers. Mixing the two methods of indicating the answer should be avoided. Multiple choice questions can be convenient for the person completing the form, but unless the answers are being scanned electronically, they are less conducive to efficient data entry. Normally, as the answers are being scanned electronically, the software automatically converts them to codes, and these codes are stored in the database. If the data are being entered by a data entry operator, either the operator or a data coordinator may have to translate the answers to codes prior to entry. The multiple choice format allows easy completion of the forms and, because answers are converted to codes, it has no negative effect on the ease of analysis.

2. Self-Coding Forms: Numeric Codes. All possible answers to a question are displayed on the form, and each answer is assigned a numeric code. There is a space or box(es) on the form to enter the code corresponding to the correct answer.

The user enters “1,” “2,” or “9” in the box provided, depending on the patient’s gender (Example 3.3). This format is less intuitive for the person completing the form. The codes themselves are meaningless in isolation and are relevant only when linked with a specific question and answer. However, when using this format, data entry is simplified and data analysis can be done using the actual values recorded on the forms.

Note that the code for unknown here is given as “9” and not “3.” It is good practice to assign the same value for unknown across all variables. Because this is a single digit answer and other questions may require more than three possible answers, “9” can be used as a consistent code for “unknown” for all single digit fields. For a double digit field, “99” can be used, and in this way, the value for “unknown” is always the last option in the list after all other possibilities have been reviewed and considered by the user.
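The all-nines convention is easy to express as a rule. The following sketch is illustrative only; the function name `unknown_code` and the dictionary `GENDER_CODES` are assumptions, with the code values taken from Example 3.3.

```python
def unknown_code(width: int) -> int:
    """Code for 'unknown' in a numeric field of the given digit width.

    Returns 9 for a single digit field, 99 for a double digit field, and so on.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    return int("9" * width)


# Code list for the single digit gender field in Example 3.3.
GENDER_CODES = {1: "Male", 2: "Female", unknown_code(1): "Unknown"}
```

Deriving the code from the field width means that “unknown” stays the highest value in every list, however many other answers a question has.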

Example 3.2.

Patient’s gender? (Circle one)   Male   Female   Unknown

Example 3.3.

Patient’s Gender ___
1. Male
2. Female
9. Unknown

3. Self-Coding Forms: Non-numeric Codes. All possible answers to a question are displayed on the form, and each answer is assigned a code that is more meaningful than a randomly assigned numeric code. There is a space or box(es) where the answer is to be entered, and the user is required to enter the code corresponding to the correct answer.

The user enters “M,” “F,” or “U” in the space provided (Example 3.4). This format is similar to the one using numeric codes, but here the possible answers are more intuitive to the person completing the form because the code being used is linked to the meaning of the code. Data entry can be slower using these codes since the operator has to use the full keyboard for entry rather than just the numeric pad. Most modern statistical packages allow analysis using the actual codes, but some packages may require conversion to numeric codes. Whatever format is chosen for this type of question and answer, it is recommended that it be consistently used on all forms for a study, and that, overall, only a small number of item formats be used.
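Where a statistical package does require numeric codes, the conversion is a simple lookup. The sketch below assumes the letter codes of Example 3.4 and the numeric codes of Example 3.3; the names `LETTER_TO_NUMERIC` and `to_numeric` are hypothetical.

```python
# Letter codes from Example 3.4 mapped to the numeric codes of Example 3.3.
LETTER_TO_NUMERIC = {"M": 1, "F": 2, "U": 9}


def to_numeric(code: str) -> int:
    """Convert a recorded letter code to its numeric equivalent."""
    key = code.strip().upper()  # tolerate lower case and stray spaces
    if key not in LETTER_TO_NUMERIC:
        raise ValueError(f"unrecognized gender code: {code!r}")
    return LETTER_TO_NUMERIC[key]
```

Rejecting unrecognized codes, rather than silently mapping them to “unknown”, keeps data entry errors visible so that they can be queried.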

Format of Questions. When formulating the questions which will go on the form, these guidelines will help in achieving clarity:

1. Keep the text of the question as short as possible while still retaining the meaning. The question does not have to be posed as a complete sentence if a short phrase is sufficiently clear (Example 3.5).

Example 3.4.

Patient’s Gender ___
M. Male
F. Female
U. Unknown

Example 3.5.

___.___ °F  Temperature on cycle 1, day 5

This is more concise and just as clear as:

___.___ °F  What is the patient’s temperature on the fifth day of the first cycle?

2. Use terminology that will be familiar to the person completing the form. While words and phrases can be very familiar to members of the trial team designing the form, they may not be as clear to the clinical research associates who have to fill out the forms at the different institutions. Piloting draft forms can help to ensure that the meaning of the questions is clear and that the required data will be collected.

3. Ask only one question at a time, and do not introduce compound questions that can be confusing (Example 3.6).

This is really two questions, and it can only be fully interpreted if the answer is “Y.” If the answer is “N,” the patient might be:

• Fully ambulatory but not on a regular diet
• On a regular diet but not fully ambulatory
• Neither ambulatory nor on a regular diet

If it is important to know the exact answers for both parts of the question, then two separate questions should be asked (Example 3.7).

4. Instructions should be positive rather than negative, telling the person completing the form what to do rather than what not to do.

Coding Conventions. Consistent coding conventions are also advisable when designing case report forms. The user becomes familiar with one way of completing a certain type of field and is more likely to provide the correct answer if that type of data is always collected in the same way.

Example 3.6.

Is the patient fully ambulatory and on a regular diet? ___
Y = yes
N = no
U = unknown

Example 3.7.

Is the patient fully ambulatory? ___
Y = yes
N = no
U = unknown

Is the patient on a regular diet? ___
Y = yes
N = no
U = unknown

Dates. When dates are recorded, it is important to clearly identify the format to be used, especially in a trial with multinational participation. In the United States, dates are usually recorded as month, day, and year, but in many other countries, dates are recorded as day, month, and year. Is a date written as 7/8/2006 meant to represent the 7th of August or the 8th of July? To reduce this kind of ambiguity and ensure that dates are correctly interpreted, the forms should clearly state which format is expected (Example 3.8).

In trials with international participation, it may be less confusing to abbreviate the name of the month and use that instead of the numeric representation, for example, 10-Dec-95 rather than 10-12-95 (day, month, year) or 12-10-95 (month, day, year).
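The ambiguity is easy to demonstrate. The sketch below parses the same numeric date under both conventions and then writes each result with an abbreviated month name. The function names and the fixed English `MONTHS` list are assumptions made for illustration, and a four-digit year is used for clarity.

```python
from datetime import datetime

# Fixed English abbreviations, so the output does not depend on the locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def parse_date(text, order):
    """Parse a numeric date; order is 'mdy' (United States) or 'dmy'."""
    fmt = {"mdy": "%m/%d/%Y", "dmy": "%d/%m/%Y"}[order]
    return datetime.strptime(text, fmt).date()


def unambiguous(d):
    """Format a date as dd-Mon-yyyy."""
    return f"{d.day:02d}-{MONTHS[d.month - 1]}-{d.year}"


print(unambiguous(parse_date("7/8/2006", "mdy")))  # 08-Jul-2006
print(unambiguous(parse_date("7/8/2006", "dmy")))  # 07-Aug-2006
```

The same characters on the form yield two different dates, which is why the expected order has to be stated on the form or recorded alongside the data.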

Decimal Points. For answers that require the entry of a decimal point, the placement of the decimal point should be preprinted on the form so that the interpretation of the value entered is unambiguous.

Units of Measurement. When applicable, the appropriate units of measurement should be preprinted on a case report form. If there is a possibility that some participating sites will always report in different units than other sites (for example, in an international trial), then either (a) the forms should allow for specifying the units used or (b) the documentation provided to the sites should include a conversion algorithm to the “standard” unit for the trial. In general, it is advisable to collect the actual value and units from the site and do all conversions centrally to ensure accuracy. It may also be necessary to collect normal ranges of laboratory test results, since these can also differ across laboratories and, without this value, it is impossible to know whether or not a result is within normal ranges.
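Central conversion can be sketched as follows, assuming weight is the item in question and kilograms are the trial's standard unit. The names `KG_PER_UNIT`, `weight_in_kg`, and `within_normal_range` are hypothetical; the pound-to-kilogram factor is the exact defined value.

```python
# Conversion factors to the trial's assumed standard unit (kilograms).
KG_PER_UNIT = {"kg": 1.0, "lb": 0.45359237}


def weight_in_kg(value, unit):
    """Convert a weight reported by a site, in its own units, to kilograms."""
    try:
        factor = KG_PER_UNIT[unit.lower()]
    except KeyError:
        raise ValueError(f"unsupported unit: {unit!r}") from None
    return value * factor


def within_normal_range(value, low, high):
    """Compare a result with the normal range reported by the same laboratory."""
    return low <= value <= high
```

Storing the reported value and unit alongside the converted value keeps the original entry available for checking against the source document.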

Unknown/Not Applicable/Not Available/Not Done. There should be a consistent convention for recording these categories on case report forms. It is sometimes difficult to distinguish between a test that was not done and one where the results are not available at the time that the form is completed. If the test was not done, the Coordinating Center will know that it is futile to request that the missing data be provided. However, if the test was done but the result is not yet available, then a query from the Coordinating Center will usually recover the missing data. Whenever a question could have one of these categories as a feasible response, a code should be provided for each relevant category.
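Keeping the categories distinct lets the Coordinating Center decide automatically which missing values are worth a query. The sketch below is one possible encoding; the class and function names are hypothetical, and treating only “not available” as worth a query is an assumption based on the distinction drawn above.

```python
from enum import Enum


class Missing(Enum):
    """Distinct reasons why a value is absent from a form."""
    UNKNOWN = "unknown"
    NOT_APPLICABLE = "not applicable"
    NOT_AVAILABLE = "not available"  # test done, result not yet back
    NOT_DONE = "not done"            # test never performed


def should_query(status):
    """Return True when a follow-up query is likely to recover the value.

    Assumption: only a result that exists but has not yet arrived is
    worth requesting.
    """
    return status is Missing.NOT_AVAILABLE
```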

Example 3.8.

__ - __ - ____   Date of birth
mm   dd   yyyy


Other. It is often difficult to predict all possible responses to a particular question, and in these circumstances it is advisable to allow for “Other” as a possible response. If used, there should also be provision for collecting information on what “Other” means (Example 3.9).
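A data entry check can enforce the link between the “Other” code and its explanation. This sketch uses the code list from Example 3.9; the function name `validate_stop_reason` and the exact wording of the messages are hypothetical.

```python
# Code list from Example 3.9.
STOP_REASONS = {
    1: "Completed per protocol",
    2: "Patient refused to continue",
    3: "Serious side effects",
    4: "Progression of disease",
    5: "Other",
    9: "Unknown",
}
OTHER = 5


def validate_stop_reason(code, specify=""):
    """Return a list of problems with a recorded reason for stopping treatment."""
    errors = []
    if code not in STOP_REASONS:
        errors.append(f"invalid code: {code}")
    elif code == OTHER and not specify.strip():
        errors.append("'Other' selected but no explanation given")
    elif code != OTHER and specify.strip():
        errors.append("explanation given but code is not 'Other'")
    return errors
```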

LAYOUT OF CASE REPORT FORMS

As well as having a consistent format for questions on a form, there are several principles that should be considered in deciding how to organize the questions on a page. It is important that the form be legible in the conditions under which it will be completed. The print should be large enough to read easily (minimum of 8-point type for text), and if boxes are being used to collect coded answers, they must be large enough for handwritten answers. The layout of the questions on the form should be visually pleasing and should allow for ease of data entry.

Use of different fonts, bolding, italics, and underlining can help to highlight areas of a form, but only if used sparingly. If there are many different formats of text on the page, the users will not realize what is being brought to their attention and what is not.

To make sure that the response to a question is written in the correct place, the response field should be located close to the specific question. This means that the response field should either be at the beginning of the question or at the end. Consider the formats shown in Example 3.10.

Format 1 has the questions aligned on the left-hand margin with the response field at the end. The person completing the form reads the question and then fills out the answer without having to shift the direction of their vision. However, the answers are buried in among the text and are not as easy to follow when data entry is being done. The data entry operator will have to scan the question before finding the data to key.

Format 2 has the responses aligned on the left-hand margin, and data entry can be done by reading down that margin without having to read the actual questions. This second version will probably take users slightly longer to complete because they have to read the question from left to right and then return their vision to the left margin to fill in the correct answer.

Example 3.9.

Give reason for stopping treatment: ___
1. Completed per protocol
2. Patient refused to continue
3. Serious side effects
4. Progression of disease
5. Other, specify _____________________
9. Unknown

Format 3 has the questions aligned on the left margin and has dots used as fillers to align the response fields on the right. While this does make data entry easier than in Format 1, and does provide some help in linking the question to the correct response field, the fillers may have to be manually tracked across the page when questions are short.

Format 4 has the right margin of the question aligned, with the answer fi elds immediately to the right. This means that the user can read the question and fi ll in the answer fairly easily, and answers are aligned for ease of data entry. This format is good when questions are short, there are minimal instructions, and the list of possible answers is also short. For forms where there are long questions or a lot of instructions, it is less appropriate.

Example 3.10.

Format 1: Responses to the right of the question.

Patient’s age at time of registration ___
Gender (m = male, f = female) ___
Smoker (y = yes, n = no) ___
Date of diagnosis __ __ (mm yy)

Format 2: Responses immediately to the left of the question.

___ Patient’s age at time of registration
___ Gender (m = male, f = female)
___ Smoker (y = yes, n = no)
__ __ Date of diagnosis (mm yy)

Format 3: Responses aligned on the right.

Patient’s age at time of registration............... ___
Gender (m = male, f = female)....................... ___
Smoker (y = yes, n = no)............................ ___
Date of diagnosis................................... __ __ (mm yy)

Format 4: Questions and answer fields right justified.

Patient’s age at time of registration ___
        Gender (m = male, f = female) ___
             Smoker (y = yes, n = no) ___
                    Date of diagnosis __ __ (mm yy)


Format 2 is recommended to improve accuracy both in recording the answer and in entering the responses into the computer. Format 4 is also appropriate when the questions and instructions are short.

LAYOUT OF PAGE

Usually case report forms can be designed in either single-column or double-column format, and there is no indication that one leads to better data than the other. Clearly, having two columns of data and less “free” space on the page allows more data to be recorded on one sheet of paper. However, the resulting form may look cluttered and confusing. Cutting and pasting sample pages in both formats will give an indication of the best layout for a particular form. Input from users is also extremely beneficial.

In some trials, the two-column format has been used with questions on the left-hand side of the page and arrows leading to boxes with explanatory text or conditional questions on the right-hand side (Example 3.11).

When books of case report form pages are used with all blank pages for a patient being kept in a single binder, the back of the previous page can be used to document instructions for the facing page (Example 3.12).

Clearly there are many different ways to lay out questions on a form, and the methods chosen will depend on the type and complexity of the data being collected and on the background of the person who will complete the form. It is important that the form is easy to follow and that the instructions are clear. Keeping a reasonable amount of white space (space with nothing in it) on a form will often make it visually more pleasing.

For some case report forms it may be useful to leave blank space on the form so that the person completing the form can write additional comments. Often such comments are useful to those who are reviewing the data. If comments or free text are collected, the Coordinating Center will review all that is written when doing quality control, since the information may need to be queried for some reason (for example, if it appears to contradict a value in a code field), or may provide additional information that could allow more accurate grading of a side effect. Note that in the latter situation, a query may need to be sent to the site to ask whether they agree with the revised grading.

Example 3.11.

Was treatment given according to protocol? (circle one)

Yes    No --> If No, calculate percentage of drug given this cycle: _______%

HEADER

It is best to develop a standard header format for all the pages of a case report form for a specific trial. Under the name of the form, the header should collect the data necessary to identify the patient and the trial. It is also useful to record the name of the institution that entered the patient, at least on the first page of a multipage form. In any trial it is essential to be able to uniquely identify each patient entered and to ensure that all data for that patient can be linked in the database and for analysis. A unique patient identifier should therefore be assigned to each patient entered on the trial, and this patient identifier should be recorded on all forms/materials for the trial. The identifier must be unique to that patient in that trial and can be an ID assigned at time of entry, the patient’s hospital ID number, a national ID number, or any combination of these or other possible options. No other patient on the trial should have the same ID number either by chance or design. This is one reason why patient initials or dates of birth are not recommended as identifiers. Names should not be used for confidentiality reasons. Once the format of the identifier has been decided, it should be collected on every page of all case report forms for the trial. If the form has multiple pages, it is recommended that the trial number and unique patient ID also be recorded on those pages in case pages get separated.

Example 3.12.

Notes for Cardiac Form

NYHA Heart Classification

1. Cardiac disease but without limitations to physical activity.

2. Cardiac disease with slight limitation of physical activity. Comfortable at rest but ordinary activity results in fatigue, palpitation, dyspnea, or anginal pain.

3. Marked limitation in physical activity. Comfortable at rest, less than ordinary activity causes fatigue, palpitation, dyspnea, or anginal pain.

4. Inability to carry on physical activity without discomfort.

Cardiac Form

Patient ID _____________

Date of Visit ___/___/___ (mm, dd, yy)

___ NYHA classification (see opposite page for definition)

___ LVEF%
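The uniqueness requirement for patient identifiers can be checked mechanically. The following is a minimal sketch in Python, assuming the registrations are available as (patient ID, site) pairs; the function name and the sample IDs are illustrative only and are not taken from any particular trial system.

```python
from collections import Counter


def find_duplicate_ids(registrations):
    """Return patient IDs that occur more than once in a trial's registration list."""
    counts = Counter(patient_id for patient_id, _site in registrations)
    return sorted(pid for pid, n in counts.items() if n > 1)


# Hypothetical registrations: the same ID has been issued at two sites.
registrations = [("1001", "Site A"), ("1002", "Site A"), ("1001", "Site B")]
duplicates = find_duplicate_ids(registrations)
```

A check of this kind would typically run at registration, before an ID is printed on any form, so that a clash is caught before data are recorded against it.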

NUMBERING DATA ITEMS ON FORMS

Often each data item in a form is assigned a unique number for easy identification of that specific item. The numbering can be unique within that form, or unique within all forms for the study. This is convenient for the Coordinating Center when requesting clarification for specific items. The item can be identified by its number in any query sent to the site.

PRINTING AND DISTRIBUTION OF FORMS

Once the form design is complete, decisions need to be made about how to print and distribute the blank forms to participants.

During the trial, there will normally be a need for more than just the original copy of the data to be maintained. For example, if data are being submitted to a Coordinating Center for review, the original would normally be submitted and a copy retained at the participating site. This gives the site a copy for reference if queries about the forms are sent by the Coordinating Center. There may also be a need to make a copy of some or all forms for the PI, for the reference centers, or for monitors. The number of copies needed will vary from trial to trial, and both the number of copies and responsibility for copying should be specified in the Procedures Manual; that is, will it be done by the site or by the Coordinating Center?

If only one or two copies are needed in addition to the original, it may be decided to print the blank case report forms on multipart NCR (No Carbon Required) paper. This is specially treated paper that has ink capsules built into the surface of all copies except the top one. The NCR pages are glued together along one margin, and as the user writes on the top copy, the pressure causes the writing to go through to the other copies. Each copy can be a different color, and distribution of copies can be made by color. For example, the top copy can go to the Coordinating Center, the yellow copy to the PI, and the pink copy can be retained at the site. The advantage of the NCR paper is the reduction in time and expense of making copies. However, if more than two copies are needed in addition to the original, it requires heavy pressure when writing to ensure that the image goes through to the last copy, and this can slow down the completion of the form. The quality of copies beyond two is likely to be poor, and the quality of the copies also deteriorates over time. Another disadvantage is that the blank forms must be printed; they cannot be copied from a master on a copy machine. Finally, if a copy is made from a single sheet of the NCR paper, because the paper is quite thin, it probably would need to be placed on the glass for copying rather than fed through a sheet feeder.

If the forms are designed with different parts of the form printed in different colors of ink, then the blank forms must be printed or copied on a color copying machine. For the majority of clinical trials, this would be too expensive to be viable, and there are no clear data indicating that the use of color improves the quality of the data. If color is used on a form and is an integral part of the instructions for completing the form (i.e., different instructions apply to sections printed in different colors), then all forms will have to be printed/copied centrally and sent to the participating sites. It is unlikely that all the sites would have facilities for making their own color copies.

Forms can also be printed/copied on colored paper to clarify instructions for completion. For example, if the same forms are submitted twice during a study, the instructions can ask the participant to complete the blue forms for cycle 1 and the yellow forms for cycle 2. If the instructions depend on the correct colors being used, then again the forms should be printed centrally and distributed.

The majority of clinical trials use white paper with black print. The forms can then be printed/copied centrally or one copy can be sent to each participating site as a “master” copy, and copies can be made from that.

If the forms are specialized (e.g., NCR paper, color print or paper) or if the set of forms for one patient is prebound in a binder, then the sets can be prepared centrally and distributed to the sites as needed. A decision will be needed about the number of sets to send to each site, and a mechanism for resupply will be required. In some studies the patient ID is preprinted on the forms and is associated with a particular site. In this case, the Coordinating Center will need to monitor accrual by site to make sure that the site does not run out of CRFs. If the forms are in a binder, and some forms have to be filled out multiple times, a copy for each of the time points needs to be included in the book. This ensures that the participants follow the completion schedule, but can lead to a lot of wasted paper, especially if the patient goes off study early.

Decisions on printing and distribution should be made after assessing the resources available and the requirements for the trial. Forms should not be printed or distributed until the protocol document is final and all members of the trial team have approved the final forms. It can be very expensive if changes are made to the CRF after the forms have been printed.

ELECTRONIC DATA COLLECTION

The guidelines given so far have been for paper case report forms, but more and more, data are being collected electronically. There are several ways to do this, and more details of the techniques can be found in Chapter 4. The method of data collection will impact on how the data entry screens are designed. In some systems, the participants fill in paper versions of forms, and then they enter data into the computer at the site. In this case, the screens would have to mirror the paper forms as closely as possible or the person entering the data will get confused. One potential problem is that computer screens have size limitations and it may be difficult to fit everything from a paper form onto one screen.

There are more sophisticated options for entering data, where underlying software can lead the user through the forms completion by automatically jumping to the next appropriate question and only displaying relevant questions on the screen. There is also likely to be some real-time error checking built into an electronic collection system, and the user will see pop-up messages if the data they enter are questioned by the logic. Options for the design of data fields in electronic forms are similar to those for paper. They can be coded values, multiple choice, or text and will probably depend on how the data are being collected and the capabilities of the software being used. Electronic data capture screens should be thoroughly tested prior to activation of the trial.
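As an illustration of skip logic and real-time checking, the sketch below uses the conditional question from Example 3.11. It is written in Python under stated assumptions: the field names (`per_protocol`, `percent_given`), the 0 to 100 range, and the message wording are hypothetical and do not describe any specific data capture package.

```python
def next_question(answers):
    """Skip logic: ask for the percentage of drug given only if treatment was not per protocol."""
    if answers.get("per_protocol") == "no" and "percent_given" not in answers:
        return "percent_given"
    return None  # nothing further to ask in this section


def check_percent(value):
    """Range check: return a message for the user if the value is questioned, else None."""
    if not 0 <= value <= 100:
        return f"Percentage {value} is outside the range 0-100; please confirm."
    return None
```

In a real system the returned message would be shown as a pop-up at the time of entry; here it is simply returned so that the logic can be tested before the trial is activated.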

There are also several software packages available for data collection by fax. These require paper forms to be completed and then faxed to the Coordinating Center, which means that the original is retained at the site. These data fax systems usually require forms to be laid out in a specific format (e.g., with wide margins) and to have a bar code or other identifier that allows the fax system to identify the form and the patient.

MODIFICATIONS TO FORMS

Modifying case report forms after activation of a trial should be done only when absolutely essential. If it is necessary to make changes, the following should be considered:

1. If the change adds a question, think about its location on the form. If the data entry screens and database are exact images of the forms, adding the change at the end of the form will simplify the change process. However, separating the new data item from other related fields on the form may make it more difficult for the person completing the form. If the new question is added in the middle of the form, then care must be taken in restructuring the data entry screens and the database to accommodate the change. If both old and new versions of the forms are still being submitted, allowance needs to be made for that. Other existing software may also need to be modified.

2. If the change adds another possible answer to the list of options for a particular question, and the answers are recorded as numeric codes, then the next unused code number in sequence should be assigned for the new option. It should not be inserted in the middle of existing options, with those options being renumbered. For example, if the first version of the form used codes 1, 2, and 3, the new code should be code 4 even if the option would be more logically placed after option 2. Renumbering would mean that code 3 meant one thing on one version of the form and meant something else on the new version. If old copies of forms continued to circulate, this would be dangerous. It could also cause confusion in the analysis.

3. If questions are being removed from the form, extreme care must be taken to ensure that the subsequent electronic records are compatible with the ones created prior to the change. If a field is deleted on the form, but kept in the computer records that were generated from the original form, then allowance for the blank space must be made so that a location in the database does not have two meanings, depending on the version of the form being used.
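The rule in item 2 (append a new code, never renumber) can be sketched as follows. This is a minimal Python illustration, assuming the code list is held as a mapping from numeric code to label; the function name and the option labels are hypothetical.

```python
def add_code(codebook, label):
    """Assign the next code after the highest existing one; existing codes are never renumbered."""
    if label in codebook.values():
        raise ValueError(f"option already exists: {label}")
    updated = dict(codebook)  # leave the earlier form version's code list untouched
    updated[max(codebook, default=0) + 1] = label
    return updated


# Version 1 of the form used codes 1, 2, and 3.
version_1 = {1: "Completed per protocol", 2: "Patient refused", 3: "Progression of disease"}
version_2 = add_code(version_1, "Serious side effects")
```

Because every code in the old list keeps its meaning in the new one, records keyed from either version of the form can be analyzed together.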

DATABASE DESIGN

The design of the database for the trial is closely related to the definition of data items to be collected and the design of the case report forms. The maximum amount of data that can be entered into the clinical database is the complete set of data collected at the sites and submitted on the case report forms. However, for many trials there is no need to actually computerize every piece of data that is collected. Sometimes a series of data items can be summarized into one overall value. For example, if the analysis will only use the worst grade of each reported side effect, then instead of computerizing all occurrences of that particular side effect, it may only be necessary to enter the worst grade into the computer. Similarly there may be data collected to verify a key variable. For example, there may be several chest X-rays done to assess whether a cancer tumor has responded to chemotherapy. The measurements taken from these serial chest X-rays are all reported, but in fact the only information that may need to be in the computer database is the fact that the patient did or did not respond. The measurements ensure that the participating site is reporting the response to treatment correctly, but the actual values may not be important for analysis.
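The worst-grade summary described above amounts to taking a maximum per patient and side effect. A minimal Python sketch, assuming each report is a (patient ID, side effect, grade) triple with higher grades being worse; the sample data are invented for illustration.

```python
def worst_grades(reports):
    """Reduce all reported occurrences to the worst grade per (patient, side effect)."""
    worst = {}
    for patient_id, event, grade in reports:
        key = (patient_id, event)
        worst[key] = max(grade, worst.get(key, grade))
    return worst


reports = [
    ("1001", "nausea", 1),
    ("1001", "nausea", 3),
    ("1001", "vomiting", 2),
    ("1002", "nausea", 2),
]
summary = worst_grades(reports)
```

Only the contents of `summary` would need to be keyed into the database; the individual occurrences remain on the paper forms for verification.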

By scrutinizing the data being collected and assessing how it will be used, decisions can be made about the contents of the trial database. Once this has been done, the structure needs to be designed. Chapter 4 discusses software options for clinical trials, and obviously the database structure has to be compatible with the software selected.

A commonly used database structure will have records that mirror the case report form. For each type of form used in the trial, there will be a database record. With this structure it is essential that all records for one patient can be linked together. The unique patient identifier for the trial should be on all records so that this linkage is possible. If the same form is submitted multiple times for the same patient, then there needs to be a way to distinguish one record from another, for example, by using a visit date. If there is a possibility that not all records will be received for each patient, the statistical software (and possibly trial management software) will need to be able to deal with missing records. If the software is not able to handle missing records, then dummy records may need to be inserted to ensure complete records for all patients.
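The linkage and missing-record ideas can be sketched as follows, in Python and under stated assumptions: each record is a dictionary carrying a patient ID, a form type, and a visit date, and the field and form names are hypothetical.

```python
def record_key(record):
    """Key that links a record to its patient and distinguishes repeat submissions of a form."""
    return (record["patient_id"], record["form"], record["visit_date"])


def missing_forms(patient_ids, records, expected_forms):
    """Map each patient ID to the expected form types for which no record is on file."""
    received = {pid: set() for pid in patient_ids}
    for record in records:
        received.setdefault(record["patient_id"], set()).add(record["form"])
    expected = set(expected_forms)
    return {pid: sorted(expected - forms) for pid, forms in received.items() if expected - forms}


records = [
    {"patient_id": "1001", "form": "baseline", "visit_date": "2006-01-10"},
    {"patient_id": "1001", "form": "follow_up", "visit_date": "2006-02-10"},
    {"patient_id": "1001", "form": "follow_up", "visit_date": "2006-03-10"},
    {"patient_id": "1002", "form": "baseline", "visit_date": "2006-01-12"},
]
gaps = missing_forms(["1001", "1002"], records, ["baseline", "follow_up"])
```

The output of `missing_forms` identifies where dummy records would have to be inserted if the analysis software cannot handle gaps.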

If multiple trials are being done that collect some data common across all of them, the database design can allow storage of the common data in the same area, regardless of the data collection format. If the database is a large one, with many different sources of data and many different record types, then the database should be designed and set up by an experienced systems analyst/database administrator. Poorly structured databases can be very inefficient in terms of storage and retrieval, and errors can easily be introduced if the linkage between records is not adequately defined. However, for small studies done in single locations, the capabilities of microcomputer software packages are usually adequate for database storage and retrieval.

EXAMPLES OF PROBLEMS ON ACTUAL CRFs

Problem Questions from Real CRFs

1. Did the patient have any clinically significant cardiovascular events during the follow-up period?

(1 = no, 2 = yes)

      D    M     Y     Grade
[_]  [__] [__] [____]  [_]   Myocardial infarction (cardiac-ischemia [Grade 1–4] infarction [Grade 4])
[_]  [__] [__] [____]  [_]   Cerebrovascular accident (CVA/transient ischemic event or attack (TIA) (Grade 3–4)
[_]  [__] [__] [____]  [_]   Angina requiring percutaneous transluminal coronary angioplasty (PTCA) (Grade 2)
[_]  [__] [__] [____]  [_]   Angina requiring coronary bypass graft (CABG) (Grade 3)
[_]  [__] [__] [____]  [_]   Thromboembolic event (Grade 2–4)
[_]  [__] [__] [____]  [_]   Other cardiovascular, specify _____ (Grade 1–4)
[_]  [__] [__] [____]  [_]   Other cardiovascular, specify _____ (Grade 1–4)
[_]  [__] [__] [____]  [_]   Other cardiovascular, specify _____ (Grade 1–4)
[_]  [__] [__] [____]  [_]   Other cardiovascular, specify _____ (Grade 1–4)

Q1: “Did the patient have any clinically significant cardiovascular events during this follow-up period?” What was really wanted was whether any new events occurred during this follow-up period, so the question should have asked “Did the patient experience any new clinically significant cardiovascular events during this follow-up period?”

This question also provides a list of cardiovascular events 9 items long. The Center has to respond to each item, indicating whether or not the event was experienced by the patient. There should have been a lead-in question asking if the patient experienced any of the following; if the answer was NO, the Center could then have skipped to the next question.

Q2. [_] Did the patient experience arthralgia during this follow-up period? (1 = no, 2 = yes, 3 = continuing)

Date of first diagnosis [__] [__] [____] (D M Y)
Worst grade during this follow-up period (1–4) [_]

Q3. [_] Did the patient experience myalgia during this follow-up period? (1 = no, 2 = yes, 3 = continuing)

Date of first diagnosis [__] [__] [____] (D M Y)
Worst grade during this follow-up period (1–4) [_]

Q2 and Q3: “Did the patient experience arthralgia/myalgia during this follow-up period?” The date asks for the “date of first diagnosis.” It is not clear if they should enter a date within this visit period, or any prior date.

Q4. Non-fasting Cholesterol Level

[____].[_]   [_] Units (1 = mg/dl, 2 = mmol/L, 3 = other ______________)

[__] [__] [____] Date of assessment (D M Y)

Upper limit of Normal (ULN) (Complete only if there was a change from baseline)

[____].[_]

Q4: It asks for Cholesterol Levels and for Date of Assessment. The site often enters the date on which it received the results, which puts the date after the first visit. If the “Date Blood Drawn for Assessment” had been asked for, the correct information would have been received.

Disease status at time of death (1 = no, 2 = yes). Please enter 2 (= yes) if ANY recurrence has occurred since starting trial treatment.

Q5. [_] No recurrence

Q6. [_] Local recurrence

Q7. [_] Regional recurrence

Q8. [_] Distant metastases

Q9. [_] Primary cause of death:
1 = Progression of underlying breast cancer
2 = Endometrial cancer
3 = Myocardial infarction
4 = Stroke
5 = Thromboembolic event
6 = Other cause (specify) ___________________
7 = Cause unknown

Header for Q5–9: “Disease status at time of death (1 = no, 2 = yes). Please enter 2 (= yes) if ANY recurrence has occurred since starting trial treatment.” It is not clear if the question is asking for (a) the current status of disease at time of death or (b) a history of the patient’s disease up to and including the time of death.

Q5: If the patient did not have a recurrence, record code 1 (“no”). If the patient had a recurrence, record code 2 (“yes”). Read literally, however, answering “no” to “No recurrence” means that the patient had a recurrence. This is confusing!

Q10. Was treatment delayed or modified? (Specify code) [_]

A. If yes (code = 2, 3, or 4):
Number of days delayed [__]
Percentage of full dose given [__]
B. If code = 3 or 4, specify _________

Treatment Delayed/Modified Codes
1 = No
2 = Yes, hematologic toxicity
3 = Yes, other toxicity, specify
4 = Yes, reason other than toxicity, specify

Q10: The Center should only complete the number of days or percentage of dose if there is a delay or modification, but this is not specified on the form. If the patient’s dose was delayed or modified, indicate either the number of days delayed or the percentage of full dose given; however, if the dose was delayed and modified, indicate both the number of days delayed and the percentage of full dose given.
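The conditional rules implied by Q10 can be written as an edit check. The sketch below is a Python illustration under assumptions: the function and parameter names are hypothetical, and the rules encode one reading of the commentary above (part A only for codes 2 to 4, part B only for codes 3 and 4).

```python
def check_q10(code, days_delayed=None, percent_dose=None, specify=None):
    """Return a list of problems with a Q10 response; an empty list means it is consistent."""
    if code not in (1, 2, 3, 4):
        return [f"unknown code {code}"]
    problems = []
    part_a_blank = days_delayed is None and percent_dose is None
    if code == 1 and not part_a_blank:
        problems.append("part A completed although treatment was not delayed or modified")
    if code != 1 and part_a_blank:
        problems.append("part A requires days delayed, percentage of full dose, or both")
    if code in (3, 4) and not specify:
        problems.append("part B requires the reason to be specified")
    return problems
```

For instance, `check_q10(2, days_delayed=7)` reports no problems, while `check_q10(3)` reports two: a blank part A and a missing reason.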

Adverse Events

Record the highest grade experienced during this 4-week period for all adverse events reported here. If the patient died during this follow-up period, skip to Q28 and report adverse event information on the Follow-Up Form (Form 32-E).

Q11. Gastrointestinal

[_] Check here if no such adverse events have been observed during this follow-up period and skip to the next question. If the patient has experienced an adverse event in this category, complete the Grade (0–5) for all events listed below.

Grade
[_] a. Anorexia
[_] b. Nausea
[_] c. Vomiting
[_] d. Mucositis/stomatitis
[_] e. Gastritis
[_] f. Diarrhea
[_] g. Constipation

Q11: This is the Adverse Event section of the Form. There is a box to check (by recording an X) if the patient did not experience any AE under the category listed. If the patient experiences at least one AE under the category listed, the Center should record the appropriate Grade in the box for all conditions that apply and record 0 (zero) in the remaining boxes for that category. The DMC is observing that these questions are frequently being answered with an X instead of the required Grade.
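The Q11 error (an X written where a grade is required) is the kind of problem an edit check can catch at data entry. A minimal Python sketch, assuming the entries are keyed exactly as written on the form; the function name and message wording are hypothetical.

```python
VALID_GRADES = {"0", "1", "2", "3", "4", "5"}


def check_ae_category(no_events_checked, entries):
    """Flag grade boxes that conflict with the 'no adverse events' box or hold a non-grade."""
    problems = []
    for event, entry in entries.items():
        if no_events_checked:
            if entry != "":
                problems.append(f"{event}: entry made although the 'no adverse events' box is checked")
        elif entry not in VALID_GRADES:
            problems.append(f"{event}: expected a grade 0-5, found {entry!r}")
    return problems


# A site has marked nausea with an X instead of a grade.
problems = check_ae_category(False, {"Anorexia": "0", "Nausea": "X", "Vomiting": "2"})
```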

Q12. [_] If NYHA Class II, III, or IV: Has symptomatic CHF been confirmed by a cardiologist?

This is two questions:

[_] What is NYHA Class?
1. II
2. III
3. IV
4. Other
5. Unknown

[_] If NYHA Class is indicative of symptomatic disease (i.e., Grade II or higher), was this confirmed by a cardiologist?
1 = yes
2 = no
3 = unknown

SUMMARY

Deciding what data to collect, designing and distributing case report forms, and developing a computer database for a trial are critical for success of a trial. Adequate time should be allowed for these important steps, and the entire trial team should have input into all three. Errors during this stage of planning can be very costly for a variety of reasons. Failing to collect important data items, collecting too much data and compromising the quality of essential data items, designing forms that are unclear and difficult to use leading to errors in transcription and recording, or building a database with incorrect linkages or with an inefficient structure are all examples of problems that can jeopardize a trial. Case report forms should be piloted prior to their introduction for a trial, and a trial should never be activated without final data collection instruments in place.
