Data Management

Data management issues are integral to all core areas of Responsible Conduct of Research (RCR) instruction, and everyone involved in research-related activities should be aware of these issues to conduct and support research responsibly. What constitutes research data can often be discipline-specific, but in general “research data” refers to information collected, stored, and processed in a systematic manner to meet the objectives of a particular research project.

Data can be collected manually or electronically, and can be quantitative or qualitative. Data can be represented as numerical figures, text, images, audio/video, etc. Data sources can be human or animal subjects, field notes, journals, laboratory specimens, observations, etc. Disciplines can differ in what they consider data and in how they manage it. The underlying issue, however, is how to manage research data responsibly.

Data management issues encompass all stages of research, from conceptualization of a project to the archiving and disposal of research materials. Those involved in research can face integrity issues at each stage and should therefore be prepared to address the problems that arise. For the purpose of this module, data-related integrity issues have been organized under the following topics:

Data selection

Data collection

Data analysis

Data handling

Data reporting and publishing

Data ownership

Along with the topics listed above, research conceptualization and training of research staff can have a significant impact on the integrity of research data. Research staff includes not only those who are directly involved in conducting the research activities but also those who provide support for such activities and, as a result, can have an impact on the integrity of the research effort. Proper conceptualization of a research project and the use of appropriate research methods, along with adequate training of all staff directly and indirectly involved in the research project, help ensure data integrity. Those who are new to a research area or to particular research methods can unintentionally make mistakes or misuse research methods in ways that affect the integrity of research data along with every other aspect of a project. But issues related to conceptualization of research and research methods are difficult to cover in the core modules of RCR instruction due to the project-specific nature of such issues.

Research staff should be trained not only on the research methods but also on the relevant standards and regulations. For example, data standards can play an important role in data management in some disciplines such as geography (for example, the federal spatial data standards at http://www.fgdc.gov/), while federal and state guidelines on data collection on animal or human subjects can have a significant impact on data integrity in other disciplines such as biology or health sciences. Research staff should also be aware of the open-ended nature of some basic research, which can at times conflict with the regulations imposed by state, federal, and other bodies for the purpose of protecting research subjects, and be prepared to deal with the potential conflicts associated with such exploration for the purpose of advancing science. Therefore, training and supervising research staff adequately on the necessary research methods, data standards, institutional policies and regulations, and sponsors’ requirements relevant to the research project is essential to prepare them to make better decisions that ensure research integrity.

Another aspect of data integrity that is becoming increasingly important relates to the use of technology in research projects for data collection, storage, analysis, archival, etc. These technologies include electronic instruments or hand-held devices for collecting data, computer systems for storing and sharing data, and software for analyzing data. But the use of technology can create additional integrity concerns that researchers must be prepared to deal with responsibly. Adequate training of research staff in the application and implications of technology used in a project can help to prevent technology-related integrity violations.

In summary, researchers should be familiar with the contextual nature of data and the six areas of data management mentioned earlier to have a better understanding of data integrity issues. Adequate project planning, training and supervision of research staff, understanding of standards and regulations, and knowledge of the implications of technology used can prevent or reduce data-related integrity violations in research. Finally, researchers should recognize the overlapping nature of data management issues with all the other core areas of RCR instruction and be prepared to deal with data integrity issues in a professionally responsible manner at all stages of research projects.

Data selection is defined as the process of determining the appropriate data type and source, as well as suitable instruments to collect data. Data selection precedes the actual practice of data collection. This definition distinguishes data selection from selective data reporting (selectively excluding data that is not supportive of a research hypothesis) and interactive/active data selection (using collected data for monitoring activities/events, or conducting secondary data analyses). The process of selecting suitable data for a research project can impact data integrity.

The primary objective of data selection is the determination of appropriate data type, source, and instrument(s) that allow investigators to adequately answer research questions. This determination is often discipline-specific and is primarily driven by the nature of the investigation, existing literature, and accessibility to necessary data sources.

Integrity issues can arise when the decisions to select ‘appropriate’ data to collect are based primarily on cost and convenience considerations rather than the ability of data to adequately answer research questions. Certainly, cost and convenience are valid factors in the decision-making process. However, researchers should assess to what degree these factors might compromise the integrity of the research endeavor.

Considerations/issues in data selection

There are a number of issues that researchers should be aware of when selecting data. These include determining:

the appropriate type and sources of data, which permit investigators to adequately answer the stated research questions,

suitable procedures to obtain a representative sample, and

the proper instruments to collect data.

There should be compatibility between the type/source of data and the mechanisms to collect it. It is difficult to extricate the selection of the type/source of data from the instruments used to collect the data.

Types/Sources of Data

Depending on the discipline, data types and sources can be represented in a variety of ways. The two primary data types are quantitative (represented as numerical figures: interval and ratio level measurements) and qualitative (text, images, audio/video, etc.). Although scientific disciplines differ in their preference for one type over another, some investigators utilize both quantitative and qualitative data with the expectation of developing a richer understanding of a targeted phenomenon. Data sources can include field notes, journals, laboratory notes/specimens, or direct observations of humans, animals, or plants.

Interactions between data type and source are not infrequent. Researchers collect information from human beings that can be qualitative (e.g., observing child rearing practices) or quantitative (e.g., recording biochemical markers or anthropometric measurements). Determining appropriate data is discipline-specific and is primarily driven by the nature of the investigation, existing literature, and accessibility to data sources.

Questions that need to be addressed when selecting data type and source include:

1. What is (are) the research question(s)?
2. What is the scope of the investigation? (This defines the parameters of any study. Selected data should not extend beyond the scope of the study).
3. What has the literature (previous research) determined to be the most appropriate data to collect?
4. What type of data should be considered: quantitative, qualitative, or a composite of both?

Methodological Procedures to Obtain a Representative Sample

The goal of sampling is to select a data source that is representative of the entire data universe of interest. Depending on discipline, samples can be drawn from human or animal populations, laboratory specimens, observations, or historical documents. Failure to ensure representativeness may introduce bias, and thus compromise data integrity.

It is one thing to have a sampling methodology designed for representativeness and yet another thing for the data sample to actually be representative. Thus, data sample representativeness should be tested and/or verified before use of those data.
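One way such a verification might look, as a minimal sketch only: a chi-square goodness-of-fit comparison of a sample's composition against known population proportions. The age groups, counts, and population shares below are hypothetical, and the threshold is the conventional 0.05 significance level; a real project would choose the test and cut-off appropriate to its discipline.

```python
# Standard chi-square critical values at the 0.05 significance level,
# keyed by degrees of freedom (number of categories minus one).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square_statistic(sample_counts, population_shares):
    """Return the goodness-of-fit statistic for observed category counts."""
    total = sum(sample_counts.values())
    statistic = 0.0
    for category, share in population_shares.items():
        expected = total * share
        observed = sample_counts.get(category, 0)
        statistic += (observed - expected) ** 2 / expected
    return statistic


def looks_representative(sample_counts, population_shares):
    """True when the sample does not differ significantly from the population."""
    df = len(population_shares) - 1
    return chi_square_statistic(sample_counts, population_shares) < CRITICAL_05[df]


# Hypothetical age-group shares in the target population.
population = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}

balanced_sample = {"18-34": 58, "35-54": 83, "55+": 59}  # 200 participants
skewed_sample = {"18-34": 110, "35-54": 70, "55+": 20}   # 200 participants

print(looks_representative(balanced_sample, population))
print(looks_representative(skewed_sample, population))
```

A check like this covers only the characteristics that are measured and compared; a sample can pass it and still be unrepresentative on a variable nobody recorded.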

Potential biases limit the ability to draw inferences to larger populations. A partial list of biases could include sex, age, race, height, or geographical locale.

A variety of sampling procedures are available to reduce the likelihood of drawing a biased sample, and some of them are listed below:

1. Simple random sampling
2. Stratified sampling
3. Cluster sampling
4. Systematic sampling

These sampling methods try to make the sample representative of the entire population by incorporating an element of ‘randomness’ into the selection procedure, which gives a greater ability to generalize findings to the targeted population. These methods contrast sharply with the ‘convenience’ sample, where little or no attempt is made to ensure representativeness.
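The four procedures listed above can be sketched in a few lines each. The following is an illustrative sketch rather than a recommended implementation: the sampling frame, the site labels, the sampling fraction, and all function names are hypothetical, and a fixed random seed is used only so that the example is repeatable.

```python
import random


def simple_random_sample(population, n, rng):
    """Every unit has the same chance of selection."""
    return rng.sample(population, n)


def stratified_sample(population, stratum_of, fraction, rng):
    """Sample the same fraction independently within each stratum."""
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    chosen = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        chosen.extend(rng.sample(members, k))
    return chosen


def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep every unit inside them."""
    picked = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in picked for unit in clusters[name]]


def systematic_sample(population, step, rng):
    """Take every step-th unit after a random starting offset."""
    start = rng.randrange(step)
    return population[start::step]


rng = random.Random(2024)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: 100 participant IDs spread over four sites.
frame = [{"id": i, "site": "ABCD"[i % 4]} for i in range(100)]
by_site = {}
for person in frame:
    by_site.setdefault(person["site"], []).append(person)

print(len(simple_random_sample(frame, 20, rng)))
print(len(stratified_sample(frame, lambda p: p["site"], 0.2, rng)))
print(len(cluster_sample(by_site, 2, rng)))
print(len(systematic_sample(frame, 5, rng)))
```

In this sketch the stratified draw guarantees that every site contributes the same share of participants, whereas the cluster draw keeps only two whole sites; which trade-off is acceptable depends on the research question.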

The random sampling procedures common in quantitative research contrast with the predominant type of sampling conducted in qualitative research. Since investigators may be focusing on a small number of cases, sampling procedures are often purposive or theoretical rather than random. According to Savenye and Robinson (2004), “For the study to be valid, the reader should be able to believe that a representative sample of involved individuals was observed. The ‘multiple realities’ of any cultural context should be represented.”

Each strategy has its appropriate application for specific scenarios (the reader is advised to review research methodology textbooks for detailed information on each sampling procedure). Selection bias can occur when a selected sampling procedure is not implemented properly. The resulting non-representative sample may exhibit disproportionate numbers of participants sharing characteristics (e.g., race, gender, age, geographic locale) that could interact with main effect variables (Skodol, Bender, 2003; Robinson, Woerner, Pollack, Lerner, 1996; Maynard, Selker, Beshansky, Griffith, Schmid, Califf, D’Agostino, Laks, Lee, Wagner, 1995; Fourcroy, 1994; Gurwitz, Col, Avorn, 1992). Use of homogeneous samples in clinical trials may limit the ability of researchers to generalize findings to a broader population (Sharpe, 2002; Dowd, Recker, Heaney, 2000; Johnson, 1990). The issues of sampling procedures apply to both quantitative and qualitative research areas. Savenye and Robinson (2004) contrast this approach with qualitative researchers’ tendency to interpret results of an investigation or draw conclusions based on specific details of a particular study, rather than in terms of generalizability to other situations and settings. While findings from a case study cannot be generalized, these data may be used to develop research questions later to be investigated in an experiment (Savenye, Robinson, 2004).

Selection of Proper Instrument

Potential for compromising data integrity also exists in the selection of instruments to measure targeted data. Typically, researchers are familiar with the range of instruments that are conventionally used in a specialized field of study. Challenges occur when researchers fail to keep abreast of critiques of existing instruments or diagnostic tests (Goehring, Perrier, Morabia, 2004; Walter, Irwig, Glasziou, 1999; Khan, Khan, Nwosu, Arnott, Chien, 1999). Furthermore, researchers may:

be unaware of the development of more refined instruments

use instruments that have not been field-tested, calibrated, validated or measured for reliability

apply instruments to populations for which they were not originally intended

Questions that should be addressed in the selection of instruments include:

1. How was data collected in the past?
2. Is (are) the instrument(s) appropriate for the type of data sought?
3. Will the instrument(s) be adequate to collect all necessary data to the degree needed?
4. Is the instrument current, properly field-tested, calibrated, validated, and reliable?
5. Is the instrument appropriate for use in collecting data from a different source than originally intended? Should the instrument be modified?

Attention to the data selection process is crucial in supporting the research steps that follow. Despite efforts to maintain strict adherence to data collection protocols, selection of fitting statistical analyses, accurate data reporting, and an unbiased write-up, scientific findings will have questionable value if the data selection process is flawed.

References:

Dowd, R., Recker, R.R., Heaney, R.P. (2000). Study subjects and ordinary patients. Osteoporos Int. 11(6): 533-6.

Fourcroy, J.L. (1994). Women and the development of drugs: why can’t a woman be more like a man? Ann N Y Acad Sci, 736: 174-95.

Goehring, C., Perrier, A., Morabia, A. (2004). Spectrum Bias: a quantitative and graphical analysis of the variability of medical diagnostic test performance. Statistics in Medicine, 23(1):125-35.

Gurwitz, J.H., Col, N.F., Avorn, J. (1992). The exclusion of the elderly and women from clinical trials in acute myocardial infarction. JAMA, 268(11): 1417-22.

Hartt, J., Waller, G. (2002). Child abuse, dissociation, and core beliefs in bulimic disorders. Child Abuse Negl. 26(9): 923-38.

Khan, K.S., Khan, S.F., Nwosu, C.R., Arnott, N., Chien, P.F. (1999). Misleading authors’ inferences in obstetric diagnostic test literature. American Journal of Obstetrics and Gynecology, 181(1): 112-5.

Maynard, C., Selker, H.P., Beshansky, J.R., Griffith, J.L., Schmid, C.H., Califf, R.M., D’Agostino, R.B., Laks, M.M., Lee, K.L., Wagner, G.S., et al. (1995). The exclusion of women from clinical trials of thrombolytic therapy: implications for developing the thrombolytic predictive instrument database. Med Decis Making (Medical Decision Making: an international journal of the Society for Medical Decision Making), 15(1): 38-43.

Robinson, D., Woerner, M.G., Pollack, S., Lerner, G. (1996). Subject selection bias in clinical trials: data from a multicenter schizophrenia treatment center. Journal of Clinical Psychopharmacology, 16(2): 170-6.

Sharpe, N. (2002). Clinical trials and the real world: selection bias and generalisability of trial results. Cardiovascular Drugs and Therapy, 16(1): 75-7.

Walter, S.D., Irwig, L., Glasziou, P.P. (1999). Meta-analysis of diagnostic tests with imperfect reference standards. J Clin Epidemiol., 52(10): 943-51.

Whitney, C.W., Lind, B.K., Wahl, P.W. (1998). Quality assurance and quality control in longitudinal studies. Epidemiologic Reviews, 20(1): 71-80.

Data collection is the process of gathering and measuring information on variables of interest, in an established systematic fashion that enables one to answer stated research questions, test hypotheses, and evaluate outcomes. The data collection component of research is common to all fields of study including physical and social sciences, humanities, business, etc. While methods vary by discipline, the emphasis on ensuring accurate and honest collection remains the same.

The importance of ensuring accurate and appropriate data collection

Regardless of the field of study or preference for defining data (quantitative, qualitative), accurate data collection is essential to maintaining the integrity of research. Both the selection of appropriate data collection instruments (existing, modified, or newly developed) and clearly delineated instructions for their correct use reduce the likelihood of errors occurring.

Consequences of improperly collected data include:

inability to answer research questions accurately

inability to repeat and validate the study

distorted findings resulting in wasted resources

misleading other researchers to pursue fruitless avenues of investigation

compromising decisions for public policy

causing harm to human participants and animal subjects

While the degree of impact from faulty data collection may vary by discipline and the nature of investigation, there is the potential to cause disproportionate harm when these research results are used to support public policy recommendations.

Issues related to maintaining integrity of data collection:

The primary rationale for preserving data integrity is to support the detection of errors in the data collection process, whether they are made intentionally (deliberate falsifications) or not (systematic or random errors).

Most, Craddick, Crawford, Redican, Rhodes, Rukenbrod, and Laws (2003) describe ‘quality assurance’ and ‘quality control’ as two approaches that can preserve data integrity and ensure the scientific validity of study results. Each approach is implemented at different points in the research timeline (Whitney, Lind, Wahl, 1998):

1. Quality assurance - activities that take place before data collection begins
2. Quality control - activities that take place during and after data collection

Quality Assurance

Since quality assurance precedes data collection, its main focus is 'prevention' (i.e., forestalling problems with data collection). Prevention is the most cost-effective activity to ensure the integrity of data collection. This proactive measure is best demonstrated by the standardization of protocol developed in a comprehensive and detailed procedures manual for data collection. Poorly written manuals increase the risk of failing to identify problems and errors early in the research endeavor. These failures may be demonstrated in a number of ways:

Uncertainty about the timing, methods, and identity of person(s) responsible for reviewing data

Partial listing of items to be collected

Vague description of data collection instruments to be used in lieu of rigorous step-by-step instructions on administering tests

Failure to identify specific content and strategies for training or retraining staff members responsible for data collection

Obscure instructions for using, making adjustments to, and calibrating data collection equipment (if appropriate)

No identified mechanism to document changes in procedures that may evolve over the course of the investigation.

An important component of quality assurance is developing a rigorous and detailed recruitment and training plan. Implicit in training is the need to effectively communicate the value of accurate data collection to trainees (Knatterud, Rockhold, George, Barton, Davis, Fairweather, Honohan, Mowery, O'Neill, 1998). The training aspect is particularly important to address the potential problem of staff who may unintentionally deviate from the original protocol. This phenomenon, known as ‘drift’, should be corrected with additional training, a provision that should be specified in the procedures manual.

Given the range of qualitative research strategies (non-participant/participant observation, interview, archival, field study, ethnography, content analysis, oral history, biography, unobtrusive research), it is difficult to make generalized statements about how one should establish a research protocol in order to facilitate quality assurance. Certainly, researchers conducting non-participant/participant observation may have only the broadest research questions to guide the initial research efforts. Since the researcher is the main measurement device in a study, there are often few or no other data collection instruments. Indeed, instruments may need to be developed on the spot to accommodate unanticipated findings.

Quality Control

While quality control activities (detection/monitoring and action) occur during and after data collection, the details should be carefully documented in the procedures manual. A clearly defined communication structure is a necessary pre-condition for establishing monitoring systems. There should not be any uncertainty about the flow of information between principal investigators and staff members following the detection of errors in data collection. A poorly developed communication structure encourages lax monitoring and limits opportunities for detecting errors.

Detection or monitoring can take the form of direct staff observation during site visits, conference calls, or regular and frequent reviews of data reports to identify inconsistencies, extreme values, or invalid codes. While site visits may not be appropriate for all disciplines, failure to regularly audit records, whether quantitative or qualitative, will make it difficult for investigators to verify that data collection is proceeding according to procedures established in the manual. In addition, if the structure of communication is not clearly delineated in the procedures manual, transmission of any change in procedures to staff members can be compromised.
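A routine review of data reports for invalid codes, extreme values, and inconsistencies could be partly automated. The sketch below is illustrative only: the field names, the allowed codes, and the plausible ranges are hypothetical stand-ins for whatever a project's own procedures manual and codebook specify.

```python
# Hypothetical codebook: allowed codes and plausible ranges for each field.
VALID_SEX_CODES = {"F", "M", "U"}
AGE_RANGE = (18, 99)
SYSTOLIC_RANGE = (70, 250)


def review_record(record):
    """Return a list of problems found in one collected record."""
    problems = []
    if record["sex"] not in VALID_SEX_CODES:
        problems.append("invalid code for sex")
    if not AGE_RANGE[0] <= record["age"] <= AGE_RANGE[1]:
        problems.append("age outside expected range")
    if not SYSTOLIC_RANGE[0] <= record["systolic"] <= SYSTOLIC_RANGE[1]:
        problems.append("extreme systolic value")
    # Internal inconsistency: a visit cannot precede enrollment.
    # ISO-formatted date strings compare correctly as plain text.
    if record["visit_date"] < record["enrolled_date"]:
        problems.append("visit dated before enrollment")
    return problems


def review_report(records):
    """Map each flagged record ID to its problems; clean records are omitted."""
    flagged = {}
    for record in records:
        problems = review_record(record)
        if problems:
            flagged[record["id"]] = problems
    return flagged


records = [
    {"id": "P001", "sex": "F", "age": 34, "systolic": 118,
     "enrolled_date": "2023-01-10", "visit_date": "2023-02-01"},
    {"id": "P002", "sex": "X", "age": 41, "systolic": 400,
     "enrolled_date": "2023-01-12", "visit_date": "2023-02-03"},
    {"id": "P003", "sex": "M", "age": 29, "systolic": 121,
     "enrolled_date": "2023-03-01", "visit_date": "2023-02-15"},
]

for record_id, problems in review_report(records).items():
    print(record_id, problems)
```

Flagged records still need human follow-up: an extreme value may be a recording error or a genuine observation, and only the staff member or site that collected it can say which.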

Quality control also identifies the required responses, or ‘actions’, necessary to correct faulty data collection practices and to minimize future occurrences. These actions are less likely to occur if data collection procedures are vaguely written and the necessary steps to minimize recurrence are not implemented through feedback and education (Knatterud et al., 1998).

Examples of data collection problems that require prompt action include:

errors in individual data items

systematic errors

violation of protocol

problems with individual staff or site performance

fraud or scientific misconduct

In the social/behavioral sciences where primary data collection involves human subjects, researchers are taught to incorporate one or more secondary measures that can be used to verify the quality of information being collected from the human subject. For example, a researcher conducting a survey might be interested in gaining a better insight into the occurrence of risky behaviors among young adults as well as the social conditions that increase the likelihood and frequency of these risky behaviors.

To verify data quality, respondents might be queried about the same information at different points of the survey and in a number of different ways. Measures of ‘Social Desirability’ might also be used to gauge the honesty of responses. Two points need to be raised here: 1) cross-checks can be built into the data collection process, and 2) data quality is as much an observation-level issue as it is a complete data set issue. Thus, data quality should be addressed for each individual measurement, for each individual observation, and for the entire data set.
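An observation-level cross-check of this kind might be sketched as follows. The question identifiers, the responses, and the tolerance are hypothetical; the sketch simply flags respondents whose answers to two differently worded versions of the same question disagree by more than an allowed margin.

```python
def inconsistent_respondents(responses, question_pairs, tolerance=0):
    """Return IDs whose answers to paired questions disagree.

    Each pair names two items that ask for the same quantity in different
    words at different points of the questionnaire.
    """
    flagged = []
    for respondent_id, answers in responses.items():
        for first, second in question_pairs:
            if abs(answers[first] - answers[second]) > tolerance:
                flagged.append(respondent_id)
                break
    return flagged


# Hypothetical items: q3 and q27 both ask for drinks consumed last week.
pairs = [("q3_drinks_last_week", "q27_drinks_past_7_days")]
responses = {
    "R1": {"q3_drinks_last_week": 4, "q27_drinks_past_7_days": 4},
    "R2": {"q3_drinks_last_week": 0, "q27_drinks_past_7_days": 9},
    "R3": {"q3_drinks_last_week": 2, "q27_drinks_past_7_days": 3},
}

print(inconsistent_respondents(responses, pairs, tolerance=1))
```

A disagreement does not by itself show dishonesty; it marks an observation whose quality should be examined before it is included in the analysis.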

Each field of study has its preferred set of data collection instruments. The hallmark of laboratory sciences is the meticulous documentation of the lab notebook while social sciences such as sociology and cultural anthropology may prefer the use of detailed field notes. Regardless of the discipline, comprehensive documentation of the collection process before, during and after the activity is essential to preserving data integrity.

References:

Knatterud, G.L., Rockhold, F.W., George, S.L., Barton, F.B., Davis, C.E., Fairweather, W.R., Honohan, T., Mowery, R., O’Neill, R. (1998). Guidelines for quality assurance in multicenter trials: a position paper. Controlled Clinical Trials, 19: 477-493.

Most, M.M., Craddick, S., Crawford, S., Redican, S., Rhodes, D., Rukenbrod, F., Laws, R. (2003). Dietary quality assurance processes of the DASH-Sodium controlled diet study. Journal of the American Dietetic Association, 103(10): 1339-1346.

Whitney, C.W., Lind, B.K., Wahl, P.W. (1998). Quality assurance and quality control in longitudinal studies. Epidemiologic Reviews, 20(1): 71-80.

Data handling is the process of ensuring that research data is stored, archived, or disposed of in a safe and secure manner during and after the conclusion of a research project. This includes the development of policies and procedures to manage data handled electronically as well as through non-electronic means.

Data handling is important in ensuring the integrity of research data since it addresses concerns related to confidentiality, security, and preservation/retention of research data. Proper planning for data handling can also result in efficient and economical storage, retrieval, and disposal of data. In the case of data handled electronically, a primary concern is ensuring that recorded data is not altered, erased, lost, or accessed by unauthorized users.

Data handling issues encompass both electronic and non-electronic systems, such as paper files, journals, and laboratory notebooks. Electronic systems include computer workstations and laptops, personal digital assistants (PDAs), storage media such as videotape, diskette, CD, DVD, and memory cards, and other electronic instrumentation. These systems may be used for storing, archiving, sharing, and disposing of data, and therefore require adequate planning at the start of a research project so that issues related to data integrity can be analyzed and addressed early on.

Considerations/issues in data handling

Issues that should be considered in ensuring integrity of data handled include the following:

Type of data handled and its impact on the environment (especially if it is on toxic media).

Type of media containing data and its storage capacity, handling and storage requirements, reliability, longevity (in the case of a degradable medium), retrieval effectiveness, and ease of upgrade to newer media.

Data handling responsibilities/privileges, that is, who can handle which portion of data, at what point during the project, for what purpose, etc.

Data handling procedures that describe how long the data should be kept, and when, how, and by whom data should be handled for storage, sharing, archival, retrieval, and disposal purposes.

Deciding how long research data should be kept may depend on the nature of the project, sponsoring agency’s guidelines, ongoing interest in or need for the data, cost of maintaining the data in the long run, and other relevant considerations. Under current Health and Human Services requirements, research records must be maintained for at least three years after the last expenditure report. Federal regulations or institutional guidelines may require that data be retained for longer periods.
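The three-year minimum mentioned above is simple date arithmetic. The sketch below is a hypothetical helper, not an official calculation: the function name and the example dates are invented for illustration, and it computes only the three-year floor, not any longer period a sponsor, regulation, or institution may require.

```python
from datetime import date


def earliest_disposal_date(last_expenditure_report, retention_years=3):
    """Return the first date on which the minimum retention period has passed."""
    try:
        return last_expenditure_report.replace(
            year=last_expenditure_report.year + retention_years
        )
    except ValueError:
        # 29 February has no counterpart in a non-leap year; use 1 March.
        return date(last_expenditure_report.year + retention_years, 3, 1)


# Hypothetical date of a project's last expenditure report.
print(earliest_disposal_date(date(2021, 9, 30)))
```

Treating the result as the earliest possible date, rather than as a disposal deadline, keeps the calculation consistent with the point that longer retention may be required.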

In the case of data stored electronically, the potential for altering, erasing, losing, or unauthorized access is high. Several years of valuable research data can be compromised or lost, as happened in April 2001, when an intruder broke into a server used by a group of University of Washington graduate students and deleted the entire file system (UoW website, 2003). Although some aspects of protection from these threats are the responsibility of IT professionals, researchers are ultimately responsible for ensuring the security of their data.

The “Data Management Guidelines Issued by British Medical Research Council,” published on the ORI website (2003), state that:

“If the data are recorded electronically, the data should be regularly backed up on disc; a hard copy should be made of particularly important data; relevant software must be retained to ensure future access, and special attention should be given to guaranteeing the security of electronic data” (ORI website, 2003).

Creating a secure environment for electronic data usually involves all members of a project, which can include an IT manager, system administrator, support personnel, and several end users. Some issues to consider when handling data electronically include the following:

Protect systems and individual files with logins and passwords

Manage access rights (for example, the access rights of computer system administrators not involved in the project could be limited)

Regularly update virus protection to prevent vulnerability of data

Limit physical access to equipment and storage media (for example, storing data on a stand-alone computer may be more secure than storing it on a networked computer)

Remove data completely from old hardware and certify that the data was removed

Ensure data recoverability in case of emergencies

Regularly update electronic storage media to avoid outdated storage/retrieval devices

Back up multiple copies in multiple secured locations

Encrypt files when wireless devices are used, and keep track of wireless connectivity to prevent accidental file sharing

Record date and time when a piece of electronic data was originally recorded to prevent alteration or manipulation at a future date
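Two of the items above, recording when data was originally captured and guarding against later alteration, can be sketched in a few lines. The following is a minimal illustration, not a prescribed procedure: the function names (`fingerprint`, `record_entry`, `is_unaltered`) and the manifest fields are hypothetical, and the sketch assumes the manifest entry itself is kept somewhere the data handler cannot silently edit.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def fingerprint(path):
    """Return the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_entry(path, recorded_by):
    """Build a manifest entry (hypothetical layout) for a newly recorded file."""
    return {
        "file": Path(path).name,
        "sha256": fingerprint(path),
        # UTC timestamp taken at the moment the entry is created
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "recorded_by": recorded_by,
    }


def is_unaltered(path, entry):
    """Check whether the file still matches the digest stored in its entry."""
    return fingerprint(path) == entry["sha256"]
```

Calling `record_entry` when a file is first saved and `is_unaltered` before each analysis or archival step would flag any change to the file's contents, although it cannot say who made the change or why.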

In the article entitled “Preventing data theft,” Lynn Greiner quotes Paul Hyde, CEO of Kasten Chase (a company that develops high-assurance data security systems), as saying:

"It's important to have a level of security that is adequate if the machine is stolen. Everyone who is in the position where they could be separated from the device needs security. I think the best way to look at it, is to look at the criticality of what you're doing, (and) of its importance to the business environment. You have to determine what the value of the information is, and match up security accordingly" (Greiner, 2002).

One of the key issues to consider in storing or archiving data manually or electronically is “configuration management.” This involves keeping track of data held on different media or in different formats, at different stages of the project, and by different users. For example, in a research effort raw data could be recorded in a laboratory notebook, then transferred to an electronic data file for analysis, which could result in output data. The output data could then be converted to plots or graphs. Configuration management involves keeping track of all of these and upgrading the data to newer media or formats as necessary during the life of a particular project. Effective configuration management will not only ensure data integrity but also simplify the use of data.
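The notebook-to-plot chain in this example can be made concrete with a small registry that records, for each item, its medium and what it was derived from. This is only an illustrative sketch: the `Artifact` class, the `lineage` function, and the file names are all hypothetical, and a real project would also record dates, users, and format versions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Artifact:
    name: str
    medium: str
    derived_from: Optional[str] = None  # name of the parent artifact, if any


def lineage(registry, name):
    """Trace an artifact back to its raw source, oldest item first."""
    chain = []
    current = name
    while current is not None:
        artifact = registry[current]
        chain.append(artifact.name)
        current = artifact.derived_from
    return list(reversed(chain))


# Hypothetical project following the example in the text
REGISTRY = {
    a.name: a
    for a in [
        Artifact("notebook p.12", "laboratory notebook"),
        Artifact("trial.csv", "electronic data file", "notebook p.12"),
        Artifact("model_output.txt", "analysis output", "trial.csv"),
        Artifact("figure1.png", "plot", "model_output.txt"),
    ]
}
```

With such a registry, `lineage(REGISTRY, "figure1.png")` walks from the plot back to the notebook page, which is the kind of trace a reviewer needs when a published figure is questioned.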

Disposing of research data requires adequate plans, procedures, and impact analysis to ensure that the appropriate data is discarded in a safe and secure manner. Retaining data in paper files and on electronic media after a project is over, when the data is no longer needed, can lead to unauthorized access to confidential data. The likelihood of this is especially high when principal investigators retire, leave the project, or die without establishing proper data management procedures specifying which data should be kept, disposed of, shared, etc.

Disposing of data containing confidential information on human subjects or national security requires additional care to ensure that the information cannot be reconstructed from the disposed media. When disposing of data stored electronically on computer disks, the disks must be erased several times and certified to contain no recoverable data. Some federal and state agencies have guidelines on how many times a computer disk should be erased to ensure the disk is free of recoverable data. In the case of data stored on film or other toxic media, care should be taken to ensure that the disposal process does not pollute the environment.

Research organizations often contract with commercial data disposal companies to dispose of data stored on non-electronic media such as laboratory notebooks, paper files, etc., and it is the responsibility of the research organization to ensure the commercial company will dispose of the data in a safe and non-recoverable manner.

Data handling requires adequate planning, development of procedures, and training and supervision of research staff to ensure that data is stored, archived, or disposed of in a safe and secure manner that preserves the integrity of research data as well as simplifies data management.

References

University of Washington. "Is Your Computer Safe?" Computing & Communications Windows on Technology, No. 27, June 2002. 18 Nov. 2003. http://www.washington.edu/computing/windows/issue27/safe.html

Greiner, Lynn. "Preventing data theft." Computer Dealer News, February 22, 2002, Vol. 18 No. 3. 21 Nov. 2003. http://www.itbusiness.ca/index.asp?theaction=61&sid=47850

Office of Research Integrity. "Data Management Guidelines Issued by British Medical Research Council." September 2001, Vol. 9, No. 4. 20 Nov. 2003. http://ori.dhhs.gov/html/resources/britishmed.asp

University of Texas Southwestern Medical Center at Dallas. "Collecting Research Data On Computer Wave Of Future, UT Southwestern Researchers Report In JAMA." 10 Oct. 2000. http://www.sciencedaily.com/releases/2000/10/001010071729.htm

RCR Education Consortium (2004). Accessed on April 15, 2004. http://rcrec.org/index.php?module=ContentExpress&func=display&bid=24&btitle=Navigation&mid=29&ceid=2

Data Analysis is the process of systematically applying statistical and/or logical techniques to describe and illustrate, condense and recap, and evaluate data. According to Shamoo and Resnik (2003), various analytic procedures “provide a way of drawing inductive inferences from data and distinguishing the signal (the phenomenon of interest) from the noise (statistical fluctuations) present in the data”.

While data analysis in qualitative research can include statistical procedures, analysis often becomes an ongoing, iterative process in which data is collected and analyzed almost simultaneously. Indeed, researchers generally analyze for patterns in observations through the entire data collection phase (Savenye, Robinson, 2004). The form of the analysis is determined by the specific qualitative approach taken (field study, ethnography, content analysis, oral history, biography, unobtrusive research) and the form of the data (field notes, documents, audiotape, videotape).

An essential component of ensuring data integrity is the accurate and appropriate analysis of research findings. Improper statistical analyses distort scientific findings, mislead casual readers (Shepard, 2002), and may negatively influence the public perception of research. Integrity issues are just as relevant to the analysis of non-statistical data.

Considerations/issues in data analysis There are a number of issues that researchers should be cognizant of with respect to data analysis. These include:

Having the necessary skills to analyze

Concurrently selecting data collection methods and appropriate analysis

Drawing unbiased inference

Inappropriate subgroup analysis

Following acceptable norms for disciplines

Determining statistical significance

Lack of clearly defined and objective outcome measurements

Providing honest and accurate analysis

Manner of presenting data

Environmental/contextual issues

Data recording method

Partitioning ‘text’ when analyzing qualitative data

Training of staff conducting analyses

Reliability and Validity

Extent of analysis

Having necessary skills to analyze A tacit assumption of investigators is that they have received training sufficient to demonstrate a high standard of research practice. Unintentional ‘scientific misconduct’ is likely the result of poor instruction and follow-up. A number of studies suggest this may be the case more often than believed (Nowak, 1994; Silverman, Manson, 2003). For example, Sica found that adequate training of physicians in medical schools in the proper design, implementation and evaluation of clinical trials is “abysmally small” (Sica, cited in Nowak, 1994). Indeed, a single course in biostatistics is the most that is usually offered (Christopher Williams, cited in Nowak, 1994).

A common practice of investigators is to defer the selection of analytic procedure to a research team ‘statistician’. Ideally, investigators should have substantially more than a basic understanding of the rationale for selecting one method of analysis over another. This allows investigators to better supervise staff who conduct the data analysis process and to make informed decisions.

Concurrently selecting data collection methods and appropriate analysis While methods of analysis may differ by scientific discipline, the optimal stage for determining appropriate analytic procedures is early in the research process; it should not be an afterthought. According to Smeeton and Goda (2003), “Statistical advice should be obtained at the stage of initial planning of an investigation so that, for example, the method of sampling and design of questionnaire are appropriate”.

Drawing unbiased inference The chief aim of analysis is to distinguish whether an observed event reflects a true effect or a false one. Any bias occurring in the collection of the data, or in the selection of the method of analysis, will increase the likelihood of drawing a biased inference. Bias can occur when recruitment of study participants falls below the minimum number required to demonstrate statistical power, or when the follow-up period is too short to demonstrate an effect (Altman, 2001).

Inappropriate subgroup analysis When they fail to demonstrate statistically different levels between treatment groups, investigators may resort to breaking the analysis down into smaller and smaller subgroups in order to find a difference. Although this practice may not be inherently unethical, these analyses should be proposed before beginning the study, even if the intent is exploratory in nature. If the study is exploratory in nature, the investigator should make this explicit so that readers understand that the research is more of a hunting expedition than primarily theory driven. Although a researcher may not have a theory-based hypothesis for testing relationships between previously untested variables, a theory will have to be developed to explain an unanticipated finding. Indeed, in exploratory science, there are no a priori hypotheses and therefore no hypothesis tests.

Although theories can often drive the processes used in the investigation of qualitative studies, many times patterns of behavior or occurrences derived from analyzed data can result in the development of new theoretical frameworks rather than frameworks determined a priori (Savenye, Robinson, 2004).

It is conceivable that multiple statistical tests could yield a significant finding by chance alone rather than reflecting a true effect. Integrity is compromised if the investigator reports only tests with significant findings and neglects to mention a large number of tests failing to reach significance. While access to computer-based statistical packages can facilitate the application of increasingly complex analytic procedures, inappropriate use of these packages can result in abuses as well.

Following acceptable norms for disciplines Every field of study has developed its accepted practices for data analysis. Resnik (2000) states that it is prudent for investigators to follow these accepted norms. Resnik further states that the norms are ‘…based on two factors:

(1) the nature of the variables used (i.e., quantitative, comparative, or qualitative),

(2) assumptions about the population from which the data are drawn (i.e., random distribution, independence, sample size, etc.).

If one uses unconventional norms, it is crucial to clearly state this is being done, and to show how this new and possibly unaccepted method of analysis is being used, as well as how it differs from other more traditional methods. For example, Schroder, Carey, and Vanable (2003) juxtapose their identification of new and powerful data analytic solutions developed to count data in the area of HIV contraction risk with a discussion of the limitations of commonly applied methods.

Determining significance While the conventional practice is to establish a standard of acceptability for statistical significance, in certain disciplines it may also be appropriate to discuss whether attaining statistical significance has a true practical meaning, i.e., ‘clinical significance’. Jeans (1992) defines ‘clinical significance’ as “the potential for research findings to make a real and important difference to clients or clinical practice, to health status or to any other problem identified as a relevant priority for the discipline”.

Kendall and Grove (1988) define clinical significance in terms of what happens when “… troubled and disordered clients are now, after treatment, not distinguishable from a meaningful and representative non-disturbed reference group”. Thompson and Noferi (2002) suggest that readers of counseling literature should expect authors to report either practical or clinical significance indices, or both, within their research reports. Shepard (2002) questions why some authors fail to point out that the magnitude of observed changes may be too small to have any clinical or practical significance: “sometimes, a supposed change may be described in some detail, but the investigator fails to disclose that the trend is not statistically significant”.
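The gap between statistical and practical significance can be illustrated numerically. The sketch below is an assumption-laden example rather than a method taken from the cited authors: it uses Cohen's d as one possible practical-significance index, computes a Welch t statistic by hand, and runs both on synthetic scores constructed so that two very large groups differ by only half a point.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def welch_t(group_a, group_b):
    """Welch's t statistic (difference in means over its standard error)."""
    standard_error = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error


# Synthetic data (not from any study): scores cycle through -10..10,
# and the "treated" group is shifted upward by 0.5 points.
control = [i % 21 - 10 for i in range(21000)]
treated = [score + 0.5 for score in control]

effect_size = cohens_d(treated, control)
t_statistic = welch_t(treated, control)
```

With samples this large the t statistic is far beyond conventional significance thresholds, yet the effect size is well under the 0.2 that Cohen's rule of thumb labels "small". Reporting both numbers lets readers judge whether a statistically significant change is large enough to matter.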

Lack of clearly defined and objective outcome measurements No amount of statistical analysis, regardless of its level of sophistication, will correct for poorly defined outcome measurements. Whether done unintentionally or by design, this practice increases the likelihood of clouding the interpretation of findings, thus potentially misleading readers.

Providing honest and accurate analysis The basis for this issue is the urgency of reducing the likelihood of statistical error. Common challenges include the exclusion of outliers, filling in missing data, altering or otherwise changing data, data mining, and developing graphical representations of the data (Shamoo, Resnik, 2003).

Manner of presenting data At times investigators may enhance the impression of a significant finding by determining how to present derived data (as opposed to data in its raw form), which portion of the data is shown, why, how and to whom (Shamoo, Resnik, 2003). Nowak (1994) notes that even experts do not agree in distinguishing between analyzing and massaging data. Shamoo (1989) recommends that investigators maintain a sufficient and accurate paper trail of how data was manipulated for future review.

Environmental/contextual issues The integrity of data analysis can be compromised by the environment or context in which data was collected, e.g., face-to-face interviews vs. focus groups. The interaction occurring within a dyadic relationship (interviewer-interviewee) differs from the group dynamic occurring within a focus group because of the number of participants and how they react to each other’s responses. Since the data collection process could be influenced by the environment/context, researchers should take this into account when conducting data analysis.

Data recording method Analyses could also be influenced by the method by which data was recorded. For example, research events could be documented by:

a. recording audio and/or video and transcribing later

b. either a researcher-administered or self-administered survey

c. either a closed-ended or open-ended survey

d. preparing ethnographic field notes from a participant/observer

e. requesting that participants themselves take notes, compile and submit them to researchers.

While each methodology employed has its rationale and advantages, issues of objectivity and subjectivity may be raised when data is analyzed.

Partitioning the text During content analysis, staff researchers or ‘raters’ may use inconsistent strategies in analyzing text material. Some raters may analyze comments as a whole while others may prefer to dissect text material by separating words, phrases, clauses, sentences or groups of sentences. Every effort should be made to reduce or eliminate inconsistencies between raters so that data integrity is not compromised.

Training of staff conducting analyses A major challenge to data integrity could occur when inductive techniques are used without adequate supervision. Content analysis requires raters to assign topics to text material (comments). The threat to integrity may arise when raters have received inconsistent training or bring different previous training experience(s). Previous experience may affect how raters perceive the material or even how they perceive the nature of the analyses to be conducted. Thus one rater could assign topics or codes to material that differ significantly from those assigned by another rater. Strategies to address this include clearly stating a list of analysis procedures in the protocol manual, consistent training, and routine monitoring of raters.

Reliability and Validity Researchers performing analysis on either quantitative or qualitative data should be aware of challenges to reliability and validity. For example, in the area of content analysis, Gottschalk (1995) identifies three factors that can affect the reliability of analyzed data:

stability, or the tendency for coders to consistently re-code the same data in the same way over a period of time

reproducibility, or the tendency for a group of coders to classify category membership in the same way

accuracy, or the extent to which the classification of a text corresponds to a standard or norm statistically

The potential for compromising data integrity arises when researchers cannot consistently demonstrate stability, reproducibility, or accuracy of data analysis.
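Reproducibility between two coders can be quantified rather than judged by eye. One widely used index is Cohen's kappa, which corrects raw agreement for the agreement expected by chance; it is offered here as an assumption about how a team might measure this, not as the index Gottschalk prescribes. The function name and the example codes are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters coding the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters chose the same code
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected if each rater assigned codes independently
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / n**2
    if expected == 1:
        return 1.0  # both raters used a single identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned to ten comments by two raters
rater_1 = ["cost", "cost", "access", "access", "quality",
           "quality", "cost", "access", "quality", "cost"]
rater_2 = ["cost", "cost", "access", "quality", "quality",
           "quality", "cost", "access", "access", "cost"]

kappa = cohens_kappa(rater_1, rater_2)
```

Here the raters agree on 8 of 10 comments, but because some of that agreement would arise by chance, kappa comes out lower than 0.8. Tracking such a figure over the course of a project is one way to carry out the routine monitoring of raters recommended above.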

According to Gottschalk (1995), the validity of a content analysis study refers to the correspondence of the categories (the classification that raters assigned to text content) to the conclusions, and the generalizability of results to a theory (did the categories support the study’s conclusion, and was the finding adequately robust to support or be applied to a selected theoretical rationale?).

Extent of analysis Upon coding text material for content analysis, raters must classify each code into an appropriate category of a cross-reference matrix. Relying on computer software to determine a frequency or word count can lead to inaccuracies. “One may obtain an accurate count of that word's occurrence and frequency, but not have an accurate accounting of the meaning inherent in each particular usage” (Gottschalk, 1995). Further analyses might be appropriate to discover the dimensionality of the data set or identify new meaningful underlying variables.
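The limitation of a bare word count is easy to demonstrate. In the sketch below, the comments are invented for illustration and the function names are hypothetical; the count treats three quite different uses of the same word as equivalent, while a simple keyword-in-context listing puts each occurrence back in front of a human rater.

```python
import re
from collections import Counter

# Invented comments in which "pain" carries three different meanings
COMMENTS = [
    "The pain is much worse since the operation.",
    "I have no pain at all now.",
    "Filling in the forms was a pain.",
]


def tokenize(comment):
    """Lowercase a comment and split it into words."""
    return re.findall(r"[a-z']+", comment.lower())


def word_counts(comments):
    """Count how often each word occurs across all comments."""
    counts = Counter()
    for comment in comments:
        counts.update(tokenize(comment))
    return counts


def keyword_in_context(comments, keyword, width=3):
    """List each occurrence of a keyword with up to `width` words either side."""
    rows = []
    for comment in comments:
        words = tokenize(comment)
        for index, word in enumerate(words):
            if word == keyword:
                rows.append(" ".join(words[max(0, index - width): index + width + 1]))
    return rows
```

`word_counts(COMMENTS)["pain"]` reports three occurrences, yet `keyword_in_context(COMMENTS, "pain")` shows that one comment reports worsening pain, one reports its absence, and one uses the word figuratively. The count is accurate; the meaning still has to be coded by a rater.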

Whether statistical or non-statistical methods of analyses are used, researchers should be aware of the potential for compromising data integrity. While statistical analysis is typically performed on quantitative data, there are numerous analytic procedures specifically designed for qualitative material including content, thematic, and ethnographic analysis. Regardless of whether one studies quantitative or qualitative phenomena, researchers use a variety of tools to analyze data in order to test hypotheses, discern patterns of behavior, and ultimately answer research questions. Failure to understand or acknowledge data analysis issues presented can compromise data integrity.

References:

Gottschalk, L. A. (1995). Content analysis of verbal behavior: New findings and clinical applications. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.

Jeans, M. E. (1992). Clinical significance of research: A growing concern. Canadian Journal of Nursing Research, 24, 1-4.

Kendall, P. C., & Grove, W. (1988). Normative comparisons in therapy outcome. Behavioral Assessment, 10, 147-158.

Lefort, S. (1993). The statistical versus clinical significance debate. Image, 25, 57-62.

Nowak, R. (1994). Problems in clinical trials go far beyond misconduct. Science, 264(5165): 1538-41.

Resnik, D. (2000). Statistics, ethics, and research: an agenda for education and reform. Accountability in Research, 8: 163-88.

Schroder, K.E., Carey, M.P., Vanable, P.A. (2003). Methodological challenges in research on sexual risk behavior: I. Item content, scaling, and data analytic options. Ann Behav Med, 26(2): 76-103.

Shamoo, A.E., Resnik, D.B. (2003). Responsible Conduct of Research. Oxford University Press.

Shamoo, A.E. (1989). Principles of Research Data Audit. Gordon and Breach, New York.

Shepard, R.J. (2002). Ethics in exercise science research. Sports Med, 32(3): 169-183.

Silverman, S., Manson, M. (2003). Research on teaching in physical education doctoral dissertations: a detailed investigation of focus, method, and analysis. Journal of Teaching in Physical Education, 22(3): 280-297.

Smeeton, N., Goda, D. (2003). Conducting and presenting social work research: some basic statistical considerations. Br J Soc Work, 33: 567-573.

Thompson, B., Noferi, G. (2002). Statistical, practical, clinical: How many types of significance should be considered in counseling research? Journal of Counseling & Development, 80(4): 64-71.

Data publication and reporting is the process of preparing and disseminating research findings to the scientific community. Scholarly disciplines can only advance through dissemination and review of research findings at professional meetings and publications in discipline-related journals. The tacit assumption in publishing is one of trust between the author(s) and readers regarding the accuracy and truthfulness of any submission.

The practice of ensuring research integrity is relevant at all stages of research investigation, from early conceptualization, design, implementation, to analysis. This practice also extends to the stage of documenting and preparing results for publication. In this process, researchers may experience many more challenges to preserving research integrity.

Considerations/issues in data reporting and publishing There are often factors in research settings that can result in compromises to data integrity. These factors may facilitate conditions where the goal of conducting research in as objective a manner as possible can sometimes be challenged. These can be categorized as either external or internal factors as follows:

External Factors:

Publication pressure

Professional competition

Job security

Lack of formal mentoring

Unclear guidelines

Lack of penalties

Little chance of getting caught

Bad examples from mentors (Price, Drake, Islam, 2001)

Internal Factors:

Individual ego or vanity

Personal financial gain

Psychiatric illness (Weed, 1998)

Incompetence

Sloppy writing/reporting

Importance of accurate and honest data reporting Investigators demonstrating lapses of integrity while engaged in data reporting and publishing can have a negative influence on the direction of future research efforts, threaten to compromise the credibility of a particular field of study, and may ultimately risk the well-being and safety of the public in general, as well as research subjects in particular.

Sources of guidance promoting good data reporting practices and publishing include faculty advisors who carefully instruct graduate students, departmental chairpersons mentoring researchers new to the field, regular review of published university policies, existing codes of professional ethics, or established government rules and regulations. Deficiencies in training or a lack of awareness of existing policies, codes, or rules may increase the likelihood of a deviation from the acceptable standards of practice in reporting and publishing.

Listed below are some issues related to integrity of data reporting and publication:

Misrepresentation Due to problems in data collection, researchers may omit data that is not supportive of the research hypothesis. Alternatively, data may be fabricated if the data collection process was somehow interrupted or data was lost, and the researchers believe the invented data would have been similar to what was anticipated. In either case, the true scope of the data findings remains hidden from readers, who are unable to accurately assess the validity of the findings.

Plagiarism Plagiarism is the act of taking credit for ideas or data that rightfully belong to others. Related to this is the theft of ideas from grants and drafts of papers that a researcher has reviewed. This harms the researcher(s) from whom the idea(s) or data were appropriated or who were improperly acknowledged.

Selectivity of reporting / failure to report all pertinent data This is the practice of only using data that supports one’s research hypothesis and ignoring or omitting data that does not. A related practice is inaccurate reporting of missing data points. As explained under “Misrepresentation” earlier, the true scope of the data findings remains hidden from readers who are unable to accurately assess the validity of the findings.

Failure to disclose conflicts of interest Editors, reviewers, or readers who are not aware of possible conflicts of interest (financial and otherwise), including possible undue influence from the sponsors of an investigation, cannot adequately assess the validity of research findings. These conflicts may compromise researchers’ credibility in their fields.

Publication bias / neglecting negative results Since the vast majority of research findings submitted to professional journals tend to be ‘positive’ in nature, the literature in most scientific fields demonstrates a bias against negative results. This in part reflects the reluctance of journal editors to publish articles with negative findings. Thus, researchers are less willing to report findings that fail to demonstrate an intended effect or yield an expected result. Yet the value of such publications could be substantial, in that other investigators would not needlessly pursue a fruitless path of research.

Analysis of data by several methods to find a significant result This is also known as ‘milking’ or ‘dredging the data’ and involves researchers applying a variety of statistical tests in the hope of yielding a significant result. The proper procedure is to base the selection of tests on a theory or theoretical framework and to specify them a priori, rather than selecting tests after the data have been examined. Other related statistical issues include reporting percentages rather than absolute numbers when the sample size is small, reporting non-significant differences as though they suggested a trend, reporting no difference when statistical power is inadequate, and failing to include the total number of eligible participants. This last point matters because, without that number, readers cannot determine whether a poor response rate might compromise the representativeness of respondents.
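A short calculation shows why running many tests in search of a significant result is hazardous. The sketch below assumes the tests are independent and that no true effect exists, which is a simplification; the function names are hypothetical, and the Bonferroni adjustment is shown only as one common correction, not as the only acceptable one.

```python
def familywise_error_rate(alpha, num_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_threshold(alpha, num_tests):
    """Per-test significance level that keeps the overall rate near alpha."""
    return alpha / num_tests


# Twenty independent tests at the conventional 0.05 level
rate_for_twenty = familywise_error_rate(0.05, 20)
adjusted_level = bonferroni_threshold(0.05, 20)
```

Under these assumptions, twenty tests at the 0.05 level carry roughly a 64% chance of producing at least one "significant" result by chance alone, which is why readers need to know how many tests were run and not only which ones reached significance.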

Inadequate evaluation of prior research This refers to an insufficient review of available literature that presents an incomplete picture of the current status of a particular research area. A critique of the included citations may lack the required depth of analysis and fail to justify the need for proposed research.

Ignoring citations or prior work that challenge stated conclusions or call current findings into question Selective inclusion of citations that minimize threats to the justification for the present study can compromise the integrity of the study. Whether done intentionally or not, omissions can have the untoward consequence of providing support for an author’s position.

Misleading discussion of observations This may result from using inappropriate statistical tests, neglecting negative results, omitting missing data points, failing to report actual numbers of eligible subjects, using inappropriate graph labels or terminology, and data dredging. These can result in readers becoming less able to objectively critique the findings.

Reporting conclusions that are not supported Faulty data collection, inappropriate analyses, gaps in logic, and unexplained deviation from conventionally accepted methods of interpretation can result in conclusions that are not valid. Readers cannot assess the validity of the conclusions for themselves unless all the necessary information is honestly reported.

Breaking down of a single piece of research into multiple overlapping reports This can occur when the differences in findings between reports are negligible and the focus is on publishing for quantity rather than quality. A related practice is submission of duplicate publications to journals from different disciplines or in different languages. The expectation is that investigators would not read journals from different fields of study or in other languages. Literature reviews or meta-analyses may then arrive at an inaccurate assessment of findings from a particular research area because the same study has been published in different journals.

Just Attribution of Authorship Publication disputes generally fall into four categories (Ritter, 2001):

1. a researcher is listed as an author but did not have a chance to review or approve the manuscript

2. a researcher is promised first authorship when the project is completed, but the principal investigator adds the work of someone else, who then becomes first author

3. a researcher claims first authorship on the basis of the amount of work he or she did when not given that recognition, and

4. after leaving a laboratory, a researcher does not receive credit in an article that includes his or her work.

Related to this is submission of manuscripts not seen and reviewed by all the listed co-authors of a publication.

A fair and equitable understanding of each author’s contribution to published research provides clear credit and acknowledgement for advancing a field of study.

Inappropriate use of terminology without precise definitions A potential barrier to successful cross-disciplinary investigations is the use of field-specific terminology. Encouraging the use of precise definitions can reduce confusion and promote understanding of research conducted.

Inflation of research results for the media This involves providing statements for public rather than professional consumption that are insufficiently supported by data, for the purpose of publishing un-reviewed or untested results in a non-scientific or non-scholarly magazine/media. Premature reporting of results that turn out to be unsubstantiated may compromise the credibility of a particular field.

Publishing in peer-reviewed journals or presenting in scholarly meetings is the primary mechanism for investigators to disseminate their findings to the research community. This community relies on author(s) to report the events of a study honestly and accurately. All researchers should be aware of the issues that compromise the integrity of data reporting and publishing. Ensuring integrity is essential to promoting the credibility of all fields of study.

Ethics Resources:

http://pubs.acs.org/cen/topstory/7946/7946sci1.html

http://onlineethics.org/reseth/mod/auth.html

References:

Marco, C.A., Larkin, G.L. (2000). Research ethics: ethical issues of data reporting and the quest for authenticity. Academic Emergency Medicine, 7 (6): 691-694.

Price, J.H., Drake, J.A., Islam, R. (2001). Selected ethical issues in research and publication: perceptions of health education faculty. Health Education & Behavior, 28 (1): 51-64.

Stephen, K.R., Washington, C., Washington, E.N. (2001). Publication ethics: rights and wrongs: Balancing obligations and interests surrounding dissemination of research is an arduous task. Science & Technology, 79 (46): 24-31.

Weed, D.L. (1998). Preventing scientific misconduct. American Journal of Public Health, 88 (1): 125-129.

Data ownership refers to both the possession of and responsibility for information. Ownership implies power as well as control. The control of information includes not just the ability to access, create, modify, package, derive benefit from, sell or remove data, but also the right to assign these access privileges to others (Loshin, 2002).

Implicit in having control over access to data is the ability to share data with colleagues in ways that promote advancement in a field of investigation (the notable exception to the unqualified sharing of data is research involving human subjects). Scofield (1998) suggests replacing the term ‘ownership’ with ‘stewardship’, “because it implies a broader responsibility where the user must consider the consequences of making changes over ‘his’ data”.

According to Garner (1999), individuals holding intellectual property have rights to control intangible objects that are products of human intellect. The range of these products encompasses the fields of art, industry, and science. Research data is recognized as a form of intellectual property and is subject to protection by U.S. law.

Importance of data ownership:

According to Loshin (2002), data has intrinsic value as well as added value as a byproduct of information processing: “at the core, the degree of ownership (and by corollary, the degree of responsibility) is driven by the value that each interested party derives from the use of that information”.

The general consensus in science emphasizes the principle of openness (Panel Sci. Responsib. Conduct Res., 1992). Sharing data therefore benefits society in general and helps protect the integrity of scientific data in particular. The Committee on National Statistics’ 1985 report on sharing data (Fienberg, Martin, Straf, 1985) noted that sharing data reinforces open scientific inquiry, encourages a diversity of analyses and conclusions, and permits:

1. reanalyses to verify or refute reported results
2. alternative analyses to refine results
3. analyses to check if the results are robust to varying assumptions

The costs and benefits of data sharing should be viewed along ethical, institutional, legal, and professional dimensions. Researchers should clarify at the beginning of a project whether data can be shared, under what circumstances, by and with whom, and for what purposes.

Considerations/issues in data ownership

Researchers should understand the various issues related to data ownership in order to make better decisions about it. These issues include paradigm of ownership, data hoarding, data ownership policies, balance of obligations, and technology. Each of these issues gives rise to a number of considerations that impact decisions concerning data ownership.

Paradigm of Ownership – Loshin (2002) alludes to the complexity of ownership issues by identifying the range of possible paradigms used to claim data ownership. These claims are based on the type and degree of contribution involved in the research endeavor. Loshin (2002) identifies a list of parties laying a potential claim to data:

Creator – the party that creates or generates data

Consumer – the party that uses the data owns the data

Compiler – the entity that selects and compiles information from different information sources

Enterprise – all data that enters the enterprise or is created within the enterprise is completely owned by the enterprise

Funder – the user that commissions the data creation claims ownership

Decoder – in environments where information is “locked” inside particular encoded formats, the party that can unlock the information becomes an owner of that information

Packager – the party that collects information for a particular use and adds value through formatting the information for a particular market or set of consumers

Reader as owner – the value of any data that can be read is subsumed by the reader and, therefore, the reader gains value through adding that information to an information repository

Subject as owner – the subject of the data claims ownership of that data, mostly in reaction to another party claiming ownership of the same data

Purchaser/Licenser as owner – the individual or organization that buys or licenses data may stake a claim to ownership

Data Hoarding

This practice is considered antithetical to the general norms of science emphasizing the principle of openness. Factors influencing the decision to withhold access to data could include (Sieber, 1989):

(a) proprietary, economic, or security concerns

(b) documenting data, which can be extremely costly and time-consuming

(c) providing all the materials needed to understand or extend the research

(d) technical obstacles to sharing computer-readable data

(e) confidentiality

(f) concerns about the qualifications of data requesters

(g) personal motives to withhold data

(h) costs to the borrowers

(i) costs to funders

Data Ownership Policies

Institutional policies lacking specificity, supervision, and formal documentation can increase the risk of compromising data integrity. Before research is initiated, it is important to delineate the rights, obligations, expectations, and roles of all interested parties. Data integrity can be compromised when investigators are not aware of existing data ownership policies and fail to clearly describe rights and obligations regarding data ownership. Listed below are some scenarios between interested parties that warrant the establishment of data ownership policies:

Between academic institution and industry (public/private sector) – This refers to the sharing of potential benefits resulting from research conducted by academic staff but funded by corporate sponsors. The failure to clearly delineate data ownership issues early in public/private relationships has created controversy concerning the rights of academic institutions and those of industry sponsors (Foote, 2003).

Between academic institution and research staff – According to Steneck (2003), research funding is awarded to research institutions and not individual investigators. As recipients of funds, these institutions have responsibilities for overseeing a number of activities including budgets, regulatory compliance, and the management of data. Steneck (2003) notes, “To assure that they are able to meet these responsibilities, research institutions claim ownership rights over data collected with funds given to the institution. This means that researchers cannot automatically assume that they can take their data with them if they move to another institution. The research institution that received the funds may have rights and obligations to retain control over the data”. Fishbein (1991) recommended that institutions clearly state their policies regarding ownership of data, and presented guidelines for such a policy.

Collaboration between research colleagues – This is applicable to collaborative efforts that occur both within and between institutions. Whether collaborations are between faculty peers, students, or staff, all parties should have a clear understanding of who will determine how the data will be distributed and shared (if applicable) even before it is collected.

Between authors and journals – To reduce the likelihood of copyright infringement, some publishers require a copyright assignment to the journal at the time of submission of a manuscript. Authors should be aware of the implications of such copyright assignments and clarify the policies involved.

Balance of obligations

Investigators must learn to negotiate the delicate balance between their willingness to share data in order to facilitate scientific progress and their obligation to employers/sponsors, collaborators, and students to preserve and protect data (Last, 2003). Signed nondisclosure agreements between investigators and their corporate sponsors can impede efforts to publish data or share it with colleagues. In some cases, as with research involving human participants, data sharing may not be allowed for reasons of confidentiality.

Technology

Advances in technology have enabled investigators to explore new avenues of research, enhance productivity, and use data in ways unimagined before. However, careless application of new technologies has the potential to create a slew of unanticipated data ownership problems that can compromise research integrity. The following examples highlight data ownership issues resulting from the careless application of technology:

Computer – The use of computer technology has permitted rapid access to many forms of computer-generated data (Veronesi, 1999). This is particularly the case in the medical profession, where patient medical record data is becoming increasingly computerized. While this process facilitates data access for health care professionals for diagnostic and research purposes, unauthorized interception and disclosure of medical information can compromise patients’ right to privacy. While the primary justification for collecting medical data is to benefit the patient, Cios and Moore (2002) question whether medical data has a special status based on its applicability to all people.

Genetics – Due to advances in technology, investigators of the Human Genome Project have opportunities to make significant contributions by addressing previously untreatable diseases and other human conditions. However, the status of genetic material and genetic information remains unclear (de Witte, Welie, 1997). Wiesenthal and Wiener (1996) discuss the conflict between the right of the individual to privacy and the need for societal protection. The critical issues that investigators need to be aware of include the ownership of genetic data, confidentiality rights to such information, and legislation to control genetic testing and its applications (Wiesenthal and Wiener, 1996).

The data ownership issues discussed above highlight potential challenges to preserving data integrity. While the ideal is to promote scientific openness, there are situations where it may not be appropriate to share data (especially in the case of human participants). The key is for researchers to understand the various issues affecting ownership and sharing of their research data and to make decisions that promote scientific inquiry and protect the interests of the parties involved.

References

Cios, K. J., Moore, G. W. (2002). Uniqueness of medical mining. Artificial Intelligence in Medicine, 26(1-2): 1-24.

de Witte, J. I., Welie, J. V. (1997). The status of genetic material and genetic information in The Netherlands. Social Science & Medicine, 45(1): 45-9.

Fienberg, S. E., Martin, M. E., Straf, M. L. (1985). Sharing Research Data. Washington, DC: National Acad. Press.

Fishbein, E. A. (1991). Ownership of research data. Academic Medicine, 66(3): 129-33.

Foote, M. (2003). Review of current authorship guidelines and the controversy regarding publication of clinical data. Biotechnology Annual Review, 9: 303-13.

Garner, B. A. (1999). Black’s Law Dictionary, 7th edition. West Group, St. Paul, MN.

Last, R. L. (2003). Sandbox ethics in science: sharing of data and materials in plant biology. Plant Physiology, 132(1): 17-8.

Loshin, D. (2002). Knowledge Integrity: Data Ownership (online), retrieved June 8, 2004, http://www.datawarehouse.com/article/?articleid=3052

Panel Sci. Responsib. Conduct Res. (1992). Responsible Science. Ensuring the Integrity of the Research Process. Vol. 1. Comm. Sci. Eng. Public Policy. Washington, DC: Natl. Acad. Press.

Scofield, M. (1998). Issues of Data Ownership (online), retrieved June 10, 2004, http://www.dmreview.com/editorial/dmreview/print_action.cfm?articleId=296

Shamoo, A. E., Resnik, D. B. (2002). Intellectual Property. Responsible Conduct of Research. New York: Oxford University Press.

Sieber, J. E. (1989). Sharing scientific data I: new problems for IRBs. IRB: A Review of Human Subjects Research, 11(6): 4-7.

Steneck, N. H. (2003). ORI Introduction to the Responsible Conduct of Research. Department of Health and Human Services.

Veronesi, J. F. (1999). Ethical issues in computerized medical records. Critical Care Nursing Quarterly, 22(3): 75-80.

Wiesenthal, D. L., Wiener, N. I. (1996). Privacy and the Human Genome Project. Ethics & Behavior, 6(3): 189-202.