Collecting evidence about studies to guide acquisition policy

Janez Štebe, ADP, Slovenia

IASSIST, Edinburgh, 25 May


• Goal of presentation:

– To give an overview of the usual criteria for selection in acquisition

– To implement them in an “inquiry form” to collect evidence about studies

– To comment on the process of collecting evidence

User centred approach

• In preparing data archive policies, the aim is to put the emphasis on users' needs and expectations

• and to deliver the highest-quality service within the limits of the resources available

• We would like our collection to consist of data that are of high quality, relevant for research purposes, timely, easily accessible, etc.

A guide to archive workflow

• Ekkehard Mochmann, Paul de Guchteneire: The social science data archive step by step (http://www.ifdo.org/data/data_archive_workflow01.html)

• See also M. Gutmann, K. Schürer, D. Donakowski and Hilary Beedham: The selection, appraisal, and retention of social science data (Data Science Journal, 3, 2004, http://journals.eecs.qub.ac.uk/codata/Journal/contents/3_04/3_04pdfs/DS386.pdf)

First step

• 1. Identification of datasets (Mochmann, Guchteneire)

– “Data archives have a function as national repositories for datasets. They have presented themselves as such and they are seen as such.

– This implies that the archive has to keep a good overview of what is available and what is needed.”

• Users expect:

– that the archive has evidence about all important social science studies and data materials, even those that are not yet available or are available elsewhere

[Charts: Top 11 topics, Next 11 topics, Last topics (ADP holding, 2005); axis scale 0 to 100]

“Ongoing research registers...

– ...are usually compiled on the basis of mail surveys.

– Standard forms are sent out to researchers and research institutes.

– Information is gathered on research topics, type of research, time schedules and methodology.

– For the archive's acquisition it is very useful if the ongoing research register includes information on whether machine readable datasets are produced as a result of the registered research projects.” (Mochmann, Guchteneire)

What kind of evidence about existing studies do we have?

• SICRIS (Slovenian Current Research Information System) for current research projects

– Specialises in observing the research community at the national level from 1998 onward

– Does not cover previous years or commercially funded research projects

– Information gathered in the course of the research is often missing (including whether a data set exists)

– The unit of information is a broad research project, not a specific study

COBISS

• National record-keeper for all Slovenian libraries.

• Research reports are the units of particular interest.

• Up-to-date information on researchers' bibliographies

• Potentially useful as a source of evidence about data, but searches return many irrelevant results

• Does not cover the non-academic sector

WEB

• Research institutes and universities usually list current and past research projects on their web pages.

• The amount of information about a study is limited, so it is hard to gauge archival potential

• The best sources of information are research papers or reports, but one has to go through them to find a link to the data and to evaluate its potential

• Are there any experiences with automated search, web crawlers and similar?

An initiative at a national level to...

• Supply additional information in the records kept in SICRIS and COBISS about the aspects of a study that are relevant for guiding selection

• A simple question to include:

– “Does a data set exist as a result of the study, and where is it available?”

• Funding bodies might encourage researchers to offer the data to an archive

Barriers to preservation

• Data currency (data producers wish to exploit their material more widely; lack of a culture of sharing)

• The situation of empirical research (if publication activity is the main criterion, there is no real interest in sharing valuable data with others)

• Parallel channels of data access have been established

• In a knowledge society, information increases in economic value, and this tendency limits the data-sharing culture: there is a need for government regulation and for an emphasis on the ethical concern of fostering solidarity among scholars

• To help those who are excluded or powerless (students, young researchers who are not yet well established, or those who are more individualistically oriented) to take part in research through the provision of access to data

• Even so, the data producer has a competitive advantage (they know the conceptualisation and the theory behind it well in advance, and have the knowledge and skills specific to the analysis of that problem area).

• One would even plead for that kind of expertise to be regularly shared with the wider community: data confrontation seminars, visiting scholars, student exchanges, etc.

• Data-centred integration of research efforts

“Inquiry forms...”

• “The final step in identifying datasets is sending out inquiry forms” (Mochmann, Guchteneire)

• Our aim with the online form for collecting information about studies:

– To cover those aspects that are important for selection in acquisition

– To keep it simple, to reduce the burden of filling in the information

• A request for information was sent out widely, both to the general user community and specifically to potential data providers

• The inquiry drew little response in the first round

Introductory explanation in the form

• “... intends to compile a list of studies (...) that may be broadly interesting for secondary analysis.”

• Comment:

– The most general criterion for selection is broad interest (the number of people who will potentially use ... for scientific purposes); it serves as the dependent variable when testing the predictive power of other criteria.

• “The criteria for selection of datasets to archive are complicated. A general guideline is whether or not the data are usable for future scientific research.” (Mochmann, Guchteneire)

• “The extent to which the data will advance knowledge” (Gutmann et al.)

• We may study current use patterns:

– Current use patterns may not entirely reflect the potential of studies

– Experts from a discipline may assist in selection; they are competent to evaluate scientific potential in collaboration with the archivists

– Users' suggestions and evaluations of the perceived potential of data already in the archive and of studies in the inventory.

Introductory explanation (...)

• “By providing information to a studies inventory you will contribute to a users centred data sources collection building.”

• Comments:

– A common goal of the archive as service provider and of its users is to achieve more exhaustive coverage of studies in the collection

– Emphasise the personal and group benefits that follow from a contribution... (cf. Dillman: The Total Design Method - TDM)

Introductory explanations (...)

• This information will be the basis for selecting the most important studies, which will be included in the first round of coordinated archival processing.

• The inventory seeks those studies for which machine-readable data sets and their documentation exist and which are potentially available for academic use.

• We do not want to exclude any topic within the range of the social sciences in general

• We primarily aim at including those studies that are characterised by most of the following qualities:

– Theoretically or practically important single-nation studies that fill research gaps or have many implications for a wide range of practical problems, and are of long-term scholarly value.

– Covering general or influential populations.

– Being part of comparative or continuous research

– Being of methodological excellence

The Total Design Method (Don A. Dillman)

• According to exchange theory:

– “a person is most likely to respond to a questionnaire when the perceived costs of doing so are minimized, the rewards are maximised, and the respondent trusts that the expected rewards will be delivered.”

• TDM: identifying and designing each aspect of an inquiry form to maximise response

• Building the inventory of studies is an ongoing process

• Try to continue to build on trust:

– establish sponsorship by trusted authorities...

– get direct access to people's addresses and personalise the contact (e-mail in combination with telephone and personal visits)

– assure that rewards will be delivered

Introductory explanations (...)

• After the selection of relevant studies ADP will take on responsibility for archiving and processing the studies

Structure of the inquiry form

– “to get detailed information on the possible archivability of specific data.

– Who is the owner of the data,

– when will the data be available,

– and similar questions should be answered with this form.” (Mochmann, Guchteneire)

Structure of the inquiry form ...

• Bibliographic information to identify a data set

– Original title of the study

– English title of the study

– Author Responsibility

– Producer

– Date of Production

– Place of Production

Assessment of depositing probability

• Data are in ADP already

• There is a possibility of depositing data in ADP

• Uncertain about possibility of depositing data

• Current place of distribution (or source of information about the study)

– Distributor

– URL of the study/distributor

Relevance

– “The necessary knowledge to select a dataset for the archive is not well defined.

– The relevance of the dataset topics, studied population and the methods of data collection are taken into consideration.” (Mochmann, Guchteneire)

Relevance

– “... interesting enough in view of general society, in view of developments in science?

– ... already well represented in the rest of the archive holdings, does the dataset contribute new aspects (...)?

– Is the method of data collection appropriate for the population and topics?

– Are the topics in the dataset rich enough for future analysis?” (Mochmann, Guchteneire)

Information about the study to gauge relevance (...)

– Country

– Universe

– Series Name

– Topic classification

– Abstract

Size (Mochmann, Guchteneire)

• The inquiry form includes information about:

– Sample Procedures

– Mode of Data Collection

– Number of units

– Number of variables

– Geographic coverage

Administrative criteria (Mochmann, Guchteneire)

• Documentation

• Privacy protection

• Ownership

– We reserved a notes section in the form for additional information about conditions of availability
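The sections of the inquiry form listed above can be pictured as a single record per study. The following is only a minimal sketch under stated assumptions: the field groups follow the slides, but every Python name (StudyRecord, DepositStatus, missing_fields, and the individual field names) is hypothetical and does not describe how the ADP online form is actually implemented.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import List, Optional


class DepositStatus(Enum):
    # The three answers offered under "Assessment of depositing probability"
    IN_ADP = "Data are in ADP already"
    POSSIBLE = "There is a possibility of depositing data in ADP"
    UNCERTAIN = "Uncertain about possibility of depositing data"


@dataclass
class StudyRecord:
    # Bibliographic information to identify a data set
    original_title: str
    english_title: str = ""
    author: str = ""
    producer: str = ""
    production_date: str = ""
    production_place: str = ""
    # Depositing probability and current place of distribution
    deposit_status: DepositStatus = DepositStatus.UNCERTAIN
    distributor: str = ""
    url: str = ""
    # Information to gauge relevance
    country: str = ""
    universe: str = ""
    series_name: str = ""
    topic_classification: str = ""
    abstract: str = ""
    # Size
    sample_procedures: str = ""
    mode_of_data_collection: str = ""
    number_of_units: Optional[int] = None
    number_of_variables: Optional[int] = None
    geographic_coverage: str = ""
    # Notes on conditions of availability (documentation, privacy, ownership)
    notes: str = ""

    def missing_fields(self) -> List[str]:
        """Return the names of fields left empty (empty string or None)."""
        return [f.name for f in fields(self)
                if getattr(self, f.name) in ("", None)]
```

Only the original title is required in this sketch, in line with the aim of keeping the form simple; a helper such as missing_fields could show an archivist which parts of an entry still need a follow-up contact.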

Descriptive relevance categories used in ADP:

6: methodologically and substantively excellent studies without many implications for a wide range of problems

7: studies that permit theoretical generalisations or relate to a practical problem, less influential

8: theoretically or practically important studies, studies that fill a research gap or have many implications for a wide range of practical problems, long-term scholarly value

9: highest range, comparative or continuous research, influential populations, with methodological excellence
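As an illustration of how such categories could guide acquisition planning, the sketch below orders candidate studies by relevance category. It is an assumption made for illustration only: the slides do not say that ADP ranks studies this way, the names RELEVANCE_CATEGORIES and rank_for_acquisition are hypothetical, and categories below 6 are not described in the source, so they are left out.

```python
# Short labels paraphrasing the four categories described on the slide.
RELEVANCE_CATEGORIES = {
    6: "excellent study without many implications for a wide range of problems",
    7: "permits theoretical generalisation or relates to a practical problem",
    8: "theoretically or practically important, long-term scholarly value",
    9: "highest range: comparative or continuous, influential populations",
}


def rank_for_acquisition(studies):
    """Order (title, category) pairs with the highest category first.

    sorted() is stable, so studies in the same category keep their
    original order.
    """
    return sorted(studies, key=lambda study: study[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical titles, used only to show the ordering.
    candidates = [("Study A", 7), ("Study B", 9), ("Study C", 6)]
    for title, category in rank_for_acquisition(candidates):
        print(category, title, "-", RELEVANCE_CATEGORIES[category])
```

A plain sort is used because the categories are already ordinal; any weighting by topic coverage or depositing probability would be a further assumption.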

[Chart: Study relevance and processing (ADP holding, 2005); relevance categories: Less, 6, 7, 8, 9; scale 0% to 100%; series: Basic, Full Study description, includes Codebook, includes full questions text]

Results of an inquiry

• EDAN INVENTORY OF DATA SETS

– East-European Data Archives Network

– In 2002 it intended to carry out its project under the 6th FP

– Part of the activities of EDAN members in preparing a proposal

– Compile a preliminary list of data sets

• From December 2002 until mid-January 2003 we registered 93 data sets

• About twenty different institutions were mentioned among the producers of a data set. Most of them are academic research institutes.

• Almost all of the studies cover the period from 1990 onward, with special attention paid to recent years. Over twenty of them date from 2000 or later.

– “Economics”, “Politics”, “Social stratification and groupings”, “Social welfare” and “General Society and Culture” were among the most frequent topic areas

– Twelve of the studies are multinational comparative data sets. (This was stated as one of the main criteria for selection)

– The selection covers well the best-known comparative surveys of the Central and East European region.

• On the time dimension, seven are explicitly panel studies, and about twenty are repeated surveys. The rest are one-time cross-section surveys.

• A search for additional continuous studies...

Conclusions

• The response among EDAN archives was great

• An inventory of studies is not conclusive. Its purpose could be to act as a news forum, a first source of information about a study

• Suggestions:

– Complete some of the subject categories for which we know that there are parallel studies in different countries

– There are still omissions among comparative and continuous studies

Conclusions overall

– Filling in an “inquiry form” is already a demanding task; a highly selective list of studies resulted

– Most of the entries already rank at the top on the criteria for selection in acquisition

– It helps in planning acquisition by showing important studies in fields that are missing from the current holdings; those deserve priority

– Don’t underestimate the amount of knowledge and information that we are not covering in the archive; additional efforts should be made to involve a wider circle of scholars in collecting evidence about studies