Training Workshop, 24-25 November 2010, Univ. Stirling Organised by the ESRC Node ‘Data Management through e-Social Science’ (www.dames.org.uk). Data Management, Documentation and Workflows for Social Survey Research



‘Data Management through e-Social Science’

DAMES – www.dames.org.uk

ESRC Node funded 2008-2011

Aim: Useful social science provisions

Specialist data topics – occupations; education qualifications; ethnicity; social care; health

Programme of case studies and provisions – more later


1. Data Management, Documentation and Workflows for Social Survey Research

Paul Lambert, 24-25 November 2010

Presented to ‘Documentation and Workflows for Social Survey Research’, a workshop organised by the ESRC ‘Data Management through e-Social Science’ research Node (www.dames.org.uk).


Data management, documentation and workflows..

Defining data management, documentation and workflows in survey research

Documentation for replication

Further comments and principles in effective social survey research


a) ‘Data management’ means…

‘the tasks associated with linking related data resources, with coding and re-coding data in a consistent manner, and with accessing related data resources and combining them within the process of analysis’ […DAMES Node..]

Usually performed by social scientists themselves

Most overt in quantitative survey data analysis
• ‘variable constructions’, ‘data manipulations’
• navigating abundance of data – thousands of variables

Usually a substantial component of the work process

Here we differentiate from archiving / controlling data itself


Some components…

Manipulating data
• Recoding categories / ‘operationalising’ variables

Linking data
• Linking related data (e.g. longitudinal studies)
• Combining / enhancing data (e.g. linking micro- and macro-data)

Secure access to data
• Linking data with different levels of access permission
• Detailed access to micro-data cf. access restrictions

Harmonisation standards
• Approaches to linking ‘concepts’ and ‘measures’ (‘indicators’)
• Recommendations on particular ‘variable constructions’

Cleaning data
• ‘missing values’; implausible responses; extreme values


Example – recoding data

Count (rows: highest educational qualification; columns: educ4)

Highest educational qualification    -9.00    1.00    2.00    3.00    4.00    Total
-9 Missing or wild                     323       0       0       0       0      323
-7 Proxy respondent                    982       0       0       0       0      982
1 Higher Degree                          0     425       0       0       0      425
2 First Degree                           0    1597       0       0       0     1597
3 Teaching QF                            0       0     340       0       0      340
4 Other Higher QF                        0       0    3434       0       0     3434
5 Nursing QF                             0       0     161       0       0      161
6 GCE A Levels                           0       0       0    1811       0     1811
7 GCE O Levels or Equiv                  0       0       0       0    2518     2518
8 Commercial QF, No OLevels              0       0       0     331       0      331
9 CSE Grade 2-5,ScotGrade 4-5            0       0       0       0     421      421
10 Apprenticeship                        0       0       0     257       0      257
11 Other QF                            102       0       0       0       0      102
12 No QF                                 0       0       0       0    2787     2787
13 Still At School No QF               138       0       0       0       0      138
Total                                 1545    2022    3935    2399    5726    15627

educ4 categories: -9.00; 1.00 Degree; 2.00 Diploma; 3.00 Higher school or vocational; 4.00 School level or below
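The recode behind this crosstab can be sketched in a few lines. This is an illustration in Python rather than SPSS or Stata syntax; the mapping is read off the cell counts in the table, and the names QUAL_TO_EDUC4, EDUC4_LABELS, recode_educ4 and crosstab are hypothetical, as is the label attached to the -9 category.

```python
from collections import Counter

# Source code -> educ4 category, as read off the crosstab (an assumption,
# not the author's published recode). -9 collects the missing, proxy,
# "other" and "still at school" cases.
QUAL_TO_EDUC4 = {
    -9: -9, -7: -9, 11: -9, 13: -9,
    1: 1, 2: 1,
    3: 2, 4: 2, 5: 2,
    6: 3, 8: 3, 10: 3,
    7: 4, 9: 4, 12: 4,
}

# Labels for the derived categories; the label for -9 is assumed.
EDUC4_LABELS = {
    -9: "Missing (assumed label)",
    1: "Degree",
    2: "Diploma",
    3: "Higher school or vocational",
    4: "School level or below",
}


def recode_educ4(code):
    """Return the educ4 category for one source code (KeyError if unknown)."""
    return QUAL_TO_EDUC4[code]


def crosstab(codes):
    """Count (source code, educ4) pairs, mirroring the check table."""
    return Counter((code, recode_educ4(code)) for code in codes)


# A small made-up sample of source codes, not BHPS data.
sample = [1, 2, 2, 4, 7, 12, -7, 13]
recoded = [recode_educ4(code) for code in sample]
table = crosstab(sample)

for (code, educ4), count in sorted(table.items()):
    print(code, "->", educ4, EDUC4_LABELS[educ4], count)
```

Cross-tabulating the old variable against the new one, as the slide does, is a cheap check that every source category lands in exactly one derived category.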


Example – linking data

Linking via ‘ojbsoc00’: c1-5 = original data / c6 = derived from data / c7 = derived from www.camsis.stir.ac.uk


‘The significance of data management for social survey research’

The data manipulations described above are a major component of the social survey research workload

Pre-release manipulations performed by distributors / archivists
• Coding measures into standard categories; dealing with missing records

Post-release manipulations performed by researchers
• Re-coding measures into simple categories
• All serious researchers perform extended post-release management (and have the scars to show for it)

We do have existing tools, facilities and expert experience to help us… but we don’t make a good job of using them efficiently or consistently

So the ‘significance’ of DM is about how much better research might be if we did things more effectively…


Some provocative examples for the UK…

Social mobility is increasing, not decreasing
− Popularity of controversial findings associated with Blanden et al (2004)
− Contradicted by wider ranging datasets and/or better measures of stratification position
− DM: researchers ought to be able to more easily access wider data and better variables

Degrees, MScs and PhDs are getting easier
− {or at least, more people are getting such qualifications}
− Correlates with measures of education are changing over time
− DM: facility in identifying qualification categories & standardising their relative value within age/cohort/gender distributions isn’t, but should, and could, be widespread

‘Black-Caribbeans’ are not disappearing
− As the 1948-70 immigrant cohort ages, the ‘Black-Caribbean’ group is decreasingly prominent due to return migration and social integration of immigrant descendants
− Data collectors under pressure to measure large groups only
− DM: It ought to remain easy to access and analyse survey data on Black-Caribbeans, such as by merging survey data sources and/or linking with suitable summary measures


b) ‘Documentation’ refers to…

Here we mean the ‘paper trail’ in the conduct of secondary survey research

For scientists, this is the log book / journal / laboratory notebook

Image of Alexander Graham Bell’s 1876 notebook, taken from: http://sandacom.wordpress.com/2010/03/11/the-face-rings-a-bell/


Thoughts on documentation

Our tasks (organising and analysing electronic data) don’t seem to lend themselves to easy documentation
• Paper or electronic file summaries
• Different formats of data
• Rapid updates over time

Effective documentation is possible, but it requires some effort (Long, 2009)


..good levels of documentation are not engrained in the social sciences!

“…Little or nothing is systematically archived from these electronic sources. How many of us routinely keep copies of our old word-processing files once they are no longer of current relevance for research or teaching activities. We have been reminded…of the insecurity and non-survival of departmental and professional files stored in broom cupboards, but how many electronic files even get into that cupboard in the first place?” Scott (2005: 142)


c) ‘Workflows’

In general, a collection of processes which all link with or contribute to a wider project

The study of workflows involves the systematic organisation of those processes

• Storing the elements of the processes
• Noting inter-dependencies
• Depicting the overall process (e.g. graphically)
• Modelling the overall process

{Not in the dictionary… a made-up word to mean what we like!}
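One way to make "storing the elements" and "noting inter-dependencies" concrete is to record each step together with the steps it depends on, then derive a valid running order and a simple text depiction. The sketch below is an illustration only: the step names are hypothetical, and it relies on the standard-library graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on.
# Step names are hypothetical, chosen to resemble a two-wave panel analysis.
workflow = {
    "load_wave_a": set(),
    "load_wave_b": set(),
    "match_waves": {"load_wave_a", "load_wave_b"},
    "recode_age_bands": {"match_waves"},
    "run_analysis": {"recode_age_bands"},
}


def execution_order(steps):
    """Return one order in which every step follows its dependencies."""
    return list(TopologicalSorter(steps).static_order())


def depict(steps):
    """Render the workflow as text: 'step <- dependency, dependency'."""
    lines = []
    for step in execution_order(steps):
        deps = sorted(steps[step])
        lines.append(f"{step} <- {', '.join(deps)}" if deps else step)
    return "\n".join(lines)


order = execution_order(workflow)
print(depict(workflow))
```

A circular dependency makes TopologicalSorter raise CycleError, which is itself a useful check on a documented workflow.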


The idea of workflows

Workflow modelling has an exciting future..

Workflow documentation
o MyExperiment [http://www.myexperiment.org/]
o Social survey analysis [Dale, 2006; Freese, 2007; Long, 2009]

At present…
o Tool development in process
o Depositing workflows might impose constraints/burdens


Model 1:

Text interface (invoked manually or in response to manipulating graphs)

Data file specification; variable manipulation & analysis

DAMES most common commands:
-> usedataset{UKDA_5151}
-> usedatafile{individuals wave A}
-> matchdata{individuals wave A; individuals wave B; link variable=pid; format=wide}

Commands invoking other packages:
-> SPSS{match files file="aindresp.sav" /file="bindresp.sav" /by=pid}
-> SPSS{fre var=ajbrgsc}
-> Stata{recode ageb 16/30=1 31/50=2 *=.}
-> R{..}
-> Stata{do $path2\part1_analysis.do}

Graphic

[Diagram: BHPS wave A individuals, BHPS wave B individuals and Wave C, with an analytical file; variables shown: Gender, Current job RGSC, Spouse CAMSIS, Age (yrs), Age bands, Spouse SOC]


Syntax file image


Example of using MS Excel for workflow documentation


A bit of focus…

Most of the DAMES applications aim to facilitate one of two data management activities and their documentation:

1) Variable constructions
o Coding and re-coding values

2) Linking datasets
o Internal and external linkages


A bit more focus…

The current workshop is concerned with research practices and facilities for social survey data management

To raise for discussion important topics associated with data management

To illustrate effective means of achieving good practice during data management

o Software perspectives – e.g. Treiman 2009; Long 2009; Levesque 2010; Sarantakos 2007

o A focus on ‘Stata’


Why did Stata suddenly come into this?

[Diagram: data management requirements, spanning specific tasks and generic approaches. Labels: bespoke database software; governance models; e-Social Science; researchers’ database software (SPSS, Stata, etc)]

We see Stata emerging as effective for specific tasks and compatible with generic approaches


Data management, documentation and workflows..

Defining data management, documentation and workflows in survey research

Documentation for replication

Further comments and principles in effective social survey research


‘Documentation for replication’

..as a reasonable expectation for scientific research that is cumulative and based upon empirical observation…

Steuer, M. (2003). The Scientific Study of Society. Boston: Kluwer Academic.

Dale, A. (2006). Quality Issues with Survey Research. International Journal of Social Research Methodology, 9(2), 143-158.

Freese, J. (2007). Replication Standards for Quantitative Social Science: Why Not Sociology? Sociological Methods and Research, 36(2), 153-171.

…See our first lab session on using software effectively for documentation for replication…


What needs replication?

Your own analysis (in response to comments, revisions, requests for access)

Others’ analysis
• To build upon – cumulative science
• To critique / cross-examine

In secondary survey research
• Complex data is often updated (new related records; revised and re-released; re-weighted or re-standardised; new levels of access/linkage)
• New analysis feasible – variable operationalisations; new statistical methods


J. Scott Long (2009)

Long, J. S. (2009). The Workflow of Data Analysis Using Stata. Boca Raton: CRC Press.

1-5: Programming in Stata
6: Cleaning your data
7: Analysing data and presenting results
8: Protecting your work


Treiman (2009)

Treiman, D. J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. New York: Jossey Bass.

Good professional practice =
• Suitable choice of analytical methods to test ideas
• Documentation of choices and data operations


How to approach documentation for replication in social survey research?

Made easy by secondary access to datasets and standardised software

1) Using software effectively
• See our ‘software session 1’

2) Careful syntactical documentation

3) Workflow perspectives / tools

4) Metadata standards


Keep clear records of your DM activities!

• Reproducible (for self)
• Replicable (for all)
• Paper trail for whole lifecycle
• Cf. Dale 2006; Freese 2007

In survey research, this means using clearly annotated syntax files (e.g. SPSS/Stata)

Syntax Examples: www.longitudinal.stir.ac.uk


Stata syntax example (‘do file’)


Syntax documentation

Long (2009) is highly prescriptive {may not be wholly attainable}

Key issues:
1. Textual level command specification
2. Organisation of syntax files
   **Master files and subfiles (and macros)**
3. Setting consistent paths to source data
4. Reasonable level of manual annotation of files
5. Use a text editor!!
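The master-file idea (issues 2 and 3) can be illustrated outside any particular statistics package. The sketch below is a Python analogue, not Stata syntax: a master routine defines the project paths once, in the role of path macros, and then runs each subfile in a fixed order. The file names, the PATH_DATA and PATH_OUT settings and the toy data are all hypothetical.

```python
import runpy
import tempfile
from pathlib import Path


def run_master(project_dir, subfiles):
    """Run each subfile in order, handing every one the same path settings."""
    # Locations are defined in one place only, like path macros in a master file.
    paths = {
        "PATH_DATA": str(project_dir / "data"),
        "PATH_OUT": str(project_dir / "output"),
    }
    completed = []
    for name in subfiles:
        runpy.run_path(str(project_dir / name), init_globals=paths)
        completed.append(name)
    return completed


# Demonstration with two tiny subfiles written into a temporary project folder.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "data").mkdir()
    (project / "output").mkdir()
    (project / "data" / "ages.txt").write_text("23\n41\n67\n")

    # Subfile 1: read the source data and save a cleaned copy.
    (project / "part1_prepare.py").write_text(
        "from pathlib import Path\n"
        "ages = [int(a) for a in Path(PATH_DATA, 'ages.txt').read_text().split()]\n"
        "Path(PATH_OUT, 'ages_clean.txt').write_text('\\n'.join(map(str, ages)))\n"
    )
    # Subfile 2: analyse the cleaned copy produced by subfile 1.
    (project / "part2_analysis.py").write_text(
        "from pathlib import Path\n"
        "ages = [int(a) for a in Path(PATH_OUT, 'ages_clean.txt').read_text().split()]\n"
        "Path(PATH_OUT, 'mean_age.txt').write_text(str(sum(ages) / len(ages)))\n"
    )

    steps_run = run_master(project, ["part1_prepare.py", "part2_analysis.py"])
    mean_age = float((project / "output" / "mean_age.txt").read_text())
```

Because the subfiles never hard-code a location, moving the project to another machine means editing one place only.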


The idea of workflows

Workflow modelling has an exciting future..

Workflow documentation
o MyExperiment [http://www.myexperiment.org/]
o Social survey analysis [Dale, 2006; Freese, 2007; Long, 2009]

At present…
o Waiting for tool development
o Depositing workflows might impose constraints/burdens


Metadata documents for documentation for replication

Metadata documents can/should be stored / distributed / disseminated

Main relevant types of metadata documents:

a) Annotated syntax files

b) Handwritten workbooks

c) Codebooks and data file metadata


Annotated syntax files

Storage:
• Supply authorship details, conditions of access, origins and context of data, software version
• ‘Robustify’ your programme (generic locations; ‘capture drop’)

Dissemination:
• Available from authors
• Archive
  o Repec – http://ideas.repec.org/ (Economics)
  o GEODE/DAMES – www.dames.org.uk (Occupations, Education)
  o UKDA/ESDS and related data providers (monitored)
• Personal webpages – e.g. www.camsis.stir.ac.uk/downloads/data/other/casoc_isco.do


Handwritten workbooks

Key here is that they must be published..
• Technical papers
• Websites
• ….
• An emerging payoff - citation indexing!

o Croxford, L. (2004). Construction of Social Class Variables. Edinburgh: Working Paper 4 of the ESRC research project on Education and Youth Transitions in England, Wales and Scotland, 1984-2002, Centre for Educational Sociology, University of Edinburgh, and http://www.ces.ed.ac.uk/eyt/EYT_papers/WP04.pdf.


“Because claims in published papers that additional materials are “available from author” usually prove false, at least after a few months, the California Center for Population Research at UCLA recently implemented a mechanism by which additional materials, for example, -do- and -log- files, can be attached to papers posted in its Population Working Paper archive. Other research centers are to be encouraged to do the same” (Treiman, 2009: 404)


E-Science and workflow documentation tools..

…seek to capture the full record of the work process and all files relevant for documentation (e.g. http://www.myexperiment.org/)


Codebooks and data file metadata

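Codebook (Stata):

    log using data_file_name_codebook.log, replace text  // open a plain-text log
    disp "DateTime: $S_DATE $S_TIME"                      // time stamp
    notes                                                 // notes attached to the data
    datasignature                                         // checksum of the data
    codebook, compress                                    // one line per variable
    codebook                                              // full detail per variable
    describe                                              // names, types, labels
    labelbook, detail                                     // value labels
    log close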

See UKDA: data_dictionary.rtf
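The kind of per-variable summary that a codebook records can be sketched generically. The following Python illustration is not a substitute for Stata's codebook command: the function name, the choice of -9 and -7 as missing codes and the four sample records are assumptions made for the example.

```python
from collections import Counter


def codebook(rows, missing_codes=(-9, -7)):
    """Summarise each variable: cases, missing values, distinct valid values."""
    variables = sorted({name for row in rows for name in row})
    summary = {}
    for name in variables:
        values = [row.get(name) for row in rows]
        valid = [v for v in values if v is not None and v not in missing_codes]
        summary[name] = {
            "n": len(values),
            "missing": len(values) - len(valid),
            "distinct": len(set(valid)),
            "most_common": Counter(valid).most_common(3),
        }
    return summary


# Made-up records for illustration only.
rows = [
    {"pid": 1, "sex": 1, "educ4": 1},
    {"pid": 2, "sex": 2, "educ4": -9},
    {"pid": 3, "sex": 2, "educ4": 4},
    {"pid": 4, "sex": 1, "educ4": 4},
]
book = codebook(rows)

for name, info in book.items():
    print(name, info)
```

Saving this output alongside the data file, with a date stamp, gives a record of what the file contained at the time of analysis.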


Metadata standards

Formal standards for recording data exist
• Most widely used is the ‘DDI’ (Data Documentation Initiative, http://www.icpsr.umich.edu/DDI/)
• XML format, typewritten or software derived, can be read by software / browsers
• Includes options for variable labels, recodes, text descriptions

See UKDA, study_information.htm; NESSTAR
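To give a feel for software-derived XML metadata, the sketch below writes one variable description and reads it back. It is an illustration only: the element names (var, labl, catgry, catValu) are assumed to resemble the DDI Codebook vocabulary, the fragment is not validated against any DDI schema, and the variable label is made up.

```python
import xml.etree.ElementTree as ET


def variable_to_xml(name, label, categories):
    """Build a DDI-like <var> element with one <catgry> per category."""
    var = ET.Element("var", {"name": name})
    ET.SubElement(var, "labl").text = label
    for value, text in categories.items():
        category = ET.SubElement(var, "catgry")
        ET.SubElement(category, "catValu").text = str(value)
        ET.SubElement(category, "labl").text = text
    return var


educ4 = variable_to_xml(
    "educ4",
    "Highest educational qualification, 4 categories",  # hypothetical label
    {
        1: "Degree",
        2: "Diploma",
        3: "Higher school or vocational",
        4: "School level or below",
    },
)
xml_text = ET.tostring(educ4, encoding="unicode")

# Reading the fragment back shows that software can recover the labels.
parsed = ET.fromstring(xml_text)
labels = {c.findtext("catValu"): c.findtext("labl") for c in parsed.findall("catgry")}
print(xml_text)
```

The same structure read by a browser or archive tool is what makes variable labels and recodes searchable rather than buried in a syntax file.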


NESSTAR


Summary: Documentation and workflows

Achieving good documentation is facilitated by effective workflows

o File locations / stamps / transferability
o Variable metadata
o Structured logs of all operations – syntax programs


Data management, documentation and workflows..

Defining data management, documentation and workflows in survey research

Documentation for replication

Further comments and principles in effective social survey research


Data management components of the survey research process

4 good habits and principles

3 Challenges


(a) Good habit: Keep clear records of your DM activities

• Reproducible (for self)
• Replicable (for all)
• Paper trail for whole lifecycle
• Cf. Dale 2006; Freese 2007

In survey research, this means using clearly annotated syntax files (e.g. SPSS/Stata)

Syntax Examples: www.longitudinal.stir.ac.uk


Stata syntax example (‘do file’)


Software and handling variables – our view

Stata is the superior package for secondary survey data analysis:

o Advanced data management and data analysis functionality
o Supports easy evaluation of alternative measures (e.g. est store)
o Culture of transparency of programming/data manipulation
o Cf. Scott Long (2009)
o But: Not available to all users


(b) Principle: Use existing standards and previous research

Variable operationalisations

Use recognised recodes / standard classifications
• NSI harmonisation standards (e.g. ONS)
• Cross-national standards [Hoffmeyer-Zlotnik & Wolf 2003; Harkness et al. 2003; Jowell et al. 2007]
• Research reviews [e.g. Shaw et al. 2007]
• Common vs best practices (e.g. dichotomisations)

Use reproducible recodes / classifications (paper trail)

Other data file manipulations
• Missing data treatments
• Matching data files (finding the right data)


(c) Principle: Do something, not nothing

We currently put much more effort into data collection and data analysis, and neglect data manipulation

Survey research – the influence of ‘what was on the archive version’

…In my experience, a common reason why people didn’t do more DM was because they were frightened to…


(d) Principle: Learn how to match files (‘deterministic’)

Complex data (complex research) is distributed across different files. In surveys, use key linking variables for...

One-to-one matching
SPSS: match files /file="file1.sav" /file="file2.sav" /by=pid.
Stata: merge 1:1 pid using file2.dta (Stata 11 onwards; earlier releases: merge pid using file2.dta)

One-to-many matching (‘table distribution’)
SPSS: match files /file="file1.sav" /table="file2.sav" /by=pid.
Stata: merge m:1 pid using file2.dta (Stata 11 onwards; earlier releases: merge pid using file2.dta)

Many-to-one matching (‘aggregation’)
SPSS: aggregate outfile="file3.sav" /break=pid /meaninc=mean(income).
Stata: collapse (mean) meaninc=income, by(pid)

Many-to-Many matches

Related cases matching
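The first three kinds of deterministic match can be sketched without any statistics package. The Python illustration below uses lists of records; the function names, variable names and values are hypothetical, and unmatched records are simply kept without the extra variables.

```python
from collections import defaultdict


def match_one_to_one(file1, file2, key="pid"):
    """Join records that share a unique key; duplicate keys are an error."""
    index = {}
    for row in file2:
        if row[key] in index:
            raise ValueError(f"duplicate {key} in second file: {row[key]}")
        index[row[key]] = row
    seen = set()
    merged = []
    for row in file1:
        if row[key] in seen:
            raise ValueError(f"duplicate {key} in first file: {row[key]}")
        seen.add(row[key])
        merged.append({**row, **index.get(row[key], {})})
    return merged


def distribute_table(records, table, key):
    """Table distribution: copy the matching table row onto every record."""
    lookup = {row[key]: row for row in table}
    return [{**rec, **lookup.get(rec[key], {})} for rec in records]


def aggregate_mean(records, key, variable, new_name):
    """Aggregation: one output record per key, holding the group mean."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec[variable])
    return [{key: k, new_name: sum(v) / len(v)} for k, v in groups.items()]


# Made-up records for illustration only.
wave_a = [{"pid": 1, "a_income": 100}, {"pid": 2, "a_income": 200}]
wave_b = [{"pid": 1, "b_income": 120}]
wide = match_one_to_one(wave_a, wave_b)

jobs = [{"pid": 1, "soc": 10}, {"pid": 2, "soc": 20}, {"pid": 3, "soc": 10}]
scores = [{"soc": 10, "score": 55.0}, {"soc": 20, "score": 40.0}]
scored = distribute_table(jobs, scores, key="soc")

spells = [
    {"pid": 1, "income": 100},
    {"pid": 1, "income": 200},
    {"pid": 2, "income": 50},
]
means = aggregate_mean(spells, key="pid", variable="income", new_name="meaninc")
```

Raising an error on duplicate keys mirrors the protection that an explicit one-to-one merge gives: a silent duplicate is one of the easiest ways to corrupt a linked file.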


Some challenges for data management..

(e) Agreeing about variable constructions

Unresolved debates about optimal measures and variables

Esp. in comparative research such as across time, between countries

In DAMES, we have particular interests in comparability for:
• Longitudinal comparability (http://www.longitudinal.stir.ac.uk/variables/)
• Scaling / scoring categories to achieve ‘meaning equivalence’ or ‘specific measures’


Some challenges for data management..

(f) Worrying about data security

DM activities could challenge data security
• Inspecting individual cases
• Multiple copies of related data files
• Ability to link with other datasets
• ‘Hands-on’ model of data review

New and exciting data resources
• have more individual information
• are more likely to be released with stringent conditions
• may jeopardize traditional DM approaches


Some routes to secure data

• Secure ‘portals’ for direct access to remote data
• Secure settings (e.g. safe labs)
• Data anonymisation and attenuation
• Emphasis on users’ responsibility rather than the data provider


Some challenges for data management..

(g) Incentivising documentation / replicability

There is little to press researchers to better document DM, but much to press them not to

• Make DM and its documentation easier?
• Reward documentation (e.g. citations)?


Data management, documentation and workflows..

Defining data management, documentation and workflows in survey research

Documentation for replication

Further comments and principles in effective social survey research


References

Blanden, J., Goodman, A., Gregg, P., & Machin, S. (2004). Changes in intergenerational mobility in Britain. In M. Corak (Ed.), Generational Income Mobility in North America and Europe. Cambridge: Cambridge University Press.

Dale, A. (2006). Quality Issues with Survey Research. International Journal of Social Research Methodology, 9(2), 143-158.

Freese, J. (2007). Replication Standards for Quantitative Social Science: Why Not Sociology? Sociological Methods and Research, 36(2), 153-171.

Harkness, J., van de Vijver, F. J. R., & Mohler, P. P. (Eds.). (2003). Cross-Cultural Survey Methods. New York: Wiley.

Hoffmeyer-Zlotnik, J. H. P., & Wolf, C. (Eds.). (2003). Advances in Cross-national Comparison: A European Working Book for Demographic and Socio-economic Variables. Berlin: Kluwer Academic / Plenum Publishers.

Jowell, R., Roberts, C., Fitzgerald, R., & Eva, G. (2007). Measuring Attitudes Cross-Nationally. London: Sage.

Levesque, R., & SPSS Inc. (2010). Programming and Data Management for SPSS 18.0: A Guide for PASW Statistics and SAS users. Chicago, Il.: SPSS Inc.

Long, J. S. (2009). The Workflow of Data Analysis Using Stata. Boca Raton: CRC Press.

Sarantakos, S. (2007). A Tool Kit for Quantitative Data Analysis Using SPSS. London: Palgrave MacMillan.

Scott, J. (2005). Some principal concerns in the shaping of Sociology. In A. H. Halsey & W. G. Runciman (Eds.), British Sociology: Seen from without and within (pp. 136-144). London: The British Academy.

Shaw, M., Galobardes, B., Lawlor, D. A., Lynch, J., Wheeler, B., & Davey Smith, G. (2007). The Handbook of Inequality and Socioeconomic Position: Concepts and Measures. Bristol: Policy Press.

Steuer, M. (2003). The Scientific Study of Society. Boston: Kluwer Academic.

Treiman, D. J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. New York: Jossey Bass.

University of Essex, & Institute for Social and Economic Research. (2009). British Household Panel Survey: Waves 1-17, 1991-2008 [computer file], 5th Edition. Colchester, Essex: UK Data Archive [distributor], March 2009, SN 5151.