E-science resources for handling data on occupations, educational qualifications and ethnicity – the DAMES and GEODE projects

Paul Lambert, University of Stirling
Vernon Gayle, University of Stirling & ISER




Structure of this talk

• What is e-Science

• What is the Grid

• What can e-Social Science do for survey research?

• Grid Enabled Specialist Data Environments

• Special files and resources

• A little on occupations

• Something on education

• Almost nothing on ethnicity (it is only 30 mins)


What is e-Science

• Originally experiments to connect together a few powerful computers

• The ability to connect high powered computers to undertake enormous calculations often on huge datasets

• “The Grid” = the co-ordination of geographically dispersed computing and data resources

What is e-Science

• “What is exciting about the Grid is the combination of extensive connectivity, massive computer power and vast quantities of digitised data – all three of which are still rapidly expanding – making possible new applications that are orders of magnitude more potent than even a few years ago”

• “The term 'e-research' is sometimes used instead of 'e-science', with the advantage that gives more emphasis to the end result of better, richer, faster or new research results, rather than the technologies used to get them” (http://www.ncess.ac.uk/)

The Grid

• Grid computing (or the use of a computational grid) is the application of several computers to a single problem at the same time

– usually to a scientific or technical problem that requires a great number of computer processing cycles or access to large amounts of data

• According to John Patrick, IBM’s vice president for Internet strategies, “the next big thing will be grid computing”
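The definition above (several computers applied to one problem at the same time) can be sketched on a single machine. This is only an illustration under stated assumptions: worker threads stand in for separate grid nodes, and the function names `partial_sum_of_squares`, `split` and `grid_sum_of_squares` are hypothetical, not part of any Grid middleware.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_of_squares(chunk):
    """Work done by one (hypothetical) node on its share of the data."""
    return sum(x * x for x in chunk)


def split(data, n_parts):
    """Split data into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def grid_sum_of_squares(data, n_workers=4):
    """Farm the chunks out to workers, then combine the partial results."""
    chunks = split(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    return sum(partials)


data = list(range(1, 1001))
total = grid_sum_of_squares(data)
```

The pattern is the same on a real grid: divide the problem, process the pieces concurrently, and combine the partial results.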


E-Social Science in the UK

• ‘e-Science’ nowadays used as a broader term involving use of technologies associated with the Grid and with other collaborations between computing and software resources

• NCeSS: UK programme of projects looking at e-Science applications in social science projects (e.g. distributed computing; access and analysis of complex data; secure access to sensitive data)

Less obviously, there are other activities where e-Science could help the research process

Data Preparation & Management

• Manipulating data

Recoding categories / ‘operationalising’ variables

These are the focus of the DAMES research Node, www.dames.org.uk

The Orthodox Survey Research Process

Data Collection

(survey data agencies, academics etc.)

Data Storage & Curation

(data archives etc.)

Data Management and Analysis – “Lone Researcher”

Stand alone computer (usually a PC)

Statistical Software (e.g. SPSS)


Statistical Analysis Process

• Awesome increases in desk top computing power (and storage capacity)

• Almost instant data download (from archives etc)

• The time ratio of data preparation to statistical modelling is probably about 10:1


e-Social Science Possibilities

Secure access

Computing networks

Software (e.g. Sabre R)

Data Linking

Harmonisation

Specialist Files & Resources


Grid Enabled Specialist Data Environments (‘GE*DE’)

• Programme of activities within the DAMES research Node

• Coordinating access to and exploitation of specialist information resources in the fields of

– Occupations

– Educational qualifications

– Ethnicity and migration


An Example: Specialist Files & Resources

• A researcher has a survey with occupational information and wants to construct an occupation based social class measure

An Example: Special Files & Resources

• Historically, the ‘information’ to construct the measure will be in the following forms

– books (or paper files)

– www files (e.g. national statistical agencies)

– computer files (e.g. Stata .do files)

An Example: Special Files & Resources

Clear problems...

• Access to the information

– are the files publicly (or easily) available?

– a “.do file” on a single researcher’s hard drive

• Unnecessary re-working

– certain sources (e.g. paper) need lots of working to produce properly coded survey data

• Replicability

– Is there clear information that allows a secondary researcher to use the resource?

– e.g. clear documentation

– what the information science community calls metadata, i.e. “data about other data”

– see Dale (2006) for a discussion
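As a rough illustration of “data about other data”, a recoding file could carry a small metadata record that a secondary researcher can check before using it. The field names and the example records below are hypothetical assumptions for this sketch; they are not a DAMES or GEODE metadata standard.

```python
# Hypothetical minimum set of fields; not an established metadata schema.
REQUIRED_FIELDS = (
    "title",
    "creator",
    "source_classification",
    "target_measure",
    "file_format",
    "description",
)


def missing_metadata(record):
    """Return the required fields that are absent or empty in a record."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f, "")).strip()]


# A documented resource (all values invented for illustration).
documented = {
    "title": "Occupation to social class lookup",
    "creator": "A. Researcher",
    "source_classification": "SOC 90 unit group plus employment status",
    "target_measure": "occupation-based social class",
    "file_format": "SPSS syntax",
    "description": "Assigns a class category to each occupational unit group.",
}

# The '.do file on a single researcher's hard drive' case.
undocumented = {"title": "class_recode.do", "file_format": "Stata .do file"}

gaps = missing_metadata(undocumented)
```

A check like this makes the replicability question concrete: the second record cannot be reused safely because nothing states what it recodes from or to.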


Motivation for our projects?

• We currently observe inadequate practices in survey data analysis

• Weaknesses in data management are observed in the following areas

– Not keeping adequate records

– Not linking relevant data

– Not trying out relevant variable operationalisations


Key Variables

• We concentrate (so far) on “key” variables

These are variables that are central to, and commonly found in, survey data analysis

They include... occupation, education, ethnicity, gender, age, income

(some survey variables are easier to deal with than others)

An Example: Occupational Social Class

• As far back as the late 1960s, Frank Bechhofer recommended that researchers should use established (and therefore replicable) social class schemes

• Consider a researcher with a survey with occupational information (e.g. SOC 90) and employment status information who wants to construct a social class measure
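The linkage this researcher needs can be sketched as a lookup keyed on occupational unit group and employment status. This is a minimal sketch under stated assumptions: the lookup rows, codes and class labels are invented placeholders, not real SOC 90 allocations, and `load_lookup` and `add_social_class` are hypothetical helper names.

```python
import csv
import io

# Hypothetical lookup file: one row per (unit group, employment status).
# Codes and class labels are placeholders, NOT a real class scheme.
LOOKUP_CSV = """soc90,empstat,social_class
101,employee,Class A
101,self-employed,Class B
202,employee,Class C
"""

# Hypothetical survey records holding the two linking variables.
SURVEY = [
    {"id": 1, "soc90": "101", "empstat": "employee"},
    {"id": 2, "soc90": "101", "empstat": "self-employed"},
    {"id": 3, "soc90": "999", "empstat": "employee"},  # not in the lookup
]


def load_lookup(text):
    """Read the lookup file into a dict keyed on (soc90, empstat)."""
    reader = csv.DictReader(io.StringIO(text))
    return {(r["soc90"], r["empstat"]): r["social_class"] for r in reader}


def add_social_class(records, lookup):
    """Return copies of the records with a social_class field attached."""
    coded = []
    for rec in records:
        key = (rec["soc90"], rec["empstat"])
        coded.append({**rec, "social_class": lookup.get(key)})
    return coded


lookup = load_lookup(LOOKUP_CSV)
coded = add_social_class(SURVEY, lookup)

# Keep a record of cases that failed to link, rather than dropping them.
unmatched = [r["id"] for r in coded if r["social_class"] is None]
```

Listing the unmatched cases is part of the point: it leaves a record of how the derived variable was built and where the lookup file fell short.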

GEODE

• Portal to log into

• Searchable – can find resources

• e.g. SPSS file that allows linkage of survey data to an occupational social class scheme

• Further examples are in our technical paper 2008-1 (Lambert et al. 2008)


Education:

• Education is a key social science measure that is included in an extremely wide variety of substantive analyses

• Education as an explanatory (X) variable:

Education is frequently used in statistical analyses as a key explanatory variable (usually with a number of other explanatory variables)

This is usual in areas such as sociology, social policy and economics


Education:

• Education as an outcome (Y) variable:

In more specialist studies an education measure is itself of interest as an outcome (for example gaining a specific qualification or level of attainment)

This is common in educational studies and within the sociology of education


Education:

“the question of how to measure education and qualifications – or indeed what ‘measure’ means – raises interesting issues…Since there is no agreed standard way of categorising educational qualifications”

(Prandy, Unt & Lambert 2004)

Compare Education with Occupational Information

• Survey starts with textual description

• Translated into Occupational Unit Group

• Agreed standards of data collection & classification

OUG Scheme; Industrial sector; employment status

• No similar consensus with educational data


Obvious issues with Educational variables

• Many measures (not just qualifications)

• Organisation and structure changes

• Changes in distributions over time

• We can learn from international comparisons

Many Measures

Some Examples of the 41 Categories: Highest Qualification (General Household Survey 2003)

highest qualification                   |      Freq.
----------------------------------------+-----------
1. higher degree                        |        669
2. nvq level 5                          |         20
3. first degree                         |      1,416
4. other degree                         |        278
5. nvq level 4                          |         71
6. diploma in higher education          |        282
7. hnc/hnd btec higher etc              |        551
9. teaching - secondary education       |         55
10. teaching - primary education        |         69
12. nursing etc                         |        267
14. other higher education below degree |        151
21. scottish 6th year certificate/csys  |         24
28. city & guilds craft/part 2          |        306
29. btec/scotvec first or gen diploma e |         42
30. o level, gcse grade a*-c or equival |      2,460
31. nvq level 1 or equivalent           |        102
33. cse below grade 1, gcse below grade |        693
41. dont know                           |         79
----------------------------------------+-----------
Total                                   |     24,489

Many Measures

Highest Academic Qualification (British Household Panel Survey 1991 – Wave A)

highest academic qualification |     Freq.   Percent      Cum.
-------------------------------+------------------------------
-9. missing                    |        19      0.19      0.19
-7. proxy respondent           |       352      3.43      3.61
1. higher degree               |       122      1.19      4.80
2. 1st degree                  |       598      5.83     10.63
3. hnd,hnc,teaching            |       496      4.83     15.46
4. a level                     |     1,362     13.27     28.73
5. o level                     |     2,510     24.45     53.19
6. cse                         |       529      5.15     58.34
7. none of these               |     4,276     41.66    100.00
-------------------------------+------------------------------
Total                          |    10,264    100.00
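The Percent and Cum. columns in the BHPS table above follow directly from the frequencies. The sketch below recomputes them; the frequencies are copied from the table, while the helper name `tabulate` is hypothetical. It works from running counts rather than summing rounded percentages, which is an assumption about how the original columns were produced.

```python
# Frequencies copied from the BHPS 1991 (Wave A) table above.
FREQS = [
    ("-9. missing", 19),
    ("-7. proxy respondent", 352),
    ("1. higher degree", 122),
    ("2. 1st degree", 598),
    ("3. hnd,hnc,teaching", 496),
    ("4. a level", 1362),
    ("5. o level", 2510),
    ("6. cse", 529),
    ("7. none of these", 4276),
]


def tabulate(freqs):
    """Return (total, rows) where each row is (label, freq, percent, cum)."""
    total = sum(n for _, n in freqs)
    rows, running = [], 0
    for label, n in freqs:
        running += n
        percent = round(100 * n / total, 2)
        cumulative = round(100 * running / total, 2)
        rows.append((label, n, percent, cumulative))
    return total, rows


total, rows = tabulate(FREQS)

for label, n, percent, cumulative in rows:
    print(f"{label:<22} {n:>6} {percent:>7.2f} {cumulative:>7.2f}")
```

Recomputing a published table from its counts is a cheap check when re-using someone else's coding of an education variable.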

Organisational Changes

Type of School Attended by Birth Cohorts, British Household Panel Survey 1991 – Wave A (extract, column percentages)

type of school     |             cohorts             |
attended           |   Prewar   1944 Act   Crosland  |    Total
-------------------+---------------------------------+---------
comprehensive sch  |        -      10.47      53.25  |    25.92
grammar not fee pa |     9.58      19.14       8.06  |    12.10
grammar fee-paying |     4.55       1.93       0.97  |     2.25
public & private   |     5.52       5.63       4.68  |     5.22
elementary         |    35.20       2.45          -  |    10.35
secondary modern   |        -      52.11      24.01  |    33.64
technical          |        -       3.49       0.80  |     2.15
-------------------+---------------------------------+---------

1. Suspect errors – potentially misleading measure

Changes in Qualification (titles & levels)

GHS 1983 (O’ Levels)

educational level        |     Freq.   Percent      Cum.
-------------------------+------------------------------
-9. does not apply       |     3,529     17.59     17.59
1. higher degree         |        99      0.49     18.09
2. first degree          |       790      3.94     22.03
3. teaching qual         |       279      1.39     23.42
4. other higher qual     |       651      3.25     26.66
5. nursing qual          |       283      1.41     28.07
6. gce a level 2+        |       385      1.92     29.99
7. gce a level 1         |       688      3.43     33.42
8. gce o level 5+        |     1,439      7.17     40.60
9. gce o lev1-4 & cq     |       418      2.08     42.68
10. gce o lev1-4 no cq   |     1,053      5.25     47.93
11. com qual no o levels |       704      3.51     51.44
12. cse grades 2-5       |       595      2.97     54.41
13. apprenticeship       |       907      4.52     58.93
14. foreign quals        |       154      0.77     59.70
15. other quals          |       251      1.25     60.95
16. no quals             |     7,734     38.56     99.51
17. no answer            |        29      0.14     99.66
18                       |         8      0.04     99.70
20. never went to school |        61      0.30    100.00
-------------------------+------------------------------
Total                    |    20,057    100.00

GHS 2003 (GCSE)

education level                         |     Freq.   Percent      Cum.
----------------------------------------+------------------------------
-9. never attended school               |        21      0.09      0.09
-8. na                                  |         4      0.02      0.10
-6. child/out age/no int                |     9,694     39.59     39.69
1. higher degree                        |       689      2.81     42.50
2. first degree                         |     1,765      7.21     49.71
3. teaching qualification               |       213      0.87     50.58
4. other higher qualification           |       979      4.00     54.58
5. nursing qualification                |       259      1.06     55.63
6. gce a level in two or more subjects  |     1,752      7.15     62.79
7. gce a level in one subject           |       486      1.98     64.77
8. gcse/olevel, standard grades, 5+     |     1,915      7.82     72.59
9. gcse/olevel 1-4                      |     1,257      5.13     77.72
10. cse below grade 1, gcse below grade |     1,373      5.61     83.33
11. apprenticeship                      |       144      0.59     83.92
12. other qualification                 |       654      2.67     86.59
13. no qualification                    |     3,284     13.41    100.00
----------------------------------------+------------------------------
Total                                   |    24,489    100.00
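Because the 1983 and 2003 codes differ in titles and levels, comparing the two surveys needs an explicit recode into common bands. The sketch below shows the mechanics only. The frequencies are copied from the two GHS tables above, but the banding itself is an illustrative assumption, not an agreed or recommended harmonisation (where to place apprenticeships or commercial qualifications, for instance, is arguable). The names `invert` and `band_totals` are hypothetical.

```python
# Frequencies copied from the two GHS tables above (code -> count).
FREQ_1983 = {-9: 3529, 1: 99, 2: 790, 3: 279, 4: 651, 5: 283, 6: 385,
             7: 688, 8: 1439, 9: 418, 10: 1053, 11: 704, 12: 595, 13: 907,
             14: 154, 15: 251, 16: 7734, 17: 29, 18: 8, 20: 61}
FREQ_2003 = {-9: 21, -8: 4, -6: 9694, 1: 689, 2: 1765, 3: 213, 4: 979,
             5: 259, 6: 1752, 7: 486, 8: 1915, 9: 1257, 10: 1373, 11: 144,
             12: 654, 13: 3284}

# ASSUMPTION: illustrative bands only, chosen for this sketch.
BANDS_1983 = {
    "degree": [1, 2],
    "other higher": [3, 4, 5],
    "a level": [6, 7],
    "o level / gcse": [8, 9, 10],
    "other": [11, 12, 13, 14, 15],
    "none": [16, 20],
}
BANDS_2003 = {
    "degree": [1, 2],
    "other higher": [3, 4, 5],
    "a level": [6, 7],
    "o level / gcse": [8, 9],
    "other": [10, 11, 12],
    "none": [13, -9],
}


def invert(bands):
    """Turn {band: [codes]} into {code: band}."""
    return {code: band for band, codes in bands.items() for code in codes}


def band_totals(freqs, bands):
    """Sum frequencies by band; codes not listed in any band count as 'missing'."""
    code_to_band = invert(bands)
    totals = {}
    for code, n in freqs.items():
        band = code_to_band.get(code, "missing")
        totals[band] = totals.get(band, 0) + n
    return totals


totals_1983 = band_totals(FREQ_1983, BANDS_1983)
totals_2003 = band_totals(FREQ_2003, BANDS_2003)
```

Keeping the band definitions as data, separate from the recoding function, makes it easy to store them alongside the results and to try an alternative operationalisation by editing one dictionary.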

Changes in Distributions

[Figure: chart of counts (axis from 0 to 8,000) by education level (no quals, some quals, f.e., h.e.), comparing count of own level with count of father’s level. British Household Panel Survey (Wave M): Respondent’s Education Level and Father’s Education Level]

We can learn from international comparisons

• CASMIN: Brynin (2003) example of BHPS & GSOEP


Can e-Social science help us?

• Data discipline

• Data matching / merging

• Data access (confidential records)

(future changes in access agreements)


What should we do in DAMES?

• Database of typologies of educational qualifications

– Listings / taxonomies of educational titles

• e.g. based on what major social surveys have used

• Enhanced access to specialist data on educational qualifications

• Same model as GEODE?

• User friendly prescriptions for best practice in using educational data

• User friendly support for distributing data (and metadata) on education


Ethnicity and the DAMES project

• Tricky topic to collate information on

– Few recognisable ‘ethnic unit groups’

– Limited previous ‘data management’ reflection

– Very few published databases on ethnicity

– Important question of sparse distributions

– Dynamic, & rapidly expanding

• Likely role is to give guidance on existing data / taxonomies and routines to allow their analysis

– category recodings

– scaling of categories

– {GEODE model with far fewer ‘Ethnicity unit groups’}
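One simple kind of category recoding for sparse distributions is to merge any category below a minimum count into a residual group. The sketch below is illustrative only: the group labels, the sample and the threshold are invented, and `collapse_sparse` is a hypothetical helper rather than a DAMES routine.

```python
from collections import Counter


def collapse_sparse(values, min_count, other_label="other"):
    """Recode categories with fewer than min_count cases to other_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


# Invented sample: two well-populated groups and two sparse ones.
sample = ["group A"] * 50 + ["group B"] * 30 + ["group C"] * 3 + ["group D"] * 2

recoded = collapse_sparse(sample, min_count=10)
recoded_counts = Counter(recoded)
```

The threshold is an analytical choice, so it is worth recording with the derived variable; a residual group built this way can be very heterogeneous.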


Conclusions

• e-Social Science resources can help improve survey research

– assist with access to disparate resources

– help with data management (especially key variables)

– help with data standards and best practice

– help with replicability (and improve incremental science)


Brynin, M. (2003). Using CASMIN: the effect of education on wages in Britain and Germany, in Hoffmeyer-Zlotnik, J. and Wolf, C., Advances in Cross-National Comparison: A European Working Book for Demographic and Socio-Economic Variables, Kluwer: Amsterdam, 327-44.

Dale, A. (2006). Quality Issues with Survey Research. International Journal of Social Research Methodology, 9(2), 143-158.

Lambert, P. S., Tan, K. L. L., Turner, K. J., Gayle, V., Prandy, K., & Sinnott, R. O. (2007). Data Curation Standards and Social Science Occupational Information Resources. International Journal of Digital Curation, 2(1), 73-91.

Lambert, P. S., Gayle, V., Tan, L., Blum, J., Bowes, A., Jones, S., Turner, K., Warner, G., Sinnott, R., & Bihagen, E. (2008). Grid Enabled Specialist Data Environments: Forward Planning for GE*DE Services for Specialist Data Occupations, Educational Qualifications, and Ethnicity. DAMES Project Technical Paper 2008-1.

Prandy, K., Unt, M., & Lambert, P. S. (2004). Not by degrees: Education and social reproduction in twentieth-century Britain. Paper presented at the ISA RC28 Research Committee on Social Stratification and Mobility.