Upload
martha-may
View
217
Download
1
Tags:
Embed Size (px)
Citation preview
July 2008: LDA 1
What is… Quantitative Longitudinal Analysis?
Paul Lambert and Vernon GayleUniversity of Stirling
Prepared for: National Centre for Research Methods, Research Methods Festival, St Catherine’s College, Oxford, 2 July 2008
www.longitudinal.stir.ac.uk
July 2008: LDA 2
So what is quantitative longitudinal analysis?
You already know..Working with (survey) datasets with longitudinal
information (data about time) and the specialist techniques of statistical analysis that are appropriate
You maybe don’t realise..Complex data and data management componentsGroups of techniques and data types Reasons why longitudinal analysis is advocated
July 2008: LDA 3
Quantitative longitudinal research in the social sciences
• Survey resources
• Longitudinal– Data concerned with more than one time point
• [e.g. Taris 2000; Blossfeld and Rohwer 2002]
– Repeated measures over time • [e.g. Menard 2002; Martin et al 2006]
Data analysis is used to give a parsimonious summary of patterns of relations between variables in the survey dataset
July 2008: LDA 4
Motivations for QnLA
• Focus on change / stability• Focus on the life course
Distinguish age, period and cohort effectsCareer trajectories / life course sequences
• Focus on time / durationsSubstantive role of durations (e.g. Unemployment)
• Getting the ‘full picture’Causality and residual heterogeneityExamining multivariate relationshipsRepresentative conclusions
[e.g. Abbott 2006; Mayer 2005; Menard 2002; Baltagi 2001; Rose 2000; Dale and Davies 1994; Hannan and Tuma 1979; Moser 1958]
July 2008: LDA 5
What’s exciting about quantitative longitudinal analysis?
• A personal view:
By and large, the analytical & methodological issues aren’t so exciting: they have mostly been known about for some time
What is exciting is the rapid expansion of secondary quantitative longitudinal data, its quality, its volume and its accessibility
(a) - new data
(b) - new ways of accessing existing data
July 2008: LDA 6
Some comments on quantitative longitudinal analysis
• Working with secondary surveys – Expense of long-term data collection– Complex data files, need good habits in data management (using syntax) – In practical terms – lots of gruelling computer programming
• Resources for supporting researchersEasy access to data, e.g. – http://www.data-archive.ac.uk/ – http://www.esds.ac.uk/longitudinal/ Training in relevant analytical methods and data management, e.g. – http://www.longitudinal.stir.ac.uk/
• Distinctive research traditions and research centres..
July 2008: LDA 7
Research traditions (methodology)
1) Statistical methods for quantitative longitudinal data
• [esp. Dale & Davies 1994]
2) Research on data quality – Variable constructions in longitudinal
research• www.longitudinal.stir.ac.uk/variables/ • Harmonisation, standardisation, comparability
– Missing data and attrition
July 2008: LDA 8
Research traditions (applications)• ‘geographers study space and economists study time’ [adage
quoted in Fotheringham et al. 2000:245] Vast economics literature using techniques for temporal analysis Other social science disciplines are mostly catching up ..we’ll come back to geography later
• Data expansions c1990 -> new substantive applications areas – For example: – [Platt 2005] - ethnic minorities’ social mobility 1971-2001– [Pahl & Pevalin 2005] – Friendship patterns over time– [Verbakel & de Graaf 2008] – spouses effect on careers 1941-2003
• Here, one critical challenge is getting used to talking about time in a more disciplined way: e.g. traditional sociological characterisations of ‘the past’ and ‘social change’ may not be empirically satisfactory
July 2008: LDA 9
Some detail: Five traditions in Quantitative Longitudinal Analysis
cf. www.longitudinal.stir.ac.uk 1. Repeated cross-sections
2. Panel datasets 3. Cohort studies
4. Event history datasets 5. Time series analyses
10
Repeated cross-sectional data: in soc. sci., the most widely used longitudinal analysis
Survey Person Person-level Vars 1 1 1 38 1 1
1 2 2 34 2 2
1 3 2 6 - -
2 4 1 45 1 3
2 5 2 41 1 1
3 6 1 20 2 2
3 7 1 25 2 2
3 8 1 20 1 1
N_s=3 N_c=8
July 2008: LDA 11
Repeated cross sections
Easy to communicate & appealing: how things have changed between certain time points
Partially distinguishes age / period / cohortEasier to analyse – less data management However..
Don’t get other QnLR attractions (nature of changers; residual heterogeneity; causality; durations)
Hidden complications: are sampling methods, variable operationalisations really comparable? cf. http://www.longitudinal.stir.ac.uk/variables/ => measures
are more often robust than not...
July 2008: LDA 12
Example 1.1: UK Census• Directly access aggregate statistics from census
reports, books or web, e.g.:
• Census v’s Surveys: larger scale surveys often have more data points and more reliable measures
Wales: Proportion able to speak Welsh
Year 1891 1981 1991 2001
% 54 19 19 21
July 2008: LDA 13
Example 1.2i: Labour Force Survey yearly stats
Percent of UK workers with a higher degree, by employment category and gender (m / f )
Sample size ~35,000 m / 30,000 f each year
1991 1996 2001
Profess. 14.4 19.9 24.9
Non-Prof. 1.3 2.5 3.5
Profess. 11.0 24.4 28.3
Non-Prof 0.6 2.3 3.2
July 2008: LDA 14
Example 1.2ii: LFS and timeLog regression: odds of being a professional from LFS adult workers in 1991,
1996 and 2001
2.383 .000 10.842
-.955 .000 .385
.777 .000 2.174
-.857 .000 .424
.094 .000 1.098
-.195 .000 .823
-.030 .000 .971
-4.232 .000 .015
Higher degree
Female
Age in years (/10)
Age in years squared (/1000)
Time point 1991
Time point 2001
(Time in years)* (Higher Degree)
Constant
aB Sig. Exp(B)
Nagelkere R2=0.11a.
July 2008: LDA 15
Five traditions in Quantitative Longitudinal Analysis
1. Repeated cross-sections
2. Panel datasets 3. Cohort studies
4. Event history datasets 5. Time series analyses
July 2008: LDA 16
Panel Datasets
– ‘classic’ longitudinal design
– incorporates ‘follow-up’, ‘repeated measures’, and ‘cohort’
– Large and small scale panels are common
Information collected on the same cases at more than one point in time
17
Illustration: Unbalanced panelWave* Person Person-level Vars
1 1 1 38 1 36
1 2 2 34 2 0
1 3 2 6 9 -
2 1 1 39 1 38
2 2 2 35 1 16
3 1 1 40 1 36
3 2 2 36 1 18
3 3 2 8 9 -
N_w=3 N_p=3 *also ‘sweep’, ‘contact’,..
July 2008: LDA 18
Panel data advantages
• Study ‘changers’ – how many of them, what are they like, what caused change
• Control for individuals’ unknown characteristics (‘residual heterogeneity’)
• Develop a full and reliable life history – e.g. family formation, employment patterns
July 2008: LDA 19
Challenges for Panel data analysis• Complex data analysis and data management
• need for training & good habits (syntax programming)
• Data issues• confidential data; time lag until most useful data• variable constructions and comparability
• Unbalanced panels and attrition– Balanced data is still required for many analytical techniques
• transition tables; dynamic effects; trajectory profiles
– Unbalanced cases and attrition as missing data • Complete case analysis = ‘MCAR’• Ad hoc methods and imputatin• Missing data models, e.g. www.missingdata.org.uk
July 2008: LDA 20
Example 2.1: Panel transitions
Young people’s household circumstance changes by subjective well-being between 1994 and 1995.
BHPS youth panel, 11-14yrs in 1994, row percents. Stays happy
Cheers up
Becomes miserable
Stays miserable
N
HH Stable 54% 19% 10% 18% 499
HH Changes 42% 22% 14% 22% 81
21
Analytical approachesPanel data models:
Cases i Year t Variables 1 1 1 17 1 1
1 2 1 18 2 1
1 3 1 19 2 -
2 1 1 17 1 3
2 2 1 18 1 1
3 2 2 20 2 2
Yit = ΒXit + … + Є
July 2008: LDA 22
Panel data model types
• Fixed and random effects – Ways of estimating panel regressions
• Growth curves – Time effect in panel regression (cf. multilevel models)
• Dynamic Lag-effects models – Theoretically appealing...
Analytically complex and often need advanced or specialist software
Econometrics literature Stata / GLLAMM; R; S-PLUS; SABRE / GLIM; LIMDEP;
MLwiN; MPLUS; …
July 2008: LDA 23
Example 2.2: Panel model
BHPS 1994-8: Output from Variance Components Panel model fordeterminants of GHQ scale score (higher = more miserable), by individual
factors for multiple time points per persona
12.69 .168 .000 12.4 13.0
-1.36 .076 .000 -1.5 -1.2
-1.23 .082 .000 -1.4 -1.1
.50 .131 .000 .2 .8
-1.70 .141 .000 -2.0 -1.4
.00 .002 .055 .0 .0
-.07 .076 .356 -.2 .1
.03 .014 .020 .0 .1
ParameterIntercept
Female
In work
Unemployed
FT studying
Age in years
Holds degree ordiploma
Time point
Estimate Std. Error Sig.LowerBound
UpperBound
95% ConfidenceInterval
Variance components : Person level= 46%, individual level = 54%a.
July 2008: LDA 24
Five traditions in Quantitative Longitudinal Analysis
1. Repeated cross-sections
2. Panel datasets 3. Cohort studies
4. Event history datasets 5. Time series analyses
July 2008: LDA 25
Cohort Datasets
– Intuitive type of repeated contact data – e.g. ‘7-up’ series
− Cohort comparisons − e.g. UK Birth cohort studies in 1946, 1958, 1970 and 2000
Information on a group of cases which share a common circumstance, collected repeatedly
as they progress through a life course
July 2008: LDA 26
Cohort data and analysis in the social sciences
• Many circumstances parallel other panel types: Large scale studies ambitious & expensive Small scale cohorts still quite common…
Attrition problems often more severe Considerable study duration limits
Glenn (2005) argues that ‘cohort analysis’ should be specifically directed to understanding effects of ageing/progression over time• Other uses of cohort data are just = panel data• It remains hard - even with extensive cohort data - to
authoritatively understand ageing effects (age = period – cohort)
July 2008: LDA 27
Five traditions in Quantitative Longitudinal Analysis
1. Repeated cross-sections
2. Panel datasets 3. Cohort studies
4. Event history datasets 5. Time series analyses
July 2008: LDA 28
Event history data analysis[esp. Blossfeld et al 2007]
• Alternative data sources: – Panel / cohort (more reliable)– Retrospective (cheaper, but recall errors)
• Aka: ‘Survival data analysis’; ‘Failure time analysis’; ‘hazards’; ‘risks’; ..
Focus shifts to length of time in a ‘state’ -
analyses determinants of time in state
July 2008: LDA 29
Event histories differ:• In form of dataset (cases are spells of time in a state)
Raises data management challenges Comment: data analysis techniques are not well suited to complex
variates; some argue than many Event History applications are artificially simplistic in their variables
• In types of analytical method Many techniques are new (and/or not well known), and specialist
software may be needed
Time to labour market transitions Time to recidivism Time to end of cohabitation
July 2008: LDA 30
Key to event histories is ‘state space’ Episodes within state space : Lifetime work histories for 3 adults born 1935 State space Person 1 FT work PT work Not in work Person 2 FT work PT work Not in work Person 3 FT work PT work Not in work 1950 1960 1970 1980 1990 2000
July 2008: LDA 31
Illustration of a continuous time retrospective dataset Case Person Start
time End time
Duration Origin State
Destination state
{Other vars, person/state}
1 1 1 158 157 1 (FT) 3 (NW) 2 1 158 170 12 3 (NW) 3(NW) 3 2 1 22 21 3 (NW) 1 (FT) 4 2 22 106 84 1 (FT) 3 (NW) 5 2 106 149 43 3 (NW) 2 (PT) 6 2 149 170 21 2 (PT) 2 (PT) 7 3 1 10 9 1 (FT) 2 (PT) . . . . . . .
July 2008: LDA 32
Illustration of a discrete time retrospective dataset Case Person Discrete
Time Approx real time
State End of state
{Other person, state, or time unit level variables}
1 1 1 5 1 FT 0 2 1 2 20 1 FT 0 3 1 3 35 1 FT 0 4 1 4 50 1 FT 0 5 1 5 65 1 FT 0 6 1 6 80 1 FT 0 7 1 7 95 1 FT 0 8 1 8 110 1 FT 0 9 1 9 125 1 FT 0 10 1 10 140 1 FT 1 11 1 11 155 3 NW 0 12 1 12 170 3 NW 1 13 2 1 5 3 NW 0 14 2 2 20 3 NW 1 15 2 3 35 1 FT 0 16 2 4 50 1 FT 1 . . . . . .
July 2008: LDA 33
Event history analysis software
SPSS – limited analysis optionsStata – wide range of pre-prepared methodsSAS – as StataS-Plus/R – vast capacity but non-introductoryGLIM / SABRE – some unique optionsTDA – simple but powerful freewareMLwiN; lEM; {others} – small packages
targeted at specific analysis situations
July 2008: LDA 34
Eg 4.1 : Kaplan-Meir survival
BHPS males 1st job KM
duration in months
7006005004003002001000-100
Cu
m S
urv
iva
l
1.2
1.0
.8
.6
.4
.2
0.0
-.2
agricultural w k
semi,unskilled
skilled manual
foreman,technicians
farmers
sml props w /o
sml props w /e
personal service
routine non-mnl
service class,lo
service class,hi
July 2008: LDA 35
Eg 4.2: Cox’s regression
Cox regression estimates: risks of quicker exit from firstemployment state of BHPS adults
.194 .081 .017
-.617 .179 .001
-.062 .003 .000
.000 .000 .000
-.013 .001 .000
.214 .109 .049
-.003 .002 .061
.000 .004 .897
.006 .001 .000
Female
Self-employed
Age in 1990
Age in 1990 squared
Hope-Goldthorpe scale
Female*self-employed
Female* HG scale
Self-employed*HG scale
Female*Age in 1990
B SE Sig.
July 2008: LDA 36
Sequences / Trajectories: characterise event history progression through states into clusters / sequences / frameworks
• Growing recent social science interest
Optimal matching analysis / Correspondence analysis / log-linear models / Latent growth curves
• Often analyse membership of cluster as an outcome (Problem – neutrality of data, e.g., cluster 1= Men in full time
employment)
July 2008: LDA 37
Five traditions in Quantitative Longitudinal Analysis
1. Repeated cross-sections
2. Panel datasets 3. Cohort studies
4. Event history datasets 5. Time series analyses
July 2008: LDA 38
Time series data
Examples:• Unemployment rates by year in UK• University entrance rates by year by country
Comment: – Panel = many variables few time points
= ‘cross-sectional time series’ to economists– Time series = few variables, many time points
Statistical summary of one particular concept, collected at repeated time points from one or
more subjects
July 2008: LDA 39
Time Series Analysis
i) Descriptive analyses– charts / text commentaries on values by time periods
and different groups
– Widely used (=Repeated X-Sectional analysis)
ii) Time Series statistical models
Advanced methods of modelling data are possible, require specialist stats functions
• Autoregressive functions: Yt = Yt-1 + Xt + e
– Widely employed in business / economics, but limited use in other disciplines
July 2008: LDA 40
….Phew!
1. Repeated cross-sections
2. Panel datasets 3. Cohort studies
4. Event history datasets 5. Time series analyses
July 2008: LDA 41
Summary: Quantitative Longitudinal Analysis
i. Appealing analytical possibilities: e.g. analysis of change, controls for residual heterogeneity
ii. Pragmatic constraints: data access, management, & analytical methods; practical applications often over-simplify topics
iii. Uneven penetration of applications between research fields at present
1) Pro’s and cons to QnL research::
July 2008: LDA 42
Summary: Quantitative Longitudinal Analysis
i. Needs a bit of effort: learn software syntax, data management routines – workshops and training facilities available; exploit UK networks
ii. Remain substantively driven: dangers of ’methodolatry’ (applications ‘forced’ into favourite complex techniques) and simplification (simpler techniques in the more popular & influential reports)
iii. Learn by doing (..with Stata syntax examples..!)
2) Undertaking QnL research:
July 2008: LDA 43
Summary: Quantitative Longitudinal Analysis
i. Process of mainstreaming QLA into social science discourses (so we all need to know ‘what is’!!)
ii. Complex multi-process models: new data & software for complex longitudinal statistical models
iii. More new longitudinal data resources i. More and more micro-data (e.g. UKHLS)ii. Data linking (e.g. administrative datasets)iii. Geographical data over time
3) Some speculation on the future
References• Abbott, A. 2006. 'Mobility: What? When? How?' in Morgan, S.L., Grusky, D.B. and Fields, G.S. (eds.) Mobility and
Inequality. Stanford: Stanford University Press.• Baltagi, B.H. 2001. Econometric Analysis of Panel Data. New York: Wiley.• Blossfeld, H.P. and Rohwer, G. 2002. Techniques of Event History Modelling: New Approaches to Causal Analysis,
2nd Edition. Mawah, NJ: Lawrence Erlbaum Associates.• Blossfeld, H. P., Grolsch, K., & Rohwer, G. (2007). Event History Analysis with Stata. New York:
Lawrence Erlbaum • Davies, R.B. 1994. 'From Cross-Sectional to Longitudinal Analysis' in Dale, A. and Davies, R.B. (eds.) Analysing
Social and Political Change : A casebook of methods. London: Sage.• Fotheringham, A. S., Brunsdon, C., & Charlton, M. (2000). Quantitative Geography: Perspectives on Spatial Data
Analysis. London: Sage.• Glenn, N. D. (2005). Cohort Analysis, 2nd Edition. London: Sage.• Hannan, M. T., & Tuma, N. B. (1979). Methods for Temporal Analysis. Annual Review of Sociology, 5, 303-328.• Lambert, P.S., Prandy, K. and Bottero, W. 2007. 'By Slow Degrees: Two Centuries of Social Reproduction and
Mobility in Britain'. Sociological Research Online 12.• Martin, J., Bynner, J., Kalton, G., Boyle, P., Goldstein, H., Gayle, V., Parsons, S. and Piesse, A. 2006. Strategic Review
of Panel and Cohort Studies. London: Longview, and www.longviewuk.com/• Mayer, K.U. 2005. 'Life courses and life chances in a comparative perspective' in Svallfors, S. (ed.) Analyzing
Inequality: Life Chances and Social Mobility in Comparative Perspective. Stanford: Stanford University Press.• Menard, S. 2002. Longitudinal Research, 2nd Edition. London: Sage, Number 76 in Quantitative Applications in the
Social Sciences Series.• Moser, C. A. (1958). Survey Methods in Social Investigation. London: Heinemann.• Pahl, R., & Pevalin, D. (2005). Between family and friends: a longitudinal study of friendship choice. British Journal of
Sociology, 56(3), 433-450.• Platt, L. (2005). Migration and Social Mobility: The Life Chances of Britain's Minority Ethnic Communities. Bristol:
The Policy Press.• Rose, D. 2000. 'Researching Social and Economic Change: The Uses of Household Panel Studies'. London: Routledge.• Taris, T.W. 2000. A Primer in Longitudinal Data Analysis. London: Sage.• Verbakel, E., & de Graaf, P. M. (2008). Resources of the Partner: Support or Restriction in the Occupational Career
Developments in the Netherlands Between 1940 and 2003. European Sociological Review, 24(1), 81-95.