Visualizing Life Course Sequences: A Comparison of Family Formation between East and West German...

Preview:

Citation preview

Visualizing Life Course Sequences: A Comparison of Family Formation between

East and West German Women

Tim F. Liao, University of Illinois

10 December 2010Institute of Sociology, Academia Sinica

Some Preliminary Considerations

• Elder (1985): "Life patterns are structured by variations in the timing, duration, and order of events."

• Rindfuss, Swicegood, and Rosenfeld (1987) in “Disorder in the Life Course: How Often and Does it Matter?” analyzed NLS of class 72 data, 5 school-employment events of 8 years, if equal probability 58=390,625 sequences. They encouraged researchers to take a more careful look at the life course as it is actually lived

• In this presentation, we will take a formal look at people’s life courses as they were actually lived, through the help of visualization of life course sequences

Recent Developments in Sequence Analysis

• Abbott’s work using optimal matching in the 1990s analysis the sociological article

• Initially critics highly questioned its value for social science research (Wu 2000, Levine 2000)

• In the past decade methodological developments (Lesnard 2006, 2008; Elzinga 2003, 2010; Hollister 2009; Stovel & Bolan 2004) introduced a ‘second wave’ of sequence analysis (Aisenbrey and Fasang 2010) that addresses much of this criticism

• Sequence analysis is becoming increasingly wide-spread and accepted especially in life course and employment research

Visualization of sequences: sequence index plots

• Introduced by Scherer 2001

• Stata.ado by Kohler, Brzinsky-Fay and Luniak 2006

• Each sequence represented by one line across a process time axis, different states indicated by different colors

Alternative Visual Methods

• Tree representation of clusters of transitions to adulthood (Aassve et al 2007)

• MDS Sequence index plot of family formation trajectories (Piccarreta & Lior 2010)

Tufte’s (2001) Graphic Excellence

• (1) Above all else show the data, i.e. tell the truth about the data

• (2) maximize the data-ink ratio,• (3) erase non-data-ink,• (4) erase redundant data-ink, and • (5) revise and edit

Advantages of sequence index plots

• Holistic representation of processes• Taking individual sequence seriously as a unit of analysis• Convey large amounts of information in single graph:

prevalence and timing of states, intermediary states, overall variability within and between sequences etc…

Nu

mb

er

Problems of sequence index plots

Sequence index plots can be deceptive due to the follow reasons:

1. Over-plotting: deceptive in terms of relative importance of individual sequences, problematic for N larger than a couple of hundreds

2. Unfair Comparison: difficult to compare groups of unequal sizes

3. Reliance on Order: lose sight of central tendencies in the timing of more than one transition within sequences

Example data: German Life History Study (GLHS)

• The larger study collected data since early 1980s on the life histories of some 8,500 men and women from 20 selected birth cohorts in West Germany and of more than 2,900 men and women from 13 selected birth cohorts in East Germany

• For this analysis: family formation sequences of 474 East (N=132) and West (N=342) German women born 1971

• Retrospective life history data, collected in 1997/1998 & follow up in 2005; we only use the data available in 2005, thus we can follow family formation until age 34

• Age range: 15-34• seven states of family formation: single, in a relationship, cohabiting,

cohabiting with a child, married, married with a child, and divorced/widowed

Problem I: over-plotting of 474 cases

Problem I: over-plotting of 948 cases

Problem I: over-plotting of 1932 cases

Problem II: comparative difficulty

Relative frequency sequence index plots

• To deal with over-plotting of large N: we draw random subsample and present as relative sequence index plots

• For comparing groups of unequal sizes: we compare differences in relative frequency of specific sequences (and states they contain) across plots of unequal sizes

Relative Sequence index plots of family formation sequences in West and East Germany

width of each line represents relative frequency of sequence in subgroup

Relatively more cohabitation with or without child in East Germany than in West Germany

Sequence index plots of clusters of family formation (based on Lesnard’s dynamic OM distance)

Relative frequency sequence index plots of family formation clusters (sum to 100%)

Relative sequence index plots in cohort comparison

Problem III: Order of Sequences

• One may lose sight of central tendencies in multiple transitions within sequences

• Ordering of sequences can change one’s overall perception of a graph

Ordered by timing of cohabitation or marriage?

Deceptive/not informative on timing/prevalence of marriage when ordered by cohabitation and vice versa

Multiple transition curve plots based on Kaplan Mayer survival curves

Calculate separate transition curves for each transition of interest:

1.in a relationship 2. cohabitation3. marriage4. cohabitation with child5. marriage with child

tt i

ii

in

dntS .)(ˆ

Multiple transition curve plots next to relative frequency index plots

The two new plots for East & West Germany

Multiple transition curves for family formation clusters

The two new plots for the marriage/late child cluster

The two new plots for the marriage/early child cluster

The two new plots for the late marriage cluster

Comments on multiple transition curve plots

• Most applicable for non-recurrent states, unidirectional processes – a lot of social science sequences, especially in life course applications are fairly unidirectional

• Especially informative in combination with relative sequence index plots: allowing us to visualize the variability of specific transitions and central tendencies in their timing jointly

Are they different enough?

Are they different enough?

There should be no significant difference between 2 random 100 sequences

Mean time spent in each state among East and West German Women

CC CN DW RL MC MN SG

east germany

Mea

n tim

e (n

=13

2)

043

8613

017

321

6

CC CN DW RL MC MN SG

west germany

Mea

n tim

e (n

=34

2)

043

8613

017

321

6

cohabiting, child

cohabiting, no child

divorced/widowed

in a relationship

married, child

married, no child

single

State distribution plots comparing East and West German Women

east germany

Fre

q. (

n=13

2)

1 14 29 44 59 74 89 106 125 144 163 182 201

0.0

0.2

0.4

0.6

0.8

1.0

west germany

Fre

q. (

n=34

2)

1 14 29 44 59 74 89 106 125 144 163 182 201

0.0

0.2

0.4

0.6

0.8

1.0

cohabiting, child

cohabiting, no child

divorced/widowed

in a relationship

married, child

married, no child

single

Conclusion & next stepsWe have developed relative frequency index plots and multiple transition

curve plots for examining life courses as they were actually lived

We still have the remaining tasks of testing differences between sequence patterns and shapes

Randomization test

Goal: develop three types of difference measures to test for difference in different aspects of sequences:

(1) Overall difference that can be decomposed into(2) Temporal structure: (a) within and (b) between sequence variation, (3) Qualitative difference in the occurrence of states

Another procedure for constructing relative frequency index plot

• 1. Divide the sample into P relative frequency groups;• 2. Choose a representative sequence pattern by modal

sequence to represent a relative frequency group;• 3. For a relative frequency group that has no modal pattern,

use medoid sequence (smallest sum of distances to all other sequences in each group) as a representative of each group;

• 4. When a sample cannot be neatly allocated into the total number of relative frequency groups P, a case can be proportionally allocated to adjacent groups with allocation probabilities determined by (N-P)/P;

• 5. Steps 2 & 3 are carried out on the reallocated sample;• 6. The choice of P depends on effective visual presentation.

Applying the Medoid Selection Procedure

Male - no

Fre

q. (

n=34

2)

Sep.93 Jul.94 Apr.95 Jan.96 Nov.96 Sep.97 Jul.98 Apr.99

0.0

0.2

0.4

0.6

0.8

1.0

Male - yes

Fre

q. (

n=37

0)

Sep.93 Jul.94 Apr.95 Jan.96 Nov.96 Sep.97 Jul.98 Apr.99

0.0

0.2

0.4

0.6

0.8

1.0

employment

further education

higher education

joblessness

school

training

State distribution plots by gender

Father Unemployed - no

Fre

q. (

n=59

5)

Sep.93 Jul.94 Apr.95 Jan.96 Nov.96 Sep.97 Jul.98 Apr.99

0.0

0.2

0.4

0.6

0.8

1.0

Father Unemployed - yes

Fre

q. (

n=11

7)

Sep.93 Jul.94 Apr.95 Jan.96 Nov.96 Sep.97 Jul.98 Apr.99

0.0

0.2

0.4

0.6

0.8

1.0

employment

further education

higher education

joblessness

school

training

State distribution plots by father’s unemployment

Grammar School - no

Fre

q. (

n=58

3)

Sep.93 Jul.94 Apr.95 Jan.96 Nov.96 Sep.97 Jul.98 Apr.99

0.0

0.2

0.4

0.6

0.8

1.0

Grammar School - yes

Fre

q. (

n=12

9)

Sep.93 Jul.94 Apr.95 Jan.96 Nov.96 Sep.97 Jul.98 Apr.99

0.0

0.2

0.4

0.6

0.8

1.0

employment

further education

higher education

joblessness

school

training

State distribution plots by grammar school background

Recommended