
www.ioe.ac.uk/bedfordgroup

Analysing Variability Between Neighbourhoods By Exploiting Survey Design Features

Paper for ‘What is Multilevel Modelling?’ session, Research Methods Festival, Oxford, 2 July 2004.

Ian Plewis

Institute of Education, University of London


Sample surveys with a clustered design tend to be more cost-efficient than surveys using simple random samples.

Clustering does, however, introduce complexities in the analysis because cases within a cluster are more similar, on average, than cases in different clusters. The degree of similarity is represented by the intra-cluster (or intra-class) correlation. We can adjust standard errors to allow for clustering within a number of statistical packages.
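To give a sense of how much clustering can matter for standard errors, a common approximation (not taken from the slides) is Kish's design effect, 1 + (m - 1) × rho, for clusters of equal size m and intra-cluster correlation rho. The sketch below uses purely illustrative values, and the function name is hypothetical.

```python
import math


def design_effect(rho: float, cluster_size: float) -> float:
    """Kish's approximate design effect for equal-sized clusters.

    Assumption: every cluster has the same size; with very unequal
    cluster sizes this is only a rough guide.
    """
    return 1.0 + (cluster_size - 1.0) * rho


# Illustrative values only (not estimates from the MCS).
deff = design_effect(rho=0.05, cluster_size=20)
se_inflation = math.sqrt(deff)  # factor by which a naive s.e. is too small

print(round(deff, 2))          # 1.95
print(round(se_inflation, 1))  # 1.4
```

Even a modest intra-cluster correlation can inflate standard errors noticeably once clusters are reasonably large.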

However, the clustering might be informative in the sense that the clusters represent neighbourhoods (or institutions) that could exert an independent or contextual effect on a social or developmental process. In other words, clustering is not necessarily a statistical nuisance. Rather, it can be exploited to throw more light on social processes.

The Millennium Cohort Study population is a population of children defined as:

all children born between 1 September 2000 and 31 August 2001 (for England and Wales), and between 23 November 2000 and 11 January 2002 (for Scotland and Northern Ireland), alive and living in the UK at age nine months, and eligible to receive Child Benefit at that age;

and, after nine months:

for as long as they remain living in the UK at the time of sampling.

The MCS Population

All children living in the selected wards:

Country      Stratum          Number of wards
ENGLAND      Advantaged       110
ENGLAND      Disadvantaged     71
ENGLAND      Ethnic            19
WALES        Advantaged        23
WALES        Disadvantaged     50
SCOTLAND     Advantaged        32
SCOTLAND     Disadvantaged     30
N.IRELAND    Advantaged        23
N.IRELAND    Disadvantaged     40
TOTAL                         398

MCS Target Sample, Sweep 1

The observed mean cluster size across the UK is 47, but the range is from 7 to 403.

We can generate a measure of the main respondent’s perceptions of her neighbourhood from a set of five items about vandalism, pollution etc.

This measure can vary from 0 to 15 and, although skewed to the right, will be treated as having a Normal distribution.

Example from Sweep 1 of MCS:

We will attempt to explain the variation in this measure, initially in terms of individual characteristics, using a multiple regression model:

Mother’s age

Number of children

Lone parent status

Receiving benefits

Ethnic group (8 categories)


Figure 1: Within, between, and total regressions.

(Snijders, T and Bosker, R (1999), Multilevel Analysis. London: Sage Publications)

                        Estimate    s.e.
Mother’s age             0.059      0.004
No. of children         -0.16       0.023
Lone parent status      -0.44       0.067
On benefits             -0.80       0.054
Ethnic group:
  Mixed                 -0.41       0.22
  Indian                 0.44       0.15
  Pakistani              0.99       0.12
  Bangladeshi            1.0        0.18
  Black Caribbean        0.25       0.19
  Black African         -0.092      0.17
  Other                  0.28       0.16

R² = 0.18

The model also includes dummies for stratum to allow for the unequal probabilities of selection.

Table 1: Multiple Regression Estimates

The multiple regression model ignores ‘ward’ and we would expect variation between wards for measures of neighbourhoods.

We first fit a simple two level model, including just a random intercept, to estimate the variation between wards (level-two variance) and compare it with the variation within wards (level-one variance). We can represent the relative strengths of the two sources of variation by the intra-cluster correlation. The estimate is 0.26, which is substantial, and there is therefore a prima facie case for including ‘ward’ in any model.
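In symbols, the random intercept model and the intra-cluster correlation can be sketched as follows. The notation is assumed here and does not appear in the slides: y_ij is the neighbourhood score of respondent i in ward j, u_j is the ward effect and e_ij is the individual residual.

```latex
y_{ij} = \beta_0 + u_j + e_{ij},
\qquad u_j \sim N(0, \sigma_u^2),
\quad e_{ij} \sim N(0, \sigma_e^2)

\rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
```

On this reading, an estimate of 0.26 means that about a quarter of the total variance in the score lies between wards.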

                         Estimate (MR)   s.e.     Estimate (MLM)   s.e.
Mother’s age              0.059          0.004     0.056           0.004
No. of children          -0.16           0.023    -0.17            0.021
Lone parent status       -0.44           0.067    -0.36            0.063
On benefits              -0.80           0.054    -0.72            0.051
Ethnic group:
  Mixed                  -0.41           0.22      0.10            0.21
  Indian                  0.44           0.15      0.64            0.15
  Pakistani               0.99           0.12      1.1             0.13
  Bangladeshi             1.0            0.18      1.2             0.19
  Black Caribbean         0.25           0.19      1.0             0.19
  Black African          -0.092          0.17      0.97            0.17
  Other                   0.28           0.16      0.83            0.16
Between ward variance     n.a.                     1.3             0.11
Within ward variance      8.5            0.090     7.3             0.078

Table 2: Comparing Estimates from a Multiple Regression and a Two Level Model
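As a rough check, the ratio of between ward variance to total variance can also be computed for the two level model with covariates, using the rounded variance estimates in Table 2. The function name is illustrative. The result is a residual intra-cluster correlation, conditional on the individual characteristics, so it should not be expected to match the 0.26 from the intercept-only model.

```python
def intra_cluster_correlation(between_var: float, within_var: float) -> float:
    """Share of the total (residual) variance that lies between clusters."""
    return between_var / (between_var + within_var)


# Rounded variance estimates from the MLM columns of Table 2.
residual_icc = intra_cluster_correlation(between_var=1.3, within_var=7.3)
print(round(residual_icc, 2))  # 0.15
```

Some of the ward-to-ward variation is therefore accounted for by differences in the composition of wards, but a sizeable share remains.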

We have one external measure at the ward level – the Child Poverty Index (part of the Index of Multiple Deprivation or IMD2000).

Does this explain variation between wards in neighbourhood satisfaction?

Table 3: Comparing Estimates from Two Level Models without and with CPI

                         Estimate (MLM)  s.e.     Estimate (+CPI)  s.e.
Mother’s age              0.056          0.004     0.055           0.004
No. of children          -0.17           0.021    -0.17            0.021
Lone parent status       -0.36           0.063    -0.35            0.063
On benefits              -0.72           0.051    -0.71            0.051
Ethnic group:
  Mixed                   0.10           0.21      0.09            0.21
  Indian                  0.64           0.15      0.61            0.15
  Pakistani               1.1            0.13      1.1             0.13
  Bangladeshi             1.2            0.19      1.2             0.19
  Black Caribbean         1.0            0.19      0.98            0.19
  Black African           0.97           0.17      0.95            0.17
  Other                   0.83           0.16      0.81            0.16
CPI                                               -0.044           0.0054
Between ward variance     1.3            0.11      1.1             0.092
Within ward variance      7.3            0.078     7.3             0.078
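One way to read Table 3 is to ask how much of the between ward variance disappears once CPI is added. A minimal sketch, using the rounded estimates from the table (the function name is illustrative, and the rounding makes the results approximate):

```python
def proportion_variance_explained(var_before: float, var_after: float) -> float:
    """Proportional reduction in a variance component after adding a predictor."""
    return (var_before - var_after) / var_before


# Between ward variance without and with CPI (Table 3, rounded).
explained = proportion_variance_explained(var_before=1.3, var_after=1.1)

# Ratio of the CPI estimate to its standard error in the two level model.
cpi_ratio = -0.044 / 0.0054

print(round(explained, 2))  # 0.15
print(round(cpi_ratio, 1))  # -8.1
```

On these figures CPI is clearly related to the neighbourhood score, yet it accounts for only around 15% of the variation between wards.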

If CPI is included in the single level multiple regression model, then the estimate is -0.041, with a much lower standard error of 0.0021.
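The size of this difference can be made concrete by comparing the two standard errors directly. CPI is measured at the ward level, so the single level model, which treats respondents from the same ward as independent observations, can be expected to overstate the precision of its coefficient. The variable names below are illustrative.

```python
# Standard errors of the CPI coefficient reported above.
se_multilevel = 0.0054    # two level model (Table 3)
se_single_level = 0.0021  # single level multiple regression

se_ratio = se_multilevel / se_single_level
variance_ratio = se_ratio ** 2  # ratio of the sampling variances

print(round(se_ratio, 1))        # 2.6
print(round(variance_ratio, 1))  # 6.6
```

The multilevel standard error is about two and a half times the single level one.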

Why did the estimated coefficients for some of the ethnic groups change so much when we moved from multiple regression (where the estimate is a function of within and between group estimates) to a multilevel model (where the within and between regressions are assumed to be the same)?
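For a single explanatory variable x, the relation between the three regressions in Figure 1 is usually written as a weighted average, following Snijders and Bosker (1999). The symbols are assumptions of this sketch: η_x² denotes the proportion of the variance of x that lies between wards. With several explanatory variables the relation is more involved, so this is only a guide to intuition.

```latex
\beta_{\text{total}}
  = \eta_x^2 \, \beta_{\text{between}}
  + \left(1 - \eta_x^2\right) \beta_{\text{within}}
```

When an explanatory variable is heavily concentrated in particular wards, as ethnic group may well be, η_x² is large and the single level estimate leans towards the between ward regression.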

Perhaps the ethnic group estimates vary from ward to ward. In other words, perhaps there are random slopes. It would be a little difficult to allow each ethnic group to have its own random slope so instead let us look at a white/non white split.
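A random slopes model of this kind can be sketched as follows. The notation is assumed rather than taken from the slides: x_ij equals 1 for a non-white respondent and 0 otherwise, u_0j and u_1j are the ward-specific departures from the average intercept and slope, and the other explanatory variables are omitted for brevity.

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_{0j} + u_{1j} x_{ij} + e_{ij}

\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
  \sim N\!\left(
    \begin{pmatrix} 0 \\ 0 \end{pmatrix},
    \begin{pmatrix}
      \sigma_{u0}^2 & \sigma_{u01} \\
      \sigma_{u01}  & \sigma_{u1}^2
    \end{pmatrix}
  \right),
\qquad e_{ij} \sim N(0, \sigma_e^2)
```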

Table 4: Random Slopes Model

                                     Estimate (MLM)   s.e.
Non white                             0.75            0.10
95% coverage interval                -0.39 to 1.9
Between ward variance (intercept)     1.3             0.24
Between ward variance (slope)         0.33            0.15
Correlation: intercept & slope       -0.45
Within ward variance                  7.3             0.078

There are good reasons to suppose that some of the ward variation in both intercept and slope can be explained by the proportion of white respondents in the ward.
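The 95% coverage interval in Table 4 describes the range within which about 95% of the ward-specific non white effects are expected to lie. A minimal sketch of the usual calculation, assuming Normally distributed slopes and using the rounded estimates from the table (the function name is illustrative):

```python
import math


def coverage_interval(mean_slope: float, slope_variance: float, z: float = 1.96):
    """Approximate 95% range of cluster-specific slopes (assumes Normality)."""
    half_width = z * math.sqrt(slope_variance)
    return mean_slope - half_width, mean_slope + half_width


lower, upper = coverage_interval(mean_slope=0.75, slope_variance=0.33)
print(round(lower, 2), round(upper, 2))  # -0.38 1.88
```

This is close to, though not identical with, the interval of -0.39 to 1.9 in the table, presumably because the table entries are rounded.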

Table 5: Random Slopes Model with Proportion White

                                     Estimate (MLM)   s.e.
Non white                             1.5             0.24
Proportion white                      2.4             0.69
Non white*proportion white           -1.0             0.34
Between ward variance (intercept)     1.3             0.23
Between ward variance (slope)         0.26            0.14
Correlation: intercept & slope       -0.51
Within ward variance                  7.3             0.078
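The interaction in Table 5 implies that the average difference between non white and white respondents depends on the composition of the ward. The sketch below evaluates that difference at a few hypothetical ward compositions, assuming that the proportion white is coded on a 0 to 1 scale (the slides do not state the coding) and using the rounded estimates from the table.

```python
# Rounded fixed-effect estimates from Table 5.
NON_WHITE = 1.5
NON_WHITE_BY_PROPORTION_WHITE = -1.0


def non_white_difference(proportion_white: float) -> float:
    """Average non white minus white difference at a given ward composition."""
    return NON_WHITE + NON_WHITE_BY_PROPORTION_WHITE * proportion_white


# Hypothetical ward compositions, chosen only for illustration.
for p in (0.2, 0.5, 0.9):
    print(p, round(non_white_difference(p), 2))
```

The estimated difference shrinks as the proportion of white respondents in the ward rises.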

Conclusions

1. Multilevel modelling, carefully used, can throw light on complex processes and give us a better understanding of within and between group relations.

2. Our results show that there are differences between white and non white respondents in their perceptions of their neighbourhood. However, the differences between the two groups are more marked in wards with a low proportion of white respondents.
