www.ioe.ac.uk/bedfordgroup
Analysing Variability Between Neighbourhoods By Exploiting
Survey Design Features
Paper for the ‘What is Multilevel Modelling?’ session, Research Methods Festival, Oxford, 2 July 2004.
Ian Plewis
Institute of Education, University of London
Sample surveys with a clustered design tend to be more efficient than surveys using simple random samples.
Clustering does, however, introduce complexities in the analysis because cases within a cluster are more similar, on average, than cases in different clusters. The degree of similarity is represented by the intra-cluster (or intra-class) correlation. We can adjust standard errors to allow for clustering within a number of statistical packages.
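To illustrate how the intra-cluster correlation feeds into standard errors, here is a minimal sketch of the widely used approximation: design effect = 1 + (m - 1) * rho, where m is the mean cluster size and rho the intra-cluster correlation. The function names and the input values are hypothetical and chosen only for illustration. The approximation assumes roughly equal cluster sizes, and statistical packages generally use more general variance estimators.

```python
import math


def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Approximate design effect for a clustered sample: 1 + (m - 1) * rho."""
    if mean_cluster_size < 1:
        raise ValueError("mean_cluster_size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return 1.0 + (mean_cluster_size - 1.0) * icc


def inflate_standard_error(srs_se: float, mean_cluster_size: float, icc: float) -> float:
    """Scale a simple-random-sample standard error by the square root of the design effect."""
    return srs_se * math.sqrt(design_effect(mean_cluster_size, icc))


# Hypothetical inputs: 20 cases per cluster, intra-cluster correlation 0.05.
deff = design_effect(mean_cluster_size=20, icc=0.05)
print(f"design effect: {deff:.2f}")
print(f"adjusted s.e.: {inflate_standard_error(0.10, 20, 0.05):.3f}")
```

Even a small intra-cluster correlation can inflate standard errors noticeably when clusters are large, which is why ignoring the clustering tends to overstate precision.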
However, the clustering might be informative in the sense that the clusters represent neighbourhoods (or institutions) that could exert an independent or contextual effect on a social or developmental process. In other words, clustering is not necessarily a statistical nuisance. Rather, it can be exploited to throw more light on social processes.
The MCS Population
The Millennium Cohort Study population is a population of children defined as:
all children born between 1 September 2000 and 31 August 2001 (for England and Wales), and between 23 November 2000 and 11 January 2002 (for Scotland and Northern Ireland), alive and living in the UK at age nine months, and eligible to receive Child Benefit at that age;
and, after nine months:
for as long as they remain living in the UK at the time of sampling.
MCS Target Sample, Sweep 1
All children living in the selected wards:

Country      Stratum          Wards
England      Advantaged         110
England      Disadvantaged       71
England      Ethnic              19
Wales        Advantaged          23
Wales        Disadvantaged       50
Scotland     Advantaged          32
Scotland     Disadvantaged       30
N. Ireland   Advantaged          23
N. Ireland   Disadvantaged       40
Total                           398
The observed mean cluster size across the UK is 47, but the range is from 7 to 403.
Example from Sweep 1 of MCS:
We can generate a measure of the main respondent’s perceptions of her neighbourhood from a set of five items about vandalism, pollution etc.
This measure can vary from 0 to 15 and, although skewed to the right, will be treated as having a Normal distribution.
We will attempt to explain the variation in this measure initially in terms of individual characteristics using a multiple regression model:
Mother’s age
Number of children
Lone parent status
Receiving benefits
Ethnic group (8 categories)
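One way to write this model down is sketched below. The symbols are assumptions introduced here for illustration: y_i is the neighbourhood measure for respondent i, the eight-category ethnic group variable enters as seven dummy variables against a reference category, and e_i is the residual. Any further terms in the fitted model are omitted.

```latex
y_i = \beta_0
    + \beta_1\,\mathrm{age}_i
    + \beta_2\,\mathrm{children}_i
    + \beta_3\,\mathrm{lone}_i
    + \beta_4\,\mathrm{benefits}_i
    + \sum_{k=1}^{7} \gamma_k\,\mathrm{ethnic}_{ki}
    + e_i,
\qquad e_i \sim N(0, \sigma_e^2)
```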
Figure 1: Within, between, and total regressions.
(Snijders, T and Bosker, R (1999), Multilevel Analysis. London: Sage Publications)
Table 1: Multiple Regression Estimates

                        Estimate    s.e.
Mother’s age              0.059     0.004
No. of children          -0.16      0.023
Lone parent status       -0.44      0.067
On benefits              -0.80      0.054
Ethnic group:
  Mixed                  -0.41      0.22
  Indian                  0.44      0.15
  Pakistani               0.99      0.12
  Bangladeshi             1.0       0.18
  Black Caribbean         0.25      0.19
  Black African          -0.092     0.17
  Other                   0.28      0.16

The model also includes dummies for stratum to allow for the unequal probabilities of selection.
R2 = 0.18
The multiple regression model ignores ‘ward’ and we would expect variation between wards for measures of neighbourhoods.
We first fit a simple two level model, just including a random intercept, to estimate variation between wards (level-two variance) and compare it with variation within wards (level-one variance). We can represent the relative strengths of the two sources of variation by the intra-cluster correlation. The estimate is 0.26, which is substantial, and there is, therefore, a prima facie case for including ‘ward’ in any model.
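The random intercept model and the intra-cluster correlation can be written as follows. The notation is an assumption made here for illustration: y_ij is the measure for respondent i in ward j, u_j is the ward-level random intercept, and e_ij is the respondent-level residual.

```latex
y_{ij} = \beta_0 + u_j + e_{ij},
\qquad u_j \sim N(0, \sigma_u^2),
\qquad e_{ij} \sim N(0, \sigma_e^2)

\rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
```

Here \sigma_u^2 is the level-two (between ward) variance and \sigma_e^2 the level-one (within ward) variance, so \rho is the share of the total variance that lies between wards.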
Table 2: Comparing Estimates from a Multiple Regression and a Two Level Model

                        Estimate (MR)   s.e.     Estimate (MLM)   s.e.
Mother’s age               0.059        0.004       0.056         0.004
No. of children           -0.16         0.023      -0.17          0.021
Lone parent status        -0.44         0.067      -0.36          0.063
On benefits               -0.80         0.054      -0.72          0.051
Ethnic group:
  Mixed                   -0.41         0.22        0.10          0.21
  Indian                   0.44         0.15        0.64          0.15
  Pakistani                0.99         0.12        1.1           0.13
  Bangladeshi              1.0          0.18        1.2           0.19
  Black Caribbean          0.25         0.19        1.0           0.19
  Black African           -0.092        0.17        0.97          0.17
  Other                    0.28         0.16        0.83          0.16
Between ward variance      n.a.                     1.3           0.11
Within ward variance       8.5          0.090       7.3           0.078
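The variance components in Table 2 can be turned into a residual intra-cluster correlation, that is, the share of unexplained variance lying between wards once the individual characteristics are in the model. The sketch below uses the rounded MLM estimates from the table; the function name is hypothetical.

```python
def intra_cluster_correlation(between_var: float, within_var: float) -> float:
    """Share of the total variance that lies between clusters."""
    total = between_var + within_var
    if total <= 0:
        raise ValueError("total variance must be positive")
    return between_var / total


# Rounded MLM estimates from Table 2: between ward 1.3, within ward 7.3.
rho_residual = intra_cluster_correlation(between_var=1.3, within_var=7.3)
print(f"residual intra-cluster correlation: {rho_residual:.2f}")
```

This gives roughly 0.15, lower than the 0.26 from the random intercept only model, which suggests that part of the variation between wards reflects differences in the composition of their residents.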
We have one external measure at the ward level – the Child Poverty Index (part of the Index of Multiple Deprivation or IMD2000).
Does this explain variation between wards in neighbourhood satisfaction?
Table 3: Comparing Estimates from Two Level Models without and with CPI

                        Estimate (MLM)   s.e.     Estimate (+CPI)   s.e.
Mother’s age               0.056         0.004       0.055          0.004
No. of children           -0.17          0.021      -0.17           0.021
Lone parent status        -0.36          0.063      -0.35           0.063
On benefits               -0.72          0.051      -0.71           0.051
Ethnic group:
  Mixed                    0.10          0.21        0.09           0.21
  Indian                   0.64          0.15        0.61           0.15
  Pakistani                1.1           0.13        1.1            0.13
  Bangladeshi              1.2           0.19        1.2            0.19
  Black Caribbean          1.0           0.19        0.98           0.19
  Black African            0.97          0.17        0.95           0.17
  Other                    0.83          0.16        0.81           0.16
CPI                                                 -0.044          0.0054
Between ward variance      1.3           0.11        1.1            0.092
Within ward variance       7.3           0.078       7.3            0.078
If CPI is included in the single level multiple regression model, then the estimate is -0.041, with a much lower standard error of 0.0021.
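Two quantities implied by these estimates can be computed directly: the proportion of between-ward variance accounted for by CPI in Table 3, and the ratio of the two standard errors for the CPI coefficient. This is a minimal sketch using the rounded published values; the function name is hypothetical.

```python
def proportion_variance_explained(var_without: float, var_with: float) -> float:
    """Proportional reduction in a variance component after adding a predictor."""
    if var_without <= 0:
        raise ValueError("var_without must be positive")
    return (var_without - var_with) / var_without


# Between ward variance without and with CPI (Table 3).
explained = proportion_variance_explained(var_without=1.3, var_with=1.1)

# Standard error of the CPI coefficient: two level model vs single level model.
se_ratio = 0.0054 / 0.0021

print(f"between-ward variance explained by CPI: {explained:.0%}")
print(f"ratio of standard errors (MLM / MR): {se_ratio:.1f}")
```

On these rounded figures, CPI accounts for about 15% of the between-ward variance, and the two level standard error is about 2.6 times the single level one. The single level model treats a ward-level variable as if it carried independent information for every respondent, which is the usual reason its standard error looks too small.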
Why did the estimated coefficients for some of the ethnic groups change so much when we move from multiple regression (where the estimate is a function of within and between group estimates) to a multilevel model (where the within and between regressions are assumed to be the same)?
Perhaps the ethnic group estimates vary from ward to ward. In other words, perhaps there are random slopes. It would be a little difficult to allow each ethnic group to have its own random slope so instead let us look at a white/non white split.
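A random slopes model of this kind can be sketched as follows. The notation is an assumption introduced here: x_ij is 1 for a non white respondent and 0 otherwise, u_0j and u_1j are the ward-level departures in intercept and slope, and any further covariates are omitted for brevity.

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_{0j} + u_{1j} x_{ij} + e_{ij},
\qquad e_{ij} \sim N(0, \sigma_e^2)

\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
\sim N\!\left(
  \begin{pmatrix} 0 \\ 0 \end{pmatrix},
  \begin{pmatrix} \sigma_{u0}^2 & \sigma_{u01} \\ \sigma_{u01} & \sigma_{u1}^2 \end{pmatrix}
\right)
```

The slope for ward j is \beta_1 + u_{1j}, so \sigma_{u1}^2 measures how much the white/non white difference varies from ward to ward, and \sigma_{u01} captures the association between a ward's intercept and its slope.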
Table 4: Random Slopes Model

                                     Estimate (MLM)   s.e.
Non white                               0.75          0.10
  95% coverage interval                -0.39 to 1.9
Between ward variance (intercept)       1.3           0.24
Between ward variance (slope)           0.33          0.15
Correlation: intercept & slope         -0.45
Within ward variance                    7.3           0.078

There are good reasons to suppose that some of the ward variation in both intercept and slope can be explained by the proportion of white respondents in the ward.
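The 95% coverage interval in Table 4 describes the range within which the ward-specific slopes are expected to fall, assuming they are Normally distributed around the mean slope. A minimal sketch of that calculation, using the rounded estimates from the table and a hypothetical function name:

```python
import math


def coverage_interval(mean_slope: float, slope_variance: float, z: float = 1.96):
    """Range expected to contain about 95% of cluster-specific slopes (Normal assumption)."""
    if slope_variance < 0:
        raise ValueError("slope_variance must be non-negative")
    half_width = z * math.sqrt(slope_variance)
    return mean_slope - half_width, mean_slope + half_width


# Rounded estimates from Table 4: mean slope 0.75, between ward slope variance 0.33.
low, high = coverage_interval(mean_slope=0.75, slope_variance=0.33)
print(f"95% coverage interval: {low:.2f} to {high:.2f}")
```

With the rounded inputs the result is close to the reported -0.39 to 1.9; the small difference presumably reflects rounding in the published estimates. The interval spans zero, so in some wards the difference between non white and white respondents is expected to be reversed.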
Table 5: Random Slopes Model with Proportion White

                                     Estimate (MLM)   s.e.
Non white                               1.5           0.24
Proportion white                        2.4           0.69
Non white*proportion white             -1.0           0.34
Between ward variance (intercept)       1.3           0.23
Between ward variance (slope)           0.26          0.14
Correlation: intercept & slope         -0.51
Within ward variance                    7.3           0.078
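The interaction in Table 5 implies that the average difference between non white and white respondents depends on the ward's composition. The sketch below evaluates that difference at a few illustrative values, assuming that proportion white is entered as an uncentred proportion between 0 and 1 (the slides do not say how it was scaled). The function name and the chosen proportions are hypothetical.

```python
def non_white_difference(proportion_white: float,
                         main_effect: float = 1.5,
                         interaction: float = -1.0) -> float:
    """Expected non white minus white difference at a given ward composition."""
    if not 0.0 <= proportion_white <= 1.0:
        raise ValueError("proportion_white must lie between 0 and 1")
    return main_effect + interaction * proportion_white


# Illustrative ward compositions, not values taken from the data.
for p in (0.2, 0.5, 0.9):
    print(f"proportion white {p:.1f}: difference {non_white_difference(p):.2f}")
```

Under that assumption the difference shrinks from about 1.3 in a ward where 20% of respondents are white to about 0.6 where 90% are white.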
Conclusions
1. Multilevel modelling, carefully used, can throw light on complex processes and give us a better understanding of within and between group relations.
2. Our results show that there are differences between white and non white respondents in their perceptions of their neighbourhood. However, the differences between the two groups are more marked in wards with a low proportion of white respondents.