Identifying the cash-rich and the cash-poor: Lessons from the Census Rehearsal Dr Paul Williamson Department of Geography ESRC Census Development Programme




Identifying the cash-rich and the cash-poor:

Lessons from the Census Rehearsal

Dr Paul Williamson

Department of Geography

ESRC Census Development Programme


• Most requested addition to 2001 Census

INCOME…


The 2001 Census Geography of income:


Other sources of data on income

• Benefits data

• Government surveys (e.g. GHS, LFS, FES, FRS, NES)

• Commercially-held data [postcode sector and postcode unit estimates]

• The Census Rehearsal (1999)


Objectives

Evaluation of:

• Extant methods for small-area income estimation

• New approaches

• Utility of non-census information (e.g. council tax; house price; benefits data)

[ • Methods of imputing income band means ]


Definition of ‘income’

• Income ≠ Wealth

• Gross or net income?

• Pre or post housing costs?

• Adult or Household?

• Household?
  – Total
  – Equivalised [Per capita / OECD / McClements]
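To make the total/equivalised distinction concrete, here is a minimal sketch of two of the bracketed scales. It assumes the modified OECD scale: 1.0 for the first person aged 14+, 0.5 for each further person aged 14+, and 0.3 for each child under 14. The slide does not say which OECD variant was intended, and the McClements scale is not reproduced. The function names are hypothetical.

```python
def per_capita(total_income: float, ages: list) -> float:
    """Per-capita equivalisation: total household income / household size."""
    return total_income / len(ages)


def modified_oecd_weight(ages: list) -> float:
    """Modified OECD scale (assumed variant): 1.0 for the first person aged 14+,
    0.5 for each further person aged 14+, 0.3 for each child under 14."""
    adults = sum(1 for age in ages if age >= 14)
    children = len(ages) - adults
    if adults == 0:
        raise ValueError("household needs at least one person aged 14+")
    return 1.0 + 0.5 * (adults - 1) + 0.3 * children


def oecd_equivalised(total_income: float, ages: list) -> float:
    """Household income divided by the modified OECD equivalence weight."""
    return total_income / modified_oecd_weight(ages)


# Hypothetical household: two adults and two children, £600/week in total
household = [40, 38, 10, 6]
print(per_capita(600, household))        # 150.0
print(oecd_equivalised(600, household))  # about 285.71 (weight 2.1)
```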


Surrogates

• Univariate
  – % unemployed
  – % 2+ car households
  – % residents in Social Classes I + II
  – % owner-occupation


• Multivariate (deprivation indices)
  – Carstairs
  – Townsend
  – Breadline
  – DTLR Index of Multiple Deprivation 2000
  – Green (Wealth) [owning 2+ cars; NS-SEC I or II; high qualifications]


• Geodemographic
  – SuperProfiles
  – MOSAIC
  – GB Profiles


• Model
  Individual income
  – Dale (SOC2000; economic activity; age; sex; region)
  – Lee (SOC2000; economic activity)
  – Regression (individual and/or ecological)
  Household income
  – Regression (household and/or ecological)
  – Bramley & Smart (household composition; earners; tenure; area-level deprivation)
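Among the multivariate surrogates listed above, the Townsend index is the simplest to reproduce. The minimal sketch below uses the four variables given in the definitions at the end of the deck. Two choices are assumptions rather than anything stated on the slides: the ln(x + 1) transform of unemployment and overcrowding, which follows the usual published formulation, and the use of the population standard deviation. The three areas are hypothetical.

```python
import math
from statistics import mean, pstdev


def zscores(values):
    """Standardise area values to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def townsend(unemployed, overcrowded, no_car, not_owner_occ):
    """Townsend score per area from four percentages (one list per variable).

    Unemployment and overcrowding are log-transformed as ln(x + 1) before
    standardising (assumed, following the usual formulation)."""
    components = [
        zscores([math.log(x + 1) for x in unemployed]),
        zscores([math.log(x + 1) for x in overcrowded]),
        zscores(no_car),
        zscores(not_owner_occ),
    ]
    # Sum the four z-scores area by area; a higher score means more deprived
    return [sum(area) for area in zip(*components)]


# Three hypothetical areas (percentages)
scores = townsend(
    unemployed=[4.0, 9.0, 15.0],
    overcrowded=[1.0, 3.0, 6.0],
    no_car=[15.0, 35.0, 60.0],
    not_owner_occ=[10.0, 40.0, 70.0],
)
print([round(s, 2) for s in scores])
```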


The 1999 Census Rehearsal

Key features
• Full census questionnaire + INCOME
• Large achieved sample
  – c. 65,000 households
  – c. 140,000 individuals
• Spatially contiguous


Clustered sampling strategy:
  – 7 part districts [excluding NI]
  – 38 wards
  – 650 EDs


Potential problems

• Non-response rate
  – overall (~50%)
  – income (~15%)
  – other variables (5-20%)
  – full responses for ~55% of achieved sample [individuals and households]

• Non-response bias


Income band (£ per week)   All (%)   No data missing (%)
£0                          20.8      16.9
<£60                        13.2      11.0
£60-£119                    20.5      18.7
£120-£199                   15.5      16.1
£200-£299                   13.3      15.9
£300-£479                   11.3      14.6
£480+                        5.5       6.8
Total                      100.0     100.0
N                         125138     67283


Social Class (1991)   All (%)   No data missing (%)
None                   28.2      25.4
I                       4.0       4.9
II                     18.9      21.4
III(N)                 17.5      19.3
III(M)                 11.5      10.8
IV                     14.7      13.7
V                       5.0       4.3
Army                    0.2       0.2
Total                 100.0     100.0
N                    117010     67283


Indicators (calculated for 1991 enumeration districts)   Correlation coefficient, Rehearsal sub-set
                                                         Original   Ideal
Townsend index                                           0.82       0.79
% households with:
  No car                                                 0.89       0.86
  2+ cars                                                0.87       0.83
% households:
  Owner-occupied                                         0.90       0.87
  Social rented                                          0.94       0.92
  Detached                                               0.98       0.97
  Flats                                                  0.92       0.92
% of economically active:
  Unemployed                                             0.57       0.55
  Social Class I+II                                      0.58       0.56


• Banding of income question

What is your total current gross income from all sources?

Per week          or   Per year (approximately)
Nil                    Nil
Less than £60          Less than £3,000
£60 to £119            £3,000 to £5,999
£120 to £199           £6,000 to £9,999
£200 to £299           £10,000 to £14,999
£300 to £479           £15,000 to £24,999
£480 or more           £25,000 or more
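The sketch below maps a reported amount to one of the seven bands on the form. The annual column is only roughly 52 times the weekly one (for example, £60 × 52 = £3,120), so an income close to a boundary can land in different bands depending on which column the respondent ticks. The lists and the function name are hypothetical; the thresholds are copied from the question.

```python
from bisect import bisect_right

# Lower bound of each non-nil band on the Rehearsal form
WEEKLY_LOWER = [0, 60, 120, 200, 300, 480]            # £ per week
ANNUAL_LOWER = [0, 3000, 6000, 10000, 15000, 25000]   # £ per year (approximately)
LABELS = ["Nil", "<£60", "£60-£119", "£120-£199", "£200-£299", "£300-£479", "£480+"]


def income_band(amount: float, per: str = "week") -> int:
    """Band index 0 (Nil) to 6 (open top band) for income reported per week or per year."""
    if amount < 0:
        raise ValueError("income cannot be negative")
    if amount == 0:
        return 0
    bounds = {"week": WEEKLY_LOWER, "year": ANNUAL_LOWER}[per]
    # bisect_right counts the lower bounds <= amount, which equals the band index
    return bisect_right(bounds, amount)


print(LABELS[income_band(250)])                  # £200-£299
print(LABELS[income_band(26000, per="year")])    # £480+
# £59 a week falls in band 1, but £59 x 52 = £3,068 a year falls in band 2
print(income_band(59), income_band(59 * 52, per="year"))
```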


– Only 10% of adults in top band

– but the problem is compounded when individual incomes are aggregated to estimate household income

– band mid-point ≠ band mean

– value of band means area-sensitive?
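Aggregation compounds the problem because each adult's band has to be converted to a single value before the values are summed to a household total. The sketch below values each band either at its mid-point or at the FRS 1998/9 national band mean. The open top band has no mid-point, so whatever value is chosen for it (here the national mean) is counted once for every top-band adult in the household. Band definitions follow the FRS band-mean table; the function names are hypothetical.

```python
# Bands (£) as in the FRS band-mean table: 0, 1-60, 61-120, ..., 481+
BANDS = [(0, 0), (1, 60), (61, 120), (121, 200), (201, 300), (301, 480), (481, None)]
NATIONAL_MEAN = [0, 34, 91, 156, 245, 375, 765]  # FRS 1998/9 national band means


def midpoint(band: int) -> float:
    """Mid-point of a closed band; undefined for the open top band."""
    lo, hi = BANDS[band]
    if hi is None:
        raise ValueError("open top band has no mid-point")
    return (lo + hi) / 2


def household_income(adult_bands, value=lambda band: NATIONAL_MEAN[band]):
    """Sum one value per adult's income band (children assumed to contribute nothing)."""
    return sum(value(band) for band in adult_bands)


couple = [4, 5]                             # adults in 201-300 and 301-480
print(household_income(couple))             # 620 (245 + 375)
print(household_income(couple, midpoint))   # 641.0 (250.5 + 390.5)
print(household_income([6, 6]))             # 1530: two top-band adults at 765 each
```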


Income band mean (£), by Council Tax band

Income band (£)   National average   Council Tax Band A   Council Tax Band H
0                   0                  0                    0
1-60               34                 35                   27
61-120             91                 93                   86
121-200           156                155                  156
201-300           245                241                  242
301-480           375                364                  391
481+              765                652                 1353

Source: FRS 1998/9 (Crown Copyright)
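A sub-group mean can be applied as a lookup keyed by council tax band, with the national mean as the fallback where no sub-group value exists. The slide shows only bands A and H, so in this sketch every other band falls back to the national figure. A real application would tabulate all bands from the FRS. The dictionary layout and the function name are hypothetical.

```python
from typing import Optional

# Income band means (£) from the FRS 1998/9 table;
# band order: 0, 1-60, 61-120, 121-200, 201-300, 301-480, 481+
BAND_MEANS = {
    "national": [0, 34, 91, 156, 245, 375, 765],
    "A": [0, 35, 93, 155, 241, 364, 652],
    "H": [0, 27, 86, 156, 242, 391, 1353],
}


def band_mean(income_band: int, council_tax_band: Optional[str] = None) -> int:
    """Sub-group (council tax band) mean where tabulated, otherwise the national mean."""
    table = BAND_MEANS.get(council_tax_band or "national", BAND_MEANS["national"])
    return table[income_band]


print(band_mean(6, "H"))   # 1353
print(band_mean(6, "A"))   # 652
print(band_mean(6, "D"))   # 765: band D not tabulated here, national fallback
```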


Digression: modelling income band means

Alternative modelling strategies include:

• National mean

• Sub-group mean (e.g. by council tax band)

• Statistical distributions (log-normal; Pareto)

• New variant of log-normal approach with addition of modelled median etc.
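Two of the distributional options can be sketched for the open top band. In the Pareto version, the tail index α is fitted from the counts in the two highest bands. This assumes the tail above the lower threshold is Pareto, so that P(X > x) ∝ x^(-α). The mean above threshold L is then αL / (α - 1), provided α > 1. The log-normal version returns E[X | X > L] for given μ and σ of log income. How μ and σ (or the modelled median mentioned above) would be estimated is not shown, and neither function reproduces the new variant. The function names are hypothetical.

```python
import math


def pareto_top_band_mean(n_upper: float, n_top: float, lower: float, top: float) -> float:
    """Mean of the open top band [top, inf) assuming a Pareto tail above `lower`.

    n_upper: count (or share) at or above `lower` (last closed band + top band)
    n_top:   count (or share) in the top band alone
    """
    # Pareto survival S(x) = (x_m / x) ** alpha, so S(lower) / S(top) = (top / lower) ** alpha
    alpha = math.log(n_upper / n_top) / math.log(top / lower)
    if alpha <= 1:
        raise ValueError("Pareto mean is infinite for alpha <= 1")
    return alpha * top / (alpha - 1)


def _phi(z: float) -> float:
    """Standard normal CDF, via erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def lognormal_top_band_mean(mu: float, sigma: float, top: float) -> float:
    """E[X | X > top] when log(X) ~ Normal(mu, sigma^2)."""
    log_t = math.log(top)
    tail_prob = _phi((mu - log_t) / sigma)  # P(X > top)
    partial = math.exp(mu + sigma ** 2 / 2) * _phi((mu + sigma ** 2 - log_t) / sigma)
    return partial / tail_prob


# Illustration only: shares (%) from the Rehearsal income table,
# £300-£479 = 11.3 and £480+ = 5.5
print(round(pareto_top_band_mean(11.3 + 5.5, 5.5, lower=300, top=480)))
# mu and sigma are hypothetical, not fitted values
print(round(lognormal_top_band_mean(mu=5.3, sigma=0.7, top=480)))
```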


Results

• For all bands, sub-group mean best (if possible)

• For closed bands, national mean is next best

• For open (top) band, new proposed log-normal approach is best, particularly where there is evidence of strong spatial clustering


• MAUP
  – 1991 vs 1998/9 boundaries
  – zones with fewer than 10 households or 25 residents excluded from analysis

• SOC 2000 / NS-SEC
  – Lack of alternative SOC2000-coded data
  – Therefore have to use Census Rehearsal data
  – Use partitioned data to avoid unduly advantaging SOC2000-based approaches

• Spatial scale
  – At what scale does income vary most?
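The partitioning safeguard can be sketched as a split-sample design. Occupational mean incomes are estimated on one random half of the records and applied only to the other half, so a SOC2000-based surrogate is never scored on the people it was built from. The record layout, the 50/50 split, the seed and the function names are assumptions, not the procedure actually used in the analysis.

```python
import random
from collections import defaultdict


def split_half(records, seed=0):
    """Randomly partition records into a training half and a test half."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def occupation_means(train):
    """Mean income by SOC code, estimated on the training half only."""
    totals, counts = defaultdict(float), defaultdict(int)
    for r in train:
        totals[r["soc"]] += r["income"]
        counts[r["soc"]] += 1
    return {soc: totals[soc] / counts[soc] for soc in totals}


def impute(test, means, fallback):
    """Impute each test record's income from training-half occupational means."""
    return [means.get(r["soc"], fallback) for r in test]


# Hypothetical records: two occupations with fixed incomes
records = [{"id": i, "soc": "111" if i % 2 else "211",
            "income": 500.0 if i % 2 else 300.0} for i in range(100)]
train, test = split_half(records, seed=1)
means = occupation_means(train)
predicted = impute(test, means, fallback=sum(means.values()) / len(means))
```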


Results


Census Rehearsal Income Distribution

[Figure not reproduced. x-axis: annual gross income (£000s), bands Nil, <3, 3-5, 6-9, 10-14, 15-24, 25+; y-axis scale 0 to 30]


• At ward level the % of household reps. in top income band averaged 9.1%
  – but ranged from 2.8% to 21.6%

• 89% of EDs contained one or more household reps. in top income band
  – i.e. in top income decile of the population

Heterogeneity rules OK!


Income distribution of household representative (Person 1 on Census Rehearsal form)

[Figure not reproduced. Six panels: all EDs; EDs in the lowest, second, middle, fourth and top income quintiles. Each panel plots household representatives by income bracket (£000 p.a.: Nil, <3, 3-5, 6-9, 10-14, 15-24, 25+)]


Missing data

• Missing data have minimal impact on results
  – From ‘Raw’ to ‘Ideal’ data, most correlations change by <0.02
  – Very few values change by >0.05
  – Exception is NS-SEC 8 [by definition!]
  – Correlations lower for ‘Ideal’ than ‘Raw’

• Surrogates calculated directly from Rehearsal
  – circumvents data response bias?


Scale

• Higher correlations at higher geographies

• District effect small but significant
  – BUT none of the districts is in SE England

Overfitting
• No significant impact


MAUP

• Correlations vary by up to 0.1 between alternative boundaries at same spatial scale

BUT

• No detectable effect on rankings of surrogate income measures


Adult income (r²)

Surrogate type    Surrogate           Ward   ED     Postcode
Univariate        NS-SEC I+II         0.81   0.81   0.64
Multivariate      Townsend            0.36   0.46   0.38
                  Green (wealth)      0.57   0.55   0.50
Geodemographic    PCA_96              n/a    0.82   0.69
                  Voas                0.83   0.59   0.48
Model             Dale                0.91   0.89   0.90
                  Lee                 0.90   0.87   0.88
                  Voas (individual)   0.91   0.80   0.83

[See final slides for definitions of ‘surrogates’]
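The table does not define r² explicitly. A reasonable reading is the squared Pearson correlation, across areas, between each surrogate and observed mean income. Under that assumption, it can be computed as below (the function name and data are hypothetical). Squaring discards the sign, so a deprivation score such as Townsend, which falls as income rises, is compared on the same footing as an affluence measure.

```python
def r_squared(surrogate, observed):
    """Squared Pearson correlation across areas (assumed meaning of r² in the table)."""
    n = len(observed)
    mx = sum(surrogate) / n
    my = sum(observed) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(surrogate, observed))
    sxx = sum((x - mx) ** 2 for x in surrogate)
    syy = sum((y - my) ** 2 for y in observed)
    return sxy ** 2 / (sxx * syy)


# Hypothetical EDs: % NS-SEC I+II vs observed mean adult income (£/week)
pct_nssec12 = [12.0, 20.0, 31.0, 45.0]
mean_income = [150.0, 190.0, 240.0, 330.0]
print(round(r_squared(pct_nssec12, mean_income), 3))
```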


Caveats

• ‘Best’ performing surrogates in danger of over-fitting?
  – For Dale, Lee and Voas, mean occupational income calculated directly from Census Rehearsal dataset (no other SOC2000 sources available at time of analysis)

BUT
  – No significant difference if SOC minor or unit codes used
  – No significant difference if data partitioned


Household income (r²)

Surrogate type    Surrogate           Ward   ED     Postcode
Univariate        NS-SEC I+II         0.82   0.81   0.64
Multivariate      Townsend            0.48   0.46   0.44
                  Green (wealth)      0.61   0.50   0.56
Geodemographic    PCA_96              n/a    0.81   0.67
                  Voas                0.81   0.60   0.48
Model             Dale                0.90   0.85   0.86
                  Lee                 0.87   0.83   0.83
                  Voas (household)    0.76   0.74   0.74

[See final slides for definitions of ‘surrogates’]


Accuracy

• For many purposes relative accuracy (ranking), rather than absolute accuracy, is most important


a) NS-SEC based income surrogate [NSSEC12]

[Figure not reproduced. y-axis: % of economically active in NS-SEC 1+2 (0% to 100%); x-axis: observed mean individual income (£/week, 0 to 600)]


b) Regression based estimate [VOASIND]

[Figure not reproduced. y-axis: predicted mean individual income (£/week, 0 to 600); x-axis: observed mean individual income (£/week, 0 to 600)]


c) Sub-group mean based estimate [LEINCM]

[Figure not reproduced. y-axis: predicted mean individual income (£/week, 0 to 600); x-axis: observed mean individual income (£/week, 0 to 600)]


% ranked in same decile as income

Surrogate/Estimate                  Same decile (overall)   Within ± 1 decile
% NS-SEC 1+2 [NSSEC12]              36                      82
Individual regression [VOASIND]     42                      84
Sub-group mean [LEEINCM]            50                      89
Ecological regression               46                      92
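The decile comparison can be sketched as follows. Rank areas separately by the estimate and by observed income, cut each ranking into ten equal groups, and count the areas whose two deciles agree exactly or differ by at most one. The slides do not say how ties or unequal decile sizes were handled, so the rank-based cut used here is an assumption, and the function names are hypothetical.

```python
def deciles(values):
    """Decile 1-10 for each area by rank (lowest tenth = 1); ties broken by position."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    result = [0] * n
    for rank, i in enumerate(order):
        result[i] = rank * 10 // n + 1
    return result


def decile_agreement(estimate, observed):
    """% of areas in the same decile, and within +/-1 decile, on both rankings."""
    d_est, d_obs = deciles(estimate), deciles(observed)
    n = len(observed)
    same = sum(a == b for a, b in zip(d_est, d_obs))
    near = sum(abs(a - b) <= 1 for a, b in zip(d_est, d_obs))
    return 100 * same / n, 100 * near / n


# A perfect estimate puts every area in the same decile
print(decile_agreement(list(range(20)), list(range(20))))  # (100.0, 100.0)
```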


Other data sources

• < 1% of unexplained spatial variation in income attributable to area-level effects

• House price has no significant impact
  – could be due to data problems

• Council tax band has small but significant effect [for areas of enumeration district size and below]

• Lack of utility counter-intuitive?
  – current value ≠ purchase price
  – purchase income ≠ current income


Conclusions (I)

• Best approaches capture 80-90% of spatial variation in income, even for smallest spatial units

• But considerable within-area heterogeneity

• Best approaches are regression or sub-group mean based

• Conventional deprivation indices a poor second to % social class / NS-SEC I+II


Conclusions (II)

• Geodemographic classifications at best perform as well as % NS-SEC I+II, and perform best for areas of ward size and above

• Qualified support for use of statistical distributions in modelling top income band means


Implications

Moral for marketers:

• Target people, not places

Moral for policy makers:

• Deprivation indices not the best proxy for income

• ONS ward income estimates (based on ecological regression) likely to perform well


Longer term

• Consider external correlates (e.g. IMD 2000; benefits data)

• Lobby for Census Office to create small-area income estimates
  – by imputing income on Census microdata
  – include non-census information (?)


Acknowledgements

• House price data were taken from the Experian Limited Postal Sector Data, ESRC/JISC Agreement.

• Grateful thanks are due to the Census Custodians of England, Wales and Scotland for granting permission to access the Census Rehearsal dataset.

• A debt of gratitude is also owed to a number of people at the Office for National Statistics, in particular Keith Whitfield and Philip Clarke.

• Finally, thanks are due to David Voas for undertaking some of the preparatory work for this project.

• All analyses and conclusions remain my sole responsibility.


Definitions (I)

• NS-SEC I+II: % persons aged 16-74 in NS-SEC I or II

• Townsend: Multiple deprivation indicator based on % economically active unemployed; % overcrowded households; % households with no car; and % households not owner-occupied

• Green (Wealth): Affluence indicator based on % households with 2+ cars; % persons aged 16-74 in NS-SEC I; and % adults with high educational qualifications

• PCA_96: Geodemographic classification based on principal components analysis of 20 normalised census variables; individuals in each of 96 area types are assumed to have the mean income of all persons in that area type

• Voas: Alternative geodemographic classification in which five census variables are each split at the median and one variable into thirds, all cross-tabulated to give a total of 96 (2 × 2 × 2 × 2 × 2 × 3) discrete area types
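The Voas area types follow from 2^5 × 3 = 96: five median splits and one three-way split. The sketch below assigns areas to types. The slides do not name the six census variables, so the inputs are hypothetical, and sending values equal to the median to the lower half is also an assumption. As with PCA_96, each individual would then be given the mean income of their area type.

```python
from statistics import median


def rank_groups(values, groups):
    """Split areas into `groups` equal-sized rank groups (0 = lowest)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0] * n
    for rank, i in enumerate(order):
        out[i] = rank * groups // n
    return out


def voas_types(median_vars, thirds_var):
    """Area type 0-95: five variables split at the median, one split into thirds."""
    if len(median_vars) != 5:
        raise ValueError("expected five median-split variables")
    halves = []
    for var in median_vars:
        m = median(var)
        halves.append([1 if v > m else 0 for v in var])  # ties go to the lower half
    thirds = rank_groups(thirds_var, 3)
    types = []
    for area in range(len(thirds_var)):
        code = 0
        for h in halves:
            code = code * 2 + h[area]              # five binary digits: 0-31
        types.append(code * 3 + thirds[area])      # times three thirds: 0-95
    return types


# Six hypothetical areas, with the same values reused for every variable
values = [1, 2, 3, 4, 5, 6]
print(voas_types([values] * 5, values))  # [0, 0, 1, 94, 95, 95]
```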


Definitions (II)

• Dale: Income imputed as the mean income of the population sub-group defined by sex, SOC 2000 minor group, economic activity (missing; employed full-time; employed part-time; self-employed; other) and age (missing; 0-15; 16-19; 20-29; 30-49; 50+) [maximum of 4860 valid sub-groups]

• Lee: Income imputed as the mean income of the population sub-group defined by SOC 2000 minor group and economic activity (child; not applicable; employed full-time; employed part-time; self-employed; unemployed; retired; other inactive) [maximum of 649 valid sub-groups]
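Both Dale and Lee impute each person's income as the mean for their cell in a cross-classification. The minimal sketch below assumes each record is a dict holding the classifying fields and an income (None where missing); the field names, codes and function names are hypothetical. With the 81 SOC 2000 minor groups, 2 sexes × 81 × 5 activity states × 6 age groups gives 4,860 cells, matching the maximum quoted for Dale.

```python
from collections import defaultdict

DALE_KEYS = ("sex", "soc_minor", "econ_activity", "age_group")  # hypothetical field names
LEE_KEYS = ("soc_minor", "econ_activity")


def cell_means(records, keys):
    """Mean observed income for each cell of the cross-classification."""
    totals, counts = defaultdict(float), defaultdict(int)
    for r in records:
        if r["income"] is None:        # skip records with missing income
            continue
        cell = tuple(r[k] for k in keys)
        totals[cell] += r["income"]
        counts[cell] += 1
    return {cell: totals[cell] / counts[cell] for cell in totals}


def impute_income(record, means, keys):
    """Sub-group mean for the record's cell, or None if that cell is empty."""
    return means.get(tuple(record[k] for k in keys))


records = [
    {"sex": "F", "soc_minor": "111", "econ_activity": "FT", "age_group": "30-49", "income": 400.0},
    {"sex": "F", "soc_minor": "111", "econ_activity": "FT", "age_group": "30-49", "income": 600.0},
    {"sex": "M", "soc_minor": "111", "econ_activity": "FT", "age_group": "30-49", "income": None},
]
dale = cell_means(records, DALE_KEYS)
lee = cell_means(records, LEE_KEYS)
print(impute_income(records[2], dale, DALE_KEYS))  # None: no observed male cell
print(impute_income(records[2], lee, LEE_KEYS))    # 500.0
```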


Definitions (III)

• Voas (individual): Regression model for adult income (children assumed to have 0 income); INCOME^0.5 predicted given: mean income by SOC2000 unit group; mean income by industry category; age, age², residents, residents², rooms and cars; plus dummy variables for sex, white, full-time student, married, Single/Widowed/Divorced, Long-term ill, No qualifications, GCSE or equivalent, A levels or equivalent, Undergraduate degree or equivalent, employed full-time, employed part-time, self-employed, unemployed, retired, permanently sick, other economically inactive excluding pensioners and students, Semi-detached, terrace, flat, caravan, privately rented, social rented, employed manager or supervisor and district of residence

• Voas (household): Regression model for total household income; HHINC^0.5 predicted given the same set of predictors as for Voas (individual), but based only upon the head of household’s characteristics
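Both Voas models regress the square root of income on the predictors, and presumably square the prediction to return to pounds. The pure-Python sketch below uses two hypothetical predictors (age and a sex dummy) and fits ordinary least squares through the normal equations, which is suitable only for small illustrative problems. One caveat applies to squaring: because E[Y] = (E[√Y])² + Var(√Y), the squared prediction falls below the conditional mean by the residual variance. The slides do not say whether any correction was applied, and none is applied here. The function names are hypothetical.

```python
import math


def ols(X, y):
    """Least-squares coefficients b for y ~ X b via the normal equations.

    Each row of X must already include a leading 1 for the intercept."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def fit_sqrt_model(X, incomes):
    """Regress INCOME^0.5 on X."""
    return ols(X, [math.sqrt(v) for v in incomes])


def predict_income(b, x):
    """Square the predicted root income (negative roots truncated at zero)."""
    root = sum(bi * xi for bi, xi in zip(b, x))
    return max(root, 0.0) ** 2


# Hypothetical data: rows are [intercept, age, female]
X = [[1.0, 20, 0], [1.0, 30, 1], [1.0, 40, 0], [1.0, 50, 1], [1.0, 60, 0]]
incomes = [(10 + 0.5 * age - 2 * fem) ** 2 for _, age, fem in X]
b = fit_sqrt_model(X, incomes)
print([round(c, 3) for c in b])
```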