Methods of interpolating data to create long-run time series
Ian Gregory (University of Portsmouth)
&
Paul Ell (Queen’s University, Belfast)
Administrative Units in England and Wales from 1801
Figure: number of units at each level of the hierarchy (x-axis: 1801, 1841, 1881, 1921, 1961; y-axis: No. of Units)
– 50: Ancient Counties / Union/Registration Counties / Admin. Counties / S/M C
– 330: D’ricts
– 650: PL Unions/Registration Districts
– 1,500: Local Govt. Districts
– 15,000: Parishes/Wards
– 100,000: EDs
“Minor” changes:
– Registration Districts (1840-1910): 400
– Local Govt. Districts (1890s-1972): 4,000
– Parishes (1876-1972): 20,000
The Newport area, 1911
Creating a standard geography
• Areal Weighting:
– Assumption – Variable y is homogeneously distributed across the source zones
– Using this: $\hat{y}_t = \sum_s \frac{A_{st}}{A_s} y_s$
– BUT: Very unrealistic assumption.
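As an illustration of areal weighting, the sketch below gives each target zone a share of every source zone's value in proportion to the area of overlap between them. The function name, the dictionary layout, and the zone data are all hypothetical and are not taken from the original slides.

```python
def areal_weighting(y_source, area_source, area_overlap):
    """Estimate target-zone totals, assuming y is spread evenly within each source zone.

    y_source:     {source_id: value of y for that source zone}
    area_source:  {source_id: total area of that source zone}
    area_overlap: {(source_id, target_id): area of their intersection}
    """
    y_target = {}
    for (s, t), a_st in area_overlap.items():
        share = a_st / area_source[s]  # fraction of source zone s lying in target t
        y_target[t] = y_target.get(t, 0.0) + share * y_source[s]
    return y_target


# Hypothetical example: S1 lies wholly in T1; S2 is split between T1 and T2.
y_source = {"S1": 1000.0, "S2": 400.0}
area_source = {"S1": 10.0, "S2": 20.0}
area_overlap = {("S1", "T1"): 10.0, ("S2", "T1"): 5.0, ("S2", "T2"): 15.0}

estimates = areal_weighting(y_source, area_source, area_overlap)
print(estimates)
```

In this made-up case T1 receives all of S1 plus a quarter of S2, and T2 receives the remaining three quarters of S2, so the overall total is preserved.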
Other sources of information (1)
• 1. Dasymetric technique:
– There were 15,000 parishes as opposed to 600/1,500 districts
– Total population is available at this scale
– Assumptions:
• The distribution of y follows the distribution of the total population
• Parish-level population is homogeneously distributed
– Problem:
• Most districts in towns and cities consist of only one parish.
– In 1911, 30% of the population lived in districts that consisted of only one parish
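A minimal sketch of the dasymetric idea under the two assumptions listed above: each source district's value of y is first shared among its parishes in proportion to parish population, and each parish's share is then split between target zones by area. All names and figures are hypothetical; the slides do not give an implementation.

```python
def dasymetric(y_source, parish_pop, parish_source, parish_area, parish_overlap):
    """Allocate source-district y to targets using parish populations as a guide.

    y_source:       {source_id: value of y for the source district}
    parish_pop:     {parish_id: total population of the parish}
    parish_source:  {parish_id: source district containing the parish}
    parish_area:    {parish_id: area of the parish}
    parish_overlap: {(parish_id, target_id): area of parish inside the target}
    """
    # Total population of each source district, summed from its parishes.
    source_pop = {}
    for p, s in parish_source.items():
        source_pop[s] = source_pop.get(s, 0.0) + parish_pop[p]

    y_target = {}
    for (p, t), a_pt in parish_overlap.items():
        s = parish_source[p]
        # Assumption 1: y follows the distribution of total population.
        y_parish = y_source[s] * parish_pop[p] / source_pop[s]
        # Assumption 2: population is evenly spread within the parish.
        y_target[t] = y_target.get(t, 0.0) + y_parish * a_pt / parish_area[p]
    return y_target


# Hypothetical district S1 with a small dense parish and a large sparse one.
y_source = {"S1": 100.0}
parish_pop = {"P1": 900.0, "P2": 100.0}
parish_source = {"P1": "S1", "P2": "S1"}
parish_area = {"P1": 2.0, "P2": 8.0}
parish_overlap = {("P1", "T1"): 2.0, ("P2", "T1"): 4.0, ("P2", "T2"): 4.0}

estimates = dasymetric(y_source, parish_pop, parish_source, parish_area, parish_overlap)
print(estimates)
```

Plain areal weighting would give T2 40% of the district's value here (4 of 10 area units), whereas the parish populations suggest it should receive far less. When a district consists of a single parish the two methods coincide, which is the problem noted above.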
Other sources of information (2)
• 2. Data from target districts as ancillary information:
– Can provide information on the distribution of source zone data
– EM algorithm is used
– E.g.
• 1. Sub-divide target zones into rural and urban
• 2. Assume that rural and urban targets have the same population densities
• 3. Allocate y to targets using this assumption
• 4. Find the average population density of rural and urban target districts
• 5. Go back to stage three using the new population densities and repeat until the algorithm converges
– Can use y for the target districts or total population at parish level as ancillary information
– Relies on having relevant information for target districts
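The five steps above can be sketched as a simple iterative loop. This is an illustration under stated assumptions rather than the authors' implementation: the rural/urban labels, the convergence tolerance, the data, and the assumption that the listed overlaps cover each target zone completely are all hypothetical.

```python
def em_interpolate(y_source, area_overlap, target_class, max_iter=100, tol=1e-9):
    """Iteratively allocate source-zone y to targets using per-class densities.

    y_source:     {source_id: value of y}
    area_overlap: {(source_id, target_id): intersection area}
    target_class: {target_id: class label, e.g. "rural" or "urban"}
    """
    classes = sorted(set(target_class.values()))
    density = {c: 1.0 for c in classes}  # step 2: start with equal densities

    # Target areas, assuming the overlaps cover each target completely.
    target_area = {}
    for (s, t), a in area_overlap.items():
        target_area[t] = target_area.get(t, 0.0) + a

    y_target = {}
    for _ in range(max_iter):
        # Step 3: split each source total in proportion to density * area.
        weight_sum = {}
        for (s, t), a in area_overlap.items():
            weight_sum[s] = weight_sum.get(s, 0.0) + density[target_class[t]] * a
        y_target = {t: 0.0 for t in target_area}
        for (s, t), a in area_overlap.items():
            y_target[t] += y_source[s] * density[target_class[t]] * a / weight_sum[s]

        # Step 4: re-estimate the average density of each class of target.
        new_density = {}
        for c in classes:
            members = [t for t in target_area if target_class[t] == c]
            new_density[c] = (sum(y_target[t] for t in members)
                              / sum(target_area[t] for t in members))

        # Step 5: repeat until the densities stop changing.
        change = max(abs(new_density[c] - density[c]) for c in classes)
        density = new_density
        if change < tol:
            break
    return y_target, density


# Hypothetical data: S2 straddles an urban target (T1) and a rural one (T2).
y_source = {"S1": 1000.0, "S2": 550.0, "S3": 100.0}
area_overlap = {("S1", "T1"): 10.0, ("S2", "T1"): 5.0,
                ("S2", "T2"): 5.0, ("S3", "T3"): 10.0}
target_class = {"T1": "urban", "T2": "rural", "T3": "rural"}

y_target, density = em_interpolate(y_source, area_overlap, target_class)
print(y_target, density)
```

With these made-up numbers the first pass is identical to areal weighting and splits S2 evenly. Later passes shift most of S2 into the urban target, because the zones that lie wholly within one class indicate that urban densities are much higher than rural ones.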
Other sources of information (3)
• 3. Combined technique
– Brings together the dasymetric technique and the EM algorithm
– Makes use of all available information
– Tests all the assumptions
Choice of technique
Error by technique and variable (RMS = root mean square error, Max = maximum error):

              Total Pop.      Tot. Female     Males 15-24     No Car          Farmers
              RMS     Max     RMS     Max     RMS     Max     RMS     Max     RMS     Max
A Weight      26.3    143.3   27.2    149.1   26.3    130.6   52.3    359.4    4.4    28.5
Dasymetric    13.5     83.2   13.8     88.0   13.7     74.9   35.2    280.2   14.0    63.6
EM-District    7.5     28.8    6.4     28.7    6.8     37.8   11.0     55.7    5.4    28.5
EM-Parish      5.6     28.6    5.8     31.0    5.1     19.8   11.4     79.5   14.4    80.2
Comb-Dist      3.6     22.9    4.1     23.5    2.8     14.0    9.9     48.5   13.9    62.8
Comb-Par       3.4     17.8    3.4     16.3    3.9     23.8    6.5     38.3   16.9    73.7

Based on aggregating 1991 EDs to form pseudo-parishes and districts
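The slides do not state exactly how the RMS and Max figures were calculated. As a hedged sketch, assuming the error for each target zone is simply the difference between the interpolated and the known value, the two summary measures could be computed as follows (function name and data are hypothetical):

```python
import math


def error_summary(estimated, actual):
    """Return (RMS error, maximum absolute error) across target zones.

    Assumes error = estimated - actual for each zone; the original study
    may have used a different (e.g. percentage-based) definition.
    """
    errors = [estimated[z] - actual[z] for z in actual]
    rms = math.sqrt(sum(e * e for e in errors) / len(errors))
    max_abs = max(abs(e) for e in errors)
    return rms, max_abs


# Hypothetical check: interpolated values against known values for three zones.
estimated = {"A": 103.0, "B": 96.0, "C": 100.0}
actual = {"A": 100.0, "B": 100.0, "C": 100.0}

rms, max_abs = error_summary(estimated, actual)
print(rms, max_abs)
```

Aggregating small units with known values into pseudo-zones, as described above, is what makes the "actual" figures available for this kind of comparison.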
Conclusions:
• No one technique for all variables
• Careful choice of technique reduces error significantly
• Using regression techniques can help determine which is most appropriate
• Error will still appear in the interpolated data
Predicting error
• Possible techniques:
1. Space – where target zones consist of many large fragments of source zones they are error prone
2. Attribute – error is most prevalent when data have been allocated from urban zones to rural ones
3. Time – error will cause “unrealistic” changes in population
Using population change to locate error
Figure: Total Population of Water Orton, 1851-1951 (y-axis: 0 to 2,000; x-axis: census years 1851 to 1951)
Figure: Pop. Change in Water Orton, 1851-1951 (y-axis: -0.100 to 0.400; x-axis: decades 1850s to 1930/40s)
Water Orton – parish on the edge of Birmingham
• 1901-1951, Water Orton (1951: Pop. 1,841, area 2.3 km², pop. den. 796 p/km²)
• 1861-1891, part of Aston (1891: Pop. 250,000, area 57 km², pop. den. 4,300 p/km²)
• 1851, Water Orton (1851: Pop. 190, area 2.6 km², pop. den. 73 p/km²)
Pop. Change = $(y_2 - y_1) / (y_2 + y_1)$
1851: Est. Pop: 182; Actual Pop: 190
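The population change measure can be applied to each pair of consecutive censuses to flag changes that look "unrealistic". The sketch below is illustrative only: the series and the threshold of 0.25 are hypothetical and are not the Water Orton figures. For non-negative populations the measure always lies between -1 and 1.

```python
def pop_change(y1, y2):
    """Normalised change between two census populations: (y2 - y1) / (y2 + y1)."""
    return (y2 - y1) / (y2 + y1)


def flag_suspect_changes(series, threshold):
    """Return (start_year, end_year, change) for changes larger than the threshold.

    series:    {census_year: interpolated population}
    threshold: hypothetical cut-off on the absolute value of the change
    """
    years = sorted(series)
    flagged = []
    for start, end in zip(years, years[1:]):
        change = pop_change(series[start], series[end])
        if abs(change) > threshold:
            flagged.append((start, end, change))
    return flagged


# Hypothetical interpolated series with one abrupt jump between 1861 and 1871.
series = {1851: 200.0, 1861: 210.0, 1871: 600.0, 1881: 620.0}
suspects = flag_suspect_changes(series, threshold=0.25)
print(suspects)
```

A flagged interval is only a candidate for interpolation error. Genuine rapid growth, such as suburban expansion, would be flagged in the same way, so each case needs checking against the boundary history.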
Using population change to locate error
Birmingham
• 1951: Pop. 1,100,000, area 210 km², pop. den. 5,235 p/km²
• 1931: Pop. 1,000,000, area 187 km², pop. den. 5,367 p/km²
• 1891: Pop. 246,000, area 12.2 km², pop. den. 20,123 p/km²
• 1851: Pop. 919, area 0.94 km², pop. den. 977 p/km²
Figure: Total Population of Birmingham, 1851-1951 (y-axis: 0 to 1,200,000; x-axis: census years from 1851)
Figure: Pop. Change in Birmingham, 1851-1951 (y-axis: 0.000 to 0.600)
Using population change to locate error
Castle Bromwich – parish on the edge of Birmingham
• 1951, Castle Bromwich (1951: Pop. 4,356, area 4.7 km², pop. den. 927 p/km²)
• 1921-1931, part of Birmingham (1931: Pop. 1,000,000, area 187 km², pop. den. 5,367 p/km²)
• 1861-1911, part of Aston (1891: Pop. 250,000, area 57 km², pop. den. 4,300 p/km²)
• 1851, Castle Bromwich (1851: Pop. 6,426, area 18.7 km², pop. den. 344 p/km²)
Figure: Total Population of Castle Bromwich, 1851-1951 (y-axis: 0 to 4,500; x-axis: census years from 1851)
Figure: Pop. Change in Castle Bromwich, 1851-1951 (y-axis: -1.000 to 1.000)
Conclusions
• Can interpolate data to create long-run time-series
• Choice of best technique will depend on nature of the variable
– No “one size fits all” technique
• All techniques will create some error
• What to do about error:
– Attempt to smooth it out
– Explicitly incorporate it into an analysis