Visualising large spatial databases and Building bespoke geodemographics


Muhammad Adnan

University College London

About Me

• 2007 – 2009
  • Worldnames (http://worldnames.publicprofiler.org)
  • Onomap (http://www.onomap.org)

• Nov. 2009 – Oct. 2011: a Knowledge Transfer Partnership (KTP) between UCL and Local Futures Group (LFG)

• LFG is a research and strategy consultancy

• The aim of the KTP was to devise a better visualisation of the data


Data

• A database of 1,600 indicators from around 130 data sources

• Data sources cover social, economic, and environmental change in the UK

• The data is held at 8 spatial levels
  • Region, Sub region, District 2009, NUTS 3, District (pre 2009), Ward, LSOA, OA

Visualisation of the data

• A ‘total place maps’ solution using different technologies (Video)

Base Layer Data

On the fly rendering of tiles

Programming in C# and ASP.NET

Data retrieval from database

Building Bespoke Geodemographics

Geodemographics

• “Analysis of people by where they live” or “locality marketing” (Sleight, 1993:3)

[Diagram: Person, Home Address, Area]

How is a classification created?

Data – Census + Other

Experian: Mosaic

• Census data: 54%
• Non-Census data: 46%

CACI: Acorn

• Census data: 30%
• Non-Census data: 70%

ONS Output Area Classification

• Census data: 100%


Segmentations are created by cluster analysis

Inputs: a table with one row per area (Area1, Area2, ..., Area8, ...) and one column per variable (V1, V2, ..., V10, ...)


[Figure: scatter plot of areas on two axes, Variable 1 and Variable 2, with the points grouped into Cluster 1, Cluster 2, and Cluster 3]

Cluster Analysis

K-means is used for clustering


Output of Cluster Analysis

Area    Cluster
Area1   1
Area2   1
Area3   2
Area4   1
Area5   3
Area6   3
Area7   3
Area8   2
...
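The step from an input table to an area-to-cluster table can be sketched in a few lines of Python. This is a minimal illustration of standard (Lloyd's) k-means, not the implementation behind any particular classification. The variable values, the choice of k = 3, and the function names are all made up for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, seed=0, max_iter=100):
    """Standard (Lloyd's) k-means; returns one 1-based cluster label per row."""
    rng = random.Random(seed)
    # Start from k rows chosen at random as the initial centroids.
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each row joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return [lab + 1 for lab in labels]


# Hypothetical standardised values of two variables (V1, V2) for each area.
areas = {
    "Area1": [0.10, 0.20],
    "Area2": [0.15, 0.25],
    "Area3": [0.90, 0.10],
    "Area4": [0.12, 0.18],
    "Area5": [0.50, 0.95],
    "Area6": [0.55, 0.90],
    "Area7": [0.45, 0.92],
    "Area8": [0.85, 0.15],
}

rows = list(areas.values())
labels = kmeans(rows, k=3)
for name, label in zip(areas, labels):
    print(name, label)
```

The cluster numbers themselves carry no meaning: a different random start can swap the labels or, for less clearly separated data, produce a different partition.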

Research Issues

• Optimisation of clustering algorithms
  • K-means
  • PAM (Partitioning Around Medoids)

• Open tools?
  • OACoder
  • GeodemCreator

• Bespoke local area classifications
  • UK’s open data initiative
  • ONS Neighbourhood Statistics API
  • UK’s police API
  • Barclays cycle hire API

Optimisation of Clustering Algorithms (K-Means)

K-means optimisation

[Figure: RSQ (y-axis, roughly 0.46 to 0.54) plotted against run number (x-axis)]

\[ V = \sum_{x=1}^{n} \sum_{y=1}^{n} (z_x - \mu_y)^2 \]

K-means (100 runs of k-means on OAC data set for k=4)


Run k-means multiple times (10,000 times) (Singleton & Longley, 2009)
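Because each k-means run starts from a random set of centroids, repeated runs can settle on different solutions, which is why the run is repeated and the best result kept. The sketch below illustrates the idea under stated assumptions: it scores each run by its within-cluster sum of squares (the slides plot RSQ instead), uses made-up random data, and runs only 20 times to keep the example quick. The function names are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(rows, k, seed, max_iter=100):
    """One k-means run; returns (labels, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = []
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    wss = sum(squared_distance(row, centroids[lab]) for row, lab in zip(rows, labels))
    return labels, wss


def best_of_runs(rows, k, n_runs):
    """Repeat k-means with seeds 0..n_runs-1 and keep the lowest-WSS run."""
    runs = [kmeans_once(rows, k, seed) for seed in range(n_runs)]
    return min(runs, key=lambda run: run[1])


# Made-up data: 60 areas described by two variables in [0, 1).
data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(60)]

best_labels, best_wss = best_of_runs(rows, k=4, n_runs=20)
print("lowest within-cluster sum of squares:", round(best_wss, 4))
```

Raising `n_runs` makes a poor local optimum less likely to be chosen, at a cost that grows linearly with the number of runs.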

CUDA & GPUs (Graphics Processing Units)

• Nvidia graphics cards have GPUs (Graphics Processing Units)
  • Can be used for parallel processing
  • Nvidia GeForce GT 420M (96 CUDA cores)
  • Latest Tesla graphics cards have 1000 cores

• CUDA (Compute Unified Device Architecture)
  • Parallel computing architecture
  • C and C++ can be used for programming

• A parallel implementation of k-means (Adnan & Longley, 2011)
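K-means suits parallel hardware because, within one iteration, the nearest-centroid search for each area is independent of every other area. The sketch below shows only that decomposition, using Python threads rather than CUDA; it is not the implementation cited above, and plain Python threads will not deliver a GPU-style speed-up. The data and function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(row, centroids):
    """Index of the centroid closest to one row."""
    return min(range(len(centroids)), key=lambda c: squared_distance(row, centroids[c]))


def assign_chunk(chunk, centroids):
    """Assignment step for one chunk of rows; needs no data from other chunks."""
    return [nearest_centroid(row, centroids) for row in chunk]


def parallel_assign(rows, centroids, n_workers=4):
    """Split the rows into chunks and run the assignment step on each concurrently."""
    size = max(1, -(-len(rows) // n_workers))  # ceiling division
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() preserves chunk order, so the labels line up with the input rows.
        results = list(pool.map(assign_chunk, chunks, [centroids] * len(chunks)))
    return [label for chunk_labels in results for label in chunk_labels]


# Made-up rows and centroids for illustration.
rows = [[0, 0], [0, 1], [10, 10], [10, 11], [6, 6]]
centroids = [[0, 0], [10, 10]]

serial_labels = assign_chunk(rows, centroids)
parallel_labels = parallel_assign(rows, centroids, n_workers=2)
print(parallel_labels)
```

On a GPU the same idea is usually taken further, with one thread per area rather than one per chunk.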

K-means vs Parallel K-means

Could be useful for building geodemographics quickly in online environments

Open Tools for Geodemographics

Open Tools - OACoder

• Developed with Alex Singleton

• Assigns UK postcodes to their corresponding OAC groups

• Download from http://areaclassification.org.uk/

Open Tools – ‘GeodemCreator’

• Allows users to create their own local area geodemographic classifications
• Provides data available in the public domain (but users can use ancillary data sources)


Will be available to download from http://publicprofiler.org


Spatially Weighted Geodemographics


• Geodemographic classifications do not account for spatial weights in the results

• A spatially weighted Geodemographic classification introduces spatial weights in addition to the socio-economic characteristics

• Tobler’s first law of geography
  • “Everything is related to everything else, but near things are more related than distant things”

Spatially weighted Geodemographics

Step - 1: Construct a Neighbours Graph
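A neighbours graph records, for every area, which other areas count as its neighbours. The slides do not say how neighbours are defined, so the sketch below assumes a simple contiguity rule: two areas are neighbours if they share a boundary. The list of boundary pairs and the function name are hypothetical.

```python
from collections import defaultdict


def build_neighbours_graph(shared_boundaries):
    """Turn (area, area) boundary pairs into a symmetric adjacency mapping."""
    graph = defaultdict(set)
    for a, b in shared_boundaries:
        graph[a].add(b)
        graph[b].add(a)  # contiguity is symmetric
    return {area: sorted(neighbours) for area, neighbours in graph.items()}


# Hypothetical pairs of areas that share a boundary.
shared_boundaries = [
    ("Area1", "Area2"),
    ("Area2", "Area3"),
    ("Area3", "Area4"),
    ("Area1", "Area3"),
]

graph = build_neighbours_graph(shared_boundaries)
for area, neighbours in sorted(graph.items()):
    print(area, neighbours)
```

Other definitions, such as the k nearest centroids or all areas within a distance threshold, would produce the same kind of mapping.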




Step - 2: Apply Moran’s I to the data set

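• It is a measure of spatial autocorrelation

• Values of Moran’s I typically range from -1 to 1

• A negative value indicates negative spatial autocorrelation (neighbouring areas tend to have dissimilar values); a positive value indicates that neighbouring areas tend to be similar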
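As a worked illustration, the sketch below computes global Moran’s I for a single variable. It assumes binary contiguity weights (1 for neighbours, 0 otherwise), which the slides do not specify, and it uses four made-up areas arranged in a line. The function name and the data are hypothetical.

```python
def morans_i(values, neighbours):
    """Global Moran's I with binary weights.

    I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2
    where w_ij = 1 if j is a neighbour of i, and W is the sum of all weights.
    """
    n = len(values)
    mean = sum(values.values()) / n
    deviation = {area: v - mean for area, v in values.items()}
    total_weight = sum(len(ns) for ns in neighbours.values())
    cross_products = sum(
        deviation[a] * deviation[b] for a, ns in neighbours.items() for b in ns
    )
    squared_deviations = sum(d * d for d in deviation.values())
    return (n / total_weight) * (cross_products / squared_deviations)


# Four hypothetical areas in a line: Area1 - Area2 - Area3 - Area4.
chain = {
    "Area1": ["Area2"],
    "Area2": ["Area1", "Area3"],
    "Area3": ["Area2", "Area4"],
    "Area4": ["Area3"],
}

# Values that change smoothly along the line: neighbours are similar.
gradient = {"Area1": 1, "Area2": 2, "Area3": 3, "Area4": 4}
# Values that alternate along the line: neighbours are dissimilar.
alternating = {"Area1": 1, "Area2": 4, "Area3": 1, "Area4": 4}

i_gradient = morans_i(gradient, chain)
i_alternating = morans_i(alternating, chain)
print(round(i_gradient, 3), round(i_alternating, 3))
```

The smooth pattern gives a positive value and the alternating pattern a negative one, matching the interpretation of the sign given above.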




Inputs with an additional column: one row per area (Area1, Area2, ..., Area8, ...), the variables V1 to V10, and a column VM holding the Moran’s I result

Step - 3: Apply K-means


Result

• Open methods and tools for building geodemographics are important

• Testing of the spatially weighted geodemographics technique
  • On lower spatial levels

• I will be working on Paul Longley’s new research grant on “Uncertainty of Identity”
  • How could people’s behaviours in the real world be mapped to their behaviours in the virtual world?

• Could marketing strategies be devised for targeting online social networks and communities?

Conclusion and future work

A quick illustration

http://worldnames.publicprofiler.org

• We have a record of 100,000 ‘IP Address’ entries for the last 6 months



http://quova.com

An API to convert “IP addresses” to their corresponding latitude / longitude values




Any questions?

Adnan, M., Longley, P.A., Singleton, A.D., Brunsdon, C. (2010). Towards Real-Time Geodemographics: Clustering Algorithm Performance for Large Multidimensional Spatial Databases. Transactions in GIS, 14(3), 283-297.

Hall, J.D., Hart, J.C. (2004). GPU acceleration of iterative clustering. In: ACM Workshop on General-Purpose Computing on Graphics Processors, p. C-6.

Harris, R., Sleight, P., Webber, R. (2005). Geodemographics, GIS and Neighbourhood Targeting. Wiley, London.

Reynolds, A.P., Richards, G., Rayward-Smith, V.J. (2004). The Application of K-Medoids and PAM to the Clustering of Rules. Lecture Notes in Computer Science, 3177/2004, 173-178.

Singleton, A.D., Longley, P.A. (2009). Creating open source geodemographic classifications for Higher Education applications. Papers in Regional Science, 88(3), 643-666.

Takizawa, H., Kobayashi, H. (2006). Hierarchical parallel processing of large scale data clustering on a PC cluster with GPU co-processing. Journal of Supercomputing, 36(3), 219-234.

Vickers, D.W., Rees, P.H. (2007). Creating the National Statistics 2001 Output Area Classification. Journal of the Royal Statistical Society, Series A, 170(2), 379-403.

References
