Visualising large spatial databases and Building bespoke geodemographics
Muhammad Adnan, University College London

DESCRIPTION

This presentation outlines my work at Local Futures and my PhD research. I have been working on a joint project between Local Futures and UCL, and the presentation begins with an introduction to that project. My PhD investigated the creation of real-time bespoke geodemographics, and the rest of the presentation covers the work I did during the PhD.

About Me

• 2007 – 2009
  • Worldnames (http://worldnames.publicprofiler.org)
  • Onomap (http://www.onomap.org)

• Nov. 2009 – Oct. 2011: a Knowledge Transfer Partnership (KTP) between UCL and Local Futures Group (LFG)

• LFG is a research and strategy consultancy

• The aim of the KTP was to devise better visualisations of the data

Data

• A database of 1,600 indicators drawn from around 130 data sources

• Data sources cover social, economic, and environmental change in the UK

• The data is held at 8 spatial levels
  • Region, Sub-region, District (2009), NUTS 3, District (pre-2009), Ward, LSOA, OA

Visualisation of the data

• A ‘total place maps’ solution using different technologies (Video)

  • Base layer data
  • On-the-fly rendering of tiles
  • Programming in C# and ASP.NET
  • Data retrieval from database
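On-the-fly tile rendering means each map tile is drawn only when a client requests it, so the server first has to work out which geographic box the tile covers before it retrieves data for that box. A minimal sketch of that step, assuming the widely used Web Mercator XYZ ("slippy map") tiling scheme; the slides do not say which scheme the C#/ASP.NET solution used, and `latlon_to_tile` and `tile_bounds` are hypothetical names written in Python for illustration only.

```python
import math

def latlon_to_tile(lat, lon, zoom):
    """Return the (x, y) index of the Web Mercator XYZ tile containing a point."""
    n = 2 ** zoom  # tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y

def tile_bounds(x, y, zoom):
    """Return (west, south, east, north) in degrees for tile (x, y) at `zoom`."""
    n = 2 ** zoom

    def lon_of(tx):
        return tx / n * 360.0 - 180.0

    def lat_of(ty):
        return math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * ty / n))))

    # Tile rows count downwards from the north edge, so row y + 1 is the south edge.
    return lon_of(x), lat_of(y + 1), lon_of(x + 1), lat_of(y)

# A request handler could use these bounds to fetch only the areas that
# intersect the tile from the database, then render them into the image.
x, y = latlon_to_tile(51.5246, -0.1340, 12)  # a point in central London
print((x, y), tile_bounds(x, y, 12))
```

Computing the bounding box per request keeps the database query small: only the areas visible in one tile are retrieved and drawn.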

Building Bespoke Geodemographics

Geodemographics

• “Analysis of people by where they live” or “locality marketing”

(Sleight, 1993:3)

(Diagram: Person, Home Address, Area)

How is a classification created?

Data – Census + Other

Experian: Mosaic

• Census data: 54%
• Non-Census data: 46%

CACI: Acorn

• Census data: 30%
• Non-Census data: 70%

ONS Output Area Classification

• Census data: 100%

Segmentations are created by cluster analysis

Inputs: a table with one row per area (Area1, Area2, ..., Area8, ...) and one column per variable (V1, V2, ..., V10, ...)

(Figure: cluster analysis of areas plotted by Variable 1 against Variable 2, grouped into Cluster 1, Cluster 2 and Cluster 3)

K-means is used for clustering
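A minimal pure-Python sketch of Lloyd's k-means, the standard k-means iteration: assign each area to its nearest centroid, move each centroid to the mean of its areas, and repeat until no area changes cluster. Random initial centroids are an assumption (the slides do not say how the classifications were initialised), and `kmeans` and its parameters are hypothetical names.

```python
import random

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(rows, k, max_iter=100, seed=None):
    """Cluster rows (lists of numbers, one per area) into k groups.

    Returns (labels, centroids), where labels[i] is the cluster of rows[i].
    """
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]  # k distinct areas as starting centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every area.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(r, centroids[c]))
            for r in rows
        ]
        if new_labels == labels:
            break  # converged: no area changed cluster
        labels = new_labels
        # Update step: each centroid becomes the mean of its member areas.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

areas = [[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [8.2, 7.9], [0.9, 1.1], [7.8, 8.1]]
labels, centroids = kmeans(areas, k=2, seed=1)
```

The output is exactly the area-to-cluster table of a classification: one cluster label per input row.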

Output of Cluster Analysis

Area    Cluster
Area1   1
Area2   1
Area3   2
Area4   1
Area5   3
Area6   3
Area7   3
Area8   2
...

Research Issues

• Optimisation of clustering algorithms
  • K-means
  • PAM (Partitioning Around Medoids)

• Open tools?
  • OACoder
  • GeodemCreator

• Bespoke local area classifications
  • UK’s open data initiative
  • ONS Neighbourhood Statistics API
  • UK’s police API
  • Barclays Cycle Hire API
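PAM (Partitioning Around Medoids) differs from k-means in two ways: cluster centres are actual areas (medoids) rather than means, and it searches for medoid/non-medoid swaps that lower the total distance. This makes it more robust to outliers but much more expensive to compute. A minimal, unoptimised sketch of the classic BUILD and SWAP phases, assuming Euclidean distance; `pam` is a hypothetical name and this is not the implementation evaluated in the research.

```python
import math

def pam(rows, k):
    """Partitioning Around Medoids: return (medoid indices, labels)."""
    n = len(rows)
    d = [[math.dist(a, b) for b in rows] for a in rows]  # pairwise distance matrix

    def cost(medoids):
        # Total distance from every area to its nearest medoid.
        return sum(min(d[i][m] for m in medoids) for i in range(n))

    # BUILD: greedily add the area that most reduces the total cost.
    medoids = []
    for _ in range(k):
        best = min((i for i in range(n) if i not in medoids),
                   key=lambda i: cost(medoids + [i]))
        medoids.append(best)

    # SWAP: try replacing each medoid with each non-medoid; accept any improvement.
    current = cost(medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids[:pos] + [h] + medoids[pos + 1:]
                trial_cost = cost(trial)
                if trial_cost < current - 1e-12:
                    medoids, current, improved = trial, trial_cost, True

    labels = [min(range(k), key=lambda c: d[i][medoids[c]]) for i in range(n)]
    return medoids, labels

medoids, labels = pam([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]], k=2)
```

Every candidate swap recomputes the full cost, which is why PAM's running time grows so quickly with the number of areas.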

Optimisation of Clustering Algorithms (K-Means)

K-means optimisation

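(Figure: value obtained on each k-means run plotted against run number, runs 1 to 145; values range from about 0.46 to 0.54)

$$V = \sum_{z=1}^{Q} \sum_{x=1}^{n_z} \left( y_x - \bar{y}_z \right)^2$$

where Q is the number of clusters, n_z is the number of areas in cluster z, y_x is the value for area x and \bar{y}_z is the mean (centroid) of cluster z.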
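Comparing k-means runs needs a single quality score. A common choice, and what V on this slide appears to measure (an assumption), is the within-cluster sum of squares: the total squared distance from each area to the centroid of its cluster, where lower is better. A minimal sketch; `within_cluster_ss` is a hypothetical name.

```python
def within_cluster_ss(rows, labels):
    """Sum of squared distances from each row to the mean (centroid) of its cluster."""
    clusters = {}
    for row, lab in zip(rows, labels):
        clusters.setdefault(lab, []).append(row)
    total = 0.0
    for members in clusters.values():
        centroid = [sum(col) / len(members) for col in zip(*members)]
        for row in members:
            total += sum((v - m) ** 2 for v, m in zip(row, centroid))
    return total

# Two clusters of 1-D values: {1, 3} has mean 2, {10, 14} has mean 12.
# V = (1-2)^2 + (3-2)^2 + (10-12)^2 + (14-12)^2 = 1 + 1 + 4 + 4 = 10
print(within_cluster_ss([[1], [3], [10], [14]], [0, 0, 1, 1]))  # 10.0
```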

K-means (100 runs of k-means on the OAC data set for k = 4)

Run k-means many times (10,000 runs) and keep the solution with the lowest within-cluster sum of squares (Singleton & Longley, 2009)
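K-means converges to a local optimum that depends on its starting centroids, so results vary from run to run. Running it many times and keeping the run with the lowest within-cluster sum of squares reduces this sensitivity. A minimal sketch assuming random initialisation and that selection criterion; `best_of_n_runs` and its helpers are hypothetical names, and the default of 100 runs is only for illustration, far fewer than the 10,000 runs cited.

```python
import random

def _sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def _kmeans_once(rows, k, rng, max_iter=100):
    """One run of Lloyd's k-means from randomly chosen starting centroids."""
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda c: _sqdist(r, centroids[c])) for r in rows]
        if new == labels:
            break  # converged
        labels = new
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def _wcss(rows, labels, centroids):
    """Within-cluster sum of squares for one solution."""
    return sum(_sqdist(r, centroids[lab]) for r, lab in zip(rows, labels))

def best_of_n_runs(rows, k, n_runs=100, seed=0):
    """Run k-means n_runs times; return (lowest WCSS, labels of that run)."""
    rng = random.Random(seed)
    best_v, best_labels = float("inf"), None
    for _ in range(n_runs):
        labels, centroids = _kmeans_once(rows, k, rng)
        v = _wcss(rows, labels, centroids)
        if v < best_v:
            best_v, best_labels = v, labels
    return best_v, best_labels

v, labels = best_of_n_runs([[1.0], [2.0], [10.0], [11.0], [20.0], [21.0]], k=3, n_runs=50)
```

Each run is independent of the others, which is also why repeated k-means is a natural candidate for parallel hardware.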

CUDA & GPUs (Graphics Processing Units)

• Nvidia graphics cards have GPUs (Graphics Processing Units)
  • Can be used for parallel processing
  • Nvidia GeForce GT 420M (96 cores)
  • Latest Tesla graphics cards have around 1,000 cores

• CUDA (Compute Unified Device Architecture)
  • Parallel computing architecture
  • C and C++ can be used for programming

• A parallel implementation of k-means (Adnan & Longley, 2011)
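The assignment step of k-means is what suits it to GPUs: each area's nearest centroid can be computed independently, so a CUDA kernel can give each area its own thread. The sketch below shows only that data-parallel decomposition, using Python's `concurrent.futures` on the CPU as a stand-in; it is not CUDA and not the implementation of Adnan & Longley (2011), and the function names are hypothetical. With threads, Python's global interpreter lock limits real speed-up, so the point is the structure (independent chunks, results recombined in order) rather than performance.

```python
from concurrent.futures import ThreadPoolExecutor

def nearest_centroid(row, centroids):
    """Index of the centroid closest to one area (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: sum((x - y) ** 2 for x, y in zip(row, centroids[c])))

def assign_chunk(chunk, centroids):
    # Each chunk is processed independently, like a block of GPU threads.
    return [nearest_centroid(row, centroids) for row in chunk]

def parallel_assign(rows, centroids, workers=4):
    """Assignment step of k-means, split into independent chunks of areas."""
    size = max(1, -(-len(rows) // workers))  # ceiling division
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(assign_chunk, chunks, [centroids] * len(chunks)))
    # pool.map preserves chunk order, so flattening restores the original area order.
    return [label for part in results for label in part]

rows = [[float(i), float(i % 7)] for i in range(50)]
labels = parallel_assign(rows, [[0.0, 0.0], [25.0, 3.0], [49.0, 6.0]])
```

The centroid update that follows is a cheap reduction over these labels, so most of the work per iteration sits in the parallel step.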

K-means vs Parallel K-means

Could be useful for building geodemographics quickly in online environments

Open Tools for Geodemographics

Open Tools - OACoder

• Developed with Alex Singleton

• Assigns UK postcodes to their corresponding OAC groups

• Download from http://areaclassification.org.uk/
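Attaching an OAC code to each postcode amounts to a two-step lookup: postcode to Output Area, then Output Area to OAC group. A minimal sketch of that join, assuming both lookup tables are already loaded; the table contents below are made-up placeholders (not real OA or OAC codes), and `normalise_postcode` and `assign_oac` are hypothetical names, not OACoder's actual code. Normalising first matters because postcodes arrive with inconsistent case and spacing; every UK postcode ends in a three-character inward code, which lets the space be restored reliably.

```python
def normalise_postcode(raw):
    """Upper-case, drop spaces, then put one space before the 3-character inward code."""
    compact = "".join(raw.split()).upper()
    if len(compact) < 5:
        raise ValueError(f"not a full UK postcode: {raw!r}")
    return compact[:-3] + " " + compact[-3:]

def assign_oac(postcodes, postcode_to_oa, oa_to_oac):
    """Map each input postcode to an OAC code, or None if it cannot be matched."""
    result = {}
    for raw in postcodes:
        oa = postcode_to_oa.get(normalise_postcode(raw))
        result[raw] = oa_to_oac.get(oa) if oa is not None else None
    return result

# Hypothetical lookup tables with placeholder codes (not real OA or OAC values).
postcode_to_oa = {"AB1 2DE": "OA_0001", "EF34 5GH": "OA_0002"}
oa_to_oac = {"OA_0001": "group_x", "OA_0002": "group_y"}
print(assign_oac(["ab12de", "EF34 5GH", "XY9 9XY"], postcode_to_oa, oa_to_oac))
# {'ab12de': 'group_x', 'EF34 5GH': 'group_y', 'XY9 9XY': None}
```

Returning None for unmatched postcodes, rather than raising an error, lets a batch of addresses be coded in one pass and the failures reviewed afterwards.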

Open Tools – ‘GeodemCreator’

• Allows users to create their own local-area geodemographic classifications
• Provides data available in the public domain (users can also add ancillary data sources)

Will be available to download from http://publicprofiler.org

Spatially Weighted Geodemographics

• Geodemographic classifications do not take spatial relationships between areas (spatial weights) into account in their results

• A spatially weighted geodemographic classification introduces spatial weights in addition to the socio-economic characteristics

• Tobler’s first law of geography
  • “Everything is related to everything else, but near things are more related than distant things”

Step - 1: Construct a Neighbours Graph
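The slides do not say how the neighbours graph was built. A minimal sketch of one common approach, linking each area to its k nearest neighbours by distance between area centroids; contiguity (areas sharing a boundary) is the other usual choice. `knn_graph` and the centroid input are assumptions.

```python
import math

def knn_graph(centroids, k):
    """Return {area index: set of neighbouring area indices}.

    Each area is linked to its k nearest areas (ties broken by index). The graph
    is made symmetric: if A is among B's nearest, A and B are neighbours.
    """
    graph = {i: set() for i in range(len(centroids))}
    for i, ci in enumerate(centroids):
        others = sorted((math.dist(ci, cj), j) for j, cj in enumerate(centroids) if j != i)
        for _, j in others[:k]:
            graph[i].add(j)
            graph[j].add(i)  # symmetrise so the resulting weights matrix is symmetric
    return graph

graph = knn_graph([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0]], k=1)
```

Symmetrising trades a fixed neighbour count for a symmetric weights matrix, which keeps the spatial statistics computed from it easier to interpret.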

Step - 2: Apply Moran’s I to the data set

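• It is a measure of spatial autocorrelation

• Values typically range from -1 to +1 (the exact bounds depend on the spatial weights used)

• A positive value indicates that similar values cluster together in space; a negative value represents negative spatial autocorrelation, where neighbouring areas tend to have dissimilar values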
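Global Moran's I for one variable x over n areas with spatial weights w_ij is I = (n / S0) · Σ_i Σ_j w_ij (x_i − x̄)(x_j − x̄) / Σ_i (x_i − x̄)², where S0 is the sum of all weights. The table that follows in the slides adds a per-area column (VM), which suggests a local variant: the local Moran's I of area i is I_i = (x_i − x̄) Σ_j w_ij (x_j − x̄) / m2, with m2 = Σ_k (x_k − x̄)² / n. A minimal sketch of both, assuming binary weights from a neighbours graph (w_ij = 1 for neighbours, 0 otherwise); which variant and weighting the study used is not stated in the slides, and `morans_i` is a hypothetical name.

```python
def morans_i(values, neighbours):
    """Global and local Moran's I with binary weights.

    values: list of x_i, one per area (not all equal).
    neighbours: {i: set of j} neighbours graph (w_ij = 1 if j in neighbours[i]),
    with at least one neighbour pair.
    Returns (global_I, [local_I_i for each area]).
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                   # deviations from the mean
    m2 = sum(zi * zi for zi in z) / n                # variance with divisor n
    s0 = sum(len(neighbours[i]) for i in range(n))   # sum of all weights
    local = [z[i] * sum(z[j] for j in neighbours[i]) / m2 for i in range(n)]
    global_i = sum(local) / s0                       # algebraically equal to the global formula
    return global_i, local

path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}   # four areas in a row
print(morans_i([1, 1, 5, 5], path))   # similar neighbours: positive
print(morans_i([1, 5, 1, 5], path))   # alternating values: negative
```

The local values sum to S0 times the global value, so one pass yields both a per-area column and the overall statistic.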

Input table: one row per area (Area1, Area2, ..., Area8, ...), with variables V1 to V10 plus an extra column, VM, holding the Moran’s I result

Step - 3: Apply K-means

Result

Conclusion and future work

• Open methods and tools for building geodemographics are important

• Further testing of the spatially weighted geodemographics technique
  • At lower spatial levels

• I will be working on Paul Longley’s new research grant on “Uncertainty of Identity”
  • How could people’s behaviour in the real world be mapped to their behaviour in the virtual world?

• Could marketing strategies be devised for targeting online social networks and communities?

A quick illustration

http://worldnames.publicprofiler.org

• We have records of 100,000 IP addresses from the last 6 months

http://quova.com

An API that converts IP addresses to their corresponding latitude/longitude values

Any questions?

References

Adnan, M., Longley, P.A., Singleton, A.D., Brunsdon, C. (2010). Towards Real-Time Geodemographics: Clustering Algorithm Performance for Large Multidimensional Spatial Databases. Transactions in GIS, 14(3), 283–297.

Hall, J.D., Hart, J.C. (2004). GPU acceleration of iterative clustering. In: ACM Workshop on General-Purpose Computing on Graphics Processors, p. C-6.

Harris, R., Sleight, P., Webber, R. (2005). Geodemographics, GIS and Neighbourhood Targeting. Wiley, London.

Reynolds, A.P., Richards, G., Rayward-Smith, V.J. (2004). The Application of K-Medoids and PAM to the Clustering of Rules. Lecture Notes in Computer Science, 3177/2004, 173–178.

Singleton, A.D., Longley, P.A. (2008). Creating open source geodemographic classifications for Higher Education applications. Papers in Regional Science, 88(3), 643–666.

Takizawa, H., Kobayashi, H. (2006). Hierarchical parallel processing of large scale data clustering on a PC cluster with GPU co-processing. Journal of Supercomputing, 36(3), 219–234.

Vickers, D.W., Rees, P.H. (2007). Creating the National Statistics 2001 Output Area Classification. Journal of the Royal Statistical Society, Series A, 170(2), 379–403.