Towards a Higher Education Profiler

CASA, London - 11/2/09

Towards a Higher Education Profiler

Alex Singleton, Paul Longley, Alan Wilson

Introduction

• Views on the transition from PhD to Post Doctorate
  – ESRC First Grant Scheme
• Some new data sources
• Some new insights
• Beta Educational Profiler
• From description to prediction

My transition from an unconventional PhD to an unconventional Post Doc

• A Spatio-Temporal Analysis of Access to Higher Education (approx. 1 year ago)
  – Three Themes
    • Momentum
    • Unfinished Business
    • Future Directions
  – Unconventional
    • PhD – Spent 2 years in Cheltenham at UCAS – KTP
    • Post-Doc – It isn’t really one

Momentum

• Keep it up – you are used to writing lots
  – Use this to write papers, grant / book proposals
• Disillusion with PhD topic
  – Good to put it down for a while and do something else
• 2 Papers on E-Society
  – Online validation
  – Digital deprivation v. material deprivation
• 2 Papers on Neo-Geography

Unfinished Business

• There will be things in your thesis which you wanted to cover but didn’t have time / room
  – Methodological
    • Alternate algorithms to k-means in the creation of geodemographics
    • Geographic representations of cluster instability – related to initial seed locations
    • Alternate optimisation procedures – measures of spatial rather than social similarity
  – Domain Specific
    • Course Clusters
  – Overarching Themes
    • Future of area classification
      – Real Time Geodemographics

Future Directions

• Like it or not, your PhD is what you are known for!
  – Chart hits matter:
    • Brunsdon = GWR
    • Dorling = Cartograms
  – Unless you start again, you will always return to your PhD themes
    • It is what you know most about!
  – Research is driven by funding
    • Funding for PhD students
    • Funding for research grants
    • Building a research team enables
      – You to do more
      – Efforts shift from “doing” to “guiding” / “organising”
        » Don’t drown in this – you still need to keep “doing”

[Diagram labels: Doctoral; Post-Doctoral; Academics; RA]

ESRC First Grants

• Scheme: Enables early career researchers to apply for a small grant – essentially 1 researcher @ FEC funding + expenses

• Very competitive
  – Over 200 applications last year – ~13% success

• Spatial interaction modelling, geodemographics and widening participation in the Higher Education sector?

Time Plan: Approximately ¼ Way Through

[Timeline: Start Oct. 2008; Feb 2009; End Sept. 2009]
• Stage 1: Data Acquisition and Insight
• Stage 2: Higher Education Profiler
• Stage 3: Higher Education Modeller
• Pet Project... (Facebook)

The HE Problem: Acceptances 1962 - 2003

Occupational Group: 1968-1978

Social Class: 1980 - 2001

Socio-Economic Group: 2002 - 2007

End point of my thesis

Investigated a variety of different aspects of HE participation from a geodemographic / geographer’s perspective

Distance travelled to HE by Mosaic

Mosaic Profile – UCAS Acceptances – 2004 (Base: All Adults)

HE Acceptances

2006 PLASC Data - DCSF

School Profiles – Selective Schools

2006 PLASC Data - DCSF

GCSE Grades

School Catchment Areas

Stage 1: Data Acquisition and Insight

• Data Sources
  – Universities and Colleges Admissions Service (UCAS)
  – Higher Education Statistics Agency (HESA)
  – Department for Children, Schools and Families (DCSF)
    • A-Level & Equiv (Key Stage 5)
    • GCSE & Equiv (Key Stage 4)
  – DCSF & HESA now link at individual level
    • Map a student through time!
      – Previously – had to consider each key stage separately

Caveat – These data only arrived last week!

Insight 1: Entry Rates (DCSF & HESA)

[Diagram: DCSF Key Stage 5 (2004) linked to HESA records]
• HESA (0): ~50% – Direct Entry
• HESA (+1): ~20% – Gap Year
• HESA (+2): ~5% – Gap Years

National Targets = 18-30 Age Range

Insight 2: Course Choice Behaviour (UCAS)

[Diagram: Applicant → choices C1, C2, C3, C4, C5, C6]

Percentages – each row sums to 100%

Thus, for applicants with at least one choice in “A1 - Pre-Clinical Medicine”, 76.9% of applications from other applicants are within the same JACS Line.

A1 is quite homogeneous!

Extract of the full table
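A minimal sketch of how a row-percentage table of this kind could be computed. It assumes one reading of the table: for each JACS line, take every applicant with at least one choice in that line, count all of their choices by line, and scale the row to 100%. The function name and the three applicants below are hypothetical, not UCAS data.

```python
from collections import Counter, defaultdict


def row_percentages(applicants):
    """Build a row-normalised table of course choices.

    `applicants` is a list of choice lists, each choice being a JACS line
    code such as "A1". Row r covers all choices made by applicants who
    have at least one choice in r, expressed as percentages of that row.
    """
    counts = defaultdict(Counter)
    for choices in applicants:
        # An applicant joins the row of every line they applied to once,
        # and all of their choices are counted in that row.
        for line in set(choices):
            counts[line].update(choices)

    table = {}
    for line, row in counts.items():
        total = sum(row.values())
        table[line] = {col: 100.0 * n / total for col, n in row.items()}
    return table


# Hypothetical applicants for illustration only.
applicants = [
    ["A1", "A1", "A1", "A1", "B1", "C1"],
    ["A1", "A1", "A1", "A1", "A1", "A1"],
    ["C1", "C1", "B1"],
]
table = row_percentages(applicants)
for line in sorted(table):
    print(line, {k: round(v, 1) for k, v in sorted(table[line].items())})
```

A high diagonal value, as reported for A1, would indicate that applicants to that line rarely apply elsewhere.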

Insight 3: Model of Private Characteristics

State / KS5 FE Colleges Private

Demographics – inc. spatial reference x

Higher Education (HESA)

Insight 4: Participation Flows (based on HESA data)

Stage 2: Higher Education Profiler

• Integrate insights from my thesis
• UK HE Atlas
• Platform for decision support for a range of stakeholders in HE

Map Generation – Dependent Solution

Issues – Dependent on Google API, Limited to Google Cartography, potential issues with data ownership

Original Site (Google)

Map Generation – Independent Solution

Shuttle Radar Topography Mission (SRTM) – ~100m resolution

OpenStreetMap (via Cloudmade)

Architecture Diagram

[Diagram of the PublicProfiler Schools Atlas. Components shown: great_britain.osm (.xml); osm2pgsql; PostgreSQL DB with PostGIS; NASA’s SRTM DEMs; GDAL Tools; PerryGeo Hillshading; UKBorders English MSOAs and Postcodes; DCSF.gov.uk National Pupil Database; DCSF.gov.uk EduBase and IDACI; HEFCE.ac.uk POLAR; OAC; MySQL DB; ArcGIS; ColorBrewer; Hawths Tools; PVCs (.kml); Shapefiles; Stylesheets (.xml); Mapnik; OSM Tiling Script (.py); Tiles; OSM Tiles; OpenLayers (.js); AJAX Requests; Google Chart API; Chart Cache]

Network Layer

[Architecture diagram build. Components shown: OpenStreetMap (via Cloudmade); great_britain.osm (.xml); osm2pgsql; PostgreSQL DB with PostGIS; Shapefiles; Stylesheets (.xml); Mapnik; OSM Tiling Script (.py); Tiles; PublicProfiler Schools Atlas]

Hillshading Layer

[Architecture diagram build. Components introduced at this step: NASA’s SRTM DEMs; GDAL Tools; PerryGeo Hillshading]

Choropleth Layers

[Architecture diagram build. Components introduced at this step: UKBorders English MSOAs and Postcodes; DCSF.gov.uk National Pupil Database; OAC; ArcGIS; ColorBrewer]

School Statistics

[Architecture diagram build. Components introduced at this step: MySQL DB; DCSF.gov.uk EduBase and IDACI; HEFCE.ac.uk POLAR; OAC]

School Catchment Areas

[Architecture diagram build. Components introduced at this step: PVCs (.kml); Hawths Tools; ArcGIS]

Serving Tiles

[Architecture diagram build. Component introduced at this step: OpenLayers (.js)]
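As an illustration of how pre-rendered tiles are typically addressed by a client such as OpenLayers, the sketch below converts a longitude/latitude pair to tile indices. It assumes the atlas tiles follow the standard OpenStreetMap zoom/x/y scheme in spherical Mercator; the slides do not confirm this, and the function names and the `zoom/x/y.png` path layout are hypothetical.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) tile indices containing a WGS84 point.

    Assumes the standard OSM tiling scheme: 2**zoom tiles per axis,
    x counted eastwards from longitude -180, y counted southwards from
    the top of the spherical Mercator square.
    """
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on the eastern or southern edge stay in range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_path(lon, lat, zoom):
    """Hypothetical relative path of the tile image on the tile server."""
    x, y = lonlat_to_tile(lon, lat, zoom)
    return f"{zoom}/{x}/{y}.png"


# Central London at zoom level 10.
print(tile_path(-0.12, 51.5, 10))
```

Because the tiles are rendered in advance, serving them reduces to a static file lookup of this kind.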

Other Tile Sources

[Architecture diagram build. Component introduced at this step: OSM Tiles]

Showing the Statistics

[Architecture diagram build. Components introduced at this step: AJAX Requests; Google Chart API; Chart Cache]

Showing Catchments

[Architecture diagram build. No new components are listed at this step.]

Production Systems

[Architecture diagram build. No new components are listed at this step.]

Stage 3: Higher Education Modeller

Potential student groupings

• m: geodemographic group
• i: area of residence
• j: university (or university location)
• a: attainment level
• n: school type
• x: subject group
• h: university type

The flow array

• We write the array of interest as

S_ij(m, a, n, x, h)

• on the basis that we are always going to want to model S_ij with some subset of (m, a, n, x, h).

• The challenge arises from the number of cells in this 7-dimensional array.

Numbers in each category

• m: 7 (Output Area Classification)
• i: 30 (NUTS2 areas)
• j: 171 HEIs; may be reduced to 30 locations
• a: 3 attainment levels; or a continuum of UCAS points
• n: ideally 5 school/college types: independent, state selective, state non-selective, Sixth Form College, FE College; reduce to 3?
• x: 8, or 4?
• h: 5 – Oxbridge/UCL/Imperial, major civic, other research, large other, small other; or reduce to 3?

Flow array cells

• If we take the largest suggested numbers, the number of cells in the array would be:

7 × 30 × 171 × 3 × 5 × 8 × 5 = 21,546,000

• which is a ludicrously large number given that we are handling roughly 500,000 students in a year. Most of the cells would have zero entries.

Revised flow array cells

• If we take the lower category numbers, we get

7 × 30 × 30 × 3 × 3 × 4 × 3 = 680,400

• which is still too large. It is useful to look at this as 30 × 30 = 900 geographic (origin-destination) combinations and 7 × 3 × 3 × 4 × 3 = 756 combinations of the other categories (900 × 756 = 680,400).
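The cell counts can be checked with a few lines of Python. This is a sketch that uses the category sizes and the rough figure of 500,000 students per year quoted in the slides; the dictionary and function names are illustrative.

```python
from math import prod

# Category sizes: largest suggested values and the reduced alternatives.
full = {"m": 7, "i": 30, "j": 171, "a": 3, "n": 5, "x": 8, "h": 5}
reduced = {"m": 7, "i": 30, "j": 30, "a": 3, "n": 3, "x": 4, "h": 3}
students = 500_000  # approximate number of students handled in a year


def n_cells(sizes):
    """Number of cells in an array with one dimension per category."""
    return prod(sizes.values())


full_cells = n_cells(full)
reduced_cells = n_cells(reduced)

# Split the reduced array into geographic (i, j) and other combinations.
geographic = reduced["i"] * reduced["j"]
other = reduced_cells // geographic

print(f"full array:    {full_cells:,} cells")
print(f"reduced array: {reduced_cells:,} cells = {geographic} x {other}")
print(f"mean students per cell (full):    {students / full_cells:.3f}")
print(f"mean students per cell (reduced): {students / reduced_cells:.3f}")
```

Even in the reduced array the mean is below one student per cell, which is why further aggregation is needed.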

Visualising the real data

• The next step is to visualise the data to guide us towards further aggregation.

[Figure labels: School type; Output Area Classification; Attainment; University NUTS2 Area. Flow of student(s): height = flow size]

Comparing London Universities

UCL (Flow size > 2 students)

London Metropolitan University (Flow size > 2 students)

Comparing Leeds Universities

University of Leeds (Flow size > 4 students)

Leeds Metropolitan University (Flow size > 4 students)

OAC: Constrained By Circumstances (Flow size > 5 students)

OAC: City Living (Flow size > 5 students)

London to University of Oxford

All flows out of Cornwall (Flow size > 5 students)

Model Equation

• The model would take the form:

S_ij(m, a, n, x, h) = A_i(m, a, n, x, h) B_j(a, x, h) e_i(m, a, n, x, h) O_i(m) D_j(a, n, x, h) exp[-β(m) c_ij(m)]

• This conceptual model would suffer from having too many cells, but we will use the experience of examining the data to find ways of aggregating.

• Initially, the model would be run on a doubly constrained basis for calibration purposes.

• It would then be possible to replace D_j(a, n, x, h) by a set of attractiveness factors, W_j(a, n, x, h). This would provide a ‘What if?’ capability. The model could then be used to test various future policy options.
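To make the doubly constrained calibration step concrete, here is a minimal sketch of the standard iterative balancing procedure for an aggregate spatial interaction model. It is a simplification, not the model proposed in the slides: the category indices (m, a, n, x, h) and the e_i term are dropped, leaving S_ij = A_i B_j O_i D_j exp(-β c_ij). The origin totals, destination totals, costs and β below are hypothetical values.

```python
import math


def doubly_constrained(O, D, cost, beta, iterations=1000, tol=1e-12):
    """Balance A_i and B_j so that flows match both sets of totals.

    O[i]: students leaving area i; D[j]: places taken at university j;
    cost[i][j]: travel cost from i to j. Requires sum(O) == sum(D).
    """
    n_i, n_j = len(O), len(D)
    f = [[math.exp(-beta * cost[i][j]) for j in range(n_j)] for i in range(n_i)]
    A = [1.0] * n_i
    B = [1.0] * n_j
    for _ in range(iterations):
        # A_i = 1 / sum_j B_j D_j f_ij, then B_j = 1 / sum_i A_i O_i f_ij
        A_new = [1.0 / sum(B[j] * D[j] * f[i][j] for j in range(n_j))
                 for i in range(n_i)]
        B_new = [1.0 / sum(A_new[i] * O[i] * f[i][j] for i in range(n_i))
                 for j in range(n_j)]
        change = max(abs(a - b) for a, b in zip(A + B, A_new + B_new))
        A, B = A_new, B_new
        if change < tol:
            break
    return [[A[i] * B[j] * O[i] * D[j] * f[i][j] for j in range(n_j)]
            for i in range(n_i)]


# Hypothetical example: two areas of residence, three universities.
O = [300.0, 200.0]
D = [250.0, 150.0, 100.0]
cost = [[1.0, 2.0, 3.0],
        [2.0, 1.0, 2.0]]
beta = 0.5

flows = doubly_constrained(O, D, cost, beta)
for row in flows:
    print([round(v, 1) for v in row])
```

After balancing, each row of `flows` sums to the corresponding O_i and each column to D_j. Replacing D_j with attractiveness factors W_j would remove the destination constraint and the B_j factors, giving a production-constrained variant suitable for ‘What if?’ tests.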
