Developing Geographical Information Systems In A Cohort Study Andy Boyd ALSPAC, Social Medicine University of Bristol




Geographical Data Matching – the ALSPAC resource

• Overview of our data, the issues involved and our plan for the future

• Time for questions

• Time for discussion on how other studies have developed their GIS data resource

Defining GIS

Geographical Information Systems (GIS) combine mapping and a record of location with database technology. They can be used for the storage, analysis, management or presentation of data.

E. W. Gilbert’s 1955 version of John Snow’s 1855 Soho Cholera Outbreak Map

Scope of this presentation

• Not about GIS tools

• Not about GIS analysis or techniques

• It is about the capture and storage of data in an accessible manner to allow future GIS analysis

• Uses ALSPAC as an example


The ALSPAC GIS dataset

• Geographic identifiers collected directly from the cohort

• Data collected via external data sources

• Geographical data linkage

• Precision of geographic variables – accuracy

• Precision of geographic variables – ethics

• Providing the data as an integral part of the resource

• Current data availability


ALSPAC administered data collection

Residential Address (~50,000 address points)

• updated from cohort (self reported)

• team who tracks lost cases

• email

• second contacts

• database searches (osis, electoral roll)

School the young person attends / wishes to attend

• via questionnaire (ALSPAC questionnaires/assessments administered in schools, primary to secondary transition questionnaire)

• clinic attendance interview

• collected from the school


Linkage to external data sources

Validation / Cleaning

• Validation and cleaning of self reported data using data collected via record linkage (NSTS – NHS Tracing, NPD – National Pupil DB, Royal Mail/OS products)

Missing Data

• Enhancing the resource through record linkage

Data collection via geographical identifiers

• Accessing existing data organised around geographical IDs (census data, neighbourhood data)

• Primary data collection (distance to overhead power lines, air quality, commuting, school selection)


Data Collection through Record Linkage

• Office for National Statistics (ONS) Tracing

– Health Authority

– Embarkation

• NSTS (NHS Strategic Tracing Service)

– Address registered with GP

• National Pupil Database (DCSF, DIUS*, UCAS*)

– School Address

– Pupil Residential Address

• DWP*

• Home Office*

* Linkage currently being investigated


G.I.S – ALSPAC Resource

• ~50,000 ALSPAC residential address points, associated with a date range which can then be linked to ALSPAC data collection

• Schools attendance data from NPD ~17,000

• Schools attendance data from ALSPAC collection ~10,000

The geographic relation between household income and polluting factories – FoE 1999
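
The idea of address points carrying a date range can be sketched as below. This is a minimal illustration only, not the ALSPAC schema: the `AddressSpell` record, the IDs and the dates are all hypothetical. It shows how a data-collection date could be resolved to the address held at that time, and how a gap in the address history surfaces as a missing result.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class AddressSpell:
    """One period of residence (hypothetical structure, for illustration)."""
    participant_id: str
    address_point_id: str
    start: date
    end: Optional[date]  # None means the spell is still open


def address_on(spells, participant_id, when):
    """Return the address point occupied on `when`, or None if there is a gap."""
    for s in spells:
        if s.participant_id != participant_id:
            continue
        if s.start <= when and (s.end is None or when < s.end):
            return s.address_point_id
    return None


# Made-up example history with a gap between the two spells.
spells = [
    AddressSpell("P001", "A17", date(1991, 4, 1), date(1996, 9, 1)),
    AddressSpell("P001", "A52", date(1997, 1, 15), None),
]

print(address_on(spells, "P001", date(1993, 6, 30)))  # questionnaire date inside first spell
print(address_on(spells, "P001", date(1996, 11, 1)))  # date falling in the gap
```

Treating the end date as exclusive avoids a participant appearing at two addresses on the day of a move.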


G.I.S Precision

• Spatial data held at many geographic levels

• Geographies range in scale from 0.1 meters to regional/national data

• Tied together via address, postcode or grid reference as central ID

• Key resources include:

– NSPD (was All Fields Postcode Directory) – geo linking database

– Deprivation & Socio Economic indices (IMD, Townsend, Acorn)

– Census data


G.I.S – How we link cases to data

• Master file of Postcodes (NSPD)

• Postcodes linked to grid reference

• Grid references of various scales

• PCs/GridRef mapped to:

– Electoral geographies

– Census geographies

• Ethics:

– We don’t generally identify residence at PC or equivalent level

Ordnance Survey – The National Grid
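
The linkage described on this slide can be sketched as a lookup from a normalised postcode to the geographies attached to it. Everything below is an assumption made for illustration: the `DIRECTORY` extract, its field names and all of its values are invented and do not reflect the real NSPD layout.

```python
import re

# Hypothetical directory extract: postcode -> grid reference and geographies.
# All postcodes, coordinates and area codes are invented.
DIRECTORY = {
    "ZZ1 1AA": {"easting": 358000, "northing": 173000,
                "ward": "WARD_A", "census_area": "AREA_1"},
    "ZZ1 1AB": {"easting": 358200, "northing": 173100,
                "ward": "WARD_A", "census_area": "AREA_2"},
}


def normalise_postcode(raw):
    """Upper-case, strip whitespace, and put one space before the 3-character inward code."""
    compact = re.sub(r"\s+", "", raw).upper()
    if not 5 <= len(compact) <= 7:
        raise ValueError(f"not a plausible postcode: {raw!r}")
    return f"{compact[:-3]} {compact[-3:]}"


def link_case(raw_postcode, geography):
    """Return the requested geography for a postcode, or None if it is not in the directory."""
    row = DIRECTORY.get(normalise_postcode(raw_postcode))
    return None if row is None else row[geography]


print(link_case("zz11ab", "census_area"))
```

In line with the ethics bullet, a release built this way would carry only the broader geography (for example the ward), not the postcode or grid reference used to derive it.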


G.I.S – How we link geographies

Current Situation

• Use Postcode / postcode centroid grid reference as our highest precision variable

• Link geographies using NSPD/AFPD appropriate to the measure required

Proposed Method

• Use property reference number (UPRN) / property centroid grid reference as highest precision variable


G.I.S Problems

• Shifting geographies across time points

• Royal Mail change postcode areas (and therefore postcode centroids)

• Postcodes are ‘recycled’

• Postcode not precise enough in some cases

• Postcode boundaries are not contiguous with other geographic boundaries
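
One way to detect shifting centroids is to compare two releases of a postcode directory and flag postcodes whose centroid has moved by more than some tolerance. The sketch below assumes each release is reduced to a mapping from postcode to (easting, northing) in metres. The function name, the 50 m tolerance and the sample data are all hypothetical.

```python
from math import hypot


def moved_centroids(old, new, tolerance_m=50.0):
    """Return {postcode: shift in metres} for centroids that moved more than the tolerance.

    Only postcodes present in both releases are compared; postcodes that were
    added or withdrawn between releases need separate handling.
    """
    shifts = {}
    for postcode in old.keys() & new.keys():
        (e0, n0), (e1, n1) = old[postcode], new[postcode]
        shift = hypot(e1 - e0, n1 - n0)
        if shift > tolerance_m:
            shifts[postcode] = shift
    return shifts


# Invented example releases.
old_release = {"ZZ1 1AA": (358000, 173000), "ZZ1 1AB": (358200, 173100)}
new_release = {"ZZ1 1AA": (358000, 173000), "ZZ1 1AB": (358500, 173500),
               "ZZ1 1AC": (359000, 174000)}

print(moved_centroids(old_release, new_release))
```

A large shift for an unchanged postcode string is also a hint that the postcode may have been recycled for a different set of addresses.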


Accuracy issues with analysis at postcode level

Figure panels: Address level | Postcode level
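
The loss of accuracy at postcode level can be illustrated with a small calculation: the distance from each address to a feature of interest is compared with the distance measured from the postcode centroid. This is a sketch under stated assumptions. The coordinates are invented, and the centroid is taken as the simple mean of the address points, which is a simplification of how published postcode centroids are derived.

```python
from math import hypot
from statistics import mean


def centroid(points):
    """Mean easting and northing of a set of address points (simplifying assumption)."""
    return (mean(e for e, _ in points), mean(n for _, n in points))


def distance(a, b):
    """Straight-line distance in metres between two grid references."""
    return hypot(a[0] - b[0], a[1] - b[1])


def centroid_errors(points, feature):
    """Absolute error, per address, of using the centroid distance instead of the true distance."""
    centre_distance = distance(centroid(points), feature)
    return [abs(distance(p, feature) - centre_distance) for p in points]


# Invented grid references (metres) for three addresses sharing one postcode,
# and a feature such as a power line pylon.
addresses = [(1000, 1000), (1000, 1400), (1600, 1200)]
feature = (1000, 1200)

print(centroid(addresses))
print(centroid_errors(addresses, feature))
```

In this made-up case two addresses are described well by the centroid, while the third is several hundred metres further from the feature than the postcode-level figure suggests.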


Linkage problems with the cohort data

• Missing data

– Especially problematic for the cases who didn’t enrol in the original recruitment

– Gaps in the address data

– Move date is often the date we were informed, not the actual move date

• However…

– ONS matched 99.7% of mothers, so we have their old & new NHS numbers and cleaned data (original recruitment cases only)


GIS Data Availability

• Collected as administrative resource

• Not yet cleaned, documented and presented to usual ALSPAC standards

• Initiatives under way to validate and fill gaps in record

• Schools GIS data in the main not processed

• Aim to build into standard ALSPAC resource

GIS Ethics

• Postcode level or greater accuracy treated as a personal identifier

• Research proposals to use these data need ALSPAC Law & Ethics Approval

• Broader geographical data can be released in normal manner

• A two-stage process is used to collect and process precise data

• Data collected via linkage not available for all cases due to ethical decisions


GIS Data Access

Step 1 – Postcodes (or full address) provided to researcher with unique collection ID with no other data attached

Step 2 – Researcher attaches their data and returns file to ALSPAC

Step 3 – ID converted to the appropriate collaborator ID, postcode data removed

Step 4 – Requested ALSPAC data added to the file and data sent to the researcher
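
The four steps above can be sketched in code to show why the researcher never holds postcodes and study data in the same file. This is an illustrative outline only, not ALSPAC's actual procedure or software: the function names, ID formats and sample values are all hypothetical.

```python
import secrets


def step1_release(postcode_by_participant):
    """Step 1: issue postcodes under one-off collection IDs, with no other data.

    Returns the file sent to the researcher and the key retained by the study.
    """
    release, key = [], {}
    for participant_id, postcode in postcode_by_participant.items():
        collection_id = secrets.token_hex(8)
        key[collection_id] = participant_id
        release.append({"collection_id": collection_id, "postcode": postcode})
    return release, key


def step2_attach(release, exposure_by_postcode):
    """Step 2: the researcher attaches their own data and returns the file."""
    return [dict(row, exposure=exposure_by_postcode.get(row["postcode"]))
            for row in release]


def steps3_and_4(returned, key, collaborator_ids, study_data):
    """Steps 3 and 4: swap in collaborator IDs, drop postcodes, add the requested data."""
    final = []
    for row in returned:
        participant_id = key[row["collection_id"]]
        record = {"collaborator_id": collaborator_ids[participant_id],
                  "exposure": row["exposure"]}
        record.update(study_data[participant_id])
        final.append(record)
    return final


# Invented example data.
postcodes = {"P001": "ZZ1 1AA", "P002": "ZZ1 1AB"}
release, key = step1_release(postcodes)
returned = step2_attach(release, {"ZZ1 1AA": 12.5, "ZZ1 1AB": 30.0})
final = steps3_and_4(returned, key,
                     {"P001": "C9001", "P002": "C9002"},
                     {"P001": {"outcome": 1}, "P002": {"outcome": 0}})
print(final)
```

Because the collection ID is random and used once, the file from Step 1 cannot be joined to the file from Step 4 without the key, which stays with the study.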