Data Anonymisation and Linkage

Page 1: Data Anonymisation and Linkage
Page 2: Data Anonymisation and Linkage

Alison Bell

Senior Data Analyst / Programmer

Health Informatics Centre (HIC)

University of Dundee

Data Anonymisation and Linkage

Page 3: Data Anonymisation and Linkage

What is HIC?

The Health Informatics Centre (HIC) is a partnership between the University of Dundee, NHS Tayside and the Information Services Division of NHS National Services (ISD). It is a shared research resource with strong scientific traditions, built on MEMO work since the early 1980s.

HIC provides authorised researchers and others with anonymised extracts of information derived from person-specific data sets captured by the NHS, University of Dundee researchers and others, to help them answer research questions and address important quality and patient safety issues.

Page 4: Data Anonymisation and Linkage

HIC Structures

• Staff and facilities managed by HIC Executive

• User input: HIC User Group

• Governance
  – Confidentiality & Privacy Advisory Committee (HICCPAC)
  – Users Forum
  – Annual External Audit

Page 5: Data Anonymisation and Linkage

Issues that HIC addresses

Governance: linkage then anonymisation carried out in NHS domain

Trust in access to NHS data through approved SOPs, Privacy Advisory Committee, “Clinical Information Bureau”

Deterministic linkage via single patient identifier

Continually improving data quality through clinical use of data & HIC Users’ Group

Ecological fallacy: person, not practice, based data

Page 6: Data Anonymisation and Linkage

Information governance

Physical security:
• Isolation of servers holding identifiable data and staff working with it
• Reliable backup and recovery mechanisms
• Separation of functions on NHSNet, JANET

Governed by Confidentiality & Privacy Advisory Committee
• Members include lawyer, GP, Caldicott Guardians, Director Public Health

Management tools:
• Standard Operating Procedure
• Adverse incident reporting mechanism on intranet
• Project management system enforces SOP
• Annual external audit by information security experts & table of issues reviewed monthly by HIC Exec

Page 7: Data Anonymisation and Linkage

HIC Standard Operating Procedure

Covers:
• Acquisition & anonymisation of datasets
• Requesting access to data
• Project level anonymisation (Pro-CHI)
• Release & archival of datasets
• Reversal of anonymisation

Includes:
• Definitions
• Appendix summarising 8 data protection principles
• Declaration & signature

HIC has Caldicott & Ethics approval to supply anonymised data to approved research projects

Page 8: Data Anonymisation and Linkage

HIC project management system

• Allocates each project a unique ID

• Captures:
  – Identity & contact details of “approved researcher”
  – Project funder
  – Project abstract
  – Copies of approval from Ethics & Caldicott (if required), NHS R&D, protocol
  – Data sources and versions
  – Exact syntax used to generate & link data extracts
  – Audit trail of all data releases
  – Exact location of archived datasets once project complete
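The items captured by the project management system can be pictured as a single record per project. The following is only an illustrative sketch: the class names, field names and types are assumptions made for this example and are not taken from the HIC system itself.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class DataRelease:
    """One entry in the audit trail of data releases (hypothetical shape)."""
    released_on: date
    description: str


@dataclass
class ProjectRecord:
    """Hypothetical record mirroring the items listed on the slide."""
    project_id: int                      # unique ID allocated to the project
    researcher_name: str                 # identity of the approved researcher
    researcher_contact: str              # contact details
    funder: str
    abstract: str
    approvals: list = field(default_factory=list)       # Ethics, Caldicott, NHS R&D, protocol
    data_sources: dict = field(default_factory=dict)    # source name -> version
    extract_syntax: list = field(default_factory=list)  # exact syntax used for extracts
    releases: list = field(default_factory=list)        # audit trail of releases
    archive_location: Optional[str] = None              # set once the project is complete

    def record_release(self, released_on: date, description: str) -> None:
        """Append a release to the audit trail."""
        self.releases.append(DataRelease(released_on, description))
```

Keeping the extract syntax and the release history on the same record as the approvals is what would let an auditor trace any released dataset back to its authorisation.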

Page 9: Data Anonymisation and Linkage
Page 10: Data Anonymisation and Linkage

Available HIC Data

• HIC hosts a large number of Tayside data sets received from various sources (ISD, PSD, GRO, Ninewells Labs etc.)

• These cover various populations, time periods and use a variety of coding systems

• Each of these patient-specific data sets contains the patient CHI number, allowing linkage across multiple data sets

• HIC currently has approval to provide Tayside data only, but is seeking to extend to Fife & Glasgow soon
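Because every data set carries the CHI number, linkage is deterministic: records are matched on exact equality of that one identifier. A minimal sketch of the idea follows; the function name, field names and sample records are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def link_by_chi(**datasets):
    """Group records from several data sets under the CHI number they share.

    Each keyword argument is a data set: a list of dicts with a "chi" key.
    Returns {chi: {dataset_name: [records...]}}.
    """
    linked = defaultdict(lambda: {name: [] for name in datasets})
    for name, records in datasets.items():
        for record in records:
            linked[record["chi"]][name].append(record)
    return dict(linked)


# Made-up records for illustration only.
prescriptions = [
    {"chi": "1212345678", "drug": "simvastatin", "date": "2008-03-01"},
    {"chi": "0101010101", "drug": "metformin", "date": "2008-03-02"},
]
lab_results = [
    {"chi": "1212345678", "test": "cholesterol", "value": 5.1},
]

linked = link_by_chi(prescriptions=prescriptions, lab_results=lab_results)
```

A patient with no record in a given data set simply gets an empty list for it, so the linked view stays person-based rather than data-set-based.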

Page 11: Data Anonymisation and Linkage

How data are linked and anonymised

[Flow diagram across three domains: Data Provider (mainly NHS), Clinical Information Bureau, Academia. Box and arrow labels: Paper prescription - ID; Lab result - ID; Find and enter CHI; Find CHI; Drug data-CHI; Lab data-CHI; CHI labelled data; Link using CHI; Drug data, lab data-CHI; Delete CHI, add Pro-CHI; Drug data, lab data; Fully anonymised but linked data; Analysis.]

Page 12: Data Anonymisation and Linkage

Anonymisation Process

• Every research dataset has its own project-level anonymisation (Pro-CHI) applied to the data before it is released to a researcher.

• Purpose-written software generates the Pro-CHI based on the Project Management (PM) unique ID & the CHI
  – A 3-letter alphabetic code is generated from the PM ID (converted to base 26), e.g. 165 translates to agj
  – The last 7 digits are randomly generated
  – E.g. (CHI) 1212345678 = (Pro-CHI) agj8394601 under project 165

• All research data relating to a specific project will have the same 3-letter code.

• All other patient identifiers are removed (e.g. name, address etc.)

• Other anonymisations are performed – anon DOB, anon GP code

• If any identifiable data is required, specific Caldicott approval must be granted
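The Pro-CHI scheme described on this slide can be sketched in a few lines. This is not the HIC software: the names `project_prefix` and `ProChiGenerator`, the in-memory lookup table, and the retry on collision are all assumptions made for illustration. The slide states only that the prefix is the PM ID in base 26 and that the last 7 digits are random. The lookup table is added here so that the same CHI always receives the same Pro-CHI within a project, which linkage inside a project would require.

```python
import secrets
import string


def project_prefix(project_id: int) -> str:
    """Encode a project ID as three base-26 letters (a=0 ... z=25)."""
    if not 0 <= project_id < 26 ** 3:
        raise ValueError("project ID does not fit in three base-26 letters")
    letters = []
    for _ in range(3):
        project_id, remainder = divmod(project_id, 26)
        letters.append(string.ascii_lowercase[remainder])
    return "".join(reversed(letters))


class ProChiGenerator:
    """Issues one stable, random Pro-CHI per CHI for a single project."""

    def __init__(self, project_id: int):
        self.prefix = project_prefix(project_id)
        self._by_chi = {}      # CHI -> Pro-CHI (assumed lookup table)
        self._used = set()     # 7-digit suffixes already issued

    def pro_chi(self, chi: str) -> str:
        if chi not in self._by_chi:
            # Draw random 7-digit suffixes until an unused one is found.
            suffix = f"{secrets.randbelow(10 ** 7):07d}"
            while suffix in self._used:
                suffix = f"{secrets.randbelow(10 ** 7):07d}"
            self._used.add(suffix)
            self._by_chi[chi] = self.prefix + suffix
        return self._by_chi[chi]


generator = ProChiGenerator(165)
example = generator.pro_chi("1212345678")  # "agj" followed by 7 random digits
```

Because the suffix is random rather than derived from the CHI, the Pro-CHI cannot be reversed without the lookup table, and the same patient receives unrelated identifiers in different projects.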

Page 13: Data Anonymisation and Linkage

A bit more about the prescribing data set …..

• The Tayside prescribing data set is unique in the UK.

• It is a database of all Tayside encashed prescriptions, including CHI, date prescribed and drugs dispensed.

• Prior to 2005, paper prescriptions were scanned by the data entry clerks and all prescription details were entered manually using a purpose-built application.

• Since 2005, PSD have been automatically sending HIC the scanned prescription images and associated data.
  – 300,000 prescriptions per month (total 14.5m in database from 2005)
  – 13 GB .tif images per month (front and back)
  – 17% (50,000) still require data entry (CHI) each month

Page 14: Data Anonymisation and Linkage

Users of HIC data 2004-9

93 projects totalling £16m (£3.2m pa), including:
– Diabetes research
– Maternal & Child Health
– Dental Health Services Research
– Cardiovascular
– Genetics
– Health Informatics
– Drug Safety
– Scottish Longitudinal Studies Centre

Page 15: Data Anonymisation and Linkage

Examples of recent studies using prescription data

• Influence of apo-e & other genotypes on response to statins (Louise Donnelly, GSK studentship)

• Adherence: to insulin (Morris et al, Lancet); to sulphonylureas (Donnan et al Diab Med, Evans et al Diab Med)

• Drug safety studies: corticosteroids and risk of fracture (Donnan et al); statins (Li Wei); methadone (Fahey); methotrexate (Guthrie)

• Markers for co-morbidity, e.g. emergency admissions study (Donnan)

Page 16: Data Anonymisation and Linkage

Future plans

• Enhanced HIC service including
  – Programming, statistical, Clinical Trials Unit support, data management

• Scaling up to a Scotland-wide Health Programme (SHIP)

• Rolling out novel research data mechanism to further improve information governance: MILA

• Pilot study – obtaining identifiable retinal images from Ninewells eye clinic (300 images @ 5 MB each) & anonymise for research

Page 17: Data Anonymisation and Linkage

Conventional Record-Linkage

[Diagram: Data sources feed a Trusted repository (PAC oversight and SOPs), which generates identifier substitutions and delivers to the Recipient. Questions raised on the slide: Confidentiality? Governance? Scalability?]

Page 18: Data Anonymisation and Linkage

MILA: Multi-Institutional Linkage & Anonymisation

[Diagram labels: Data sources A and B; Linker (holds identifiers) with a table Person (IDA, IDB, …), Person 1 (17, 89, …), Person 2 (…); local identifiers (17) and (89); substitutions (17 -> 2) and (89 -> 2); Recipient. Slide annotations: Confidentiality, Governance, Scalability.]
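One reading of the MILA slide is that the linker knows only which local identifiers belong to the same person (17 at source A, 89 at source B), issues each source a substitution for its own identifiers (17 -> 2, 89 -> 2), and the sources then release their data under the substituted identifier so the recipient can link it. The sketch below follows that reading. It is an interpretation of the diagram, not the MILA implementation, and every class and function name in it is hypothetical.

```python
import secrets


class Linker:
    """Holds identifiers only, never clinical data."""

    def __init__(self, persons):
        # persons: list of {source_name: local_id}, one dict per person
        self.persons = persons

    def substitutions(self):
        """Return {source_name: {local_id: pseudonym}} for one project."""
        maps = {}
        issued = set()
        for person in self.persons:
            pseudonym = secrets.token_hex(4)
            while pseudonym in issued:
                pseudonym = secrets.token_hex(4)
            issued.add(pseudonym)
            for source, local_id in person.items():
                maps.setdefault(source, {})[local_id] = pseudonym
        return maps


class DataSource:
    """Holds data keyed by its own local identifier."""

    def __init__(self, records):
        self.records = records  # {local_id: payload dict}

    def release(self, substitution):
        """Relabel records with pseudonyms; local IDs never leave the source."""
        return {
            substitution[local_id]: payload
            for local_id, payload in self.records.items()
            if local_id in substitution
        }


def link_at_recipient(*releases):
    """Merge releases that share a pseudonym."""
    linked = {}
    for release in releases:
        for pseudonym, payload in release.items():
            linked.setdefault(pseudonym, {}).update(payload)
    return linked


# Made-up example mirroring the slide: one person, ID 17 at A and 89 at B.
maps = Linker([{"A": 17, "B": 89}]).substitutions()
source_a = DataSource({17: {"drug": "statin"}})
source_b = DataSource({89: {"lab": "cholesterol"}})
linked = link_at_recipient(source_a.release(maps["A"]), source_b.release(maps["B"]))
```

In this arrangement no single party holds both the identifiers and the data, and the data owners perform the release themselves.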

Page 19: Data Anonymisation and Linkage

Some research data mechanisms

Mechanism | Pros | Cons
Project specific, ad hoc data collections | Simple, personal, researcher in control | Ad hoc – no governance, re-use
Data warehouse | Copies of all data in one place | Threat to trust, privacy
GRID computing & eScience techniques | No copies of data | Is it trustworthy? Is it scalable?
Multi Institution Linkage & Anonymisation (MILA) | Transparent, data owners retain control | In development – pilot complete

Page 20: Data Anonymisation and Linkage

How MILA matches the requirements

Stakeholder: Patients, the public
• Trust that mechanism respects consent & privacy?
• Data used once, for intended purpose only
• Promotes research and knowledge creation

Stakeholder: Data owners, e.g. NHS
• Trust that mechanism always secure, follows law?
• No work to provide or update dataset (a benefit?)
• Due credit given

Stakeholder: Researchers
• Trust in data provenance, quality, completeness?
• Wide range of datasets (data owners trust mechanism)?
• Dataset descriptions, scoping searches
• Data anonymised but linkable
• Simple, rapid, cheap data extracts
• Long term data curation

Page 21: Data Anonymisation and Linkage

Sir Alan Langlands, September 2005