UK e-Science 2008 All Hands Meeting. Edinburgh.

Data Sharing e-Infrastructure

David Rodriguez (1), Trevor Carpenter (2), Jano van Hemert (1) & Joanna Wardlaw (2).

On behalf of the SINAPSE Collaboration.

1. National e-Science Centre. School of Informatics, University of Edinburgh.

2. SFC Brain Imaging Research Centre. Department of Clinical Neurosciences, University of Edinburgh.

Contents

- The SINAPSE project
- Data Protection & pseudonymisation
- Data sharing
- Components
- Status

The SINAPSE Project

- SINAPSE stands for Scottish Imaging Network: a Platform for Scientific Excellence.
- Pooling initiative of six Scottish universities: Aberdeen, Dundee, Edinburgh, Glasgow, St. Andrews and Stirling.
- Main objectives:
  - develop imaging expertise;
  - support multi-centre clinical research in conjunction with the Clinical Research Networks;
  - improve the ability of neuroscientists to collaborate on clinical trials;
  - have a direct impact on patient health.

Data Sharing e-Infrastructure

For enabling multi-centre clinical research through data sharing.

The main objectives of the SINAPSE e-infrastructure project are:

- Anonymisation: automatic compliance with data protection policies;
- Security: advanced authentication and authorisation within projects;
- Usability: providing a user-friendly environment to access data and applications;
- Modularity: conforming to relevant standards and using existing components;
- Centralisation: leveraging existing compute clusters and storage.

Benefits

- Easier Data Protection compliance for users
- Enables secure data sharing
- Coherent view of available data (single point of access)
- Roadmap for end-of-project data publication & data curation

Key Features

- Single sign-on: identify once per session for all the services.
  - Authentication delegated to home universities
- Permission management using groups and roles
- Data Catalogue:
  - Files Catalogue
  - Metadata Catalogue: stores relevant information to allow users to find the desired data
- Modularity
  - Reuses existing components
  - Allows future updates/changes

Access Levels

- Different access levels for different users and use cases.
- The levels start at file-only access to the encrypted files, for site operators.
- Some researchers just need access to decrypted images and the associated basic image metadata; others will need access to more clinical information and metadata.
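As an illustration of the access levels above, the following minimal sketch filters what a user can see by level. The level names, the field names, and the assumption that levels are strictly ordered (each level sees everything the lower ones do) are hypothetical and not part of the SINAPSE design.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    # Hypothetical level names; the ordering is an assumption.
    SITE_OPERATOR = 1        # file access to encrypted files only
    IMAGE_RESEARCHER = 2     # decrypted images + basic image metadata
    CLINICAL_RESEARCHER = 3  # additionally, clinical information


# Minimum level needed to see each (hypothetical) kind of information.
REQUIRED_LEVEL = {
    "encrypted_file": AccessLevel.SITE_OPERATOR,
    "image": AccessLevel.IMAGE_RESEARCHER,
    "image_metadata": AccessLevel.IMAGE_RESEARCHER,
    "clinical_info": AccessLevel.CLINICAL_RESEARCHER,
}


def visible_fields(level: AccessLevel) -> list:
    """Return the sorted names of the fields a user at `level` may access."""
    return sorted(name for name, needed in REQUIRED_LEVEL.items() if level >= needed)
```

For example, `visible_fields(AccessLevel.SITE_OPERATOR)` yields only `encrypted_file`, while a clinical researcher sees all four fields.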

Data Protection

- Data Protection Act (1998). Other legislation also applies.
- Personal data must be processed in a fair and lawful manner.
  - Projects run in SINAPSE must have a proper consent form for the processing to be done.
  - All must have ethical approval.
- A pseudonymous identifier substitutes the CHI (Community Health Index). The two are linked using a database.
- Anonymisation of other fields:
  - Full destruction of the information for some data, such as name or address.
  - Depending on the project, some fields might be transformed into less informative representations:
    - Postal code -> Deprivation Index or partial postal code
    - Date of birth -> Age (with different precisions)
- Any later access to personal data will be granted by the corresponding Data Controller.
- All personal data processing will be logged for auditing.
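The field transformations listed above can be sketched as follows. This is a minimal illustration, not the SINAPSE tool: the record keys (`name`, `address`, `date_of_birth`, `postcode`), the function names, and the choice of the outward code as the "partial postal code" are all assumptions. The Deprivation Index lookup is omitted because it needs an external table.

```python
from datetime import date


def age_from_dob(dob: date, on: date, precision_years: int = 1) -> int:
    """Age in whole years on `on`, rounded down to a multiple of `precision_years`."""
    # Subtract one if the birthday has not yet occurred in the year of `on`.
    years = on.year - dob.year - ((on.month, on.day) < (dob.month, dob.day))
    return years - years % precision_years


def partial_postcode(postcode: str) -> str:
    """Keep only the outward part of a UK postcode.

    Assumes the postcode is written with a space between its two parts.
    """
    return postcode.strip().upper().split()[0]


def anonymise(record: dict, on: date, precision_years: int = 5) -> dict:
    """Return a copy with name/address destroyed and other fields coarsened."""
    dropped = {"name", "address", "date_of_birth", "postcode"}
    out = {key: value for key, value in record.items() if key not in dropped}
    out["age"] = age_from_dob(record["date_of_birth"], on, precision_years)
    out["postcode_area"] = partial_postcode(record["postcode"])
    return out
```

The `precision_years` parameter corresponds to the "different precisions" mentioned for age: a value of 5 reports ages in five-year bands, a value of 1 reports exact whole years.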

Data Pseudonymisation

[Diagram. Components shown: National PACS, CHI Transformation Service, Pseudonymisation Application, Link Table, Local Storage, Anonymous research data. Domains shown: NHS, Research Centre.]
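A link table of the kind named in the diagram can be sketched as a two-way mapping between CHI numbers and random pseudonyms. The class and method names are hypothetical, and an in-memory dictionary stands in for the database; the sketch assumes the table is held by whoever is allowed to re-identify patients.

```python
import secrets


class LinkTable:
    """Hypothetical in-memory link table: CHI number <-> random pseudonym."""

    def __init__(self):
        self._by_chi = {}
        self._by_pseudonym = {}

    def pseudonym_for(self, chi: str) -> str:
        """Return the pseudonym for `chi`, creating one on first use."""
        if chi not in self._by_chi:
            pseudonym = secrets.token_hex(8)  # 16 hex characters, unrelated to the CHI
            while pseudonym in self._by_pseudonym:  # retry on the unlikely collision
                pseudonym = secrets.token_hex(8)
            self._by_chi[chi] = pseudonym
            self._by_pseudonym[pseudonym] = chi
        return self._by_chi[chi]

    def reidentify(self, pseudonym: str) -> str:
        """Look up the CHI behind a pseudonym (raises KeyError if unknown)."""
        return self._by_pseudonym[pseudonym]
```

Because the pseudonym is random, it carries no information about the patient; re-identification is only possible through the table itself.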

Pseudonymisation Tool

- Implemented in Java.
- To be deployed as near as possible to the data acquisition.
- Can be configured for each site.
- Configurable using XML documents.
  - Different projects can apply different policies.
  - The policy specifies the classes that will execute the transformation of the data.
  - Graphical tool for editing the policies.
- These classes will be distributed in signed JARs, and their authenticity will be checked using their hash.
- For data provenance checks and auditing purposes, the version of each class will be tracked.
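The policy-driven design above can be illustrated with a short sketch. The actual tool is written in Java and its policy schema is not given here, so everything below is an assumption: the XML element and attribute names, the transformation names, the rule that fields missing from the policy are destroyed, and the use of SHA-256 for the hash check. Python functions stand in for the Java transformation classes.

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical policy document: one <field> element per attribute.
POLICY_XML = """
<policy project="demo-study" version="1">
  <field name="PatientName" transform="destroy"/>
  <field name="PatientBirthDate" transform="year_only"/>
  <field name="Modality" transform="keep"/>
</policy>
"""

# Stand-ins for the transformation classes named by a policy.
TRANSFORMS = {
    "destroy": lambda value: None,
    "year_only": lambda value: value[:4],  # DICOM dates are written YYYYMMDD
    "keep": lambda value: value,
}


def load_policy(xml_text: str) -> dict:
    """Parse the policy into a {field name: transformation name} mapping."""
    root = ET.fromstring(xml_text)
    return {field.get("name"): field.get("transform") for field in root.findall("field")}


def apply_policy(policy: dict, record: dict) -> dict:
    """Transform each field; fields the policy does not mention are destroyed."""
    out = {}
    for name, value in record.items():
        result = TRANSFORMS[policy.get(name, "destroy")](value)
        if result is not None:
            out[name] = result
    return out


def digest_matches(artefact: bytes, expected_sha256_hex: str) -> bool:
    """Check a code artefact against the hash recorded for it."""
    return hashlib.sha256(artefact).hexdigest() == expected_sha256_hex
```

Destroying unlisted fields by default is the conservative choice for a sketch like this: a field added to the data later cannot leak simply because nobody updated the policy.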

CHI Transformation Service

- CHI (Community Health Index) is the national unique identifier for NHS (Scotland) patients.
  - Used in any health-related communication
  - As it identifies the patient, it is sensitive information
- It is composed of 10 digits that include:
  - Date of birth
  - Gender
  - Control digit
- Possibilities:
  - Reversible / irreversible transformation
  - Unique for all SINAPSE / unique for each Data Controller
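One way to picture the irreversible option is a keyed hash of the CHI number. The sketch below is only an illustration of that option, not the service's actual method: HMAC-SHA256, the 16-character truncation, and the function name are assumptions. Using a single key gives a pseudonym that is unique across all of SINAPSE; giving each Data Controller its own key gives pseudonyms that are unique per Data Controller.

```python
import hashlib
import hmac


def pseudonymise_chi(chi: str, key: bytes) -> str:
    """Derive a pseudonym from a 10-digit CHI number using a secret key."""
    if len(chi) != 10 or not (chi.isascii() and chi.isdigit()):
        raise ValueError("a CHI number is expected to be 10 digits")
    digest = hmac.new(key, chi.encode("ascii"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for readability; an assumption of this sketch
```

The same CHI and key always give the same pseudonym, so records can be joined without a lookup. Since there are only 10^10 possible CHI numbers, anyone holding the key could enumerate them, so the transformation is irreversible only while the key stays secret. A reversible variant would instead need a link table or encryption.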

Data Sharing

- Centralised model adopted: cheaper, easier, and reduces the IT burden on research staff.
  - This is despite there being several grid projects that provide DICOM functionality.
- The research data will be encrypted before it is stored.
- Data organised per project.
- Access control using groups & roles.
- Authentication using Shibboleth, due to usability concerns regarding X.509 certificates.

Uploading Data

[Diagram. Labels shown: Data Files, Local Storage, Portal, University Authentication Service, VOMS, Data Upload Service, Metadata extraction, Data Encryption, Data Storage, Metadata Catalogue, Data Catalogue, SINAPSE, Storage.]

Centralised Architecture

- Simpler deployment
- Easier middleware release control
- Less impact on participant centres
- Easier to manage and use
- No default resilience
  - A second centre would be needed
  - But this is only necessary for critical services
  - With good support, a reasonable service can be provided using a single centre

Deployment Plan

- ECDF (http://www.is.ed.ac.uk/ecdf/): a facility unique in Scotland.
  - Disk space and CPU time will be rented as needed.
  - 1456 CPU cores, 275 TB of disk.
- Also, a SINAPSE-owned server to be hosted by ECDF:
  - ECDF will provide basic hardware and software support.
  - SINAPSE services to be hosted on it:
    - Portal
    - Data Catalogue
    - Research Data encryption service
    - OGSA-DAI
    - Projects' customised databases
    - RAPID…

Components

Portal

- A GridSphere-based portal will give access to the resources.
- Basic functionality to be provided by SINAPSE:
  - Data uploading
  - Catalogue querying
  - …
- The projects will customise the portal for their needs by providing their own portlets.

Authentication

- Shibboleth federated authentication
  - Single sign-on.
  - Delegated to home universities.
  - Users will continue using a method they are already familiar with.
- X.509 certificates are usual in Grids
  - But can be a handicap for some users.

Authorization

- Dynamic Virtual Organisations (VOs)
  - Members should be easy to add and remove
  - Creation of new VOs for new projects/studies
  - VO role management
- Role-based access
  - Allows different levels of access to information for different users
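The combination of dynamic VO membership and role-based access can be sketched as below. This does not use or imitate the VOMS API; the class, the role names, and the permission table are hypothetical and exist only to show how membership, roles, and access decisions relate.

```python
class VirtualOrganisation:
    """Hypothetical in-memory VO: one per project or study."""

    def __init__(self, name: str):
        self.name = name
        self._roles = {}  # user -> set of role names

    def add_member(self, user: str, *roles: str) -> None:
        """Add a member, or grant further roles to an existing one."""
        self._roles.setdefault(user, set()).update(roles)

    def remove_member(self, user: str) -> None:
        """Remove a member and all their roles (no error if absent)."""
        self._roles.pop(user, None)

    def roles_of(self, user: str) -> set:
        return set(self._roles.get(user, set()))


# Hypothetical mapping from role to permitted actions.
PERMISSIONS = {
    "researcher": {"read_images"},
    "clinician": {"read_images", "read_clinical"},
}


def is_allowed(vo: VirtualOrganisation, user: str, action: str) -> bool:
    """True if any of the user's roles in this VO permits the action."""
    return any(action in PERMISSIONS.get(role, set()) for role in vo.roles_of(user))
```

Creating a VO for a new study is then just `VirtualOrganisation("new-study")`, and removing a member revokes all of their access within that VO in one step.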

Communications

Encrypted communications for all the services:

- GridFTP
- SSH
- HTTPS for web services

Images Encryption

- The encryption keys protect research data, not personal data
  - So they are not so sensitive.
- Keys accessible from all the SINAPSE sites
- Access to the keys based on groups and roles
  - Project/study dependent

Catalogues

- Data Catalogue for keeping track of the files in the system.
- Metadata Catalogue storing key attributes extracted from the DICOM headers.
- Clinical information databases and additional metadata databases can be deployed by the different projects.
- OGSA-DAI will be used to provide access to these resources.
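The split between the two catalogues can be pictured with a small relational sketch. The table layout, column names, and function names are assumptions, and an in-memory SQLite database stands in for whatever databases would actually sit behind OGSA-DAI. `Modality` is used as an example of a DICOM header attribute.

```python
import sqlite3


def create_catalogues(conn: sqlite3.Connection) -> None:
    """Create a file catalogue and an attribute/value metadata catalogue."""
    conn.executescript(
        """
        CREATE TABLE data_catalogue (
            file_id  INTEGER PRIMARY KEY,
            project  TEXT NOT NULL,
            location TEXT NOT NULL
        );
        CREATE TABLE metadata_catalogue (
            file_id   INTEGER NOT NULL REFERENCES data_catalogue(file_id),
            attribute TEXT NOT NULL,
            value     TEXT NOT NULL
        );
        """
    )


def register_file(conn, project: str, location: str, attributes: dict) -> int:
    """Record a stored file and the attributes extracted from its header."""
    cur = conn.execute(
        "INSERT INTO data_catalogue (project, location) VALUES (?, ?)",
        (project, location),
    )
    file_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO metadata_catalogue (file_id, attribute, value) VALUES (?, ?, ?)",
        [(file_id, name, value) for name, value in attributes.items()],
    )
    return file_id


def find_files(conn, project: str, attribute: str, value: str) -> list:
    """Return the locations of a project's files with a matching attribute."""
    rows = conn.execute(
        """
        SELECT d.location
        FROM data_catalogue AS d
        JOIN metadata_catalogue AS m ON m.file_id = d.file_id
        WHERE d.project = ? AND m.attribute = ? AND m.value = ?
        ORDER BY d.file_id
        """,
        (project, attribute, value),
    )
    return [row[0] for row in rows]
```

Keeping the metadata as attribute/value rows means a project can record additional header attributes without changing the schema, at the cost of less type checking.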

Status

Proposal endorsed by the SINAPSE IT & Image Analysis committee last July.

Grant application for machines & storage resources to be sent soon.

Pseudonymisation tool being tested.

Questions