Bioinformatics Services in Australia – a collaboration with the European Bioinformatics Institute...


DESCRIPTION

Bioinformatics is crucial to all life science research. The European Bioinformatics Institute (EBI) is one of a few major centres in the world that provide data and services for bioinformatics and, with Australia’s membership of EMBL, is a natural collaborator for Australia. In 2010 a project was launched to mirror EBI services from the University of Queensland (UQ). The goal was to improve Australian bioinformatics by removing barriers of geographical remoteness. We have revisited the Mirror’s mission in light of experience and with input from a survey of Australian bioinformatics needs, and are creating the Bioinformatics Resource Australia – EMBL (BRAEMBL) with a mission to:

• enable optimal exploitation of the tools and data of bioinformatics by Australian scientists

• contribute to the global biomolecular information infrastructure in a way which showcases Australian science

• engage in Australia-wide training in support of these goals

Key findings of the survey and the rationale for the BRAEMBL project will be presented. BRAEMBL will work with the EBI to create a part of the EBI in Australia and to ensure that Australian scientists have access to the data and methods of bioinformatics and the necessary IT resources, through integrated high-quality services to rival those available anywhere in the world. This will draw on the support of Australian partners including BioPlatforms Australia (BPA) and the existing eResearch infrastructure. BRAEMBL will work with UQ’s Research Computing Centre to be an early adopter of modern IT methodologies, in particular cloud computing. The evolving plan for the BRAEMBL and its contribution to Australian bioinformatics will be presented.


EMBLAustralia

Bioinformatics Resource Australia EMBL

Bioinformatics Services in Australia – a collaboration with the European Bioinformatics Institute

BRAEMBL

Bioinformatics Resource Australia – EMBL

Bioinformatics

Focus – central dogma

Molecular information and its phenotypic correlates

Genomes–Genes–Transcripts–Proteins–Structures–Interactions–Pathways–Systems


Bioinformatics Infrastructure

• Shared data and tools of bioinformatics

• Global databases and systems to explore and exploit them

• E.g.

– GenBank, PDB, UniProt, Ensembl etc.


You can’t do biology without exploiting this information infrastructure


Global Information Ecosystem

• data collection

• data curation

• service

• EBI (European Bioinformatics Institute)

• NCBI

• SIB (Swiss Institute of Bioinformatics)

• etc.

The EBI:

European Bioinformatics Institute

• Part of EMBL – The European Molecular Biology Laboratory

• About 500 staff and $80 million p.a.

• Australia is an Associate Member of EMBL

• Special relationship with the EBI


EMBL, EBI, Mirror

• Australian science needs bioinformatics

• Perceived disadvantage in Australia

– Geography

– Size

– Infrastructure

• Exploit the EBI?


EBI Mirror Project

• Copy EBI data and software

• Offer services directly to Australia

• Funding from various government schemes


Beyond Mirror

• Did mirror some EBI services

• Across-the-board mirroring impossible

• Alternatives?

• Re-examine the mission

This talk is about what I am trying to achieve


Back to basics - Mission

• Optimal exploitation of shared tools and data of bioinformatics

• Showcase Australian science in global databases

• Training in support of these goals


Surveying the community

• February 2013

• Solicited input from 500 – 1000 individuals

• 210 responses

• 50% Wet

• 50% Dry
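
The survey numbers above bound the response rate, since only a range is given for how many people were approached. A minimal sketch of that arithmetic, taking the slide's figures at face value (the function name is illustrative, not from the talk):

```python
def response_rate_bounds(responses, solicited_min, solicited_max):
    """Return (lowest, highest) possible response rate.

    The lowest rate assumes the largest number of people was approached,
    the highest rate assumes the smallest number.
    """
    lowest = responses / solicited_max
    highest = responses / solicited_min
    return lowest, highest


# Figures from the slide: 210 responses, 500 to 1000 individuals solicited.
low, high = response_rate_bounds(210, 500, 1000)
print(f"Response rate between {low:.0%} and {high:.0%}")
```

So the response rate lies somewhere between roughly one in five and two in five of those approached.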


Demography

Figure 1. Geographical source of responses

New South Wales, 47

Victoria, 63

Queensland, 54

Western Australia, 12

South Australia, 18

Tasmania, 3

ACT, 7

Northern Territory, 0

New Zealand, 1

Figure 2. Sector of respondents

Academic institute/university, 165

CSIRO, 16

Gov. State, 5

Gov. Commonwealth, 3

Large commercial, 5

SME, 4

Health, 1


Bioinformatics ubiquitous

Figure 4. Use of Bioinformatics

Bars for Dry and Wet respondents in each category: Full-time bioinformatician; Use bioinformatics as a core tool; Use bioinformatics tools occasionally; Rarely/never use bioinformatics tools, but would like to


Normal methods of bioinformatics

Figure 5. Scientific domains

Series: Main, Plus. Domains: Biochemistry; Bioinformatics…; Cell biology; Developmental biology; Ecology; Evolutionary biology; Genetics; Genomics; Livestock biology; Marine biology; Metabolomics; Microbiology; Molecular biology; Neurobiology; Pathology; Plant biology…; Pharmacology; Physiology; Proteomics; Systems biology; Taxonomy; Transcriptomics

Figure 6. Usefulness, percentage of respondents

Ratings: Very useful, Somewhat useful, Not useful. Data types: Images; 3D Structures; Small molecules; Molecular interactions; Pathways; Proteomics; Protein Motifs; Protein Sequences; Gene expression; Genomes; Genes; Nucleic Acids

Disadvantaged access?

Panels: Data; IT resources; Expertise

Ratings: A lot, Somewhat, Not at all. Regions in each panel: Australian …; New South Wales; New Zealand; Queensland; South Australia; Tasmania; Victoria; Western Australia

Panels: High performance compute; Databases; Software; Bioinformatics support staff

Categories in each panel: In group; In organisation; From collaborators; External; None, want some; Don't need

Legend: Inadequate; Adequate or

Training

Figure 9. Usefulness of training

Ratings: Very useful, Somewhat useful, Not at all useful. Topics: Database Searching; Sequence Alignment; Sequence clustering/phylogeny; NGS analysis; Statistical analysis; Network and pathway analysis; Structure analysis

• Three quarters of respondents indicated “very useful” for at least one topic

• Only four indicated no interest in any training

• Demand

– Programming

– Statistics

Figure 10. Areas of greatest difficulty

Areas: Compute Infrastructure; Data Quality; Data Quantity; Network; Data Complexity; Data Access; Compute; Software; Storage; Community; Funding; My Speciality; Training; Expertise

Figure 11. Areas where BRAEMBL could make greatest contribution

Areas: Funding; Create or improve software; Access to data; Compute power; My Speciality; Community building; Be a hub for bioinformatics; Access to Expertise; Offer training


Survey Conclusions

• Bioinformatics is important

• “central dogma”

• Wet and dry

• Geographic disadvantage not crippling

• Scientists like it in their own group

• Lack of (access to) expertise

• Training and community building

• Programming and statistics


Back to basics - Mission

• Optimal exploitation of shared tools and data of bioinformatics

• Showcase Australian science in global databases

• Training in support of these goals


Spectrum of service – tides of change

• Search-and-browse

– Historic/traditional: in the distant past, done locally

– Today: all done on remote information centres

• Molecular searching

– Historic/traditional: commonly local 15 years ago

– Today: through web forms submitted to data centres

• Programmatic access

– Historic/traditional: all local up to about 6 years ago

– Today: extensive use of programmatic access to remote machines (REST)

• Methods development

– Historic/traditional: still almost all done locally

– Today: emerging possibility of virtual machines at remote data centres
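
As an illustration of the "programmatic access to remote machines (REST)" style of usage, here is a minimal sketch in Python. It assumes the public Ensembl REST service at rest.ensembl.org and its lookup-by-symbol endpoint; the talk names Ensembl but does not show this call, and the helper names `lookup_url` and `lookup_gene` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: base URL of the public Ensembl REST service (not given in the talk).
ENSEMBL_REST = "https://rest.ensembl.org"


def lookup_url(species, symbol):
    """Build the URL for a gene lookup by species and gene symbol."""
    path = "/lookup/symbol/{}/{}".format(
        urllib.parse.quote(species), urllib.parse.quote(symbol)
    )
    # Ask the service for a JSON response.
    return ENSEMBL_REST + path + "?content-type=application/json"


def lookup_gene(species, symbol, timeout=30):
    """Fetch the gene record from the remote service and decode the JSON body."""
    with urllib.request.urlopen(lookup_url(species, symbol), timeout=timeout) as resp:
        return json.load(resp)


# Example (needs network access, so it is left commented out):
# gene = lookup_gene("homo_sapiens", "BRCA2")
# print(gene.get("id"), gene.get("seq_region_name"))
```

The point of the sketch is that nothing is installed or stored locally: the data and the search both stay at the remote centre, and the user's script only sends a request and reads the answer.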



Ensure access to:

• Data

• Software methods

• Hardware

• Expertise

Research needs bioinformatics

Bioinformatics expertise

Software methods

Shared databases

Computers and stuff

Bioinformatics

Outsourcing

We need more

Too much

Made possible by SOAs, virtualisation and cloud computing

Increased Outsourcing

Users find outsourcing hard

BRAEMBL’s job is to make it easy


The IT forecast – generally cloudy

• Move the method to the data not the data to the method

• Why own computers?

• Why own storage?

• Buy naked compute from a vendor (e.g., Amazon)

• Make data visible to the cloud


Back to basics - Mission

• Optimal exploitation of shared tools and data of bioinformatics

• Showcase Australian science in global databases

• Training in support of these goals


Projects of iconic Australian interest

• Barrier reef species

• Koala

• (Sheep)


Sea-quence project

• Sea-quence project unites Great Barrier Reef and Red Sea scientists

• Supported by Rio Tinto, Bioplatforms Australia (BPA) and ReFuGe 2020

• Convened by the Great Barrier Reef Foundation

• Sequence

– 10 corals

– algal symbionts

– bacteria and viruses


Collaboration with the EBI

• Get the data into the best possible database

• At the best possible quality

• Quickly

• Identifiably Australian

Ensembl - Ensembl Genomes - ENA

Do the things the EBI won’t prioritise

Mini-team in place at BRAEMBL


Back to basics - Mission

• Optimal exploitation of shared tools and data of bioinformatics

• Showcase Australian science in global databases

• Training in support of these goals


Expertise building

• Short courses on bioinformatics services in collaboration with the EBI, BPA, CSIRO and others

• Australian Bioinformatics Network to build Community

• User support


Currently on a crusade to persuade Australia to turn BRAEMBL into a sustainable infrastructure

• As part of EMBL-Australia

• With an annual budget of $3 to $5 million

• With substantial security (~5 years) for about 40% of that budget

• With a truly infrastructural mindset

• $5 million is at best 2% of the global budget for such centres
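
The budget figures above imply some simple arithmetic. A minimal sketch, taking the slide's numbers at face value (amounts in millions of dollars; the function names are illustrative, not from the talk):

```python
def implied_global_budget(local_budget, max_share):
    """If local_budget is at most max_share of the global total,
    the global total is at least local_budget / max_share."""
    return local_budget / max_share


def secured_range(budget_low, budget_high, secured_fraction):
    """Portion of the annual budget with multi-year security, as a (low, high) range."""
    return budget_low * secured_fraction, budget_high * secured_fraction


# $5 million being at best 2% of the global budget for such centres.
global_floor = implied_global_budget(5.0, 0.02)

# About 40% of a $3 to $5 million annual budget with ~5 years of security.
secured_low, secured_high = secured_range(3.0, 5.0, 0.40)

print(f"Implied global budget: at least ${global_floor:.0f} million")
print(f"Secured portion: ${secured_low:.1f} to ${secured_high:.1f} million per year")
```

Read this way, the request is for a small fraction of what is spent on such centres worldwide, with less than half of it needing long-term security.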


Infrastructure mindset

• Academic institutions value

– Publications

– High quality graduates

– Grants won

• This is different

– The mission is to serve researchers throughout Australia

– It only makes sense as a long-term project

– It must support careers of engineers

– It needs sustained and talented leadership


Service Mission

• BRAEMBL will flourish best in a research context

• It will produce some publications

• Its staff won’t seem so different from researchers

• Don’t take comfort in those similarities as the reason to do this

• The project will only do well if its unique service mission is embraced enthusiastically

A part of the EBI in Australia
