Data Curation and Management activities within the UCT Computational Biology Group

Dr Nicky Mulder




Outline

• Activities at UCT:
  - High-throughput biology data
  - Sequence annotation
  - DAS annotation development
• Issues we face
• A note on standards and ontologies


High-throughput biology data

• Close ties with CPGR
• Microarray data storage: BASE
• Proteomics data:
  - Annotation: pipeline required
  - Storage: LIMS required


BASE

• BioArray Software Environment
• Open source database for storage of array-type data
• Manages raw data (images) and annotations
• Has limited LIMS options
• Can include specifications for MIAME compliance
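
MIAME compliance can be pictured as a completeness check on the metadata stored with each experiment. The sketch below is not BASE code: the field names are hypothetical labels for the six MIAME elements (raw data, processed data, sample annotation, experimental design, array annotation, protocols), and the record is made up.

```python
# Hypothetical field names standing in for the six MIAME elements.
MIAME_ELEMENTS = (
    "raw_data",
    "processed_data",
    "sample_annotation",
    "experimental_design",
    "array_annotation",
    "protocols",
)


def missing_miame_elements(experiment):
    """Return the MIAME elements that are absent or empty in a record."""
    return [name for name in MIAME_ELEMENTS if not experiment.get(name)]


# Illustrative record, not real data.
record = {
    "raw_data": ["scan_01.tif"],
    "processed_data": ["normalised.txt"],
    "sample_annotation": {"organism": "Homo sapiens"},
    "experimental_design": "",  # present but empty, so reported as missing
    "protocols": ["hybridisation v1"],
}

print(missing_miame_elements(record))
```

A check of this kind could run before an experiment is marked as ready for submission, so gaps are found while the researcher is still on the project.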


BASE Sample Information


BASE Sample Information


BASE experimental info


Proteomics Data

• Still in progress
• Peptide identification programs
• Additional cross-linking from results to public database annotations
• Storage of experimental data and resulting identifications
• Include MIAPE compliance
• Linking to genomics data: standards required


Sequence Annotation 1

• Paeano pipeline for annotation of cDNAs from non-model organisms
• Uses a collection of publicly available and custom software
• Results are stored under projects
• Links provided to array data in BASE


Sequence Annotation 2

• Glossina (Tsetse) EST annotation project
• Held annotation jamboree at UWC
• Worked with Twiki tool developed by JBIRC
• Data to be submitted to public databases


Twiki system


Twiki system


DAS Annotation Tool

• Distributed Annotation System: allows viewing of annotation from different sources
• Can overlay your own data/annotation
• Facilitates information sharing without issue of updates
• Repositories distributed in different geographical locations
• Extension of DASTy2, developed at NBN
• Development of DAS annotation tool underway
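
The overlay idea can be sketched in a few lines: features for the same sequence are fetched from several sources and merged into one view, with each source keeping ownership of its own data. The XML below is a simplified, made-up imitation of a DAS features response, and the accession, labels and source names are hypothetical. This is not DASTy2 code.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up responses in the spirit of a DAS features document.
REFERENCE_XML = """
<DASGFF><GFF><SEGMENT id="P00001" start="1" stop="200">
  <FEATURE id="dom1" label="Kinase domain">
    <TYPE id="domain">domain</TYPE><START>20</START><END>150</END>
  </FEATURE>
</SEGMENT></GFF></DASGFF>
"""

LOCAL_XML = """
<DASGFF><GFF><SEGMENT id="P00001" start="1" stop="200">
  <FEATURE id="site1" label="Observed phosphosite">
    <TYPE id="modification">modification</TYPE><START>42</START><END>42</END>
  </FEATURE>
</SEGMENT></GFF></DASGFF>
"""


def parse_features(xml_text, source):
    """Extract positional features and tag each with the source it came from."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feat in segment.iter("FEATURE"):
            features.append({
                "source": source,
                "segment": segment.get("id"),
                "label": feat.get("label"),
                "type": feat.findtext("TYPE"),
                "start": int(feat.findtext("START")),
                "end": int(feat.findtext("END")),
            })
    return features


def overlay(*feature_lists):
    """Merge features from several sources, ordered by position on the segment."""
    merged = [f for features in feature_lists for f in features]
    return sorted(merged, key=lambda f: (f["segment"], f["start"], f["end"]))


tracks = overlay(
    parse_features(REFERENCE_XML, "public"),
    parse_features(LOCAL_XML, "local"),
)
for f in tracks:
    print(f["segment"], f["start"], f["end"], f["type"], f["source"], f["label"])
```

Because each source is queried when the view is built, an update at the public repository shows up without anyone copying data, which is the "without issue of updates" point above.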


DASTy


Links to other DAS viewers


DAS annotation tool

• Collaborative visual annotation tool
  - Annotation
  - Comments
  - Sequences
  - Features
  - Non positional features
  - Methodology of trust on a collaborative annotation process


Data curation and management issues

• HTB software licenses are expensive
• Open Source not always maintained
• Ensuring regular backups (data size)
• Keeping data up to date
• Researchers leave data after project: not updated to new versions
• Privacy: researchers share data only with collaborators, patient data is private
• Sharing and linking data


Standards and ontologies

• Use a controlled vocabulary (controlled list of terms) or ontology (set of terms with relations)
• Enables easy data retrieval and sharing
• Easy comparison of results from different labs
• Compatibility with other labs/databases world-wide
• Ease of uploading data into public databases
• Unambiguous report of research
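
The difference between a controlled vocabulary and an ontology can be shown with a minimal sketch. All terms, relations and sample names below are illustrative and not taken from any published ontology.

```python
# Controlled vocabulary: a flat list of allowed terms (hypothetical examples).
TISSUE_VOCABULARY = {"liver", "kidney", "heart"}


def is_valid_term(term):
    """Accept a value only if it is in the controlled list."""
    return term.strip().lower() in TISSUE_VOCABULARY


# Ontology: terms plus relations; here each term maps to its "is_a" parents.
IS_A = {
    "hepatocyte": ["liver cell"],
    "liver cell": ["cell"],
    "neuron": ["cell"],
    "cell": [],
}


def ancestors(term):
    """Collect every term reachable by following is_a links upwards."""
    seen = set()
    stack = list(IS_A.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A.get(parent, []))
    return seen


def annotated_with(samples, query):
    """Samples annotated with the query term or any of its descendants."""
    return [
        name for name, term in samples.items()
        if term == query or query in ancestors(term)
    ]


# Made-up samples from two labs that annotate at different levels of detail.
samples = {"lab_A_s1": "hepatocyte", "lab_B_s7": "neuron", "lab_B_s9": "liver cell"}

print(is_valid_term("Liver "))
print(annotated_with(samples, "liver cell"))
```

The vocabulary check catches free-text variants at data entry, while the relations let a query for a general term also retrieve samples annotated with a more specific one, which is what makes results from different labs comparable.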


Open Biomedical Ontologies

• Central location for accessing well-structured controlled vocabularies and ontologies for use in the biological and medical sciences
• Provides simple format for ontologies
• Scope includes anatomy, phenotype, development, disease, “omics”, experiment, etc.
• http://obo.sourceforge.net
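
To give a feel for how simple the OBO flat-file format is, here is a minimal reader for [Term] stanzas. It is a sketch that handles only the id, name and is_a tags and ignores everything else in the format. The two Gene Ontology terms are used purely as sample input.

```python
OBO_TEXT = """\
format-version: 1.2

[Term]
id: GO:0008150
name: biological_process

[Term]
id: GO:0009987
name: cellular process
is_a: GO:0008150 ! biological_process
"""


def parse_obo_terms(text):
    """Read [Term] stanzas into a dict keyed by term id (id, name, is_a only)."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are collected.
            current = {"is_a": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            # Header lines, blank lines and non-Term stanzas are skipped.
            continue
        tag, _, value = line.partition(":")
        value = value.strip()
        if tag == "id":
            current["id"] = value
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            # Text after "!" is a human-readable comment, not part of the id.
            current["is_a"].append(value.split("!", 1)[0].strip())
    return terms


terms = parse_obo_terms(OBO_TEXT)
for term_id, term in terms.items():
    print(term_id, term["name"], term["is_a"])
```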


Data exchange standards

• Microarray standards: MIAME and MAGE
• Proteomics Standards Initiative (PSI)
• Systems Biology Markup Language (SBML): computer-readable format for representing models of networks
• Biological Pathways Exchange (BioPAX): format for representing pathways
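
As an illustration of what "computer-readable" buys, the sketch below reads a toy SBML-style model with the Python standard library and lists its network edges. The model is made up and pared down, and has not been checked with an SBML validator. A real project would more likely use a dedicated SBML library.

```python
import xml.etree.ElementTree as ET

# Toy model for illustration only (SBML Level 2 Version 4 namespace).
SBML_TEXT = """
<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy_model">
    <listOfCompartments><compartment id="cell"/></listOfCompartments>
    <listOfSpecies>
      <species id="glucose" compartment="cell"/>
      <species id="g6p" compartment="cell"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="hexokinase">
        <listOfReactants><speciesReference species="glucose"/></listOfReactants>
        <listOfProducts><speciesReference species="g6p"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""

NS = {"sbml": "http://www.sbml.org/sbml/level2/version4"}


def reaction_edges(sbml_text):
    """Return (reactant, reaction id, product) triples for every reaction."""
    root = ET.fromstring(sbml_text)
    edges = []
    for reaction in root.iterfind(".//sbml:reaction", NS):
        reactants = [
            ref.get("species")
            for ref in reaction.iterfind("sbml:listOfReactants/sbml:speciesReference", NS)
        ]
        products = [
            ref.get("species")
            for ref in reaction.iterfind("sbml:listOfProducts/sbml:speciesReference", NS)
        ]
        for reactant in reactants:
            for product in products:
                edges.append((reactant, reaction.get("id"), product))
    return edges


print(reaction_edges(SBML_TEXT))
```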


Conclusions

• Some tools are in place for curation and management of different data types
• Researchers need better education to encourage use of these tools
• Ontologies and standards are important in digital data curation and management; compliance with international standards needs to be encouraged


Acknowledgements

Funding:

Collaborations:
  - CPGR
  - Researchers at UCT