A Prlic - BioJava update


Presentation by Prlic at BOSC2012 "BioJava Update"

How to use BioJava to calculate one billion protein structure alignments at the RCSB PDB website

Andreas Prlić

My Two Hats

RCSB PDB, BioJava

www.pdb.org

Overview

[Chart: number of released entries (y-axis) by year (x-axis)]

Some of the things you can do at the RCSB PDB site

• Advanced queries

• Custom reports

• Visualization

• Education section

• Comparisons across PDB, based on sequence and 3D structure similarities

Jmol

LigandExplorer

Custom report

Systematic Structural Alignment

Objective: Find novel relationships

Example: Green Fluorescent Protein (GFP)

• Nidogen-1: similar 11-stranded beta-barrel and internal helices

• 3 Å RMSD, only 9% sequence identity

• Nidogen-1: component of basement membrane, no chromophore

• GFP and NID-1 may share common ancestor
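The two figures quoted for this example, RMSD and sequence identity, can be illustrated with a minimal sketch. It assumes the structures are already superimposed and the equivalent residue pairs are known. `AlignmentScores` and its methods are hypothetical names, not BioJava API, and identity is computed here over aligned (non-gap) positions only, which is one of several conventions in use.

```java
/** Hypothetical helper (not part of BioJava): scores an existing alignment. */
final class AlignmentScores {

    /** RMSD in the input units over equivalent, already superimposed atom pairs. */
    static double rmsd(double[][] a, double[][] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("need equal, non-empty coordinate lists");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            for (int k = 0; k < 3; k++) {          // x, y, z
                double d = a[i][k] - b[i][k];
                sum += d * d;
            }
        }
        return Math.sqrt(sum / a.length);
    }

    /** Fraction of aligned (non-gap) columns in which both residues are identical. */
    static double sequenceIdentity(String alignedA, String alignedB) {
        if (alignedA.length() != alignedB.length()) {
            throw new IllegalArgumentException("aligned strings must have equal length");
        }
        int aligned = 0;
        int identical = 0;
        for (int i = 0; i < alignedA.length(); i++) {
            char x = alignedA.charAt(i);
            char y = alignedB.charAt(i);
            if (x == '-' || y == '-') {
                continue;                          // skip gap columns
            }
            aligned++;
            if (x == y) {
                identical++;
            }
        }
        return aligned == 0 ? 0.0 : (double) identical / aligned;
    }

    public static void main(String[] args) {
        // Toy data: two atom pairs, each displaced by 3 units along z.
        double[][] a = {{0, 0, 0}, {1, 0, 0}};
        double[][] b = {{0, 0, 3}, {1, 0, 3}};
        System.out.println("RMSD     = " + rmsd(a, b));
        System.out.println("identity = " + sequenceIdentity("MKV-LA", "MRVTL-"));
    }
}
```

A low identity combined with a low RMSD, as in the GFP and Nidogen-1 case, is the signature of a relationship that sequence comparison alone would miss.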

Open Science Grid

Based on the FATCAT (rigid) algorithm: Yuzhen Ye & Adam Godzik. Flexible structure alignment by chaining aligned fragment pairs allowing twists. Bioinformatics 2003, vol. 19, suppl. 2, ii246-ii255.

Systematic comparisons of representative chains from 40% sequence identity clusters

22000 sequence clusters

33000 representative domains
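To give a sense of scale: an all-against-all comparison of n representatives involves n(n-1)/2 unordered pairs. The sketch below assumes every pair is aligned exactly once, which the slides do not state, and `PairCount` is a hypothetical name. The slides also do not say how the one billion total is composed, so these figures are an order-of-magnitude illustration only.

```java
/** Hypothetical helper: counts unordered pairs in an all-against-all comparison. */
final class PairCount {

    /** Number of unordered pairs among n items, n(n-1)/2, computed in long arithmetic. */
    static long pairs(long n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        return n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        System.out.println(pairs(22_000)); // 241989000
        System.out.println(pairs(33_000)); // 544483500
    }
}
```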

PDB

Custom Job Management

Java Clients can run anywhere

Open Science Grid

Sends out instructions to clients

Writes results to disk
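The pattern on this slide (a central job manager hands out work, clients compute, the manager collects results and writes them to disk) can be sketched in a few lines. This is not the RCSB PDB implementation. `JobServerSketch`, `Job`, and `align` are hypothetical names, the clients are threads in one process instead of remote grid nodes, and `align` is a placeholder where a real client would run the structure alignment.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a job manager; names and design are assumptions. */
final class JobServerSketch {

    /** One unit of work: align representative a against representative b. */
    static final class Job {
        final String a;
        final String b;
        Job(String a, String b) { this.a = a; this.b = b; }
    }

    private final Queue<Job> pending = new ConcurrentLinkedQueue<>();
    private final Queue<String> results = new ConcurrentLinkedQueue<>();

    /** Queues every unordered pair of the given identifiers. */
    JobServerSketch(List<String> ids) {
        for (int i = 0; i < ids.size(); i++) {
            for (int j = i + 1; j < ids.size(); j++) {
                pending.add(new Job(ids.get(i), ids.get(j)));
            }
        }
    }

    /** "Sends out instructions": returns the next job, or null when none are left. */
    Job nextJob() { return pending.poll(); }

    /** Collects one result line from a client. */
    void report(Job job, String result) {
        results.add(job.a + "\t" + job.b + "\t" + result);
    }

    /** Placeholder for the real computation a client would perform. */
    static String align(Job job) { return "not-computed"; }

    /** Runs n in-process clients until the queue is empty; returns the result count. */
    int runClients(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        for (int c = 0; c < n; c++) {
            pool.submit(() -> {
                for (Job job = nextJob(); job != null; job = nextJob()) {
                    report(job, align(job));
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for clients", e);
        }
        return results.size();
    }

    /** Snapshot of the collected result lines. */
    List<String> resultLines() { return new ArrayList<>(results); }

    /** "Writes results to disk": one tab-separated line per alignment. */
    void writeResults(Path file) {
        try {
            Files.write(file, resultLines());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Identifiers are made up for illustration.
        JobServerSketch server = new JobServerSketch(java.util.Arrays.asList("d1", "d2", "d3", "d4"));
        System.out.println(server.runClients(3) + " results collected");
    }
}
```

Because clients pull work instead of being assigned a fixed share, slow or lost clients do not hold up the others, which suits a setting where clients "can run anywhere".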


Initial calculation of frozen snapshot of PDB: ~170k CPU hours on OSG

Incremental weekly updates (~1-2 million alignments): <1000 CPU hours
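The weekly figures imply a rough upper bound on the average CPU time per alignment, worked through below. `CpuBudget` is a hypothetical name. Since the slide gives "<1000 CPU hours", the results are upper bounds, not measurements. The same function applies to the initial run once the number of alignments in the frozen snapshot is known, which the slide does not state.

```java
/** Hypothetical helper: converts a CPU-hour budget into seconds per alignment. */
final class CpuBudget {

    /** Average CPU seconds per alignment for a given budget and alignment count. */
    static double secondsPerAlignment(double cpuHours, double alignments) {
        if (alignments <= 0) {
            throw new IllegalArgumentException("alignments must be positive");
        }
        return cpuHours * 3600.0 / alignments;
    }

    public static void main(String[] args) {
        // Weekly update: under 1000 CPU hours for roughly 1 to 2 million alignments.
        System.out.println(secondsPerAlignment(1000, 1_000_000)); // upper bound at 1 million
        System.out.println(secondsPerAlignment(1000, 2_000_000)); // upper bound at 2 million
    }
}
```

This puts the weekly updates at under about 3.6 CPU seconds per alignment if 1 million are computed, and under about 1.8 seconds if 2 million are.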

Code: www.biojava.org

1 billion alignments available freely at www.rcsb.org

BioJava

• Major rewrite - BioJava 3

BioJava 1 BioJava 3

core data model

symbols/alphabets, counts, distributions

Genome/sequencing

Multiple sequence alignment

Structure alignment

Modfinder

AA Properties

Protein Disorder

HMMER3 web service

NCBI web services

Parsers: GenBank/EMBL/BLAST

Acknowledgments

• Spencer Bliven

• Peter Rose

• Phil Bourne

• all contributors

• A. Yates, J. Jacobsen, P. Troshin, M. Chapman, J. Gao, C.H. Koh, S. Foisy, R. Holland, G. Rimsa, M. Heuer, H. Brandstaetter-Mueller, S. Willis

Funding: RCSB PDB, Google Summer of Code, Open Science Grid
