PORTING HMMER AND INTERPROSCAN TO THE GRID Daniel Alberto Burbano Sefair ([email protected] ) Michael Angel Pérez Cabarcas ([email protected]) University of The Andes Information Technology Division Colombia November 2008

PORTING HMMER AND INTERPROSCAN TO THE GRID

Daniel Alberto Burbano Sefair ([email protected])

Michael Angel Pérez Cabarcas ([email protected])

University of The Andes

Information Technology Division

Colombia

November 2008

Topics

• Introduction
• HMMER
• InterProScan
• What do we have?
• What do we want with your help?
• Questions

INTRODUCTION

• Our users, from the Biology department, want an easy way to use HMMER and InterProScan that also reduces processing time.

– Graphical user interface instead of a command-line interface.

– There are few users, but they submit many jobs (1000 - 3000).

– Submit jobs with files larger than 10 MB.

– Reduce the processing time by using other computers.

– Depending on the job, the run time can be from 1 h to 12 h.

– Some InterProScan jobs fail and must be submitted again.
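
Two of the points above (large input files, and failed jobs that must be submitted again) can be illustrated with a small Python sketch. It is not part of the original slides: `split_fasta`, `run_with_retries`, the chunk size, and the retry limit are hypothetical names and values, and `job` stands in for whatever call actually submits the work.

```python
from typing import Callable, Iterable, Iterator, List


def split_fasta(lines: Iterable[str], records_per_chunk: int) -> Iterator[List[str]]:
    """Yield lists of FASTA lines, each holding at most records_per_chunk records."""
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be at least 1")
    chunk: List[str] = []
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header line starts a new record; close the chunk if it is full.
            if count == records_per_chunk:
                yield chunk
                chunk, count = [], 0
            count += 1
        chunk.append(line)
    if chunk:
        yield chunk


def run_with_retries(job: Callable[[], bool], max_attempts: int = 3) -> int:
    """Call job() until it returns True.

    Returns the number of attempts used, or 0 if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        if job():
            return attempt
    return 0
```

Each chunk could then be written to its own file and submitted as a separate job, with `run_with_retries` wrapping the submission so that a failed job is sent again a limited number of times.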

1. What is HMMER?

- “HMMER is a sequence analysis tool using profile Hidden Markov Models”.

- It is a set of 9 command-line applications:

hmmpfam, hmmsearch, hmmalign, hmmbuild, hmmconvert, hmmcalibrate, hmmemit, hmmindex, hmmfetch.

The above definition is taken from: ftp://selab.janelia.org/pub/software/hmmer/CURRENT/Userguide.pdf

Home page: http://hmmer.janelia.org/

HMMER: Profile Hidden Markov Models

2. How can I use HMMER by command, PBS, and JDL?

HMMER is a command-line application. This is an example:

hmmsearch file.hmm MySequence.fasta >> output
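
The slide shows only the plain command. As a rough illustration of the PBS case, the Python sketch below wraps that same command in a minimal PBS job script. The function names, the job name, and the 12-hour walltime (taken from the upper run time mentioned in the introduction) are assumptions, and the sketch assumes `hmmsearch` is on the PATH of the execution node.

```python
import shlex


def hmmsearch_command(hmm_file: str, sequence_file: str, output_file: str) -> str:
    """Build the hmmsearch command line from the slide, with shell quoting."""
    return "hmmsearch {} {} >> {}".format(
        shlex.quote(hmm_file),
        shlex.quote(sequence_file),
        shlex.quote(output_file),
    )


def pbs_script(job_name: str, command: str, walltime: str = "12:00:00") -> str:
    """Return the text of a minimal PBS job script that runs one command."""
    lines = [
        "#!/bin/sh",
        f"#PBS -N {job_name}",           # job name (hypothetical)
        f"#PBS -l walltime={walltime}",  # assumed limit, not from the slides
        "#PBS -j oe",                    # merge stderr into stdout
        'cd "$PBS_O_WORKDIR"',           # run from the submission directory
        command,
    ]
    return "\n".join(lines) + "\n"
```

Writing the result of `pbs_script("hmmsearch_job", hmmsearch_command("file.hmm", "MySequence.fasta", "output"))` to a file gives a script that could be submitted with `qsub`.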

HMMER

1. What is InterProScan?

The following definition is taken from the European Bioinformatics Institute: http://www.ebi.ac.uk/2can/tutorials/function/InterProScan.html

“InterProScan is a tool that combines different protein recognition methods into one resource. It scans a given protein sequence against the protein signatures of the InterPro member databases (PROSITE, PRINTS, Pfam, ProDom, SMART, TIGRFAMs).”

Home Page: http://www.ebi.ac.uk/Tools/InterProScan/

InterProScan

2. How does InterProScan work?

1. The user submits a protein sequence.

2. Protein sequence applications are launched and search against specific databases.

3. Each application returns a list of hits.

4. The results are combined.

5. The information is returned to the user.

[Figure: InterProScan workflow schema, with numbered steps 1 to 4]

InterProScan

Information and schema are taken from: http://www.ebi.ac.uk/2can/tutorials/images/scan_schema.gif
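
The five steps above can be mimicked with a toy Python sketch. It does not call InterProScan or any real member database: `find_motif` builds hypothetical stand-in "applications" that look for a fixed substring, and the hit format `(signature id, start, end)` is an assumption made only for illustration.

```python
from typing import Callable, Dict, List, Tuple

Hit = Tuple[str, int, int]                 # (signature id, start, end), 1-based
Method = Callable[[str], List[Hit]]


def find_motif(motif: str, signature_id: str) -> Method:
    """Build a toy scanning application that reports every occurrence of motif."""
    def method(sequence: str) -> List[Hit]:
        hits: List[Hit] = []
        start = sequence.find(motif)
        while start != -1:
            hits.append((signature_id, start + 1, start + len(motif)))
            start = sequence.find(motif, start + 1)
        return hits
    return method


def scan(sequence: str, methods: Dict[str, Method]) -> List[Tuple[str, str, int, int]]:
    """Run every application on the sequence and combine the hit lists."""
    combined = []
    for name, method in methods.items():              # step 2: launch each application
        for signature_id, start, end in method(sequence):   # step 3: its list of hits
            combined.append((name, signature_id, start, end))
    combined.sort(key=lambda hit: (hit[2], hit[3], hit[0]))  # step 4: combine, by position
    return combined                                    # step 5: return to the user
```

For example, `scan("MKKLAGKK", {"toyA": find_motif("KK", "SIG_A"), "toyB": find_motif("LAG", "SIG_B")})` returns the hits from both toy applications merged into one list ordered by start position.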

3. How can I use InterProScan by command, PBS, and JDL?

InterProScan is a command-line application. This is an example:

iprscan -cli -i input.seq -o test.out -format raw -goterms -iprlookup
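
For the JDL case, the Python sketch below generates a minimal job description for the same command. It is an assumption-laden illustration, not the authors' setup: it assumes `iprscan` is already installed and on the PATH of the worker node, and the function names and the `std.out` / `std.err` file names are hypothetical. In practice a wrapper script shipped in the input sandbox might be used as the executable instead.

```python
from typing import Iterable


def jdl_list(items: Iterable[str]) -> str:
    """Format strings as a JDL list, e.g. {"a", "b"}."""
    return "{" + ", ".join(f'"{item}"' for item in items) + "}"


def iprscan_jdl(input_file: str, output_file: str) -> str:
    """Return a minimal JDL description for one iprscan run."""
    arguments = (
        f"-cli -i {input_file} -o {output_file} "
        "-format raw -goterms -iprlookup"
    )
    attributes = [
        ("Executable", '"iprscan"'),          # assumed to exist on the worker node
        ("Arguments", f'"{arguments}"'),
        ("StdOutput", '"std.out"'),           # hypothetical file names
        ("StdError", '"std.err"'),
        ("InputSandbox", jdl_list([input_file])),
        ("OutputSandbox", jdl_list(["std.out", "std.err", output_file])),
    ]
    return "\n".join(f"{key} = {value};" for key, value in attributes) + "\n"
```

Calling `iprscan_jdl("input.seq", "test.out")` produces the text of a JDL file that ships `input.seq` to the worker node and brings `test.out` back with the standard output and error files.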

InterProScan

What do we have?

• Bioinformatic Grid Wrapper (BGW): a command-line interface (CLI) for HMMER and InterProScan.

What do we want with your help?

Architecture

Thanks

?

• “Profile hidden Markov models (profile HMMs) can be used to do sensitive database searching using statistical descriptions of a sequence family's consensus. HMMER is a freely distributable implementation of profile HMM software for protein sequence analysis.”