Introducing EMBOSS/Jemboss European Molecular Biology Open Software Suite Dr. Erik Bongcam-Rudloff


History

In the beginning was EGCG

1988 - EGCG is started to provide extensions to the GCG package.

EGCG is used by up to 10,000 users at 150 centers as an addition to the GCG package.

EGCG sought to support the needs of major sequencing initiatives such as the human genome project.

Late 1990s - GCG/EGCG is the de facto standard sequence analysis package worldwide.

Oxford Molecular (then commercial owners of GCG) close access to the program code, preventing further development of EGCG past version 8.1.


The Birth of EMBOSS

Spurred by the continuing demand for new sequence analysis programs, the EMBOSS project is started.

Core development is funded by the UK research councils and the Wellcome Trust as part of their commitment to the Human Genome Project.

The experience of the EGCG team is used to write an entirely new package from the ground up.

EMBOSS has been designed from scratch by scientists for scientists, so it can readily be integrated with the web or other packages.

EMBOSS has been licensed as 'Open Source' to ensure continued access to the program code. This prevents anyone from taking your programs away.


EMBOSS today

Core libraries of routines for sequence manipulation, database access, and so on are available.

These libraries are prewritten functions that any programmer can use. They range from simple things like extracting subsequences to complex things like sequence alignments and comparisons, and they make writing new programs much easier.
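To give a feel for the kind of routine such a library spares you from rewriting, here is a minimal sketch of subsequence extraction in plain standard C. It is an illustration only: the function name `subsequence` and its 1-based, inclusive coordinate convention are assumptions made for this example, and it is not the EMBOSS library routine.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper for illustration; NOT part of the EMBOSS libraries.
 * Returns a newly allocated copy of positions start..end of seq, using
 * 1-based, inclusive coordinates.
 * Returns NULL if the range is invalid or memory cannot be allocated.
 * The caller must free() the result.
 */
char *subsequence(const char *seq, size_t start, size_t end)
{
    size_t len;
    size_t sublen;
    char *out;

    assert(seq != NULL);
    len = strlen(seq);

    /* Reject ranges that fall outside the sequence or are reversed. */
    if (start < 1 || end < start || end > len)
        return NULL;

    sublen = end - start + 1;
    out = malloc(sublen + 1);
    if (out == NULL)
        return NULL;

    /* Convert the 1-based start to a 0-based offset before copying. */
    memcpy(out, seq + (start - 1), sublen);
    out[sublen] = '\0';
    return out;
}
```

For example, `subsequence("ACGTACGT", 3, 6)` returns the string `"GTAC"`, while an out-of-range request such as `subsequence("ACGT", 3, 5)` returns `NULL`.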

More than 80 programs have been written, replacing more than 90% of the functionality of GCG and adding many functions you will find in no other package.

Programs are being contributed at an impressive rate from all over the world and EMBOSS is installed in many laboratories worldwide.

Open source means that you have permission to modify and customise the programs to do what you need, without constraint.


EMBOSS present and future

EMBOSS under development

Training courses and documentation

These are being actively developed by users and EMBnet.

Graphical/Web interfaces.

Now that the initial EMBOSS release is stable, graphical interfaces are being developed:

Web-based: W2H, Pise and others

Java: Jemboss

Your own programs

Writing an EMBOSS application is quick and easy for a C programmer.


Comparing EMBOSS and GCG

Some examples (GCG program - EMBOSS equivalent):

• DISTANCES - PHYLIP package

• EXTRACTPEPTIDE - transeq

• MAP - restrict, remap

• MOTIFS - patmatmotifs

• PEPDATA - getorf


Using EMBOSS

All EMBOSS programs can be run from the command line. There is no need to specifically initialise EMBOSS.

You can specify everything with options or have EMBOSS prompt you for the inputs to the program.

By default EMBOSS programs will not ask you lots of questions, just the minimum needed to run the program.

If you put the '-opt' option on the command line then EMBOSS will ask you for more detailed options.

You can get help on any program with the '-help' option on the command line.

This will list all the inputs a program needs in order to run.


Writing EMBOSS programs

Fully GPL. No purchase necessary.

EMBOSS: Instant bioinformatics! Just add science and 'make'!

Three steps to a new program:

1. Write the ACD file to describe the input to your program.

You can test that you have your program described correctly with the command 'acdc'.

2. Write the program code to initialise your program in EMBOSS using the templates provided. Retrieve the parameters.

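#include "emboss.h"
int param1;
int main(int argc, char **argv)
{
    embInit("program", argc, argv);    /* read the command line as described by the ACD file */
    param1 = ajAcdGetInt("param1");    /* retrieve the value of "param1" */
    ...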

3. Now just add the science. Write the code to do the manipulations you need.

EMBOSS has many common bioinformatic functions in the AJAX and NUCLEUS libraries.
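As a rough illustration of the first step, an ACD file matching the code fragment above might look like the sketch below. The application name "program" and the integer parameter "param1" come from that fragment. The documentation string, group name, default value and prompt text are placeholders invented for this example, and a real program may need further attributes.

```
application: program [
  documentation: "Demonstration program"
  groups: "Test"
]

integer: param1 [
  parameter: "Y"
  default: "0"
  information: "Value for param1"
]
```

By convention the file is named after the program (here program.acd), and running 'acdc' on the program name is the quickest way to check whether the description is accepted.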


EMBOSS INTERFACES


Interfaces

Web: W2H, Pise, EMBOSS-GUI

X-Windows: Staden Spin (+ others coming)

Ssh/xterm/character-based: emnu


Web interface details

Many are being developed:

W2H (http://www.hgmp.mrc.ac.uk/Registered/Webapp/emboss-w2h/)

Pise details (http://www-alt.pasteur.fr/~letondal/Pise/)

wEMBOSS (http://liv.bmc.uu.se/EMBOSS)


Web interface details


X-Windows interfaces

At least three are being developed:

Spin (Staden package)

Kaptain (http://userpage.fu-berlin.de/~sgmd/)

Arka (http://www.bioinformatics.org/genpak/)


Staden package with EMBOSS


EMBOSS/Jemboss

Jemboss is the new Graphical User Interface (GUI) to EMBOSS, designed to facilitate the use of programs. It is written in the programming language Java, enabling the interface to be used in both PC and UNIX environments.


EMBOSS/Jemboss

The older Mac platform does not support this GUI; only Macs running Mac OS X can run Jemboss.

Web-start installed by default


EMBOSS/Jemboss

The interface has been written at the HGMP-RC in collaboration with the EMBOSS team.

First release: January 2002


EMBOSS/Jemboss

A web launch tool (Java Web Start) must be installed on the client (i.e. the user's computer) before Jemboss can be accessed. It allows this Java program to be downloaded and launched from the web.


EMBOSS/Jemboss

The Jemboss server has been installed under Linux, AIX, Mac OS X, IRIX, Solaris and HP-UX.

The server setup is very much dependent on the local environment and the level of security necessary for a site.


EMBOSS/Jemboss

It is possible to set up a basic non-authenticated and non-encrypted server.

This may be suitable for sites in which the server is only available internally.

A more secure server can be set up which uses SSL for data encryption.


EMBOSS/Jemboss

SOAP is used to communicate between the client and the server. Apache Tomcat is used to deploy the Jemboss services.


And now all this in practice!


Concluding remarks

If you want to install

Central server with system manager

Pros and cons of the EMBOSS package


The EMBOSS cocktail

jakarta-tomcat-*.tar.gz

SOAP (Simple Object Access Protocol)

Apache-x.x.tar.gz

Libpng-tar.gz

Z-lib.tar.gz

EMBOSS-2.x.x.tar.gz

The latest Java


EMBOSS minus

The major deficiencies in the EMBOSS package are: BLAST, FASTA, ASSEMBLY

You should use the publicly available software:

Blast - NCBI, HGMP, many other sites

Fasta - HGMP

Assembly - Staden package


EMBOSS plus

Much effort is put into removing arbitrary limits.

E.g. max. sequence length: 2Gb

Many programs limited only by available memory

Source code available for inspection, change and writing your own programs

EMBOSS is FREE!

GNU General Public License

Open Source Software


THE END

Questions?