GCG vs EMBOSS
Gary Williams
Which is better, GCG or EMBOSS?
  You must decide for yourselves.
  You may find other packages that do what you want.
  Use the tools that do the job.
  This is a comparison of GCG and EMBOSS to help you decide.
Interfaces
  Web
    W2H available for both
    EMBOSS W2H still has rough edges
    PISE
    Others under development
  X-Windows
    GCG - Seqlab
    EMBOSS - SPIN (+ others coming)
  Telnet/xterm/Character-based
    emnu
Command line is very similar
  The UNIX command-line interfaces of GCG and EMBOSS are very similar.
  You type the name of the program.
  You can add any options you want to the command line.
  Press the RETURN key.
  Any mandatory information that was not on the command line will be prompted for.
GCG command-line
% name -other=thing
This is the name program that reads a sequence and writes out something.
NAME what sequence ? embl:hsfau1
Begin (* 1 *) ?
End (* 2016 *) ?
Reverse (* No *) ?
What should I call the output (* hsfau.name *) ?
EMBOSS command-line
% name -other thing
Reads in sequences and writes a thing
Input sequence(s): embl:hsfau1
Output data [hsfau1.name]:
Use ‘-sask’ to make EMBOSS programs prompt for the start and end of sequences.
Some common options
  Running in scripts (don’t prompt, just fail if the command line is insufficient)
    GCG: -default
    EMBOSS: -auto
  Help on options
    GCG: -check
    EMBOSS: -help or -help -verbose
  Boolean options (Yes/No, True/False)
    GCG: -thing, -nothing
    EMBOSS: -thing, -nothing, -thing=T, -thing=F, -thing=1, -thing=0, -thing=Y, -thing=N
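As an illustration of the scripting case, here is a minimal sketch of a non-interactive EMBOSS call. It assumes that EMBOSS is installed and that a database called "embl" is configured; seqret is used as the example program, and the output file name is hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes EMBOSS is installed and a database called "embl" is configured.
entry="embl:hsfau1"
outfile="hsfau1.fasta"   # hypothetical output name

# -auto turns off all prompts, so a missing required value is an error, not a question.
cmd="seqret -auto -sequence $entry -outseq $outfile -osformat fasta"

if command -v seqret >/dev/null 2>&1; then
    $cmd || echo "seqret failed: $cmd" >&2
else
    echo "EMBOSS not found; would have run: $cmd"
fi
```

Because nothing is prompted for, the script has to check the exit status itself, as the `||` branch does here.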
Sequence options in EMBOSS
  "-sequence" related qualifiers
    -sbegin    integer  first base used
    -send      integer  last base used, def=seq length
    -sreverse  bool     reverse (if DNA)
    -sask      bool     ask for begin/end/reverse
    -slower    bool     make lower case
    -supper    bool     make upper case
    -sformat   string   input sequence format
    -ufo       string   UFO features
Sequence options in EMBOSS
  "-outseq" related qualifiers
    -osformat  string   output sequence format
    -ossingle  bool     separate file for each entry
EMBOSS general options
    -debug     bool     write debug output to program.dbg
    -auto      bool     turn off prompts
    -stdout    bool     write standard output
    -filter    bool     read standard input, write standard output
    -options   bool     prompt for required and optional values
    -verbose   bool     report some/full command line options
    -help      bool     report command line options
Data files
  GCG uses ‘..’ to divide comments from data.
  EMBOSS does not use ‘..’.
  In general, EMBOSS uses ‘#’ to mark a comment line.
  Use ‘embossdata’ to extract and check on data files.
  As in GCG, data files copied into the current or home directory are used in preference to the originals.
List files (files of file names)
  Similar to GCG list files, but no ‘..’
  Comment lines start with ‘#’
  Can contain the names of other list files:
# This is my list file
embl:hsfau
embl:ggg*
myfile.seq:clone10
file.seq
@list2
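A minimal sketch of building such a list file from the shell follows. The entries are the ones shown above; the file name "mylist" and the temporary directory are hypothetical, and the final seqret call is left as a comment because it needs EMBOSS plus the named databases and files.

```shell
#!/bin/sh
# Sketch only: "mylist" is a hypothetical file name; the entries come from the slide.
workdir=$(mktemp -d)
cat > "$workdir/mylist" <<'EOF'
# This is my list file
embl:hsfau
embl:ggg*
myfile.seq:clone10
file.seq
@list2
EOF

# Count the lines that are not comments (each names a sequence, a file or another list).
entries=$(grep -vc '^#' "$workdir/mylist")
echo "mylist holds $entries entries"

# With EMBOSS installed, the list would be given to a program by prefixing
# its name with '@', for example:
#   seqret -auto @mylist -outseq all.fasta -osformat fasta
```

The quoted here-document (`<<'EOF'`) keeps the shell from touching the `*` in the wildcard entry, so the file contains exactly what was typed.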
File formats
  GCG
    only GCG format, MSF and RSF
  EMBOSS
    many formats, automatically recognised
    can specify the format using ‘::’ or ‘-osf’, e.g.:
      clustal::globin.aln
      -osf gcg
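The two ways of naming a format can be combined in one command. The sketch below assumes EMBOSS is installed and that globin.aln exists in the current directory. The output name is hypothetical, and MSF is chosen for the output (rather than the slide's gcg) only because an alignment holds several sequences; -osf is taken here to be an abbreviation of -osformat.

```shell
#!/bin/sh
# Sketch only: assumes EMBOSS is installed and globin.aln exists in the current directory.
infile="clustal::globin.aln"   # format::file states the input format explicitly
outfile="globin.msf"           # hypothetical output name

# -osf (assumed here to abbreviate -osformat) sets the output format.
cmd="seqret -auto $infile -outseq $outfile -osf msf"

if command -v seqret >/dev/null 2>&1 && [ -f globin.aln ]; then
    $cmd || echo "seqret failed: $cmd" >&2
else
    echo "not run here; the command would be: $cmd"
fi
```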
One file, many sequences
  GCG
    Only one sequence per GCG file
  EMBOSS
    One or more sequences per file
    Default is to write all sequences to one file
    -ossingle will change to writing many files
    GCG, Staden and plain format files can only hold one sequence per file.
Features
  GCG
    No concept of feature tables
  EMBOSS
    Many programs now write out results as GFF
    Soon, all programs that find things will write the results as GFF
    GFF will become another sequence format
    Programs to manipulate and display sets of features are planned, cf. showfeat, coderet, maskfeat, diffseq
Databases
  EMBOSS is poor at grouping many databases under one name.
  E.g. it needs a way of referring to ‘embl’ and ‘emblnew’ as one database.
  This will be done, but currently a list file containing the following seems best:
embl:*
emblnew:*
Command line wildcards
  GCG: embl:* - no problem
  EMBOSS: embl:* - UNIX complains it can’t find the files
    solution is to quote it: "embl:*"
    or: embl:\*
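The effect of the two fixes can be checked without EMBOSS. In this sketch, show_arg is a hypothetical stand-in for an EMBOSS program: it simply prints the argument the shell hands to it, so you can see that both forms deliver the wildcard untouched.

```shell
#!/bin/sh
# show_arg is a hypothetical stand-in for an EMBOSS program:
# it prints its first argument exactly as the shell delivered it.
show_arg() { printf '%s\n' "$1"; }

quoted=$(show_arg "embl:*")    # double quotes stop filename expansion
escaped=$(show_arg embl:\*)    # a backslash protects just the asterisk

echo "quoted:  $quoted"
echo "escaped: $escaped"
```

The quotes must be plain ASCII quotes; typographic (curly) quotes are not recognised by the shell.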
HELP
  GCG: genman, genhelp
  EMBOSS: tfm
What program does what?
  See David Martin’s list of equivalences: http://www.no.embnet.org/Programs/SAL/EMBOSS/fromGCG.php3
  NB this doesn’t list EMBOSS programs with no equivalent in GCG!
What EMBOSS does NOT do
  The major deficiencies in the EMBOSS package are: BLAST, FASTA, ASSEMBLY
  You should use the publicly available software:
    Blast - NCBI, HGMP, many other sites
    Fasta - HGMP
    Assembly - Staden package
What EMBOSS does do
  Giving ‘stdout’ as the output file name makes output go to the screen.
  Much effort is put into removing arbitrary limits. E.g.
    Max. sequence length: 2Gb
    Many programs limited only by available memory
  Source code available for inspection, change and writing your own programs
  EMBOSS is FREE!
    GNU General Public Licence
    Open Source Software
THE END