Tools and Datasets

Exploring the tools of the trade

Sequence Databases

● Understanding EMBL Entries

● Understanding SWISS-PROT Entries

Understanding EMBL Entries

Understanding SWISS-PROT Entries

General Concepts and Methods

● Predictions and Validation

Maxim 17.1

Recognise the difference between the validation of a model and the testing of it for self-consistency
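The distinction in this maxim can be sketched with a deliberately naive model. This is an illustrative sketch only: the "model" simply memorises its training examples, and the sequences, labels and function names are all made up for the example.

```python
# Toy "model": memorises the labels of the sequences it was trained on and
# predicts False (negative) for anything it has never seen.
def train(examples):
    return dict(examples)

def predict(model, seq):
    return model.get(seq, False)

def accuracy(model, examples):
    correct = sum(predict(model, seq) == label for seq, label in examples)
    return correct / len(examples)

# Hypothetical data: (sequence fragment, is_member_of_family)
training = [("MKT", True), ("GGA", False), ("MKV", True), ("CCA", False)]
held_out = [("MKS", True), ("GGC", False), ("MKL", True), ("TTA", False)]

model = train(training)

# Self-consistency: the model is tested on the data it was built from.
self_consistency = accuracy(model, training)

# Validation: the model is tested on data it has never seen.
validation = accuracy(model, held_out)

print("self-consistency:", self_consistency)
print("validation:      ", validation)
```

The memorising model is perfectly self-consistent yet only half right on unseen data, which is why a self-consistency test on its own says little about how a model will behave in practice.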

True/False/Negative/Positive

Maxim 17.2

Generally, False Negative predictions are considered more acceptable than False Positives

Assessment/Validation Procedure and Possible Outcomes

[Figure: figOUTCOME.eps]
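The four possible outcomes (True Positive, False Positive, True Negative, False Negative) can be counted with a few lines of code. This is a minimal sketch; the prediction and truth lists are invented for illustration.

```python
from collections import Counter

def classify_outcome(predicted, actual):
    """Name the outcome of one prediction compared with the known answer."""
    if predicted and actual:
        return "TP"  # predicted positive, really positive
    if predicted and not actual:
        return "FP"  # predicted positive, really negative
    if not predicted and actual:
        return "FN"  # predicted negative, really positive
    return "TN"      # predicted negative, really negative

# Hypothetical predictions and the corresponding known answers.
predictions = [True, True, False, False, True, False]
truth       = [True, False, True, False, True, False]

counts = Counter(classify_outcome(p, a) for p, a in zip(predictions, truth))

# Sensitivity: fraction of real positives that were found.
sensitivity = counts["TP"] / (counts["TP"] + counts["FN"])
# Specificity: fraction of real negatives that were correctly rejected.
specificity = counts["TN"] / (counts["TN"] + counts["FP"])

print(dict(counts), sensitivity, specificity)
```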

Balancing the errors

Maxim 17.3

False Negatives are generally preferred to False Positives: with False Negatives we can come back next year and find the ones we missed, whereas with False Positives we can spend this year studying them, only to find out that the time was wasted. It all depends on the circumstances
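One common way to balance the two kinds of error is to move a score threshold. The sketch below assumes each prediction carries a score and that a hit is reported when the score reaches the threshold; the scores, labels and threshold values are made up.

```python
# Hypothetical (score, is_true_hit) pairs for seven candidate predictions.
scored = [
    (0.95, True), (0.90, True), (0.70, False), (0.65, True),
    (0.40, False), (0.30, True), (0.10, False),
]

def errors_at(threshold, scored):
    """Return (false_positives, false_negatives) for a given threshold."""
    fp = sum(1 for score, actual in scored if score >= threshold and not actual)
    fn = sum(1 for score, actual in scored if score < threshold and actual)
    return fp, fn

# A strict threshold reports fewer hits: no False Positives, but some misses.
strict = errors_at(0.8, scored)

# A lenient threshold reports more hits: nothing missed, but some wasted effort.
lenient = errors_at(0.2, scored)

print("strict  (FP, FN):", strict)
print("lenient (FP, FN):", lenient)
```

Which end of the trade-off is preferable depends on what a wasted follow-up costs compared with a missed result.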

Maxim 17.4

Sometimes all those false positives are maybe, just maybe, trying to tell you something. So, if you aspire to a Nobel prize ...

Using multiple algorithms to improve performance

Maxim 17.5

Use a fast if inaccurate algorithm to protect your slow, accurate second-stage algorithm
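The two-stage idea can be sketched as follows. Both stages are stand-ins chosen for this example: the fast stage checks for a shared three-letter word, and the slow stage computes the longest common substring by dynamic programming. The sequences and function names are hypothetical.

```python
def fast_filter(query, candidate, k=3):
    """Cheap, imprecise check: do the sequences share any k-letter word?"""
    words = {query[i:i + k] for i in range(len(query) - k + 1)}
    return any(candidate[i:i + k] in words
               for i in range(len(candidate) - k + 1))

def slow_score(query, candidate):
    """Stand-in for an expensive step: longest common substring length."""
    best = 0
    prev = [0] * (len(candidate) + 1)
    for a in query:
        cur = [0]
        for j, b in enumerate(candidate, 1):
            cur.append(prev[j - 1] + 1 if a == b else 0)
            best = max(best, cur[-1])
        prev = cur
    return best

def two_stage_search(query, database):
    """Run the slow scorer only on candidates that pass the fast filter."""
    hits = {}
    slow_calls = 0
    for name, seq in database.items():
        if not fast_filter(query, seq):
            continue  # rejected cheaply; the slow stage never sees it
        slow_calls += 1
        hits[name] = slow_score(query, seq)
    return hits, slow_calls

# Hypothetical database of four short sequences.
database = {
    "seqA": "MKTAYIAKQR",
    "seqB": "GGGGGGGGGG",
    "seqC": "PPTAYIPPPP",
    "seqD": "WWWWCCCCWW",
}

hits, slow_calls = two_stage_search("TAYIAK", database)
print(hits, "slow stage ran", slow_calls, "times")
```

The fast filter can wrongly reject a candidate that shares only shorter words with the query, which is the price paid for running the slow stage on half the database in this example.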

An overview of tRNA: 2D, 3D and Gene Structure

[Figure: figTRNA.eps]

http://www.ncbi.nlm.nih.gov/Education/

Introducing Bioinformatics Tools

http://www-igbmc.u-strasbg.fr/BioInfo/

ftp://ftp.ebi.ac.uk/pub/software

ClustalW

ClustalX operating under Windows XP

[Figure: figCLUSTALX.eps]

$ gzip -d clustalw1.83.UNIX.tar.gz

$ tar -xvf clustalw1.83.UNIX.tar

$ cd clustalw1.83

$ make

$ ./clustalw

$ ./clustalw -h

$ ./clustalw -INFILE=../MerAHMAs_MerP.swp -OUTFILE=../Mer.aln

Algorithms and Methods

Substitution/scoring matrices
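A substitution matrix assigns a score to every pair of residues, and an ungapped alignment is scored by summing the scores of its aligned pairs. The sketch below uses a three-residue toy matrix whose values are illustrative only and should not be read as any published PAM or BLOSUM table.

```python
# Toy symmetric scoring matrix over three residues (illustrative values).
# Only one orientation of each pair is stored.
TOY_MATRIX = {
    ("A", "A"): 4, ("A", "R"): -1, ("A", "W"): -3,
    ("R", "R"): 5, ("R", "W"): -3,
    ("W", "W"): 11,
}

def pair_score(a, b, matrix=TOY_MATRIX):
    """Look up the score for a residue pair in either orientation."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]

def alignment_score(seq1, seq2):
    """Score an ungapped alignment of two equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("ungapped alignment needs equal-length sequences")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))

identical = alignment_score("ARW", "ARW")  # 4 + 5 + 11
swapped = alignment_score("ARW", "RAW")    # -1 + -1 + 11

print(identical, swapped)
```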

BLAST

Maxim 17.6

Exactly which BLAST is best depends on the circumstances

$ cd

$ mkdir blast

$ cp blast-2.2.6-ia32-linux.tar.gz blast

$ cd blast

$ gzip -d blast-2.2.6-ia32-linux.tar.gz

$ tar -xvf blast-2.2.6-ia32-linux.tar

[NCBI]

Data="/home/michael/blast/data"

Installing NCBI-BLAST

$ mkdir databases

$ cd databases

$ mv ../All_Mer_Proteins.fsa .

$ ../formatdb -i All_Mer_Proteins.fsa -p T -o T -n Merproteins

$ blastall -p blastp -d databases/Merproteins -i test_seq.fsa

$ sed 's/sw|/sp|/' All_Mer_Proteins.fsa > Mer_db.prot

$ ../formatdb -i Mer_db.prot -p T -o T -n Merproteins
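The effect of the sed command above can be mirrored in a few lines of Python. This is a sketch of the same substitution, presumably needed so that the database tag in each header reads sp| rather than sw|; the FASTA record shown is a made-up example, including its accession number.

```python
def relabel_headers(fasta_text):
    """Replace the first 'sw|' on every line with 'sp|', as the sed command does."""
    relabelled = []
    for line in fasta_text.splitlines():
        # sed 's/sw|/sp|/' changes only the first match on each line.
        relabelled.append(line.replace("sw|", "sp|", 1))
    return "\n".join(relabelled)

# Hypothetical record: the accession P00000 and the sequence are placeholders.
example = ">sw|P00000|MERA_SHIFL hypothetical entry\nMSTLKITGMT\n"

converted = relabel_headers(example)
print(converted)
```

Sequence lines are upper-case residue letters, so in practice only the header lines are changed.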

Preparation of database files for faster searching

$ fastacmd -d databases/Merproteins -I

$ fastacmd -d databases/Merproteins -s MERA_SHIFL

$ blastclust -d databases/Merproteins | head

The different types of BLAST search
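The choice of BLAST program follows from the type of the query and the type of the database. The lookup below is a small sketch of that rule, using the blastall options that appear in the commands above; the helper names are invented for the example.

```python
# (query type, database type) -> blastall program names.
# blastx, tblastn and tblastx translate nucleotide sequences before comparing.
BLAST_PROGRAMS = {
    ("protein", "protein"): ["blastp"],
    ("nucleotide", "nucleotide"): ["blastn", "tblastx"],
    ("nucleotide", "protein"): ["blastx"],   # query is translated
    ("protein", "nucleotide"): ["tblastn"],  # database is translated
}

def programs_for(query_type, database_type):
    """Return the BLAST programs that fit a query/database combination."""
    return BLAST_PROGRAMS[(query_type, database_type)]

def blastall_command(program, database, query_file):
    """Build a blastall argument list (-p program, -d database, -i query)."""
    return ["blastall", "-p", program, "-d", database, "-i", query_file]

choice = programs_for("protein", "protein")[0]
command = blastall_command(choice, "databases/Merproteins", "test_seq.fsa")
print(" ".join(command))
```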

Where To From Here
