BIOCHEMISTRY - PowerPoint PPT Presentation
M. Prasad Naidu
MSc Medical Biochemistry, Ph.D.
Introduction
Two algorithms are commonly used in these methods: BLAST and FASTA.
FASTA is an algorithm developed by Pearson and Lipman. It is generally considered more sensitive than BLAST.
BLAST is an algorithm developed by Altschul et al. in 1990. It finds high-scoring local alignments between two sequences. Gapped versions are now available.
BLASTP algorithm
The BLAST algorithm involves the following steps.
1. Breaking the sequence into words of a defined size.
2. Finding a match or HSP (High Scoring Pair).
3. Aligning the word and extending the alignment.
Breaking of the sequence into defined word size
Query: AILDTGATGDA
Word size: 4
AILD
ILDT
LDTG
DTGA
TGAT
GATG
ATGD
TGDA
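The word list above can be reproduced with a sliding window. This is a minimal sketch; the function name split_into_words is chosen here for illustration and is not part of BLAST.

```python
def split_into_words(query: str, word_size: int) -> list[str]:
    """Return every overlapping word of length word_size, left to right."""
    return [query[i:i + word_size] for i in range(len(query) - word_size + 1)]


words = split_into_words("AILDTGATGDA", 4)
for word in words:
    print(word)
```

A query of length L yields L - w + 1 words of size w, which is why the 11-residue query gives 8 words.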
Finding a High scoring Pair
Subject: MQVWGWAILDTVATDAAMLL
Word:          AILD
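The search for word hits can be sketched as an exact string search of each query word in the subject sequence. This is a simplification made here for illustration: real BLASTP also accepts similar ("neighbourhood") words that score above a threshold under a substitution matrix. The name find_word_hits is hypothetical.

```python
def find_word_hits(query, subject, word_size):
    """Return (word, query_pos, subject_pos) for every exact word match (0-based)."""
    hits = []
    for q_pos in range(len(query) - word_size + 1):
        word = query[q_pos:q_pos + word_size]
        s_pos = subject.find(word)
        while s_pos != -1:
            hits.append((word, q_pos, s_pos))
            s_pos = subject.find(word, s_pos + 1)  # look for further occurrences
    return hits


hits = find_word_hits("AILDTGATGDA", "MQVWGWAILDTVATDAAMLL", 4)
for hit in hits:
    print(hit)
```

Each hit is a seed from which an alignment can be extended in both directions.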
Extending the alignment
Subject: MQVWGWAILDTVATDAAMLL
Query:         AILDTGATGDA
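A minimal sketch of ungapped extension from a word hit follows. The scoring (+1 match, -1 mismatch) and the drop-off limit x_drop are assumptions made for this example; BLASTP itself scores with a substitution matrix such as BLOSUM 62. The function names are hypothetical.

```python
def extend_one_way(query, subject, q, s, step, score, x_drop):
    """Walk from (q, s) in direction `step`; return (best gain, residues kept)."""
    running = best = kept = walked = 0
    while 0 <= q < len(query) and 0 <= s < len(subject):
        running += score(query[q], subject[s])
        walked += 1
        if running > best:
            best, kept = running, walked
        elif best - running > x_drop:
            break  # score fell too far below the best seen so far
        q += step
        s += step
    return best, kept


def extend_hit(query, subject, q_start, s_start, word_size,
               match=1, mismatch=-1, x_drop=2):
    """Extend a seed without gaps; return (score, query segment, subject segment)."""
    def score(a, b):
        return match if a == b else mismatch

    seed = sum(score(query[q_start + k], subject[s_start + k])
               for k in range(word_size))
    right_gain, right_len = extend_one_way(
        query, subject, q_start + word_size, s_start + word_size, 1, score, x_drop)
    left_gain, left_len = extend_one_way(
        query, subject, q_start - 1, s_start - 1, -1, score, x_drop)
    q_lo, s_lo = q_start - left_len, s_start - left_len
    length = left_len + word_size + right_len
    return (seed + left_gain + right_gain,
            query[q_lo:q_lo + length],
            subject[s_lo:s_lo + length])


# Seed: word AILD at query position 0 and subject position 6 (0-based).
hsp = extend_hit("AILDTGATGDA", "MQVWGWAILDTVATDAAMLL", 0, 6, 4)
print(hsp)
```

The extension keeps only the stretch that gave the best running score, so trailing mismatches are trimmed from the reported pair.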
Parameters in BLAST result
Percentage of identity (often loosely called homology)
Score of the alignment
Number of residues aligned
E-value
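For reference, the E-value is usually related to the raw alignment score S through Karlin-Altschul statistics. The symbols below are not defined in these slides and are introduced here as assumptions: m and n are the lengths of the query and of the database, and K and λ are constants that depend on the scoring system.

```latex
E = K \, m \, n \, e^{-\lambda S},
\qquad
S' = \frac{\lambda S - \ln K}{\ln 2},
\qquad
E = m \, n \, 2^{-S'}
```

Here S' is the bit score. A smaller E-value means that fewer alignments with a score this high would be expected by chance.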
FASTA algorithm
The word size in the FASTA algorithm is called the k-tuple (ktup).
The k-tuple is generally 4 to 6 for nucleotide sequences and 1 or 2 for protein sequences.
The FASTA algorithm involves steps similar to those of BLAST, but the procedure for generating the alignment is different.
Breaking of the sequence into defined k-tuple
First sequence:

Position:  1  2  3  4  5  6  7  8  9 10 11 12 13 14
Residue:   F  A  M  L  G  F  I  K  Y  L  P  G  C  M

Lookup table (positions at which each residue occurs in the first sequence):

Residue   Positions
A         2
C         13
F         1, 6
G         5, 12
I         7
K         8
L         4, 10
M         3, 14
P         11
Y         9

The remaining residues (B, D, E, H, N, Q, R, S, T, V, W, Z) do not occur in the first sequence.

Second sequence, with offsets (position in the first sequence minus position in the second sequence):

Position   Residue   Offsets
1          T         (none)
2          G         3, 10
3          F         -2, 3
4          I         3
5          K         3
6          Y         3
7          L         -3, 3
8          P         3
9          G         -4, 3
10         A         -8
11         C         2
12         T         (none)
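The lookup table and the offsets can be computed as in the following sketch, which assumes a k-tuple of 1 as in the example and uses 1-based positions. The function names build_lookup and compute_offsets are hypothetical.

```python
from collections import defaultdict


def build_lookup(seq, k=1):
    """Map each k-tuple to the 1-based positions where it starts in seq."""
    table = defaultdict(list)
    for i in range(len(seq) - k + 1):
        table[seq[i:i + k]].append(i + 1)
    return table


def compute_offsets(seq1, seq2, k=1):
    """For each k-tuple of seq2, list (position in seq1) - (position in seq2)."""
    table = build_lookup(seq1, k)
    rows = []
    for j in range(len(seq2) - k + 1):
        word = seq2[j:j + k]
        rows.append((j + 1, word, [p - (j + 1) for p in table.get(word, [])]))
    return rows


lookup = build_lookup("FAMLGFIKYLPGCM")
offset_rows = compute_offsets("FAMLGFIKYLPGCM", "TGFIKYLPGACT")
for position, residue, offsets in offset_rows:
    print(position, residue, offsets)
```

Because the first sequence is indexed once, each k-tuple of the second sequence is resolved by a single dictionary lookup rather than a scan.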
The most frequently occurring offset is 3, so the alignment starts after skipping the first three residues of the first sequence.
Alignment of the sequences
F A M L G F I K Y L P G C M
      T G F I K Y L P G A C T
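Choosing the most frequent offset and placing the sequences accordingly can be sketched as follows. This covers only the simple single-diagonal case of the example; the full FASTA procedure goes on to rescore and join regions. The function names are hypothetical.

```python
from collections import Counter


def best_offset(seq1, seq2):
    """Return (offset, count) for the most frequent i - j with seq1[i] == seq2[j]."""
    counts = Counter(i - j
                     for i, a in enumerate(seq1)
                     for j, b in enumerate(seq2)
                     if a == b)
    return counts.most_common(1)[0]


def align_at_offset(seq1, seq2, offset):
    """Pad one sequence so that seq2[j] sits under seq1[j + offset]."""
    top = " " * max(-offset, 0) + seq1
    bottom = " " * max(offset, 0) + seq2
    identities = sum(a == b and a != " " for a, b in zip(top, bottom))
    return top, bottom, identities


seq1 = "FAMLGFIKYLPGCM"
seq2 = "TGFIKYLPGACT"
offset, count = best_offset(seq1, seq2)
top, bottom, identities = align_at_offset(seq1, seq2, offset)
print(top)
print(bottom)
print("offset:", offset, "identical residues:", identities)
```

The sketch assumes the two sequences share at least one residue; with no shared residue there is no offset to choose.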
Parameters in FASTA result
Percentage of identity (often loosely called homology)
Score of the alignment
Number of residues aligned
P-Score
Scoring schemes
Identity scoring matrix
Residue-to-residue scores are represented here in the form of similarity.
A 4 x 4 matrix is built for nucleotides and a 20 x 20 matrix for amino acids.
In the simplest form, shown here for nucleotides, a match scores 1 and a mismatch scores 0. A common variant scores a match as +1 and a mismatch as -1.

     A   T   G   C
A    1   0   0   0
T    0   1   0   0
G    0   0   1   0
C    0   0   0   1
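An identity matrix of this kind can be built and applied as in the sketch below. The function names and the example sequences are chosen here for illustration; the mismatch parameter allows either the 0 or the -1 convention.

```python
def identity_matrix(alphabet, match=1, mismatch=0):
    """Build a nested dict: matrix[a][b] is the score for aligning a with b."""
    return {a: {b: (match if a == b else mismatch) for b in alphabet}
            for a in alphabet}


def score_ungapped(seq_a, seq_b, matrix):
    """Sum the pairwise scores of two equal-length, already aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(matrix[a][b] for a, b in zip(seq_a, seq_b))


dna = identity_matrix("ATGC")
dna_penalised = identity_matrix("ATGC", match=1, mismatch=-1)
print(score_ungapped("ATGC", "ATGA", dna))
print(score_ungapped("ATGC", "ATGA", dna_penalised))
```

Passing the 20 amino acid letters as the alphabet gives the 20 x 20 protein version.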
PAM Matrices
These were first developed by Margaret Dayhoff and co-workers in 1978.
This model assumes that evolutionary changes follow a Markov model, i.e. a residue change occurs independently of previous mutations at that site.
One PAM is a unit of evolutionary divergence in which 1% of the amino acids have changed. This does not imply that after 100 PAMs every amino acid is different, because the same position can mutate more than once or revert to the original residue.
Dayhoff and co-workers calculated the frequencies of accepted mutations for 1 PAM by analyzing closely related families of sequences.
The scores are represented as log-odds ratios.
The 1 PAM matrix can be extended to any number of PAMs: the PAM N matrix is obtained by multiplying the 1 PAM mutation probability matrix by itself N times.
For closely related protein sequences a lower-distance PAM matrix is used, and a higher one for more divergent proteins.
PAM 30 is used for closely related proteins and PAM 250 for divergent ones.
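The extension from 1 PAM to N PAMs can be illustrated with a toy example. The 3-letter alphabet, the matrix toy_pam1 and the uniform frequencies below are hypothetical values chosen for this sketch, not Dayhoff's data; real PAM matrices are 20 x 20. The log-odds scaling of 10 x log10 is likewise an assumption of this sketch.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Multiply m by itself `power` times, starting from the identity matrix."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result


def log_odds(m, freqs):
    """Score = 10 * log10(P(i becomes j) / background frequency of j), rounded."""
    n = len(m)
    return [[round(10 * math.log10(m[i][j] / freqs[j])) for j in range(n)]
            for i in range(n)]


# Toy 1 PAM matrix: each residue has a 1% chance of changing (hypothetical values).
toy_pam1 = [[0.990, 0.005, 0.005],
            [0.005, 0.990, 0.005],
            [0.005, 0.005, 0.990]]
toy_freqs = [1 / 3, 1 / 3, 1 / 3]

toy_pam100 = mat_pow(toy_pam1, 100)
unchanged = toy_pam100[0][0]          # probability a residue is the same after 100 PAMs
scores = log_odds(toy_pam100, toy_freqs)
print(round(unchanged, 3))
print(scores)
```

In this toy model a substantial fraction of residues is still unchanged after 100 PAMs, which illustrates why 100 PAMs does not mean that every residue differs.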
BLOSUM Matrices
These matrices were developed by Henikoff and Henikoff in 1992.
Like the PAM matrices, they are log-odds matrices, but the substitution frequencies are counted directly from alignments rather than extrapolated from closely related sequences.
The data were derived from ungapped local alignments (blocks) of distantly related proteins deposited in the BLOCKS database.
BLOSUM 30 is used for comparing highly divergent sequences and BLOSUM 90 is used for closely related proteins.
The most commonly used matrix is BLOSUM 62, which is built from blocks in which sequences more than 62% identical are clustered together.
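As a reminder of what a log-odds score means, a BLOSUM entry can be written as below. The symbols are introduced here and do not appear in the slides: q_ij is the observed frequency of the aligned pair (i, j) in the blocks, and p_i and p_j are the background frequencies of the two residues.

```latex
s_{ij} = \operatorname{round}\!\left( 2 \log_{2} \frac{q_{ij}}{p_i \, p_j} \right)
```

The factor 2 expresses the score in half-bit units, the scaling used for BLOSUM 62. A positive score means the pair is seen more often than expected by chance, and a negative score means it is seen less often.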