Bi04b_1
© Copyright W. Schreiner 2005
Multiple sequence alignment by Genetic Algorithms
Unit 04b:
Bi04b_2
9 biological ideas used in Genetic Algorithms (GA) and Genetic Programming (GP):
• the Darwinian principle of survival of the fittest
• asexual mutation operation
• sexual recombination (crossover) operation
• inversion operation
• gene regulation
• gene duplication
• gene deletion
• embryos
• development of embryo into organism
from Koza (1993)
„...What Biology can do for Computer Science...“
Bi04b_3
Definition
The GA is a very general computational approach that can be tailored to solve optimization or search tasks for very different problems and settings.
The genetic algorithm is a mathematical algorithm that transforms a set (population) of mathematical objects (typically fixed-length binary character strings), each with an associated fitness value, into a new set (new generation of the population) of offspring objects, using operations patterned after naturally-occurring genetic operations and the Darwinian principle of reproduction and survival of the fittest.
from Koza (1993)
Genetic Algorithm (GA)
Bi04b_4
Genetic Algorithm, Concepts
[Figure: loop in which GENETIC OPERATIONS turn a parent population of individuals (fitness scores 3, 1, 4, 5, 6, 2) into an offspring population (fitness scores 4, 2, 2, 4, 7, 2).]
Bi04b_5
• binary strings (e.g. 011011), character strings (e.g. AZKLMN), tree structure: representations directly usable by GA
• technical constructions, differing in detail
• computer programs; GAs on programs: „genetic programming“
[Figure: two program fragments, „begin loop add end“ and „begin add met end“.]
Types of individuals for GA
Bi04b_6
• a particular (special) fold of a protein
• a particular journey for the travelling salesman
• a particular alignment of N sequences („msa via GAs“)
[Figure: travelling-salesman journey through the cities A, B, C, D, E, F.]
Types of individuals for GA, ctd.
Bi04b_7
• a vascular tree supplying N sites of tissue
Preparatory Step 1 to implement a GA:
Recast the representation of individuals to strings or trees
Types of individuals for GA, ctd.
Bi04b_8
Fitness scores for types of individuals
[Figure: GA loop as on slide Bi04b_4: parent population with fitness scores, GENETIC OPERATIONS, offspring population with fitness scores.]
Bi04b_9
Problem to optimize                        Fitness score, example
technical construction                     maximum load - weight of material
computer program (begin loop add end)      rate of successful perceptions (TP, FP, TN, FN)
fold of a protein                          - energy of protein
Fitness scores for types of individuals, ctd.
Bi04b_10
Problem to optimize                        Fitness score, example
alignment of N sequences                   alignment score
journey for the travelling salesman        - length of journey
vascular tree                              - blood volume
Preparatory Step 2 for implementing a GA:
Define algorithm to compute fitness score
Fitness scores for types of individuals, ctd.
Bi04b_11
Genetic operations for GA
[Figure: GA loop as on slide Bi04b_4: parent population with fitness scores, GENETIC OPERATIONS, offspring population with fitness scores.]
Bi04b_12
Darwinian reproduction (copy operation):
• Individuals with higher fitness (F) are stochastically more likely to be chosen, e.g. via p ≅ 1 - e^(-kF)
• best individuals are not necessarily chosen
• worst individual is not necessarily excluded
• A certain fraction of the population undergoes reproduction (either exact or randomly selected)
sexual crossover
asexual mutation
[Figure: copy operation, ACKBDF → ACKBDF.]
Genetic operations
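The stochastic copy step can be sketched in a few lines of Python. This is only an illustration under assumptions: the constant k = 0.5, the helper names `selection_weights` and `select_for_copy`, and the six placeholder individuals are not from the slides; only the weight 1 - e^(-kF) and the six fitness scores are.

```python
import math
import random

def selection_weights(fitnesses, k=0.5):
    """Unnormalised selection weight 1 - exp(-k*F) for each fitness F > 0."""
    return [1.0 - math.exp(-k * f) for f in fitnesses]

def select_for_copy(population, fitnesses, n, k=0.5, rng=random):
    """Draw n individuals with replacement; fitter ones are drawn more often."""
    weights = selection_weights(fitnesses, k)
    return rng.choices(population, weights=weights, k=n)

# Hypothetical individuals carrying the parent scores of the concept figure.
population = ["A", "B", "C", "D", "E", "F"]
fitnesses = [3, 1, 4, 5, 6, 2]
copies = select_for_copy(population, fitnesses, n=3, rng=random.Random(0))
```

Because the draw is weighted rather than sorted, the best individual can be missed and the worst can be picked, which is the behaviour the bullet points describe.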
Bi04b_13
Darwinian reproduction (asexual copy operation)
sexual crossover
asexual mutation
from Brown (1999)
Genetic operations, ctd.
Bi04b_14
Darwinian reproduction (asexual copy operation)
sexual crossover
asexual mutation
Genetic operations, ctd.
Bi04b_15
Darwinian reproduction (asexual copy operation)
sexual crossover
asexual mutation
• The predominant operation with GAs
• A certain fraction of individuals goes into the „mating pool“ based on fitness. Or: tournament selection: mate best bull with best cow.
• Two parental individuals (strings, trees) are chosen based on fitness
• Pick a point in the genome (the same for both parents) to become the recombinant joint
Genetic operations, ctd.
Bi04b_16
Darwinian reproduction (asexual copy operation)
sexual crossover
asexual mutation
Branch migration shifts recombinant joint
Bi04b_17
for string representation:
• pick position of recombinant joint stochastically between 1 and L-1 (L = length of genome representation string)
• join recombinants
sexual crossover, details
[Figure: the parental „string chromosomes“ (father, mother), each with positions 1 2 ... L-1 L, are cut at the recombinant joint and joined crosswise into two offspring „string chromosomes“.]
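A minimal Python sketch of this one-point crossover on strings, assuming both parents have the same length L. The function name `one_point_crossover` and the toy parents are illustrative only.

```python
import random

def one_point_crossover(father, mother, rng=random):
    """Cut both parents at the same joint and swap the tails."""
    if len(father) != len(mother):
        raise ValueError("parents must have the same length L")
    L = len(father)
    joint = rng.randint(1, L - 1)            # uniform over 1 .. L-1
    child1 = father[:joint] + mother[joint:]
    child2 = mother[:joint] + father[joint:]
    return child1, child2, joint

# Toy parents so that the origin of every position is visible in the children.
c1, c2, joint = one_point_crossover("AAAAAAA", "BBBBBBB", random.Random(1))
```

Restricting the joint to 1 .. L-1 guarantees that each child receives at least one position from each parent.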
Bi04b_18
color, hair length in this example: inherited together (linked features)
color, # of legs in this example: inherited separately (independent)
sexual crossover with strings
[Figure: two parental string chromosomes, each with six genes (gene 1 to gene 6), and their two offspring after crossover. Feature labels: red or blue color, long or short hair, 3 or 4 legs.]
Bi04b_19
• features on genes close to each other: likely to be transmitted linked to each other
• features on genes lying far apart: more likely to be disrupted and transmitted independently from each other
sexual crossover with strings, ctd.
[Figure: string chromosomes with six genes as on slide Bi04b_18.]
Bi04b_20
From model we can see:
• features on genes close to each other: likely to be transmitted linked to each other
• features on genes lying far apart: more likely to be disrupted and transmitted independently from each other
For „real“ genetics:
• reverse the argument to define gene-distance:
• observe the frequency of linked vs. independent transmission of features
→ derive a measure of gene-distance (cM: centimorgan) within genome maps (genetic linkage maps)
sexual crossover with strings, ctd.
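For the string model the linkage argument can be made quantitative. Assuming a single joint drawn uniformly from 1 .. L-1, two positions a < b end up on different offspring exactly when a ≤ joint < b, so the separation probability is (b - a)/(L - 1). The sketch below checks this by enumeration; the function names and the example length L = 30 are illustrative assumptions.

```python
from fractions import Fraction

def separation_probability(a, b, L):
    """P(one-point crossover separates positions a < b); positions are 1-based."""
    if not 1 <= a < b <= L:
        raise ValueError("need 1 <= a < b <= L")
    return Fraction(b - a, L - 1)

def separation_by_enumeration(a, b, L):
    """Same probability, counted over every possible joint 1 .. L-1."""
    hits = sum(1 for joint in range(1, L) if a <= joint < b)
    return Fraction(hits, L - 1)

near = separation_probability(3, 4, 30)    # neighbouring positions
far = separation_probability(3, 28, 30)    # positions far apart
```

Neighbouring positions are split in 1 of 29 cases, distant ones in 25 of 29, which is the model counterpart of reading gene distance from recombination frequency.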
Bi04b_21
Gene Distance and Gene Maps
from Brown (1999)
Two genes that lie relatively close together on a chromosome are less likely to be uncoupled by a crossing-over than genes that lie farther apart. White eyes (w) and yellow body (y) therefore recombine less often than white eyes and miniature wings (m).
Sturtevant's map for five genes of the X chromosome of Drosophila. Abbreviations: y, yellow body; w, white eyes; v, vermilion eyes; m, miniature wings; r, rudimentary wings.
Bi04b_22
from Koza (1993)
GAs & Co-adapted Sets of genes
• Genes close to each other on the chromosome are less likely to be separated by a crossover. Therefore place things adjacent if they are a good combination (e.g. long legs and long neck).
→ the ability of the GA to solve the problem depends on this kind of choice.
• In nature: if cooperative beneficial features come to lie close together (due to crossover), they are from then on inherited together more effectively (called „inversion“).
• General idea of how GAs work: they generate co-adapted pairs that tend to get promoted in the population
Bi04b_23
Real genes usually stand for a feature (color) which is expressed / not expressed (but they don't toggle between features, e.g. colors). GA-genomes normally toggle.
In addition to crossover, which is modeled after nature, many other artificial genetic operators may be freely designed in GAs to fit the representation and perform specifically suitable jobs in optimization (will be shown in examples).
Caveats regarding analogy to natural crossover
Bi04b_24
• VERY occasional: maybe 1 bit/character per generation
• Choose one parental string (asexual) based on fitness.
• Pick a point from 1 to L (using a uniform random distribution)
• Mutation is a localized search, changing one factor only!
• Similar to a Monte Carlo move in Gibbs sampling mode!
Example: point 3 chosen and mutated
parent:    A L L M A A K
offspring: A L K M A A K
from Koza (1993)
Mutation Operation
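The mutation step, written as a short Python sketch. The helper names `mutate_at` and `point_mutation` and the four-letter alphabet are assumptions for illustration; the example strings are the parent and offspring shown on the slide.

```python
import random

def mutate_at(parent, point, new_char):
    """Replace the character at 1-based position `point`."""
    return parent[:point - 1] + new_char + parent[point:]

def point_mutation(parent, alphabet, rng=random):
    """Pick one point uniformly from 1 .. L and change it to a different character."""
    point = rng.randint(1, len(parent))
    old = parent[point - 1]
    new_char = rng.choice([c for c in alphabet if c != old])
    return mutate_at(parent, point, new_char), point

offspring = mutate_at("ALLMAAK", 3, "K")     # the slide example: point 3 mutated
random_child, point = point_mutation("ALLMAAK", "AKLM", random.Random(0))
```

Exactly one position differs between parent and child, which is why mutation acts as a localized search.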
Bi04b_25
             Generation 0             Mating pool for      Generation 1
                                      sexual crossover
individual   genome  fitness  prob    genome  fitness      genome  fitness  prob
1            011     3        .25     011     3            111     7        0.39
2            001     1        .08     110     6            010     2        0.11
3            110     6        .50     110     6            110     6        0.33
4            010     2        .17     010     2            011     3        0.17
Total                12                       17                   18
Worst                1                        2                    2
Average              3                        4.25                 4.5
Best                 6                        6                    7
Select individuals of the population for the mating pool by chance, according to fitness: individuals 1 and 4 are selected, individual 2 is NOT selected, individual 3 is selected twice.
from Koza (1993)
GA example run (4 individuals, 3-dimensional optimization)
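The "prob" columns of this example can be recomputed with a few lines of Python. The sketch assumes that the fitness of a genome is its value read as a binary number, which is consistent with every row of the table (011 → 3, 110 → 6, 111 → 7) but is an inference, and that selection is fitness-proportionate. The function names are illustrative.

```python
def fitness(genome):
    """Assumed fitness: the genome read as a binary number."""
    return int(genome, 2)

def selection_probabilities(genomes):
    """Fitness-proportionate probability of entering the mating pool."""
    scores = [fitness(g) for g in genomes]
    total = sum(scores)
    return scores, [s / total for s in scores]

generation0 = ["011", "001", "110", "010"]
generation1 = ["111", "010", "110", "011"]
scores0, probs0 = selection_probabilities(generation0)
scores1, probs1 = selection_probabilities(generation1)
summary0 = {"total": sum(scores0), "worst": min(scores0),
            "average": sum(scores0) / len(scores0), "best": max(scores0)}
```

Rounded to two decimals, `probs0` and `probs1` give the .25/.08/.50/.17 and 0.39/0.11/0.33/0.17 columns, and `summary0` gives total 12, worst 1, average 3, best 6.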
Bi04b_26
[Table: generation 0, mating pool and generation 1 as on slide Bi04b_25.]
Perform genomic operations on members of the mating pool by chance, according to fitness:
• copy
• mutation
• crossover
from Koza (1993)
GA example run (4 individuals, 3-dimensional optimization)
Bi04b_27
• Creation of the initial random population (generation 0) (uniform distribution)
• Probabilistic selection of participant(s) for the genetic operation (unequal probabilities, based on fitness)
• Probabilistic selection of the type of operation (unequal probabilities)
• Probabilistic selection of crossover or mutation point (equal or unequal probabilities)
• (Often) probabilistic selection of fitness cases (uniform distribution)
In each run you get a different answer: a GA is a multi-run algorithm
from Koza (1993)
Genetic Algorithms are probabilistic
Bi04b_28
• Problem areas involving many variables that are interrelated in highly non-linear ways
• Problem areas involving many variables whose inter-relationship is not well understood
• Problem areas where a good approximate solution is satisfactory (and no one is expecting a perfect solution)
  - design
  - control
  - classification, pattern recognition, image processing
  - forecasting
  - model building and data mining
from Koza (1993)
Promising GA Application Areas
Bi04b_29
• Problem areas where discovery of the size and shape of the solution is a major part of the problem
• Problem areas where large computerized databases are accumulating and computerized techniques are needed to analyze the data
  - genome and protein sequences
  - satellite data
  - astronomy
  - petroleum
  - financial databases
  - marketing databases
  - World Wide Web
from Koza (1993)
Promising GA/GP Areas, ctd.
Bi04b_30
• Problem areas for which humans find it very difficult to write good programs ...
from Koza (1993)
Promising GA/GP Application Areas, ctd.
Bi04b_31
Initialisation
1. create G0
Evaluation
2. evaluate the population of generation n (Gn)
3. if the population is stabilised then End
4. select the individuals to replace
5. evaluate the expected number of offspring (EO) for each individual (fitness based)
Breeding
6. select the parent(s) from Gn
7. select the operator
8. generate the new child
9. keep or discard the new child in Gn+1
10. goto 6 until all the children have been successfully put into Gn+1
11. n = n+1
12. goto Evaluation
End
13. end
Multiple sequence alignment by GA Program SAGA
(after Notredame C, Higgins DG: SAGA: sequence alignment by genetic algorithm. Nucleic Acids Res. 1996;24:1515-1524)
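The numbered steps can be traced in a toy Python loop. This is not SAGA: the individuals are bit strings, the default operators are plain crossover and bit-flip mutation, the weight fitness + 1 stands in for the expected number of offspring, and the names `run_ga`, `patience` and the cap on retries are assumptions made so that the sketch is self-contained and always terminates.

```python
import random

def run_ga(fitness, genome_len=20, pop_size=10, patience=20, seed=0):
    rng = random.Random(seed)
    # 1. create G0
    pop = ["".join(rng.choice("01") for _ in range(genome_len))
           for _ in range(pop_size)]
    best_score, stale, n = max(map(fitness, pop)), 0, 0
    while stale < patience:                       # 3. stop when stabilised
        # 2. evaluate; 4. the better half is copied, the rest is replaced
        ranked = sorted(pop, key=fitness, reverse=True)
        next_gen = ranked[: pop_size // 2]
        # 5. selection weights stand in for the expected number of offspring
        weights = [fitness(ind) + 1 for ind in pop]
        attempts = 0
        while len(next_gen) < pop_size:           # 10. until Gn+1 is full
            mum, dad = rng.choices(pop, weights=weights, k=2)      # 6.
            if rng.random() < 0.5:                # 7. select the operator
                cut = rng.randint(1, genome_len - 1)
                child = mum[:cut] + dad[cut:]     # 8. child by crossover
            else:
                i = rng.randrange(genome_len)
                flipped = "1" if mum[i] == "0" else "0"
                child = mum[:i] + flipped + mum[i + 1:]            # 8. by mutation
            attempts += 1
            # 9. discard duplicates (the retry cap is a safety net)
            if child not in next_gen or attempts > 100 * pop_size:
                next_gen.append(child)
        pop, n = next_gen, n + 1                  # 11. n = n+1
        score = max(map(fitness, pop))            # 12. back to evaluation
        stale = 0 if score > best_score else stale + 1
        best_score = max(best_score, score)
    return max(pop, key=fitness), n               # 13. end

best, generations = run_ga(lambda g: g.count("1"))
```

Copying the better half unchanged makes generations overlap, so the best score never decreases and the stagnation counter is a usable stop rule.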
Bi04b_32
Initialisation:
• generate population of 100 random alignments
• each individual (alignment) contains only terminal gaps
• choose random offset (e.g. between 0 and 50 for each sequence)
• pad with leading and trailing gaps to make all sequences equally long.
1. Initialisation in SAGA
Bi04b_33
• find length Lmax of longest sequence
• choose a common length LA > Lmax for all alignments
• for each sequence with length Li in the alignment: draw the offset with equal probabilities from 1 ≤ offset ≤ (LA-Li)+1
possible realization:
A1
---WGKVNVDEVGGEAL-
--WDKVNEEEVGGEAL--
-WGKVGAHAGEYGAEAL-
--WSKVGGHAGEYGAEAL
A2
--WGKVNVDEVGGEAL--
WDKVNEEEVGGEAL----
-WGKVGAHAGEYGAEAL-
--WSKVGGHAGEYGAEAL
Am, m ≈ 100
WGKVNVDEVGGEAL----     offset = 1
---WDKVNEEEVGGEAL-     offset = 4
WGKVGAHAGEYGAEAL--
-WSKVGGHAGEYGAEAL-     offset = 2
Initialisation, example
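A Python sketch of this initialisation, under the assumption that LA is chosen as Lmax plus a fixed margin (`extra`; 2 reproduces the length 18 of the example realizations). The function names and the margin parameter are illustrative, not taken from SAGA.

```python
import random

def random_alignment(sequences, extra=2, rng=random):
    """One individual: every sequence shifted by a random offset, terminal gaps only."""
    l_max = max(len(s) for s in sequences)
    l_a = l_max + extra                               # common length LA > Lmax
    rows = []
    for seq in sequences:
        offset = rng.randint(1, l_a - len(seq) + 1)   # 1 <= offset <= (LA-Li)+1
        lead = offset - 1                             # number of leading gaps
        trail = l_a - len(seq) - lead                 # pad to the common length
        rows.append("-" * lead + seq + "-" * trail)
    return rows

def initial_population(sequences, size=100, extra=2, seed=None):
    rng = random.Random(seed)
    return [random_alignment(sequences, extra, rng) for _ in range(size)]

seqs = ["WGKVNVDEVGGEAL", "WDKVNEEEVGGEAL",
        "WGKVGAHAGEYGAEAL", "WSKVGGHAGEYGAEAL"]
population = initial_population(seqs, size=100, extra=2, seed=0)
```

Each individual is a list of equally long rows; stripping the terminal gaps of a row returns the original sequence.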
Bi04b_34
Given a multiple alignment A of N sequences:
\[ \text{Fitness} = \text{Alignment cost}(A) = \sum_{i=2}^{N} \sum_{j=1}^{i-1} W_{ij} \cdot \text{cost}(A_{ij}) \]
• cost(A_ij): cost of the pairwise alignment of sequences i and j, computed from substitution matrices (PAMs, BLOSUMs) with affine gap penalties (variants exist for vis-à-vis gaps)
• W_ij: weight
2.-5. Evaluation in SAGA
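The weighted sum over all sequence pairs can be sketched as follows. To stay self-contained the sketch replaces the PAM/BLOSUM matrices by a toy match/mismatch score, ignores columns that are gaps in both rows, and uses made-up affine gap parameters; all numeric values and the names `pair_cost` and `alignment_cost` are assumptions, and larger values mean better alignments here.

```python
def pair_cost(row_a, row_b, match=1.0, mismatch=-1.0,
              gap_open=-2.0, gap_ext=-0.5):
    """Score one induced pairwise alignment with affine gap penalties."""
    score, in_gap = 0.0, None          # in_gap: row that currently has an open gap
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            continue                   # column empty in both rows: ignored
        if a == "-" or b == "-":
            side = "a" if a == "-" else "b"
            score += gap_ext if in_gap == side else gap_open
            in_gap = side
        else:
            score += match if a == b else mismatch
            in_gap = None
    return score

def alignment_cost(rows, weights=None):
    """Sum of W_ij * cost(A_ij) over all pairs j < i (uniform weights by default)."""
    total = 0.0
    for i in range(1, len(rows)):
        for j in range(i):
            w = 1.0 if weights is None else weights[i][j]
            total += w * pair_cost(rows[i], rows[j])
    return total

example = alignment_cost(["A--B", "AAAB"])   # match, gap open, gap extension, match
```

The loop bounds `range(1, N)` and `range(i)` are the zero-based form of i = 2..N, j = 1..i-1.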
Bi04b_35
4. choose the best 50% of alignments of Gn for direct copy operation (i.e. they will make up half of the next generation; this is a method of overlapping generations; individuals not selected for copy will be replaced)
5. evaluate for each alignment in Gn the expected number of offspring (EO) based on fitness. Usually 0 ≤ EO ≤ 2
2.-5. Evaluation in SAGA, ctd.
Bi04b_36
6. stochastically (proportional to EO, unequal probabilities) select parents from Gn for the mating pool
7. stochastically select an operator (operators are specifically designed for a chosen representation scheme, see below).
8. apply operator to (1 or 2) parent(s) and generate children
9. check children for duplicates within generation Gn+1: if duplicates occur then discard parents & children and repeat from item 6.
repeat until enough children (e.g. 50% of population) are generated
6.-9. Breeding in SAGA (genetic OP other than copy)
Bi04b_37
• there is no theoretically proven criterion for convergence
• heuristic stop, if no improvement found over the last 100 generations
3. Termination condition in SAGA
Bi04b_38
traditional operators:
• sexual crossover
• asexual mutation
GA-operators specifically designed for multiple sequence alignments:
• crossover (2 different types)
• gap insertion
• block shuffling
• block searching
• local optimal or sub-optimal rearrangement
2 modes of usage for each operator:
• stochastic
• semi hill-climbing
Closeups on Operators in SAGA
Bi04b_39
• acts on 2 parent alignments (sexual)
• 1st parent is cut straight at random position
• 2nd parent is tailored to let pieces fit together
• 2 different children may be produced
• spaces at junction are filled with gaps
• keep child at random (stochastic mode) or keep child with better score (semi hill climbing mode)
• (this) operator is both: crossover + mutation
• operator may disrupt coherent parts of a sequence
One point crossover in SAGA
Bi04b_40
[Figure, from Notredame (1996): Parent Alignment 1 + Parent Alignment 2 → Child Alignment 1, Child Alignment 2 → Chosen Child Alignment. Alignments shown:]
WGKVN---VDEVGGEAL-
WDKVNEEE---VGGEAL-
WGKVG--AHAGEYGAEAL
WSKVGGHA--GEYGAEAL

--WGKVN---VDEVGGEAL-
WD--KVNEEE---VGGEAL-
WGKV--G--AHAGEYGAEAL
WSKV--GGHA--GEYGAEAL

WGKV--NVDEVG-GEAL
WDKV--NEEEVG-GEAL
WGKVGA-HAGEYGAEAL
WSKVGGHAGE-YGAEAL

--WGKVNVDEVG-GEAL
WD--KVNEEEVG-GEAL
WGKVGA-HAGEYGAEAL
WSKVGGHAGE-YGAEAL

Chosen Child Alignment:
WGKV--NVDEVG-GEAL
WDKV--NEEEVG-GEAL
WGKVGA-HAGEYGAEAL
WSKVGGHAGE-YGAEAL
One point crossover in SAGA, ctd.
Bi04b_41
• acts on 2 parent alignments (sexual)
• designed after natural crossover
• (multiple) exchanges between parents are promoted between zones of homology
• implemented modes: stochastic or semi-hill-climbing
Uniform crossover in SAGA
Bi04b_42
* Position consistent between the two parents
Parent Alignment 1
WG K VNVDEV-- G GEAL
WD K VNEEEV-- G GEAL
WG K VGAHAGEY G AEAL
WS K VGGHAGEY G AEAL
   *          *
Parent Alignment 2
WG- K V--NVDEV G GE-AL
W-D K V--NEEEV G G-EAL
W-G K VGAHAGEY G AEA-L
-WS K VGGHAGEY G AEAL-
    *          *
Child Alignment 1
WG K V--NVDEV G GEAL
WD K V--NEEEV G GEAL
WG K VGAHAGEY G AEAL
WS K VGGHAGEY G AEAL
   *          *
Child Alignment 2
WG- K VNVDEV-- G GE-AL
W-D K VNEEEV-- G G-EAL
W-G K VGAHAGEY G AEA-L
-WS K VGGHAGEY G AEAL-
    *          *
Chosen Child Alignment
WGKV--NVDEVG-GEAL
WDKV--NEEEVG-GEAL
WGKVGA-HAGEYGAEAL
WSKVGGHAGE-YGAEAL
from Notredame (1996)
Uniform crossover in SAGA, ctd.
Bi04b_43
• acts on 1 parent alignment (asexual)
• split sequences into 2 groups (G1, G2) (ideally derived from phylogenetic tree of sequences)
• choose insertion point P1 randomly
• insert random number of gaps at P1 into all sequences ∈ G1
• choose insertion point P2 randomly
• insert same # of gaps at P2 into all sequences ∈ G2
⇒ all sequences increase in length by chosen # of gaps
The above stochastic mode of the operator can be made semi hill climbing by selecting everything randomly as above, except for P1. Try all possible P1 and take the best.
Gap insertion operator in SAGA
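Both modes of the gap insertion operator can be sketched in Python. The grouping is passed in as a set of row indices because deriving it from a phylogenetic tree is outside this sketch; `max_gaps`, the function names and the column-identity score in the usage lines are assumptions for illustration.

```python
import random

def apply_gap_insertion(rows, group1, p1, p2, n):
    """Insert n gaps at column p1 in rows of G1 and at column p2 in all other rows."""
    return [row[:p1] + "-" * n + row[p1:] if i in group1
            else row[:p2] + "-" * n + row[p2:]
            for i, row in enumerate(rows)]

def gap_insertion(rows, group1, max_gaps=3, rng=random):
    """Stochastic mode: number of gaps, P1 and P2 are all drawn at random."""
    length = len(rows[0])
    n = rng.randint(1, max_gaps)
    return apply_gap_insertion(rows, group1, rng.randint(0, length),
                               rng.randint(0, length), n)

def gap_insertion_hill_climbing(rows, group1, score, max_gaps=3, rng=random):
    """Semi hill climbing: random n and P2, but every P1 is tried and the best kept."""
    length = len(rows[0])
    n = rng.randint(1, max_gaps)
    p2 = rng.randint(0, length)
    children = [apply_gap_insertion(rows, group1, p1, p2, n)
                for p1 in range(length + 1)]
    return max(children, key=score)

parent = ["WGKVNVDEVGGEA-GL", "WDKVNEEEVGGEA-GL", "WGKVGAHAGEYGAEAL",
          "WSKVGGHAGEYGAEAL", "WAKVEADVAGHGQDIL"]
child = apply_gap_insertion(parent, {0, 1}, p1=4, p2=16, n=2)

def identical_columns(rows):
    """Toy score: number of columns in which all rows agree."""
    return sum(1 for col in zip(*rows) if len(set(col)) == 1)

best_child = gap_insertion_hill_climbing(parent, {0, 1}, identical_columns,
                                         rng=random.Random(0))
```

With G1 = {seq1, seq2}, P1 after column 4, P2 at the end and two gaps, `child` has the rows WGKV--NVDEVGGEA-GL, WDKV--NEEEVGGEA-GL and the three remaining sequences padded with two trailing gaps.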
Bi04b_44
Insertion of the gaps in the parent alignment (stochastic mode)
parent alignment:
seq1  WGKVNVDEVGGEA-GL    G1
seq2  WDKVNEEEVGGEA-GL    G1
seq3  WGKVGAHAGEYGAEAL    G2
seq4  WSKVGGHAGEYGAEAL    G2
seq5  WAKVEADVAGHGQDIL    G2
child (gaps in G1 inserted at P1, gaps in G2 inserted at P2):
seq1  WGKV--NVDEVGGEA-GL
seq2  WDKV--NEEEVGGEA-GL
seq3  WGKVGAHAGEYGAEAL--
seq4  WSKVGGHAGEYGAEAL--
seq5  WAKVEADVAGHGQDIL--
from Notredame (1996)
Gap insertion operator in SAGA, ctd.
Bi04b_45
Insertion of the gaps in the parent alignment (semi hill climbing mode)
parent alignment:
seq1  WGKVNVDEVGGEA-GL    G1
seq2  WDKVNEEEVGGEA-GL    G1
seq3  WGKVGAHAGEYGAEAL    G2
seq4  WSKVGGHAGEYGAEAL    G2
seq5  WAKVEADVAGHGQDIL    G2
„sliding P1“: for each possible position of P1 generate 1 child, then select the optimum child!
child, P1 before the first column:
--WGKVNVDEVGGEA-GL
--WDKVNEEEVGGEA-GL
WGKVGAHAGEYGAEAL--
WSKVGGHAGEYGAEAL--
WAKVEADVAGHGQDIL--
child, P1 after the first column:
W--GKVNVDEVGGEA-GL
W--DKVNEEEVGGEA-GL
WGKVGAHAGEYGAEAL--
WSKVGGHAGEYGAEAL--
WAKVEADVAGHGQDIL--
...
child, P1 after the last column:
WGKVNVDEVGGEA-GL--
WDKVNEEEVGGEA-GL--
WGKVGAHAGEYGAEAL--
WSKVGGHAGEYGAEAL--
WAKVEADVAGHGQDIL--
from Notredame (1996)
Gap insertion operator in SAGA, ctd.
Bi04b_46
block definition, modified for shuffling operators:
block = set of overlapping stretches of residues, each being delimited by a gap or by an end of sequence
WGKVN--VDEVGGEAL
WGKVGAHAGEYGAEAL
WDKV--NEEEVGGEAL
WSKVGGHAGEYGAEAL
WAKVEADVAGHGQDIL
from Notredame (1996)
Block shuffling in SAGA
Bi04b_47
Shuffling type 1: Move a full block of gaps (or a full block of residues).
before:
WGKVN--VDEVGGEAL
WGKVGAHAGEYGAEAL
WDKV--NEEEVGGEAL
WSKVGGHAGEYGAEAL
WAKVEADVAGHGQDIL
after:
WGKV--NVDEVGGEAL
WGKVGAHAGEYGAEAL
WDK--VNEEEVGGEAL
WSKVGGHAGEYGAEAL
WAKVEADVAGHGQDIL
from Notredame (1996)
Block shuffling in SAGA, ctd.
Example shown: Move block of gaps to left by 1.
Bi04b_48
Shuffling type 2: Split the block horizontally and move one of the sub-blocks to the left or to the right. The subdivision of a block is made according to the tree (cf. gap insertion operator).
before (seq1 to seq5, split into the groups G1 and G2):
WGKVN--VDEVGGEAL
WGKVGAHAGEYGAEAL
WDKV--NEEEVGGEAL
WSKVGGHAGEYGAEAL
WAKVEADVAGHGQDIL
from Notredame (1996)
Block shuffling in SAGA, ctd.
after:
WGKV--NVDEVGGEAL
WGKVGAHAGEYGAEAL
WDKV--NEEEVGGEAL
WSKVGGHAGEYGAEAL
WAKVEADVAGHGQDIL
Example: Shift left by 1 for block in group G1. Shift nothing for block in group G2.
Bi04b_49
Shuffling type 3: Split the block vertically and move one half to the left or to the right. The move can be made stochastically or in a semi-hill-climbing way, looking for the best position.
before:
WGKVN--VDEVGGEAL
WGKVGAHAGEYGAEAL
WDKV--NEEEVGGEAL
WSKVGGHAGEYGAEAL
WAKVEADVAGHGQDIL
after:
WGKVNV--DEVGGEAL
WGKVGAHAGEYGAEAL
WDKV-N-EEEVGGEAL
WSKVGGHAGEYGAEAL
WAKVEADVAGHGQDIL
from Notredame (1996)
Block shuffling in SAGA, ctd.
Bi04b_50
[refer to conventional definition of block]
• select substring (of random length) in 1 sequence
• search all other sequences for best match to substring
• add substring found to the old one to generate a profile
• in each of the other sequences find the string best matching the profile and add it to the profile. Search extends only over a window around the profile
• move strings inside sequences to reconstruct block
Block searching & moving operator, ctd.
...Therefore we designed a crude method that ...
Bi04b_51
This block searching mutation generates more dramatic changes than any of the other operators.
No further description given in original paper
Block searching & moving operator, ctd.
...Therefore we designed a crude method that ...
„...Inspecting the above derivation one can easily see that ...“
Bi04b_52
Authors suggest additional and very heuristic manipulations, such as:
• optimize gaps inside a given block
• exhaustive (all possibilities) examination
• local alignment via genetic algorithm (LAGA)
Local optimal & suboptimal rearrangement
Bi04b_53
• 22 operators in total
• initially each operator has probability 1/22
• computes running averages of efficiency for each operator based on improvement achieved
• include last operator, second last, etc. with decreasing weights
• operator usage probability: (total improvement) / (number of children created)
• p = max (p, pmin), i.e. apply each operator at least with minimum usage probability
Dynamic scheduling of operators in SAGA
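A simplified Python sketch of such a scheduler. It keeps, per operator, the total improvement and the number of children, turns their ratio into a usage probability, applies the floor pmin and renormalises. The class name `OperatorScheduler`, the value of `p_min` and the renormalisation are assumptions; the decreasing-weight credit for earlier operators in a child's history is left out.

```python
import random

class OperatorScheduler:
    def __init__(self, names, p_min=0.01):
        self.names = list(names)
        self.p_min = p_min
        self.improvement = {n: 0.0 for n in self.names}
        self.children = {n: 0 for n in self.names}

    def record(self, name, gain):
        """Log one child made by operator `name`; only positive gains count."""
        self.improvement[name] += max(gain, 0.0)
        self.children[name] += 1

    def probabilities(self):
        """Usage probability per operator, floored at p_min, then renormalised."""
        eff = {n: self.improvement[n] / self.children[n] if self.children[n] else 0.0
               for n in self.names}
        total = sum(eff.values())
        if total == 0.0:                       # nothing learned yet: uniform
            return {n: 1.0 / len(self.names) for n in self.names}
        raw = {n: max(eff[n] / total, self.p_min) for n in self.names}
        norm = sum(raw.values())
        return {n: p / norm for n, p in raw.items()}

    def choose(self, rng=random):
        probs = self.probabilities()
        return rng.choices(self.names, weights=[probs[n] for n in self.names], k=1)[0]

scheduler = OperatorScheduler([f"op{i}" for i in range(22)])
initial = scheduler.probabilities()            # 1/22 for every operator
scheduler.record("op0", gain=2.0)
scheduler.record("op1", gain=0.0)
updated = scheduler.probabilities()
```

After renormalisation an unsuccessful operator can end up slightly below `p_min`, but it never drops to zero, so every operator keeps a chance of being tried again.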
Bi04b_54
control chart to monitor usage
from Notredame (1996)
Self tuning of operator selection in SAGA
Bi04b_55
• very satisfactory on test cases compared to other programs (e.g. CLUSTALW)
• satisfying and even superior to others when checked with alignments based on 3D-structure (gold standard)
• the more sophisticated operators are essential; SAGA does not work with simple mutation & crossover!
Performance of SAGA, Summary