8/3/2019 GA by Prashant & Vivek
New Approach for Classification of Protein by Genetic Algorithm
By Prashant Makwana and Vivek Soni
Introduction
• Why do we need classification?
  - The amount of protein data is huge and still increasing
  - To know protein structure and function
  - For database management (DBMS) purposes
What is GA?
• A class of probabilistic optimization algorithms
• Inspired by the process of biological evolution
• Uses the concepts of natural selection and genetic inheritance (Darwin, 1859)
• Originally developed by John Holland (1975)
Cont.
• Definition: a class of evolutionary algorithms (EAs) that generate useful solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover.
• Genetic algorithms find application in bioinformatics, phylogenetics, computational science, economics, manufacturing, mathematics, physics, and other fields.
Classes of Search Techniques
Search techniques
• Calculus-based techniques
  - Fibonacci
  - Sort
• Guided random search techniques
  - Tabu search
  - Hill climbing
  - Simulated annealing
  - Evolutionary algorithms
    - Genetic programming
    - Genetic algorithms
• Enumerative techniques
  - BFS
  - DFS
  - Dynamic programming
GA Overview
• GA terms
  - Population of strings (chromosomes/genotypes of the genome)
  - Fitness function
• Initialization
• Selection
• Reproduction
• Termination
Cont.
• Initialization
  - A random process that generates the initial population of chromosomes
• Selection
  - A proportion of the existing population is selected to breed a new generation
  - Selection is based on the fitness function (the fitness of each individual)
• Reproduction
  - The next step is to generate a second-generation population of solutions from those selected, through the genetic operators crossover (also called recombination) and/or mutation.
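The selection and reproduction operators described above can be sketched in Python for the common textbook case of bit-string chromosomes. This is a minimal illustration, not the authors' implementation: the bit-string encoding, fitness-proportionate (roulette-wheel) selection, single-point crossover, bit-flip mutation and all function names are assumptions chosen for clarity. Roulette-wheel selection needs non-negative fitness values whose total is positive.

```python
import random
from typing import List, Tuple

# Assumption: a chromosome is a list of bits, e.g. [1, 0, 1, 1]
Chromosome = List[int]


def roulette_select(population: List[Chromosome], fitnesses: List[float],
                    rng: random.Random) -> Chromosome:
    """Fitness-proportionate (roulette-wheel) selection of one parent."""
    # random.choices picks an item with probability proportional to its weight;
    # fitnesses must be non-negative and must not all be zero.
    return rng.choices(population, weights=fitnesses, k=1)[0]


def single_point_crossover(a: Chromosome, b: Chromosome,
                           rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Swap the tails of two equal-length parents after a random cut point."""
    point = rng.randint(1, len(a) - 1)  # cut strictly inside the string
    return a[:point] + b[point:], b[:point] + a[point:]


def bit_flip_mutation(c: Chromosome, rate: float,
                      rng: random.Random) -> Chromosome:
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in c]


# Usage: one reproduction step for a tiny population
rng = random.Random(0)
population = [[0, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 1]]
fitnesses = [float(sum(c)) for c in population]  # toy fitness: count of 1-bits
p1 = roulette_select(population, fitnesses, rng)
p2 = roulette_select(population, fitnesses, rng)
child1, child2 = single_point_crossover(p1, p2, rng)
child1 = bit_flip_mutation(child1, rate=0.1, rng=rng)
```

Because the first chromosome has fitness 0, its slice of the roulette wheel is empty and it can never be chosen as a parent; fitter individuals get proportionally larger slices.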
Cont.
• Termination
  This generational process is repeated until a termination condition has been reached. Common terminating conditions are:
  1. A solution is found that satisfies minimum criteria
  2. A fixed number of generations is reached
  3. The allocated budget (computation time/money) is reached
  4. The highest-ranking solution's fitness is reaching or has reached a plateau, such that successive iterations no longer produce better results
  5. Manual inspection
  6. Combinations of the above
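The termination conditions listed above can be combined into a single check. The sketch below is hypothetical: it assumes fitness is maximized, that the best fitness of each generation is recorded in `best_history`, and that the caller measures elapsed time; the function name, parameter names and defaults are illustrative, and manual inspection (condition 5) is left to the user.

```python
from typing import List, Optional


def should_terminate(best_history: List[float], generation: int,
                     max_generations: int = 100,
                     target_fitness: Optional[float] = None,
                     patience: int = 10,
                     elapsed_s: float = 0.0,
                     time_budget_s: Optional[float] = None) -> bool:
    """Return True if any stopping criterion holds (condition 6: a combination)."""
    # 1. A solution satisfies the minimum criterion (assumes maximization)
    if target_fitness is not None and best_history and best_history[-1] >= target_fitness:
        return True
    # 2. Fixed number of generations reached
    if generation >= max_generations:
        return True
    # 3. Allocated computation-time budget reached
    if time_budget_s is not None and elapsed_s >= time_budget_s:
        return True
    # 4. Plateau: the last `patience` generations did not beat the earlier best
    if len(best_history) > patience and \
            max(best_history[-patience:]) <= max(best_history[:-patience]):
        return True
    return False
```

Checking the cheap conditions first keeps the test inexpensive; the plateau rule is the one most sensitive to its `patience` setting, since too small a value can stop a run that is only temporarily stuck.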
GA procedure
1. Choose the initial population of individuals.
2. Evaluate the fitness of each individual in that population.
3. Repeat on this generation until termination (time limit, sufficient fitness achieved, etc.):
  • Select the best-fit individuals for reproduction
  • Breed new individuals through crossover and mutation operations to give birth to offspring
  • Evaluate the individual fitness of the new individuals
  • Replace the least-fit members of the population with the new individuals
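The procedure above can be sketched end to end on a toy problem. This is an illustration under stated assumptions, not the authors' code: the fitness function is OneMax (count the 1-bits), the best-fit half of the population is kept as parents (truncation selection), and children are produced by single-point crossover and bit-flip mutation; all names and parameter values are hypothetical.

```python
import random
from typing import List, Tuple

Chromosome = List[int]


def onemax(bits: Chromosome) -> int:
    """Toy fitness (assumption): the number of 1-bits; the optimum is all ones."""
    return sum(bits)


def run_ga(n_bits: int = 20, pop_size: int = 30, generations: int = 200,
           mutation_rate: float = 0.02, seed: int = 42) -> Tuple[Chromosome, int]:
    rng = random.Random(seed)
    # 1. Choose the initial population at random
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    gen = 0
    for gen in range(generations):
        # 2. Evaluate fitness and rank individuals from best to worst
        population.sort(key=onemax, reverse=True)
        if onemax(population[0]) == n_bits:  # sufficient fitness achieved
            break
        # 3a. Select the best-fit half as parents
        parents = population[: pop_size // 2]
        # 3b. Breed offspring by single-point crossover and bit-flip mutation
        offspring = []
        while len(offspring) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_bits - 1)
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            offspring.append(child)
        # 3c/3d. Offspring are evaluated at the next sort; they replace the least-fit half
        population = parents + offspring
    best = max(population, key=onemax)
    return best, gen  # gen = index of the last generation processed


best, last_gen = run_ga()
print(onemax(best), "ones after generation", last_gen)
```

Because the parents survive into the next generation, the best fitness never decreases from one generation to the next; the price is a faster loss of diversity, which the mutation step partly offsets.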
Our Work
• Protein: olfactory receptor
• Organism: human
• Database: UniProt
• Datasets: 5 sets, each of 50 FASTA sequences
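Preparing the data described above (5 sets of 50 FASTA sequences) can be sketched as follows. This assumes the sequences were downloaded from UniProt as a single FASTA text and that each set is formed from consecutive records; the slides do not say how sequences were assigned to sets, and the function names are hypothetical.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA text: a '>' line starts a record, following lines hold the sequence."""
    records: List[Record] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_into_sets(records: List[Record], set_size: int = 50,
                    n_sets: int = 5) -> List[List[Record]]:
    """Group consecutive records into n_sets sets of set_size sequences each."""
    needed = set_size * n_sets
    if len(records) < needed:
        raise ValueError(f"need {needed} sequences, got {len(records)}")
    return [records[i * set_size:(i + 1) * set_size] for i in range(n_sets)]
```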
Cont.
1. Find the conserved domains of each protein sequence using PROSITE
2. Perform a multiple sequence alignment (MSA) for each set
3. Develop a position-specific scoring matrix (PSSM) for each set
4. Apply the GA operators: mutation & crossover
5. Generate new sets using the GA operators
6. Calculate the determinant for each set
7. Calculate the fitness of each set
8. Repeat steps 4, 5, 6 and 7 until termination
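Step 3 can be illustrated with a common way of building a PSSM from an MSA: count the residues in each alignment column, add pseudocounts, and take log-odds scores against background frequencies. The slides do not give the authors' exact formula, so the uniform background (1/20), the pseudocount of 1, ignoring gap characters, and the function name are assumptions. The sketch does not cover step 6: a PSSM has one row per alignment column and 20 columns, so it is generally not square, and the slides do not state how the determinant is obtained from it.

```python
import math
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def build_pssm(alignment: List[str], pseudocount: float = 1.0) -> List[List[float]]:
    """Log-odds PSSM (alignment columns x 20 residues) from equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies
    pssm = []
    for col in range(length):
        column = [s[col].upper() for s in alignment]
        counts = {aa: column.count(aa) for aa in AMINO_ACIDS}  # gaps '-' are not counted
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        # log2(observed frequency / background frequency) for each residue
        row = [math.log2((counts[aa] + pseudocount) / total / background)
               for aa in AMINO_ACIDS]
        pssm.append(row)
    return pssm


# Usage: highest-scoring residue in each column of a toy alignment
example = build_pssm(["MKV", "MRV", "MKI"])
consensus = [AMINO_ACIDS[row.index(max(row))] for row in example]
```

A positive score means the residue is more frequent at that position than the background predicts; a negative score means it is less frequent.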
Cont.
• The final constant values (fitness) are then compared with the fitness of a new set of sequences to check the family criteria.
[Figure: Set 1, Set 2, Set 3 ... Set n → final constant value range for the given protein family; new set sequence fitness compared with this range → new sequence belongs to the given protein family]
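The comparison in the diagram above can be sketched as a simple range test. This assumes each reference set (Set 1 ... Set n) has already been reduced to one final fitness value and that membership means the new set's fitness falls between the smallest and largest reference values; the optional `tolerance` margin and the function names are hypothetical additions, not part of the described method, and the numbers in the usage line are made up.

```python
from typing import Sequence, Tuple


def family_range(set_fitnesses: Sequence[float]) -> Tuple[float, float]:
    """Range [min, max] of the final fitness values of the reference sets."""
    if not set_fitnesses:
        raise ValueError("need at least one reference set")
    return min(set_fitnesses), max(set_fitnesses)


def belongs_to_family(new_fitness: float, set_fitnesses: Sequence[float],
                      tolerance: float = 0.0) -> bool:
    """True if the new set's fitness lies inside the family range (optionally widened)."""
    low, high = family_range(set_fitnesses)
    return low - tolerance <= new_fitness <= high + tolerance


# Usage with made-up values for Set 1 ... Set 5
reference = [0.42, 0.47, 0.45, 0.50, 0.44]
is_member = belongs_to_family(0.46, reference)
```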
Reason for Each Step
• Use of PROSITE
  - To determine the conserved domains of each sequence
  - The conserved domain is selected as a parameter for classification
• Use of PSSM
  - To obtain numerical values for the given sets, which are then used by the GA
• Use of mutation & crossover
  - To explore the maximum possible variation within the given sets
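Mutation and crossover applied directly to protein sequences might look like the sketch below. It assumes point substitutions among the 20 standard amino acids, leaves alignment gap characters untouched, and crosses over two aligned sequences of equal length at a single random cut; these choices and the function names are assumptions, since the slides do not describe the exact operators.

```python
import random
from typing import Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def mutate_sequence(seq: str, rate: float, rng: random.Random) -> str:
    """With probability `rate`, replace each residue by a different random amino acid."""
    out = []
    for res in seq:
        if res in AMINO_ACIDS and rng.random() < rate:
            out.append(rng.choice([aa for aa in AMINO_ACIDS if aa != res]))
        else:
            out.append(res)  # unchanged residue; gap characters are never mutated
    return "".join(out)


def crossover_sequences(a: str, b: str, rng: random.Random) -> Tuple[str, str]:
    """Single-point crossover of two aligned sequences of equal length."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two aligned sequences of equal length >= 2")
    cut = rng.randint(1, len(a) - 1)  # cut strictly inside the alignment
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


# Usage on toy aligned sequences
rng = random.Random(7)
variant = mutate_sequence("MKVLLA", rate=0.1, rng=rng)
child1, child2 = crossover_sequences("MKV-LLA", "MRVALLS", rng)
```

Crossing over aligned sequences keeps every position in register, so each child still lines up column by column with the rest of its set.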
Why GA? / Advantages of GA
• It can solve problems that have multiple solutions.
• A genetic algorithm is very easy to understand and demands practically no knowledge of mathematics.
• Genetic algorithms are easily transferred to existing simulations and models.
Disadvantages
• Certain optimization problems cannot be solved by means of genetic algorithms; this happens when the fitness function is poorly known.
• GAs require a large number of response (fitness) function evaluations, depending on the number of individuals and the number of generations.
• A GA is usually slower than traditional techniques.
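The evaluation cost mentioned above can be made concrete with a simple count. Assuming a generational GA in which every member of the initial population and every member of each new generation is evaluated exactly once, a population of N individuals run for G generations needs E fitness evaluations:

```latex
E = N + N \cdot G = N\,(G + 1)
```

For example, with hypothetical values N = 100 and G = 50, E = 100 × 51 = 5100 fitness evaluations; doubling the population size doubles the cost, and doubling the number of generations roughly doubles it.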
Future of GA in Bioinformatics
• Our future work will consider more parameters for protein classification.
• Protein structure prediction by combining GA with other algorithms
• Immune system models: GAs have been used to model various aspects of the natural immune system, including somatic mutation during an individual's lifetime and the discovery of multi-gene families during evolutionary time.
Cont.
• Ecological models: GAs have been used to model ecological phenomena such as host-parasite co-evolution and symbiosis.
Conclusion
References
• Jin Xiong, Essential Bioinformatics
• http://en.wikipedia.org/wiki/Genetic_algorithm
• http://www.informatics.indiana.edu/fil/CAS/PPT/Davis/sld001.htm
• http://fasta.bioch.virginia.edu/fasta_www2/chaps.cgi
UniProt (Olfactory Receptor)
CHAPS
PSSM