View
44
Download
0
Category
Preview:
DESCRIPTION
Space Efficient Alignment Algorithms and Affine Gap Penalties. Dr. Nancy Warter-Perez. Outline. Algorithm complexity Complexity of dynamic programming alignment algorithms Memory efficient algorithms Hirschberg’s Divide and Conquer algorithm Affine gap penalty. Algorithm Complexity. - PowerPoint PPT Presentation
Citation preview
Space Efficient Alignment Algorithms and Affine Gap Penalties
Dr. Nancy Warter-Perez
Space Efficient Alignment Algorithms 2
Outline Algorithm complexity Complexity of dynamic programming
alignment algorithms Memory efficient algorithms Hirschberg’s Divide and Conquer
algorithm Affine gap penalty
Space Efficient Alignment Algorithms 3
Algorithm Complexity Indicates the space and time (computational) efficiency
of a program Space complexity refers to how much memory is required to
execute the algorithm Time complexity refers to how long it will take to execute
(compute) the algorithm Generally written in Big-O notation
O represents the complexity (order) n represents the size of the data set
Examples O(n) – “order n”, linear complexity O(n2) – “order n squared”, quadratic complexity
Constants and lower orders ignored O(2n) = O(n) and O(n2 + n + 1) = O(n2)
Space Efficient Alignment Algorithms 4
Complexity of Dynamic Programming Algorithms for
Global/Local Alignment Time complexity – O(m*n)
For each cell in the score matrix, perform 3 operations
Compute Up, Left, and Diagonal scores O(3*m*n) = O(m*n)
Space complexity – O(m*n) Size of scoring matrix = m*n Size of trace back matrix = m*n O(2*m*n) = O(m*n)
Where, m and n are the lengths of the sequences being aligned. Since m n, O(n2 ) – quadratic complexity!
Space Efficient Alignment Algorithms 5
Memory Requirements For a sequence of 200-500 amino
acids or nucleotides O(n2) = 5002 = 250,000 If store each score as a 32-bit value =
4 bytes, it requires 1,000,000 bytes to represent the scoring matrix!
If store each trace back symbol as a character (8-bit value), it requires 250,000 bytes to represent the trace back matrix
Space Efficient Alignment Algorithms 6
Simple Improvement for Scoring Matrix In reality, the space complexity of
the scoring matrix is only linear, i.e., O(2*min(m,n)) = O(min(m,n)) O(min(m,n)) O(n) for sequences of
comparable lengths 2,000 bytes (instead of 1 million) But, trace back still quadratic space
complexity
Space Efficient Alignment Algorithms 7
Hirschberg’s “Divide and Conquer” Space Efficient Algorithm
Compute the score matrix(s) between the source (0,0) and (n, m/2). Save m/2 column of s. Compute the reverse score matrix (sreverse) between the sink (n, m) and (0,m/2). Save the m/2 column of sreverse.
Find middle (i, m/2) satisfies max 0 in {s(i, m/2) + sreverse(n-i, m/2)}
Recursively partition problem into 2 subproblems
middle
m/2 m(0,0)
(n,m)n
i
m/2 m(0,0)
(n,m)n
middle
m/2 m(0,0)
n
middle
middle
(n,m)
m(0,0)
(n,m)n
m(0,0)
n (n,m)
m(0,0)
n (n,m)
Source
Sink
Space Efficient Alignment Algorithms 8
Pseudo Code of Space-Efficient Alignment Algorithm
Path (source, sink)If source and sink are in consecutive columns
output the longest path from the source to the sink
Elsemiddle middle vertex between source and sinkPath (source, middle)Path (middle, sink)
Space Efficient Alignment Algorithms 9
Complexity of Space-Efficient Alignment Algorithm
Time complexity Equal to the sum of the areas of the rectangles
Area + ½ Area + ¼ Area + … 2*Areawhere, Area = n*m
O(2n*m) = O(n*m) Quadratic time/computation complexity (same as before)
Space complexity Need to save a column of s and sreverse for each computation
(but can discard after computing middle) O(min(n,m)) – if m < n, switch the sequences (or save a row
of s and sreverse instead) Linear space complexity!!
Reference: http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Dynamic/Hirsch/
Space Efficient Alignment Algorithms 10
Gap Penalties Gap penalties account for the introduction
of a gap - on the evolutionary model, an insertion or deletion mutation - in both nucleotide and protein sequences, and therefore the penalty values should be proportional to the expected rate of such mutations.
http://en.wikipedia.org/wiki/Sequence_alignment#Assessment_of_significance
Space Efficient Alignment Algorithms 11
Space Efficient Alignment Algorithms 12
Source: http://www.apl.jhu.edu/~przytyck/Lect03_2005.pdf
Space Efficient Alignment Algorithms 13
Space Efficient Alignment Algorithms 14
Space Efficient Alignment Algorithms 15
Space Efficient Alignment Algorithms 16
Space Efficient Alignment Algorithms 17
Space Efficient Alignment Algorithms 18
Space Efficient Alignment Algorithms 19
Project Verification - Use EMBOSS Pairwise Alignment Tool http://www.ebi.ac.uk/Tools/emboss/align/index.html
Space Efficient Alignment Algorithms 20
Project Verification – LALIGNhttp://www.ch.embnet.org/software/LALIGN_form.html
Recommended