Data structures and complexity
Complexity
- Computational complexity refers to how much computing is required to solve different problems.
- Spatial complexity refers to how much memory is required to solve different problems.
- Choose the right algorithm and the right data structure and your code could run in seconds. Choose the wrong algorithm or the wrong data structure and your code could run for days.
Search: I’m thinking of a word...
- Given a finite list of words, how do you find out which one I’m thinking of?
- Ground rules
  - The word has to be in a dictionary, e.g. a dictionary with 60,000 words.
  - You can only ask me questions with YES/NO answers.
- Sequential search
  - Inspect every element and check whether it is the one you are looking for. The amount of “effort” is proportional to the length of the list, e.g. worst case 60,000 questions, average case 30,000 questions.
- Binary search
  - Ask questions that cut the number of possibilities in half. The amount of effort is proportional to the logarithm (base 2) of the length of the list, i.e. about 16 questions for 60,000 words.
Binary search
[Figure: halving diagram. The set of 60,000 words splits into 30,000 / 30,000 on the question "In 1st ½ of set?" (yes / no); each subset splits again on "In 1st ½ of subset?" (15,000 / 15,000, and so on).]
~16 questions are needed to get down to a subset with 1 word.
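
The halving strategy can be sketched in Perl as a binary search over a sorted array. This is a minimal sketch: the subroutine name `binary_search` and the made-up 60,000-entry dictionary are assumptions for illustration, and each step here is a three-way string comparison instead of a strict YES/NO question.

```perl
use strict;
use warnings;

# Binary search over a sorted list of words.
# Returns (index, number_of_questions); index is -1 if the word is absent.
sub binary_search {
    my ($words, $target) = @_;
    my ($lo, $hi) = (0, $#{$words});
    my $questions = 0;
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        $questions++;
        my $cmp = $target cmp $words->[$mid];
        return ($mid, $questions) if $cmp == 0;
        if ($cmp < 0) { $hi = $mid - 1 }   # target is in the first half
        else          { $lo = $mid + 1 }   # target is in the second half
    }
    return (-1, $questions);
}

# Hypothetical dictionary: 60,000 zero-padded strings, already in sorted order.
my @dictionary = map { sprintf("word%05d", $_) } 0 .. 59_999;
my ($index, $questions) = binary_search(\@dictionary, "word41234");
print "found at index $index after $questions questions\n";
```

For 60,000 entries the loop never runs more than 16 times, which matches the estimate on the slide.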
Queues
- A queue stores items in FIFO (first-in first-out) order.
- It returns them in the same order that they are entered, like a line of people at a cashier.
- Useful for letting one chunk of code collect (or generate) items to be processed, while a separate chunk of code does the actual processing.
  - mouse clicks
  - internet TCP/IP packets
- Terminology
  - Enqueue -- get in line
  - Dequeue -- get out of line (reach the cashier)
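
In Perl, a plain array can play the role of a queue: `push` enqueues at the tail and `shift` dequeues from the head. The sketch below assumes some made-up mouse-click events; the variable names are illustrative only.

```perl
use strict;
use warnings;

my @queue;                          # a plain Perl array used as a FIFO queue

# Producer: enqueue events at the tail.
push @queue, "click at (10,20)";
push @queue, "click at (35,40)";
push @queue, "click at (5,90)";

# Consumer: dequeue from the head, so events come out in arrival order.
my @processed;
while (@queue) {
    my $event = shift @queue;
    push @processed, $event;
    print "processing $event\n";
}
```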
Stacks
- A stack stores items in LIFO (last-in first-out) order.
- It returns them in the reverse of the order in which they were entered, like plates taken from the top of a stack in a cafeteria.
- Useful when operations need to be broken down into sub-operations that are executed in sequence (especially recursive operations).
  - file search in a file system
  - parsing
- Terminology
  - push -- put a plate on the stack
  - pop -- remove a plate from the stack
Stacks: example
- Infix notation: ((1+2)*4)+3
- Postfix notation: 1, 2, +, 4, *, 3, +
- Evaluate postfix expressions with a stack
  1. if operand, push onto stack
  2. if operator, pop, pop, evaluate, push result

Input   Operations              Stack (top first)
1       Push                    (1)
2       Push                    (2,1)
+       Pop, Pop, Add, Push     (3)
4       Push                    (4,3)
*       Pop, Pop, Mul, Push     (12)
3       Push                    (3,12)
+       Pop, Pop, Add, Push     (15)
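
The two evaluation rules can be sketched in Perl, using an array as the stack. The subroutine name `eval_postfix` and the dispatch table `%ops` are assumptions made for this illustration, and only the four basic arithmetic operators are handled.

```perl
use strict;
use warnings;

# Evaluate a postfix expression given as a list of tokens.
sub eval_postfix {
    my @tokens = @_;
    my @stack;
    my %ops = (
        '+' => sub { $_[0] + $_[1] },
        '-' => sub { $_[0] - $_[1] },
        '*' => sub { $_[0] * $_[1] },
        '/' => sub { $_[0] / $_[1] },
    );
    for my $token (@tokens) {
        if (exists $ops{$token}) {
            die "too few operands for '$token'\n" if @stack < 2;
            my $right = pop @stack;    # the first pop yields the right operand
            my $left  = pop @stack;
            push @stack, $ops{$token}->($left, $right);
        }
        else {
            push @stack, $token;       # operand: push onto the stack
        }
    }
    die "malformed expression\n" unless @stack == 1;
    return $stack[0];
}

my $result = eval_postfix(qw(1 2 + 4 * 3 +));
print "$result\n";    # prints 15
```

Popping the right operand first matters for the non-commutative operators: `5 3 -` must evaluate to 2, not -2.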
More complex data structures
Recursive data structures
- Example: Binary trees
[Figure: a binary tree in which every node holds a value plus left and right references to child nodes; the top node is labeled "root", the links "branches", and the bottom nodes "leaves".]
Creating a new node
- Represent each node by a hash with three keys: ‘LEFT’, ‘RIGHT’, and ‘VALUE’.
  - The ‘VALUE’ will contain the content of the node.
  - The values of ‘LEFT’ and ‘RIGHT’ are references to the child nodes (i.e. more hashes).
- Here is a subroutine that returns a reference to a node data structure. The argument of the subroutine is the value.

sub newNode {
    return {
        'VALUE' => shift,
        'LEFT'  => undef,
        'RIGHT' => undef
    };
}
Attaching a node
$root_ref->{LEFT} = $someNode_ref;
[Figure: the node referenced by $someNode_ref becomes the left child of the node referenced by $root_ref.]
Trees: in-order traversal
traverse($theTree);

sub traverse {
    my($tree) = @_;
    if(!defined($tree)){ return undef };     # if no node
    traverse($tree->{LEFT});
    processTheNode($tree->{VALUE});          # e.g. print value
    traverse($tree->{RIGHT});
}
Trees: insertion
sub insert {    # -- recursively builds the tree
    my($tree, $val) = @_;
    if(!$tree) {    # no node exists so create one
        $_[0] = newNode($val);
        return;
    }
    else {          # a node exists, so insert
        if($tree->{VALUE} > $val)
            { insert($tree->{LEFT}, $val) }
        elsif($tree->{VALUE} < $val)
            { insert($tree->{RIGHT}, $val) }
        else
            { warn "dup insert of $val\n" if 0 }
    }
}
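
Putting node creation, insertion, and in-order traversal together gives a small runnable sketch. It restates `newNode` and `insert` so it is self-contained; the `in_order` helper, which collects values into an array instead of calling `processTheNode`, and the sample numbers are assumptions for illustration. The assignment to `$_[0]` works because Perl's `@_` aliases the caller's arguments, so it writes into the caller's variable or hash slot.

```perl
use strict;
use warnings;

sub newNode {
    my ($value) = @_;
    return { VALUE => $value, LEFT => undef, RIGHT => undef };
}

sub insert {
    my ($tree, $val) = @_;
    if (!$tree) {
        $_[0] = newNode($val);    # @_ aliases the caller's variable or hash slot
        return;
    }
    if    ($tree->{VALUE} > $val) { insert($tree->{LEFT},  $val) }
    elsif ($tree->{VALUE} < $val) { insert($tree->{RIGHT}, $val) }
    # equal values (duplicates) are silently ignored
}

# In-order traversal: left subtree, then the node, then right subtree.
sub in_order {
    my ($tree, $out) = @_;
    return unless defined $tree;
    in_order($tree->{LEFT}, $out);
    push @$out, $tree->{VALUE};
    in_order($tree->{RIGHT}, $out);
}

my $root;
insert($root, $_) for (50, 30, 70, 20, 40, 60, 80, 30);
my @sorted;
in_order($root, \@sorted);
print "@sorted\n";    # prints 20 30 40 50 60 70 80
```

An in-order traversal of a binary search tree visits the values in ascending order, which is why the output is sorted even though the input was not.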
Reminders
- Code examples in Readonly directory on Pinedalab
- Project coming up...
Complexity
Example: Search
- Given a list of ordered values, how do we find one? e.g.
  - Numbers in a list
  - Words in a dictionary
- The complexity depends on the data structure used to represent the set of objects and on the algorithm used to process the data structure.
Big-O notation
- Big-O notation is a way to express the asymptotic time complexity of a computer algorithm.
  - O(1) constant
  - O(log(n)) logarithmic
  - O(n) linear
  - O(n^2) quadratic
  - O(n^c) polynomial
  - O(c^n) exponential
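
To get a feel for how differently these classes grow, the sketch below tabulates rough step counts for a few input sizes. The `growth` subroutine is a hypothetical helper, and c = 2 is an assumed constant for both the polynomial and the exponential case.

```perl
use strict;
use warnings;

# Rough step counts for each growth class at input size n (with c = 2).
sub growth {
    my ($n) = @_;
    return (
        'O(1)'     => 1,
        'O(log n)' => log($n) / log(2),   # Perl's log() is the natural logarithm
        'O(n)'     => $n,
        'O(n^2)'   => $n**2,
        'O(2^n)'   => 2**$n,
    );
}

for my $n (10, 20, 40) {
    my %g = growth($n);
    printf "n=%-3d log2(n)=%.1f  n^2=%d  2^n=%.0f\n",
        $n, $g{'O(log n)'}, $g{'O(n^2)'}, $g{'O(2^n)'};
}
```

Going from n = 10 to n = 40 multiplies the quadratic count by 16, but multiplies the exponential count by about a billion.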
Linear search
- Represent the set of objects as a list and then sequentially search the list.
  - Space complexity is proportional to the number of objects.
  - Time complexity is proportional to the number of objects, i.e. O(n).
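
A minimal Perl sketch of linear search that also counts comparisons, so the O(n) cost is visible. The name `linear_search` and the sample word list are assumptions for illustration.

```perl
use strict;
use warnings;

# Returns (index, comparisons); index is -1 when the target is absent.
sub linear_search {
    my ($list, $target) = @_;
    my $comparisons = 0;
    for my $i (0 .. $#{$list}) {
        $comparisons++;
        return ($i, $comparisons) if $list->[$i] eq $target;
    }
    return (-1, $comparisons);
}

my @words = qw(gene protein codon exon intron);
my ($hit_index,  $hit_cost)  = linear_search(\@words, 'codon');
my ($miss_index, $miss_cost) = linear_search(\@words, 'ribosome');
print "codon: index $hit_index after $hit_cost comparisons\n";
print "ribosome: index $miss_index after $miss_cost comparisons\n";
```

A failed search is the worst case: every element is inspected before the answer is known.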
Binary search
- Represent the set of objects as a binary tree and search the tree.
  - Space complexity is proportional to the number of objects.
  - Time complexity?
Binary search
- Represent the set of objects as a binary tree and search the tree

sub lookup {
    my($tree, $value) = @_;
    if(!$tree) { return; }
    elsif($tree->{VALUE} == $value) { return $tree; }
    elsif($value < $tree->{VALUE}) { return lookup($tree->{LEFT}, $value) }
    else { return lookup($tree->{RIGHT}, $value) }
}
Binary search
- The search time depends on how deep in the tree you have to go to find the object.
- The depth of the tree depends on how it was constructed.
  - Worst case: input was presorted. depth = n, complexity: O(n)
  - Best case: tree is balanced. depth = log(n), complexity: O(log(n))
  - Average case: if the input is random, it can be shown that the expected depth is proportional to log(n), complexity: O(log(n))
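
The effect of insertion order on depth can be checked empirically. The sketch below is an illustration under stated assumptions: `insert_iter` and `depth` are hypothetical helpers written iteratively (to avoid deep recursion on the degenerate tree), and n = 200 is an arbitrary size. The depth of the shuffled tree varies from run to run, so no specific value is claimed for it.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Iterative insert; $root_ref is a reference to the variable holding the root.
sub insert_iter {
    my ($root_ref, $val) = @_;
    my $slot = $root_ref;                 # reference to the slot we may fill
    while (defined $$slot) {
        my $node = $$slot;
        return if $node->{VALUE} == $val; # ignore duplicates
        $slot = $val < $node->{VALUE} ? \$node->{LEFT} : \$node->{RIGHT};
    }
    $$slot = { VALUE => $val, LEFT => undef, RIGHT => undef };
}

# Depth = number of nodes on the longest root-to-leaf path.
sub depth {
    my ($tree) = @_;
    my $deepest = 0;
    my @todo = ([$tree, 1]);              # explicit stack of [node, level] pairs
    while (my $item = pop @todo) {
        my ($node, $level) = @$item;
        next unless defined $node;
        $deepest = $level if $level > $deepest;
        push @todo, [$node->{LEFT}, $level + 1], [$node->{RIGHT}, $level + 1];
    }
    return $deepest;
}

my $n = 200;
my ($sorted_tree, $random_tree);
insert_iter(\$sorted_tree, $_) for 1 .. $n;            # presorted input
insert_iter(\$random_tree, $_) for shuffle(1 .. $n);   # random input
printf "sorted input: depth %d; shuffled input: depth %d\n",
    depth($sorted_tree), depth($random_tree);
```

Presorted input produces a chain of depth n, which behaves like a linked list. Shuffled input typically gives a depth that is a small multiple of log2(n).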
Hash table search
- Calculate a number from the key
  - Performed by a hash function
- Use the number to index into an array
- If more than one key hashes to the same index (a collision)
  - Maintain a list of keys that resolve to the same hash value
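
Perl's built-in hashes already do this internally, but the mechanism can be sketched by hand. Everything below is an assumption for illustration: the toy hash function (sum of character codes modulo the bucket count), the deliberately small bucket count, and the names `hash_index`, `table_store`, and `table_fetch`.

```perl
use strict;
use warnings;

my $NUM_BUCKETS = 8;                         # assumed; small on purpose to force collisions
my @buckets = map { [] } 1 .. $NUM_BUCKETS;  # each bucket is a list of [key, value] pairs

# Toy hash function: sum of character codes modulo the number of buckets.
sub hash_index {
    my ($key) = @_;
    my $sum = 0;
    $sum += ord($_) for split //, $key;
    return $sum % $NUM_BUCKETS;
}

sub table_store {
    my ($key, $value) = @_;
    my $chain = $buckets[ hash_index($key) ];
    for my $pair (@$chain) {
        if ($pair->[0] eq $key) { $pair->[1] = $value; return }   # overwrite existing key
    }
    push @$chain, [$key, $value];            # new key: append to the bucket's list
}

sub table_fetch {
    my ($key) = @_;
    for my $pair (@{ $buckets[ hash_index($key) ] }) {
        return $pair->[1] if $pair->[0] eq $key;
    }
    return;                                  # key not present
}

table_store('ATG', 'Met');
table_store('TGA', 'stop');   # same letters as ATG, so the toy hash collides
table_store('TGG', 'Trp');
my ($met, $stop) = (table_fetch('ATG'), table_fetch('TGA'));
print "$met $stop\n";         # prints "Met stop"
```

Lookups stay fast as long as the lists stay short, which is why real hash functions try to spread keys evenly across the array.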
NP-Completeness
- A problem is tractable if some algorithm exists that always solves the problem in a time that is proportional to some power of the length of the input. Such problems are said to be solvable in polynomial time. (Of course, if the power is 50, then the problem is practically intractable.)
- A problem with no polynomial-time algorithm is said to be intractable.
- In the 1970s the class of NP-complete problems was defined. NP stands for nondeterministic polynomial time. No polynomial-time algorithm is known for any problem in this class.
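
As an illustration, consider subset sum (does some subset of a list of integers add up to a target?), a standard NP-complete problem that is not mentioned on the slides. The sketch below contrasts the two sides of such a problem: checking a proposed answer is fast, while the obvious way of finding one tries up to 2^n subsets. The subroutine names and the sample numbers are assumptions for illustration.

```perl
use strict;
use warnings;

# Verify a proposed solution: time is linear in the number of indices.
sub verify {
    my ($numbers, $indices, $target) = @_;
    my $sum = 0;
    $sum += $numbers->[$_] for @$indices;
    return $sum == $target;
}

# Find a solution by brute force: tries up to 2^n - 1 non-empty subsets.
sub brute_force {
    my ($numbers, $target) = @_;
    my $n = @$numbers;
    for my $mask (1 .. 2**$n - 1) {
        # Bit i of $mask says whether element i is in the subset.
        my @indices = grep { $mask & (1 << $_) } 0 .. $n - 1;
        return \@indices if verify($numbers, \@indices, $target);
    }
    return;    # no subset adds up to the target
}

my @numbers  = (3, 34, 4, 12, 5, 2);
my $solution = brute_force(\@numbers, 9);
print "indices: @$solution\n" if $solution;
```

With 6 numbers there are only 63 subsets to try; with 60 numbers there would be more than 10^18.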
Salient properties of NP-complete problems
- No NP-complete problem has been proven to be solvable in polynomial time.
- No NP-complete problem has been proven to be unsolvable in polynomial time.
- All NP-complete problems are computationally equivalent in the following sense:
  - If any polynomial-time algorithm can be found to solve any NP-complete problem, then every NP-complete problem can be solved by some polynomial-time algorithm.
- Since so many computer scientists and mathematicians have tried unsuccessfully to find polynomial-time algorithms for so many NP-complete problems, most researchers believe that no such algorithms exist -- but this hasn’t been proven either.
The harsh realities of life
- Most problems of interest in bioinformatics and computational biology are NP-complete.