Searching Chapter 7. Objectives Introduce sequential search. – Calculate the computational complexity of a successful search. Introduce binary search

Searching

Chapter 7

Objectives

• Introduce sequential search.
– Calculate the computational complexity of a successful search.

• Introduce binary search.
– 4 different versions.
– Calculate the computational complexity.

• Discuss comparison trees and how they can be used to analyze algorithm performance.
– Internal path length
– External path length
– Average path length

Homework Overview

• Written (max 40 points)
– 7.2 E3 (4 pts)
– 7.3 E1 (a, b, c, d) (2 pts each)
– 7.4 E1 (a, b, c, d) (3 pts each)
– 7.4 E2 (5 pts)
– 7.4 E3 (10 pts)
– 7.6 E1 (a, b, c, d) (2 pts each)
– 7.6 E2 (6 pts)
– 7.6 E5 (a, b, c, d, e, f, g, h) (1 pt each)
– 7.6 E6 (a, b, c, d) (2 pts each)

• Programming (max 20 points)
– 7.2 E4 (8 pts)
– 7.2 P2 (12 pts)
– 7.4 P1 (15 pts)

Searching

• A very common problem in computer science is trying to find a particular data entry.

• There are two main strategies.
– Use a general storage type and then produce an algorithm to search within that type.
– Design special storage types that make searching more efficient.

• In general we assume each entry has a key.
– Name
– ID number
– Value
– etc.

• We search the entries until we find the desired key.

Sequential Search

• If there is no organizational structure to the data, then the only real strategy is a sequential search.
– We start at one end of the list and examine each key in turn.

    for (position = 0; position < size; position++) {
        the_list.retrieve(position, data);
        if (data == target) return success;
    }
    return not_present;

Complexity of Sequential Search

• To determine the computational cost of this search, we count how many times some representative operation occurs.
– We will choose to count the number of comparisons.
– For some data types, comparisons may be very expensive, for example comparing long strings.
– How many times is the == operator used?

• The answer depends on where (or if) the target key is stored in the list.
– We could get lucky and find the target on the first comparison.
– We could find the key on the last comparison.
– If the key is not in the list, we will need to look at every entry to make sure.

Complexity of Sequential Search

• Let’s assume we know the key is in the list, so the search will be successful.
– Let’s also assume the key has an equal probability of being in any location in the list.

• Let n be the number of entries in the list.
• We could find the desired key after 1, 2, 3, …, n comparisons, all with equal probability.
• The average search time, measured in comparisons, is:

$$\frac{1 + 2 + \cdots + n}{n} = \frac{n(n+1)/2}{n} = \frac{n+1}{2}$$
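As a brute-force check of the closed form (n + 1)/2, the sketch below adds up the cost of finding the key at each of the n positions and divides by n. It assumes, as the slide does, that every position is equally likely and that finding the key at position p (counting from 0) costs p + 1 comparisons; the function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: exact average number of key comparisons for a
// successful sequential search over n entries, assuming each position is
// equally likely and position p (0-based) costs p + 1 comparisons.
double average_successful_comparisons(long n) {
    assert(n > 0);                  // a successful search needs a nonempty list
    long total = 0;
    for (long p = 0; p < n; ++p)
        total += p + 1;             // accumulates 1 + 2 + ... + n
    return static_cast<double>(total) / n;
}

// Closed form for the same average: (1 + 2 + ... + n) / n = (n + 1) / 2.
double closed_form_average(long n) {
    return (n + 1) / 2.0;
}
```

For n = 1000 both functions give 500.5, the value the testing program below is expected to approach.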

Key Class

• In order to count the number of comparisons in the run of an actual program, it is helpful to create a custom key class.

• This class will represent the key (any data type) but more importantly it will allow us to overload the comparison operators.

• The class will contain a static variable that will count the number of comparisons.

• Each time a comparison is made, the overloaded operator will add one to the comparison count.

• We can examine the comparison count at the end of the program.

Key Class Definition

    class Key {
        int key;
    public:
        static int comparisons;
        Key (int x = 0);
        int the_key() const;
    };

    bool operator == (const Key &x, const Key &y);
    bool operator > (const Key &x, const Key &y);
    bool operator < (const Key &x, const Key &y);
    bool operator >= (const Key &x, const Key &y);
    bool operator <= (const Key &x, const Key &y);
    bool operator != (const Key &x, const Key &y);

    int Key::comparisons = 0;

• Note the static variable is assigned its initial value outside of any function.
– It is accessed using the class name and the scope resolution operator.

Key Class Methods

• The constructor and accessor methods are simple.

    Key::Key(int x)
    {
        key = x;
    }

    int Key::the_key() const
    {
        return key;
    }

• The operators are also simple.
– They use the accessor method and the built-in int comparison to do their job.
– They increment the comparison count.
– They are all similar to the following.

    bool operator == (const Key &x, const Key &y)
    {
        Key::comparisons++;
        return x.the_key() == y.the_key();
    }
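A self-contained version of the Key class above, with the five operators the slides only declare filled in by the same pattern as operator ==. Writing the members inline is a choice made for this sketch, not the book's layout.

```cpp
#include <cassert>

// Sketch of the comparison-counting Key class: every overloaded comparison
// adds one to the shared static counter before comparing the wrapped ints.
class Key {
    int key;
public:
    static int comparisons;          // shared counter for all Key comparisons
    Key(int x = 0) : key(x) {}       // non-explicit, so an int converts to a Key
    int the_key() const { return key; }
};

int Key::comparisons = 0;            // static member defined outside the class

bool operator==(const Key &x, const Key &y) { Key::comparisons++; return x.the_key() == y.the_key(); }
bool operator!=(const Key &x, const Key &y) { Key::comparisons++; return x.the_key() != y.the_key(); }
bool operator<(const Key &x, const Key &y)  { Key::comparisons++; return x.the_key() <  y.the_key(); }
bool operator>(const Key &x, const Key &y)  { Key::comparisons++; return x.the_key() >  y.the_key(); }
bool operator<=(const Key &x, const Key &y) { Key::comparisons++; return x.the_key() <= y.the_key(); }
bool operator>=(const Key &x, const Key &y) { Key::comparisons++; return x.the_key() >= y.the_key(); }
```

Because the constructor is not explicit, an expression such as `Key(5) >= 5` also compiles and is counted, which is what lets the search functions compare a stored Key with an int target.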

Sequential Search Testing Program

• Now that we have a key class that can count comparisons for us, we can write a program to test sequential search.

• We will generate a list of odd entries from a known range of values.

• We will repeatedly select a random value that we know is in the list and search for it.
– Computing the average number of comparisons over a large number of runs.
– We will also calculate the run time for these searches.

• We will then repeatedly select a random entry we know is not in the list and search for it.
– Computing the average number of comparisons over a large number of runs.
– We will also calculate the run time for these searches.

Random, Timer and List

• To generate the random numbers we will use the Random class defined in Appendix B of the book.

• To calculate the run times we will use the Timer class defined in Appendix C of the book.

• We have used similar code before so we won’t go into the details here.

• Finally, we will use one of the list packages (all should work) that we developed in the last chapter.
– You will need to add not_present to the enumeration of the return values.

Main Function

• First, we will just be storing Keys in the list.

    typedef Key Record;

• The main function asks the user for details, creates the list and then calls test_search.

    int main()
    {
        int items, searches;
        List<Record> the_list;
        Key::comparisons = 0;
        cout << "How many items should be stored in the list? " << flush;
        cin >> items;
        if (items < 0) {
            cout << "Error: the number of items must be nonnegative." << endl;
            exit(1);
        }
        cout << "How many searches should be performed? " << flush;
        cin >> searches;
        if (searches <= 0) {
            cout << "Error: the number of searches must be positive." << endl;
            exit(1);
        }
        for (int i = 0; i < items; i++)
            the_list.insert(i, 2 * i + 1);    // the list holds the odd numbers 1, 3, 5, ...
        test_search(searches, the_list);
    }

test_search Function

    void test_search(int searches, List<Record> &the_list)
    /* Pre:  The number searches is a positive integer and the List the_list has been
             filled with some number of integers.
       Post: Statistics are printed about the performance of searching algorithms when
             the searched-for key is present in the list and when it is absent.
       Uses: The List class, the Random number class, the Key class, the Timer class,
             and the function sequential_search. */
    {
        int list_size = the_list.size();
        if (searches <= 0 || list_size <= 0) {
            cout << " Exiting test: " << endl
                 << " The number of searches must be positive." << endl
                 << " The number of list entries must exceed 0." << endl;
            return;
        }

        int i, target, found_at;
        Key::comparisons = 0;
        Random number;
        Timer clock;
        for (i = 0; i < searches; i++) {
            target = 2 * number.random_integer(0, list_size - 1) + 1;   // odd: present
            if (sequential_search(the_list, target, found_at) == not_present)
                cout << "Error: Failed to find expected target " << target << endl;
        }
        print_out("Successful", clock.elapsed_time(), Key::comparisons, searches);

        Key::comparisons = 0;
        clock.reset();
        for (i = 0; i < searches; i++) {
            target = 2 * number.random_integer(0, list_size);           // even: absent
            if (sequential_search(the_list, target, found_at) == success)
                cout << "Error: Found unexpected target " << target
                     << " at " << found_at << endl;
        }
        print_out("Unsuccessful", clock.elapsed_time(), Key::comparisons, searches);
    }
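The book's Random (Appendix B) and Timer (Appendix C) classes are not reproduced in these notes. As a hedged substitute, the sketch below runs the same two experiments on a std::vector<int>, using std::mt19937 for the random targets and std::chrono::steady_clock for timing. SearchStats, counted_sequential_search and run_searches are hypothetical names, and the comparison count lives in a plain counter instead of Key::comparisons.

```cpp
#include <cassert>
#include <chrono>
#include <random>
#include <vector>

// Totals for one batch of searches.
struct SearchStats {
    long comparisons = 0;   // key comparisons over the whole batch
    double seconds = 0.0;   // elapsed wall-clock time for the batch
};

// Sequential search that counts comparisons in an explicit counter
// (this sketch's stand-in for Key::comparisons).
bool counted_sequential_search(const std::vector<int> &list, int target,
                               int &position, long &comparisons) {
    for (position = 0; position < static_cast<int>(list.size()); ++position) {
        ++comparisons;
        if (list[position] == target) return true;
    }
    return false;
}

// Hypothetical stand-in for test_search. The list is assumed to hold
// 1, 3, 5, ..., 2n - 1, so odd targets are present and even ones are absent.
// std::mt19937 replaces the book's Random class; std::chrono::steady_clock
// replaces its Timer class.
SearchStats run_searches(const std::vector<int> &list, int searches,
                         bool present, unsigned seed) {
    assert(searches > 0 && !list.empty());
    const int n = static_cast<int>(list.size());
    std::mt19937 generator(seed);
    std::uniform_int_distribution<int> pick(0, present ? n - 1 : n);

    SearchStats stats;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < searches; ++i) {
        const int target = present ? 2 * pick(generator) + 1 : 2 * pick(generator);
        int position = 0;
        const bool found =
            counted_sequential_search(list, target, position, stats.comparisons);
        assert(found == present);   // mirrors the "Error: ..." checks in test_search
        (void)found;                // avoids an unused-variable warning under NDEBUG
    }
    const auto stop = std::chrono::steady_clock::now();
    stats.seconds = std::chrono::duration<double>(stop - start).count();
    return stats;
}
```

Dividing comparisons and seconds by the number of searches gives the per-search averages that print_out reports. With 1000 items, 100 unsuccessful searches always cost exactly 100000 comparisons, because every absent target forces a scan of the whole list.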

Sequential Search Function

    Error_code sequential_search(const List<Record> &the_list,
                                 const Key &target, int &position)
    /* Post: If an entry in the_list has key equal to target, then return success and
             the output parameter position locates such an entry within the list.
             Otherwise return not_present and position becomes invalid. */
    {
        int s = the_list.size();
        for (position = 0; position < s; position++) {
            Record data;
            the_list.retrieve(position, data);
            if (data == target) return success;
        }
        return not_present;
    }

print_out Function

    void print_out(const char *search, double time, int comparisons, int searches)
    /* Pre:  search is a string describing a search.
       Post: Statistics about the search are printed out. */
    {
        cout << "The search " << search << " took " << time << " seconds and "
             << comparisons << " comparisons to make " << searches
             << " searches." << endl;
        // comparisons / searches is integer division, so the printed average is truncated.
        cout << "This results in an average search time of " << time / searches
             << " and an average number of comparisons of "
             << comparisons / searches << "." << endl;
    }

Sample Output

• Here is a sample of the output of the testing program.

    How many items should be stored in the list? 1000
    How many searches should be performed? 100
    The search Successful took 0.002343 seconds and 49474 comparisons to make 100 searches.
    This results in an average search time of 2.343e-05 and an average number of comparisons of 494.
    The search Unsuccessful took 0.004342 seconds and 100000 comparisons to make 100 searches.
    This results in an average search time of 4.342e-05 and an average number of comparisons of 1000.

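• We expected the average number of comparisons to be 1001/2 = 500.5, which is slightly different from the actual result. The measured total of 49474 comparisons is 494.74 per search (printed as 494 because print_out uses integer division), and the rest of the gap comes from averaging over only 100 randomly chosen targets.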

• The time for a successful search was on average about half the time for an unsuccessful search.

Computational Complexity

• For successful searches:
– The average number of comparisons is approximately half of the number of items in the list.

• For both successful and unsuccessful searches:
– When the number of entries (and the number of comparisons) increases by a factor of 10, the run time increases by a little less than 10 times.
– Both the run time and the number of comparisons are linear functions of the number of items.
– There are a lot of details here that depend on the computer, compiler, language, programmer skill, etc.
– We use a shorthand notation, O(n), to say the runtime is a linear function of the list size.

          Successful Searches            Unsuccessful Searches
n         Ave. comp.   Ave. time (s)     Ave. comp.   Ave. time (s)
10             5       3.70e-07              10       5.50e-07
100           49       2.46e-06             100       4.96e-06
1000         494       2.34e-05            1000       4.34e-05
10000       4942       0.00018757         10000       0.00033711
100000     49425       0.00126353        100000       0.00294095

Homework – Section 7.2 (page 276)

• Written
– E3 (written on paper) (4 pts)

• Programming
– E4 (email code) (8 pts)
– P2 (email code and written report) (12 pts)

Binary Search

• If the list is ordered (say from the smallest key to the largest key) then we can do much better than sequential search.

• With binary search we can divide the list in two and eliminate the half that we know does not contain the desired key.

• We divide the list in half.

• Since the keys are ordered and the desired key is larger than the mid key we know that the desired entry (if it exists) is in the top half of the list.

Binary Search

• We have cut the size of the problem in half with one comparison!

• We can repeat the problem, resetting bottom and top to indicate the part of the list that still might contain the desired key.

Binary Search Termination

• There are several options when implementing the binary search algorithm.

• First, when do we terminate?
• There are two options for terminating the division:
1. Stupid condition: We have a list with one entry (top == bottom).
• With this method we might keep going after we have “found” the target key.
2. Clever condition: We have a list with one entry or we find the key (top == bottom || data == target).
• This has the penalty of an extra comparison at each step.

Recursion

• We can also implement the binary search recursively or iteratively.

• This leaves us with 4 possible solutions:

Recursive, Stupid     Recursive, Clever
Iterative, Stupid     Iterative, Clever

• Which method is fastest and has the least number of comparisons?
– Let’s implement them all and test them just the way we did with the sequential search.

Ordered Lists

• To enforce the fact that our list must be ordered, we will create an extension of the list class.

• The new class will be called Ordered_list and will redefine (replace) the insert and replace methods, and add an insert method that chooses the position itself.
– The new versions will ensure that the list is always in sorted order.

Ordered List Class

    class Ordered_list: public List<Record> {
    public:
        Ordered_list();
        /* Post: The Ordered_list is initialized to be empty. */

        Error_code insert(const Record &data);
        /* Post: If the Ordered_list is not full, the function succeeds: the Record
                 data is inserted into the list following the last entry of the list
                 with a strictly lesser key (or in the first position if no element
                 has a lesser key).
           Else: the function fails with the diagnostic Error_code overflow. */

        Error_code insert(int position, const Record &data);
        /* Post: If the Ordered_list is not full, 0 <= position <= n, where n is the
                 number of elements in the list, and the Record data can be inserted
                 at position in the list without disturbing the list order, then the
                 function succeeds: any entry formerly in position and all later
                 entries have their position numbers increased by 1, and data is
                 inserted at position of the List.
           Else: the function fails with a diagnostic Error_code. */

        Error_code replace(int position, const Record &data);
        /* Post: If the entry at position can be replaced with data without disturbing
                 the list order, then the function succeeds and the entry is replaced.
           Else: the function fails with a diagnostic Error_code. */
    };

Ordered List Methods

    Ordered_list::Ordered_list()
    /* Post: The Ordered_list is initialized to be empty. */
    {
        count = 0;    // entry counter inherited from List
    }

    Error_code Ordered_list::insert(const Record &data)
    /* Post: If the Ordered_list is not full, the function succeeds: the Record
             data is inserted into the list following the last entry of the list with
             a strictly lesser key (or in the first position if no element has a lesser key).
       Else: the function fails with the diagnostic Error_code overflow. */
    {
        int s = size();
        int position;
        for (position = 0; position < s; position++) {
            Record list_data;
            retrieve(position, list_data);
            if (data <= list_data) break;    // first entry whose key is not less than data
        }
        return List<Record>::insert(position, data);
    }

Ordered List Methods

    Error_code Ordered_list::insert(int position, const Record &data)
    /* Post: If the Ordered_list is not full, 0 <= position <= n, where n is the number
             of elements in the list, and the Record data can be inserted at position
             in the list without disturbing the list order, then the function succeeds:
             any entry formerly in position and all later entries have their position
             numbers increased by 1, and data is inserted at position of the List.
       Else: the function fails with a diagnostic Error_code. */
    {
        Record list_data;
        if (position > 0) {
            retrieve(position - 1, list_data);
            if (data < list_data) return fail;    // smaller than its left neighbour
        }
        if (position < size()) {
            retrieve(position, list_data);
            if (data > list_data) return fail;    // larger than the entry it would displace
        }
        return List<Record>::insert(position, data);
    }

Ordered List Methods

    Error_code Ordered_list::replace(int position, const Record &data)
    /* Post: If the entry at position can be replaced with data without
             disturbing the list order, then the function succeeds and the entry is replaced.
       Else: the function fails with a diagnostic Error_code. */
    {
        Record list_data;
        if (position > 0) {
            retrieve(position - 1, list_data);
            if (data < list_data) return fail;
        }
        if (position + 1 < size()) {              // compare with the next entry, if there is one
            retrieve(position + 1, list_data);
            if (data > list_data) return fail;
        }
        return List<Record>::replace(position, data);
    }
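For comparison, the same ordering rules can be sketched on a std::vector<int> instead of the book's List class. This is an illustration under that assumption, not the book's code: std::lower_bound returns the first entry that is not less than data, which is exactly the position "following the last entry with a strictly lesser key". Note that std::lower_bound finds that position by binary search rather than by the linear scan used above; the insertion itself still shifts entries, so it remains linear. ordered_insert and can_insert_at are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of Ordered_list::insert(const Record &): put data after the last
// entry with a strictly smaller key and return the index where it landed.
std::size_t ordered_insert(std::vector<int> &list, int data) {
    auto where = std::lower_bound(list.begin(), list.end(), data);
    std::size_t position = static_cast<std::size_t>(where - list.begin());
    list.insert(where, data);               // shifts later entries up by one
    return position;
}

// Sketch of the check in Ordered_list::insert(int, const Record &): data may
// go at position only if it is not smaller than its left neighbour and not
// larger than the entry it would push to the right.
bool can_insert_at(const std::vector<int> &list, std::size_t position, int data) {
    if (position > list.size()) return false;
    if (position > 0 && data < list[position - 1]) return false;
    if (position < list.size() && data > list[position]) return false;
    return true;
}
```

Inserting 4 into 1, 3, 5 lands at index 2, and a duplicate key goes in front of the existing equal keys.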

Binary Search Algorithm

• Binary search is famous for being coded incorrectly, so be careful.

• We need to carefully define our variables:
– top and bottom will be indices enclosing the part of the list in which we are searching for the target key.

• At each step we will reduce the region between top and bottom by about half.

• The following is our loop invariant:
– The target key, provided it is present in the list, will be found between the indices bottom and top inclusive.

• We will start with the following values:
– bottom = 0
– top = list.size() - 1

Binary Search Algorithm

• To actually do the searching, we calculate the midpoint of the list:
mid = (bottom + top) / 2

• We will compare the target key to the key at position mid.
– If the target key is greater than the key at position mid, then the target can only lie in the top half of the list.
• bottom = mid + 1
– If the target key is less than or equal to the key at position mid, then the target can only lie in the bottom half of the list, including position mid.
• top = mid

• This process repeats until top <= bottom.
– Alternatively, we could also terminate when the target key == the key at position mid.

• The process can be either iterative or recursive.

Stupid Recursive Algorithm

Simplification Step: If the target key > the key at position mid, then repeat the problem with bottom = mid + 1.
Otherwise, repeat with top = mid.

Base Case: If top <= bottom, then the list has at most one entry. Check this entry (if there is one) to see if it is the target.

Stupid Recursive Version

    Error_code recursive_binary_1(const Ordered_list &the_list, const Key &target,
                                  int bottom, int top, int &position)
    /* Pre:  The indices bottom to top define the range to search for the target.
       Post: If a Record in the range from bottom to top in the_list has key equal to
             target, then position locates one such entry and success is returned.
             Otherwise, not_present is returned and position becomes undefined. */
    {
        Record data;
        if (bottom < top) {                   // List has more than one entry.
            int mid = (bottom + top) / 2;
            the_list.retrieve(mid, data);
            if (data < target)                // Reduce to top half of the list.
                return recursive_binary_1(the_list, target, mid + 1, top, position);
            else                              // Reduce to bottom half of the list.
                return recursive_binary_1(the_list, target, bottom, mid, position);
        }
        else if (top < bottom)
            return not_present;               // List is empty.
        else {                                // List has exactly one entry.
            position = bottom;
            the_list.retrieve(bottom, data);
            if (data == target) return success;
            else return not_present;
        }
    }

Stupid Recursive Version

• So that a user of this algorithm can call it like any other searching algorithm, we introduce a simple function to arrange the parameters into the correct format for the recursion.

    Error_code run_recursive_binary_1(const Ordered_list &the_list,
                                      const Key &target, int &position)
    /* Post: If a Record in the_list has key equal to target, then position locates
             one such entry and a code of success is returned. Otherwise, the
             Error_code of not_present is returned and position becomes undefined.
       Uses: recursive_binary_1 and methods of the classes Ordered_list and Record. */
    {
        return recursive_binary_1(the_list, target, 0, the_list.size() - 1, position);
    }

Stupid Iterative Algorithm

• Since the recursion is tail recursion, it is fairly simple to write an iterative version of the same algorithm.

• In this case we do not need a special function just to set up the correct parameters.

Stupid Iterative Version

    Error_code binary_search_1(const Ordered_list &the_list,
                               const Key &target, int &position)
    /* Post: If a Record in the_list has key equal to target, then position
             locates one such entry and a code of success is returned. Otherwise,
             the Error_code of not_present is returned and position becomes undefined.
       Uses: Methods of the classes Ordered_list and Record. */
    {
        Record data;
        int bottom = 0, top = the_list.size() - 1;
        while (bottom < top) {
            int mid = (bottom + top) / 2;
            the_list.retrieve(mid, data);
            if (data < target)
                bottom = mid + 1;
            else
                top = mid;
        }
        if (top < bottom) return not_present;     // the list is empty
        else {
            position = bottom;
            the_list.retrieve(bottom, data);
            if (data == target) return success;
            else return not_present;
        }
    }
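The sketch below is the same stupid (forgetful) iterative search on a sorted std::vector<int>, with an explicit comparison counter in place of the Key class; the function name is hypothetical. The midpoint is written as bottom + (top - bottom) / 2, which equals (bottom + top) / 2 for nonnegative indices but cannot overflow when the indices are very large.

```cpp
#include <cassert>
#include <vector>

// Sketch of binary_search_1 (stupid / forgetful version) on a sorted vector.
// Every key comparison adds one to comparisons.
bool binary_search_forgetful(const std::vector<int> &list, int target,
                             int &position, long &comparisons) {
    int bottom = 0;
    int top = static_cast<int>(list.size()) - 1;
    while (bottom < top) {                        // more than one candidate left
        int mid = bottom + (top - bottom) / 2;    // same value as (bottom + top) / 2
        ++comparisons;
        if (list[mid] < target) bottom = mid + 1; // target can only be above mid
        else top = mid;                           // target is at mid or below
    }
    if (top < bottom) return false;               // empty list: nothing to compare
    position = bottom;
    ++comparisons;                                // the single equality test
    return list[bottom] == target;
}
```

For the ten keys 1, 3, …, 19, the ten successful searches cost 44 comparisons in total (4.4 on average): six keys need 4 comparisons and four keys need 5.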

Clever Versions

• If we check whether the target key == the key at position mid, then we might get lucky and get to quit early.

• The modifications to the code are fairly simple.

• Will the possibility of quitting early be worth the extra comparison at each step?
– We will run an experiment to see.

Clever Recursive Version

    Error_code recursive_binary_2(const Ordered_list &the_list, const Key &target,
                                  int bottom, int top, int &position)
    /* Pre:  The indices bottom to top define the range in the list to search for the target.
       Post: If a Record in the range of locations from bottom to top in the_list has
             key equal to target, then position locates one such entry and a code of
             success is returned. Otherwise, the Error_code of not_present is returned
             and position becomes undefined.
       Uses: recursive_binary_2 and methods of the classes Ordered_list and Record. */
    {
        Record data;
        if (bottom <= top) {
            int mid = (bottom + top) / 2;
            the_list.retrieve(mid, data);
            if (data == target) {
                position = mid;
                return success;
            }
            else if (data < target)           // Reduce to top half of the list.
                return recursive_binary_2(the_list, target, mid + 1, top, position);
            else                              // Reduce to bottom half of the list.
                return recursive_binary_2(the_list, target, bottom, mid - 1, position);
        }
        else return not_present;              // List is empty.
    }

Clever Iterative Version

    Error_code binary_search_2(const Ordered_list &the_list,
                               const Key &target, int &position)
    /* Post: If a Record in the_list has key equal to target, then position
             locates one such entry and a code of success is returned. Otherwise,
             the Error_code of not_present is returned and position becomes undefined.
       Uses: Methods of the classes Ordered_list and Record. */
    {
        Record data;
        int bottom = 0, top = the_list.size() - 1;
        while (bottom <= top) {
            position = (bottom + top) / 2;
            the_list.retrieve(position, data);
            if (data == target) return success;
            if (data < target) bottom = position + 1;
            else top = position - 1;
        }
        return not_present;
    }
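A matching sketch of the clever iterative search on a sorted std::vector<int>, again counting comparisons explicitly (hypothetical name, same overflow-safe midpoint as the forgetful sketch). Each pass through the loop makes an equality test and, unless the key was found, an ordering test.

```cpp
#include <cassert>
#include <vector>

// Sketch of binary_search_2 (clever version): stops as soon as the key at the
// midpoint equals the target, at the price of a second comparison per step.
bool binary_search_clever(const std::vector<int> &list, int target,
                          int &position, long &comparisons) {
    int bottom = 0;
    int top = static_cast<int>(list.size()) - 1;
    while (bottom <= top) {
        position = bottom + (top - bottom) / 2;
        ++comparisons;                                   // equality test
        if (list[position] == target) return true;
        ++comparisons;                                   // ordering test
        if (list[position] < target) bottom = position + 1;
        else top = position - 1;
    }
    return false;
}
```

For the ten keys 1, 3, …, 19 this gives 48 comparisons over the ten successful searches and 78 over the eleven unsuccessful ones (the even targets 0, 2, …, 20), compared with 44 for the ten successful forgetful searches.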

Modify Main

• The main function we used to test the sequential search can be modified in a fairly obvious manner to test these 4 different versions of binary search.

Comparison of Methods

• First, let’s look at successful searches.

• The clever versions are faster for short lists, but as the lists get longer, eventually the stupid versions win.
– If comparisons were more expensive, the stupid version would be the clear winner.

• Iterative versions are generally a little faster than the recursive versions.

• All of these are much faster than sequential search for long lists.

          Sequential          Stupid, recursive    Stupid, iterative    Clever, recursive    Clever, iterative
n         Comp   Time         Comp   Time          Comp   Time          Comp   Time          Comp   Time
10           5   3.70e-07        4   4e-07            4   3.5e-07          4   3e-07            4   2.3e-07
100         49   2.46e-06        7   1.08e-06         7   9.8e-07         10   7.8e-07         10   6.9e-07
1000       494   2.34e-05       10   5.17e-06        10   4.29e-06        17   4.36e-06        17   4.15e-06
10000     4942   0.00018757     14   4.772e-05       14   2.928e-05       23   2.687e-05       23   2.643e-05
100000   49425   0.00126353     17   0.0003288       17   0.00024821      30   0.00025079      30   0.00025482

Comparison of Methods

• Next, let’s look at unsuccessful searches.

• Unlike sequential search the run times are similar between successful and unsuccessful searches.

• The clever versions need even more comparisons than for successful searches, while the stupid versions need the same number.

• Otherwise the results are similar.

          Sequential           Stupid, recursive    Stupid, iterative    Clever, recursive    Clever, iterative
n         Comp    Time         Comp   Time          Comp   Time          Comp   Time          Comp   Time
10            10  5.50e-07        4   3.1e-07          4   2.7e-07          7   2.9e-07          7   3.8e-07
100          100  4.96e-06        7   1.1e-06          7   9.7e-07         13   8.2e-07         13   1e-06
1000        1000  4.34e-05       10   4.39e-06        10   4.28e-06        19   4.41e-06        20   5.89e-06
10000      10000  0.00033711     14   3.107e-05       14   2.613e-05       26   2.68e-05        26   2.701e-05
100000    100000  0.00294095     17   0.0003201       17   0.0003494       33   0.00026173      33   0.00025522

Conclusions

• It seems that the clever version is not worth the trouble, particularly if comparisons are expensive.
– For example, comparing strings.

• Similarly, the recursive version has slightly worse performance and the iterative version may be easier to understand.

• Winner – the stupid iterative version!

Computational Complexity

• Notice the relationship between the number of comparisons and the logarithm of the size of the list.

• We say that binary search is O(log n).

n         Comp.   log2(n)
10          4      3.32
100         7      6.64
1000       10      9.97
10000      14     13.29
100000     17     16.61


Homework – Section 7.3

• Written
– E1 (a, b, c, d) (written on paper) (2 pts each)

Comparison trees

• Our analysis of binary search algorithms depended on:
– A particular implementation
– A particular computer
– A particular operating system
– A particular language
– A particular compiler

• It would be nice to have a general analysis that would avoid all these issues.

• One method is to construct a comparison tree (decision tree or search tree).

• This tree represents each comparison (or decision) in the algorithm.

Sequential Search Comparison Tree

• Suppose we are searching the list 1, 2, 3, …, n using sequential search.
• The following is the comparison tree.
– Each circle represents a comparison.
– Each square represents a possible result of the search.
– The F result means the target key was not in the list.

• First the target is compared to entry 1.
– If they are equal, we have found the target and we are done.
– If they are not equal, then we move on to entry 2.
– etc.

Stupid Binary Search Comparison Tree

• Suppose we want to search the list 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 using the stupid version of binary search.

• This is the search tree.

• The height of the tree represents the number of comparisons in the worst case.
– In this case there might be 5 comparisons, but many cases need only 4.

Root, Leaf and Path Length

• The initial comparison is called the root of the tree and will be made in all cases.

• Each ultimate result is a leaf of the tree.
• The path length is the number of interior vertices (circles) between the root and the leaf.
• The path length for a particular target key corresponds to the number of comparisons needed for the search.
• For example, if the target key is 7, the path involves the following comparisons.
– Compare to 5 (greater than)
– Compare to 8 (less than or equal to)
– Compare to 7 (less than or equal to)
– Compare to 6 (greater than)
– Compare to 7 (equal)
– The total number of comparisons is 5.
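The path for target 7 can be reproduced by recording which key each comparison of the stupid search looks at. The sketch below is a hypothetical tracing helper on a std::vector<int>, not part of the book's code; the last recorded key is the one used in the final equality test.

```cpp
#include <cassert>
#include <vector>

// Hypothetical tracing helper: returns, in order, the key examined by each
// comparison of the stupid (forgetful) binary search for target.
std::vector<int> forgetful_probe_sequence(const std::vector<int> &list, int target) {
    std::vector<int> probes;
    int bottom = 0;
    int top = static_cast<int>(list.size()) - 1;
    while (bottom < top) {
        int mid = bottom + (top - bottom) / 2;
        probes.push_back(list[mid]);              // the "<" comparison at mid
        if (list[mid] < target) bottom = mid + 1;
        else top = mid;
    }
    if (bottom == top) probes.push_back(list[bottom]);  // final equality test
    return probes;
}
```

On the list 1, 2, …, 10, target 7 yields the probes 5, 8, 7, 6, 7 (five comparisons), and target 3 yields 5, 3, 2, 3 (four comparisons), one root-to-leaf path each.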

Stupid Average Path Length

• We want to know the average number of comparisons.
• For the searches using the stupid version:
– 12 paths of length 4
– 8 paths of length 5

• All of these paths begin at the root and end at a leaf and are called external paths.

• Adding up the lengths of all the external paths produces the external path length of the tree:

$$E = 12 \cdot 4 + 8 \cdot 5 = 88$$

• The average number of comparisons in a search is the external path length divided by the 20 leaves:

$$\frac{88}{20} = 4.4$$

Clever Binary Search Comparison Tree

• Suppose we want to search the list 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 using the clever version of binary search.

• This is the search tree.

• In this case there might be anywhere between 1 and 8 comparisons.

Clever Binary Search Comparison Tree

• The tree for the clever method is somewhat complicated.
• We can simplify it by combining pairs of comparisons into a single circle.

• Here a circle represents:
– One comparison if the target key is found.
– Two comparisons if the target is not found.

Clever Path Length

• In this clever implementation, if the target key is 7, the path involves the following comparisons.
– 5 (not equal)
– 5 (greater than)
– 8 (not equal)
– 8 (less than)
– 6 (not equal)
– 6 (greater than)
– 7 (equal)
– There are a total of 7 comparisons.

Clever Average Successful Path Length

• Now a search could terminate at any vertex.
– Successful searches end at interior vertices.
– Unsuccessful searches end at leaves.

• To calculate the average number of comparisons for a successful search, we need the lengths of all the paths from the root to an interior vertex.
– Their sum is the interior path length of the tree.

Clever Average Successful Path Length

• There are 10 interior paths with lengths:
– 1 path of length 0 (one comparison)
– 2 paths of length 1 (three comparisons)
– 4 paths of length 2 (five comparisons)
– 3 paths of length 3 (seven comparisons)

• The total interior path length is

$$I = 1 \cdot 0 + 2 \cdot 1 + 4 \cdot 2 + 3 \cdot 3 = 19$$

• Every vertex in the path represents 2 comparisons, plus one for the terminating node.

• The average number of comparisons in a successful search is

$$\frac{2I + 10}{10} = \frac{2 \cdot 19 + 10}{10} = 4.8$$

Clever Average Unsuccessful Path Length

• There are 11 unsuccessful searches using the clever version.
• They all end in leaves, so we will calculate the external path length.
• There are:
– 5 paths of length 3
– 6 paths of length 4

• The total external path length is

$$E = 5 \cdot 3 + 6 \cdot 4 = 39$$

• Every vertex in these paths represents 2 comparisons.
• The average number of comparisons is

$$\frac{2E}{11} = \frac{78}{11} \approx 7.09$$

• The big penalty for the clever version comes in the unsuccessful case.


Homework – Section 7.4 (page 296)

• Written
– E1 (a, b, c, d) (written on paper) (3 pts each)
– E2 (written on paper) (5 pts)

• Programming
– P1 (email code, written report) (15 pts)

Extending to Larger Trees

• We want to extend our results to larger cases without going through the pain of actually drawing the decision trees.

• A 2-tree is a tree in which every vertex except the leaves has exactly two children.

• This means we can predict the maximum possible number of vertices at each level.

Level    Max. # of vertices
0        1
1        2
2        4
3        8
…        …
t        2^t


• This means that if we know we have k vertices on level t, then $k \le 2^t$, that is, $t \ge \lg k$ (where lg is the base-2 logarithm).


• We often want to round our results, and there are two possibilities.
– The floor of x (written $\lfloor x \rfloor$) is the largest integer less than or equal to x. (round down)
– The ceiling of x (written $\lceil x \rceil$) is the smallest integer greater than or equal to x. (round up)

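• Notice that $\lfloor x \rfloor \le x \le \lceil x \rceil$, and $\lceil x \rceil - \lfloor x \rfloor$ is 0 when x is an integer and 1 otherwise.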

Analysis of Stupid Method

• Suppose we are searching a list of n items.
• There are n successful outcomes, and the last step is to check for equality, with two possible outcomes.
• Therefore, there are 2n leaves.
• The height of the tree (its deepest level) must be

$$t = \lceil \lg 2n \rceil = \lceil \lg n \rceil + 1$$

• This is also the maximum number of comparisons.
• Notice that we can end either at level t or at level t - 1, and that

$$t - 1 = \lceil \lg n \rceil \ge \lg n$$

and

$$t = \lceil \lg n \rceil + 1 < \lg n + 2$$

• In all cases there will be between lg n and lg n + 2 comparisons.
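These bounds can be checked by brute force. The sketch below (hypothetical helper names) computes $\lfloor \lg n \rfloor$ and $\lceil \lg n \rceil$ with integer arithmetic and simulates the stupid search for every position of a list of n distinct keys. Because only the shape of the search matters, it follows indices instead of real keys, using `mid < index` in place of `key[mid] < target`.

```cpp
#include <cassert>
#include <vector>

// floor(lg n) for n >= 1, using integers only.
int floor_lg(long n) {
    assert(n >= 1);
    int k = 0;
    while (n > 1) { n /= 2; ++k; }
    return k;
}

// ceil(lg n) for n >= 1: the smallest k with 2^k >= n.
int ceil_lg(long n) {
    assert(n >= 1);
    int k = 0;
    long power = 1;
    while (power < n) { power *= 2; ++k; }
    return k;
}

// Comparisons made by the stupid (forgetful) binary search when the target
// is the key stored at `index` in a sorted list of n distinct keys.
int forgetful_cost(int n, int index) {
    int bottom = 0, top = n - 1, comparisons = 0;
    while (bottom < top) {
        int mid = bottom + (top - bottom) / 2;
        ++comparisons;
        if (mid < index) bottom = mid + 1;   // plays the role of key[mid] < target
        else top = mid;
    }
    return comparisons + 1;                  // plus the final equality test
}
```

Running forgetful_cost over every index shows that the successful searches cost between $\lfloor \lg n \rfloor + 1$ and $\lceil \lg n \rceil + 1$ comparisons, which lies inside the range lg n to lg n + 2. For n = 10 this gives 4 and 5.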

Analysis of Clever Method – Unsuccessful Searches

• In the clever method, all unsuccessful searches end in leaves on the last two levels.

• If we are searching a list of n items, then there are n + 1 leaves:
– Less than the smallest key
– Between each pair of adjacent keys
– More than the largest key

• The height of the tree is

$$t = \lceil \lg(n+1) \rceil$$

• Each level in the tree corresponds to two comparisons, and the leaves are on either level t or level t - 1.

• This means the number of comparisons will be between

$$2(t-1) = 2\lceil \lg(n+1) \rceil - 2$$

and

$$2t = 2\lceil \lg(n+1) \rceil$$

• As we have seen before, this is around twice the number with the stupid method.

Internal vs. External Vertices

• To compute the average number of comparisons in successful searches, we need a fact about the relationship between the path lengths of internal and external vertices of a 2-tree.
– Let E be the external path length.
– Let I be the internal path length.
– Let q be the number of internal vertices (not leaves).

• It is a general fact that E = I + 2q.

Internal vs. External Vertices

• To see that E = I + 2q, we use a proof by induction.
• Base case: Suppose a tree contains only the root. In this case E = I = q = 0, so the equation is true.
• Induction step: We build a larger 2-tree from a smaller one.
– Suppose we have a 2-tree (with values $E_1$, $I_1$ and $q_1$) where $E_1 = I_1 + 2q_1$.
– Pick a leaf v with path length k from the root.
– Add two children to v so that it is no longer a leaf.
– This produces a new 2-tree (with values $E_2$, $I_2$ and $q_2$).
– Notice that v is in both trees, but in the new tree it is no longer a leaf, so $q_2 = q_1 + 1$.
– Also, the internal path length is now $I_2 = I_1 + k$.
– Finally, there are two new leaves at level k + 1 but one fewer leaf at level k.
– This means $E_2 = E_1 + 2(k+1) - k$.
– Now notice that

$$E_2 = E_1 + 2(k+1) - k = I_1 + 2q_1 + k + 2 = (I_1 + k) + 2(q_1 + 1) = I_2 + 2q_2.$$

• Every 2-tree can be grown from the single root by repeating this step, so E = I + 2q holds for all 2-trees.
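The identity can also be checked numerically on the comparison trees of the clever search. The sketch below (hypothetical names) rebuilds the tree shape from the way mid = (bottom + top) / 2 splits a range: a range of s keys has ⌊(s − 1)/2⌋ keys below the midpoint and the rest above it, and an empty range is a leaf. Each equality/ordering pair counts as one vertex, as in the simplified clever tree.

```cpp
#include <cassert>

// Path-length totals for the comparison tree of the clever binary search.
struct PathLengths {
    long internal = 0;   // I: sum of depths of internal vertices
    long external = 0;   // E: sum of depths of leaves
    long vertices = 0;   // q: number of internal vertices
};

// Adds the subtree that searches a range of `size` keys, rooted at `depth`.
void add_subtree(long size, long depth, PathLengths &p) {
    if (size == 0) {                 // empty range: the search ends at a leaf
        p.external += depth;
        return;
    }
    p.internal += depth;             // this range's midpoint is an internal vertex
    p.vertices += 1;
    long left = (size - 1) / 2;      // keys strictly below mid = (bottom + top) / 2
    add_subtree(left, depth + 1, p);
    add_subtree(size - 1 - left, depth + 1, p);
}

PathLengths clever_tree(long n) {
    PathLengths p;
    add_subtree(n, 0, p);
    return p;
}
```

For n = 10 it reports I = 19, E = 39 and q = 10, consistent with the path counts listed for the ten-key clever tree, and E = I + 2q holds for every n tried.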

Analysis of Clever Method – Successful Searches

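• In the clever method, the path length to the leaves is either

$$\lceil \lg(n+1) \rceil - 1$$

or

$$\lceil \lg(n+1) \rceil$$

• There are n + 1 leaves, so the external path length is approximately

$$E \approx (n+1)\lg(n+1)$$

• Each internal node corresponds to a unique list key, so q = n.
• This means

$$I = E - 2n \approx (n+1)\lg(n+1) - 2n$$

• Recall that the total number of comparisons over all successful searches is 2I + q.
– Every node on each path makes 2 comparisons.
– The terminating node makes 1.

• Thus the average number of comparisons is

$$\frac{2I + n}{n} \approx \frac{2(n+1)\lg(n+1)}{n} - 3$$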

Analysis of Clever Method – Successful Searches

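• Notice that for large n, $\frac{n+1}{n} \approx 1$ and $\lg(n+1) \approx \lg n$.

• So the average number of comparisons in a successful search is approximately

$$2\lg n - 3$$

• Compared with the roughly 2 lg n comparisons of an unsuccessful clever search, the only saving is the -3; compared with the roughly lg n + 1 comparisons of the stupid method, it is still about twice as many.

• And there is a big penalty for unsuccessful searches.
• Moral:
– For short lists (<= 8), use sequential search.
– For longer lists, use the stupid binary search.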


Homework – Section 7.4

• Written
– E3 (written on paper) (10 pts)

Asymptotics

• When we are talking about the run time of an algorithm, we are often only worried about what happens for large problems.

• We also don’t want to focus on details that would depend on a particular system.

• We want to compare our run times to a “library” of basic functions.
– g(n) = 1 (constant)
– g(n) = log n (logarithmic)
– g(n) = n (linear)
– g(n) = n^2 (quadratic)
– g(n) = n^3 (cubic)
– g(n) = 2^n (exponential)

Asymptotics

Test → Conclusion

• $\lim_{n\to\infty} f(n)/g(n) = 0$
– f(n) has a smaller order of magnitude than g(n); f(n) grows slower than g(n).

• $\lim_{n\to\infty} f(n)/g(n)$ is finite (not 0, not infinity)
– f(n) has the same order of magnitude as g(n); the growth of f(n) and g(n) differs only by a multiplied constant.

• $\lim_{n\to\infty} f(n)/g(n) = \infty$
– f(n) has a larger order of magnitude than g(n); f(n) grows faster than g(n).
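A worked instance of the limit test, using two pairs of functions chosen only for illustration:

```latex
\begin{aligned}
\lim_{n\to\infty}\frac{3n^2 + 5n}{n^2} &= \lim_{n\to\infty}\left(3 + \frac{5}{n}\right) = 3
  &&\Longrightarrow\ 3n^2 + 5n \text{ has the same order of magnitude as } n^2,\\
\lim_{n\to\infty}\frac{\lg n}{n} &= 0
  &&\Longrightarrow\ \lg n \text{ has a smaller order of magnitude than } n.
\end{aligned}
```

The first limit is finite and nonzero, so the lower-order term 5n does not change the order; the second is 0, which is why binary search (about lg n comparisons) eventually beats sequential search (about n/2) by an unbounded factor.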

Big O Notation

Notation          Name         Growth of f vs. g    lim f(n)/g(n)
f(n) is o(g(n))   little o     <                    = 0
f(n) is O(g(n))   big O        <=                   >= 0, finite
f(n) is Θ(g(n))   big theta    =                    nonzero, finite
f(n) is Ω(g(n))   big omega    >=                   nonzero, could be infinite

• We introduce a new notation to express the different asymptotic growth rates.

Comparisons By Growth Rate

• A chart can make the relationships between our “library” functions clear.

n       1    lg n    n lg n    n^2          n^3          2^n
1       1    0       0         1            1            2
10      1    3.32    33        100          1000         1024
100     1    6.64    664       10,000       1,000,000    1.268 × 10^30
1000    1    9.97    9970      1,000,000    10^9         1.072 × 10^301
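Entries such as 2^1000 are far too large to compute exactly in a double-precision shortcut, but their scientific-notation form follows from $\log_{10} 2^n = n \log_{10} 2$. The sketch below (hypothetical names) uses that identity to split 2^n into a mantissa in [1, 10) and a power of ten.

```cpp
#include <cassert>
#include <cmath>

// 2^n written as mantissa x 10^exponent, with 1 <= mantissa < 10.
struct Scientific {
    double mantissa;
    long exponent;
};

// Works with log10(2^n) = n * log10(2), so 2^n itself is never formed.
Scientific power_of_two(long n) {
    assert(n >= 0);
    double digits = n * std::log10(2.0);    // log10 of 2^n
    double whole = std::floor(digits);      // the power of ten
    return {std::pow(10.0, digits - whole), static_cast<long>(whole)};
}
```

This reproduces the 2^n column: power_of_two(100) gives about 1.268 × 10^30 and power_of_two(1000) about 1.072 × 10^301.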

Rules for Big O Calculations

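1. Ignore multiplied constants.
– so $O(c\,f(n)) = O(f(n))$ for any constant $c > 0$

2. Ignore all but the fastest growing term.
– so $O(f(n) + g(n)) = O(g(n))$ whenever f(n) grows slower than g(n)

Rules for Big O Calculations

3. The base for logarithms doesn’t matter.
– so $O(\log_a n) = O(\log_b n)$, because $\log_a n = \log_b n / \log_b a$ differs only by a constant factor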


Homework – Section 7.6 (page 312)

• Written
– E1 (a, b, c, d) (written on paper) (2 pts each)
– E2 (written on paper) (6 pts)
– E5 (a, b, c, d, e, f, g, h) (written on paper) (1 pt each)
– E6 (a, b, c, d) (written on paper) (2 pts each)
