DAA Lecture 3


Design and Analysis of Algorithms
Graduate Course-Number CSC5011
Fall Semester 2011

Lecture 3 Searching Tactics

Dr. Md. Shamim Akhter

Assistant Professor, Computer Science Department

American International University Bangladesh

Email: [email protected]



Searching Concept (1/3)

Common problem in computer science

Involves storing and maintaining a large dataset, and then searching the data for particular values

Data storage and retrieval are key to many industry applications

Search algorithms are necessary for storing and retrieving data efficiently



Searching Concept (2/3)

For instance, a program that checks the spelling of words searches for them in a dictionary, which is just an ordered list of words.

Problems of this kind are called searching problems.



Searching Concept (3/3)

There are many searching algorithms.

The natural searching method is linear search (or sequential search, or exhaustive search): very simple, but it takes a long time to apply with large lists.

A binary search repeatedly subdivides the list to locate an item much faster than linear search.

Like a binary search, an interpolation search repeatedly subdivides the list to locate an item.



Linear / Sequential Search

Special case of brute-force search

This is a very simple algorithm

It uses a loop to step sequentially through the array.

It compares each element with the value being searched for and stops when that value is found or the end of the array is reached.



Linear Search (2/8)

Sub LinearSearch(x: Int, a[]: Int, loc: Int)
    i = 1
    While (i <= n And x <> a[i])
        i = i + 1
    End While
    If i <= n Then loc = i Else loc = 0
End Sub



Linear Search (3/8)

Array numlist contains

17 23 5 11 2 29 3

Searching for the value 11, linear search examines 17, 23, 5, and 11 -> Found

Searching for the value 7, linear search examines 17, 23, 5, 11, 2, 29, and 3 -> Not Found
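The two searches on this slide can be reproduced with a short Python sketch. The function name `linear_search` and the extra `examined` list are illustrative assumptions, added only to show which values get compared; positions are 1-based as in the slides, with 0 meaning Not Found.

```python
def linear_search(x, a):
    """Return (loc, examined): 1-based position of x (0 if absent) and the values compared."""
    examined = []
    for i, value in enumerate(a, start=1):
        examined.append(value)      # record every element we look at
        if value == x:
            return i, examined      # stop as soon as the value is found
    return 0, examined              # reached the end of the array


numlist = [17, 23, 5, 11, 2, 29, 3]
print(linear_search(11, numlist))   # (4, [17, 23, 5, 11])
print(linear_search(7, numlist))    # (0, [17, 23, 5, 11, 2, 29, 3])
```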



Linear Search (4/8)

The advantage is its simplicity.

It is easy to understand
Easy to implement
Does not require the array to be in order

The disadvantage is its inefficiency

If there are 20,000 items in the array and what you are looking for is in the 19,999th element, you need to search through almost the entire list.



Linear Search (5/8)

Whenever the number of entries doubles, so does the running time, roughly.

If a machine does 1 million comparisons per second, 4 billion comparisons take about 67 minutes (a successful search examines half the list on average, which takes about 33 minutes).



Linear Search (6/8)



Linear Search (7/8)

Use a Sentinel to Improve the Performance

Sub LinearSearch2(x: Int, a[]: Int, loc: Int)
    a[n+1] = x: i = 1
    While (x <> a[i])
        i = i + 1
    End While
    If i <= n Then loc = i Else loc = 0
End Sub
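The sentinel idea can be sketched in Python as follows. This is a minimal illustration, assuming the list may be extended temporarily by one slot; the name `linear_search_sentinel` is hypothetical. Placing a copy of `x` after the last element means the loop needs only one comparison per step, because it is guaranteed to stop.

```python
def linear_search_sentinel(x, a):
    """Return the 1-based position of x in a, or 0 if x is absent."""
    n = len(a)
    a.append(x)            # sentinel in position n+1 guarantees termination
    i = 0
    while a[i] != x:       # no bounds test needed inside the loop
        i += 1
    a.pop()                # remove the sentinel so the caller's list is unchanged
    return i + 1 if i < n else 0


print(linear_search_sentinel(11, [17, 23, 5, 11, 2, 29, 3]))  # 4
print(linear_search_sentinel(7, [17, 23, 5, 11, 2, 29, 3]))   # 0
```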



Linear Search (8/8)

Apply Linear Search to Sorted Lists

Sub LinearSearch3(x: Int, a[]: Int, loc: Int)
    i = 1
    While (i <= n And x > a[i])
        i = i + 1
    End While
    If (i <= n And a[i] = x) Then loc = i Else loc = 0
End Sub
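A Python sketch of the same idea, under the assumption that the list is sorted in ascending order (the name `linear_search_sorted` is illustrative). The scan stops as soon as it reaches an element that is not smaller than `x`, so an unsuccessful search does not always have to visit the whole list. The explicit bound check covers the case where `x` is larger than every element.

```python
def linear_search_sorted(x, a):
    """Return the 1-based position of x in the ascending list a, or 0 if absent."""
    i, n = 0, len(a)
    while i < n and x > a[i]:   # skip elements smaller than x
        i += 1
    # Either we ran off the end, or a[i] >= x; only equality counts as a hit.
    return i + 1 if i < n and a[i] == x else 0


numlist2 = [2, 3, 5, 11, 17, 23, 29]
print(linear_search_sorted(11, numlist2))   # 4
print(linear_search_sorted(7, numlist2))    # 0 (stops at 11 without scanning further)
```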



Binary Search (1/9)

Can We Search More Efficiently?

Yes, provided the list is in some kind of order, for example alphabetical order with respect to the names.

If this is the case, we use a divide and conquer strategy to find an item quickly.

This strategy is what one would use in a number guessing game, for example.



Binary Search (2/9)

I'm Thinking of A Number between 1 and 1000. Guess it!

Is it 750? Nope, too high. Is it 625? etc.

This strategy guarantees a correct guess in no more than ten guesses!
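The ten-guess bound can be checked by brute force. The sketch below assumes the guesser always picks the midpoint of the remaining range (so the first guesses are 500, 750, 625, ...); `guesses_needed` is a hypothetical helper name.

```python
def guesses_needed(secret, low=1, high=1000):
    """Count midpoint guesses until secret (between low and high) is hit."""
    count = 0
    while True:
        guess = (low + high) // 2
        count += 1
        if guess == secret:
            return count
        if guess < secret:
            low = guess + 1     # "too low": discard the lower part
        else:
            high = guess - 1    # "too high": discard the upper part


worst = max(guesses_needed(s) for s in range(1, 1001))
print(worst)  # 10
```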



Binary Search (3/9)

Apply This Strategy to Searching

The resulting algorithm is called the Binary Search algorithm.

We check the middle key in our list.

If it is beyond what we are looking for (too high), we look only at the 1st half of the list.

If it's not far enough in (too low), we look at the 2nd half.

Then iterate!



Binary Search (4/9)

1. Divide a sorted array into three sections:
   middle element
   elements on one side of the middle element
   elements on the other side of the middle element

2. If the middle element is the correct value, done. Otherwise, go to step 1, using only the half of the array that may contain the correct value.



Binary Search (5/9)

3. Continue steps 1 and 2 until either the value is found or there are no more elements to examine.



Binary Search (6/9)

Binary Search Example

Array numlist2 contains

2 3 5 11 17 23 29

Searching for the value 11, binary search examines 11 and stops. Found.

Searching for the value 7, binary search examines 11, 3, 5, and stops. Not Found.



Binary Search (7/9)

Algorithm for Binary Search

Sub BinarySearch(x: Int, a[]: Int, loc: Int)
    i = 1: j = n
    while i < j
    begin
        m = (i + j) \ 2
        if x > a[m] then i = m + 1 else j = m
    end
    if x = a[i] then loc = i else loc = 0
End Sub
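A Python sketch of this style of binary search, which narrows the range to a single candidate and tests for equality only once at the end. It assumes an ascending list and returns a 1-based position (0 for Not Found) to match the slides; internally it uses Python's 0-based indexes, and the name `binary_search` is illustrative.

```python
def binary_search(x, a):
    """Return the 1-based position of x in the ascending list a, or 0 if absent."""
    if not a:
        return 0
    i, j = 0, len(a) - 1
    while i < j:
        m = (i + j) // 2      # integer division, like the \ operator
        if x > a[m]:
            i = m + 1         # x can only be in the upper half
        else:
            j = m             # x, if present, is at m or below
    return i + 1 if a[i] == x else 0


numlist2 = [2, 3, 5, 11, 17, 23, 29]
print(binary_search(11, numlist2))  # 4
print(binary_search(7, numlist2))   # 0
```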



Binary Search (8/9)

The worst-case number of comparisons grows by only 1 comparison every time the list size is doubled.

Only 32 comparisons would be needed on a list of 4 billion using Binary Search.

Sequential Search could need all 4 billion comparisons, which at 1 million comparisons per second takes over an hour in the worst case (about 33 minutes on average)!
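The figures on this slide can be checked with a few lines of arithmetic. The sketch assumes a rate of 1 million comparisons per second and counts one three-way comparison per binary-search probe; both function names are hypothetical.

```python
def linear_worst_case_minutes(n, comparisons_per_second=1_000_000):
    """Minutes needed to compare against all n entries."""
    return n / comparisons_per_second / 60


def binary_worst_case_probes(n):
    """Worst-case probes of a binary search on n entries: floor(log2 n) + 1."""
    return n.bit_length()


n = 4_000_000_000
print(round(linear_worst_case_minutes(n)))      # 67
print(binary_worst_case_probes(n))              # 32
print(binary_worst_case_probes(2 * n))          # 33: doubling adds one probe
```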



Binary Search (9/9)

Benefit

Much more efficient than linear search. For an array of N elements, it performs at most about log2 N comparisons.

Disadvantage

Requires that array elements be sorted.



Interpolation Search (1/9)

Binary search is a great improvement over linear search: it eliminates large portions of the list without actually examining all the eliminated values.

If the values are fairly evenly distributed, interpolation can be used to eliminate even more values at each step.



Interpolation Search (2/9)

Interpolation is the process of using known values to guess the position of an unknown value.

Here, the indexes of known values in the list are used to guess the index the target value should have.

Interpolation search selects the dividing point by interpolation using the following code:

m = l + (x - a[l])*(r-l)/(a[r]-a[l])



Interpolation Search (3/9)

Compare x to a[m]

If x = a[m]: Found.
If x < a[m]: set r = m - 1
If x > a[m]: set l = m + 1

If the search is not yet finished, continue searching with the new l and r.

Stop searching when Found, or when x < a[l] or x > a[r].



Interpolation Search (4/9)

Example: Find the key x = 32 in the list (m is rounded to the nearest integer)

 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
 1  4  7  9  9 12 13 17 19 21 24 32 36 44 45 54 55 63 66 70

1: l=1, r=20 -> m = 1+(32-1)*(20-1)/(70-1) = 10
   a[10]=21<32=x -> l=11

2: l=11, r=20 -> m = 11+(32-24)*(20-11)/(70-24) = 13
   a[13]=36>32=x -> r=12

3: l=11, r=12 -> m = 11+(32-24)*(12-11)/(32-24) = 12
   a[12]=32=x -> Found at m = 12



Interpolation Search (5/9)

Example: Find the key x = 30 in the list

 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
 1  4  7  9  9 12 13 17 19 21 24 32 36 44 45 54 55 63 66 70

1: l=1, r=20 -> m = 1+(30-1)*(20-1)/(70-1) = 9
   a[9]=19<30=x -> l=10

2: l=10, r=20 -> m = 10+(30-21)*(20-10)/(70-21) = 12
   a[12]=32>30=x -> r=11

3: l=10, r=11 -> m = 10+(30-21)*(11-10)/(24-21) = 13
   m=13>11=r: Not Found



Interpolation Search (6/9)

Private Sub Interpolation(a[]: Int, x: Int, n: Int, Found: Boolean)
    l = 1: r = n
    Do While (r > l)
        m = l + ((x - a[l]) / (a[r] - a[l])) * (r - l)
        Verify and Decide What to do next
    Loop
End Sub



Interpolation Search (7/9)

Verify and Decide what to do next

If (m < l) Or (m > r) Then
    Found = False
    Exit Do
ElseIf (a[m] = x) Then
    Found = True
    Exit Do
ElseIf (a[m] < x) Then
    l = m + 1
ElseIf (a[m] > x) Then
    r = m - 1
End If
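The whole procedure can be sketched in Python and run on the 20-element list from the examples. This is an illustration under stated assumptions, not the lecture's own implementation: the probe position is rounded to the nearest integer, the loop stops as soon as x falls outside a[l]..a[r] (so the probe can never leave the range), and the `probes` list is returned only to make the steps visible. The name `interpolation_search` is hypothetical.

```python
def interpolation_search(x, values):
    """Return (position, probes): 1-based position of x (0 if absent) and probed indexes."""
    a = [None] + list(values)            # shift to 1-based indexing like the slides
    l, r = 1, len(values)
    probes = []
    while l <= r and a[l] <= x <= a[r]:  # stop when x < a[l] or x > a[r]
        if a[r] == a[l]:
            m = l                        # avoid dividing by zero on equal end values
        else:
            m = l + round((x - a[l]) * (r - l) / (a[r] - a[l]))
        probes.append(m)
        if a[m] == x:
            return m, probes
        if a[m] < x:
            l = m + 1
        else:
            r = m - 1
    return 0, probes


data = [1, 4, 7, 9, 9, 12, 13, 17, 19, 21, 24, 32, 36, 44, 45, 54, 55, 63, 66, 70]
print(interpolation_search(32, data))    # (12, [10, 13, 12])
print(interpolation_search(30, data))    # (0, [9, 12])
```

For x = 30 the sketch stops after two probes, because 30 > a[11] = 24 already rules out the remaining range l=10, r=11.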



Interpolation Search (8/9)

Binary search is very fast (O(log n)), but interpolation search is much faster on evenly distributed values (O(log log n) on average).

For n = 2^32 (four billion items):
Binary search took 32 steps of verification
Interpolation search took only 5 steps of verification.
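The two step counts agree with the growth rates quoted above, taking logarithms to base 2 (an assumption; the slide does not state the base):

```latex
\[
\log_2 n = \log_2 2^{32} = 32,
\qquad
\log_2 \log_2 n = \log_2 32 = 5 .
\]
```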



Interpolation Search (9/9)

Interpolation search performance time is nearly constant for a large range of n.

The advantage would be even greater if the data had been stored on a hard disk or other relatively slow device.



Binary Search Tree (BST)

It's a binary tree!

For each node in a BST:
every key in its left subtree is smaller than it; and
every key in its right subtree is greater than it.



Search Operation

Search operation takes time O(h), where h is the height of the BST
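A minimal Python sketch of BST search, with a simple insert included only so that a tree can be built for the demonstration. The class and function names are illustrative assumptions, and keys are assumed to be distinct. The search follows a single root-to-leaf path, which is why its cost is bounded by the height h.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def bst_insert(root, key):
    """Insert key below root (no rebalancing) and return the subtree root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    elif key > root.key:
        root.right = bst_insert(root.right, key)
    return root


def bst_search(root, key):
    """Return (found, nodes_visited) for a search that walks down one path."""
    visited = 0
    node = root
    while node is not None:
        visited += 1
        if key == node.key:
            return True, visited
        node = node.left if key < node.key else node.right
    return False, visited


root = None
for k in [11, 3, 23, 2, 5, 17, 29]:
    root = bst_insert(root, k)
print(bst_search(root, 17))  # (True, 3)
print(bst_search(root, 7))   # (False, 3)
```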



Operation Insert



Worst Case



Performance

Depends on the shape of the tree

Best Case: Perfectly balanced tree, about log N nodes from root to leaf

Worst Case: N nodes in a search path

Average Case: about 1.39 log N comparisons for N keys inserted in random order
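The effect of tree shape can be seen by inserting the same 127 keys in two different orders. The sketch below is only an illustration (all names are hypothetical, and no rebalancing is done): sorted input degenerates into a chain of N nodes, while inserting medians first produces a perfectly balanced tree.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Iterative BST insert without rebalancing; returns the root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        elif key > node.key:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right
        else:
            return root                      # ignore duplicates


def height(root):
    """Number of nodes on the longest root-to-leaf path."""
    levels, level = 0, [root] if root else []
    while level:
        levels += 1
        level = [c for n in level for c in (n.left, n.right) if c is not None]
    return levels


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


def balanced_order(lo, hi):
    """Keys lo..hi ordered so that each median is inserted before its halves."""
    if lo > hi:
        return []
    mid = (lo + hi) // 2
    return [mid] + balanced_order(lo, mid - 1) + balanced_order(mid + 1, hi)


print(height(build(range(1, 128))))           # 127: worst case, a chain
print(height(build(balanced_order(1, 127))))  # 7: perfectly balanced
```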



Balanced Tree

Tree structures support various basic dynamic set operations in time proportional to the height of the tree

e.g.: Search, Predecessor, Successor, Minimum, Maximum, Insert, Delete

Ideally, a tree will be balanced and the height will be log n, where n is the number of nodes in the tree

Balancing ensures that the height of the tree is as small as possible and therefore provides the best running time



Balanced BST

BST worst case: O(N)

Needs to be balanced

Approach: rebalance the tree (recursive and linear time)

However, rebalancing frequently makes the cost of N insertions quadratic

Is there a type of BST which guarantees that every insert and search will be logarithmic?



Top Down 2-3-4 Trees

Nodes store 1, 2, or 3 keys and have 2, 3, or 4 children, respectively

All leaves have the same depth



2-3-4 Tree Nodes

Introduction of nodes with more than 1 key, and more than 2 children

2 Node: 1 key, 2 links (same as a binary node)

3 Node: 2 keys, 3 links

4 Node: 3 keys, 4 links



Why 2-3-4? (1/2)

Why not minimize height by maximizing children in a d-tree?

Let each node have d children so that we get O(log_d N) search time! Right?

That means if d = N^(1/2), we get a height of 2



Why 2-3-4? (2/2)

However, searching out the correct child on each level requires O(log N^(1/2)) by binary search

2 log N^(1/2) = O(log N), which is not as good as we had hoped for!

2-3-4 trees will guarantee O(log N) height using only 2, 3, or 4 children per node
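Written out, with logarithms to base 2 (an assumed base; the argument holds for any base up to a constant factor), the height of the d-tree and the total search cost are:

```latex
\[
h = \log_d N = \frac{\log_2 N}{\log_2 d},
\qquad
d = N^{1/2} \;\Rightarrow\; h = \frac{\log_2 N}{\tfrac{1}{2}\log_2 N} = 2
\]
\[
\underbrace{2}_{\text{levels}} \cdot \underbrace{\log_2 N^{1/2}}_{\text{binary search in one node}}
= 2 \cdot \tfrac{1}{2}\log_2 N = \log_2 N
\]
```

So the very wide node only moves the logarithmic work from the tree levels into the search inside each node.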



Insertion into 2-3-4 Trees (1/3)

Insert the new key at the lowest internal node reached in the search

2-node becomes 3-node

3-node becomes 4-node

What about a 4-node?

We can't insert another key!




Insertion into 2-3-4 Trees (2/3)

On our way down the tree, whenever we reach a 4-node, we break it up into two 2-nodes, and move the middle element up into the parent node




Insertion into 2-3-4 Trees (3/3)

Now we can perform the insertion using one of the previous two cases

Since we follow this method from the root down to the leaf, it is called top-down insertion



Splitting the Tree

As we travel down the tree, if we encounter any 4-node we will break it up into 2-nodes.

This guarantees that we will never have the problem of inserting the middle element of a former 4-node into its parent 4-node.
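Top-down insertion with splitting can be sketched in Python as below. This is a minimal illustration under stated assumptions: keys are distinct, a node with no children plays the role of the lowest internal node, and the names `Node234`, `split_child`, `insert`, `inorder`, and `leaf_depths` are hypothetical. Because every 4-node met on the way down is split first, the parent always has room for the middle key that moves up.

```python
import bisect


class Node234:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []           # 1 to 3 keys in ascending order
        self.children = children or []   # empty at the bottom, else len(keys) + 1

    def is_leaf(self):
        return not self.children

    def is_full(self):
        return len(self.keys) == 3       # a 4-node


def split_child(parent, idx):
    """Split the 4-node parent.children[idx] and move its middle key into parent."""
    node = parent.children[idx]
    left = Node234(node.keys[:1], node.children[:2])
    right = Node234(node.keys[2:], node.children[2:])
    parent.keys.insert(idx, node.keys[1])
    parent.children[idx:idx + 1] = [left, right]


def insert(root, key):
    """Insert key top-down, splitting every 4-node met on the way; returns the root."""
    if root is None:
        return Node234([key])
    if root.is_full():                   # a full root gets a new, empty parent
        root = Node234([], [root])
        split_child(root, 0)
    node = root
    while not node.is_leaf():
        idx = sum(1 for k in node.keys if key > k)   # which child to follow
        if node.children[idx].is_full():
            split_child(node, idx)
            if key > node.keys[idx]:     # the middle key moved up to position idx
                idx += 1
        node = node.children[idx]
    bisect.insort(node.keys, key)        # 2-node -> 3-node, or 3-node -> 4-node
    return root


def inorder(node):
    """All keys in ascending order."""
    if node is None:
        return []
    if node.is_leaf():
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out += inorder(child)
        if i < len(node.keys):
            out.append(node.keys[i])
    return out


def leaf_depths(node, depth=0):
    """Set of depths at which bottom-level nodes occur (one value if balanced)."""
    if node.is_leaf():
        return {depth}
    return set().union(*(leaf_depths(c, depth + 1) for c in node.children))


root = None
for k in range(1, 21):
    root = insert(root, k)
print(inorder(root) == list(range(1, 21)), leaf_depths(root))
```

Even with sorted input, the worst case for a plain BST, all bottom-level nodes end up at the same depth.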



Splitting the Tree



Splitting the Tree


Time Complexity of Insertion in 2-3-4 Trees

Time complexity:

A search visits O(log N) nodes

An insertion requires O(log N) node splits

Each node split takes constant time

Operations Search and Insert each take time O(log N)



Beyond 2-3-4 Trees

What do we know about 2-3-4 Trees?

Balanced

O(log N) search time

Different node structures

Can we get 2-3-4 tree advantages in a binary tree format???

Welcome to the world of Red-Black Trees!!!



Best of both methods

Search in BST

Insert in 2-3-4 search tree


Red-Black Tree

A red-black tree is a binary search tree with the following properties:

edges are colored red or black

no two consecutive red edges on any root-leaf path

same number of black edges on any root-leaf path (= black height of the tree)

edges connecting leaves are black
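These edge-coloring properties can be checked mechanically. In the sketch below, each node stores the color of its incoming edge in a `red` flag, external leaves are represented by `None` (their incoming edges count as black), and the root's flag is ignored. The representation and the names `RBNode`, `black_height`, and `is_red_black` are assumptions for illustration; the search-tree ordering of keys is not checked.

```python
class RBNode:
    def __init__(self, key, red=False, left=None, right=None):
        self.key = key
        self.red = red          # color of the edge coming from the parent
        self.left = left
        self.right = right


def black_height(node):
    """Black edges on every path from node down to a leaf; ValueError on a violation."""
    if node is None:
        return 0
    heights = []
    for child in (node.left, node.right):
        child_red = child is not None and child.red   # leaf edges are black
        if node.red and child_red:
            raise ValueError(f"two consecutive red edges below key {node.key}")
        heights.append(black_height(child) + (0 if child_red else 1))
    if heights[0] != heights[1]:
        raise ValueError(f"unequal black edge counts below key {node.key}")
    return heights[0]


def is_red_black(root):
    try:
        black_height(root)
        return True
    except ValueError:
        return False


ok = RBNode(11, left=RBNode(5, red=True), right=RBNode(17, red=True))
bad = RBNode(11, left=RBNode(5, red=True, left=RBNode(2, red=True)),
             right=RBNode(17, red=True))
print(is_red_black(ok), is_red_black(bad))  # True False
```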


Red-Black Tree


2-3-4 Tree Evolution

How 2-3-4 trees relate to red-black trees



Insertion into Red-Black Tree

1. Perform a standard search to find the leaf where the key should be added

2. Replace the leaf with an internal node with the new key

3. Color the incoming edge of the new node red

4. Add two new leaves, and color their incoming edges black


Insertion into Red-Black Tree

If the parent had an incoming red edge, we now have two consecutive red edges!

We must re-organize the tree to remove that violation.

What must be done depends on the sibling of the parent.


Insertion - Plain and Simple


Right Left Rotation


Restructuring

Case 2: Incoming edge of p is red, and its sibling is black



Similar to a right rotation, we can do a left rotation...
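Both single rotations can be sketched on plain binary nodes as follows. Colors are left out to keep the pointer changes visible, and the names `rotate_right`, `rotate_left`, and `inorder` are illustrative assumptions. Each rotation changes a constant number of links and leaves the inorder sequence of keys unchanged.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def rotate_right(g):
    """Lift g's left child p above g; returns the new subtree root p."""
    p = g.left
    g.left = p.right     # p's right subtree becomes g's left subtree
    p.right = g
    return p


def rotate_left(g):
    """Mirror image: lift g's right child p above g."""
    p = g.right
    g.right = p.left
    p.left = g
    return p


def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)


chain = Node(3, left=Node(2, left=Node(1)))   # g=3, p=2, n=1 in a left-leaning chain
top = rotate_right(chain)
print(top.key, inorder(top))                  # 2 [1, 2, 3]
```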


Double Rotation

What if the new node is between its parent and grandparent in the inorder sequence?

We must perform a double rotation (which is no more difficult than a single one)

This would be called a left-right double rotation


Last of the Rotations

And this would be called a right-left double rotation
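A double rotation is two single rotations applied in sequence. The sketch below assumes one common naming convention, which may differ from the figures on the slides: "left-right" means the parent is a left child and the new node is its right child, and "right-left" is the mirror image. All names are illustrative, and colors are omitted.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def rotate_right(g):
    p = g.left
    g.left, p.right = p.right, g
    return p


def rotate_left(g):
    p = g.right
    g.right, p.left = p.left, g
    return p


def rotate_left_right(g):
    """n sits between p (left child of g) and g: rotate p left, then g right."""
    g.left = rotate_left(g.left)
    return rotate_right(g)


def rotate_right_left(g):
    """Mirror case: n sits between g and p (right child of g)."""
    g.right = rotate_right(g.right)
    return rotate_left(g)


zigzag = Node(3, left=Node(1, right=Node(2)))   # g=3, p=1, n=2
top = rotate_left_right(zigzag)
print(top.key, top.left.key, top.right.key)     # 2 1 3
```

In both cases the new node ends up on top, with its former parent and grandparent as its two children.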


Bottom-Up Rebalancing

Case 3: Incoming edge of p is red and its sibling is also red

We call this a promotion

Note how the black depth remains unchanged for all of the descendants of g

This process will continue upward beyond g if necessary: rename g as n and repeat.
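A promotion only recolors edges. The sketch below assumes each node stores the color of its incoming edge in a `red` flag and that g is not the root (a root has no incoming edge to recolor); `promote` and `black_edges_leftmost` are hypothetical helper names. The count along one path is printed before and after to show that the black depth of g's descendants does not change.

```python
class Node:
    def __init__(self, key, red=False, left=None, right=None):
        self.key = key
        self.red = red          # color of the edge coming from the parent
        self.left = left
        self.right = right


def promote(g):
    """Case 3: both edges below g are red. Make them black and g's incoming edge red."""
    g.left.red = False
    g.right.red = False
    g.red = True                # may create a new red-red pair above g
    return g


def black_edges_leftmost(node):
    """Black edges from above node down the leftmost path, leaf edge included."""
    count = 0
    while node is not None:
        count += 0 if node.red else 1
        node = node.left
    return count + 1            # the edge into the external leaf is black


g = Node(10, left=Node(5, red=True, left=Node(2, red=True)), right=Node(15, red=True))
before = black_edges_leftmost(g)
promote(g)
print(before, black_edges_leftmost(g))   # 2 2
```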


Summary of Insertion

If two red edges are present, we do either

a restructuring (with a simple or double rotation) and stop, or

a promotion and continue

A restructuring takes constant time and is performed at most once. It reorganizes an off-balanced section of the tree.

Promotions may continue up the tree and are executed O(log N) times.

The time complexity of an insertion is O(log N).
