Lecture 18 - University of California, San Diegocseweb.ucsd.edu/~kube/cls/100/Lectures/lec18.leftovers/lec18.pdfPage 1 of 35 CSE 100, UCSD: LEC 18 Lecture 18 Self-organizing data structures

of 35

CSE 100, UCSD: LEC 18

Lecture 18

✔ Self-organizing data structures

✔ Self-organizing lists

✔ Splay trees

✔ Spatial data structures

✔ K-D trees

✔ The C++ Standard Template Library

✔ Java JDK 1.5 generics

Reading: Weiss, Ch. 4 and Ch. 12

of 35

Final exam

ns of the textbook, and


✔ Final exam time: Tue Mar 17 11:30am-2:00pm

✔ Location: CSB 002

✔ Closed book, closed notes, no calculators...

✔ Bring something to write with, and picture ID

✔ Practice final is on line

✔ Final exam discussion topic on Webboard

✔ Exam review sessions will be:

✗ last lecture

✔ Coverage: All lectures, all assignments, corresponding sectiohandouts

of 35

Self-organizing data structures

more than just find the joint subset structure so

e worst case, any all (practically constant)

“in the long run”

ll manner of data t:


✔ The path-compression find algorithm for disjoint subsets doeslabel of the subset containing an item: it also modifies the dissubsequent find operations will be faster

✔ Amortized analysis of path-compression find shows that in thsufficiently long sequence of find operations will have very smtime cost per operation

✗ ... so the additional work to compress the paths is worth it

✔ This idea of self-organizing data structures can be applied to astructures: lists, trees, hashtables, etc. We will look briefly a

✗ self-organizing lists

✗ splay trees

of 35

Self-organizing lists

ge case time cost for

ore likely to be searched


✔ Worst-case time cost for searching a linked list is O(N)

✔ Assuming all keys are equally likely to be searched for, averasuccessful search in a linked list is also O(N)

✔ But what if all keys are not equally likely?

✔ Then we might be able to do better by putting keys that are mfor at the beginning of the list...

of 35

Nonuniform search probabilities: exponential distribution

/2i, except for kN which

ent, k2 is in the second cessful search is

ptimal ordering of the list case (still O(N) worst-


✔ Suppose there are N keys in the list: k1, ..., kN

✔ Suppose the probability that the user will search for key ki is 1has probability 1/2N-1

✔ Now if the list is arranged so that key k1 is in the first list elemlist element, etc., the average number of comparisons in a suc

✔ So, with under this rather extreme probabilistic assumption, opermits successful search to take constant time in the averagecase, though)

i2i----

i 1=

N

∑N2N------+ O 1( )=

of 35

Nonuniform search probabilities: Zipf distribution

roportional to

ribution; it fits the

ement, k2 is in the second cessful search is

e list permits successful


✔ Suppose there are N keys in the list: k1, ..., kN

✔ Suppose the probability that the user will search for key ki is p1/( i H(i) ) , where H() is the harmonic series

✔ This probability distribution over keys is known as a Zipf distobserved frequencies of words in natural languages

✔ If as before the list is ordered so that key k1 is in the first list ellist element, etc., the average number of comparisons in a suc

✔ So, under this probabilistic assumption, optimal ordering of thsearch to be more efficient than O(N) in the average case

H i( ) 1j---

j 1=

i

∑=

iiH i( )-------------

i 1=

N

∑ ON

Nlog------------ =

of 35

Nonuniform search probabilities: the 90/10 rule

se, 90% of the references

ases... Sometimes the

e frequently referenced times faster than in a

r


✔ The 90/10 rule: Empirical studies show that in a typical databaare to only 10% of the items

✔ (This is a rough qualitiative result that may not be true in all csame idea is called the 80/20 rule...)

✔ If the 90/10 rule holds, and the list is arranged so that the morkeys are at the front of the list, average access time will be 10randomly arranged list

✔ This is still O(N) average case, but with a better constant facto

of 35

Self-organizing list strategies

ist ordered by those

s, by moving a found key stay the same)

o-front


✔ If you know the probabilities for each key, you could build a lprobabilities (this is called the “optimal static ordering”)

✔ But usually you don’t know the probabilities beforehand

✔ A self-organizing list will adapt to the actual pattern of searchenearer to the front of the list (insert and delete operations can

✔ There are two simple strategies for this: transpose and move-t

of 35

Transpose

at key with the key in the

linked list or as an array

e front of the list...

hen only the last two

other, and will stay at the


✔ When there is a successful search for a key in the list, swap thprevious element in the list

✔ Easy to do, whether the list is implemented as a pointer-based

✔ Eventually, frequently accessed keys will tend to move near th

✔ ... but they are not guaranteed to do so!

✔ Consider this “worst case” situation: The list is created, and telements are accessed alternately

✗ these two elements will repeatedly be exchanged with eachend of the list

of 35

Move-to-front

e element with that key to

list implemented with an

near the front of the list

ns for a successful search that of the optimal static


✔ When there is a successful search for a key in the list, move ththe front of the list

✔ Easy to do in a pointer-based linked list, more expensive in a array

✔ Obviously, over time, frequently accessed keys will tend to be

✔ In fact, it can be shown that the average number of comparisousing the simple move-to-front strategy is no more than twiceordering!

of 35

Self-organizing search trees

s as well

ove the node containing fast

g the node up to the root

78] and by Bitner [1979]

t has worst-case me, and remain, very


✔ The ideas of self-organizing lists can be applied to search tree

✔ Basic idea: when a search operation finds a key in the tree, mthat key to the root of the tree, so subsequent accesses will be

✔ This can be done using repeated AVL single rotations, rotatin

✗ This idea was originally proposed by Allen and Munro [19

✔ However, simply rotating the node with a found key to the rooamortized cost O(N) per operation (because the tree can becounbalanced)

✔ Something better is needed...

of 35

Splay trees

ns after a successful find s improve subsequent

worse on average

lving a node, its parent,

f M > N find operations ng run”

ion is still O(N) )


✔ Splay trees were proposed by Sleator and Tarjan [1985]

✔ Splay trees are binary search trees that use “splaying” operatioto move the found key to the root of the tree. These operationsearches for that key, while not making access for other keys

✔ Splaying operations are composed out of AVL rotations invoand grandparent, depending on various special cases.

✔ The result: worst-case amortized time cost for any sequence ois O(M log N) , so splay trees act like balanced trees “in the lo

✔ (However, worst-case time cost for any individual find operat

of 35

Search trees and multidimensional keys

rch trees, AVL trees, red-possible keys be

al set (or anyway be


✔ The search trees we have discussed so far (ordinary binary seablack trees, B trees, treaps, splay trees) require that the set of completely ordered

✗ for each pair of keys K1, K2 exactly one of these is true:

• K1 < K2

• K1 = K2

• K1 > K2

✔ That means that the possible keys must lie in a one-dimensionmapped to a one-dimensional set by the ordering relation)

✔ What to do if the keys are multidimensional?

of 35

Multidimensional keys

f strings

be an ordered triple of the center of the object


✔ Example 1: In a database of persons, the key may be a pair o(lastname, firstname)

✔ Example 2: In a database of 3D graphics objects, the key maynumbers (x,y,z) which are the Cartesian x,y,z coordinates of

✔ How can these keys be stored in a search tree?

of 35

Making multidimensional keys one dimensional

h dimension is a string, it ic (dictionary) ordering:

name2) if and only if

rstname2

ch dimension is a spatial

odulus

+ z2*z2


✔ Example 1: For multidimensional keys where the value on eacoften makes sense to reduce to 1 dimension using lexicograph

✗ (lastname1, firstname1) > (lastname2,first lastname1 > lastname2, or lastname1 == lastname2 and firstname1 > fi

✔ Example 2: For multidimensional keys where the value on eacoordinate, several approaches might be considered possible:

✗ lexicographic ordering on the coordinates

• (x1,y1,z1) > (x2,y2,z2) if and only if x1 > x2, or x1 == x2 and y1 > y2, or x1 == x2 and y1 == y2 and z1 > z2

✗ interpret coordinates as a vector, and consider its squared m

• (x1,y1,z1) > (x2,y2,z2) if and only if x1*x1 + y1*y1 + z1*z1 > x2*x2 + y2*y2

✗ etc.

of 35

Keeping multidimensional keys multidimensional

ordering information can ple:

earest the given name, or

trically nearest the given

ing information you need

ons: nearness of mparing vector moduli

ltidimensional keys


✔ As you know, search trees require ordered keys

✔ One advantage of search trees (over hashtables) is that the keybe used to make some interesting queries efficient. For exam

✗ find the 10 names in the database that are alphabetically n

✗ find the 7 objects in the database whose centers are geomeobject, etc.

✔ But if you map multidimensions to 1D, you may lose the orderfor such queries

✔ This is particularly a problem in spatial or geometric applicaticoordinates is not preserved with lexicographic ordering or co

✔ The solution is to use a search tree that deals directly with mu

of 35

K-D trees

rocessing of nal”)

akes branching decisions

L make branching

ranching decisions on the the y coordinate; andchildren of the root

dimension of the key, and

l dimensions of the key)


✔ k-d trees are a variation on the BST that allows for efficient pmultidimensional keys (k-d is an abbreviation of “k-dimensio

✔ k-d trees differ from BST’s in that each level of the k-d tree mbased on one of the k dimensions of the key

✔ If a key has k dimensions 0 ,..., k-1, then tree nodes at level decisions based on key dimension L mod k

✔ For example, in a 3-d tree with keys (x,y,z), the root makes bvalue of the x coordinate; children of the root discriminate ongrandchildren of the root discriminate on z; children of the grdiscriminate on x, etc.

✔ This branching decisions are just like those in a BST

✗ the only difference is that they “pay attention” to only one which dimension depends on the level of the node

✗ (of course equality comparisons need to pay attention to al

of 35

K-D tree ordering property

hose dimension L mod k

ension L mod k has value

on L mod k has value


✔ k-d tree ordering property:

✗ Consider any node X at level L; suppose it contains a key whas value V.

✗ Then all keys in the left subtree of X have keys whose dimless than V, and

✗ all keys in the right subtree of X have keys whose dimensigreater than or equal to V

of 35

A 2-d tree

(70,10)

(69,50)

(80,90)(55,80)


✔ Points on a plane stored in a 2-dimensional k-d tree

x

y

0

0

100

100

(40,45)

(15,70)

of 35

K-D tree operations

BST operations

de!

e to actually remove from

’s


✔ Find and insert operations in k-d trees are similar to standard

✔ Descend the tree, making decisions at each node

✔ But in a k-d tree, these decisions depend on the level of the no

✔ The result of an insert operation is a new leaf node

✔ Non-lazy delete operations are a little harder: finding the nodthe tree requires some thought

✔ k-d trees have the same asymptotic time costs as ordinary BST

✗ (balanced k-d trees are very tricky to implement)

✔ k-d trees can also be used to perform “region” queries

of 35

K-D tree region queries

onal “hypercube” region

ursively process its left

check to see if it is to

ult of the comparison) of


✔ Suppose you want to find all the points that lie in a k-dimensiof a certain size, centered on a certain point

✔ This is pretty easy to do in a k-d tree:

✔ Start at the root. If it is in the region, add it to the list, and recand right children.

✔ If any node encountered in the search is not in the region, thenbecause of the coordinate that this node is “paying attention”

✔ If so, then the entire left or right subtree (depending on the resthis node can be eliminated from the search

✔ Example: Find all keys that lie within a certain hypercube

of 35

K-D tree region queries: an example

e point (25,50)

(70,10)

(69,50)

(80,90)(55,80)


✔ Find all keys within a square 50 units on a side centered at th

x

y

0

0

100

100

(40,45)

(15,70)

of 35

Learning and using data structures

data structures are ng else you have to write

++, Java 2, very nice

they are well-written, ent for the language is

(we have already )


✔ Learning data structures has two parts:

✗ learning how to design and implement data structures

✗ learning how to use data structures

✔ In language environments like ANSI C, Pascal, etc., very fewavailable by default (maybe just arrays and records); everythiyourself

✔ However in other language environments such as Smalltalk, Cstandard data structure libraries are available

✔ For many applications, it makes sense to use these if you can:well-tested, and are available wherever the standard environmavailable

✔ We will look briefly at the Standard Template Library for C++covered the similar Collections API with generics for JDK 1.5

of 35

The C++ Standard Template Library (STL)

s a set of generic sses

espond to fundamental

rs

fundamental operations erations in terms of these

mutating sequence numeric operations

nd adaptors

TL classes)

as part of the standard environment


✔ The Standard Template Library is a C++ library that providecontainer classes, generic algorithms, and other supporting cla

✔ The container classes are abstract data types (ADTs) that corrdata structures

✗ 2 kinds of containers: sequences and associative containe

✔ The container classes have member functions that implement on container instances; generic algorithms implement other op

✗ 4 kinds of algorithms: nonmutating sequence operations, operations, sorting and related operations, and generalized

✔ Important supporting classes are: STL iterators, allocators, a

✗ (you usually do not need to write your own allocators for S

✔ The ANSI/ISO C++ Standards Committee voted to adopt STLC++ library, so it ‘should’ be available in every standard C++

of 35

STL iterators

inters: instances of e.g., the first element, last a container (e.g., next

ries of iterators it supports

a sequence, expressed

ons, expressed with ++

l, plus “long jumps” access iterator and n is an ith r+n, r-n, r[n];

r random access iterator), s r < s, r > s, r = s, producing bool values.

✗ Input and output iterators: somewhat weaker than forwardused in accessing input and output streams

of 35

STL containers: sequences

objects, all of the same

rators.

d erase operations at the

s and allows worst-case sequence (once an d)

ents is not supported

m access iterators

erase operations at the t and erase in the middle


✔ A sequence is a kind of container that organizes a finite set oftype, into a strictly linear arrangement

✔ The STL provides 3 basic sequence classes:

✔ vector is a kind of sequence that supports random-access ite

✗ In addition, it supports (amortized) constant time insert anend; insert and erase in the middle take linear time.

✔ list is a kind of sequence that supports bidirectional iteratorconstant time insert and erase operations anywhere within theiterator referring to the insert or erase position has been create

✗ Unlike vectors and deques, fast random access to list elem

✔ deque is a kind of sequence that, like a vector, supports rando

✗ In addition, it supports worst-case constant time insert andbeginning or the end (deque = double-ended queue) ; insertake linear time

of 35

STL containers: associative containers

ata based on keys

t, multiset, map,

ed) on the key type, and

associated data

timap allow multiple

e insert, erase, and find

ers in ascending order of


✔ Associative containers provide an ability for fast retrieval of d

✔ The STL provides 4 basic kinds of associative containers: semultimap

✔ All of the associative containers are parameterized (templatizon an ordering relation that imposes a total order on keys

✔ map and multimap are also parameterized on the type of the

✔ set and map support only unique keys; multiset and muloccurrences of the same key

✔ All associative containers support bidirectional iterators

✔ All associative containers support worst-case logarithmic-timoperations

✔ Iterators for associative containers iterate through the containkey values

of 35

Implementing STL containers

perations, and worst-case

STL map class?

mented using doubly-


✔ The STL containers have an interface that specifies required oor amortized time cost bounds on the operations

✔ What would you use to implement the STL list class? The

✔ In SGI’s reference implementation of the STL, list is implelinked lists, and map is implemented using red-black trees

of 35

STL algorithms

m any container class; primitive types

operations, mutating alized numeric operations

d, count, mismatch,

, fill, generate, remove,

h, merge, set includes, set e, minimum, maximum,

artial sum, adjacent


✔ In the STL, some algorithms are provided that are separate frothis permits using the algorithms on user-defined classes and

✔ There are 4 kinds of these algorithms: nonmutating sequencesequence operations, sorting and related operations, and gener

✔ Nonmutating sequence algorithms: foreach, find, adjacent finequal, search

✔ Mutating sequence algorithms: copy, swap, transform, replaceunique, reverse, rotate, random shuffle, partition

✔ Sorting and related algorithms: sort, nth-element, binary searcunion, set intersection, set difference, set symmetric differenclexicographic comparison, permutation generation

✔ Generalized numeric algorithms: accumulate, inner product, pdifference

of 35

STL adaptors

s into one that has a

ck, pop_back can be used used

sh_back, pop_front can be used

r and supporting ntiate priority_queue. In


✔ Adaptors are templated classes that “convert” a container clasdifferent, more restrictive interface

✔ stack: Any sequence that supports operations back, push_bato instantiate stack. In particular, vector, list and deque can be

✔ queue: Any sequence that supports operations front, back, pube used to instantiate queue. In particular, list and deque can

✔ priority queue: Any sequence with random access iteratooperations front, push_back and pop_back can be used to instaparticular, vector and deque can be used

of 35

Generic data structures and type-checking

d design it to contain

downcast it to the rs or properties

out this at runtime, with a

hrows at runtime


✔ In (pre-1.5) Java, to design a generic data structure, you woulObjects

✔ Then when you get an object out of the container, you wouldappropriate subclass if you need to access its subclass behavio

✔ If some other kind of object was stored, you would find out abClassCastException. Example:

Vector v = new Vector();

v.add(new Integer(3));

v.add(new String("hey");

...

Integer i, String s;

i = (Integer) v.elementAt(0); // okay

s = (String) v.elementAt(0); // compiles, but t

of 35

Sorted generic data structures and type checking

u would design it to

y in the structure, you ception. Example:

rows at runtime

l, but it requires runtime

let the compiler do the


✔ In (pre-1.5) Java, to design a generic sorted data structure, yocontain Comparables, or Objects that use a Comparator

✔ If you store an Object that is not comparable to objects alreadwould find out about this only at runtime, with a ClassCastEx

TreeSet t = new TreeSet();

...

t.add(new String("hey");

...

t.add(new Integer(0)); // this compiles, but th

...

✔ Type checking at runtime is better than no type checking at altesting

✔ It is better to do type checking automatically at compile time:work!

✔ The new generics in Java 1.5 permit this...

of 35

JDK 1.5 generics

now support full compile-ine your own, as well)

bj>;

t needed!


✔ All the Java Collections API interfaces and container classes time type checking via parameterized class types (you can def

✔ (The syntax is very similar to C++ templates)

✔ Examples:Vector v = new Vector();

v.add(new Integer(3));

v.add(new String("hey"); // compile time error

Integer i;

i = v.elementAt(0); // note no downcast needed!

TreeSet t = new TreeSet();

t.add(new String("hey");

t.add(new Integer(0)); // compile time error

HashMap h = new HashMap

of 35

Automatic boxing/unboxing and “enhanced for”

matic boxing and type. Example:

>();

>();

lable:

ic unboxing can be used:


✔ If a container class is parameterized on a “wrapper” type, autounboxing is done with respect to the corresponding primitive

LinkedList ls1 = new LinkedList

of 35

Next time


✔ Final review

Lecture 18Self-organizing data structuresSelf-organizing listsSplay treesSpatial data structuresK-D treesThe C++ Standard Template LibraryJava JDK 1.5 generics Reading: Weiss, Ch. 4 and Ch. 12

Final examFinal exam time: Tue Mar 17 11:30am-2:00pmLocation: CSB 002Closed book, closed notes, no calculators...Bring something to write with, and picture IDPractice final is on lineFinal exam discussion topic on WebboardExam review sessions will be:last lecture

Coverage: All lectures, all assignments, corresponding sections of the textbook, and handouts

Self-organizing data structuresThe path-compression find algorithm for disjoint subsets does more than just find the label of th...Amortized analysis of path-compression find shows that in the worst case, any sufficiently long s...... so the additional work to compress the paths is worth it “in the long run”

This idea of self-organizing data structures can be applied to all manner of data structures: lis...self-organizing listssplay trees

Self-organizing listsWorst-case time cost for searching a linked list is O(N)Assuming all keys are equally likely to be searched for, average case time cost for successful se...But what if all keys are not equally likely?Then we might be able to do better by putting keys that are more likely to be searched for at the...

Nonuniform search probabilities: exponential distributionSuppose there are N keys in the list: k1, ..., kNSuppose the probability that the user will search for key ki is 1/2i, except for kN which has pro...Now if the list is arranged so that key k1 is in the first list element, k2 is in the second list...So, with under this rather extreme probabilistic assumption, optimal ordering of the list permits...

Nonuniform search probabilities: Zipf distributionSuppose there are N keys in the list: k1, ..., kNSuppose the probability that the user will search for key ki is proportional to 1/( i H(i) ) , wh...This probability distribution over keys is known as a Zipf distribution; it fits the observed fre...If as before the list is ordered so that key k1 is in the first list element, k2 is in the second...So, under this probabilistic assumption, optimal ordering of the list permits successful search t...

Nonuniform search probabilities: the 90/10 ruleThe 90/10 rule: Empirical studies show that in a typical database, 90% of the references are to o...(This is a rough qualitiative result that may not be true in all cases... Sometimes the same idea...If the 90/10 rule holds, and the list is arranged so that the more frequently referenced keys are...This is still O(N) average case, but with a better constant factor

Self-organizing list strategiesIf you know the probabilities for each key, you could build a list ordered by those probabilities...But usually you don’t know the probabilities beforehandA self-organizing list will adapt to the actual pattern of searches, by moving a found key nearer...There are two simple strategies for this: transpose and move-to-front

TransposeWhen there is a successful search for a key in the list, swap that key with the key in the previo...Easy to do, whether the list is implemented as a pointer-based linked list or as an arrayEventually, frequently accessed keys will tend to move near the front of the list...... but they are not guaranteed to do so!Consider this “worst case” situation: The list is created, and then only the last two elements ar...these two elements will repeatedly be exchanged with each other, and will stay at the end of the ...

Move-to-frontWhen there is a successful search for a key in the list, move the element with that key to the fr...Easy to do in a pointer-based linked list, more expensive in a list implemented with an arrayObviously, over time, frequently accessed keys will tend to be near the front of the listIn fact, it can be shown that the average number of comparisons for a successful search using the...

Self-organizing search treesThe ideas of self-organizing lists can be applied to search trees as wellBasic idea: when a search operation finds a key in the tree, move the node containing that key to...This can be done using repeated AVL single rotations, rotating the node up to the rootThis idea was originally proposed by Allen and Munro [1978] and by Bitner [1979]

However, simply rotating the node with a found key to the root has worst-case amortized cost O(N)...Something better is needed...

Splay treesSplay trees were proposed by Sleator and Tarjan [1985]Splay trees are binary search trees that use “splaying” operations after a successful find to mov...Splaying operations are composed out of AVL rotations involving a node, its parent, and grandpare...The result: worst-case amortized time cost for any sequence of M > N find operations is O(M log N...(However, worst-case time cost for any individual find operation is still O(N) )

Search trees and multidimensional keysThe search trees we have discussed so far (ordinary binary search trees, AVL trees, red- black tr...for each pair of keys K1, K2 exactly one of these is true:• K1 < K2• K1 = K2• K1 > K2

That means that the possible keys must lie in a one-dimensional set (or anyway be mapped to a one...What to do if the keys are multidimensional?

Multidimensional keysExample 1: In a database of persons, the key may be a pair of strings (lastname, firstname)Example 2: In a database of 3D graphics objects, the key may be an ordered triple of numbers (x,y...How can these keys be stored in a search tree?

Making multidimensional keys one dimensionalExample 1: For multidimensional keys where the value on each dimension is a string, it often make...(lastname1, firstname1) > (lastname2,firstname2) if and only if lastname1 > lastname2, or lastnam...

Example 2: For multidimensional keys where the value on each dimension is a spatial coordinate, s...lexicographic ordering on the coordinates• (x1,y1,z1) > (x2,y2,z2) if and only if x1 > x2, or x1 == x2 and y1 > y2, or x1 == x2 and y1 == ...

interpret coordinates as a vector, and consider its squared modulus• (x1,y1,z1) > (x2,y2,z2) if and only if x1*x1 + y1*y1 + z1*z1 > x2*x2 + y2*y2 + z2*z2

etc.

Keeping multidimensional keys multidimensionalAs you know, search trees require ordered keysOne advantage of search trees (over hashtables) is that the key ordering information can be used ...find the 10 names in the database that are alphabetically nearest the given name, orfind the 7 objects in the database whose centers are geometrically nearest the given object, etc.

But if you map multidimensions to 1D, you may lose the ordering information you need for such que...This is particularly a problem in spatial or geometric applications: nearness of coordinates is n...The solution is to use a search tree that deals directly with multidimensional keys

K-D treesk-d trees are a variation on the BST that allows for efficient processing of multidimensional key...k-d trees differ from BST’s in that each level of the k-d tree makes branching decisions based on...If a key has k dimensions 0 ,..., k-1, then tree nodes at level L make branching decisions based ...For example, in a 3-d tree with keys (x,y,z), the root makes branching decisions on the value of ...This branching decisions are just like those in a BSTthe only difference is that they “pay attention” to only one dimension of the key, and which dime...(of course equality comparisons need to pay attention to all dimensions of the key)

K-D tree ordering propertyk-d tree ordering property:Consider any node X at level L; suppose it contains a key whose dimension L mod k has value V.Then all keys in the left subtree of X have keys whose dimension L mod k has value less than V, andall keys in the right subtree of X have keys whose dimension L mod k has value greater than or eq...

A 2-d treePoints on a plane stored in a 2-dimensional k-d tree

K-D tree operationsFind and insert operations in k-d trees are similar to standard BST operationsDescend the tree, making decisions at each nodeBut in a k-d tree, these decisions depend on the level of the node!The result of an insert operation is a new leaf nodeNon-lazy delete operations are a little harder: finding the node to actually remove from the tree...k-d trees have the same asymptotic time costs as ordinary BST’s(balanced k-d trees are very tricky to implement)

k-d trees can also be used to perform “region” queries

K-D tree region queriesSuppose you want to find all the points that lie in a k-dimensional “hypercube” region of a certa...This is pretty easy to do in a k-d tree:Start at the root. If it is in the region, add it to the list, and recursively process its left a...If any node encountered in the search is not in the region, then check to see if it is because of...If so, then the entire left or right subtree (depending on the result of the comparison) of this ...Example: Find all keys that lie within a certain hypercube

K-D tree region queries: an exampleFind all keys within a square 50 units on a side centered at the point (25,50)

Learning and using data structuresLearning data structures has two parts:learning how to design and implement data structureslearning how to use data structures

In language environments like ANSI C, Pascal, etc., very few data structures are available by def...However in other language environments such as Smalltalk, C++, Java 2, very nice standard data st...For many applications, it makes sense to use these if you can: they are well-written, well-tested...We will look briefly at the Standard Template Library for C++ (we have already covered the simila...

The C++ Standard Template Library (STL)The Standard Template Library is a C++ library that provides a set of generic container classes, ...The container classes are abstract data types (ADTs) that correspond to fundamental data structures2 kinds of containers: sequences and associative containers

The container classes have member functions that implement fundamental operations on container in...4 kinds of algorithms: nonmutating sequence operations, mutating sequence operations, sorting and...

Important supporting classes are: STL iterators, allocators, and adaptors(you usually do not need to write your own allocators for STL classes)

The ANSI/ISO C++ Standards Committee voted to adopt STL as part of the standard C++ library, so i...

STL iteratorsSTL iterators are classes that generalize the idea of C/C++ pointers: instances of iterator class...A container class will be characterized in part by what categories of iterators it supportsThe STL provides 5 categories of iterators:Forward iterators: provide for one-directional traversal of a sequence, expressed with ++Bidirectional iterators: provide for traversal in both directions, expressed with ++ and --Random access iterators: provide for bidirectional traversal, plus “long jumps” expressed with r ...Input and output iterators: somewhat weaker than forward iterators, and are mainly used in access...

STL containers: sequencesA sequence is a kind of container that organizes a finite set of objects, all of the same type, i...The STL provides 3 basic sequence classes:vector is a kind of sequence that supports random-access iterators.In addition, it supports (amortized) constant time insert and erase operations at the end; insert...

list is a kind of sequence that supports bidirectional iterators and allows worst-case constant t...Unlike vectors and deques, fast random access to list elements is not supported

deque is a kind of sequence that, like a vector, supports random access iteratorsIn addition, it supports worst-case constant time insert and erase operations at the beginning or...

STL containers: associative containersAssociative containers provide an ability for fast retrieval of data based on keysThe STL provides 4 basic kinds of associative containers: set, multiset, map, multimapAll of the associative containers are parameterized (templatized) on the key type, and on an orde...map and multimap are also parameterized on the type of the associated dataset and map support only unique keys; multiset and multimap allow multiple occurrences of the sam...All associative containers support bidirectional iteratorsAll associative containers support worst-case logarithmic-time insert, erase, and find operationsIterators for associative containers iterate through the containers in ascending order of key values

Implementing STL containersThe STL containers have an interface that specifies required operations, and worst-case or amorti...What would you use to implement the STL list class? The STL map class?In SGI’s reference implementation of the STL, list is implemented using doubly- linked lists, and...

STL algorithmsIn the STL, some algorithms are provided that are separate from any container class; this permits...There are 4 kinds of these algorithms: nonmutating sequence operations, mutating sequence operati...Nonmutating sequence algorithms: foreach, find, adjacent find, count, mismatch, equal, searchMutating sequence algorithms: copy, swap, transform, replace, fill, generate, remove, unique, rev...Sorting and related algorithms: sort, nth-element, binary search, merge, set includes, set union,...Generalized numeric algorithms: accumulate, inner product, partial sum, adjacent difference

STL adaptorsAdaptors are templated classes that “convert” a container class into one that has a different, mo...stack: Any sequence that supports operations back, push_back, pop_back can be used to instantiate...queue: Any sequence that supports operations front, back, push_back, pop_front can be used to ins...priority queue: Any sequence with random access iterator and supporting operations front, push_ba...

Generic data structures and type-checkingIn (pre-1.5) Java, to design a generic data structure, you would design it to contain ObjectsThen when you get an object out of the container, you would downcast it to the appropriate subcla...If some other kind of object was stored, you would find out about this at runtime, with a ClassCa...Vector v = new Vector();v.add(new Integer(3));v.add(new String("hey");...Integer i, String s;i = (Integer) v.elementAt(0); // okays = (String) v.elementAt(0); // compiles, but throws at runtime

Sorted generic data structures and type checkingIn (pre-1.5) Java, to design a generic sorted data structure, you would design it to contain Comp...If you store an Object that is not comparable to objects already in the structure, you would find...TreeSet t = new TreeSet();...t.add(new String("hey");...t.add(new Integer(0)); // this compiles, but throws at runtime...Type checking at runtime is better than no type checking at all, but it requires runtime testingIt is better to do type checking automatically at compile time: let the compiler do the work!The new generics in Java 1.5 permit this...

JDK 1.5 genericsAll the Java Collections API interfaces and container classes now support full compile- time type...(The syntax is very similar to C++ templates)Examples:Vector v = new Vector();v.add(new Integer(3));v.add(new String("hey"); // compile time errorInteger i;i = v.elementAt(0); // note no downcast needed!TreeSet t = new TreeSet();t.add(new String("hey");t.add(new Integer(0)); // compile time errorHashMap h = new HashMap;h.put("abcd",new MyObj());MyObj o = h.get("some key"); // note no downcast needed!

Automatic boxing/unboxing and “enhanced for”If a container class is parameterized on a “wrapper” type, automatic boxing and unboxing is done ...LinkedList ls1 = new LinkedList();LinkedList ls2 = new LinkedList();ls1.add(3); ls1.add(8); ls1.add(12);ls2.add (ls1.removeFirst() + 2);ls2.add ( ls1.get(0) * ls1.get(1) );If a class has an iterator, a cleaner for-loop syntax is now available:

for( Integer i : ls1 ) {// whatever}And in fact if the contained type is a “wrapper” type, automatic unboxing can be used:

for( int i : ls1 ) {// whatever}

Next timeFinal review

Documents

Lecture 18 - University of California, San Diegocseweb.ucsd.edu/~kube/cls/100/Lectures/lec18.leftovers/lec18.pdfPage 1 of 35 CSE 100, UCSD: LEC 18 Lecture 18 Self-organizing data structures