35
Page 1 of 35 CSE 100, UCSD: LEC 18 Lecture 18 Self-organizing data structures Self-organizing lists Splay trees Spatial data structures K-D trees The C++ Standard Template Library Java JDK 1.5 generics Reading: Weiss, Ch. 4 and Ch. 12

Lecture 18 - University of California, San Diegocseweb.ucsd.edu/~kube/cls/100/Lectures/lec18.leftovers/lec18.pdfPage 1 of 35 CSE 100, UCSD: LEC 18 Lecture 18 Self-organizing data structures

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

  • Page 1 of 35

    CSE 100, UCSD: LEC 18

    Lecture 18

    ✔ Self-organizing data structures

    ✔ Self-organizing lists

    ✔ Splay trees

    ✔ Spatial data structures

    ✔ K-D trees

    ✔ The C++ Standard Template Library

    ✔ Java JDK 1.5 generics

    Reading: Weiss, Ch. 4 and Ch. 12

  • Page 2 of 35

    Final exam

    ns of the textbook, and

    CSE 100, UCSD: LEC 18

    ✔ Final exam time: Tue Mar 17 11:30am-2:00pm

    ✔ Location: CSB 002

    ✔ Closed book, closed notes, no calculators...

    ✔ Bring something to write with, and picture ID

    ✔ Practice final is on line

    ✔ Final exam discussion topic on Webboard

    ✔ Exam review sessions will be:

    ✗ last lecture

    ✔ Coverage: All lectures, all assignments, corresponding sectiohandouts

  • Page 3 of 35

    Self-organizing data structures

    more than just find the joint subset structure so

    e worst case, any all (practically constant)

    “in the long run”

    ll manner of data t:

    CSE 100, UCSD: LEC 18

    ✔ The path-compression find algorithm for disjoint subsets doeslabel of the subset containing an item: it also modifies the dissubsequent find operations will be faster

    ✔ Amortized analysis of path-compression find shows that in thsufficiently long sequence of find operations will have very smtime cost per operation

    ✗ ... so the additional work to compress the paths is worth it

    ✔ This idea of self-organizing data structures can be applied to astructures: lists, trees, hashtables, etc. We will look briefly a

    ✗ self-organizing lists

    ✗ splay trees

  • Page 4 of 35

    Self-organizing lists

    ge case time cost for

    ore likely to be searched

    CSE 100, UCSD: LEC 18

    ✔ Worst-case time cost for searching a linked list is O(N)

    ✔ Assuming all keys are equally likely to be searched for, averasuccessful search in a linked list is also O(N)

    ✔ But what if all keys are not equally likely?

    ✔ Then we might be able to do better by putting keys that are mfor at the beginning of the list...

  • Page 5 of 35

    Nonuniform search probabilities: exponential distribution

    /2i, except for kN which

    ent, k2 is in the second cessful search is

    ptimal ordering of the list case (still O(N) worst-

    CSE 100, UCSD: LEC 18

    ✔ Suppose there are N keys in the list: k1, ..., kN

    ✔ Suppose the probability that the user will search for key ki is 1has probability 1/2N-1

    ✔ Now if the list is arranged so that key k1 is in the first list elemlist element, etc., the average number of comparisons in a suc

    ✔ So, with under this rather extreme probabilistic assumption, opermits successful search to take constant time in the averagecase, though)

    i2i----

    i 1=

    N

    ∑N2N------+ O 1( )=

  • Page 6 of 35

    Nonuniform search probabilities: Zipf distribution

    roportional to

    ribution; it fits the

    ement, k2 is in the second cessful search is

    e list permits successful

    CSE 100, UCSD: LEC 18

    ✔ Suppose there are N keys in the list: k1, ..., kN

    ✔ Suppose the probability that the user will search for key ki is p1/( i H(i) ) , where H() is the harmonic series

    ✔ This probability distribution over keys is known as a Zipf distobserved frequencies of words in natural languages

    ✔ If as before the list is ordered so that key k1 is in the first list ellist element, etc., the average number of comparisons in a suc

    ✔ So, under this probabilistic assumption, optimal ordering of thsearch to be more efficient than O(N) in the average case

    H i( ) 1j---

    j 1=

    i

    ∑=

    iiH i( )-------------

    i 1=

    N

    ∑ ON

    Nlog------------ =

  • Page 7 of 35

    Nonuniform search probabilities: the 90/10 rule

    se, 90% of the references

    ases... Sometimes the

    e frequently referenced times faster than in a

    r

    CSE 100, UCSD: LEC 18

    ✔ The 90/10 rule: Empirical studies show that in a typical databaare to only 10% of the items

    ✔ (This is a rough qualitiative result that may not be true in all csame idea is called the 80/20 rule...)

    ✔ If the 90/10 rule holds, and the list is arranged so that the morkeys are at the front of the list, average access time will be 10randomly arranged list

    ✔ This is still O(N) average case, but with a better constant facto

  • Page 8 of 35

    Self-organizing list strategies

    ist ordered by those

    s, by moving a found key stay the same)

    o-front

    CSE 100, UCSD: LEC 18

    ✔ If you know the probabilities for each key, you could build a lprobabilities (this is called the “optimal static ordering”)

    ✔ But usually you don’t know the probabilities beforehand

    ✔ A self-organizing list will adapt to the actual pattern of searchenearer to the front of the list (insert and delete operations can

    ✔ There are two simple strategies for this: transpose and move-t

  • Page 9 of 35

    Transpose

    at key with the key in the

    linked list or as an array

    e front of the list...

    hen only the last two

    other, and will stay at the

    CSE 100, UCSD: LEC 18

    ✔ When there is a successful search for a key in the list, swap thprevious element in the list

    ✔ Easy to do, whether the list is implemented as a pointer-based

    ✔ Eventually, frequently accessed keys will tend to move near th

    ✔ ... but they are not guaranteed to do so!

    ✔ Consider this “worst case” situation: The list is created, and telements are accessed alternately

    ✗ these two elements will repeatedly be exchanged with eachend of the list

  • Page 10 of 35

    Move-to-front

    e element with that key to

    list implemented with an

    near the front of the list

    ns for a successful search that of the optimal static

    CSE 100, UCSD: LEC 18

    ✔ When there is a successful search for a key in the list, move ththe front of the list

    ✔ Easy to do in a pointer-based linked list, more expensive in a array

    ✔ Obviously, over time, frequently accessed keys will tend to be

    ✔ In fact, it can be shown that the average number of comparisousing the simple move-to-front strategy is no more than twiceordering!

  • Page 11 of 35

    Self-organizing search trees

    s as well

    ove the node containing fast

    g the node up to the root

    78] and by Bitner [1979]

    t has worst-case me, and remain, very

    CSE 100, UCSD: LEC 18

    ✔ The ideas of self-organizing lists can be applied to search tree

    ✔ Basic idea: when a search operation finds a key in the tree, mthat key to the root of the tree, so subsequent accesses will be

    ✔ This can be done using repeated AVL single rotations, rotatin

    ✗ This idea was originally proposed by Allen and Munro [19

    ✔ However, simply rotating the node with a found key to the rooamortized cost O(N) per operation (because the tree can becounbalanced)

    ✔ Something better is needed...

  • Page 12 of 35

    Splay trees

    ns after a successful find s improve subsequent

    worse on average

    lving a node, its parent,

    f M > N find operations ng run”

    ion is still O(N) )

    CSE 100, UCSD: LEC 18

    ✔ Splay trees were proposed by Sleator and Tarjan [1985]

    ✔ Splay trees are binary search trees that use “splaying” operatioto move the found key to the root of the tree. These operationsearches for that key, while not making access for other keys

    ✔ Splaying operations are composed out of AVL rotations invoand grandparent, depending on various special cases.

    ✔ The result: worst-case amortized time cost for any sequence ois O(M log N) , so splay trees act like balanced trees “in the lo

    ✔ (However, worst-case time cost for any individual find operat

  • Page 13 of 35

    Search trees and multidimensional keys

    rch trees, AVL trees, red-possible keys be

    al set (or anyway be

    CSE 100, UCSD: LEC 18

    ✔ The search trees we have discussed so far (ordinary binary seablack trees, B trees, treaps, splay trees) require that the set of completely ordered

    ✗ for each pair of keys K1, K2 exactly one of these is true:

    • K1 < K2

    • K1 = K2

    • K1 > K2

    ✔ That means that the possible keys must lie in a one-dimensionmapped to a one-dimensional set by the ordering relation)

    ✔ What to do if the keys are multidimensional?

  • Page 14 of 35

    Multidimensional keys

    f strings

    be an ordered triple of the center of the object

    CSE 100, UCSD: LEC 18

    ✔ Example 1: In a database of persons, the key may be a pair o(lastname, firstname)

    ✔ Example 2: In a database of 3D graphics objects, the key maynumbers (x,y,z) which are the Cartesian x,y,z coordinates of

    ✔ How can these keys be stored in a search tree?

  • Page 15 of 35

    Making multidimensional keys one dimensional

    h dimension is a string, it ic (dictionary) ordering:

    name2) if and only if

    rstname2

    ch dimension is a spatial

    odulus

    + z2*z2

    CSE 100, UCSD: LEC 18

    ✔ Example 1: For multidimensional keys where the value on eacoften makes sense to reduce to 1 dimension using lexicograph

    ✗ (lastname1, firstname1) > (lastname2,first lastname1 > lastname2, or lastname1 == lastname2 and firstname1 > fi

    ✔ Example 2: For multidimensional keys where the value on eacoordinate, several approaches might be considered possible:

    ✗ lexicographic ordering on the coordinates

    • (x1,y1,z1) > (x2,y2,z2) if and only if x1 > x2, or x1 == x2 and y1 > y2, or x1 == x2 and y1 == y2 and z1 > z2

    ✗ interpret coordinates as a vector, and consider its squared m

    • (x1,y1,z1) > (x2,y2,z2) if and only if x1*x1 + y1*y1 + z1*z1 > x2*x2 + y2*y2

    ✗ etc.

  • Page 16 of 35

    Keeping multidimensional keys multidimensional

    ordering information can ple:

    earest the given name, or

    trically nearest the given

    ing information you need

    ons: nearness of mparing vector moduli

    ltidimensional keys

    CSE 100, UCSD: LEC 18

    ✔ As you know, search trees require ordered keys

    ✔ One advantage of search trees (over hashtables) is that the keybe used to make some interesting queries efficient. For exam

    ✗ find the 10 names in the database that are alphabetically n

    ✗ find the 7 objects in the database whose centers are geomeobject, etc.

    ✔ But if you map multidimensions to 1D, you may lose the orderfor such queries

    ✔ This is particularly a problem in spatial or geometric applicaticoordinates is not preserved with lexicographic ordering or co

    ✔ The solution is to use a search tree that deals directly with mu

  • Page 17 of 35

    K-D trees

    rocessing of nal”)

    akes branching decisions

    L make branching

    ranching decisions on the the y coordinate; andchildren of the root

    dimension of the key, and

    l dimensions of the key)

    CSE 100, UCSD: LEC 18

    ✔ k-d trees are a variation on the BST that allows for efficient pmultidimensional keys (k-d is an abbreviation of “k-dimensio

    ✔ k-d trees differ from BST’s in that each level of the k-d tree mbased on one of the k dimensions of the key

    ✔ If a key has k dimensions 0 ,..., k-1, then tree nodes at level decisions based on key dimension L mod k

    ✔ For example, in a 3-d tree with keys (x,y,z), the root makes bvalue of the x coordinate; children of the root discriminate ongrandchildren of the root discriminate on z; children of the grdiscriminate on x, etc.

    ✔ This branching decisions are just like those in a BST

    ✗ the only difference is that they “pay attention” to only one which dimension depends on the level of the node

    ✗ (of course equality comparisons need to pay attention to al

  • Page 18 of 35

    K-D tree ordering property

    hose dimension L mod k

    ension L mod k has value

    on L mod k has value

    CSE 100, UCSD: LEC 18

    ✔ k-d tree ordering property:

    ✗ Consider any node X at level L; suppose it contains a key whas value V.

    ✗ Then all keys in the left subtree of X have keys whose dimless than V, and

    ✗ all keys in the right subtree of X have keys whose dimensigreater than or equal to V

  • Page 19 of 35

    A 2-d tree

    (70,10)

    (69,50)

    (80,90)(55,80)

    CSE 100, UCSD: LEC 18

    ✔ Points on a plane stored in a 2-dimensional k-d tree

    x

    y

    0

    0

    100

    100

    (40,45)

    (15,70)

  • Page 20 of 35

    K-D tree operations

    BST operations

    de!

    e to actually remove from

    ’s

    CSE 100, UCSD: LEC 18

    ✔ Find and insert operations in k-d trees are similar to standard

    ✔ Descend the tree, making decisions at each node

    ✔ But in a k-d tree, these decisions depend on the level of the no

    ✔ The result of an insert operation is a new leaf node

    ✔ Non-lazy delete operations are a little harder: finding the nodthe tree requires some thought

    ✔ k-d trees have the same asymptotic time costs as ordinary BST

    ✗ (balanced k-d trees are very tricky to implement)

    ✔ k-d trees can also be used to perform “region” queries

  • Page 21 of 35

    K-D tree region queries

    onal “hypercube” region

    ursively process its left

    check to see if it is to

    ult of the comparison) of

    CSE 100, UCSD: LEC 18

    ✔ Suppose you want to find all the points that lie in a k-dimensiof a certain size, centered on a certain point

    ✔ This is pretty easy to do in a k-d tree:

    ✔ Start at the root. If it is in the region, add it to the list, and recand right children.

    ✔ If any node encountered in the search is not in the region, thenbecause of the coordinate that this node is “paying attention”

    ✔ If so, then the entire left or right subtree (depending on the resthis node can be eliminated from the search

    ✔ Example: Find all keys that lie within a certain hypercube

  • Page 22 of 35

    K-D tree region queries: an example

    e point (25,50)

    (70,10)

    (69,50)

    (80,90)(55,80)

    CSE 100, UCSD: LEC 18

    ✔ Find all keys within a square 50 units on a side centered at th

    x

    y

    0

    0

    100

    100

    (40,45)

    (15,70)

  • Page 23 of 35

    Learning and using data structures

    data structures are ng else you have to write

    ++, Java 2, very nice

    they are well-written, ent for the language is

    (we have already )

    CSE 100, UCSD: LEC 18

    ✔ Learning data structures has two parts:

    ✗ learning how to design and implement data structures

    ✗ learning how to use data structures

    ✔ In language environments like ANSI C, Pascal, etc., very fewavailable by default (maybe just arrays and records); everythiyourself

    ✔ However in other language environments such as Smalltalk, Cstandard data structure libraries are available

    ✔ For many applications, it makes sense to use these if you can:well-tested, and are available wherever the standard environmavailable

    ✔ We will look briefly at the Standard Template Library for C++covered the similar Collections API with generics for JDK 1.5

  • Page 24 of 35

    The C++ Standard Template Library (STL)

    s a set of generic sses

    espond to fundamental

    rs

    fundamental operations erations in terms of these

    mutating sequence numeric operations

    nd adaptors

    TL classes)

    as part of the standard environment

    CSE 100, UCSD: LEC 18

    ✔ The Standard Template Library is a C++ library that providecontainer classes, generic algorithms, and other supporting cla

    ✔ The container classes are abstract data types (ADTs) that corrdata structures

    ✗ 2 kinds of containers: sequences and associative containe

    ✔ The container classes have member functions that implement on container instances; generic algorithms implement other op

    ✗ 4 kinds of algorithms: nonmutating sequence operations, operations, sorting and related operations, and generalized

    ✔ Important supporting classes are: STL iterators, allocators, a

    ✗ (you usually do not need to write your own allocators for S

    ✔ The ANSI/ISO C++ Standards Committee voted to adopt STLC++ library, so it ‘should’ be available in every standard C++

  • Page 25 of 35

    STL iterators

    inters: instances of e.g., the first element, last a container (e.g., next

    ries of iterators it supports

    a sequence, expressed

    ons, expressed with ++

    l, plus “long jumps” access iterator and n is an ith r+n, r-n, r[n];

    r random access iterator), s r < s, r > s, r = s, producing bool values.

    ✗ Input and output iterators: somewhat weaker than forwardused in accessing input and output streams

  • Page 26 of 35

    STL containers: sequences

    objects, all of the same

    rators.

    d erase operations at the

    s and allows worst-case sequence (once an d)

    ents is not supported

    m access iterators

    erase operations at the t and erase in the middle

    CSE 100, UCSD: LEC 18

    ✔ A sequence is a kind of container that organizes a finite set oftype, into a strictly linear arrangement

    ✔ The STL provides 3 basic sequence classes:

    ✔ vector is a kind of sequence that supports random-access ite

    ✗ In addition, it supports (amortized) constant time insert anend; insert and erase in the middle take linear time.

    ✔ list is a kind of sequence that supports bidirectional iteratorconstant time insert and erase operations anywhere within theiterator referring to the insert or erase position has been create

    ✗ Unlike vectors and deques, fast random access to list elem

    ✔ deque is a kind of sequence that, like a vector, supports rando

    ✗ In addition, it supports worst-case constant time insert andbeginning or the end (deque = double-ended queue) ; insertake linear time

  • Page 27 of 35

    STL containers: associative containers

    ata based on keys

    t, multiset, map,

    ed) on the key type, and

    associated data

    timap allow multiple

    e insert, erase, and find

    ers in ascending order of

    CSE 100, UCSD: LEC 18

    ✔ Associative containers provide an ability for fast retrieval of d

    ✔ The STL provides 4 basic kinds of associative containers: semultimap

    ✔ All of the associative containers are parameterized (templatizon an ordering relation that imposes a total order on keys

    ✔ map and multimap are also parameterized on the type of the

    ✔ set and map support only unique keys; multiset and muloccurrences of the same key

    ✔ All associative containers support bidirectional iterators

    ✔ All associative containers support worst-case logarithmic-timoperations

    ✔ Iterators for associative containers iterate through the containkey values

  • Page 28 of 35

    Implementing STL containers

    perations, and worst-case

    STL map class?

    mented using doubly-

    CSE 100, UCSD: LEC 18

    ✔ The STL containers have an interface that specifies required oor amortized time cost bounds on the operations

    ✔ What would you use to implement the STL list class? The

    ✔ In SGI’s reference implementation of the STL, list is implelinked lists, and map is implemented using red-black trees

  • Page 29 of 35

    STL algorithms

    m any container class; primitive types

    operations, mutating alized numeric operations

    d, count, mismatch,

    , fill, generate, remove,

    h, merge, set includes, set e, minimum, maximum,

    artial sum, adjacent

    CSE 100, UCSD: LEC 18

    ✔ In the STL, some algorithms are provided that are separate frothis permits using the algorithms on user-defined classes and

    ✔ There are 4 kinds of these algorithms: nonmutating sequencesequence operations, sorting and related operations, and gener

    ✔ Nonmutating sequence algorithms: foreach, find, adjacent finequal, search

    ✔ Mutating sequence algorithms: copy, swap, transform, replaceunique, reverse, rotate, random shuffle, partition

    ✔ Sorting and related algorithms: sort, nth-element, binary searcunion, set intersection, set difference, set symmetric differenclexicographic comparison, permutation generation

    ✔ Generalized numeric algorithms: accumulate, inner product, pdifference

  • Page 30 of 35

    STL adaptors

    s into one that has a

    ck, pop_back can be used used

    sh_back, pop_front can be used

    r and supporting ntiate priority_queue. In

    CSE 100, UCSD: LEC 18

    ✔ Adaptors are templated classes that “convert” a container clasdifferent, more restrictive interface

    ✔ stack: Any sequence that supports operations back, push_bato instantiate stack. In particular, vector, list and deque can be

    ✔ queue: Any sequence that supports operations front, back, pube used to instantiate queue. In particular, list and deque can

    ✔ priority queue: Any sequence with random access iteratooperations front, push_back and pop_back can be used to instaparticular, vector and deque can be used

  • Page 31 of 35

    Generic data structures and type-checking

    d design it to contain

    downcast it to the rs or properties

    out this at runtime, with a

    hrows at runtime

    CSE 100, UCSD: LEC 18

    ✔ In (pre-1.5) Java, to design a generic data structure, you woulObjects

    ✔ Then when you get an object out of the container, you wouldappropriate subclass if you need to access its subclass behavio

    ✔ If some other kind of object was stored, you would find out abClassCastException. Example:

    Vector v = new Vector();

    v.add(new Integer(3));

    v.add(new String("hey");

    ...

    Integer i, String s;

    i = (Integer) v.elementAt(0); // okay

    s = (String) v.elementAt(0); // compiles, but t

  • Page 32 of 35

    Sorted generic data structures and type checking

    u would design it to

    y in the structure, you ception. Example:

    rows at runtime

    l, but it requires runtime

    let the compiler do the

    CSE 100, UCSD: LEC 18

    ✔ In (pre-1.5) Java, to design a generic sorted data structure, yocontain Comparables, or Objects that use a Comparator

    ✔ If you store an Object that is not comparable to objects alreadwould find out about this only at runtime, with a ClassCastEx

    TreeSet t = new TreeSet();

    ...

    t.add(new String("hey");

    ...

    t.add(new Integer(0)); // this compiles, but th

    ...

    ✔ Type checking at runtime is better than no type checking at altesting

    ✔ It is better to do type checking automatically at compile time:work!

    ✔ The new generics in Java 1.5 permit this...

  • Page 33 of 35

    JDK 1.5 generics

    now support full compile-ine your own, as well)

    bj>;

    t needed!

    CSE 100, UCSD: LEC 18

    ✔ All the Java Collections API interfaces and container classes time type checking via parameterized class types (you can def

    ✔ (The syntax is very similar to C++ templates)

    ✔ Examples:Vector v = new Vector();

    v.add(new Integer(3));

    v.add(new String("hey"); // compile time error

    Integer i;

    i = v.elementAt(0); // note no downcast needed!

    TreeSet t = new TreeSet();

    t.add(new String("hey");

    t.add(new Integer(0)); // compile time error

    HashMap h = new HashMap

  • Page 34 of 35

    Automatic boxing/unboxing and “enhanced for”

    matic boxing and type. Example:

    >();

    >();

    lable:

    ic unboxing can be used:

    CSE 100, UCSD: LEC 18

    ✔ If a container class is parameterized on a “wrapper” type, autounboxing is done with respect to the corresponding primitive

    LinkedList ls1 = new LinkedList

  • Page 35 of 35

    Next time

    CSE 100, UCSD: LEC 18

    ✔ Final review

    Lecture 18Self-organizing data structuresSelf-organizing listsSplay treesSpatial data structuresK-D treesThe C++ Standard Template LibraryJava JDK 1.5 generics Reading: Weiss, Ch. 4 and Ch. 12

    Final examFinal exam time: Tue Mar 17 11:30am-2:00pmLocation: CSB 002Closed book, closed notes, no calculators...Bring something to write with, and picture IDPractice final is on lineFinal exam discussion topic on WebboardExam review sessions will be:last lecture

    Coverage: All lectures, all assignments, corresponding sections of the textbook, and handouts

    Self-organizing data structuresThe path-compression find algorithm for disjoint subsets does more than just find the label of th...Amortized analysis of path-compression find shows that in the worst case, any sufficiently long s...... so the additional work to compress the paths is worth it “in the long run”

    This idea of self-organizing data structures can be applied to all manner of data structures: lis...self-organizing listssplay trees

    Self-organizing listsWorst-case time cost for searching a linked list is O(N)Assuming all keys are equally likely to be searched for, average case time cost for successful se...But what if all keys are not equally likely?Then we might be able to do better by putting keys that are more likely to be searched for at the...

    Nonuniform search probabilities: exponential distributionSuppose there are N keys in the list: k1, ..., kNSuppose the probability that the user will search for key ki is 1/2i, except for kN which has pro...Now if the list is arranged so that key k1 is in the first list element, k2 is in the second list...So, with under this rather extreme probabilistic assumption, optimal ordering of the list permits...

    Nonuniform search probabilities: Zipf distributionSuppose there are N keys in the list: k1, ..., kNSuppose the probability that the user will search for key ki is proportional to 1/( i H(i) ) , wh...This probability distribution over keys is known as a Zipf distribution; it fits the observed fre...If as before the list is ordered so that key k1 is in the first list element, k2 is in the second...So, under this probabilistic assumption, optimal ordering of the list permits successful search t...

    Nonuniform search probabilities: the 90/10 ruleThe 90/10 rule: Empirical studies show that in a typical database, 90% of the references are to o...(This is a rough qualitiative result that may not be true in all cases... Sometimes the same idea...If the 90/10 rule holds, and the list is arranged so that the more frequently referenced keys are...This is still O(N) average case, but with a better constant factor

    Self-organizing list strategiesIf you know the probabilities for each key, you could build a list ordered by those probabilities...But usually you don’t know the probabilities beforehandA self-organizing list will adapt to the actual pattern of searches, by moving a found key nearer...There are two simple strategies for this: transpose and move-to-front

    TransposeWhen there is a successful search for a key in the list, swap that key with the key in the previo...Easy to do, whether the list is implemented as a pointer-based linked list or as an arrayEventually, frequently accessed keys will tend to move near the front of the list...... but they are not guaranteed to do so!Consider this “worst case” situation: The list is created, and then only the last two elements ar...these two elements will repeatedly be exchanged with each other, and will stay at the end of the ...

    Move-to-frontWhen there is a successful search for a key in the list, move the element with that key to the fr...Easy to do in a pointer-based linked list, more expensive in a list implemented with an arrayObviously, over time, frequently accessed keys will tend to be near the front of the listIn fact, it can be shown that the average number of comparisons for a successful search using the...

    Self-organizing search treesThe ideas of self-organizing lists can be applied to search trees as wellBasic idea: when a search operation finds a key in the tree, move the node containing that key to...This can be done using repeated AVL single rotations, rotating the node up to the rootThis idea was originally proposed by Allen and Munro [1978] and by Bitner [1979]

    However, simply rotating the node with a found key to the root has worst-case amortized cost O(N)...Something better is needed...

    Splay treesSplay trees were proposed by Sleator and Tarjan [1985]Splay trees are binary search trees that use “splaying” operations after a successful find to mov...Splaying operations are composed out of AVL rotations involving a node, its parent, and grandpare...The result: worst-case amortized time cost for any sequence of M > N find operations is O(M log N...(However, worst-case time cost for any individual find operation is still O(N) )

    Search trees and multidimensional keysThe search trees we have discussed so far (ordinary binary search trees, AVL trees, red- black tr...for each pair of keys K1, K2 exactly one of these is true:• K1 < K2• K1 = K2• K1 > K2

    That means that the possible keys must lie in a one-dimensional set (or anyway be mapped to a one...What to do if the keys are multidimensional?

    Multidimensional keysExample 1: In a database of persons, the key may be a pair of strings (lastname, firstname)Example 2: In a database of 3D graphics objects, the key may be an ordered triple of numbers (x,y...How can these keys be stored in a search tree?

    Making multidimensional keys one dimensionalExample 1: For multidimensional keys where the value on each dimension is a string, it often make...(lastname1, firstname1) > (lastname2,firstname2) if and only if lastname1 > lastname2, or lastnam...

    Example 2: For multidimensional keys where the value on each dimension is a spatial coordinate, s...lexicographic ordering on the coordinates• (x1,y1,z1) > (x2,y2,z2) if and only if x1 > x2, or x1 == x2 and y1 > y2, or x1 == x2 and y1 == ...

    interpret coordinates as a vector, and consider its squared modulus• (x1,y1,z1) > (x2,y2,z2) if and only if x1*x1 + y1*y1 + z1*z1 > x2*x2 + y2*y2 + z2*z2

    etc.

    Keeping multidimensional keys multidimensionalAs you know, search trees require ordered keysOne advantage of search trees (over hashtables) is that the key ordering information can be used ...find the 10 names in the database that are alphabetically nearest the given name, orfind the 7 objects in the database whose centers are geometrically nearest the given object, etc.

    But if you map multidimensions to 1D, you may lose the ordering information you need for such que...This is particularly a problem in spatial or geometric applications: nearness of coordinates is n...The solution is to use a search tree that deals directly with multidimensional keys

    K-D treesk-d trees are a variation on the BST that allows for efficient processing of multidimensional key...k-d trees differ from BST’s in that each level of the k-d tree makes branching decisions based on...If a key has k dimensions 0 ,..., k-1, then tree nodes at level L make branching decisions based ...For example, in a 3-d tree with keys (x,y,z), the root makes branching decisions on the value of ...This branching decisions are just like those in a BSTthe only difference is that they “pay attention” to only one dimension of the key, and which dime...(of course equality comparisons need to pay attention to all dimensions of the key)

    K-D tree ordering propertyk-d tree ordering property:Consider any node X at level L; suppose it contains a key whose dimension L mod k has value V.Then all keys in the left subtree of X have keys whose dimension L mod k has value less than V, andall keys in the right subtree of X have keys whose dimension L mod k has value greater than or eq...

    A 2-d treePoints on a plane stored in a 2-dimensional k-d tree

    K-D tree operationsFind and insert operations in k-d trees are similar to standard BST operationsDescend the tree, making decisions at each nodeBut in a k-d tree, these decisions depend on the level of the node!The result of an insert operation is a new leaf nodeNon-lazy delete operations are a little harder: finding the node to actually remove from the tree...k-d trees have the same asymptotic time costs as ordinary BST’s(balanced k-d trees are very tricky to implement)

    k-d trees can also be used to perform “region” queries

    K-D tree region queriesSuppose you want to find all the points that lie in a k-dimensional “hypercube” region of a certa...This is pretty easy to do in a k-d tree:Start at the root. If it is in the region, add it to the list, and recursively process its left a...If any node encountered in the search is not in the region, then check to see if it is because of...If so, then the entire left or right subtree (depending on the result of the comparison) of this ...Example: Find all keys that lie within a certain hypercube

    K-D tree region queries: an exampleFind all keys within a square 50 units on a side centered at the point (25,50)

    Learning and using data structuresLearning data structures has two parts:learning how to design and implement data structureslearning how to use data structures

    In language environments like ANSI C, Pascal, etc., very few data structures are available by def...However in other language environments such as Smalltalk, C++, Java 2, very nice standard data st...For many applications, it makes sense to use these if you can: they are well-written, well-tested...We will look briefly at the Standard Template Library for C++ (we have already covered the simila...

    The C++ Standard Template Library (STL)The Standard Template Library is a C++ library that provides a set of generic container classes, ...The container classes are abstract data types (ADTs) that correspond to fundamental data structures2 kinds of containers: sequences and associative containers

    The container classes have member functions that implement fundamental operations on container in...4 kinds of algorithms: nonmutating sequence operations, mutating sequence operations, sorting and...

    Important supporting classes are: STL iterators, allocators, and adaptors(you usually do not need to write your own allocators for STL classes)

    The ANSI/ISO C++ Standards Committee voted to adopt STL as part of the standard C++ library, so i...

    STL iteratorsSTL iterators are classes that generalize the idea of C/C++ pointers: instances of iterator class...A container class will be characterized in part by what categories of iterators it supportsThe STL provides 5 categories of iterators:Forward iterators: provide for one-directional traversal of a sequence, expressed with ++Bidirectional iterators: provide for traversal in both directions, expressed with ++ and --Random access iterators: provide for bidirectional traversal, plus “long jumps” expressed with r ...Input and output iterators: somewhat weaker than forward iterators, and are mainly used in access...

    STL containers: sequencesA sequence is a kind of container that organizes a finite set of objects, all of the same type, i...The STL provides 3 basic sequence classes:vector is a kind of sequence that supports random-access iterators.In addition, it supports (amortized) constant time insert and erase operations at the end; insert...

    list is a kind of sequence that supports bidirectional iterators and allows worst-case constant t...Unlike vectors and deques, fast random access to list elements is not supported

    deque is a kind of sequence that, like a vector, supports random access iteratorsIn addition, it supports worst-case constant time insert and erase operations at the beginning or...

    STL containers: associative containersAssociative containers provide an ability for fast retrieval of data based on keysThe STL provides 4 basic kinds of associative containers: set, multiset, map, multimapAll of the associative containers are parameterized (templatized) on the key type, and on an orde...map and multimap are also parameterized on the type of the associated dataset and map support only unique keys; multiset and multimap allow multiple occurrences of the sam...All associative containers support bidirectional iteratorsAll associative containers support worst-case logarithmic-time insert, erase, and find operationsIterators for associative containers iterate through the containers in ascending order of key values

    Implementing STL containersThe STL containers have an interface that specifies required operations, and worst-case or amorti...What would you use to implement the STL list class? The STL map class?In SGI’s reference implementation of the STL, list is implemented using doubly- linked lists, and...

    STL algorithmsIn the STL, some algorithms are provided that are separate from any container class; this permits...There are 4 kinds of these algorithms: nonmutating sequence operations, mutating sequence operati...Nonmutating sequence algorithms: foreach, find, adjacent find, count, mismatch, equal, searchMutating sequence algorithms: copy, swap, transform, replace, fill, generate, remove, unique, rev...Sorting and related algorithms: sort, nth-element, binary search, merge, set includes, set union,...Generalized numeric algorithms: accumulate, inner product, partial sum, adjacent difference

    STL adaptorsAdaptors are templated classes that “convert” a container class into one that has a different, mo...stack: Any sequence that supports operations back, push_back, pop_back can be used to instantiate...queue: Any sequence that supports operations front, back, push_back, pop_front can be used to ins...priority queue: Any sequence with random access iterator and supporting operations front, push_ba...

    Generic data structures and type-checkingIn (pre-1.5) Java, to design a generic data structure, you would design it to contain ObjectsThen when you get an object out of the container, you would downcast it to the appropriate subcla...If some other kind of object was stored, you would find out about this at runtime, with a ClassCa...Vector v = new Vector();v.add(new Integer(3));v.add(new String("hey");...Integer i, String s;i = (Integer) v.elementAt(0); // okays = (String) v.elementAt(0); // compiles, but throws at runtime

    Sorted generic data structures and type checkingIn (pre-1.5) Java, to design a generic sorted data structure, you would design it to contain Comp...If you store an Object that is not comparable to objects already in the structure, you would find...TreeSet t = new TreeSet();...t.add(new String("hey");...t.add(new Integer(0)); // this compiles, but throws at runtime...Type checking at runtime is better than no type checking at all, but it requires runtime testingIt is better to do type checking automatically at compile time: let the compiler do the work!The new generics in Java 1.5 permit this...

    JDK 1.5 genericsAll the Java Collections API interfaces and container classes now support full compile- time type...(The syntax is very similar to C++ templates)Examples:Vector v = new Vector();v.add(new Integer(3));v.add(new String("hey"); // compile time errorInteger i;i = v.elementAt(0); // note no downcast needed!TreeSet t = new TreeSet();t.add(new String("hey");t.add(new Integer(0)); // compile time errorHashMap h = new HashMap;h.put("abcd",new MyObj());MyObj o = h.get("some key"); // note no downcast needed!

    Automatic boxing/unboxing and “enhanced for”If a container class is parameterized on a “wrapper” type, automatic boxing and unboxing is done ...LinkedList ls1 = new LinkedList();LinkedList ls2 = new LinkedList();ls1.add(3); ls1.add(8); ls1.add(12);ls2.add (ls1.removeFirst() + 2);ls2.add ( ls1.get(0) * ls1.get(1) );If a class has an iterator, a cleaner for-loop syntax is now available:

    for( Integer i : ls1 ) {// whatever}And in fact if the contained type is a “wrapper” type, automatic unboxing can be used:

    for( int i : ls1 ) {// whatever}

    Next timeFinal review