Sets of Digital Data
  CSCI 2720, Fall 2005
  Kraemer
Digital Data
  In earlier work with BSTs and various balanced trees, we compared keys for order or equality
  Here, we take advantage of the structure of the key:
    Use it as an index, or
    Decompose a string key into characters, or
    Treat the key as a numerical quantity on which we can perform operations
Assumptions
  We will construct and manipulate sets that are drawn from a universe U of size N
    U = {u_0, ..., u_(N-1)}
  A relatively simple procedure exists by which we can compute, for an element u ∈ U, the index i such that u = u_i
    Easy if U is a set of integers
    Also easy if U is a set of characters with character codes in a contiguous interval
Bit Vector
  Used to represent a subset S ⊆ U
  A table of N bits, Bits[0..N-1]
    Bits[i] == 1 if u_i ∈ S
    Bits[i] == 0 if u_i ∉ S
  Example: today's attendance
    Bits:            1 1 0 1 0 1 1
    Student number:  0 1 2 3 4 5 6
    1 = present, 0 = absent
Bit Vectors
  Assume:
    Determining an element's index takes constant time
    Accessing a position in the table takes constant time
    Each may actually take several ops and depend somewhat on N (the size of the universe), but not on the size of the set represented
  Then:
    Insert, Delete, and Member are constant-time ops
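The three operations can be sketched as follows. This is a minimal illustration, assuming the universe is the integers 0..n-1 (so an element is its own index); the class name BitVector and the byte-packed layout are choices made for this example, not something the slides prescribe.

```python
class BitVector:
    """Subset of the universe {0, ..., n-1}, stored as one bit per possible element."""

    def __init__(self, n):
        self.n = n
        self.bits = bytearray((n + 7) // 8)  # every bit starts at 0 (empty set)

    def insert(self, i):
        # Set bit i: pick the byte with i >> 3, the bit inside it with i & 7.
        self.bits[i >> 3] |= 1 << (i & 7)

    def delete(self, i):
        # Clear bit i; the 0xFF mask keeps the value inside one byte.
        self.bits[i >> 3] &= ~(1 << (i & 7)) & 0xFF

    def member(self, i):
        # Shift bit i down to position 0 and test it.
        return bool((self.bits[i >> 3] >> (i & 7)) & 1)


# The attendance example: students 0, 1, 3, 5, 6 are present.
attendance = BitVector(7)
for student in (0, 1, 3, 5, 6):
    attendance.insert(student)
```

Each operation touches exactly one byte regardless of how many elements the set holds, which is where the constant-time claim comes from. The shifts and masks are also the "several ops" mentioned in the assumptions.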
Bit Vectors
  A subset of a set of size N always takes N bits to represent, independent of the size of the subset
  Makes sense if:
    N is not too large
    We need to represent sets of size comparable to N
Storage Efficiency: Bit Vector vs. Binary Trees
  Binary tree, set of size n
    Requires n(2p + K) bits
    K >= lg N is the size of the field that holds a key value
    p = number of bits in a pointer
  Bit vector
    Takes N bits
  If n is comparable to N, the bit vector is more efficient
  If p = K = 32, the tree becomes more space-efficient when n/N falls below about 1%
    The exact break-even is n(2p + K) = N, which is n/N = 1/96
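The break-even point can be checked with a few lines of arithmetic. The function names and the universe size of 96,000 are made up for this example; only the formulas n(2p + K) and N come from the slide.

```python
def tree_bits(n, p=32, K=32):
    """Bits used by a binary tree holding n keys: two pointers plus one key per node."""
    return n * (2 * p + K)


def bit_vector_bits(N):
    """Bits used by a bit vector over a universe of size N, whatever the set size."""
    return N


N = 96_000                       # hypothetical universe size
break_even_n = N // (2 * 32 + 32)  # set size at which both structures cost N bits

for n in (break_even_n - 1, break_even_n, break_even_n + 1):
    print(n, tree_bits(n), bit_vector_bits(N))
```

With p = K = 32 each tree node costs 96 bits, so the tree only wins while the set holds fewer than N/96 elements.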
When to use Bit Vectors?
  When the universe is relatively small
  When sets are large in relation to the size of the universe
Advantages of Bit Vectors
  O(1) implementation of Insert, Delete, Member
  Union and Intersection are easy
    Implement via Boolean OR and AND operations
    May actually take less than one op per element, as operations are performed on full machine words
    If the machine word is 32 bits, one machine operation handles 32 potential elements of the set
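A sketch of word-at-a-time Union and Intersection. Python integers are not fixed-width machine words, so the 32-bit word size here is simulated by convention; the helper names make_set, union, intersection, and members are assumptions for this example.

```python
WORD = 32  # assumed machine word size in bits


def make_set(elements, n):
    """Pack a subset of {0, ..., n-1} into a list of WORD-bit words."""
    words = [0] * ((n + WORD - 1) // WORD)
    for i in elements:
        words[i // WORD] |= 1 << (i % WORD)
    return words


def union(a, b):
    # One OR per word covers WORD potential elements at once.
    return [x | y for x, y in zip(a, b)]


def intersection(a, b):
    # One AND per word covers WORD potential elements at once.
    return [x & y for x, y in zip(a, b)]


def members(words):
    """List the elements present, in increasing order."""
    return [w * WORD + b
            for w, word in enumerate(words)
            for b in range(WORD)
            if (word >> b) & 1]


a = make_set([1, 5, 40], 64)
b = make_set([5, 40, 63], 64)
```

For a universe of 64 elements, each set operation above performs two word operations rather than 64 per-element tests.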
Disadvantages of Bit Vectors
  On some computers, access to individual bits can require shifting and masking operations (expensive)
    Result is that Member may be much more expensive than Union
  Initialization takes Θ(N): zero all the bits in the vector
    But can use a constant-time initialization algorithm
    But that makes the storage requirement go to 2p + 1 bits per element
    So, in practice, just use machine ops to set the vector to zero, which are efficient
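The slides do not spell out the constant-time initialization algorithm. One well-known scheme, assumed here to be the kind intended, pairs every bit with two index fields (hence 2p + 1 bits per element) that let the structure tell a genuinely written bit from leftover garbage. Python cannot hand out uninitialized memory, so the sketch fills the arrays with random values to stand in for it; that filling step is part of the simulation, not of the algorithm. All names are hypothetical.

```python
import random


class LazyBitVector:
    """Bit vector whose logical initialization is just `top = 0`."""

    def __init__(self, n, seed=0):
        rng = random.Random(seed)
        # Simulated uninitialized memory: arbitrary garbage in all three arrays.
        self.bit = [rng.randrange(2) for _ in range(n)]
        self.when = [rng.randrange(n) for _ in range(n)]   # pointer 1 per element
        self.which = [rng.randrange(n) for _ in range(n)]  # pointer 2 per element
        self.top = 0  # number of positions written so far; the only real init step

    def _touched(self, i):
        # Position i is valid only if when[i] points into the used part of
        # `which` and that slot points back at i.
        w = self.when[i]
        return w < self.top and self.which[w] == i

    def _write(self, i, value):
        if not self._touched(i):
            self.when[i] = self.top
            self.which[self.top] = i
            self.top += 1
        self.bit[i] = value

    def insert(self, i):
        self._write(i, 1)

    def delete(self, i):
        self._write(i, 0)

    def member(self, i):
        return self._touched(i) and self.bit[i] == 1


v = LazyBitVector(50, seed=1)
```

Garbage in `when` can never pass the check by accident, because slots of `which` below `top` only ever hold positions that were actually written. The price is the two extra index fields per element, which is why zeroing the vector with machine ops is usually preferred.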
Tries and Digital Search Trees
  If the key can be decomposed into characters, then the characters of the key can be used as indices
  Tries are based on this idea
  "trie" is taken from the middle letters of "retrieval"; a pun on "tree", but pronounced "try"
Tries
  Assume k possible character values
  A trie is a (k+1)-ary tree
    Each node is a table of k+1 pointers
    One pointer for each possible character
    One for the end-of-string character
Trie Example
Tries
  Path for a key of m characters has length m, with a pointer at the end-of-string character
  Don't need to store the key itself; it is the path followed
  Info field might be pointed to by the end-of-string element
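A minimal trie along these lines, assuming an alphabet of the uppercase letters A to Z (k = 26) and using the extra slot k as the end-of-string entry that points to the info field. The names TrieNode, trie_insert, and trie_lookup are illustrative only.

```python
K = 26    # assumed alphabet: 'A'..'Z', contiguous character codes
END = K   # index of the extra end-of-string slot


class TrieNode:
    def __init__(self):
        # Table of k+1 pointers: one per character plus one for end of string.
        self.child = [None] * (K + 1)


def trie_insert(root, key, info=True):
    node = root
    for c in key:
        slot = ord(c) - ord("A")        # the character is used directly as an index
        if node.child[slot] is None:
            node.child[slot] = TrieNode()
        node = node.child[slot]
    node.child[END] = info              # end-of-string slot points to the info


def trie_lookup(root, key):
    """Return the stored info, or None if the key is absent."""
    node = root
    for c in key:
        node = node.child[ord(c) - ord("A")]
        if node is None:
            return None
    return node.child[END]


root = TrieNode()
for name in ("TURING", "MENDEL", "MENDELEEV"):
    trie_insert(root, name)
```

The key is never stored: a lookup succeeds when following the key's characters ends at a node whose end-of-string slot is filled. The work done is one table access per character, whatever the number of keys.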
Tries: Analysis
  Let:
    n be the number of keys stored in the trie
    l be the length (in characters) of the longest key
    s be the number of nodes in the trie
    k be the size of the alphabet
  Pro:
    Access time is O(l), independent of k, n, and s
  Con:
    Size: requires (k+1) * s * p bits
    Most pointers are null, so lots of wasted space
Strategies for reducing storage requirements of tries
  Implement a k-ary trie with m nodes as a 2-D, m-by-k table
  [Table: one column per character (A, B, C, D, E, ..., M, ..., P, ..., T, ...), one row per node (rows 0 to 5 shown); each entry is the index of a child node (1 to 10 shown), or empty]
Table approach
  Number the nodes in the diagram of slide 13 from 1 to m
  The table entry corresponding to the jth child of the ith node is the index of the child node
How does that save space?
  Just as many nodes and elements as on slide 13
  But each entry needs only ceil(lg(m)) bits to represent, smaller than a pointer
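The table form can be sketched as a list of rows of small integers. Two conventions below are assumptions of this example rather than of the slides: entry 0 means "no child" (which works because nodes are numbered from 1), and each row has k+1 columns so that the end-of-string slot fits.

```python
K = 26    # assumed alphabet 'A'..'Z'
END = K   # column for the end-of-string slot


def new_table():
    # Row 0 is unused so that node numbers start at 1; row 1 is the root.
    return [None, [0] * (K + 1)]


def _columns(key):
    return [ord(c) - ord("A") for c in key] + [END]


def table_insert(table, key):
    node = 1
    for col in _columns(key):
        if table[node][col] == 0:            # no child yet: append a new row
            table.append([0] * (K + 1))
            table[node][col] = len(table) - 1
        node = table[node][col]


def table_member(table, key):
    node = 1
    for col in _columns(key):
        node = table[node][col]
        if node == 0:
            return False
    return True


table = new_table()
for name in ("TURING", "MENDEL", "MENDELEEV"):
    table_insert(table, name)

m = len(table) - 1                 # number of nodes
bits_per_entry = m.bit_length()    # enough bits for any value in 0..m
```

The number of entries is unchanged; the saving comes from each entry being a node number of a few bits instead of a full pointer.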
Patricia Tree
  Another strategy for reducing space in a trie
  Patricia: Practical Algorithm To Retrieve Information Coded In Alphanumeric
  Eliminate nodes with only one nonempty child
    Can now skip right from T to the end-of-string character in TURING in our example
    Skip from M to E or the end-of-string character in the MENDEL, MENDELEEV chain
  But need to store with each node the index of the character on which it discriminates
  And need to store the key itself at the leaf
Patricia tree
de la Briandais trees
  Another strategy to save space vs. standard tries
  Use a linked list instead of a table at the node level
    Each pointer is labeled with the character it indexes
  Longer search time than tries; depends on the size of the character set
  Saves significant amounts of memory
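A sketch of the linked-list idea, assuming keys never contain the character "$", which is used here as a stand-in end-of-string marker. Each node carries its character label, a link to its next sibling, and a link to its first child; the names are illustrative.

```python
END = "$"  # assumed end-of-string marker, not part of any key


class DLBNode:
    def __init__(self, char):
        self.char = char      # character labeling the link into this node
        self.sibling = None   # next alternative at the same level
        self.child = None     # first node of the next level


def _step(parent, char, create):
    """Find the child of `parent` labeled `char`, optionally creating it."""
    prev, node = None, parent.child
    while node is not None and node.char != char:   # linear scan of the list
        prev, node = node, node.sibling
    if node is None and create:
        node = DLBNode(char)
        if prev is None:
            parent.child = node
        else:
            prev.sibling = node
    return node


def dlb_insert(root, key):
    node = root
    for c in key + END:
        node = _step(node, c, create=True)


def dlb_member(root, key):
    node = root
    for c in key + END:
        node = _step(node, c, create=False)
        if node is None:
            return False
    return True


root = DLBNode(None)
for name in ("TURING", "MENDEL", "MENDELEEV"):
    dlb_insert(root, name)
```

Only characters that actually occur get a node, which is the memory saving. The sibling scan is what makes search time depend on the size of the character set.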
de la Briandais
Another strategy
  Use tries at the first few levels
  Use ordinary BSTs or de la Briandais trees at the lower levels
  Reasoning:
    Speed advantage at the top, without too much extra memory required
    Save space at the lower levels
Digital Search Trees
  Treat keys as bit strings (strings over the alphabet {0,1})
  Binary tree; search is directed left on 0, right on 1
  Each node contains not only two pointers but also a key that matches the bit-string prefix leading to that node
  Compare for equality before searching left or right
  If frequencies are known, store higher-frequency keys nearer the root
  Can be grown dynamically
  Expected search time: O(log n)
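A sketch of insertion and search in a digital search tree, assuming keys are distinct non-negative integers that fit in WIDTH bits and that bits are consumed from the most significant end. WIDTH and the function names are choices made for this example.

```python
WIDTH = 8  # assumed key length in bits


class DSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None    # followed when the current bit is 0
        self.right = None   # followed when the current bit is 1


def _bit(key, depth):
    """Bit of `key` examined at tree depth `depth`, most significant first."""
    return (key >> (WIDTH - 1 - depth)) & 1


def dst_insert(root, key):
    """Insert `key`; return the (possibly new) root."""
    if root is None:
        return DSTNode(key)
    node, depth = root, 0
    while node.key != key:                  # compare for equality first
        go_right = _bit(key, depth)
        nxt = node.right if go_right else node.left
        if nxt is None:                     # fell off the tree: attach here
            nxt = DSTNode(key)
            if go_right:
                node.right = nxt
            else:
                node.left = nxt
        node, depth = nxt, depth + 1
    return root


def dst_member(root, key):
    node, depth = root, 0
    while node is not None:
        if node.key == key:
            return True
        node = node.right if _bit(key, depth) else node.left
        depth += 1
    return False


root = None
for k in (0b10110000, 0b00010001, 0b11000011, 0b10000001, 0b00000111):
    root = dst_insert(root, k)
```

Unlike a trie, every node holds a key, so a search can stop as soon as the equality test succeeds. The bits of the key only decide which way to continue.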
Digital Search Tree