Code Specialization for Memory Efficient Hash Tries · Code Specialization for Memory Efficient...

Preview:

Citation preview

Code Specialization for Memory Efficient

Hash Tries

Michael Steindorfer, Jurgen Vinju Centrum Wiskunde & Informatica, Amsterdam, The Netherlands

0%

25%

50%

75%

100%

MapGeneric Specialized

45%

100%

2

0%

25%

50%

75%

100%

SetGeneric Specialized

22%

100%

3

• memory usage vs runtime

• size of source code or binary

• platform specifics

Hash Tries

Hash TriesFast Immutable Data Structures on the JVM

Hash Tries(Wide) Hash-Prefix Trees with Array Nodes

{32, 2, 4098, 34}

8

2

320 1 2 3 4 5 6 7 8

...... 31

034

1 2 3 4 5 6 7 8...

... 31

20 1 2 3

40984 5 6 7 8

...... 31

(a) 33 and 2

320 1 2 3 4 5 6

...... 31

034

1 2 3 4 5 6...

... 31

20 1 2 3

40984 5 6

...... 31

(b) 33 and 2

Figure 1. Inserting a sequence of numbers in a hash array mapped trie. The small index numbers refer to theposition in a sparse array.

insertion calculates the number’s hash code⇤:

hash(32) = . . . 000 00001 000002 = 0 0 0 0 0 1 032

hash(2) = . . . 000 00000 000102 = 0 0 0 0 0 0 232

hash(4098) = . . . 100 00000 000102 = 0 0 0 0 4 0 232

hash(34) = . . . 000 00001 000102 = 0 0 0 0 0 1 232

We first separate the hash codes in chunks of 5-bit to notate chunks as decimal values with rangesfrom 0 to 31. Then insertion places the values in a 32-nary tree (where each is encoded as a sparsearray), based on the hash code prefixes. The tree structure gets expanded until every prefix can beunambiguously stored.

To continue our example: 32 is inserted at the root node; 2 as well (because they do not sharea common prefix). 4098 shares the prefix path !0!2 with value 2, consequently it is placedunambiguously on level 3. 32 shares the prefix path !2 with 2 and 4098, but can be differentiatedon level 2 from both.

Note, that a chunk size of 5-bit for 32-bit hash codes results in trees with a maximal depth ofdlog32(232)e = 7.

In contrast Figure 3 illustrates an array-based hash-table. By comparing the visualizations of bothdata structures we can identify the following list of disadvantages of HAMTs over array-based hashtables (Section ?? provides evidence for our claims):

• Lookup has to follow a tree path of length between 1–7 nodes answer containment queries.Memory indirections and indexing into sparse arrays is less efficient than performing a singleindex-based lookup in continuous array.

⇤We assume a hash function for integers returns the argument, i.e. identity.

Copyright c� 0000 John Wiley & Sons, Ltd. Softw. Pract. Exper. (0000)Prepared using speauth.cls DOI: 10.1002/spe

2

320 1 2 3 4 5 6 7 8

...... 31

034

1 2 3 4 5 6 7 8...

... 31

20 1 2 3

40984 5 6 7 8

...... 31

(a) 33 and 2

320 1 2 3 4 5 6

...... 31

034

1 2 3 4 5 6...

... 31

20 1 2 3

40984 5 6

...... 31

(b) 33 and 2

Figure 1. Inserting a sequence of numbers in a hash array mapped trie. The small index numbers refer to theposition in a sparse array.

insertion calculates the number’s hash code⇤:

hash(32) = . . . 000 00001 000002 = 0 0 0 0 0 1 032

hash(2) = . . . 000 00000 000102 = 0 0 0 0 0 0 232

hash(4098) = . . . 100 00000 000102 = 0 0 0 0 4 0 232

hash(34) = . . . 000 00001 000102 = 0 0 0 0 0 1 232

We first separate the hash codes in chunks of 5-bit to notate chunks as decimal values with rangesfrom 0 to 31. Then insertion places the values in a 32-nary tree (where each is encoded as a sparsearray), based on the hash code prefixes. The tree structure gets expanded until every prefix can beunambiguously stored.

To continue our example: 32 is inserted at the root node; 2 as well (because they do not sharea common prefix). 4098 shares the prefix path !0!2 with value 2, consequently it is placedunambiguously on level 3. 32 shares the prefix path !2 with 2 and 4098, but can be differentiatedon level 2 from both.

Note, that a chunk size of 5-bit for 32-bit hash codes results in trees with a maximal depth ofdlog32(232)e = 7.

In contrast Figure 3 illustrates an array-based hash-table. By comparing the visualizations of bothdata structures we can identify the following list of disadvantages of HAMTs over array-based hashtables (Section ?? provides evidence for our claims):

• Lookup has to follow a tree path of length between 1–7 nodes answer containment queries.Memory indirections and indexing into sparse arrays is less efficient than performing a singleindex-based lookup in continuous array.

⇤We assume a hash function for integers returns the argument, i.e. identity.

Copyright c� 0000 John Wiley & Sons, Ltd. Softw. Pract. Exper. (0000)Prepared using speauth.cls DOI: 10.1002/spe

9

abstract class TrieSet implements java.util.Set {

TrieNode root; int size;

class TrieNode { int bitmap; Object[] contentAndSubTries; } }

4

0 2

320 1

0 434

2 4098

(a) Scala

320 2

034

1

20

40984

(b) Clojure

232

0

034

1

20

40984

(c) Clojure

Figure 3. Conceptual difference in tree layout between Clojure’s and Scala’s HAMT implementations.

Figure 4. Footprints of HAMT sets and HAMT maps in 32-bit and 64-bit environments. Defaulting to 5-bitprefix chunks.

subnode pointers. Michael IDon’t talk explicitly about Clojure/Scala, rather about mixed/separatedHAMT node designs.I

Division between internal nodes and leaf nodes. Scala’s HAMT implementations divide the treestructure into internal nodes and leaf nodes. The internal nodes amount for the hash-based prefixtree structure. The leaf nodes encapsulate the data tuple (i.e, a key in case of a set, a key/value pair

Copyright c� 0000 John Wiley & Sons, Ltd. Softw. Pract. Exper. (0000)Prepared using speauth.cls DOI: 10.1002/spe

10

abstract class TrieSet implements java.util.Set {

TrieNode root; int size;

class TrieNode { int bitmap; Object[] contentAndSubTries; } }

4

0 2

320 1

0 434

2 4098

(a) Scala

320 2

034

1

20

40984

(b) Clojure

232

0

034

1

20

40984

(c) Clojure

Figure 3. Conceptual difference in tree layout between Clojure’s and Scala’s HAMT implementations.

Figure 4. Footprints of HAMT sets and HAMT maps in 32-bit and 64-bit environments. Defaulting to 5-bitprefix chunks.

subnode pointers. Michael IDon’t talk explicitly about Clojure/Scala, rather about mixed/separatedHAMT node designs.I

Division between internal nodes and leaf nodes. Scala’s HAMT implementations divide the treestructure into internal nodes and leaf nodes. The internal nodes amount for the hash-based prefixtree structure. The leaf nodes encapsulate the data tuple (i.e, a key in case of a set, a key/value pair

Copyright c� 0000 John Wiley & Sons, Ltd. Softw. Pract. Exper. (0000)Prepared using speauth.cls DOI: 10.1002/spe

11

abstract class TrieSet implements java.util.Set {

TrieNode root; int size;

class TrieNode { int bitmap; Object[] contentAndSubTries; } }

4

0 2

320 1

0 434

2 4098

(a) Scala

320 2

034

1

20

40984

(b) Clojure

232

0

034

1

20

40984

(c) Clojure

Figure 3. Conceptual difference in tree layout between Clojure’s and Scala’s HAMT implementations.

Figure 4. Footprints of HAMT sets and HAMT maps in 32-bit and 64-bit environments. Defaulting to 5-bitprefix chunks.

subnode pointers. Michael IDon’t talk explicitly about Clojure/Scala, rather about mixed/separatedHAMT node designs.I

Division between internal nodes and leaf nodes. Scala’s HAMT implementations divide the treestructure into internal nodes and leaf nodes. The internal nodes amount for the hash-based prefixtree structure. The leaf nodes encapsulate the data tuple (i.e, a key in case of a set, a key/value pair

Copyright c� 0000 John Wiley & Sons, Ltd. Softw. Pract. Exper. (0000)Prepared using speauth.cls DOI: 10.1002/spe

12

13

... class NodeNode extends TrieNode { int bitmap; TrieNode nodeAtIndex0; TrieNode nodeAtIndex1; } class ElementNode extends TrieNode { int bitmap; Object keyAtIndex0; TrieNode nodeAtIndex1; } class NodeElement extends TrieNode { int bitmap; TrieNode nodeAtIndex0; Object keyAtIndex1; } ...

class TrieNode { int bitmap; Object[] contentAndSubTries; }

class TrieNode { int bitmap; Object[] contentAndSubTries; }

14

... class NodeNode extends TrieNode { int bitmap; TrieNode nodeAtIndex0; TrieNode nodeAtIndex1; } class ElementNode extends TrieNode { int bitmap; Object keyAtIndex0; TrieNode nodeAtIndex1; } class NodeElement extends TrieNode { int bitmap; TrieNode nodeAtIndex0; Object keyAtIndex1; } ...

ExponentialNumber of Specializations

15

Memory Overhead per Pointer (Set, 32-bit)

0 Bytes

8 Bytes

16 Bytes

24 Bytes

32 Bytes

40 Bytes

48 Bytes

1-ary 2-ary 3-ary 4-ary 5-ary 6-ary 7-ary 8-ary 9-ary 10-ary 11-ary 12-ary

7,38,08,08,99,010,310,712,814,0

18,7

24,0

48,0

16

Frequency by Node Arity

0%

10%

20%

30%

40%

50%

60%

70%

0-ary 1-ary 2-ary 3-ary 4-ary 5-ary 6-ary 7-ary 8-ary 9-ary 10-ary 11-ary 12-ary

1%1%1%1%1%1%1%1%3%

14%

63%

1%0%

17

Arities % of Nodes

≤4 82%

≤8 86%

≤12 90%

18

Arities Specializations

≤4 31

≤8 511

≤12 8191

19

Avoiding Permutations

20

Arities Specializations

≤4 15 (31)

≤8 45 (511)

≤12 91 (8191)

21

abstract class TrieSet implements java.util.Set { TrieNode root; int size;

interface TrieNode { ... } ... class NodeNode extends TrieNode { int bitmap; TrieNode nodeAtIndex0; TrieNode nodeAtIndex1; } class ElementNode extends TrieNode { int bitmap; Object keyAtIndex0; TrieNode nodeAtIndex1; } class NodeElement extends TrieNode { int bitmap; TrieNode nodeAtIndex0; Object keyAtIndex1; } class ElementElement extends TrieNode { int bitmap; Object keyAtIndex0; Object keyAtIndex1; } ... }

4

0 2

320 1

0 434

2 4098

(a) Scala

320 2

034

1

20

40984

(b) Clojure

232

0

034

1

20

40984

(c) Clojure

Figure 3. Conceptual difference in tree layout between Clojure’s and Scala’s HAMT implementations.

Figure 4. Footprints of HAMT sets and HAMT maps in 32-bit and 64-bit environments. Defaulting to 5-bitprefix chunks.

subnode pointers. Michael IDon’t talk explicitly about Clojure/Scala, rather about mixed/separatedHAMT node designs.I

Division between internal nodes and leaf nodes. Scala’s HAMT implementations divide the treestructure into internal nodes and leaf nodes. The internal nodes amount for the hash-based prefixtree structure. The leaf nodes encapsulate the data tuple (i.e, a key in case of a set, a key/value pair

Copyright c� 0000 John Wiley & Sons, Ltd. Softw. Pract. Exper. (0000)Prepared using speauth.cls DOI: 10.1002/spe

22

23

4

0 2

320 1

0 434

2 4098

(a) Scala

320 2

034

1

20

40984

(b) Clojure

232

0

034

1

20

40984

(c) Clojure

Figure 3. Conceptual difference in tree layout between Clojure’s and Scala’s HAMT implementations.

Figure 4. Footprints of HAMT sets and HAMT maps in 32-bit and 64-bit environments. Defaulting to 5-bitprefix chunks.

subnode pointers. Michael IDon’t talk explicitly about Clojure/Scala, rather about mixed/separatedHAMT node designs.I

Division between internal nodes and leaf nodes. Scala’s HAMT implementations divide the treestructure into internal nodes and leaf nodes. The internal nodes amount for the hash-based prefixtree structure. The leaf nodes encapsulate the data tuple (i.e, a key in case of a set, a key/value pair

Copyright c� 0000 John Wiley & Sons, Ltd. Softw. Pract. Exper. (0000)Prepared using speauth.cls DOI: 10.1002/spe

abstract class TrieSet implements java.util.Set { TrieNode root; int size;

interface TrieNode { ... } ... class NodeNode extends TrieNode {

byte pos1; TrieNode nodeAtPos1; byte pos2; TrieNode nodeAtPos2; } class ElementNode extends TrieNode {

byte pos1; Object keyAtPos1; byte pos2; TrieNode nodeAtPos2; } class NodeElement extends TrieNode {

byte pos1; TrieNode nodeAtPos1; byte pos2; Object keyAtPos2; } class ElementElement extends TrieNode {

byte pos1; Object keyAtPos1; byte pos2; Object keyAtPos2; } ... }

Lookup Performance (lower is better)

0%

25%

50%

75%

100%

125%

150%

MapGeneric Specialized 0-4 Specialized 0-8 Specialized 0-12

138%138%130%

100%

24

Memory Usage (lower is better)

0%

25%

50%

75%

100%

Map Set

22%

45%

23%

46%52%

62%

100%100% Generic0-40-80-12

25

Memory Usage (lower is better)

0%

25%

50%

75%

100%

Map Set

22%

45%

23%

46%52%

62%

100%100% Generic0-40-80-12

26

Memory Footprint Compared To Competition (lower is better)

0x

1x

2x

3x

4x

5x

Specialized 0-8 Clojure Scala

3,75x

2,2x

1x

4,9x

1,6x

1x

MapSet

27

worst hash distribution ->

good memory performance

28

best hash distribution ->

worst memory performance

29

best hash distribution ->

worst memory performancebest

30

Recommended