Code Specialization for Memory Efﬁcient Hash Tries · Code Specialization for Memory Efﬁcient...

Code Specialization for Memory Efficient

Hash Tries

Michael Steindorfer, Jurgen Vinju Centrum Wiskunde & Informatica, Amsterdam, The Netherlands

MapGeneric Specialized

SetGeneric Specialized

• memory usage vs runtime

• size of source code or binary

• platform specifics

Hash Tries

Hash TriesFast Immutable Data Structures on the JVM

Hash Tries(Wide) Hash-Prefix Trees with Array Nodes

{32, 2, 4098, 34}

320 1 2 3 4 5 6 7 8

...... 31

1 2 3 4 5 6 7 8...

... 31

20 1 2 3

40984 5 6 7 8

...... 31

(a) 33 and 2

320 1 2 3 4 5 6

...... 31

1 2 3 4 5 6...

... 31

20 1 2 3

40984 5 6

...... 31

(b) 33 and 2

Figure 1. Inserting a sequence of numbers in a hash array mapped trie. The small index numbers refer to theposition in a sparse array.

insertion calculates the number’s hash code⇤:

hash(32) = . . . 000 00001 000002 = 0 0 0 0 0 1 032

hash(2) = . . . 000 00000 000102 = 0 0 0 0 0 0 232

hash(4098) = . . . 100 00000 000102 = 0 0 0 0 4 0 232

hash(34) = . . . 000 00001 000102 = 0 0 0 0 0 1 232

We first separate the hash codes in chunks of 5-bit to notate chunks as decimal values with rangesfrom 0 to 31. Then insertion places the values in a 32-nary tree (where each is encoded as a sparsearray), based on the hash code prefixes. The tree structure gets expanded until every prefix can beunambiguously stored.

To continue our example: 32 is inserted at the root node; 2 as well (because they do not sharea common prefix). 4098 shares the prefix path !0!2 with value 2, consequently it is placedunambiguously on level 3. 32 shares the prefix path !2 with 2 and 4098, but can be differentiatedon level 2 from both.

Note, that a chunk size of 5-bit for 32-bit hash codes results in trees with a maximal depth ofdlog32(232)e = 7.

In contrast Figure 3 illustrates an array-based hash-table. By comparing the visualizations of bothdata structures we can identify the following list of disadvantages of HAMTs over array-based hashtables (Section ?? provides evidence for our claims):

• Lookup has to follow a tree path of length between 1–7 nodes answer containment queries.Memory indirections and indexing into sparse arrays is less efficient than performing a singleindex-based lookup in continuous array.

⇤We assume a hash function for integers returns the argument, i.e. identity.

Copyright c� 0000 John Wiley & Sons, Ltd. Softw. Pract. Exper. (0000)Prepared using speauth.cls DOI: 10.1002/spe

320 1 2 3 4 5 6 7 8

...... 31

1 2 3 4 5 6 7 8...

... 31

20 1 2 3

40984 5 6 7 8

...... 31

(a) 33 and 2

320 1 2 3 4 5 6

...... 31

1 2 3 4 5 6...

... 31

20 1 2 3

40984 5 6

...... 31

(b) 33 and 2

Figure 1. Inserting a sequence of numbers in a hash array mapped trie. The small index numbers refer to theposition in a sparse array.

insertion calculates the number’s hash code⇤:

hash(32) = . . . 000 00001 000002 = 0 0 0 0 0 1 032

hash(2) = . . . 000 00000 000102 = 0 0 0 0 0 0 232

hash(4098) = . . . 100 00000 000102 = 0 0 0 0 4 0 232

hash(34) = . . . 000 00001 000102 = 0 0 0 0 0 1 232

We first separate the hash codes in chunks of 5-bit to notate chunks as decimal values with rangesfrom 0 to 31. Then insertion places the values in a 32-nary tree (where each is encoded as a sparsearray), based on the hash code prefixes. The tree structure gets expanded until every prefix can beunambiguously stored.

To continue our example: 32 is inserted at the root node; 2 as well (because they do not sharea common prefix). 4098 shares the prefix path !0!2 with value 2, consequently it is placedunambiguously on level 3. 32 shares the prefix path !2 with 2 and 4098, but can be differentiatedon level 2 from both.

Note, that a chunk size of 5-bit for 32-bit hash codes results in trees with a maximal depth ofdlog32(232)e = 7.

In contrast Figure 3 illustrates an array-based hash-table. By comparing the visualizations of bothdata structures we can identify the following list of disadvantages of HAMTs over array-based hashtables (Section ?? provides evidence for our claims):

• Lookup has to follow a tree path of length between 1–7 nodes answer containment queries.Memory indirections and indexing into sparse arrays is less efficient than performing a singleindex-based lookup in continuous array.

⇤We assume a hash function for integers returns the argument, i.e. identity.

abstract class TrieSet implements java.util.Set {

TrieNode root; int size;

class TrieNode { int bitmap; Object[] contentAndSubTries; } }

2 4098

(a) Scala

(b) Clojure

(c) Clojure

Figure 3. Conceptual difference in tree layout between Clojure’s and Scala’s HAMT implementations.

Figure 4. Footprints of HAMT sets and HAMT maps in 32-bit and 64-bit environments. Defaulting to 5-bitprefix chunks.

subnode pointers. Michael IDon’t talk explicitly about Clojure/Scala, rather about mixed/separatedHAMT node designs.I

Division between internal nodes and leaf nodes. Scala’s HAMT implementations divide the treestructure into internal nodes and leaf nodes. The internal nodes amount for the hash-based prefixtree structure. The leaf nodes encapsulate the data tuple (i.e, a key in case of a set, a key/value pair

2 4098

(a) Scala

(b) Clojure

(c) Clojure

2 4098

(a) Scala

(b) Clojure

(c) Clojure

... class NodeNode extends TrieNode { int bitmap; TrieNode nodeAtIndex0; TrieNode nodeAtIndex1; } class ElementNode extends TrieNode { int bitmap; Object keyAtIndex0; TrieNode nodeAtIndex1; } class NodeElement extends TrieNode { int bitmap; TrieNode nodeAtIndex0; Object keyAtIndex1; } ...

class TrieNode { int bitmap; Object[] contentAndSubTries; }

... class NodeNode extends TrieNode { int bitmap; TrieNode nodeAtIndex0; TrieNode nodeAtIndex1; } class ElementNode extends TrieNode { int bitmap; Object keyAtIndex0; TrieNode nodeAtIndex1; } class NodeElement extends TrieNode { int bitmap; TrieNode nodeAtIndex0; Object keyAtIndex1; } ...

ExponentialNumber of Specializations

Memory Overhead per Pointer (Set, 32-bit)

0 Bytes

8 Bytes

16 Bytes

24 Bytes

32 Bytes

40 Bytes

48 Bytes

1-ary 2-ary 3-ary 4-ary 5-ary 6-ary 7-ary 8-ary 9-ary 10-ary 11-ary 12-ary

7,38,08,08,99,010,310,712,814,0

Frequency by Node Arity

0-ary 1-ary 2-ary 3-ary 4-ary 5-ary 6-ary 7-ary 8-ary 9-ary 10-ary 11-ary 12-ary

1%1%1%1%1%1%1%1%3%

Arities % of Nodes

≤4 82%

≤8 86%

≤12 90%

Arities Specializations

≤4 31

≤8 511

≤12 8191

Avoiding Permutations

Arities Specializations

≤4 15 (31)

≤8 45 (511)

≤12 91 (8191)

abstract class TrieSet implements java.util.Set { TrieNode root; int size;

interface TrieNode { ... } ... class NodeNode extends TrieNode { int bitmap; TrieNode nodeAtIndex0; TrieNode nodeAtIndex1; } class ElementNode extends TrieNode { int bitmap; Object keyAtIndex0; TrieNode nodeAtIndex1; } class NodeElement extends TrieNode { int bitmap; TrieNode nodeAtIndex0; Object keyAtIndex1; } class ElementElement extends TrieNode { int bitmap; Object keyAtIndex0; Object keyAtIndex1; } ... }

2 4098

(a) Scala

(b) Clojure

(c) Clojure

2 4098

(a) Scala

(b) Clojure

(c) Clojure

abstract class TrieSet implements java.util.Set { TrieNode root; int size;

interface TrieNode { ... } ... class NodeNode extends TrieNode {

byte pos1; TrieNode nodeAtPos1; byte pos2; TrieNode nodeAtPos2; } class ElementNode extends TrieNode {

byte pos1; Object keyAtPos1; byte pos2; TrieNode nodeAtPos2; } class NodeElement extends TrieNode {

byte pos1; TrieNode nodeAtPos1; byte pos2; Object keyAtPos2; } class ElementElement extends TrieNode {

byte pos1; Object keyAtPos1; byte pos2; Object keyAtPos2; } ... }

Lookup Performance (lower is better)

MapGeneric Specialized 0-4 Specialized 0-8 Specialized 0-12

138%138%130%

Memory Usage (lower is better)

Map Set

46%52%

100%100% Generic0-40-80-12

Memory Usage (lower is better)

Map Set

46%52%

100%100% Generic0-40-80-12

Memory Footprint Compared To Competition (lower is better)

Specialized 0-8 Clojure Scala

MapSet

worst hash distribution ->

good memory performance

best hash distribution ->

worst memory performance

best hash distribution ->

worst memory performancebest

Code Specialization for Memory Efﬁcient Hash Tries · Code Specialization for Memory Efﬁcient...

Documents

Indexação e Hashing - Construção de Índices e Funções Hash · Funções Hash Dinâmicas ∙ A função hash é modiﬁcada dinamicamente Hash Extensível (Extendible Hashing)

AAS SPECIALIZATION. IT SPECIALIZATION C# Java VB.NET Networking Oracle BA SPECIALIZATION Management Accounting International Business

tvtonight.com.au€¦ · Web viewKatie tries a modern version of a classic Hawaiian breakfast in Honolulu, gets a bite of corn beef hash in Gulfport, Florida and livens things up

Optimizing Hash-Array Mapped Tries for Fast and Lean ... · lections under-the-hood. Example Java-based systems in this category are JRelCal [25], JGraLab [12] and Rascal [18]. Implementations

Cryptographic Hash functions Hash Functions RIPEMD-160 and

Cerebral Specialization 1 Cerebral Specialization during ... · Cerebral Specialization 1 Cerebral Specialization during Lucid Dreaming A Right Hemisphere Hypothesis Robert Piller

Optimizing Hash-tries for Fast and Lean Immutable ...homepages.cwi.nl/~jurgenv/presentations/IFIP2014.pdf · [Michelangelo di Lodovico Buonarroti Simoni] SWAT - SoftWare Analysis

Sudan Medical Specialization Board Pharmacy Specialization Board Nile -FMP.pdf · · 2018-03-25Sudan Medical Specialization Board Pharmacy Specialization Board ... There are two

Recommendation for applications using approved … WORDS: digital signatures, hash algorithm, hash function, hash-based key derivation algorithms, hash value, HMAC, message digest,

Fuzzy Hashing - Jesse Kornblumjessekornblum.com/presentations/cdfsl07.pdfRolling Hash!A different kind of hash function! ... Fuzzy Hashing!Combine Rolling Hash with a Traditional Hash!Use

Hash TableHash Table - ORCCAyxie/courses/cs2210b-2011/htmls/notes/05-hash... · Hash TableHash Table Outline • motivation • hash functions • collision handling 1 Courtesy to

Dictionaries and Hash Tables - Purdue University · Dictionaries and Hash Tables 4 Hash Functions and Hash Tables (§8.2) A hash function h maps keys of a given type to integers in

Hash Functions 1 Hash Functions Hash Functions 2 Cryptographic Hash Function Crypto hash function h(x) must provide o Compression output length is

Evolving Hash Functions using Genetic Algorithmsajiips.com.au/papers/V4.1/V4N1.4 - Evolving Hash Functions using... · hash function called "PKP Hash" by Peter.K.Pearson [5] that

ACADEMIC PROGRAM REQUIREMENTS - burmanu.ca 16 17... · Biology Specialization Business Specialization English Specialization Mathematics Specialization Music Specialization Religious

Olympics, hash lyrics, forthcoming events, hash humour etc

COSC 1030 Lecture 10 Hash Table. Topics Table Hash Concept Hash Function Resolve collision Complexity Analysis

1 Lecture #14 The Modulus Operator Hash Tables –Closed hash tables Inserting, Searching, Deleting –Open hash tables –Hash table efficiency and “load factor”

1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

1 Binary Search Tree vs. Hash Table Binary Search Tree vs. Hash Table Hash Function (quick intro) Hash Function (quick intro) Collision Collision Coping