INSTITUTE OF THEORETICAL INFORMATICS – ALGORITHMICS
Scalable String and Suffix Sorting:Algorithms, Techniques, and ToolsTimo Bingmann · Dissertation Defense · July 3rd, 2018
KIT – The Research University in the Helmholtz Association www.kit.edu
Overview
Multi-CoreScalable String Sorting
External and DistributedScalable Suffix Sorting
a l p h a 0⊥a r c a d e 01
a r r a y 02
k a y a k 00
k e r n e l 01
k i t 01
k i t c h e n 03
k i t t e n 03
k r y p t o n 01
⊥0
1
4
0
2
5
0
3
$
a $
a c b a $
a c b a c b a $
b a $
b a c b a $
b a c b a c b a $
c b a $
c b a c b a $
AlgorithmEngineering
Timo Bingmann – Scalable String and Suffix Sorting: Algorithms, Techniques, and ToolsInstitute of Theoretical Informatics – Algorithmics July 3rd, 2018 2 / 29
Sorting Strings
a r r a y 0
k i t 0
a r r a n g e 0
k a y a k 0
k e r n e l 0
k i t c h e n 0
k i t t e n 0
a r c a d e 0
k i t t e n 0
a b a c u s 0
k r y p t o n 0
a l p h a 0
a r c a n e 0
⇒
a b a c u s 0⊥a l p h a 01
a r c a d e 01
a r c a n e 04
a r r a n g e 02
a r r a y 04
k a y a k 00
k e r n e l 01
k i t 01
k i t c h e n 03
k i t t e n 03
k i t t e n 06
k r y p t o n 01
DLCP
Input: n strings containing N characters in total.
Timo Bingmann – Scalable String and Suffix Sorting: Algorithms, Techniques, and ToolsInstitute of Theoretical Informatics – Algorithmics July 3rd, 2018 3 / 29
String Sorting AlgorithmsTheoretical Parallel Algorithms
“Optimal Parallel String Algorithms: . . .” [Hagerup ’94]O(log N/ log log N) time and O(N log log N) work on CRCW PRAM
Existing Basic Sequential Algorithms
Radix Sort O(D + n logσ) [McIlroy et al. ’95]
Multikey Quicksort O(D + n log n) exp. [Bentley, Sedgewick ’97]
Burstsort O(D + n logσ) exp. [Sinha, Zobel ’04]
Binary LCP-Mergesort O(D + n log n) [Ng, Kakehi ’08]
Existing Algorithm Library
in C/C++ by Rantala (for Engineering Radix Sort [Kärkkäinen, Rantala ’09])
Our Contributions: New Basic and Practical Parallel Algorithms
Parallel Super Scalar String Sample Sort (pS5) [B, Sanders, ESA’13]
Parallel K -way LCP-aware Mergesort (and Merge) [B, et al. Algorithmica’17]
Timo Bingmann – Scalable String and Suffix Sorting: Algorithms, Techniques, and ToolsInstitute of Theoretical Informatics – Algorithmics July 3rd, 2018 4 / 29
Super Scalar String Sample Sort (S5)
a r r a y 0
k i t 0
a r r a n g e 0
k a y a k 0
k e r n e l 0
k i t c h e n 0
k i t t e n 0
a r c a d e 0
k i t e 0
a b a c u s 0
k r y p t o n 0
a l p h a 0
a r c a n e 0
x1
x0 x2
b0 b2 b4 b6
< >
< > < >
based on Super Scalar Sample Sort[Sanders, Winkel ’04]
Timo Bingmann – Scalable String and Suffix Sorting: Algorithms, Techniques, and ToolsInstitute of Theoretical Informatics – Algorithmics July 3rd, 2018 5 / 29
Super Scalar String Sample Sort (S5)
a r r a y 0
k i t 0
a r r a n g e 0
k a y a k 0
k e r n e l 0
k i t c h e n 0
k i t t e n 0
a r c a d e 0
k i t e 0
a b a c u s 0
k r y p t o n 0
a l p h a 0
a r c a n e 0
x1
x0 x2
b0 b2 b4 b6
< >
< > < >
based on Super Scalar Sample Sort[Sanders, Winkel ’04]
Timo Bingmann – Scalable String and Suffix Sorting: Algorithms, Techniques, and ToolsInstitute of Theoretical Informatics – Algorithmics July 3rd, 2018 6 / 29
Super Scalar String Sample Sort (S5)
a r r a y 0
k i t 0
a r r a n g e 0
k a y a k 0
k e r n e l 0
k i t c h e n 0
k i t t e n 0
a r c a d e 0
k i t e 0
a b a c u s 0
k r y p t o n 0
a l p h a 0
a r c a n e 0
ar
ab ki
b0 b2 b4 b6
< >
< > < >
partition by w chars
store in level-order anduse predicated instructions
ar ab ki
1 2 3
i := 2i + 0/1
Timo Bingmann – Scalable String and Suffix Sorting: Algorithms, Techniques, and ToolsInstitute of Theoretical Informatics – Algorithmics July 3rd, 2018 7 / 29
Super Scalar String Sample Sort (S5)
a r r a y 0
k i t 0
a r r a n g e 0
k a y a k 0
k e r n e l 0
k i t c h e n 0
k i t t e n 0
a r c a d e 0
k i t e 0
a b a c u s 0
k r y p t o n 0
a l p h a 0
a r c a n e 0
ar
ab ki
b0 b2 b4 b6b1 b3 b5
< >
< > < >
=
= =
equality checking:1 at each splitter2 after full descent
interleave tree descents:classify four strings at once⇒ super scalar parallelism
Timo Bingmann – Scalable String and Suffix Sorting: Algorithms, Techniques, and ToolsInstitute of Theoretical Informatics – Algorithmics July 3rd, 2018 8 / 29
Super Scalar String Sample Sort (S5)
a r r a y 0
k i t 0
a r r a n g e 0
k a y a k 0
k e r n e l 0
k i t c h e n 0
k i t t e n 0
a r c a d e 0
k i t e 0
a b a c u s 0
k r y p t o n 0
a l p h a 0
a r c a n e 0
ar
ab ki
b0 b1 b2 b3 b4 b5 b6
b0 b1 b2 b3 b4 b5 b6
b0 b1 b2 b3 b4 b5 b6
< >
< > < >
=
= =
easy parallelization
classification tree inL2 caches of processors
Timo Bingmann – Scalable String and Suffix Sorting: Algorithms, Techniques, and ToolsInstitute of Theoretical Informatics – Algorithmics July 3rd, 2018 9 / 29
Super Scalar String Sample Sort (S5)
a b a c u s 0
a l p h a 0
a r r a y 0
a r r a n g e 0
a r c a d e 0
a r c a n e 0
k a y a k 0
k e r n e l 0
k i t 0
k i t c h e n 0
k i t t e n 0
k i t e 0
k r y p t o n 0
prefix21
2
0
2
0
ar
ab ki
0 0 0 2 2 1 00 1 0 1 0 3 00 0 1 1 0 0 1
< >
< > < >
=
= =
1 0
increase prefix byLCP of splitters
or key size.
reorder out-of-place, in-place,and/or in parallel
top-level algorithm in parallel S5
Timo Bingmann – Scalable String and Suffix Sorting: Algorithms, Techniques, and ToolsInstitute of Theoretical Informatics – Algorithmics July 3rd, 2018 10 / 29
LCP Loser Tree – K -way LCP-Merge
a a 0⊥a a a 02
(2, aab)
(1, acb)
(2, aac) (0,bca)
(2, aab) (2, aac) (0,bca) (1, acb)
LCP-Merge
a a b 0⊥a b a 01
a a c 0⊥a a d 02
b c a 0⊥c a 00
a c b 0⊥d a 00
needs at most∆L + n log2 K + Kcharacter comparisons
Timo Bingmann – Scalable String and Suffix Sorting: Algorithms, Techniques, and ToolsInstitute of Theoretical Informatics – Algorithmics July 3rd, 2018 11 / 29
LCP Loser Tree – K -way LCP-Merge
a a 0⊥a a a 02
(2, aab)
(1, acb)
(2, aac) (0,bca)
(2, aab) (2, aac) (0,bca) (1, acb)
LCP-Merge
a a b 0⊥a b a 01
a a c 0⊥a a d 02
b c a 0⊥c a 00
a c b 0⊥d a 00
needs at most∆L + n log2 K + Kcharacter comparisons
Timo Bingmann – Scalable String and Suffix Sorting: Algorithms, Techniques, and ToolsInstitute of Theoretical Informatics – Algorithmics July 3rd, 2018 12 / 29
Contributed String Sorting AlgorithmsParallel Super Scalar String Sample Sort (pS5)
fully parallel S5, sequential S5, and fast base case sorterssequential running time of S5:O( D
w + n log n) expected time with equality checks, andO(( D
w + n) log v + n log n) expected time with unrolled descents.parallel running time of a single step of fully parallel S5:O( n
p log v + log p) time and O(n log v + pv) work.
Hybrid NUMA-aware pS5 + K -way LCP-Merge
Parallel Multikey Quicksort
Parallel Radix Sort (Adaptive 16-bit and 8-bit)
Additional Algorithms:
(Parallel) Multiway LCP-aware Mergesort O(D + n log n + nK )
Sequential LCP-aware Insertion Sort O(D + n2)
Timo Bingmann – Scalable String and Suffix Sorting: Algorithms, Techniques, and ToolsInstitute of Theoretical Informatics – Algorithmics July 3rd, 2018 13 / 29
128 GiB GOV2 – Speedup on 32-Core Intel
1 8 16 32 48 640
5
10
15
20
number of threads
speedup
pS5
pS5 + 4-way LCP-MergepMultikeyQuicksortpRadixSort (16-bit)Akiba pRadixSort
Input characteristics: n = 3.1 G, N = 128 Gi, DN = 82.7 %.
Timo Bingmann – Scalable String and Suffix Sorting: Algorithms, Techniques, and ToolsInstitute of Theoretical Informatics – Algorithmics July 3rd, 2018 14 / 29
Overview
Multi-CoreScalable String Sorting
External and DistributedScalable Suffix Sorting
a l p h a 0⊥a r c a d e 01
a r r a y 02
k a y a k 00
k e r n e l 01
k i t 01
k i t c h e n 03
k i t t e n 03
k r y p t o n 01
⊥0
1
4
0
2
5
0
3
$
a $
a c b a $
a c b a c b a $
b a $
b a c b a $
b a c b a c b a $
c b a $
c b a c b a $
AlgorithmEngineering
Timo Bingmann – Scalable String and Suffix Sorting: Algorithms, Techniques, and ToolsInstitute of Theoretical Informatics – Algorithmics July 3rd, 2018 15 / 29
Example T = [0 1 2 3 4 5 6 7 8 9 10 11 12 13
t obeo rno t t obe$]
i LCPi Ti
0 0 t o b e o r n o t t o b e $
1 0 o b e o r n o t t o b e $
2 0 b e o r n o t t o b e $
3 0 e o r n o t t o b e $
4 0 o r n o t t o b e $
5 0 r n o t t o b e $
6 0 n o t t o b e $
7 0 o t t o b e $
8 0 t t o b e $
9 0 t o b e $
10 0 o b e $
11 0 b e $
12 0 e $
13 0 $
Timo Bingmann – Scalable String and Suffix Sorting: Algorithms, Techniques, and ToolsInstitute of Theoretical Informatics – Algorithmics July 3rd, 2018 16 / 29
Example T = [0 1 2 3 4 5 6 7 8 9 10 11 12 13
t obeo rno t t obe$]
SAi LCPi TSAi...n
13 - $
11 0 b e $
2 2 b e o r n o t t o b e $
12 0 e $
3 1 e o r n o t t o b e $
6 0 n o t t o b e $
10 0 o b e $
1 3 o b e o r n o t t o b e $
4 1 o r n o t t o b e $
7 1 o t t o b e $
5 0 r n o t t o b e $
9 1 t o b e $
0 4 t o b e o r n o t t o b e $
8 1 t t o b e $
Timo Bingmann – Scalable String and Suffix Sorting: Algorithms, Techniques, and ToolsInstitute of Theoretical Informatics – Algorithmics July 3rd, 2018 17 / 29
Pre
fixD
oubl
ing
Induced Copying Recursion
MM90original
BW94BWT
Far97O(n) tree
LS99qsufsort
Sew001/2 copy
IT99A/B copy
KSPP03mod2 split
KS03DC
HSS03mod2
BK03diffcover
KA03L/S split
MF02deep-shallow
KJP04fixed ΣSS05
bpr MP06ISA chains
Mor06divsufsort MP08
cache aware
NZ07O(n log |Σ|)
NZC09aSA-IS
NZC09bSA-DS
Non13SACA-K
LLH16O(1) space
Got17O(1) space
AN10SFE-coding
Bai16prev ptr
19992000
2003
2004
200520062008
2009
2010
20132016
2017
Pre
fixD
oubl
ing
Induced Copying Recursion
MM90original
BW94BWT
Far97O(n) tree
LS99qsufsort
Sew001/2 copy
IT99A/B copy
KSPP03mod2 split
KS03DC
HSS03mod2
BK03diffcover
KA03L/S split
MF02deep-shallow
KJP04fixed ΣSS05
bpr MP06ISA chains
Mor06divsufsort MP08
cache aware
NZ07O(n log |Σ|)
NZC09aSA-IS
NZC09bSA-DS
Non13SACA-K
LLH16O(1) space
Got17O(1) space
AN10SFE-coding
Bai16prev ptr
19992000
2003
2004
200520062008
2009
2010
20132016
2017
Pre
fixD
oubl
ing
Induced Copying Recursion
MM90original
BW94BWT
Far97O(n) tree
LS99qsufsort
Sew001/2 copy
IT99A/B copy
KSPP03mod2 split
KS03DC
HSS03mod2
BK03diffcover
KA03L/S split
MF02deep-shallow
KJP04fixed ΣSS05
bpr MP06ISA chains
Mor06divsufsort MP08
cache aware
NZ07O(n log |Σ|)
NZC09aSA-IS
NZC09bSA-DS
Non13SACA-K
LLH16O(1) space
Got17O(1) space
AN10SFE-coding
Bai16prev ptr
19992000
2003
2004
200520062008
2009
2010
20132016
2017
prefix doubling – [CF02]O(sort(n)·log(maxlcp)) I/Os
prefix doubling – [DKMS05]sort(5n)·log2(maxlcp) +O(sort(n)) I/Os
DC7 – [DKMS05]sort(24.75n) + scan(6n) I/Os
eSAIS – [B, Fischer, Osipov, ALENEX’13, JEA’16]sort(17n) + scan(9n) I/Os
fSAIS – [KKPZ17]O(scan(n)·log2
M/B(n/B)) I/Os
External Memory Algorithms
STX LX
bwUniCluster© KIT (SCC)
Big Data Batch Processing
InterfaceLow LevelDifficult
High LevelSimple
Perfo
rman
ceS
low
Fast MPI
MapReduceHadoop
ApacheSpark
ApacheFlink
Our Requirements:compound primitives intocomplex algorithms
efficient simple data types,
overlap computation andcommunication,
automatic disk usage,
C++, and much more...
New Framework:
Thrill
Lower Layersof Thrill
Timo Bingmann – Scalable String and Suffix Sorting: Algorithms, Techniques, and ToolsInstitute of Theoretical Informatics – Algorithmics July 3rd, 2018 22 / 29
Distributed Immutable Array (DIA)User Programmer’s View:
DIA<T> = distributed array of items T on the cluster
Cannot access items directly, instead use small set of scalableprimitives, for example: Map, Sort, ReduceByKey, Zip, Window, etc.
A
A.Map(·) =: B
B.Sort(·) =: C
PE0 PE1 PE2 PE3
Framework Designer’s View:
Goals: distribute work, optimize execution on cluster, add redundancywhere applicable. =⇒ build data-flow graph.
DIA<T> = pipelined chain of computations
Timo Bingmann – Scalable String and Suffix Sorting: Algorithms, Techniques, and ToolsInstitute of Theoretical Informatics – Algorithmics July 3rd, 2018 23 / 29
Distributed Immutable Array (DIA)User Programmer’s View:
DIA<T> = distributed array of items T on the cluster
Cannot access items directly, instead use small set of scalableprimitives, for example: Map, Sort, ReduceByKey, Zip, Window, etc.
A
A.Map(·) =: B
B.Sort(·) =: C
PE0 PE1 PE2 PE3A
B := A.Map()
C := B.Sort()
C
Framework Designer’s View:
Goals: distribute work, optimize execution on cluster, add redundancywhere applicable. =⇒ build data-flow graph.
DIA<T> = pipelined chain of computations
Timo Bingmann – Scalable String and Suffix Sorting: Algorithms, Techniques, and ToolsInstitute of Theoretical Informatics – Algorithmics July 3rd, 2018 24 / 29
Distributed Immutable Array (DIA)User Programmer’s View:
DIA<T> = distributed array of items T on the cluster
Cannot access items directly, instead use small set of scalableprimitives, for example: Map, Sort, ReduceByKey, Zip, Window, etc.
A
A.Map(·) =: B
B.Sort(·) =: C
PE0 PE1 PE2 PE3A
B := A.Map()
C := B.Sort()
CFramework Designer’s View:
Goals: distribute work, optimize execution on cluster, add redundancywhere applicable. =⇒ build data-flow graph.
DIA<T> = pipelined chain of computations
Timo Bingmann – Scalable String and Suffix Sorting: Algorithms, Techniques, and ToolsInstitute of Theoretical Informatics – Algorithmics July 3rd, 2018 25 / 29
Thrill’s Goal and Current Status
An easy way to program fast distributed algorithms in C++.
Current Status:
Open-source prototype at http://github.com/thrill/thrill.
≈ 60 K lines of C++14 code, 70–80 % written by B, ≥ 12 contributors
Published at IEEE Conference on Big Data [B, et al. ’16]
Faster than Apache Spark and Apache Flink on five micro benchmarks:WordCount1000, WordCountCC, PageRank, TeraSort, and K-Means.
Case Studies:
Five suffix sorting algorithms [B, Gog, Kurpicz, arXiv’17]
Louvain graph clustering algorithm [Hamann et al. arXiv’17]
More examples: stochastic gradient descent, triangle counting, etc.
Future: fault tolerance, scalability, and more applications.
Timo Bingmann – Scalable String and Suffix Sorting: Algorithms, Techniques, and ToolsInstitute of Theoretical Informatics – Algorithmics July 3rd, 2018 26 / 29
Data-Flow Graph of DC3 with RecursionT
T3 := T .FlatWindow3
S := T3.Sort
IS := S.MapN ′ := S.FlatWindow2
N := N ′.PrefixSum
N.Max TR := Zip([ IS,N ])
T ′R := TR.Sort
SAR := DC3(T ′R.Map)
I′R := SAR.ZipWithIndex
IR := I′R.Sort.Map
Z ′ := ZipWindow[3,2]([ T , IR ])
Z := Z ′.Window2
S′0 := Z .Map S′1 := Z .Map S′2 := Z .Map
S0 := S′0.Sort S1 := S′1.Sort S2 := S′2.Sort
Merge([ S0,S1,S2 ])
SAT
constructlexicographic names
recursion andcalculate ranks
create tuples and merge suffix array
Timo Bingmann – Scalable String and Suffix Sorting: Algorithms, Techniques, and ToolsInstitute of Theoretical Informatics – Algorithmics July 3rd, 2018 27 / 29
Suffix Sorting Wikipedia with 32 Hosts
224 226 228 230 232 234
0
10
20
input size n (B)
thro
ughp
ut(M
iB s)
DoublingWDoublingSDiscardingDC3DC7
B.pDC3 (MPI)B.pDC7 (MPI)FA.psac (MPI)
Run on 32× i3.4xlarge AWS EC2 instances containing 16-core Intel XeonE5-2686 CPUs with 2.30 GHz, 8 GB of RAM, and 2×1.9 TB NVMe SSDs.
Timo Bingmann – Scalable String and Suffix Sorting: Algorithms, Techniques, and ToolsInstitute of Theoretical Informatics – Algorithmics July 3rd, 2018 28 / 29
Overview: Main ContributionsMulti-Core
Scalable String SortingExternal and DistributedScalable Suffix Sorting
Parallel Super Scalar StringSample Sort (pS5) [BS13]
ar
ab ki
b0 b1 b2 b3 b4 b5 b6
<
=
>
< = > < = >
1 0
Parallel Multiway LCP-Merge,Merge Sort, and More [BES17]
(2, aab)
(1, acb)
(2, aac) (0,bca)
(2, aab) (2, aac) (0,bca) (1, acb)
LCP-Merge
Induced Sorting in ExternalMemory: eSAIS [BFO13, BFO16]
New High-Performance DistributedFramework in C++: Thrill [BAJ+16]
Distributed External Suffix SortingT T3 := T .FlatWindow3
S := T3.Sort
IS := S.MapN ′ := S.FlatWindow2
N := N ′.PrefixSum
N.Max TR := Zip([ IS,N ])
T ′R := TR.Sort
SAR := DC3(T ′R.Map)
I′R := SAR.ZipWithIndex
IR := I′R.Sort.Map
Z ′ := ZipWindow[3,2]([ T , IR ])
Z := Z ′.Window2
S′0 := Z .Map S′
1 := Z .Map S′2 := Z .Map
S0 := S′0.Sort S1 := S′
1.Sort S2 := S′2.Sort
Merge([ S0,S1,S2 ]) SAT
Timo Bingmann – Scalable String and Suffix Sorting: Algorithms, Techniques, and ToolsInstitute of Theoretical Informatics – Algorithmics July 3rd, 2018 29 / 29