Trip Report FINAL MEETING AND SUMMER SCHOOL OF DFG PRIORITY PROGRAM ALGORITHM ENGINEERING

TRIP REPORT: ALGORITHM ENGINEERING 2

DFG PP 1307: Algorithm Engineering

• 28 research projects

• 267 publications

• 17 software projects, e.g.:

◦ Multi-Core STL (MCSTL), now the GCC parallel mode

◦ STL for Extra Large Datasets (STXXL)

2014-10-27

DFG Priority Program: nationwide funding program over 6 years for up to 30 individual projects

Recap: Algorithm Engineering

1. Realistic models: hardware and problem

2. Design: efficient, implementable algorithms

3. Analyze: beyond worst-case

4. Implement: with hardware peculiarities in mind

5. Experiment: repeatable, thorough interpretation

“The distance between theory and practice is closer in theory than in practice”

[Y. Matias (Google) in his invited talk at ESA ’12]

Final Meeting (17.09.2014)

9 talks covering a wide range of topics:
◦ route planning in road and public transport networks
◦ graph clustering and partitioning
◦ data compression
◦ linear and mixed integer optimization
◦ sequence analysis

No Indico used; slides are only partially available

Summer School (18.-19.09.2014)

Two days of lectures and hands-on sessions:
◦ data compression (lecture only)
◦ linear and mixed integer optimization
◦ network analysis: graph clustering and partitioning
◦ shortest path algorithms (lecture only)

About 30 PhD students

Lots of discussion among students and lecturers

Selected Topics

Network Analysis

Networks are everywhere:
◦ Computer networks
◦ Social networks
◦ …

Network Analysis

Network analysis is mainly concerned with complex networks:
◦ Small diameter
◦ Highly varying (skewed) degree distribution
◦ Many triangles (high clustering)
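
The "many triangles" property is usually quantified by counting triangles or by the global clustering coefficient (transitivity). The sketch below is a minimal illustration in plain Python, not NetworKit's API. It assumes an undirected simple graph given as a dict mapping each vertex to a set of neighbours, and the function names are hypothetical.

```python
from itertools import combinations

def count_triangles(adj):
    """Count triangles in an undirected simple graph given as {vertex: set_of_neighbours}."""
    # Rank vertices by (degree, id); each triangle is counted only from its lowest-ranked vertex.
    rank = {v: (len(adj[v]), v) for v in adj}
    total = 0
    for u in adj:
        higher = [w for w in adj[u] if rank[w] > rank[u]]
        for v, w in combinations(higher, 2):
            if w in adj[v]:          # closing edge present -> triangle {u, v, w}
                total += 1
    return total

def transitivity(adj):
    """Global clustering coefficient: 3 * triangles / connected triples (paths of length 2)."""
    triples = sum(len(nbrs) * (len(nbrs) - 1) // 2 for nbrs in adj.values())
    return 3 * count_triangles(adj) / triples if triples else 0.0

# Example: the complete graph on 4 vertices.
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
print(count_triangles(k4), transitivity(k4))   # 4 1.0
```

Counting only "upwards" in the degree order is a common trick. It keeps the work per vertex small in networks with a few very high-degree hubs.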

Network Analysis

GRAPH CLUSTERING
◦ Find (non-overlapping) internally dense, externally sparse subgraphs
◦ Unknown: number of subgraphs and their sizes
◦ Goals / applications:
o Uncover community structure (analysis, ...)
o Prepartition network (distributed storage, ...)

GRAPH PARTITIONING
◦ Partition vertex set into k (nearly) equally sized blocks
◦ Objective functions aim at small interfaces (few edges between blocks)
◦ Applications:
o Numerical simulations
o Route planning
o Distributed graph algorithms
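
Partitioning has two criteria: nearly equal block sizes and small interfaces. These are commonly measured as imbalance and edge cut, respectively. The sketch below is a minimal version of both measures, and the function names are hypothetical. A partition is usually called ε-balanced when every block has at most (1 + ε)·n/k vertices.

```python
def edge_cut(edges, block):
    """Number of edges whose endpoints lie in different blocks (size of the 'interface')."""
    return sum(1 for u, v in edges if block[u] != block[v])

def imbalance(block, k):
    """max block size / (n / k) - 1; the partition is epsilon-balanced if this is <= epsilon."""
    sizes = [0] * k
    for b in block.values():
        sizes[b] += 1
    return max(sizes) / (len(block) / k) - 1.0

# Example: a cycle on 6 vertices split into two paths of three vertices each.
cycle = [(i, (i + 1) % 6) for i in range(6)]
halves = {v: 0 if v < 3 else 1 for v in range(6)}
print(edge_cut(cycle, halves), imbalance(halves, 2))   # 2 0.0
```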

Network Analysis

GRAPH CLUSTERING

Algorithms:
◦ Label propagation algorithm
◦ Louvain greedy method

Many different metrics:
◦ Conductance
◦ Expansion
◦ Modularity
◦ …

GRAPH PARTITIONING

Algorithms:
◦ Size-constrained label propagation
◦ Diffusion-based partitioning
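
The sketch below pairs the label propagation algorithm with the modularity metric, written in plain Python rather than NetworKit. Its tie-breaking rule is an assumption: a vertex keeps its label if that label is among the most frequent ones in its neighbourhood, which guarantees that a sweep with no changes ends the loop. Label propagation is randomised, so different seeds may yield different clusterings.

```python
import random
from collections import Counter

def label_propagation(adj, max_iter=100, seed=0):
    """adj: {vertex: set_of_neighbours}. Returns {vertex: cluster label}."""
    rng = random.Random(seed)
    label = {v: v for v in adj}              # every vertex starts in its own cluster
    order = list(adj)
    for _ in range(max_iter):
        rng.shuffle(order)                   # asynchronous updates in random order
        changed = False
        for v in order:
            if not adj[v]:
                continue                     # isolated vertices keep their own label
            counts = Counter(label[u] for u in adj[v])
            top = max(counts.values())
            best = [lab for lab, c in counts.items() if c == top]
            if label[v] not in best:         # assumed rule: keep current label on ties
                label[v] = rng.choice(best)
                changed = True
        if not changed:                      # every vertex carries a most frequent label
            break
    return label

def modularity(adj, label):
    """Q = sum over clusters c of (L_c / m - (d_c / 2m)^2)."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    intra, degree = Counter(), Counter()
    for v, nbrs in adj.items():
        degree[label[v]] += len(nbrs)
        # each intra-cluster edge is seen from both endpoints, so this sums to 2 * L_c
        intra[label[v]] += sum(1 for u in nbrs if label[u] == label[v])
    return sum(intra[c] / two_m - (degree[c] / two_m) ** 2 for c in degree)

# Two triangles joined by the edge (2, 3).
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(modularity(g, {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}))   # ≈ 0.357 (= 5/14)
```

The Louvain method optimises the same modularity function greedily; it is not shown here.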

Network Analysis

NetworKit:
◦ Toolkit for network analysis developed during the project: C++ with Python bindings
◦ Includes a wide range of tools for graph analysis
◦ Excellent IPython notebook-based tutorial
◦ Includes algorithms proposed for evolving networks
◦ Analyze changing social networks, e.g. the ITI email graph

Interest for CERN:
◦ Community detection on the grid, e.g. for planning file transfers
◦ Track reconstruction (ongoing work)

Shortest Paths and Routing

Problem: find the shortest path between s and t in a weighted graph G

Algorithms:
◦ Dijkstra’s algorithm is too slow for large graphs
◦ Manifold speedup techniques [survey]
o A*: search with Euclidean bounds (classic)
o ALT: A* search with landmarks; preprocessing computes distances to landmarks
o Contraction Hierarchies: introduce shortcuts between “important” vertices of the graph
o Hub Labeling: every vertex stores distances to a set of hubs such that any two vertices share a hub on a shortest path between them
◦ Most techniques rely on (more or less) expensive pre-computations
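
The ALT idea can be shown in a few lines. Distances from a handful of landmarks are precomputed once. By the triangle inequality, they then give a lower bound on dist(v, t) that steers an A* search toward t. The sketch below is minimal and assumes an undirected graph with non-negative weights and hand-picked landmarks. The function names are hypothetical.

```python
import heapq

INF = float("inf")

def undirected(edges):
    """Adjacency lists {u: [(v, w), ...]} from (u, v, w) triples; weights must be non-negative."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj

def dijkstra(adj, source):
    dist, pq = {source: 0}, [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue                          # stale queue entry
        for v, w in adj[u]:
            if d + w < dist.get(v, INF):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

def alt_preprocess(adj, landmarks):
    # One full Dijkstra per landmark (undirected graph, so dist(L, v) == dist(v, L)).
    return [dijkstra(adj, lm) for lm in landmarks]

def alt_query(adj, lm_dist, s, t):
    def h(v):
        # Triangle inequality: dist(v, t) >= |dist(L, t) - dist(L, v)| for every landmark L.
        return max((abs(d[t] - d[v]) for d in lm_dist if v in d and t in d), default=0)
    g, pq, settled = {s: 0}, [(h(s), s)], set()
    while pq:
        _, u = heapq.heappop(pq)
        if u in settled:
            continue
        if u == t:
            return g[u]                       # h is consistent, so the first pop of t is optimal
        settled.add(u)
        for v, w in adj[u]:
            if g[u] + w < g.get(v, INF):
                g[v] = g[u] + w
                heapq.heappush(pq, (g[v] + h(v), v))
    return INF

adj = undirected([(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1), (2, 3, 5), (3, 4, 3)])
landmarks = alt_preprocess(adj, [0, 4])
print(alt_query(adj, landmarks, 0, 4))   # 7 (path 0-2-1-3-4)
```

On real road networks, landmarks are typically chosen near the periphery of the graph, where the bounds are tightest. Here they are picked by hand.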

Shortest Paths and Routing

Problem: user-defined cost functions render metric-specific pre-computations futile

Solution: three-stage processing [Delling et al. 2013]

1. Metric-independent pre-processing (≈ hr)
◦ Recursively partition the graph
◦ Generate arcs between the entry and exit (boundary) nodes through which each partition connects to neighboring partitions

2. Metric-dependent pre-processing (≈ s)
◦ Compute the metric (cost) of all shortcut arcs

3. Query (≈ μs)
◦ Find the shortest path in the contracted graph and unpack it in the original one
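
The three stages can be sketched on a toy scale, with some simplifications. The sketch uses a single-level partition supplied by the caller instead of a recursive one, and it returns distances only, without unpacking shortcuts into a path. All function names are hypothetical, and this is not the implementation of Delling et al. Stage 2 computes, for each cell, the shortest intra-cell distances between its boundary vertices under the current metric. Stage 3 runs Dijkstra on the full graph only inside the cells of s and t and on the overlay (cut edges plus shortcuts) everywhere else.

```python
import heapq
from collections import defaultdict

INF = float("inf")

def build_adj(edges, metric):
    """Undirected adjacency lists; metric(u, v) is a symmetric, non-negative arc cost."""
    adj = defaultdict(list)
    for u, v in edges:
        w = metric(u, v)
        adj[u].append((v, w))
        adj[v].append((u, w))
    return adj

def dijkstra(adj, s, allowed=None):
    """Plain Dijkstra; with `allowed`, the search never leaves that vertex set."""
    dist, pq = {s: 0}, [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            if (allowed is None or v in allowed) and d + w < dist.get(v, INF):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

def boundary(edges, cell):
    """Stage 1 (metric-independent): the cell assignment is given; collect boundary vertices."""
    bnd = defaultdict(set)
    for u, v in edges:
        if cell[u] != cell[v]:
            bnd[cell[u]].add(u)
            bnd[cell[v]].add(v)
    return bnd

def customize(edges, cell, bnd, metric):
    """Stage 2 (metric-dependent): overlay = cut edges + boundary-to-boundary shortcuts."""
    adj = build_adj(edges, metric)
    members = defaultdict(set)
    for v, c in cell.items():
        members[c].add(v)
    overlay = defaultdict(list)
    for u, nbrs in adj.items():
        overlay[u] += [(v, w) for v, w in nbrs if cell[v] != cell[u]]
    for c, border in bnd.items():
        for b in border:
            d = dijkstra(adj, b, allowed=members[c])      # search restricted to cell c
            overlay[b] += [(b2, d[b2]) for b2 in border if b2 != b and b2 in d]
    return adj, overlay

def query(adj, overlay, cell, s, t):
    """Stage 3: full graph inside the cells of s and t, overlay arcs everywhere else."""
    local = {cell[s], cell[t]}
    dist, pq = {s: 0}, [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        if u == t:
            return d          # distance only; unpacking shortcuts into a path is omitted
        arcs = list(overlay[u])
        if cell[u] in local:
            arcs += [(v, w) for v, w in adj[u] if cell[v] == cell[u]]
        for v, w in arcs:
            if d + w < dist.get(v, INF):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return INF
```

A new user-defined metric only requires rerunning `customize`. The partition and its boundary sets, which stand in for the expensive first stage, are computed once.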

Shortest Paths and Routing

Routing in public transport networks is a much harder problem:
◦ Inherent time-dependence
◦ Solved using (potentially huge!) event-activity networks
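
Time-dependence means that a route is only usable if you reach each stop before the vehicle leaves. The sketch below illustrates this with an earliest-arrival query. It uses the Connection Scan idea, a swapped-in technique that was not presented at the meeting. Scanning the timetable's connections by departure time visits the underlying event graph in time order. The sketch assumes instantaneous transfers and ignores trip continuity and footpaths. The function name and timetable are hypothetical.

```python
INF = float("inf")

def earliest_arrival(connections, source, target, start_time):
    """connections: (dep_stop, arr_stop, dep_time, arr_time) with arr_time > dep_time.
    Simplification: transfers take no time and every connection can be boarded on its own."""
    arrival = {source: start_time}
    # Sorting by departure time yields a topological order of the time-expanded graph.
    for dep_stop, arr_stop, dep, arr in sorted(connections, key=lambda c: c[2]):
        if arrival.get(dep_stop, INF) <= dep and arr < arrival.get(arr_stop, INF):
            arrival[arr_stop] = arr
    return arrival.get(target, INF)

timetable = [("A", "B", 8.0, 9.0), ("A", "C", 8.0, 11.0),
             ("B", "C", 8.5, 9.5), ("B", "C", 9.0, 10.0)]
print(earliest_arrival(timetable, "A", "C", 7.5))   # 10.0: the 8.5 departure from B is missed
```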

Interest for CERN:
◦ Grid tiers already define a contraction hierarchy; examine actual data flows for missing or misplaced hubs

Data Compression

Requirements:
◦ Compressed space
◦ Decompression time
◦ Compression time is not much of an issue

Compressor on dataset MINGW (1 GB) | Compressed space (MB) | Decompression time (s)
Gzip   | 344 | 5.5
Lzma   | 188 | 8.3
Snappy | 461 | 0.9

Trade-off between compressed space and decompression time

“Snappy is widely used inside Google, in everything from BigTable and MapReduce …”

Problem: compress once, decompress many times
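
A table like the one above can be measured roughly with Python's standard-library `gzip` and `lzma` modules. Snappy is not in the standard library and is left out. Absolute numbers depend on data, machine and compression levels, and will not match the slide. The helper names are hypothetical.

```python
import gzip
import lzma
import time

def measure(name, compress, decompress, data, repeats=3):
    """Return (name, compressed size in bytes, mean decompression time in seconds)."""
    blob = compress(data)
    start = time.perf_counter()
    for _ in range(repeats):
        restored = decompress(blob)
    seconds = (time.perf_counter() - start) / repeats
    assert restored == data, "round trip failed"
    return name, len(blob), seconds

def compare(data):
    # Compression time is deliberately not reported: data is compressed once, decompressed often.
    return [measure("gzip", gzip.compress, gzip.decompress, data),
            measure("lzma", lzma.compress, lzma.decompress, data)]

# Usage (hypothetical file name): compare(open("mingw.tar", "rb").read())
```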

Data Compression

Reminder: Lempel-Ziv compression

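[Figure: Lempel-Ziv parsing of the string a a c a a c a b c a a d a a a; a prefix has already been compressed, the rest is encoded by copy pairs such as <6,3>, <3,2>, <11,3> and literal pairs such as <0,d>]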

The greedy approach (always take the longest match) is only optimal if every pair takes constant space
◦ but distances need a variable number of bits, so greedy parsing is not bit-optimal

Bit-optimal LZ parsing [Ferragina et al. 2013]
◦ Solve a shortest path problem on a DAG describing the possible compression pairs
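
The difference between greedy and bit-optimal parsing shows up even on a tiny example. In the DAG, node i means "text[:i] is encoded". There is one edge per literal and one per possible copy pair, each weighted by its encoded size in bits. The cost model below is an assumption: a 1-bit flag plus either an 8-bit literal or Elias-gamma codes for distance and length. The brute-force construction of the full DAG is also a simplification; it is not the algorithm of Ferragina et al., which avoids building all edges. The function names are hypothetical.

```python
def gamma_bits(x):
    """Length of the Elias-gamma code of an integer x >= 1."""
    return 2 * (x.bit_length() - 1) + 1

def phrase_bits(phrase):
    """Assumed cost model: literal = flag + 8 bits; copy = flag + gamma(distance) + gamma(length)."""
    if phrase[0] == 0:
        return 1 + 8
    return 1 + gamma_bits(phrase[0]) + gamma_bits(phrase[1])

def matches(text, i):
    """Yield (distance, longest match length) for every earlier start; copies may overlap i."""
    for d in range(1, i + 1):
        n = 0
        while i + n < len(text) and text[i + n] == text[i - d + n]:
            n += 1
        if n:
            yield d, n

def greedy_parse(text):
    """Classic greedy LZ77: always take the longest match (closest one on ties)."""
    i, out = 0, []
    while i < len(text):
        best = max(matches(text, i), key=lambda m: (m[1], -m[0]), default=None)
        out.append(best if best else (0, text[i]))
        i += best[1] if best else 1
    return out

def optimal_parse(text):
    """Shortest path from node 0 to node n in the DAG of all literal/copy phrases."""
    n = len(text)
    cost = [0] + [float("inf")] * n          # cost[j] = fewest bits needed for text[:j]
    choice = [None] * (n + 1)
    for i in range(n):                       # node order 0..n is a topological order
        edges = [((0, text[i]), 1)]
        edges += [((d, k), k) for d, longest in matches(text, i) for k in range(1, longest + 1)]
        for phrase, step in edges:
            c = cost[i] + phrase_bits(phrase)
            if c < cost[i + step]:
                cost[i + step], choice[i + step] = c, (i, phrase)
    out, j = [], n
    while j > 0:                             # follow predecessor links back to node 0
        j, phrase = choice[j]
        out.append(phrase)
    return out[::-1]

def decode(phrases):
    buf = []
    for a, b in phrases:
        if a == 0:
            buf.append(b)                    # literal
        else:
            for _ in range(b):               # copy b characters from distance a (may overlap)
                buf.append(buf[-a])
    return "".join(buf)

text = "aacaacabcaadaaa"
for parse in (greedy_parse(text), optimal_parse(text)):
    print(sum(map(phrase_bits, parse)), parse)
```

Every greedy parse is itself a path in the DAG, so the shortest path can never cost more bits than the greedy choice.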

Data Compression

Bi-criteria compression [Farruggia et al. 2014]:
◦ Space and decompression time as edge weights in the DAG
◦ Fix a space constraint and search for the lowest decompression time, and vice versa
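
Fixing the space and minimising decompression time is a resource-constrained shortest path problem on the parsing DAG. A simple exact (pseudo-polynomial) way to solve it on small instances is dynamic programming over pairs of (node, space used). The sketch below uses that textbook DP, swapped in for illustration; it is not the method of Farruggia et al. It assumes integer space costs, nodes numbered in topological order, and a hypothetical edge format (u, v, space, time).

```python
from collections import defaultdict

def min_time_within_space(n, edges, budget):
    """DAG on nodes 0..n whose numbering is a topological order.
    edges: (u, v, space, time) with integer space >= 0.
    Returns the minimum total time of a 0 -> n path whose total space is <= budget."""
    INF = float("inf")
    # best[v][s] = minimum time to reach v using exactly s units of space
    best = [[INF] * (budget + 1) for _ in range(n + 1)]
    best[0][0] = 0
    out = defaultdict(list)
    for u, v, space, time in edges:
        out[u].append((v, space, time))
    for u in range(n + 1):
        for s in range(budget + 1):
            if best[u][s] == INF:
                continue
            for v, space, time in out[u]:
                if s + space <= budget and best[u][s] + time < best[v][s + space]:
                    best[v][s + space] = best[u][s] + time
    return min(best[n])

# Two-step parse (space 5 + 5, fast) versus a single long phrase (space 6, slow).
edges = [(0, 1, 5, 1), (1, 2, 5, 1), (0, 2, 6, 10), (0, 1, 2, 4)]
print(min_time_within_space(2, edges, 10), min_time_within_space(2, edges, 7))   # 2 5
```

The "vice versa" direction (fix time, minimise space) works the same way with the roles of the two weights swapped.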

Data Compression

Different approach to compression: Burrows-Wheeler Transform [introduction]

◦ Yields smaller compressed size but longer decompression time
◦ Construction of the BWT is closely related to suffix-array construction
◦ Allows decompression of any substring
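
The link to suffix arrays is direct: the BWT is the character preceding each suffix, listed in suffix-array order. The transform is inverted with the LF-mapping. The sketch below builds the suffix array naively for illustration. It assumes a sentinel "$" that occurs nowhere in the text and sorts before every other character. The function names are hypothetical.

```python
from collections import Counter

def bwt(text, sentinel="$"):
    """BWT via a naively built suffix array; the sentinel must be unique and sort first."""
    s = text + sentinel
    sa = sorted(range(len(s)), key=lambda i: s[i:])   # O(n^2 log n); linear-time methods exist
    return "".join(s[i - 1] for i in sa)              # character preceding each sorted suffix

def inverse_bwt(last):
    """Invert the BWT with the LF-mapping, walking the text backwards from the sentinel row."""
    counts = Counter(last)
    smaller, total = {}, 0
    for c in sorted(counts):                # smaller[c] = number of characters < c
        smaller[c] = total
        total += counts[c]
    seen, lf = Counter(), []
    for c in last:                          # LF(i) = smaller[c] + occurrences of c in last[:i]
        lf.append(smaller[c] + seen[c])
        seen[c] += 1
    row, out = 0, []                        # row 0 is the rotation starting with the sentinel
    for _ in range(len(last) - 1):
        out.append(last[row])
        row = lf[row]
    return "".join(reversed(out))

print(bwt("banana"))                 # annb$aa
print(inverse_bwt(bwt("banana")))    # banana
```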

FM index [Ferragina and Manzini 2000]
◦ Uses the BWT and auxiliary data structures to answer count and locate queries on the compressed text
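
The core of the FM index is backward search. It needs two ingredients: C[c], the number of characters smaller than c, and occ(c, i), the number of occurrences of c in the first i BWT characters. The pattern is then processed right to left, shrinking the suffix-array interval with each step. The sketch below stores occ as plain prefix-count arrays and keeps the full suffix array. A real FM index replaces both with compressed rank structures and a sampled suffix array. The class name is hypothetical.

```python
from collections import Counter

class FMIndex:
    def __init__(self, text, sentinel="$"):
        s = text + sentinel
        self.sa = sorted(range(len(s)), key=lambda i: s[i:])     # naive suffix array
        self.bwt = "".join(s[i - 1] for i in self.sa)
        counts = Counter(self.bwt)
        self.C, total = {}, 0
        for c in sorted(counts):                                 # C[c] = # characters < c
            self.C[c] = total
            total += counts[c]
        # occ[c][i] = occurrences of c in bwt[:i] (uncompressed prefix counts)
        self.occ = {c: [0] for c in counts}
        for ch in self.bwt:
            for c in self.occ:
                self.occ[c].append(self.occ[c][-1] + (ch == c))

    def _range(self, pattern):
        lo, hi = 0, len(self.bwt)              # half-open interval of suffix-array rows
        for c in reversed(pattern):            # backward search: extend the match to the left
            if c not in self.C:
                return 0, 0
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:
                return 0, 0
        return lo, hi

    def count(self, pattern):
        lo, hi = self._range(pattern)
        return hi - lo

    def locate(self, pattern):
        lo, hi = self._range(pattern)
        return sorted(self.sa[lo:hi])          # a real FM index samples the SA instead

fm = FMIndex("banana")
print(fm.count("ana"), fm.locate("ana"))       # 2 [1, 3]
```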

Interest for CERN:
◦ Compression of ROOT files + access to individual entries
◦ Compression of, and search in, dictionaries

Miscellaneous

Linear programming

◦ The disproof of the Hirsch conjecture poses a threat to the simplex method, which nevertheless still works well in practice

◦ Anecdote: the interior point method was patented by AT&T; the patent was circumvented by a polar transformation of the problem and the use of a barrier method

SeqAn
◦ Package for the analysis of (genome) sequences
◦ Developers face problems similar to those in HEP:
o Bridging the gap between computer science and real-world problems

External memory algorithms
◦ Flow computations for massive LiDAR terrain data sets
◦ General trick: time-forward processing to reduce I/O
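
Time-forward processing visits the vertices of a DAG in topological order. Each vertex sends its result "forward in time" through a priority queue keyed by the receiver's position in that order. A vertex therefore only reads messages addressed to it, with no random access to its predecessors. The sketch below runs in memory and assumes a simplified single-flow-direction model: each grid cell drains to its lowest strictly lower 4-neighbour. This is not the LiDAR flow model from the talk. In external memory, `heapq` would be replaced by an I/O-efficient priority queue and the grid would live on disk. The function name is hypothetical.

```python
import heapq

def flow_accumulation(elev):
    """Each cell contributes one unit of water and passes its total to its lowest lower neighbour."""
    rows, cols = len(elev), len(elev[0])
    # Decreasing elevation (ties by cell id) is a topological order of the flow DAG.
    order = sorted(((r, c) for r in range(rows) for c in range(cols)),
                   key=lambda rc: (-elev[rc[0]][rc[1]], rc))
    rank = {cell: i for i, cell in enumerate(order)}
    pq, acc = [], {}                    # pq holds messages (receiver rank, amount)
    for i, (r, c) in enumerate(order):
        total = 1
        while pq and pq[0][0] == i:     # collect everything sent to this cell earlier
            total += heapq.heappop(pq)[1]
        acc[(r, c)] = total
        nbrs = [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= r + dr < rows and 0 <= c + dc < cols]
        lower = [nb for nb in nbrs if elev[nb[0]][nb[1]] < elev[r][c]]
        if lower:
            target = min(lower, key=lambda nb: (elev[nb[0]][nb[1]], nb))
            heapq.heappush(pq, (rank[target], total))   # receiver comes later in the order
    return [[acc[(r, c)] for c in range(cols)] for r in range(rows)]

print(flow_accumulation([[5, 4], [3, 1]]))   # [[1, 1], [2, 4]]
```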

Conclusions
◦ The final meeting gave a good overview of the broad activity in DFG PP 1307 “Algorithm Engineering”
◦ The summer school expanded on four focus topics of the PP

◦ Similar research continues in DFG PP 1736 “Algorithms for Big Data”
o Funding period 2013–2019
o Currently 16 projects covering graph analysis, energy-efficient scheduling, search and text indexing, genome assembly, …

◦ Most projects are concerned with computer science problems
◦ Computational biology problems are present in both PPs

The HEP community needs to explore how to exploit this resource of expertise and funding.