Upload
viho
View
29
Download
2
Embed Size (px)
DESCRIPTION
An Interactive Environment for Combinatorial Scientific Computing. Viral B. Shah John R. Gilbert Steve Reinhardt With thanks to: Brad McRae, Stefan Karpinski, Vikram Aggarwal, Min Roh. HPC today is exciting !. Complex software stack. Computational ecology, CFD, data exploration. - PowerPoint PPT Presentation
Citation preview
QuickTime™ and a decompressor
are needed to see this picture.
An Interactive Environment for Combinatorial Scientific Computing
Viral B. Shah John R. GilbertSteve Reinhardt
With thanks to: Brad McRae, Stefan Karpinski, Vikram Aggarwal, Min Roh
QuickTime™ and a decompressor
are needed to see this picture.
QuickTime™ and a decompressor
are needed to see this picture.
HPC today is exciting !
QuickTime™ and a decompressor
are needed to see this picture.
QuickTime™ and a decompressor
are needed to see this picture.
QuickTime™ and a decompressor
are needed to see this picture.
QuickTime™ and a decompressor
are needed to see this picture.
Complex software stack
Distributed Sparse MatricesArithmetic, matrix multiplication, indexing, solvers (\, eigs)
Graph Analysis & PD Toolbox
Graph querying & manipulation, connectivity, spanning trees,
geometric partitioning, nested dissection, NNMF, . . .
Preconditioned Iterative Methods
CG, BiCGStab, etc. + combinatorial preconditioners (AMG, Vaidya)
Applications
Computational ecology, CFD, data exploration
QuickTime™ and a decompressor
are needed to see this picture.
Star-P
A = rand(4000*p, 4000*p);
x = randn(4000*p, 1);
y = zeros(size(x));
while norm(x-y) / norm(x) > 1e-11
y = x;
x = A*x;
x = x / norm(x);
end;
QuickTime™ and a decompressor
are needed to see this picture.
Star-P architecture
QuickTime™ and a decompressor
are needed to see this picture.
Parallel sorting
• Simple, widely used combinatorial primitive
• [V, perm] = sort (V)
• Used in many sparse matrix and array algorithms: sparse(),
indexing, concatenation, transpose, reshape, repmat etc.
• Communication efficient
3 6 8 1 5 4 7 2 9
1 2 3 4 5 6 7 8 9
QuickTime™ and a decompressor
are needed to see this picture.
Sorting performance
Time spent in different phases of Psort (192 processors on SGI Altix)
0.0001
0.001
0.01
0.1
1
10
100
1000
1E+06 1E+07 1E+08 1E+09 1E+10 1E+11
Problem Size
Time (seconds)
Sequential sorting Splitters using mediansCommunication MergingTotal time
QuickTime™ and a decompressor
are needed to see this picture.
P0
P1
P2
Pn
5941 532631
23 131
Each processor stores:
• # of local nonzeros (# local edges)• range of local rows (local vertices)• nonzeros in a compressed row data structure (local edges)
Distributed sparse arrays
1
2 326
53
41
31
59
QuickTime™ and a decompressor
are needed to see this picture.
Sparse matrix operations
• dsparse layout, same semantics as ddense
• Matrix arithmetic: +, max, sum, etc.
• matrix * matrix and matrix * vector
• Matrix indexing and concatenation
A (1:3, [4 5 2]) = [ B(:, J) C ] ;
• Linear solvers: x = A \ b; using MUMPS/SuperLU (MPI)
• Eigensolvers: [V, D] = eigs(A); using PARPACK (MPI)
QuickTime™ and a decompressor
are needed to see this picture.
Sparse matrix multiplication
B
= x
C A
for j = 1:nC(:, j) = A * B(:, j)
SPA
gather scatter/accumulate
All matrix columns and vectors are stored compressed except the SPA.
See A. Buluc (MS42, Fri 10am)
QuickTime™ and a decompressor
are needed to see this picture.
QuickTime™ and a decompressor
are needed to see this picture.
Interactive data exploration
A graph plotted with relaxed Fiedler co-ordinates
QuickTime™ and a decompressor
are needed to see this picture.
A 2-D density spy plot
Density spy plot of an R-MAT power law graph
QuickTime™ and a decompressor
are needed to see this picture.
Breadth-first search: sparse matvec
AT
1 2
3
4 7
6
5
(AT)2x
x ATx
• Multiply by adjacency matrix step to neighbor vertices
• Work-efficient implementation from sparse data structures
QuickTime™ and a decompressor
are needed to see this picture.
Maximal independent set
1 2
3
4 7
6
5
degree = sum(G, 2);
prob = 1 ./ (2 * deg);
select = rand (n, 1) < prob;
if ~isempty (select & (G * select);
% keep higher degree vertices
end
IndepSet = [IndepSet select];
neighbor = neighbor | (G * select);
remain = neighbor == 0;
G = G(remain, remain);
Luby’s algorithm
QuickTime™ and a decompressor
are needed to see this picture.
• Many tight clusters, loosely interconnected
• Vertices and edges permuted randomly
A graph clustering benchmark
Fine-grained, irregular data access
QuickTime™ and a decompressor
are needed to see this picture.
Clustering by BFS
% Grow each seed to vertices
% reached by at least k
% paths of length 1 or 2
C = sparse(seeds, 1:ns, 1, n, ns);
C = A * C;
C = C + A * C;
C = C >= k;
• Grow local clusters from many seeds in parallel
• Breadth-first search by sparse matrix * matrix
• Cluster vertices connected by many short paths
QuickTime™ and a decompressor
are needed to see this picture.
1213
12125
55
1313
13 13
12 12
[ignore, leader] = max(G);
S = sparse(leader,1:n,1,n,n) * G;
[ignore, leader] = max(S);
• Each vertex votes for its highest numbered neighbor as its leader
• Number of leaders is roughly the same as number of clusters
• Matrix multiplication gathers neighbor votes
• S(i,j) is the number of votes for i from j’s neighbors
Clustering by peer pressure
QuickTime™ and a decompressor
are needed to see this picture.
SSCA #2 v1.1 - scale 21
1
10
100
1000
8 24 40 56 72 88 104 120
Processors
Time (sec)
Data Generator
Kernel 1
Kernel 2
Kernel 3
Kernel 4
Scaling up
• Graph with 2 million nodes, 321 million directed edges, 89 million undirected edges, 32 thousand cliques
• Good scaling observed from 8 to 120 processors of an SGI Altix
QuickTime™ and a decompressor
are needed to see this picture.
Graph Laplacian
Graph of Poisson’s Equation on a 2D gridG = grid5 (10);
QuickTime™ and a decompressor
are needed to see this picture.
Spanning trees
Maximum weight spanning tree T = mst (G, ‘max’);
QuickTime™ and a decompressor
are needed to see this picture.
A combinatorial preconditionerV. Aggarwal
Augmented Vaidya’s preconditionerV = vaidya_support (G);
QuickTime™ and a decompressor
are needed to see this picture.
Quadtree meshes and AMGV. Aggarwal and M. Roh
QuickTime™ and a decompressor
are needed to see this picture.
Wireless traffic modelingS. Karpinski
• Non-negative matrix factorizations (NNMF) for wireless traffic modeling
• NNMF algorithms combine linear algebra and optimization methods
• Basic and “improved” NMF factorization algorithms implemented:
– euclidean (Lee & Seung 2000)
– K-L divergence (Lee & Seung 2000)
– semi-nonnegative (Ding et al. 2006)
– left/right-orthogonal (Ding et al. 2006)
– bi-orthogonal tri-factorization (Ding et al. 2006)
– sparse euclidean (Hoyer et al. 2002)
– sparse divergence (Liu et al. 2003)
– non-smooth (Pascual-Montano et al. 2006)
QuickTime™ and a decompressor
are needed to see this picture.
A meta-algorithm
sphericalk-means
ANLS
K-L div.
same asCDFs
QuickTime™ and a decompressor
are needed to see this picture.
Landscape ConnectivityB. McRae
• Landscape connectivity governs the degree to which the landscape facilitates or impedes movement
• Need to model important processes like:
– Gene flow (to avoid inbreeding)
– Movement and mortality patterns
• Corridor identification, conservation planning
QuickTime™ and a decompressor
are needed to see this picture.
QuickTime™ and a decompressor
are needed to see this picture.
Pumas in southern California
Joshua Tree National Park
Los AngelesPalm Springs
Habitat quality model
QuickTime™ and a decompressor
are needed to see this picture.
Model as a resistive network
Habitat
Nonhabitat
Reserve
QuickTime™ and a decompressor
are needed to see this picture.
Processing landscapes
QuickTime™ and a decompressor
are needed to see this picture.
Combinatorial methodsGraph construction
Graph contraction
Connected components
Numerical methods Linear systems
Combinatorial preconditioners
QuickTime™ and a decompressor
are needed to see this picture.
Results
• Solution time reduced from 3 days to 5 minutes for typical problems
• Aiming for much larger problems: Yellowstone-to-Yukon (Y2Y)
QuickTime™ and a decompressor
are needed to see this picture.
Multi-layered software tools
Distributed Sparse MatricesArithmetic, matrix multiplication, indexing, solvers (\, eigs)
Graph Analysis & PD Toolbox
Graph querying & manipulation, connectivity, spanning trees,
geometric partitioning, nested dissection, NNMF, . . .
Preconditioned Iterative Methods
CG, BiCGStab, etc. + combinatorial preconditioners (AMG, Vaidya)
Applications
Computational ecology, CFD, data exploration
QuickTime™ and a decompressor
are needed to see this picture.
Thanks for coming
Thank You