Tom Peterka
Mathematics and Computer Science Division ATPESC Talk 8/11/14
DIY Parallel Data Analysis
Image courtesy pigtimes.com
“I have had my results for a long time, but I do not yet know how I am to arrive at them.”
–Carl Friedrich Gauss, 1777-1855
Preliminaries
2
Moving from Postprocessing to Run-Time Scientific Data Analysis in HPC
3
Analyze!
Postprocessing analysis and visualization
Run-time analysis and visualization
Example of a data flow network
Definition of Data Analysis
• Any data transformation, or a network of transformations
• Anything done to original data beyond its original generation
• Can be visual, analytical, statistical, or data management
4
Examples
5
Streamlines and pathlines
Stream surfaces
FTLE
Information entropy
Morse-Smale complex
Voronoi and Delaunay tessellation
Ptychography
Common Denominators
6
• Big science => big data, big machines
• Most analysis algorithms are not up to speed
  • Either serial, or
  • Overheads kill scalability
• Solutions
  • Process data closer to the source
  • Write scalable analysis algorithms
    • Parallelize in various forms
  • Build software stacks of useful and reusable layers
• Usability and workflow
  • Develop libraries rather than tools
  • Users write small main programs and call into libraries
Abstractions Matter: Think Blocks, not Tasks
7
• Block = unit of decomposition
  • Block size, shape can be configured
  • From coarse to fine
  • Regular, adaptive, KD-tree
• Block placement is flexible, dynamic
  • Blocks per task
  • Tasks per block
  • Memory / storage hierarchy
• Data is first-class citizen
  • Separate operations per block
  • Thread safety
Parallel data analysis consists of decomposing a problem into blocks, operating on them, and communicating between them.
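The block abstraction above can be made concrete with a minimal sketch. This is hypothetical illustrative code, not the DIY API: a Block struct holding the extents of one piece of a regular grid, and a decompose routine that splits the domain into regular blocks.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical sketch (not the DIY API): a block owns a contiguous
// sub-extent of a regular grid, identified by a global id (gid).
struct Block {
    int gid;                 // global block id
    std::array<int, 3> min;  // inclusive lower corner
    std::array<int, 3> max;  // inclusive upper corner
};

// Split a grid of `dims` cells into blocks_per_dim^3 regular blocks.
std::vector<Block> decompose(const std::array<int, 3>& dims, int blocks_per_dim)
{
    std::vector<Block> blocks;
    int gid = 0;
    for (int k = 0; k < blocks_per_dim; ++k)
        for (int j = 0; j < blocks_per_dim; ++j)
            for (int i = 0; i < blocks_per_dim; ++i) {
                std::array<int, 3> idx = {i, j, k};
                Block b;
                b.gid = gid++;
                for (int d = 0; d < 3; ++d) {
                    int step = dims[d] / blocks_per_dim;
                    b.min[d] = idx[d] * step;
                    // the last block in each dimension absorbs the remainder
                    b.max[d] = (idx[d] == blocks_per_dim - 1)
                                   ? dims[d] - 1
                                   : (idx[d] + 1) * step - 1;
                }
                blocks.push_back(b);
            }
    return blocks;
}
```

In this picture, the per-block operation and the communication between blocks are separate steps layered on top of the decomposition, which is what lets block size, shape, and placement be reconfigured without touching the analysis code.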
You Have Two Choices to Parallelize Data Analysis
8
By hand:

void ParallelAlgorithm()
{
    ...
    MPI_Send();
    ...
    MPI_Recv();
    ...
    MPI_Barrier();
    ...
    MPI_File_write();
}

With tools:

void ParallelAlgorithm()
{
    ...
    LocalAlgorithm();
    ...
    DIY_Merge_blocks();
    ...
    DIY_File_write();
}
DIY Concepts
9
10
Library
• Written in C++ with C bindings
• Autoconf build system (configure, make, make install)
• Lightweight: libdiy.a is 800 KB
• Maintainable: ~15K lines of code, including examples
DIY usage and library organization
Features
• Parallel I/O to/from storage
• Domain decomposition
• Network communication
• Utilities
[Figure: DIY usage and library organization. Simulations (Flash, Nek5000, HACC), visualization tools (ParaView, VisIt), and analysis libraries (ITL, OSUFlow, Qhull, VTK) sit above DIY, which is built on MPI. DIY provides decomposition (blocking, assignment), communication (neighbor, global), I/O (read data, write results, parallel), and utilities (compression, parallel sort, datatype creation).]
DIY helps the user write data-parallel analysis algorithms by decomposing a problem into blocks and communicating items between blocks.
Nine Things That DIY Does
11
1. Separate analysis ops from data ops
2. Group data items into blocks
3. Assign blocks to processes
4. Group blocks into neighborhoods
5. Support multiple instances of 2, 3, and 4
6. Handle time
7. Communicate between blocks in various ways
8. Read data and write results
9. Integrate with other libraries and tools
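Item 3 (assigning blocks to processes) admits many policies; one of the simplest is round-robin, where there may be more blocks than processes. A minimal sketch, with hypothetical function names rather than the DIY API:

```cpp
// Hypothetical sketch (not the DIY API): round-robin assignment of
// nblocks blocks to nprocs processes, block gid -> MPI rank.
int rank_of_block(int gid, int nprocs)
{
    return gid % nprocs;  // blocks cycle over the ranks
}

// How many blocks a given rank owns under round-robin assignment:
// ranks below (nblocks % nprocs) get one extra block.
int blocks_on_rank(int rank, int nblocks, int nprocs)
{
    return nblocks / nprocs + (rank < nblocks % nprocs ? 1 : 0);
}
```

Because the assignment is a pure function of the global block id, any process can compute where any block lives without communication, which matters when blocks exchange items with their neighbors.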
[Figure: space-time decomposition across processes. Time steps are grouped into epochs; each block has a spatial neighborhood within a time step and a temporal neighborhood across time steps.]
Usage
12
Writing a DIY Program
13
Tutorial Examples
• Block I/O: Reading data, writing analysis results
• Static: Merge-based and swap-based reduction, neighborhood exchange
• Time-varying: Neighborhood exchange
• Spare thread: Simulation and analysis overlap
• MOAB: Unstructured mesh data model
• VTK: Integrating DIY communication with VTK filters
• R: Integrating DIY communication with R stats algorithms
• Multimodel: Multiple domains and communicating between them

Documentation
• README for installation
• User’s manual with description, examples of custom datatypes, and complete API reference
Execution: Same Core and Spare Core
14
Same core Spare core
Time: Static and Time-Varying
15
Static Time-varying
Communication: Merge and Swap Reduction, Neighbor Exchange
16
Merge and swap reduction
Neighbor exchange
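The round structure of a merge-based reduction can be sketched in a few lines. This is illustrative code with hypothetical names, not the DIY implementation: with radix k, in round r the blocks whose gid is a multiple of k^r are still active, and each group of k active blocks merges into the group member with the lowest gid.

```cpp
// Integer power: k^r for small non-negative r.
long ipow(long k, int r)
{
    long p = 1;
    while (r--) p *= k;
    return p;
}

// Hypothetical sketch of a radix-k merge-based reduction (not the DIY
// implementation): the root that block `gid` merges into in round r is
// the lowest gid in its group of k^(r+1) consecutive blocks. A block
// equal to its own target keeps its data and stays active next round.
long merge_target(long gid, long k, int r)
{
    long group = ipow(k, r + 1);
    return (gid / group) * group;
}
```

After log_k(nblocks) rounds every block maps to gid 0, the root of the merge tree; a swap-based reduction follows the same round structure but exchanges disjoint pieces of the data instead of funneling whole blocks to one root.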
One Example in Greater Detail
17
Parallel Tessellation

We developed a prototype library for computing in situ Voronoi and Delaunay tessellations from particle data and applied it to cosmology, molecular dynamics, and plasma fusion.
Key Ideas
• Mesh tessellations convert sparse point data into continuous dense field data.
• Meshing the output of simulations is data-intensive and requires supercomputing resources.
• No large-scale data-parallel tessellation tools exist.
• We developed such a library, tess.
• We achieved good parallel performance and scalability.
• Widespread GIS applicability in addition to the datasets we tested.
18
Scalability
19
Strong and weak scaling for up to 2048^3 synthetic particles and up to 128K processes (excluding I/O) shows up to 90% strong-scaling and up to 98% weak-scaling efficiency.
Applications in Cosmology
20
[Figure: histograms of cell density contrast ((density − mean) / mean) at t = 11, 21, and 31, 100 bins each. t = 11: range [−0.77, 0.59], bin width 0.014, skewness 1.6, kurtosis 4.1. t = 21: range [−0.77, 2.4], bin width 0.033, skewness 2, kurtosis 5.5. t = 31: range [−0.72, 15], bin width 0.15, skewness 4.5, kurtosis 23.]
Temporal structure dynamics: As time progresses, the range of cell volume and density expands, and kurtosis and skewness increase. These statistics are consistent with the governing physics of the formation of high- and low-density structures over time and can perhaps be used to summarize evolution at given time steps.
Feature statistics: Total volume, surface area, curvature, topology of connected components of Voronoi cells classify and quantify features.
Density estimation: Tessellations as intermediate representations enable accurate regular grid density estimators.
Recap
Block abstraction for parallelizing data analysis
Encapsulate data movement in a separate library
Define design patterns for data movement in HPC data analysis in terms of:
• Execution
• Temporal behavior
• Communication pattern

The benefits are:
• Abstraction, implementation independence
• Reuse, programmer productivity
• Standardization
• Benchmarking
Further Reading
22
DIY
• Peterka, T., Ross, R., Kendall, W., Gyulassy, A., Pascucci, V., Shen, H.-W., Lee, T.-Y., Chaudhuri, A.: Scalable Parallel Building Blocks for Custom Data Analysis. Proceedings of the Large Data Analysis and Visualization Symposium (LDAV'11), IEEE Visualization Conference, Providence, RI, 2011.
• Peterka, T., Ross, R.: Versatile Communication Algorithms for Data Analysis. 2012 EuroMPI Special Session on Improving MPI User and Developer Interaction (IMUDI'12), Vienna, Austria.
DIY applications
• Peterka, T., Ross, R., Nouanesengsy, B., Lee, T.-Y., Shen, H.-W., Kendall, W., Huang, J.: A Study of Parallel Particle Tracing for Steady-State and Time-Varying Flow Fields. Proceedings of IPDPS'11, Anchorage, AK, May 2011.
• Gyulassy, A., Peterka, T., Pascucci, V., Ross, R.: The Parallel Computation of Morse-Smale Complexes. Proceedings of IPDPS'12, Shanghai, China, 2012.
• Nouanesengsy, B., Lee, T.-Y., Lu, K., Shen, H.-W., Peterka, T.: Parallel Particle Advection and FTLE Computation for Time-Varying Flow Fields. Proceedings of SC12, Salt Lake City, UT.
• Peterka, T., Kwan, J., Pope, A., Finkel, H., Heitmann, K., Habib, S., Wang, J., Zagaris, G.: Meshing the Universe: Integrating Analysis in Cosmological Simulations. Proceedings of the SC12 Ultrascale Visualization Workshop, Salt Lake City, UT.
• Chaudhuri, A., Lee, T.-Y., Zhou, B., Wang, C., Xu, T., Shen, H.-W., Peterka, T., Chiang, Y.-J.: Scalable Computation of Distributions from Large Scale Data Sets. Proceedings of the 2012 Symposium on Large Data Analysis and Visualization (LDAV'12), Seattle, WA.
Tom Peterka
Mathematics and Computer Science Division
Acknowledgments:
Facilities Argonne Leadership Computing Facility (ALCF)
Oak Ridge National Center for Computational Sciences (NCCS)
Funding DOE SDMAV Exascale Initiative DOE Exascale Codesign Center DOE SciDAC SDAV Institute
https://bitbucket.org/diatomic/diy
ATPESC Talk 8/11/14
“The purpose of computing is insight, not numbers.” –Richard Hamming, 1962