IntroductionThree-point Algorithm
Summary
Algorithms for Higher Order Spatial Statistics
István Szapudi
Institute for AstronomyUniversity of Hawaii
Future of AstroComputing Conference, SDSC, Dec 16-17
I. Szapudi Algorithms for Higher Order Statistics
IntroductionThree-point Algorithm
Summary
Outline
1 Introduction
2 Three-point Algorithm
I. Szapudi Algorithms for Higher Order Statistics
IntroductionThree-point Algorithm
Summary
Random Fields
DefinitionA random field is a spatial field with an associated probabilitymeasure: P(A)DA.
Random fields are abundant in Cosmology.The cosmic microwave background fluctuations constitutea random field on a sphere.Other examples: Dark Matter Distribution, GalaxyDistribution, etc.Astronomers measure particular realization of a randomfield (ergodicity helps but we cannot avoid “cosmic errors”)
I. Szapudi Algorithms for Higher Order Statistics
IntroductionThree-point Algorithm
Summary
Definitions
The ensemble average 〈A〉 corresponds to a functionalintegral over the probability measure.Physical meaning: average over independent realizations.Ergodicity: (we hope) ensemble average can be replacedwith spatial averaging.Symmetries: translation and rotation invariance
Joint Moments
F (N)(x1, . . . , xN) = 〈T (x1), . . . , T (xN)〉
I. Szapudi Algorithms for Higher Order Statistics
IntroductionThree-point Algorithm
Summary
Connected MomentsThese are the most frequently used spatial statistics
Typically we use fluctuation fields δ = T/〈T 〉 − 1
Connected moments are defined recursively
〈δ1, . . . , δN〉c = 〈δ1, . . . , δN〉 −∑
P
〈δ1 . . . ...δi〉c . . . 〈δj . . . δk 〉c . . .
With these the N-point correlation functions are
ξ(N)(1, . . . , N) = 〈δ1, . . . , δN〉c
I. Szapudi Algorithms for Higher Order Statistics
IntroductionThree-point Algorithm
Summary
Gaussian vs. Non-Gaussian distributionsThese two have the same two-point correlation function or P(k)
These have the same two-point correlation function!
I. Szapudi Algorithms for Higher Order Statistics
IntroductionThree-point Algorithm
Summary
Basic ObjectsThese are N-point correlation functions.
Special Cases
Two-point functions 〈δ1δ2〉Three-point functions 〈δ1δ2δ3〉Cumulants 〈δN
R 〉c = SN〈δ2R〉N−1
Cumulant Correlators 〈δN1 δM
2 〉cConditional Cumulants 〈δ(0)δN
R 〉c
In the above δR stands for the fluctuation field smoothed onscale R (different R’s could be used for each δ’s).Host of alternative statistics exist: e.g. Minkowskifunctions, void probability, minimal spanning trees, phasecorrelations, etc.
I. Szapudi Algorithms for Higher Order Statistics
IntroductionThree-point Algorithm
Summary
ComplexitiesCombinatorial explosion of terms
N-point quantities have a large configuration space:measurement, visualization, and interpretation becomecomplex.e.g, already for CMB three-point, the total number of binsscales as M3/2
CPU intensive measurement: MN scaling for N-pointstatistics of M objects.Theoretical estimationEstimating reliable covariance matrices
I. Szapudi Algorithms for Higher Order Statistics
IntroductionThree-point Algorithm
Summary
Algorithmic Scaling and Moore’s Law
Computational resources grow exponentially(Astronomical) data acquisition driven by the sametechnologyData grow with the same exponent
CorrolaryAny algorithm with a scaling worse then linear will becomeimpossible soon
Symmetries, hierarchical structures (kd-trees), MC,computational geometry, approximate methods
I. Szapudi Algorithms for Higher Order Statistics
IntroductionThree-point Algorithm
Summary
Example: Algorithm for 3ptOther algorithms use symmetries
!!4!4!3!2!1
I. Szapudi Algorithms for Higher Order Statistics
IntroductionThree-point Algorithm
Summary
Algorithm for 3pt Cont’d
Naively N3 calculations to find all triplets in the map:overwhelming (millions of CPU years for WMAP)Regrid CMB sky around each point according to theresolutionUse hierarchical algorithm for regridding: N log NCorrelate rings using FFT’s (total speed: 2minutes/cross-corr)The final scaling depends on resolutionN(log N + NθNα log Nα + NαNθ(Nθ + 1)/4)/2With another cos transform one and a double Hankeltransform one can get the bispectrumIn WMAP-I: 168 possible cross correlations, about 1.6million bins altogether.How to interpret such massive measurements?
I. Szapudi Algorithms for Higher Order Statistics
IntroductionThree-point Algorithm
Summary
3pt in WMAP
I. Szapudi Algorithms for Higher Order Statistics
IntroductionThree-point Algorithm
Summary
Recent Challanges
Processors becoming multicore (CPU and GPU)To take advantage of Moore’s law: parallelizationDisk sizes growing exponentially, but not the IO speedData size can become so large that reading mightdominate processingNot enough to just consider scaling
I. Szapudi Algorithms for Higher Order Statistics
IntroductionThree-point Algorithm
Summary
Alternative view of the algorithm: lossy compression
!!4!4!3!2!1
I. Szapudi Algorithms for Higher Order Statistics
IntroductionThree-point Algorithm
Summary
Compression
Compression can increase processing speed simply by theneed of reading less dataThe full compressed data set can be sent to all nodesThis enables parallelization in multicore or MapReduceframeworkFor any algorithm specific (lossy compression) is needed
I. Szapudi Algorithms for Higher Order Statistics
IntroductionThree-point Algorithm
Summary
Another pixellization as lossy compression
I. Szapudi Algorithms for Higher Order Statistics
IntroductionThree-point Algorithm
Summary
Summary
Fast algorithm for calculating 3pt functions with N log Nscaling instead of N3
Approximate algorithm with a specific lossy compressionphaseScaling with resolution and not with data elementsCompression in the algorithm enables multicore orMapReduce style parallelizationWith a different compression we have done approximatelikelihood analysis for CMB (Granett,PhD thesis)
I. Szapudi Algorithms for Higher Order Statistics