Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
Sta$s$csinastronomyand
theSAMSI ASTRO Program
G. Jogesh Babu(with input from Eric Feigelson)
PennState
Whyastrosta$s$cs?Astronomersencounterasurprisingvarietyofsta$s$calproblemsintheirresearch:
– Theskyhasvastnumbersofstars&galaxiesandgasonallscales.– Moststarshaveorbi$ngplanets,mostgalaxieshaveamassiveblackhole– Astronomersacquirehugedatasetsofimages,spectra&$meseriesof
planets,stars,galaxies,quasars,supernovae,etc.– Variousproper$esofcosmicpopula$onsobservedandempiricallystudied
withallkindsoftelescopes(n>>p)– Proper$esaremeasuredrepeatedlybutwithirregularspacing.– Spa$aldistribu$onsinsky(2D),space(3D),andparameterspace(pD)is
complex(MVNassump$onusuallyinapplicable)EricFeigelsonandIstartedcollabora$nginlate1980sandtheterm`Astrosta$s$cs’wascoinedinmid1990s,whenwepublishedabookbythesamename.
• Modern statistics is vast in its scope and methodology. It is difficult to find what may be useful (jargon problem!), and there are usually several ways to proceed. Very confusing.
• Some statistical procedures are based on mathematical proofs which
determine the applicability of established results. It is perilous to violate mathematical truths! Some issues are debated among statisticians, or have no known solution.
• Scientific inferences should not depend on arbitrary choices in methodology & variable scale. Prefer nonparametric & scale-invariant methods. Try multiple methods.
• It can be difficult to interpret the meaning of a statistical result with respect to the scientific goal. Statistics is only a tool towards understanding nature from incomplete information.
We should be knowledgeable in our use of statistics
and judicious in its interpretation
Astronomy & Statistics: A glorious past For most of western history,
the astronomers were the statisticians! Ancient Greeks to 18th century
Best estimate of the length of a year from discrepant data? • Middle of range: Hipparcos (4th century B.C.) • Observe only once! (medieval) • Mean: Brahe (16th c), Galileo (17th c), Simpson (18th c) • Median (20th c)
19th century Discrepant observations of planets/moons/comets used to estimate orbital parameters using Newtonian celestial mechanics
• Legendre, Laplace & Gauss develop least-squares regression and normal error theory (c.1800-1820)
• Prominent astronomers contribute to least-squares theory (c.1850-1900)
The lost century of astrostatistics….
In the late-19th and 20th centuries, statistics moved towards human sciences (demography, economics, psychology, medicine, politics) and industrial applications (agriculture, mining, manufacturing). During this time, astronomy recognized the power of modern physics: electromagnetism, thermodynamics, quantum mechanics, relativity. Astronomy & physics were wedded into astrophysics. Thus, astronomers and statisticians substantially broke contact; e.g. the curriculum of astronomers heavily involved physics but little statistics. Statisticians today know little modern astronomy.
Astrosta$s$csatSAMSIAstrostatistics Program at SAMSI January 2006�
• Opening workshop January18-20,2006– Bayesian astrostatistics– Nonparametric inference– Astronomy for Statisticians
• Working groups – Exoplanets, surveys & population studies– Gravitational Lensing– Source detection & feature detection– Particle physics
• Concluded with SCMA IV at Penn State in June 2006
Astrosta$s$csatSAMSIAstrostatistics sub Program Fall 2012
Statistical and Computational Methodology for Massive Datasets • Workshop September19-21,2012
– Thesearchfortransients– Missions(Fermi,SDSS.DES,Plank,LSST,LIGO)– Sparsity(high-dimensionaldata,butlow-dimsignal)– Datamining
• Working groups – Discovery&Classifica$oninSynop$cSurveys;– Inference&Simula$oninComplexModels,– Stochas$cProcesses&AstrophysicalInference– GraphicalModels&GraphicsProcessors
Astrosta$s$csatSAMSI
Exoplanets Summer 2013
ModernSta$s$calandComputa$onalMethodsforAnalysisofKeplerData
June10-28,2013
Astrosta$s$csatSAMSI
Sta4s4cal,Mathema4calandComputa4onalMethodsforAstronomy 2016-2017�
• Opening workshop August22-26,2016– TimeDomainAstronomy(TDA)– Exoplanetdataanalysis,hierarchicalmodeling– Uncertainty,selec$oneffectsforgravita$onalwaves(GW)– Pulsar$mingarraysanddetec$onofGWs– Non-sta$onary,non-Gaussian,irregularlysampledprocesses– Sta$s$calissuesincosmology
SAMSIASTROProgramis$mely
• Gravita$onalwavesdetected100yearsaderEinstein’spredic$on
• FirstGW(GW150914)detectedbyLaserInterferometerGravita$onal-WaveObservatory(LIGO)confirmedEinstein’s1915generaltheoryofrela$vityappliedtoinspiralingbinaryblackholes.
• Theplanningmee$ngonSeptember21,2015includedLIGOscien$stswhohadonlyjustlearnedofthecandidatedetec$on,andhadtokeepitsecretun$lconfirmedandannouncedonFebruary11,2016.AsecondGWevent(GW151226)wasannouncedonJune15,2016.ItwasrecordedonDecember26,2015.
SAMSI2016-17WorkingGroups
• Synop$cTimeDomainSurveys
• Mul$variateandIrregularlySampledTimeSeries
• UncertaintyQuan$fica$onandAstrophysicalEmula$on
• AstrophysicalPopula$ons
• Sta$s$cs,Computa$on,andModelinginCosmology
Synop$cTimeDomainSurveys• TDAhasbeengeingricherwithdatasetsspanningdecades,
manyspectralbands.Includesbothdenseandsparselight-curvesforhundredsofmillionsofsources.
• Thevariety,volumeandvelocitysquarelyfallinBigDataregime.• Thelight-curvesodenhavelargegaps,areheteroskedas$c,and
theintrinsicvariability–odenpoorlyunderstood–addsanelementofuncertaintywhenmul$-banddataarenotobtainedsimultaneously.– binaryblack-holesearchesfromCRTTransientSurvey– Kepler-typeplanetsearch/characteriza$on(alsoinotherWGs)
GroupLeaders:AshishMahabal(Astro,Caltech),G.JogeshBabu(Stat,PSU)
BigQues$ons• Whatmathema$calandsta$s$calapproachescanbeusedtobestcharacterizeandquan$fysalientfeaturesofirregular,heteroscedas$c,gapped$meseries?
• Canweiden$fyspecificfeaturesets,templatesandmodels?Howcanweiden$fymanyweakfeaturesorafewstrongonesinsuchhighdimensional$meseriesBigData?Aretherespecificdomain-knowledgebasedfeaturesthatcanbeiden$fiedtoimproveclassdiscrimina$on?
• Howcansignificantoutliers/anomaliesandsubclassesbedetected?
Subgroups• Datachallenges• Crea$ngdesignerfeaturesforclassifica$on• Whereshouldthenextpointbe?Schedulingandop$mizingobserva$ons.
• Interpola$ng$me-series• Smart/fastwaysofincorpora$ngnon-structuredancillaryinforma$on
• Outlierdetec$on(clustering)• DomainAdapta$onforclassifica$on(combiningdiversedatasets)
• Light-curvedecomposi$on(e.g.ARIMA&ARFIMA)
Exoplanets• Amajorproblemforplanetdetec$onisstellarvariability• KeplerAutoRegressivePlanetSearch(KARPS)uses
ARIMA-typemodelstoremoveaperiodicstellarvaria$onsfollowedbytransitcombfiltertofindperiodicplanettransits
• PreliminaryKARPSanalysisofirregularlyspaced$meseriesusingcadencesfromHungarianAutomatedTelescopeSouth(HATS)network.
Two methods for detecting planets orbiting stars
Photometric2meserieswithperiodictransitswhenplanetpassesinfrontofstar
Radialvelocity2meserieswithperiodicvaria2onswhenplanetpullsstartowardsandawayfromus
A significant challenge is that astronomical time series are usually acquired with irregularly spaced observations.
Statistical challenges in planet detection
The discovery of planets in astronomical radial velocity or photometric time series (light curves) involves 3 statistical stages:
• Time domain removal/suppression of variations intrinsic to the star
• Frequency domain periodogram to reveal periodic signature of planetary orbit
ButthestaritselfoFenshowsvariabilityinbrightnessorradialvelocityduetomagne4cac4vity.Thishasbeenwell-studiedonourSunandstronglymagne4callyac4vestars.Butun4lthesensi4vephotometryoftheKeplermission,itwasnotrecognizedhowstellarmagne4cac4vityissopervasive.
Planetsandstellarac$vity:HideandseekintheCoRoT-7systemHaywoodetal.2014
GPlocalregressionfit
“[E]xoplanets may still be detected by exploiting differences in timescale, shape and wavelength dependence between the planetary and stellar signals. ... However, [stellar] variability, combined with residual instrumental systematics, is still limiting the detection of habitable planets by Kepler. ... [A] better understanding of, and ability to mitigate, stellar variability, continues to be crucial for the continued development of exoplanet studies.”
Suzanne Aigrain IAU 2015
Stellarvariabilityreduc2onmethodsNonparametricmodeling§ IndependentComponentAnalysis(Waldmannetal.2012)§ Waveletdecomposi4on(KeplerTeampipeline;Jenkinsetal.2010)§ GaussianProcessesregression(Aigrainetal.2012&manyothers)§ Correntropy,trendentropy,EmpiricalModeDecomposi4on,
SingularSpectrumAnalysis,…(Huijseetal.2012,Robertsetal.2013,Grecoetal.2016)
Parametricmodeling§ Principalcomponentsanalysis§ Autoregressivemodeling
PeriodicitysearchmethodsAcommontaskisthesearchforperiodicityfromexoplanetaryorbitsinstellar$meseries.Sta$s$caltechniquesinclude§ FastFourierTransformInfinite,sta$onary,evenlyspaced,Gaussiandatawith
sinusoidalsignal§ Lomb-Scargleperiodogram(Scargle1982)Least-squaresfiingofsinusoidsto
irregularlyspaced$meseries§ Phasedispersionminimiza4on(Stellingwerf1977)Binnedvariance-based
measuredesignedfornon-sinusoidalshapes,irregulardata&heteroscedas$cmeasurementerrors
§ Minimumstrengthlength(Dworetzky2003)UnbinnedmeasuresimilartoPDM§ Step-func4on(Gregory&Loredo1992)Bayesiansimultaneousmodellingof
signalshape&strength§ Boxleastsquares(Kovacsetal.2002)Matchedfilterforperiodicboxshape
expectedfromplanetarytransits§ TransitCombFilter(Caceresetal.2017)Matchedfilterforperiodicdouble-
spikeexpectedfromtransitsaderdifferencingoperatorisapplied
Mul$variateandIrregularlySampledTimeSeries
• ThenoisefromGWdetectorschangesonlong$mescaleswithfrequent“glitches”.Howcanwebuildrobustnoisemodelsandexploitinforma$oninauxiliaryenvironmentalchannels?
• Ground-basedphotometricsurveys,withbillionsofirregularlightcurves,detectstellarvariabilityfromamul$tudeofsources,fromvariablestarstosupernovae.Howdoweeffec$velyseparatetheseclassesinirregularlysampleddata?
• Howcanwemaximizethesensi$vityofspace-basedphotometricsurveys(Kepler,TESS)fordetec$ngsmallplanetsinthepresenceofstellarac$vity?
• Howdoweinferplanetmassesfromobserva$onaldataaffectedbystellarac$vity‘jiser’?GroupLeaders:BenFarr(Astro,U.Chicago),SoumenLahiri(Stat,NCSU)
GWfromPulsarTiming
• GWbackgroundproducesafluctua$oninthepulse$mesofarrival(TOAs)fromagivenpulsar.Studyusingcross-correla$onsofpulsarpairs.
• Trypar$alautocorrela$onsandmul$plecorrela$onstoextracttheeffectoftheGWB?
• Demorestetal.(ApJ2016)useBayesianmethodstocomputeupperlimitsontheIsotropicStochas$cGWBfrom9yeardataset.
UncertaintyQuan$fica$onandAstrophysicalEmula$on
• UncertaintyQuan$fica$on(UQ)andReducedOrderModeling(ROM)areatthecoreofmanyproblemsingravita$onandcosmology,fromdirectsimula$onsoftheEinsteinequa$onstotheinverseproblem.
• Leverageexper$sefromareassuchasgeneralizedpolynomialchaosandsimulatoremula$onbasedonstochas$cprocessestoapplytogravita$on,astrophysics,andcosmology.GroupLeaders:DerekBingham(SimonFraser),EarlLawrence(LANL)
AstrophysicalPopula$ons• Improvethesta$s$calmethodologytoinfertheunderlying
popula$onofexoplanets&gravita$onalwaves(GW)sources.• Theexoplanetscommunityneedstorobustlydetectandcharacterize
planetsinthepresenceofstellarac$vityfromphotometricandDopplersurveyswithoutafirst-principlesmodel.
• TheGWcommunityisinterestedindetec$nggravita$onalwavesourcesforwhichthedetailsoftheprimaryGWsignaland/orbackgroundsareunknown.
• Bothapplica$onsrequirealgorithmstoefficientlyexplorehigh-dimensionalparameterspacesandtoestablishconfidenceindetec$ons,despitecomplexandunknownsourcesofuninteres$ngbackgroundsignals.GroupLeaders:JessiCisewski(Stat,Yale);EricFord(Astro,PennState)
Sta$s$cs,Computa$on,andModelingin
Cosmology• Seektobringtogetherleadingresearchersin
cosmology,computa$onalspa$alandBayesiansta$s$cs,experimentaldesign,andcomputermodelingtodevelopmethodologynecessaryforansweringfundamentalques$onsabouttheoriginandlargescalestructureoftheuniverse.
• Howcanwemakeinferencesfordeterminis$cnonlineardynamicalsystems(inferenceofini$alcondi$onsandmodelparameters)?
GroupLeaders:JeffJewell(Astro,JPL),JoeGuinness(Stat,NCSU)
ConclusionSomeofthemostimportantscien$ficproblemsinastronomy&astrophysicsraisechallengingissuesinsta$s$calmethodology:exoplanets,gravita$onalwaves,cosmologyThelargesttelescopeprojectsproduceenormousimaging,$meseriesandtabulardatasetsrequiringsophis$catedmethodsOverthepastdecade,theSAMSIorganiza$onprovidesaforumwheretheseissuescanbepursuedbycross-disciplinaryteamsofsta$s$cians&astronomers