Olspert, N.; Pelt, J.; Käpylä, M. J.; Lehtinen, J. J

This is an electronic reprint of the original article.This reprint may differ from the original in pagination and typographic detail.

Powered by TCPDF (www.tcpdf.org)

This material is protected by copyright and other intellectual property rights, and duplication or sale of all or part of any of the repository collections is not permitted, except that material may be duplicated by you for your research use or educational purposes in electronic or print form. You must obtain permission for any other use. Electronic or print copies may not be offered, whether for sale or otherwise to anyone who is not an authorised user.

Olspert, N.; Pelt, J.; Käpylä, M. J.; Lehtinen, J. J.Estimating activity cycles with probabilistic methods I. Bayesian Generalised Lomb-ScarglePeriodogram with Trend

Published in:Astronomy and Astrophysics

DOI:10.1051/0004-6361/201732524

Published: 01/07/2018

Please cite the original version:Olspert, N., Pelt, J., Käpylä, M. J., & Lehtinen, J. J. (2018). Estimating activity cycles with probabilistic methods I.Bayesian Generalised Lomb-Scargle Periodogram with Trend. Astronomy and Astrophysics, 615, 1-13. [A111].https://doi.org/10.1051/0004-6361/201732524

https://doi.org/10.1051/0004-6361/201732524

https://doi.org/10.1051/0004-6361/201732524

arX

iv:1

712.

0823

5v1

[as

tro-

ph.S

R]

21

Dec

201

7Astronomy & Astrophysics manuscript no. paper c©ESO 2017December 25, 2017

Estimating activity cycles with probabilistic methods I.

Bayesian Generalised Lomb-Scargle Periodogram with Trend

N. Olspert1, J. Pelt2, M. J. Käpylä3, 1, and J. J. Lehtinen3, 1

1 ReSoLVE Centre of Excellence, Aalto University, Department of Computer Science, PO Box 15400, FI-00076 Aalto,Finland

2 Tartu Observatory, 61602 Tõravere, Estonia3 Max-Planck-Institut für Sonnensystemforschung, Justus-von-Liebig-Weg 3, D-37077 Göttingen, Germany

Received / Accepted

ABSTRACT

Context. Period estimation is one of the central topics in astronomical time series analysis, where data is often unevenlysampled. Especially challenging are studies of stellar magnetic cycles, as there the periods looked for are of the orderof the same length than the datasets themselves. The datasets often contain trends, the origin of which is either areal long-term cycle or an instrumental effect, but these effects cannot be reliably separated, while they can lead toerroneous period determinations if not properly handled.Aims. In this study we aim at developing a method that can handle the trends properly, and by performing extensiveset of testing, we show that this is the optimal procedure when contrasted with methods that do not include the trenddirectly to the model. The effect of the noise model on the results is also investigated.Methods. We introduce a Bayesian Generalised Lomb-Scargle Periodogram with Trend (BGLST), which is a probabilisticlinear regression model using Gaussian priors for the coefficients and uniform prior for the frequency parameter.Results. We show, using synthetic data, that when there is no prior information on whether and to what extent the truemodel of the data contains a linear trend, the introduced BGLST method is preferable to the methods which eitherdetrend the data or leave the data untrended before fitting the periodic model. Whether to use different from constantnoise model depends on the density of the data sampling as well as on the true noise model of the process.

Key words. methods: statistical, methods: numerical, stars: activity

1. Introduction

In the domain of astronomical data analysis the task of pe-riod estimation from unevenly spaced time series has beenrelevant topic for many decades. Depending on the context,the term “period” can refer to e.g. rotational period of thestar, orbiting period of the star or exoplanet, or the pe-riod of the activity cycle of the star. Knowing the precisevalue of the period is often very important as many otherphysical quantities are dependent on it. For instance, in thecase of active late-type stars with outer convection zones,the ratio between rotational period and cycle period can beinterpreted as a measure of the dynamo efficiency. There-fore, measuring this ratio gives us crucial information ofthe generation of magnetic fields in stars with varying ac-tivity levels. In practice, both periods are usually estimatedthrough photometry and/or spectrometry of the star.

For the purpose of analysing unevenly spaced time se-ries, many different methods have been developed over theyears. Historically one of the most used ones is the Lomb-Scargle (LS) periodogram (Lomb 1976; Scargle 1982). Sta-tistical properties of the LS and other alternative peri-odograms have been extensively studied and it has be-come a well-known fact that the interpretation of the re-sults of any spectral analysis method takes a lot of effort.

Send offprint requests to: N. Olsperte-mail: [email protected]

The sampling patterns in the data together with the finite-ness of the time span of observations lead to multitudeof difficulties in the period estimation. One of the mostpronounced difficulties is the emergence of aliased peaks(spectral leakage) (Deeming 1975; Lomb 1976; Scargle 1982;Horne & Baliunas 1986). In other words one can only see adistorted spectrum as the observations are made at discrete(uneven) time moments and during a finite time. However,algorithms for eliminating spurious peaks from spectrumhave been developed (Roberts et al. 1987). Clearly, differ-ent period estimation methods tend to perform differentlydepending on the dataset. For comparison of some of thepopular methods see Carbonell et al. (1992).

One of the other issues in spectral estimation ariseswhen the true mean of the measured signal is not known.The LS method assumes a zero-mean harmonic model withconstant noise variance, so that the data needs to be cen-tered before doing the analysis. While every dataset canbe turned into a dataset with a zero mean, in some cases,due to pathological sampling (if the empirical mean1 andthe true mean differ significantly) this procedure can leadto incorrect period estimates. In the literature this problemhas been addressed extensively and several generalizationsof the periodogram, which are invariant to the shifts in the

1 We use the terms “empirical mean” and “empirical trend”when referring to the values obtained by correspondingly fittinga constant or a line to the data using linear regression.

Article number, page 1 of 11

http://arxiv.org/abs/1712.08235v1

A&A proofs: manuscript no. paper

target (output) value, have been proposed. Such include themethod of Ferraz-Mello (1981), who used Gram-Schmidt or-thogonalization of the constant, cosine and sine functionsin the sample domain. The power spectrum is then definedas the square norm of the data projections to these func-tions. In Zechmeister & Kürster (2009) the harmonic modelof the LS periodogram is directly extended with the addi-tion of a constant term, which has become known as theGeneralised Lomb-Scargle (GLS) periodogram. Moreover,both of these studies give the formulations allowing non-constant noise variance. Later, using a Bayesian approachfor a model with harmonic plus a constant, it has beenshown that the posterior probability of the frequency, whenusing uniform priors, is very similar to the GLS spectrum(Mortier et al. 2015). The benefit of the latter method isthat the relative probabilities of any two frequencies canbe easily calculated. Usually in the models, likewise in thecurrent study, the noise is assumed to be Gaussian and un-correlated. Other than white noise models are discussed inVio et al. (2010). For more thorough overview of impor-tant aspects in spectral analysis please refer to VanderPlas(2017).

The focus of the current paper is on another yet unad-dressed issue, namely the effect of linear trend in the datato period estimation. The motivation to tackle this questionarose in the context of analysing the Mount Wilson time se-ries of chromospheric activity (hereafter MW) in a quest tolook for long-term activity cycles (Olspert et al. 2017), thelength of which is of the same order of magnitude than thedata set length itself. In a previous study by (Baliunas et al.1995) detrending was occassionally, but not systematically,used before the LS periodogram was calculated. The cycleestimates from this study have been later used extensivelyby many other studies, for example, to show the presence ofdifferent stellar activity branches (e.g. Saar & Brandenburg1999).

We now raise the following question – whether it is moreoptimal to remove the linear trend before the period searchor leave the data untrended? Likewise with centering, de-trending the data before fitting the harmonic model mayor may not lead to biased period estimates depending onthe structure of the data. The problems arise either dueto sampling effects or due to presence of very long peri-ods in the data. Based on the empirical arguments givenin Sect. 3, we show that it is more preferable to includethe trend component directly into the regression model in-stead of detrending the data a priori or leaving the datauntrended altogether. In the same section we also discussthe effects of noise model of the data to the period estimate.We note that the question that we address in the currentstudy is primarily relevant to the cases where the number ofcycles in the dataset is small. Otherwise, the long coherencetime (if the underlying process is truly periodic) allows tonail down periods, even if there are uneven errors or smalla trend. High number of cycles allows to get exact periodseven e.g. by cycle counting.

The generalised least-squares spectrum allowing arbi-trary components (including linear trend) was first dis-cussed in Vaníček (1971) and more recently a Bayesian ap-proach allowing several harmonics as well as trend compo-nents was introduced in Ford et al. (2011).

2. Method

Now we turn to the description of the present method,which is a generalization of the method proposed inMortier et al. (2015). We introduce a simple Bayesian lin-ear regression model where besides the harmonic compo-nent we have an offset plus a trend2. This is summarised inthe following equation:

y(ti) = A cos(2πfti−φ)+B sin(2πfti−φ)+αti+β+ ǫ(ti),

(1)

where y(ti) and ǫ(ti) are the observation and noise at timemoment ti, f = 1/P is the frequency of the cycle, A, B,α, β and φ are free parameters. Specifically, α is the slopeand β the offset. We assume that the noise is Gaussian andindependent between any two time moments, but we do notassume the equality of variances. For parameter inferencewe use a Bayesian model, where the posterior probabilityis given by

p(f, θ|D) ∝ p(D|f, θ)p(f, θ), (2)

where p(D|f, θ) is the likelihood of the data, p(f, θ) is the

prior probability of the parameters and θ = [A,B, α, β]T

isthe vector of nuisance parameters. Parameter φ is not opti-mized, but set to a frequency dependent value simplifyingthe inference such that cross terms with cos and sin com-ponents will vanish (see Sect. 2.1). The likelihood is givenby

p(D|f, θ) =(

N∏

i=1

1√2πσi

)exp

(−1

2

N∑

i=1

ǫ2iσ2i

), (3)

where ǫi = ǫ(ti) and σ2i is the noise variance at time mo-

ment ti. To make the derivation of the spectrum analyt-ically tractable we take independent Gaussian priors forA,B, α, β and a flat prior for the frequency f . This leadsto the prior probability given by

p(f, θ) = N (θ|µθ,Σθ), (4)

where µθ = [µA, µB, µα, µβ ]T

is the vector of prior meansand Σθ = diag(σ2

A, σ2B , σ

2α, σ

2β) is the diagonal matrix of

prior variances.The larger the prior variances the less information is

assumed to be known about the parameters. Based on whatis intuitively meaningful, in all the calculations we haveused the following values for the prior means and variances:

µA = 0, µB = 0, µα = αslope, µβ = βintercept,

σ2A = 0.5σ2

y, σ2B = 0.5σ2

y, σ2α =

σ2y

∆T 2, σ2

β = σ2y + β2

intercept,

(5)

where αslope and βintercept are the slope and intercept esti-mated from linear regression, σ2

y is the sample variance ofthe data and ∆T is duration of the data. If one does nothave any prior information about the parameters one couldset the variances to infinity and drop the correspondingterms from the equations, but in practice to avoid meaning-less results with unreasonably large parameter values someregularization would be required.

2 The implementation of the method introduced in this papercan be found at https://github.com/olspert/BGLST


https://github.com/olspert/BGLST

N. Olspert et al.: Estimating activity cycles with probabilistic methods

2.1. Derivation of the spectrum

Our derivation of the model closely follows the key pointspresented in Mortier et al. (2015). Consequently, we use asidentical notion of the variables as possible. Using the like-lihood and prior defined by Eqs. (3) and (4), the posteriorprobability of the Bayesian model given by Eq. (2) can beexplicitly written as

p(f, θ|D) ∝ p(D|f, θ)p(θ)= p(D|f, θ)N (θ|µθ,Σθ)

=

N∏

i=1

1√2πσi

k∏

i=1

1√2πσθi

e−12E ,

(6)

where

E =N∑

i=1

ǫ2iσ2i

+k∑

i=1

(θi − µθi)2

σ2θi

. (7)

Here θi, i = 1, .., k, k = 4, denotes the ith element of θ,i.e. either A, B, α, or β. From Eqs. (1) and (3) we see thatthe likelihood and therefore posterior probability for everyfixed frequency f is multivariate Gaussian w.r.t. the param-eters A, B, α and β. In principle we are interested findingthe optimum for the full joint posterior probability densityp(f, θ|D), but as for every fixed frequency the latter distri-bution is a multivariate Gaussian we can first marginaliseover all nuisance parameters, find the optimum for the fre-quency f e.g. by doing grid search and later analytically findthe posterior means and covariances of the other parame-ters. The marginal distribution for the frequency parameteris expressed as

p(f |D) ∝∫

p(D|f, θ)N (θ|µθ,Σθ)dθ. (8)

Here the integrals are assumed to be taken over the wholerange of parameter values, i.e. from −∞ to ∞. We introducethe following notations:

wi =1

σ2i

, wA =1

σ2A

, wB =1

σ2B

, wα =1

σ2α

, wβ =1

σ2β

, (9)

W =

N∑

i=1

wi + wβ , Y =

N∑

i=1

wiyi + wβµβ , (10)

C =N∑

i=1

wi cos(2πfti − φ), (11)

S =

N∑

i=1

wi sin(2πfti − φ), (12)

T =

N∑

i=1

witi, (13)

Y C =

N∑

i=1

wiyi cos(2πfti − φ) + wAµA, (14)

Y S =N∑

i=1

wiyi sin(2πfti − φ) + wBµB, (15)

CC =

N∑

i=1

wi cos2(2πfti − φ) + wA, (16)

SS =

N∑

i=1

wi sin2(2πfti − φ) + wB , (17)

T T =N∑

i=1

wit2i + wα, (18)

Y T =

N∑

i=1

wiyiti + wαµα, (19)

TC =

N∑

i=1

witi cos(2πfti − φ), (20)

T S =

N∑

i=1

witi sin(2πfti − φ), (21)

Y Y =N∑

i=1

wiy2i + wAµ

2A + wBµ

2B + wαµ

2α + wβµ

2β . (22)

If the value of φ is defined such that the sine and cosinefunctions are orthogonal (for the proof see Mortier et al.2015), namely

φ =1

2arctan

[∑Ni=1 wi sin(4πfti)∑N

i=1 wi cos(4πfti)

], (23)

then we have

E = CCA2 − 2Y CA+ 2αTCA+ 2βCA

+ SSB2 − 2Y SB + 2αTSB + 2βSB

+ T Tα2 − 2Y Tα+ 2Tβα+Wβ2 − 2Y β + Y Y ,

(24)

where we have grouped the terms with A on the first line,terms with B on the second line and the rest of the termson the third line.

In the following we will repeatedly use the formula ofthe following definite integral:∫

∞

−∞

e−ax2−2bxdx =

√π

ae

b2

a , a > 0. (25)

To calculate the integral in Eq. (8), we start by integrating

first over A and B. Assuming that CC and SS are greaterthan zero and applying Eq. (25), we will get the solutionfor the integral containing terms with A:

∫∞

−∞

exp

(− CCA2 − 2Y CA+ 2αTCA+ 2βCA

2

)dA

=

√2π

CCexp

((Y C − αTC − βC)2

2CC

)

= exp

(−αY CTC

CC+

α2TC2

2CC+

αβTCC

CC

)

· exp(−βY CC

CC+

β2C2

2CC

)

·√

2π

CCexp

(Y C

2

2CC

).



(26)

In the last expression we have grouped onto separate linesthe terms with α, terms with β not simultaneously con-taining α, and constant terms. Similarly for the integralcontaining terms with B:

∫∞

−∞

exp

(− SSB2 − 2Y SB + 2αTSB + 2βSB

2

)dB

=

√2π

SSexp

((Y S − αTS − βS)2

2SS

)

= exp

(−αY STS

SS+

α2T S2

2SS+

αβTSS

SS

)

· exp(−βY SS

SS+

β2S2

2SS

)

·√

2π

SSexp

(Y S

2

2SS

).

(27)

Now we gather the coefficients for all the terms with α2

from the last line of Eq. (24) (keeping in mind the factor-1/2) as well as from Eqs. (26), (27) into new variable K:

K =1

2

(−T T +

TC2

CC+

T S2

SS

). (28)

We similarly introduce new variable L for the coefficientsinvolving all terms with α from the same equations:

L =

(Y T − βT +

−Y CTC + βTCC

CC+

−Y ST S + βTSS

SS

).

(29)

After these substitutions, integrating over α can again beaccomplished with the help of Eq. (25) and assuming K <0:

∫∞

−∞

exp(Kα2 + Lα) =

√π

−Kexp

(L2

−4K

)

=

√π

−Kexp

((M +Nβ)2

−4K

)

=

√π

−Kexp

(N2β2

−4K

)exp

(2MNβ

−4K

)exp

(M2

−4K

),

(30)

where

M = Y T − Y CTC

CC− Y ST S

SS(31)

and

N =TCC

CC+

T SS

SS− T. (32)

At this point what is left to do is the integration over β. Tosimplify things once more, we gather the coefficients for allthe terms with β2 from Eqs. (24), (26), (27), (30) into newvariable P :

P =C2

2CC+

S2

2SS− W

2− N2

4K. (33)

We similarly introduce new variable Q for the coefficientsinvolving all terms with β from the same equations:

Q = − Y CC

CC− Y SS

SS+ Y − 2MN

4K. (34)

With these substitutions we are ready to integrate over βusing again Eq. (25) while assuming P < 0:

∫∞

−∞

exp(Pβ2 +Qβ) =

√π

−Pexp

(Q2

−4P

). (35)

After gathering all remaining constant terms fromEqs. (24), (26), (27), (30), we finally obtain:

p(f |D) ∝ 2π2

√(CCSSKP )

· exp(Y C

2

2CC+

Y S2

2SS− M2

4K− Q2

4P− Y Y

2

).

(36)

For the purpose of fitting the regression curve to thedata one would be interested in obtaining the optimal val-ues for the nuisance parameters A,B, α, β. This can be eas-ily done after fixing the frequency to its optimal value usingEq. (36) with grid search, and noticing that p(θ|D, fopt) isa multivariate Gaussian distribution. In the following welist the corresponding posterior means of the parameters3:

µβ = − Q

2P, (37)

µα = −Nµβ +M

2K, (38)

µA =Y C − TCµα − Cµβ

CC, (39)

µB =Y S − T Sµα − Sµβ

SS. (40)

Formula for the full covariance matrix for the parameters,assuming a model with constant noise variance, can befound in Murphy (2012, Chapter 7.6.1).

If we now consider the case when SS = 0 (for moredetails see Mortier et al. 2015), we see that also S = 0,

Y S = 0 and T S = 0. Consequently the integral for B isproportional to a constant. We can define the analogues of

3 This is similar to Empirical Bayes approach as we use thepoint estimate for f .



the constants K through Q for this special case as

KC =1

2

(−T T +

TC2

CC

), (41)

LC =

(Y T − βT +

−Y CTC + βTCC

CC

), (42)

MC = Y T − Y CTC

CC, (43)

NC =TCC

CC− T, (44)

PC =C2

2CC− W

2− N2

C

4KC

, (45)

QC = − Y CC

CC+ Y − 2MCNC

4KC

. (46)

Finally, we arrive at the expression for the data likelihood

p(f |D) ∝ 2π2

√(CCKCPC)

· exp(Y C

2

2CC− M2

C

4KC

− Q2C

4PC

− Y Y

2

).

(47)

Similarly we can handle the situation when CC = 0. In thederivation we also assumed that K < 0 and P < 0. Ourexperiments with test data showed that the condition forK < 0 was always satisfied, but occasionally P obtainedvalues zero and probably due to numerical rounding errorsalso very low positive values. These special cases we han-dled by dropping the corresponding terms from Eqs. (36)and (47). Theoretically confirming or disproving that theconditions K < 0 and P ≤ 0 always hold is however out ofthe scope of current study.

2.2. Dealing with multiple harmonics

The formula for the spectrum is given by Eq. (36), whichrepresents the posterior probability of the frequency f ,given data D and our harmonic model MH i.e. p(f |D,MH).Although being a probability density, it is still convenientto call this frequency-dependent quantity a spectrum. Wepoint out, that if the true model matches the given model,MH, then the interpretation of the spectrum is straightfor-ward, namely being the probability distribution of the fre-quency. This will give us a direct way for error estimation,e.g. by fitting a Gaussian to the spectral line (Bretthorst1988, Chapter 2). When the true model has more than oneharmonic, the interpretation of the spectrum is not anymoredirect due to mixing of the probabilities from different har-monics. The correct way to address this issue would be touse more complex model with at least as many harmonicsthat there are expected to be in the underlying process (oreven better, to infer the number of components from data).However, as a simpler workaround, the ideas of cleaning thespectrum introduced in Roberts et al. (1987) can be usedto iteratively extract significant frequencies from the spec-trum calculated using a model with single harmonic.

2.3. Significance estimation

To estimate the significance of the peaks in the spectrumwe perform a model comparison between the given modeland a model without harmonics, i.e. only with offset andtrend. One practical way to do this is to calculate

∆BIC = BICMnull− BICMH , (48)

where Mnull is the linear model without harmonic, BIC =ln(n)k − 2ln(L) is the Bayesian Information Criterion(BIC), n is the number of data points, k the number of

model parameters and L = p(x|θ,M) is the likelihood ofdata for model M using the parameter values that maxi-mize the likelihood. This formula is an approximation to thelogarithmic Bayes factor, more precisely ∆BIC ≈ 2 lnK,where K is the Bayes factor. Strength of evidence formodels with 2 lnK > 10 are considered very strong, with6 ≤ 2 lnK < 10, strong and with 2 ≤ 2 lnK < 6 positive.For a harmonic model MH, θ include optimal frequency,and coefficients of the harmonic component, trend and off-set, for Mnull only the last two coefficients.

In the derivation of the spectrum we assumed that thenoise variance of the data points is known, which is usuallynot the case. In probabilistic models, this parameter shouldalso be optimized, however in practice often sample varianceis taken as the estimate for it. This is also the case withLS periodogram. However, it has been shown that whennormalizing the periodogram with the sample variance thestatistical distribution slightly differs from the theoreticallyexpected one (Schwarzenberg-Czerny 1998). More realisticapproach, especially in the case when the noise cannot beassumed stationary is to use e.g. subsample variances in asmall sliding window around the data points.

As a final remark in this section, we want to emphasizethat the probabilistic approach does not remove the burdenof dealing with spectral aliases due to data sampling. Thereis always a chance of false detection, thus the interpretationof the spectrum must be done with care.

3. Experiments

In this section we undertake some experiments to comparethe performance of the introduced method with LS and/orGLS periodograms. As the effect of an offset in the ran-domly sampled data to the period estimate has been welldescribed in Mortier et al. (2015), then we skip the discus-sion of similar examples. Instead, we will discuss the effectsof a linear trend in the data and a nonconstant noise vari-ance to the period estimate, as we find them to be lessaddressed topics in the literature.

3.1. Effect of a linear trend

Let us start with the question of how much the presence ofa linear trend in the data affects the period estimate. Tomeasure the performance of the BGLST method introducedin Sect. 2, we do the comparison to plain GLS and GLSwith preceding linear detrending (GLS-T). Throughout thissection we assume a constant noise variance. We draw thedata with time span ∆T ≈ 30 time units randomly fromthe process with following harmonic covariance function:

cov(ti, tj) = σS cos(2πf(ti − tj)) + σTtitj + δijσ2N, (49)



0.1

0.2

0.3

0.4

0.5

0.6

0.7

S1

(a)

BGLST GLS-T GLS

0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6

k

0.10

0.15

0.20

0.25

0.30

0.35

0.40

0.45

S2

(b)

Fig. 1. Results of the experiments with varying linear trendusing uniform sampling. (a) Performance measure S1 as functionof k. (b) The same for performance measure S2.

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

S1

(a)

BGLST GLS-T GLS

0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6

k

0.05

0.10

0.15

0.20

0.25

0.30

0.35

0.40

0.45

0.50

S2

(b)

Fig. 2. Results of the experiments with varying linear trend us-ing sampling similar to MW datasets. (a) Performance measureS1 as function of k. (b) The same for performance measure S2.

where σS, σT, and σN are the signal, trend, and noisevariances, respectively, and δij is a Kronecker delta. Wevaried the trend variance σT = k(σS + σN)/∆T , suchthat k ∈ [0, 0.1, 0.2, 0.4, 0.8, 1.6]. For each k we generatedN = 2000 time series, where each time the mean of thesignal was drawn uniformly from interval [−1, 1], S/N ra-tio from [0.2, 0.8] and period P = 1/f from [2, 2/3∆T ]. Werepeated the experiments with two forms of sampling: uni-form and the one based on the samplings of MW datasets.The prior means and variances for our model have beenchosen according to Eq. (5).

In these experiments we compare three different meth-ods – the BGLST, GLS-T and GLS. We measure the per-formance of each method using the following two statistics:

S1 =1

N

N∑

i=1

I(∆i < ∆Ai )I(∆i < ∆B

i ), (50)

where I is the indicator function, ∆i = |fi − f truei | is the

absolute error of the period estimate in the i-th experimentusing given method (either BGLST, GLS-T or GLS) and∆A

i and ∆Bi are the corresponding values for the other two

methods. The second performance statistic we measure isthe average relative error of the estimates:

S2 =1

N

N∑

i=1

δi, (51)

where δi = ∆i/ftrue.

The results for the experiments with uniform samplingare shown in Fig. 1 and for sampling taken from MWdataset in Fig. 2. From panels (a) of the figures we see thatthe fraction of number of experiments where the estimate ofBGLST method is better than the estimate of the other twomethods is increasing with trend variance. When there is notrend (k = 0) our method coincides with the GLS methodas expected, but as we see, detrending has negative effectin that case. When the true trend increases then GLS-Tstarts to outperform the GLS, but both are far from theBGLST method. From panels (b) we see that when thereis no trend in the dataset at all, all the three methods havethe same average relative error and while for our method itstays the same or slightly decreases with increasing k, forboth other methods the errors start to increase rapidly.

3.1.1. Special cases

Next we illustrate the benefit of using BGLST model withtwo examples, where the differences to the other modelsare the well emphasized. For that purpose we first gener-ated such a dataset where the empirical slope significantlydiffers from the true slope. The test contained only oneharmonic with a frequency of 0.014038 and a slope 0.01.We again fit three models – BGLST, GLS and GLS-T. Thecomparison of the results are shown in Fig. 3. As is evidentfrom Fig. 3(a), the empirical trend 0.0053 recovered by theGSL-T method (blue dashed line) differs significantly fromthe true trend (black dotted line). Both the GSL and GSL-Tmethods perform very poorly in fitting the data, while onlythe BGLST model produces a fit that represents the datapoints adequately. Moreover, BGLST recovers very close tothe true trend value 0.009781. From Fig. 3(b) it is evidentthat the BGLST model retrieves a frequency closest to the



0 20 40 60 80 100 120 140 160 180

t

−1.0

−0.5

0.0

0.5

1.0

1.5

2.0

2.5

3.0

y

(a)

0.005 0.010 0.015 0.020 0.025

f

0.0

0.2

0.4

0.6

0.8

1.0

Power

(b)

Fig. 3. Comparison of the results using different models. Thetrue model of the data contains one harmonic, trend and additivewhite Gaussian noise. (a) Data (black crosses), BGLST model(red continuous curve), GLS-T model (blue dashed curve), GLSmodel (green dash-dotted curve), true trend (black dotted line),trend from BGLST model (red continuousline), empirical trend(blue dashed line). (b) Spectra of the corresponding models withvertical lines marking the locations of maxima. The black dottedline shows the position of the true frequency.

true one, while the other two methods return too low valuesfor it. The performance of the GLS model is the worst, asexpected. The frequency estimates for BGLST, GLS-T andGLS are correspondingly: 0.013969, 0.013573 and 0.01288.This simple example shows that when there is a real trendin the data, due to the sampling patterns, detrending canbe erroneous, but it still leads to better results than notdoing the detrending at all.

Finally we would like to show a counterexample wheredetrending is not the preferred option. This happens whenthere is a harmonic signal in the data with a very longperiod, in this test case 0.003596. The exact situation isshown in Fig. 4(a), wherefrom it is evident that the GLS-T method can determine the harmonic variation itself as atrend component, and lead to a completely erroneous fit.Fig. 4(b) we show the corresponding spectra. As expected,the GLS model, coinciding with the true model, gives thebest estimate, while the BGLST model is not far off. Thedetrending, however, leads to significantly worse estimate.The estimates in the same order are the following: 0.003574,0.003673 and 0.005653. The value of the trend learned byBGLST method was -0.00041, which is very close to zero.

The last example clearly shows that even in the simplestcase of pure harmonic, if the dataset does not contain exactnumber of periods the spurious trend component arises. Ifpretrended, then this introduces bias into period estima-tion. In the case of BGLST method the sinusoid can befitted to its fragments and zero trend can be recovered.

0 20 40 60 80 100 120 140 160 180

t

−1.5

−1.0

−0.5

0.0

0.5

1.0

1.5

y

(a)

0.002 0.004 0.006 0.008 0.010 0.012 0.014 0.016 0.018

f

0.0

0.2

0.4

0.6

0.8

1.0

Power

(b)

Fig. 4. Comparison of the results using different models. Thetrue model contains one long harmonic with additive whiteGaussian noise. (a) Data (black crosses), BGLST model (redcontinuous curve) and its trend component (red line), GLS-Tmodel with trend added back (blue dashed curve), empiricaltrend (blue dashed line), GLS model (green dash-dotted curve).(b) Spectra of the corresponding models with vertical lines mark-ing the locations of maxima. The black dotted line shows theposition of the true frequency.

3.2. Effect of a nonconstant noise variance

We continue with investigating the effect of a nonconstantnoise variance to the period estimate. In these experimentsthe data is generated from a purely harmonic model witha zero mean and no linear trend. We compare the resultsfrom the BGLST to the LS method. As there is no trendand offset in the model, we set the prior means for thesetwo parameters to zero and variances to very low values.This essentially means that we are approaching the GLSmethod with a zero mean. In all the experiments the timerange of the observations is ti ∈ [0, T ], where T = 30 units,the period is drawn from P ∼ Uniform(5, 15) and the signalvariance is σ2

S = 1. For each experiment we draw two valuesfor noise variance σ2

1 ∼ Uniform(2σ2S, 10σ

2S) and σ2

2 = σ21/k,

where k ∈ [1, 2, 4, 8, 16, 32] which are used in different se-tups as indicated in the second column of Table 1. For eachk we repeat the experiments N = 2000 times.

Let ∆i = |fi − f truei | denote the absolute error of the

BGLST period estimate in the i-th experiment and ∆LSi =

|fLSi −f true

i | the same for the LS method. Then we measurea performance statistic

S3 =1

N

N∑

i=1

I(∆i < ∆LSi ), (52)

where I is the indicator function. Let δi = ∆i/ftrue and

δLSi = ∆LSi /f true be the relative errors of the period deter-



Table 1. Description of experimental setups with nonconstantnoise variance. The first column indicates the number of thesetup, the second column shows how the variance σ2

i for i-thdata point was selected and third column the criteria of drawingthe time moment ti for i-th data point.

No. Form of noise Type of sampling1 σ2

i = σ21 +

tiT(σ2

2 − σ21) ti ∼ Uniform(0,T)

2 σ2i =

{σ22 , if ti < T/2

σ21 , if ti ≥ T/2

ti ∼ Uniform(0,T)

3σ2i = σ2

2+σ2(emp)i

−σ2(emp)min

σ2(emp)max −σ

2(emp)min

(σ21 − σ2

2)Based on MW dataset

Notes. In the 3rd row the σ2(emp)i

denotes the empirical vari-

ance of the i-th datapoint and σ2(emp)min and σ

2(emp)max the minimum

and maximum intra-seasonal variances. In other words we haverenormalized the intra-seasonal variances to the interval from σ2

1

and σ22 .

mination. The second performance statistic we measure isthe relative difference in the average relative errors betweenthe methods i.e.

S4 = 1−∑N

i=1 δi∑N

i=1 δLSi

. (53)

The list of experimental setups with the descriptions ofthe models are shown in Table. 1. In the first experiment thenoise variance is linearly increasing from σ2

1 to σ22 . In prac-

tice this could correspond to decaying measurement accu-racy over time. In the second experiment the noise varianceabruptly jumps from σ2

2 to σ21 in the middle of the time se-

ries. This kind of situation could be interpreted as a changeof one instrument to another, more accurate, one. In bothof these experiments the sampling is uniform. In the thirdexperiment we use samplings patterns of randomly chosenstars from MW dataset and the noise variance is calculatedbased on empirical intra-seasonal variances. In the first twoexperiments the number of data points was n = 200 and inthe third between 200 and 400 depending on the randomlychosen dataset, which we downsampled.

In Fig. 5 the results of different experimental setups areshown, where the performance statistics S3 and S4 are plot-ted as function of

√k = σ1/σ2. We see that when the true

noise is constant in the data (k = 1), the BGLST methodperforms identically to the LS method while for greatervalues of k the difference grows bigger between the meth-ods. For the 2nd experiment, where the noise level abruptlychanges, the advantage of using a model with nonconstantnoise seems to be the best, while for the other experimentsthe advantage is slightly smaller, but still substantial forlarger k-s.

In Fig. 7 we have shown the models with maximallydiffering period estimates from the 2nd experiment for

√k =

5.66. On the left column of the figures there is shown thesituation where our method outperforms the most the LSmethod and on the right column vice versa. For colour andsymbol coding, please see to the caption of Fig. 7. This plotis a clear illustration of the fact that even though on averagethe method with nonconstant noise variance is better thanLS method, for each particular dataset this might not bethe case. Nevertheless it is also apparent from the figurethat when the LS method is winning over BGLST method,

0.50

0.55

0.60

0.65

S3

(a)

Exp. no. 1 Exp. no. 2 Exp. no. 3

1 2 3 4 5

σ1/σ2

0.00

0.05

0.10

0.15

0.20

0.25

0.30

0.35

0.40

0.45

S4

(b)

Fig. 5. Results of the experiments with known nonconstantnoise variance. (a) Performance measures S3 as function ofσ1/σ2. (b) The same for performance measure S4.

0.45

0.50

0.55

0.60

0.65

0.70

S3

(a)

Exp. no. 1 Exp. no. 2 Exp. no. 3

1 2 3 4 5

σ1/σ2

−0.05

0.00

0.05

0.10

0.15

0.20

0.25

0.30

0.35

0.40

S4

(b)

Fig. 6. Results of the experiments with unknown nonconstantnoise variance. The noise variance is empirically estimated fromthe data. (a) Performance measures S3 as function of σ1/σ2. (b)The same for performance measure S4. The black dotted hori-zontal line shows the break even point between the two methods.



0 5 10 15 20 25

t

−8

−6

−4

−2

0

2

4

6

8

y(a)

0 5 10 15 20 25

t

−8

−6

−4

−2

0

2

4

6

8

y

(b)

0.02 0.04 0.06 0.08 0.10 0.12 0.14

f

0.0

0.2

0.4

0.6

0.8

1.0

Power

(c)

0.02 0.04 0.06 0.08 0.10 0.12 0.14 0.16

f

0.0

0.2

0.4

0.6

0.8

1.0

Power

(d)

Fig. 7. Comparison of the methods for experiment with setup 2. (a) and (b) Data points with black crosses, red continuouscurve – BGLST model, blue dashed line – LS model fit. (c) and (d) Spectra of the models with same colours and line styles. Thevertical lines correspond to the optimal frequencies found and the black dotted line shows the true frequency. The plots on twocolumns correspond to the biggest difference in the period estimates from both methods w.r.t. each other. On the left our methodoutperforms LS method, on the right vice versa.

the gap tends to be slightly smaller than in the oppositesituation.

In the previous experiments the true noise variance wasknown, but this rarely happens in practice. However, whenthe data sampling is sufficiently dense, we can estimate thenoise variance empirically using e.g. narrow sliding windowaround each data point. We now repeat the experimentsusing such an approach. In experiment setups 1 and 2 weused windows with the length 1 unit and in the setup 3 theintra-seasonal variances. In the first two cases we increasedthe number of data points to 1000 and in the latter case wedid not downsample the data. The performance statisticsfor these experiments are shown in Fig. 6. We see that whenthe true noise variance is constant (k = 1) then using empir-ically estimated noise in the model leads to slightly worseresults than with constant noise model, however roughlystarting from the value of k = 1.5 in all the experimentsthe former approach starts to outperform the latter. Thelocation of the break-even point obviously depends on howprecisely the true variance can be estimated from the data.This, however depends on how dense is the data samplingand how short is the expected period in the data, becausewe want to avoid counting signal variance as a part of thenoise variance estimate.

3.3. Real datasets

As the last examples we consider time series of two starsfrom the MW dataset. In Figs. 8 and 9 we show the differ-ences between period estimates for the stars HD37394 andHD3651 correspondingly. For both stars we see that the

linear trends fitted directly to the data (blue dash-dottedline) significantly differ from the trend component in theharmonic model (red dash-dotted line) with both types ofnoise models (compare the left and right columns of theplots), and consequently, also the period estimates in be-tween the methods always tend to vary somewhat.

Especially in the case of HD37394, the retrieved periodestimates differ significantly. Assuming a constant noisemodel for this star results in the BGLST and GLS/GLS-T to deviate strongly, the BGLST model producing a sig-nificantly larger period estimate. With intra-seasonal noisemodel, all three estimates agree more to each other, how-ever, BGLST still gives the longest period estimate. In theprevious study by Baliunas et al. (1995) the cycle periodfor this star has been reported to be 3.6 yrs which exactlymatches the period estimate of GLS-T estimate with con-stant noise model4. The period estimate of BGLST method,being less sensitive to the noise model, is 5.8 yrs.

In the case of HD3651 the period estimates of the mod-els are not that sensitive to the chosen noise model, butthe dependence on how the trend component is handled,is significant. Interestingly none of these period estimatesmatch the estimate from Baliunas et al. (1995) (13.8 yrs),where the GLS-T was used. All three estimates shown hereare above 16 yrs. As can be seen from the model fits inFig. 8(a) and (b), they significantly deviate from the data,so we must conclude that the true model is not likely tobe harmonic. Conceptually the generalization of the modelproposed in the current paper to the nonharmonic case is

4 In their study they have dropped the data prior 1980, but thishas negligible effect on the results



1980 1982 1984 1986 1988 1990 1992 1994

t [yr]

0.30

0.35

0.40

0.45

0.50

0.55

0.60S-index

(a)

1980 1982 1984 1986 1988 1990 1992 1994

t [yr]

0.30

0.35

0.40

0.45

0.50

0.55

0.60(b)

BGLST GLS-T GLS

0.1 0.2 0.3 0.4 0.5

f

0.0

0.2

0.4

0.6

0.8

1.0

Power

(c)

0.1 0.2 0.3 0.4 0.5

f

0.0

0.2

0.4

0.6

0.8

1.0(d)

Fig. 8. Comparison of the results for star HD37394 using BGLST, GLS-T and GLS models with constant noise model on the plotson the left column and with intra-seasonal noise model on the right column. (a) and (b) Data (black crosses), BGLST model (redcurve), GLST-T model with trend added back (blue dashed curve), GLS model (green dash-dotted curve), the trend componentof BGLST model (red line) and the empirical trend (blue dashed line). (c) and (d) Spectra of the models with same colours. Thedashed lines mark the locations of corresponding maxima.

1970 1975 1980 1985 1990 1995

t [yr]

0.12

0.14

0.16

0.18

0.20

0.22

0.24

S-index

(a)

1970 1975 1980 1985 1990 1995

t [yr]

0.12

0.14

0.16

0.18

0.20

0.22

0.24(b)

BGLST GLS-T GLS

0.1 0.2 0.3 0.4 0.5

f

0.0

0.2

0.4

0.6

0.8

1.0

Power

(c)

0.1 0.2 0.3 0.4 0.5

f

0.0

0.2

0.4

0.6

0.8

1.0(d)

Fig. 9. Comparison of the results for star HD3651 using BGLST, GLS-T and GLS models. The meaning of the panels and colourcoding is identical to the in Fig. 8.



straightforward, but becomes analytically and numericallyless tractable. Fitting periodic and quasi-periodic modelsto the MW data, which are based on Gaussian Processes,are discussed in Olspert et al. (2017).

4. Conclusions

In this paper we introduced a Bayesian regression modelwhich involves, besides harmonic and offset components,also a linear trend component. The main focus of this pa-per was on addressing the effect of linear trends in data tothe period estimate. We showed that when there is no priorinformation on whether and to what extent the true modelof the data contains linear trend, it is more preferable to in-clude the trend component directly to the regression modelrather than either detrending the data or leaving the datauntrended before fitting the periodic model.

We also showed that if the true noise model of the data isfar from being constant, then by neglecting this knowledge,period estimates start to deteriorate. Unfortunately in prac-tice the true noise model is almost never known and in manycases impossible to estimate empirically (e.g. sparse sam-pling rate compared to the true frequency). In this studywe, however, focused only on the long period search task,which made estimating the noise variance on narrow lo-cal subsamples possible. Based on the experiments we sawthat if the true noise is not constant and the extremes ofthe S/N ratio differ at least two times, then using a modelwith empirically estimated noise variance is well justified.

Acknowledgements. This work has been supported by the Academyof Finland Centre of Excellence ReSoLVE (NO, MJK, JP), FinnishCultural Foundation grant no. 00170789 (NO) and Estonian Re-search Council (Grant IUT40-1; JP). MJK and JL acknowledge the’SOLSTAR’ Independent Max Planck Research Group funding.

The HK_Project_v1995_NSO data derive from the MountWilson Observatory HK Project, which was supported by bothpublic and private funds through the Carnegie Observatories, theMount Wilson Institute, and the Harvard-Smithsonian Center forAstrophysics starting in 1966 and continuing for over 36 years. Thesedata are the result of the dedicated work of O. Wilson, A. Vaughan,G. Preston, D. Duncan, S. Baliunas, and many others.

References

Baliunas, S. L., Donahue, R. A., Soon, W. H., et al. 1995, ApJ, 438,269

Bretthorst, L. G. 1988, Bayesian Spectrum Analysis and ParameterEstimation, Vol. 48 (Springer)

Carbonell, M., Oliver, R., & Ballester, J. L. 1992, A&A, 264, 350Deeming, T. J. 1975, Ap&SS, 36, 137Ferraz-Mello, S. 1981, AJ, 86, 619Ford, E. B., Moorhead, A. V., & Veras, D. 2011, ArXiv e-prints

[arXiv:1107.4047]Horne, J. H. & Baliunas, S. L. 1986, ApJ, 302, 757Lomb, N. R. 1976, Astrophysics and Space Science, 39, 447Mortier, A., Faria, J. P., Correia, C. M., Santerne, A., & Santos, N. C.

2015, A&A, 573, A101Murphy, K. P. 2012, Machine learning: a probabilistic perspective

(Cambridge, MA)Olspert, N., Lehtinen, J., Grigorievskiy, A., Käpylä, M. J., & Pelt, J.

2017, ArXiv e-prints [arXiv:1712.XXXX]Roberts, D. H., Lehar, J., & Dreher, J. W. 1987, AJ, 93, 968Saar, S. H. & Brandenburg, A. 1999, ApJ, 524, 295Scargle, J. D. 1982, 263, 835Schwarzenberg-Czerny, A. 1998, MNRAS, 301, 831VanderPlas, J. T. 2017, ArXiv e-prints [arXiv:1703.09824]Vaníček, P. 1971, Ap&SS, 12, 10Vio, R., Andreani, P., & Biggs, A. 2010, A&A, 519, A85Zechmeister, M. & Kürster, M. 2009, A&A, 496, 577


Documents

Olspert, N.; Pelt, J.; Käpylä, M. J.; Lehtinen, J. J