43
A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2015 http://www.astro.cornell.edu/~cordes/A6523 Lecture 18 Matched filtering Localization

Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

A6523 Signal Modeling, Statistical Inference and

Data Mining in Astrophysics

Spring 2015 http://www.astro.cornell.edu/~cordes/A6523

Lecture 18 – Matched filtering – Localization

Page 2: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Exp Astron (2015) 39:1–10DOI 10.1007/s10686-014-9432-z

ORIGINAL ARTICLE

Kolmogorov-Smirnov like test for time-frequencyFourier spectrogram analysis in LISA Pathfinder

Luigi Ferraioli · Michele Armano · Heather Audley · Giuseppe Congedo ·Ingo Diepholz · Ferran Gibert · Martin Hewitson · Mauro Hueller ·Nikolaos Karnesis · Natalia Korsakova · Miquel Nofrarias · Eric Plagnol ·Stefano Vitale

Received: 15 April 2013 / Accepted: 1 December 2014 / Published online: 19 December 2014© Springer Science+Business Media Dordrecht 2014

Abstract A statistical procedure for the analysis of time-frequency noise maps ispresented and applied to LISA Pathfinder mission synthetic data. The procedure isbased on the Kolmogorov-Smirnov like test that is applied to the analysis of time-frequency noise maps produced with the spectrogram technique. The influence ofthe finite size windowing on the statistic of the test is calculated with a Monte Carlosimulation for 4 different windows type. Such calculation demonstrate that the teststatistic is modified by the correlations introduced in the spectrum by the finite sizeof the window and by the correlations between different time bins originated by over-lapping between windowed segments. The application of the test procedure to LISAPathfinder data demonstrates the test capability of detecting non-stationary features

L. Ferraioli (!) · E. PlagnolAPC, Universite Paris Diderot, CNRS/IN2P3, CEA/Ifru, Observatoire de Paris, Sorbonne Paris Cite,10 Rue A. Domon et L. Duquet, 75205 Paris Cedex 13, Francee-mail: [email protected]

M. ArmanoSRE-OD ESAC, European Space Agency, Camino bajo del Castillo s/n, Urbanizacion Villafrancadel Castillo, Villanueva de la Canada, 28692 Madrid, Spain

H. Audley · I. Diepholz · M. Hewitson · N. KorsakovaAlbert-Einstein-Institut, Max-Planck-Institut fuer Gravitationsphysik und Universitat Hannover,Callinstr. 38, 30167 Hannover, Germany

G. Congedo · M. Hueller · S. VitaleUniversity of Trento and INFN, via Sommarive 14, 38123 Povo (Trento), Italy

F. Gibert · N. Karnesis · M. NofrariasFacultat de Ciencies, Institut de Ciencies de l’Espai, (CSIC-IEEC), Campus UAB, Torre C-5, 08193Bellaterra, Spain

Present Address:L. FerraioliInstitut fur Geophysik, ETH Zurich, Sonneggstrasse 5, 8092 Zurich, Switzerland

Page 3: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Exp Astron (2015) 39:95–117DOI 10.1007/s10686-015-9445-2

ORIGINAL ARTICLE

Performance analysis of GPU-accelerated filter-basedsource finding for HI spectral line image data

Stefan Westerlund · Christopher Harris

Received: 13 June 2014 / Accepted: 2 February 2015 / Published online: 27 February 2015© Springer Science+Business Media Dordrecht 2015

Abstract Searching for sources of electromagnetic emission in spectral-line radioastronomy interferometric data is a computationally intensive process. Parallel pro-gramming techniques and High Performance Computing hardware may be used toimprove the computational performance of a source finding program. However, it isdesirable to further reduce the processing time of source finding in order to decreasethe computational resources required for the task. GPU acceleration is a methodthat may achieve significant increases in performance for some source finding algo-rithms, particularly for filtering image data. This work considers the application ofGPU acceleration to the task of source finding and the techniques used to achievethe best performance, such as memory management. We also examine the changesin performance, where the algorithms that were GPU accelerated achieved a speedupof around 3.2 times the 12 core per node CPU-only performance, while the programas a whole experienced a speedup of 2.0 times.

Keywords Source finding · Radio astronomy · High performance computing ·Parallel processing · GPU computing

S. Westerlund (!)ICRAR/The University of Western Australia, M468 35 Stirling Highway,Crawley, WA 6009, Australiae-mail: [email protected]

C. HarrisiVEC@UWA/The University of Western Australia, M024 35 Stirling Highway,Crawley, WA 6009, Australiae-mail: [email protected]

Page 4: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Matched Filtering and Detection •  MF gives the optimal S/N and, if required, the best point estimates for

localization. •  Localization tasks (timing, Doppler shifts) are less concerned with the

actual amplitude. Different cases include: –  Template known: find translation parameter via least-squares or by cross

correlation, paying attention to any needed interpolation. The highest precision is far easier in the Fourier domain

–  Template not known: can use a family of templates (a template bank ordered by one or more parameters) and find maximum S/N over the set.

•  Detection is more dependent on S/N than on localization precision. –  Detection criteria:

•  S/N > mσ where m corresponds to some false-alarm probability if the underlying PDF is known or can be assumed to be Gaussian.

•  Want NtrialsPfa < 1 in an analysis with Ntrials statistical trials. –  Issue: If the template is not known, a large template bank increases Ntrials

significantly. –  E.g. detection of chirped waveforms in LIGO data (NS-NS, NS-BH binaries):

templates ~known from GR, but depend on unknown masses; the occurrence time is also not known so the search parameters include: t0 = event time and the chirp mass; also the amplitude

–  One can calculate the probability of detection Pdet for different detection criteria used on the same data set. Similarly for the null case (no signal) one can calculate the false-alarm probability for each detection scheme. A plot of Pdet vs Pfa is an ROC curve.

Page 5: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Matched Filtering

Template matched

Pulse shape Template Correlation function

Convolution

TOA

Template too narrow

Template too wide

Page 6: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Timing Error from Radiometer Noise

Interstellar pulse broadening, when large, increases ΔtS/N in two ways:

–  SNR decreases by a factor W / [W2+τd2]1/2 –  W increases to [W2+τd2]1/2

à Large errors for high DM pulsars and low-frequency observations

rms TOA error from template fitting with additive noise:

Gaussian shaped pulse:

N6 = N / 106

Low-DM pulsars: DISS (and RISS) will modulate SNR

Page 7: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Timing Error from Pulse-Phase Jitter

•  fϕ = PDF of phase variation •  a(ϕ) = individual pulse shape •  Ni = number of independent pulses summed •  mI = intensity modulation index ≈ 1 •  fJ = fraction jitter parameter = ϕrms / W ≈ 1

Gaussian shaped pulse:

N6 = Ni / 106

Page 8: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

arX

iv:0

709.

3727

v1 [

astro

-ph]

24

Sep

2007

**FULL TITLE**ASP Conference Series, Vol. **VOLUME**, **YEAR OF PUBLICATION****NAMES OF EDITORS**

Transdet: a matched-filter based algorithm for transitdetection – application to simulated COROT light curves

P. Borde†

Harvard-Smithsonian Center for Astrophysics, Cambridge, MA 02138,USA

F. Fressin

Gemini, Observatoire de la Cote d’Azur, 06304 Nice, France

M. Ollivier and A. Leger

IAS, Universite Paris-Sud, 91405 Orsay, France

D. Rouan

LESIA, Observatoire de Paris, 92195 Meudon, France

Abstract. We present a matched-filter based algorithm for transit detectionand its application to simulated COROT light curves. This algorithm stemsfrom the work by Borde, Rouan & Leger (2003, A&A 405, 1137). We describethe di!erent steps we intend to take to discriminate between planets and stellarcompanions using the three photometric bands provided by COROT. These stepsinclude the search for secondary transits, the search for ellipsoidal variability,and the study of transit chromaticity. We also discuss the performance of thisapproach in the context of blind tests organized inside the COROT exoplanetconsortium.

1. Introduction

COROT (Baglin 2003), scheduled for launch by the end of 2006, will be thefirst space mission with the capability to detect extrasolar planets with sizesdown to a couple of Earth radii (Rouan et al. 2000). It will observe a total of60,000 stars in 5 runs of 150 days each. In a previous paper, we worked out atheoretical estimate of the planet detection capability of COROT (Borde et al.2003). In this paper, we first describe our transit detection algorithm, dubbedTransdet, then we report its performance on simulated light curves (LCs), andfinally, we discuss the tools we are developing to discriminate between genuineexoplanets and astrophysical false positives (binary stars).

1Michelson Postdoctoral Fellow. Current address: Michelson Science Center, Caltech, Pasadena,CA 91125, USA.

1

2 Borde et al.

2. Key ideas about Transdet

At low signal-to-noise (SNR), a transit signal can be approximated by a rectan-gular signal oscillating between a high level H (outside the transit) and a lowlevel L (during the transit). The amplitude of this signal depends on the ratioof the planet surface to that of the star: H !L = !H, where ! " (Rp/R!)2. Thetime spent at level L, or transit duration " , is much shorter than the transitperiod T . We note t0 the date of the beginning of the first transit falling insidethe observation window (150 days for COROT).

In order to detect the transit signal in a noisy LC, we adopt the matched-filter approach that consists in correlating the LC with rectangular signals ofthe type described above. We call test-signals these rectangular signals. Each ofthem is characterized by a parameter triplet (T , " , t0). The correlation reaches apeak when (T , " , t0) = (T, ", t0). Practically, LCs must be first high-pass filteredto remove irrelevant low-frequency variations (stellar fluctuations) that wouldbias the correlation, and we take the negative of the LCs so that transits wouldproduce positive correlation peaks.

For, this method to work, the three-dimensional space (T , " , t0) must beexplored with a su!cient resolution in the given range accessible to COROT(1 # T # 75 days, 1 # " # 10 hrs, 0 # t0 # T ). The choice of this resolu-tion results from a trade-o" between computing time and detection e!ciency.Typically, we use steps (#T , #" , #t0) small enough so that the maximum corre-lation signal would be no less than 75 % of its theoretical maximum value. Thisrequirement translates into the computation of 14 $ 106 correlations per LC, atask completed in 90 s with a fully IDL-coded algorithm on a Pentium M at1.6 Ghz. However, preliminary tests indicate that an optimized IDL–C hybridcode could cut down the computing time by a factor 7–8, which would makepossible a higher detection e!ciency.

3. Detection threshold

3.1. Theoretical detection threshold

Practically, we use as our primary detection metric not the correlation itself,but the SNR on the correlation defined as

SNR1 " (k")1/2 !H/$, (1)

where $ is the standard deviation of the noise a"ecting the high-pass filteredLC, and k is the number of transits in the observation window.

A theoretical detection threshold can be simply set assuming: (1) a totalnumber of LCs, (2) a false detection rate, and (3) Gaussian white noise (seeBorde 2003, p. 45). Requiring less than 1 false detection, we obtain SNR1 > 6.5for 1,000 LCs, and SNR1 > 7.0 for 60,000 LCs.

3.2. Empirical detection threshold

In order to train the co-investigators of the COROT exoplanet consortium andcompare the performances of existing transit detection algorithms, a first tran-sit detection blind test (BT1) was organized inside the consortium in 2004

Page 9: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Plasma  Propaga+on  Effects  •   Radio  propaga+on  phenomena  (many)  •   Dedispersion  methods  •   Kolmogorov-­‐like  microstructure  in  the  ionized  ISM    •   Astronomical  impacts:  

•  Pulsar  surveys  •  Precision  pulsar  +ming  to  detect  gravita+onal  waves  •  Pulsars  orbi+ng  Sgr  A*  (GR  tests,  dark  maJer)  •  ScaJering  in  the  intergalac+c  medium  

Page 10: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Matched Filtering and Deconvolution of the Interstellar Medium

•  Dispersion in the ISM – Frequency dependent index of refraction à

change in the phase of EM waves – δ(phase) à time delay – The phase itself is frequency dependent so the

time delay is too

•  Dispersion acts as a linear filter •  It can be removed exactly once a single

parameter is known (the ‘dispersion measure’ DM)

Page 11: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Nanoshots  from  the  Crab  Pulsar  

Requires  coherent  dedispersion  

Teff ~ 1042 K

Page 12: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Image courtesy of NRAO/AUI and Joeri van Leeuwen (UC Berkeley) / ESO / AURA

~ 6000 years

pulse is actually spread out over time (ms-sec) by interstellar dispersion

Page 13: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Dispersion  

High frequencies arrive earlier than low frequencies DM is unique to each pulsar DM changes with time due to motion of the pulsar and turbulence in the interstellar medium

tDM(ν) = 4.15×DM msν2

GHz

Page 14: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Star+ng  Point:  Index  of  Refrac+on  •  Dispersion  rela+on  in  free  space:  

•  ω  =  kc  •  phase  velocity  =  group  velocity  =  ω/k  =  c    

•  Cold  plasma  (collisions  not  important):  –  No  magne+c  field:  

•  phase  velocity  =  c/nr(ω)  >  c      (speed  of  phase  fronts)  •  group  velocity  =  dω/dk  =  cnr(ω)  <  c  (speed  of  a  wave  packet)  

– With  magne+c  field:  birefringent  (leW,  right)  

–  cyclotron  frequency  

ω = kc/nr(ω) nr(ω) =�1− ω2

p/ω2�1/2

ωp = plasma frequency = 4πnee2

m

nl,r ≈ 1− ω2p/2ω

2 ∓ ω2pωB�/2ω

3

ωB� = eB cos θmec

2

Page 15: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Propaga+on  through  the  interstellar  plasma  

birefringence

Page 16: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Dedispersion  Two  methods:  

Coherent:    •  operates  on  the  voltage  propor+onal  to  the  electric  field  accepted  by  the  antenna,  feed  and  receiver  

•  computa+onally  intensive  because  it  requires  sampling  at  the  rate  of  the  total  bandwidth  

•  “exact”  

Post-­‐detec+on:    •  operates  on  intensity  =  |voltage|2  •  computa+onally  less  demanding  •  an  approxima+on  

Page 17: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Basic  data  unit  =  a  dynamic  spectrum  

time

Fr

eque

ncy

106 – 108 samples x 64 µs

64 to

202

4 ch

anne

ls

Fast-dump spectrometers:

•  Analog filter banks

•  Correlators

•  FFT (hardware)

•  FFT (software)

•  Polyphase filter banks

P

Page 18: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Post-­‐detec+on  Dedispersion:  Sum  intensity  over  frequency  aWer  correc+ng  for  dispersion  delay  

A consequence of the uncertainty principle for Fourier transforms: ΔνΔt ~ 1

Page 19: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Dispersed Pulse Coherently dedispersed pulse

Δt = 8.3 µs DM ν-3 Δν

Page 20: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Temporal  Coherence  •  Temporal  coherence:  describes  the  rela+onship  of  different  Fourier  components  that  make  up  a  signal  vs.  +me:  

Coherent:  a  delta  func+on!   Incoherent:  white  noise  

Page 21: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Coherent  Dedispersion  pioneered  by  Tim  Hankins  (1971)  

Dispersion  delays  in  the  +me  domain  represent  a  phase  perturba+on  of  the  electric  field  in  the  Fourier  domain:    

Coherent  dedispersion  involves  mul+plica+on  of  Fourier  amplitudes  by  the  inverse  func+on,    For  the  non-­‐uniform  ISM,  we  have    which  is  known  to  high  precision  for  known  pulsars.  The  algorithm  consists  of    Applica+on  requires  very  fast  sampling  to  achieve  usable  bandwidths.  

An example of matched filtering

One parameter: DM

Page 22: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Micropulses  coherently  dedispersed    (Hankins  1971)  

Page 23: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Backend  stage  of  mixing:  quadrature  baseband  mixing  

Real part

Imaginary part

�(t) = I(t) + iQ(t) = complex baseband signal

voltage at intermediate frequency in heterodyned system:v(t) = Re

��(t)e2πif0t

�= real

Page 24: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Coherent  Dedispersion  pioneered  by  Tim  Hankins  (1971)  

Coherent  dedispersion  works  by  explicit  deconvolu+on:  

Comments and Caveats:

•  Software implementation with FFTs to accomplish deconvolution (Hankins 1971)

•  Hardware implementations: real-time FIR filters (e.g. Backer et al. 1990s-present)

•  Resulting time resolution: 1 / (total bandwidth)

•  Requires sampling at Nyquist rate of 2 samples × bandwidth

⇒ Computationally demanding

•  Actual time resolution often determined by interstellar scattering (multipath)

•  Most useful for low-DM pulsars and/or high-frequency observations

Page 25: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Matched Filtering Methods •  What is it generically? •  Formal derivation (two domains) •  Examples •  Issues related to sampling and

interpolation

Page 26: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Matched Filtering in Astronomy •  Point source detection in surveys •  Asteroid detection •  Gravitational wave detection •  Galaxy detection •  Galaxy cluster finding •  Match filtering approach for signal acquisition in radio-

pulsar navigation •  Large scale structure in the universe •  Cluster detection in databases •  Radio images •  Precision localization (time, frequency, space)

•  Pulsar timing •  Radial velocities and exoplanets •  Astrometry

Page 27: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

A6523

Localization in Space, Time, or FrequencyMatched FilteringMatched filtering is an optimal method for detecting a signal of known shape in the presenceof additive noise. It also plays a role in estimation of signal parameters.We will use a time-domain signal as a prototype. Consider the model where A is deterministicand known and n is additive, zero-mean WSS noise with arbitrary spectrum:

x(t) = a0A(t) + n(t).

We want a filter that whose output maximizes the signal-to-noise ratio of the output.Matched filtering is different from Wiener filtering, which yields an estimate for a signal thathas a minimum least-squares error from the true signal.Signals in other domains can be treated identically:

Frequency domain: x(ν) = a0A(ν) + n(ν).Image domain: I(θ) = a0A(θ) + n(θ).

Spatial domain: I(x) = a0A(x) + n(x).Arbitrary domain: U(y) = a0A(y) + n(y) where (e.g.) y) = (x, ν, t).

1

Page 28: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Comments and variations on the theme:

1. In simple cases, all we need to know is the time-domain shape (e.g. a pulse shape) to findthe optimal filter for estimating arrival times; the spectral shape for Doppler velocities; theimage shape to do astrometry; the object shape to localize spatially.

2. A(t) could be stochastic but if we know its shape (somehow), the matched filter solution isthe same as for the deterministic case.A more complicated situation might be where we use one kind of tracer to detect sourcesand use the results as a matched filter to detect the sources in another tracer.Example: Use locations of galaxies from a continuum survey as a matched filter for detect-ing a spectral line in the same galaxies by stacking.

3. Any departure of actual data from the assumed model will have consequences:

(a) the filter is no longer optimal(b) detection probabilities are likely to be smaller(c) estimated parameters will have larger errors (random and systematic)

4. Types of departures include:

(a) A(t) not known.(b) a0 not constant, e.g. a0(t).(c) a0 could itself be stochastic due to multiplicative effects such as atmospheric seeing,

plasmas, gain variations in instrumentation.

2

Page 29: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Matched filter:We want the linear filter h(t) that maximizes the SNR of the output and thus maximizes thedetection probability. Usually we would convolve the filter with the data but it is convenient todefine h so that we cross correlate it with the data instead. [The equivalent filter that would beconvolved with the data is h(−t).]Assume initially that h(t) is aligned with the signal A(t) so that we only need consider the SNRof the output when h and A are aligned to give the maximum correlation. More generally wewould have a signal

x(t) = a0A(t− t0) + n(t)

where the event time t0 is not known and we want to estimate it.Thus while we would generally consider the full correlation function,

y(τ ) = h(t) ∗ x(t) =�

dt x(t)h(t + τ ) (1)

(where ‘*’ means correlation), here we initially consider the special time τ = 0 where, byconstruction, we say that y is maximized if t0 = 0.In the following we use the noise statistics

�n(t)� = 0 �n(t)n(t �� = Rn(t− t �).

3

Page 30: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

The mean and variance of the output correlation y are

�y� = a0

�dtA(t)h(t)

σ2y =

��dtdt� �n(t)n(t�)�h(t)h(t�) =

��dtdt�Rn(t− t�)h(t)h(t�).

Define the SNR asS =

�y�σy

=a0

�dtA(t)h(t)

σy.

We want the smoothing function h(t) that maximizes S (or S2). We can write the variationalproblem as the change in S2 due to a change in the filter function, δh

δ(S2) = 0 = δ

��y�2

σy2

�=

σy2δ(�y�2)− �y�2δ(σy2)σy4

. (2)

This implies

2σy2δ(�y�)− �y�2δ(σy2) = 2σy

2�y�a0�

dtA(t)δh(t)−��

dtdt�Rn(t− t�)δ [h(t)h(t�)]

= 2σy2�y�a0

�dtA(t)δh(t)− 2

��dtdt�Rn(t− t�)δh(t)

= 0

4

Page 31: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Collecting terms,�

dt δh(t)

�σ2ya0A(t)− �y�

�dt�Rn(t− t�)h(t�)

�= 0.

If the solution h(t) yields a maximum SNR then the integrand factor that multiplies δh(t) iszero for any δh(t), so

�dt�Rn(t− t�)h(t�) =

a0σ2y

�y� A(t). (3)

To use this equation we have to know (or assume) the shape of the autocorrelation function forthe noise as well as the signal shape A(t).

5

Page 32: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

White Noise:This case, the simplest, has Rn(t − t�) = σ2

nWnδ(t − t�) where Wn has time units to keep theoverall units correct. This gives

�y� = a0

�dtA(t)h(t)

σy2 = σ2

nWn

�dt h2(t)

and the maximum S/N is given by the expression,

h(t) = A(t)

�dt h2(t)�

dtA(t)h(t).

While appearing to be an implicit solution for h(t), the integral quantities are just a proportion-ality constant.A particular solution is h(t) = A(t), as is anything proportional to A(t). It is now obvious whythis approach is called matched filtering.Interpretation:

• The optimal smoothing function is the shape of the signal when the noise is white.

• When the signal has an unknown time shift t0, the entire cross-correlation function is cal-culated and its maximum found and assessed as statistically significant or not.

6

Page 33: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Demonstration that the MF solution gives the best signal to noise ratio: The S/N for anarbitrary filter is

S =

�a0σn

� �dtA(t)h(t)

���dtdt� ρn(t− t�)h(t)h(t�)

�1/2 .

Let S0 be the S/N for the MF solution.Let the filter be h(t) = A(t)+ b(t) where b(t) is an arbitrary function but we will consider b andA to be orthogonal functions, i.e.

�dtA(t)b(t) = 0. (If they weren’t to begin with we could

use the Gram-Schmidt orthogonalization procedure to define them as such.)1

Evaluate S and calculate the ratio S/S0, which is generally

S

S0=

�dtA(A + b)

��dtA2

�dt (A + b)2

�1/2

→orthogonal

�dtA2

��dtA2

�dt (A2 + b2)

�1/2

=

� �dtA2

�dt (A2 + b2)

�1/2

≤ 1

1For two functions u, v, we can define two orthogonal functions as p = u and q = v − u(u, v)/(u, u) =v − u

�dt uv/

�dt u2.

7

Page 34: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Low-pass Noise: Non-white noise with a short correlation time (relative to the width of thesignal) can be treated in the same way.Let Rn(t− t�) = σ2

nρn(t− t�) where ρn(0) = 1 and its width � width of A.Returning to our original expression for S, we have

S =

�a0σn

� �dtA(t)h(t)

���dtdt� ρn(t− t�)h(t)h(t�)

�1/2

≈�a0σn

� �dtA(t)h(t)

��dτ ρn(τ )

�dt h2(t)

�1/2

≈�a0σn

� �dtA(t)h(t)

�Wn

�dt h2(t)

�1/2 ,

where the characteristic time scale of the noise is

Wn =

�dτ ρn(τ ).

We can now apply the Cauchy-Schwarz inequality for the numerator��

dtA(t)h(t)

�2≤

�dtA2(t)

�dt h2(t)

8

Page 35: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

which means that

S =

�a0σn

� �dtA(t)h(t)

�Wn

�dt h2(t)]

�1/2

≤�a0σn

� ��dtA2(t)

�dt h2(t)

�1/2�Wn

�dt h2(t)

�1/2

≤�a0σn

� ��dtA2(t)

�1/2√Wn

We have equality when h(t) = A(t), i.e. when the filter matches the signal.

——————————————————————————————————————The Cauchy-Schwartz inequality generalizes the statement that the sum of two vectors has a length that is less thanthe sum of the magnitudes of the two vectors:

|x+ y|2 ≤ (x+ y)2

x2 + y2 + 2x · y ≤ x2 + y2 + 2x y

x · y ≤ x y

This can be written as

j

xjyj ≤��

j

x2j�

j

y2j

�1/2

or in integral form�

dt x(t) y(t) ≤��

dt x2(t)

�dt y2(t)

�1/2

9

Page 36: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Simplification and InterpretationFor the matched filter case (equality) we can consider A(t) to be dimensionless with A(0) = 1.Then the integral �

dtA2(t) ≡ WA

defines the characteristic time scale WA. We can also define the signal to noise of the timeseries as the signal maximum a0 divided by the rms noise σn,

St ≡�a0σn

and then rewrite the SNR of the correlation function as

S = St

�WA

Wn

�1/2

= St�

Neff.

• As we have seen in other contexts, the ratio WA/Wn represents the number of independentnoise fluctuations that have been averaged to get the correlation amplitude. We then seethat the correlation function enhances the SNR in the time series by a factor

√Neff where

Neff is the effective number of independent fluctuations.

• Note that this simple result holds for WSS noise with a correlation function that is narrowcompared to the length of a time series.

• For red noise with a steep power-law spectrum, longer data sets have larger variance be-cause more of the power spectrum is sampled. For these processes, Neff ∼ 1.

10

Page 37: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Arbitrary WSS Noise: Generally, the solution is gotten by Fourier transforming Equation 3.Let the noise spectrum be Rn(τ ) ⇐⇒ Sn(f ) and define the FTs of A and h as A and h. Using3 (repeated here),

�dt�Rn(t− t�)h(t�) =

a0σ2y

�y� A(t),

we have

FT

��dt�Rn(t− t�)h(t�)

�= Sn(f )h(f ),

where

σ2y =

��dtdt�Rn(t− t�)h(t)h(t�) =

�df Sn(f )|h(f )|2

and

�y� = a0

�dtA(t)h(t) = a0

�df A(f )h∗(f ).

so the expression for h becomes

h(f ) =A(f )

Sn(f )

a0σ2y

�y� =A(f )

Sn(f )

�df Sn(f )|h(f )|2�df A(f )h∗(f )

.

and the solution is

h(f ) = constant × A(f )

Sn(f ).

11

Page 38: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Comments on the general matched filtering solution• For white noise, Sn(f ) = constant and we get our previous result. More generally, the

matched filter favors frequencies where the ratio of signal to noise is larger.

• Equivalently the filter reduces the noise in the output y at frequencies where the noise

spectrum is large and increases it where it is small.

• The amplitude of the signal a0 does not appear in the solution. It can be considered to be

part of the proportionality constant (normalization).

Also the actual noise variance cancels in the solution for h(f ).

Thus the amplitudes of both the signal and the noise factor out of the solution.

• The matched filter h(f ) has an inverse Fourier transform h(t) that is the optimal smoothingfunction. It maximizes the S/N of the output.

– The OSF depends on both the signal and the noise.

– For red noise with a steep power spectrum, the low frequencies are de-weighted in the

filtering compared to the high frequencies.

– Usage of Fourier methods is still susceptible to leakage effects so prewhitening meth-

ods need to be used.

12

Page 39: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Practical Applications in Signal Detection:Example: we know the shape of the function A(t) and the autocorrelation function of the noise

(or its spectrum). The optimal detection scheme is to construct the filter using this information

and investigate the cross correlation function Eq. 1.

Output amplitudes can be tested against a threshold yT that is some multiple of σy. Over an

ensemble one can then define the detection probability Pd and the false-alarm probability Pfa.

By changing the threshold both Pd and Pfa will change. Ideally one would like Pd = 1 with Pfa

but in reality there is a tradeoff between the two.

Periodic Signals: The matched filter for a periodic signal is simply a periodic train of the pulse

shape. When correlated with the measurements, the output is the same thing as “folding” the

data with the underlying period.

Spectra lines: Change t → λ or ν and the results are the same.

Source detection in images: Change A(t) → I(θ). This might be the PSF if point sources are

of interest or it might be a specific kind of source (e.g. spiral galaxies).

Multidimensional: A(t) → A(x) where x is N -dimensional.

13

Page 40: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Figure 1: Matched filtering of Gaussian pulse with itself as a template. One realization of the template and pulse are shown while tenrealizations of the CCF are shown. The SNR of the pulse is 5 (peak to rms noise). The pulse width (HW @ 1/e) is 25.3 samples.

14

Page 41: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Figure 2: Matched filtering of a narrower Gaussian pulse. One realization of the template and pulse are shown while ten realizations ofthe CCF are shown. The SNR of the pulse is 5 (peak to rms noise). The pulse width (HW @ 1/e) is 10.3 samples.

15

Page 42: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Figure 3: Matched filtering of a narrow Gaussian pulse with a broader template. One realization of the template and pulse are shownwhile ten realizations of the CCF are shown. The SNR of the pulse is 5 (peak to rms noise). The pulse width (HW @ 1/e) is 10.3 sampleswhile the template is 45.3 samples wide.

16

Page 43: Lecture 18 Matched filtering Localizationhosting.astro.cornell.edu/~cordes/A6523/A6523_2015_Lecture18.pdf · 3.1. Theoretical detection threshold Practically, we use as our primary

Figure 4: Matched filtering of Gaussian pulse with a narrower template. The SNR of the pulse is 5 (peak to rms noise). The pulse width

(HW @ 1/e) is 10.3 samples while the template is 4.3 samples wide.

17