Observational procedures and data reduction Lecture 4: Data reduction process

XVII Canary Islands Winter School of Astrophysics: ‘3D Spectroscopy’

Tenerife, Nov-Dec 2005

James E.H. Turner

Gemini Observatory

Data reduction process

Overview

● The last lecture gave an overview of the different data reduction stages

and what is involved at each step

● This lecture briefly discusses the reduction as a process:

– Error and data quality propagation

– File formats

– A couple of example reduction sequences


Error propagation

● At the end of the reduction, it’s important to have a good estimate of

the errors in data values

– For faint sources, we can estimate the statistical significance of detection

– Want a measure of the reliability of ages, metallicities etc. derived from

line strength indices (Cardiel et al., 1997) or the intrinsic random errors

in velocity measurement

– etc.

● The raw data have quite well-defined errors due to photon statistics

and read noise

● After numerous processing stages, it is difficult at best (impossible at

worst) to estimate errors directly from the data values


Error propagation

● Solution

– Keep track of the errors in data values throughout the processing

– For each detector pixel, store an error value in a separate error image,

alongside the main science data array

– During each processing step, process the error array in parallel with the

science image, to reflect how the errors have changed

● For example, when adding two science images, add the corresponding error

images in quadrature


Error propagation

● Poisson statistics

– A process where discrete values vary statistically around a well-defined

mean, eg. counting photons, is described by a Poisson distribution:

$P(n) = \frac{\mu^n e^{-\mu}}{n!}$

with a mean (expected number of photons) of $\langle n \rangle = \mu$

– The standard deviation from the mean is simply $\sigma = \sqrt{\mu}$

● So when counting photons (really electrons), the statistical error is the

square root of the expected number of photons (electrons)

● In practice, estimate the error as the square root of the measured number of

electrons, since that is what we know

– For large $\mu$, the Poisson distribution approaches a Gaussian distribution with $\sigma = \sqrt{\mu}$
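
As a quick numerical check of the Poisson properties above, the following minimal sketch sums the distribution directly and recovers its mean and standard deviation. The function names and the choice of an expected count of 25 are illustrative assumptions, not part of the lecture.

```python
import math

def poisson_pmf(n, mu):
    """Probability of counting exactly n events when the expected number is mu."""
    # Work in log space so that large n does not overflow the factorial.
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))

def poisson_moments(mu, n_max=200):
    """Sum the distribution up to n_max to recover its mean and standard deviation."""
    probs = [poisson_pmf(n, mu) for n in range(n_max)]
    mean = sum(n * p for n, p in enumerate(probs))
    var = sum((n - mean) ** 2 * p for n, p in enumerate(probs))
    return mean, math.sqrt(var)

# Hypothetical example: 25 electrons expected in a pixel.
mean, sigma = poisson_moments(mu=25.0)
print(mean, sigma)  # sigma should come out close to sqrt(25)
```

The truncation at n_max is only safe when n_max is far above the expected count, as it is here.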


Error propagation

● Random sources of measurement error (noise) in the data

– Detector read noise

– Poisson noise from the science target and sky

– Poisson noise from detector dark current

● Also have systematic errors introduced during processing

– Eg. due to inaccuracies in flat fielding

– Usually present at the level of a few percent; difficult to reduce to zero

– These effects can be more difficult to account for, but typically the

statistical errors are dominant

● If we get enough signal-to-noise with an IFU to worry about errors of a few

percent, we’re usually going to be pretty happy!

(Slide annotation: "From the pixel values")


Error propagation

● Detectors don’t usually report exactly 1 count per stored electron

– Poisson statistics apply to electrons, rather than detector counts (ADUs)

– The detector ‘gain’ equals the number of electrons per measured count

● Really an inverse gain, but that’s what it’s called!

● Controls how much light can be measured before saturating

● Typical gains are a few e-/ADU (CCDs), up to >10 e-/ADU (NIR)

– To estimate Poisson noise in electrons, multiply the counts by the gain

and take the square root

● When adding values, their errors add in quadrature (sum of squares)

– Therefore when propagating errors, we use the error array to store

variance ($\sigma^2$) values, rather than the actual noise ($\sigma$)
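
The two rules on this slide can be sketched in a few lines: Poisson noise is estimated in electrons after applying the gain, and independent errors combine in quadrature, which is the same as summing variances. The gain of 2 e-/ADU and the pixel values are made-up numbers chosen for illustration.

```python
import math

def poisson_noise_electrons(counts_adu, gain):
    """Poisson noise in electrons for a pixel value in ADU (gain in e-/ADU)."""
    return math.sqrt(counts_adu * gain)

def add_in_quadrature(*sigmas):
    """Combine independent errors: square root of the sum of squares."""
    return math.sqrt(sum(s ** 2 for s in sigmas))

gain = 2.0                                        # hypothetical gain, e-/ADU
noise_a = poisson_noise_electrons(5000.0, gain)   # pixel in first image
noise_b = poisson_noise_electrons(1250.0, gain)   # pixel in second image

# Storing variances makes the sum a plain addition; take the root only at the end.
variance_sum = noise_a ** 2 + noise_b ** 2
combined = add_in_quadrature(noise_a, noise_b)
```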


Error propagation

● Error propagation procedure

– Start by creating a variance array containing the square of the detector

read noise, $\sigma_{\mathrm{read}}^2$, which affects every pixel independently of the counts

● Read noise is counted in electrons, so if we are storing science data values

as detector counts, the variance should be $(\sigma_{\mathrm{read}}/\mathrm{gain})^2$

● Alternatively, multiply the science array through by the gain to begin with

– Estimate the statistical variance in the measured counts, n, for each pixel

and add to the array of read noise values

● If working in electrons the statistical variance to add is just $n \times \mathrm{gain}$

● In ADUs, the number is $n / \mathrm{gain}$
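
A minimal sketch of this starting step follows, with nested Python lists standing in for image arrays. The function names, the read noise of 4 e- and the gain of 2 e-/ADU are assumptions for illustration. Clipping negative counts to zero before estimating the Poisson term is also an assumption, made so that the variance never drops below the read noise.

```python
def initial_variance_adu(counts, read_noise_e, gain):
    """Variance in ADU^2 per pixel: (read/gain)^2 + n/gain, with n in ADU."""
    read_var = (read_noise_e / gain) ** 2
    # Assumption: negative counts (e.g. after bias subtraction) contribute no Poisson term.
    return [[read_var + max(n, 0.0) / gain for n in row] for row in counts]

def initial_variance_electrons(counts, read_noise_e, gain):
    """Variance in electrons^2 per pixel: read^2 + n*gain, with n in ADU."""
    read_var = read_noise_e ** 2
    return [[read_var + max(n, 0.0) * gain for n in row] for row in counts]

counts = [[100.0, 400.0], [0.0, 2500.0]]   # hypothetical 2x2 image in ADU
var_adu = initial_variance_adu(counts, read_noise_e=4.0, gain=2.0)
var_e = initial_variance_electrons(counts, read_noise_e=4.0, gain=2.0)
```

The two results differ by exactly a factor of gain squared, which is a useful consistency check when switching units.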


Error propagation

– At each subsequent reduction step, manipulate the variance array

according to the operation being performed on the science array:

● When adding or subtracting images, their errors add in quadrature

– Simply add the variance arrays for each image

● When scaling an image (multiplying or dividing by a number), the error is

scaled accordingly

– Multiply the variance by the square of the scaling factor

● When multiplying/dividing images, their fractional errors add in quadrature

– Divide each input variance array by the square of the corresponding

image, add the results together and multiply by the square of the final

science image to get the final variance image
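
The three rules above can be sketched as follows, using flat Python lists in place of image arrays. The function names are hypothetical, and the division helper assumes no pixel in either input is exactly zero.

```python
def add_images(sci_a, var_a, sci_b, var_b):
    """Add two images; the variances simply add (the same holds for subtraction)."""
    sci = [a + b for a, b in zip(sci_a, sci_b)]
    var = [va + vb for va, vb in zip(var_a, var_b)]
    return sci, var

def scale_image(sci, var, factor):
    """Multiply an image by a constant; the variance scales by the factor squared."""
    return [s * factor for s in sci], [v * factor ** 2 for v in var]

def divide_images(sci_a, var_a, sci_b, var_b):
    """Divide image a by image b; fractional variances add (assumes nonzero pixels)."""
    sci = [a / b for a, b in zip(sci_a, sci_b)]
    var = [(va / a ** 2 + vb / b ** 2) * q ** 2
           for a, va, b, vb, q in zip(sci_a, var_a, sci_b, var_b, sci)]
    return sci, var

# Hypothetical pixel values and variances.
sci_a, var_a = [100.0, 200.0], [100.0, 200.0]
sci_b, var_b = [50.0, 100.0], [25.0, 100.0]

sum_sci, sum_var = add_images(sci_a, var_a, sci_b, var_b)
dbl_sci, dbl_var = scale_image(sci_a, var_a, 2.0)
div_sci, div_var = divide_images(sci_a, var_a, sci_b, var_b)
```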


Error propagation

● For more complicated operations on the science data, ie. some arbitrary

function, f(n)

– Take the first derivative of f(n), to estimate how the output values vary

with small changes in the input values

– Multiply the variance by $|df/dn|^2$ at the appropriate value of n

– At the end of the data reduction process, can take the square root of the

variance array to get to the final noise values
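
For an arbitrary function, first-order propagation scales the variance by the square of the derivative. The sketch below estimates the derivative with a central difference, which is an illustrative choice (an analytic derivative is preferable when one is available). The helper name and the example of taking a natural logarithm of 400 electrons are assumptions.

```python
import math

def propagate_variance(f, n, var_n, step=1e-6):
    """First-order propagation: var_f ~ (df/dn)^2 * var_n at the given n."""
    h = step * max(abs(n), 1.0)
    dfdn = (f(n + h) - f(n - h)) / (2.0 * h)   # central-difference derivative
    return dfdn ** 2 * var_n

# Hypothetical pixel: 400 electrons with Poisson variance 400.
var_log = propagate_variance(math.log, 400.0, 400.0)
noise_log = math.sqrt(var_log)   # final noise value, taken only at the end
```

For the logarithm the derivative is 1/n, so the propagated noise equals the fractional error of the input.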

● Resampling

– In the raw data, each pixel has an independent statistical error

– If resampling causes smoothing, the errors in different pixels may

become correlated


Error propagation

– One could attempt to propagate a separate covariance matrix

Covariance = expected value of the product of deviations from the means

– Usually software doesn’t track covariance, but it’s important to be aware

that the variance numbers may not be exactly correct after resampling

● Eg. linear interpolation at the midpoint between 2 samples is an average

– The error on the result is therefore reduced by a factor of √2

– The number of pixels hasn’t changed, but each pixel has higher S/N!

– However, summing 2 of the resampled pixels does not reduce the error

by a further factor of √2 because the errors are no longer independent
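
The midpoint-interpolation example can be made concrete by writing each resampled pixel as a weighted sum of the original pixels. This is a minimal sketch assuming independent input pixels of equal variance; the helper names are hypothetical.

```python
def combination_variance(weights, sigma2=1.0):
    """Variance of sum(w_i * x_i) for independent inputs x_i of variance sigma2."""
    return sigma2 * sum(w * w for w in weights)

def covariance(weights_a, weights_b, sigma2=1.0):
    """Covariance of two weighted sums built from the same independent inputs."""
    return sigma2 * sum(a * b for a, b in zip(weights_a, weights_b))

# Three original pixels x0, x1, x2; two resampled pixels at the midpoints.
y0 = [0.5, 0.5, 0.0]   # average of x0 and x1
y1 = [0.0, 0.5, 0.5]   # average of x1 and x2

var_y0 = combination_variance(y0)    # each resampled pixel has half the variance
cov_y0_y1 = covariance(y0, y1)       # nonzero, because both share x1

# Summing the two resampled pixels: true variance versus the naive estimate
# obtained by adding the propagated variances as if they were independent.
sum_weights = [a + b for a, b in zip(y0, y1)]
var_sum_true = combination_variance(sum_weights)
var_sum_naive = var_y0 + combination_variance(y1)
```

The difference between the two estimates is twice the covariance, which is the term a variance-only array cannot carry.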


Data quality

● As well as storing variance values alongside each science image, it is

useful to store data quality information

– Use an integer valued array to flag which pixels are good, bad, noisy etc.

in the main science array

● Each bit of the integer represents yes/no for a particular defect, allowing

more than one problem to be recorded for a particular pixel

● Different pixel values indicate, for example:

– Good pixel

– Cosmic ray

– Saturated pixel

– Hot pixel (etc…)

● The convention for the values depends on the processing software

– Useful for masking out values appropriately at each reduction stage
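
A bit-flag data quality array might be handled as in the sketch below. The specific bit assignments are hypothetical, since the convention depends on the processing software, and plain lists stand in for image arrays.

```python
# Hypothetical bit assignments; real values depend on the reduction package.
GOOD = 0
COSMIC_RAY = 1 << 0
SATURATED = 1 << 1
HOT_PIXEL = 1 << 2

def combine_dq(dq_a, dq_b):
    """When two images are combined, a pixel inherits the defects of both (bitwise OR)."""
    return [a | b for a, b in zip(dq_a, dq_b)]

def good_mask(dq, ignore=0):
    """True where a pixel has no flags set, apart from any bits listed in `ignore`."""
    return [(flag & ~ignore) == 0 for flag in dq]

dq_a = [GOOD, COSMIC_RAY, GOOD, HOT_PIXEL]
dq_b = [GOOD, SATURATED, HOT_PIXEL, HOT_PIXEL]

dq = combine_dq(dq_a, dq_b)
usable = good_mask(dq)                              # strict: any flag rejects the pixel
usable_hot_ok = good_mask(dq, ignore=HOT_PIXEL)     # tolerate hot pixels at this stage
```

Using one bit per defect is what lets the second pixel record both a cosmic ray and saturation at once.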


File storage format

● Data are typically stored in FITS files

– Flexible Image Transport System, overseen by a NASA technical panel

● Standard definition document available at http://fits.gsfc.nasa.gov/

– Each single FITS file can contain

● One or more N-dimensional image arrays

● ASCII header information, using keyword = value pairs

– Header keywords can have values of different data types

– Eg. OBJECT = ‘NGC1068’ or EXPTIME = 120

● One or more binary tables

– Using named columns (eg. XCOORD, YCOORD) and mixed data

types, rather than a simple array of numbers

● Other, less common formats of data
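
To illustrate the keyword = value header layout, here is a deliberately simplified sketch that parses individual header cards. It relies on the fixed-format layout of the FITS standard (keyword in the first 8 characters, followed by "= "), but ignores details such as doubled quotes inside strings, continued strings and complex values. In practice a FITS library should be used instead. The function name and the comment text in the example cards are assumptions.

```python
def parse_card(card):
    """Split one FITS header card into (keyword, value); value is None if absent."""
    keyword = card[:8].strip()
    if card[8:10] != "= ":
        return keyword, None              # commentary card such as COMMENT or END
    field = card[10:].strip()
    if field.startswith("'"):
        end = field.index("'", 1)         # simplification: no embedded quotes
        return keyword, field[1:end].rstrip()
    value = field.split("/", 1)[0].strip()    # drop any trailing comment
    if value in ("T", "F"):
        return keyword, value == "T"      # logical value
    try:
        return keyword, int(value)
    except ValueError:
        return keyword, float(value)

cards = [
    "SIMPLE  =                    T / conforms to the standard",
    "OBJECT  = 'NGC1068 '           / target name",
    "EXPTIME =                  120 / exposure time",
    "AIRMASS =                1.234",
    "END",
]
header = dict(parse_card(card) for card in cards)
```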


File storage format

– Within a FITS file, data can be divided into separate extensions

● The primary header contains keywords relevant to the whole file

– Eg. object name, telescope pointing, airmass, filter, central wavelength

● Each image, binary table etc. has its own numbered/named extension

– Contains both the data and any extra header keywords that are only

relevant to that dataset

– Example FITS file structure during processing:

EXT#  EXTTYPE   EXTNAME  EXTVE  DIMENS   BITPI  INH  OBJECT
0     trnS20040409S0166_s                16          Galaxy
1     BINTABLE  MDF             32x21    8
2     IMAGE     SCI      1      32x1022  -32    F    Galaxy
3     IMAGE     VAR      1      32x1022  -32    F    Variance
4     IMAGE     DQ       1      32x1022  32     F    DQ
5     IMAGE     SCI      2      32x1022  -32    F    Galaxy
6     IMAGE     VAR      2      32x1022  -32    F    Variance
7     IMAGE     DQ       2      32x1022  32     F    DQ

[ … etc … ]


Reduced data formats

● Row-stacked spectra (etc.)

– One option is just to work with extracted spectra in 2D

– Limited to spectral analysis (eg. velocity measurement, not imaging)

– Still have to create a 2D spatial map from the results afterwards

● Datacube

– A 3D image array, with two spatial axes and one wavelength axis

– Easy to read and manipulate in IRAF, IDL, Python etc.

– Usually requires resampling the processed IFU data onto a 3D grid

● Except for IFUs that have a square lens grid to begin with

– If we want to oversample after interpolating, to produce ‘smoother’

images (good for visualization etc.), the file sizes can become quite large



● Euro3D format

– Both image data (1D spectra) and information describing the spectra are

stored in a binary table

– Native format for the ‘E3D’ visualization tool (can also read datacubes)

– Closer to the raw data than a cube—attempts to avoid resampling until it

is necessary, during visualization or analysis

– Minimal file size, like row-stacked spectra, since there is no interpolation

until it is needed

– Requires having special software/libraries to work with the format
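
The relationship between row-stacked spectra and a datacube can be sketched for the simplest case, an IFU whose elements already lie on a square grid, so that no resampling is needed. The function names, the axis ordering (wavelength as the slowest axis) and the integer grid positions are assumptions for illustration.

```python
def spectra_to_cube(spectra, positions, nx, ny, fill=0.0):
    """Place 1D spectra into cube[k][j][i]: wavelength k, spatial row j, column i."""
    nwave = len(spectra[0])
    cube = [[[fill] * nx for _ in range(ny)] for _ in range(nwave)]
    for spectrum, (i, j) in zip(spectra, positions):
        for k, value in enumerate(spectrum):
            cube[k][j][i] = value
    return cube

def collapse_image(cube):
    """Sum the cube over wavelength to make a 2D 'white light' image."""
    ny, nx = len(cube[0]), len(cube[0][0])
    return [[sum(plane[j][i] for plane in cube) for i in range(nx)]
            for j in range(ny)]

# Four hypothetical spectra of three wavelength samples each, on a 2x2 grid.
spectra = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]
positions = [(0, 0), (1, 0), (0, 1), (1, 1)]    # (x, y) index of each spectrum

cube = spectra_to_cube(spectra, positions, nx=2, ny=2)
image = collapse_image(cube)
```

For fibre or lenslet arrays on a hexagonal or irregular grid, the direct placement step would have to be replaced by interpolation onto the cube grid, with the error-correlation caveats discussed earlier.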




Example reduction sequence—optical

● GMOS IFU (optical fibre) data, using the Gemini IRAF package






Example reduction sequence—infrared

● GNIRS IFU (image slicer) data, using the Gemini IRAF package



Summary

● The data reduction process is typically based on FITS files, with one

or more image extensions

● Propagating error and data quality arrays through the process is helpful

for understanding how accurate the results are

● The final data format for analysis depends on the application,

software, user preference etc.

– Euro3D format, datacubes or in some cases just row-stacked spectra

● The example reduction sequences for optical fibre data and NIR image

slicer data give an idea of how the steps are ordered for science data
