13
Data Handling Methods

Data Handling Methods - University of BirminghamData Handling Methods A Peak picking Fluorescence indices Regional Integration Data reduction Key considerations: speed (on-line data

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Data Handling Methods - University of BirminghamData Handling Methods A Peak picking Fluorescence indices Regional Integration Data reduction Key considerations: speed (on-line data

Data Handling Methods

Page 2: Data Handling Methods - University of BirminghamData Handling Methods A Peak picking Fluorescence indices Regional Integration Data reduction Key considerations: speed (on-line data

Data Handling Methods A

Peak picking

Fluorescence indices

Regional Integration

Data reduction

Key considerations:

speed (on-line data collection vs manual post processing).

Simplicity (simple linear regression against water quality vs detailed post processing to obtain structural information)

Initial dataset characteristics (EEM vs line scans).

300 350 400 450 5000

200

400

600

800

1000

Wavelength (nm)

Inte

nsity (

a.u

.)

Wa

ve

le

ng

th

(

nm

)

W a v e l e n g t h ( n m )

2 2 5 . 0 0

2 5 0 . 0 0

2 7 5 . 0 0

3 0 0 . 0 0

3 2 5 . 0 0

3 5 0 . 0 0

3 7 5 . 0 0

4 0 0 . 0 0

3 0 0 . 0 0 3 2 5 . 0 0 3 5 0 . 0 0 3 7 5 . 0 0 4 0 0 . 0 0 4 2 5 . 0 0 4 5 0 . 0 0 4 7 5 . 0 0 5 0 0 . 0 0

9 6 3 . 0 9

8 8 9 . 2 7

8 1 5 . 4 5

7 4 1 . 6 3

6 6 7 . 8 2

5 9 4 . 0 0

5 2 0 . 1 8

4 4 6 . 3 6

3 7 2 . 5 4

2 9 8 . 7 2

2 2 4 . 9 0

1 5 1 . 0 8

7 7 . 2 7

3 . 4 5

Page 3: Data Handling Methods - University of BirminghamData Handling Methods A Peak picking Fluorescence indices Regional Integration Data reduction Key considerations: speed (on-line data

Peak Picking

Simplest data extraction method: finding the maximum intensity (z) in a given region (x,y).

Intensity vs concentration relationships can be developed by linear regression.

Wavelength variations may also provide information about organic matter character.

300 350 400 450 5000

200

400

600

800

1000

Wavelength (nm)In

tensity (

a.u

.)

Wa

ve

le

ng

th

(

nm

)

W a v e l e n g t h ( n m )

2 2 5 . 0 0

2 5 0 . 0 0

2 7 5 . 0 0

3 0 0 . 0 0

3 2 5 . 0 0

3 5 0 . 0 0

3 7 5 . 0 0

4 0 0 . 0 0

3 0 0 . 0 0 3 2 5 . 0 0 3 5 0 . 0 0 3 7 5 . 0 0 4 0 0 . 0 0 4 2 5 . 0 0 4 5 0 . 0 0 4 7 5 . 0 0 5 0 0 . 0 0

9 6 3 . 0 9

8 8 9 . 2 7

8 1 5 . 4 5

7 4 1 . 6 3

6 6 7 . 8 2

5 9 4 . 0 0

5 2 0 . 1 8

4 4 6 . 3 6

3 7 2 . 5 4

2 9 8 . 7 2

2 2 4 . 9 0

1 5 1 . 0 8

7 7 . 2 7

3 . 4 5

Page 4: Data Handling Methods - University of BirminghamData Handling Methods A Peak picking Fluorescence indices Regional Integration Data reduction Key considerations: speed (on-line data

Bieroza et al.2009, STOTEN

Hudson et al.,2008, STOTEN

Baker et al., 2008,Chemosphere

Page 5: Data Handling Methods - University of BirminghamData Handling Methods A Peak picking Fluorescence indices Regional Integration Data reduction Key considerations: speed (on-line data

Fluorescence Indices

Typically ratios of intensity at two x,y wavelength pairs.

They attempt to characterise the organic matter present. For example:

McKnight et al. (2001) FI: attempts to characterise microbial vs terrestrial organic matter.

Lower plot shows relationship with land cover (Wilson and Xenopolous, 2009).

Page 6: Data Handling Methods - University of BirminghamData Handling Methods A Peak picking Fluorescence indices Regional Integration Data reduction Key considerations: speed (on-line data

Fluorescence Indices

Baker et al (2008): peak T/peak C intensity shown to correlate with organic matter function (e.g. hydrophobicity, aluminium adsoption and benzopyrenebinding).

Page 7: Data Handling Methods - University of BirminghamData Handling Methods A Peak picking Fluorescence indices Regional Integration Data reduction Key considerations: speed (on-line data

Fluorescence Indices

Parlanti et al (2000) developed a index.

Ratio of ‘peak C’ fluorescence intensity at 380 nm to that between 420 and 435 nm.

For example, applied by Wilson and Xenopoulos (2009) to relate fluorescence to water quality parameters.

Page 8: Data Handling Methods - University of BirminghamData Handling Methods A Peak picking Fluorescence indices Regional Integration Data reduction Key considerations: speed (on-line data

Fluorescence indices

Advantages:

Simple to use and understand

Quantifiable

Can be used to design in-situ fixed wavelength instruments

Disadvantages:

Choice of wavelength pairs often arbitrary

Wastes significant information

Extent to which indices are transferable is unclear

Page 9: Data Handling Methods - University of BirminghamData Handling Methods A Peak picking Fluorescence indices Regional Integration Data reduction Key considerations: speed (on-line data

Regional Integration

Chen et al (2003) first proposed a simple integration of fluorescence within specified regions.

Advantages: uses whole EEM.

Disadvantages: choice and definitions of regions may not be transferrable, wavelength variability not considered.

Page 10: Data Handling Methods - University of BirminghamData Handling Methods A Peak picking Fluorescence indices Regional Integration Data reduction Key considerations: speed (on-line data

Data Reduction

300 350 400 450 5000

200

400

600

800

1000

Wavelength (nm)

Inte

nsity (

a.u

.)

Wa

ve

le

ng

th

(

nm

)

W a v e l e n g t h ( n m )

2 2 5 . 0 0

2 5 0 . 0 0

2 7 5 . 0 0

3 0 0 . 0 0

3 2 5 . 0 0

3 5 0 . 0 0

3 7 5 . 0 0

4 0 0 . 0 0

3 0 0 . 0 0 3 2 5 . 0 0 3 5 0 . 0 0 3 7 5 . 0 0 4 0 0 . 0 0 4 2 5 . 0 0 4 5 0 . 0 0 4 7 5 . 0 0 5 0 0 . 0 0

9 6 3 . 0 9

8 8 9 . 2 7

8 1 5 . 4 5

7 4 1 . 6 3

6 6 7 . 8 2

5 9 4 . 0 0

5 2 0 . 1 8

4 4 6 . 3 6

3 7 2 . 5 4

2 9 8 . 7 2

2 2 4 . 9 0

1 5 1 . 0 8

7 7 . 2 7

3 . 4 5

Previously detailed techniques are all trying to reduce the data volume to something manageable, but they are doing so in a relatively arbitrary way.

Statistical / operational techniques are available that would reduce the data to manageable components. Principal Components Analysis

Principal Filter AnalysisArtificial Neural networks / Expert Systems

Page 11: Data Handling Methods - University of BirminghamData Handling Methods A Peak picking Fluorescence indices Regional Integration Data reduction Key considerations: speed (on-line data

PCA

A method to simplify a data set by means of linear transformations that chose a new coordinate system in which the first axis (first principal component, PC1) describes the greatest variance within the dataset.

Example: Spencer et al (2007). PC1 correlates strongly positively with peaks C and A intensity and emission wavelengths, absorbance and DOC and strongly negatively with salinity and spectral slope. PC2 is correlated strongly positively with DON, TDN, NH4 and peak T and B intensity.

Page 12: Data Handling Methods - University of BirminghamData Handling Methods A Peak picking Fluorescence indices Regional Integration Data reduction Key considerations: speed (on-line data

PCA - summary

Advantages:

A powerful exploratory data analysis tool, can reveal relationships in the data that wouldn’t otherwise be observed

Can be combined with any of the previous techniques (peak picking, regional integration, indices), as well as other water quality parameters

Can be performed in standard statistics packages.

Disadvantages:

Some statistics experience needed

Off-line technique

Only as good as the input data.

Page 13: Data Handling Methods - University of BirminghamData Handling Methods A Peak picking Fluorescence indices Regional Integration Data reduction Key considerations: speed (on-line data

Data Handling Methods: AConclusions

This is just a brief overview of just a few of the techniques available. Further advanced techniques are available (expert systems, discriminant analysis, parallel factor analysis)

All approaches are a compromise between maximizing information extracted from an EEM and practical considerations (time, data available)

No one solution fits all datasets.

300 350 400 450 5000

200

400

600

800

1000

Wavelength (nm)In

tensity (

a.u

.)

Wa

ve

le

ng

th

(

nm

)

W a v e l e n g t h ( n m )

2 2 5 . 0 0

2 5 0 . 0 0

2 7 5 . 0 0

3 0 0 . 0 0

3 2 5 . 0 0

3 5 0 . 0 0

3 7 5 . 0 0

4 0 0 . 0 0

3 0 0 . 0 0 3 2 5 . 0 0 3 5 0 . 0 0 3 7 5 . 0 0 4 0 0 . 0 0 4 2 5 . 0 0 4 5 0 . 0 0 4 7 5 . 0 0 5 0 0 . 0 0

9 6 3 . 0 9

8 8 9 . 2 7

8 1 5 . 4 5

7 4 1 . 6 3

6 6 7 . 8 2

5 9 4 . 0 0

5 2 0 . 1 8

4 4 6 . 3 6

3 7 2 . 5 4

2 9 8 . 7 2

2 2 4 . 9 0

1 5 1 . 0 8

7 7 . 2 7

3 . 4 5