Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Data Handling Methods
Data Handling Methods A
Peak picking
Fluorescence indices
Regional Integration
Data reduction
Key considerations:
speed (on-line data collection vs manual post processing).
Simplicity (simple linear regression against water quality vs detailed post processing to obtain structural information)
Initial dataset characteristics (EEM vs line scans).
300 350 400 450 5000
200
400
600
800
1000
Wavelength (nm)
Inte
nsity (
a.u
.)
Wa
ve
le
ng
th
(
nm
)
W a v e l e n g t h ( n m )
2 2 5 . 0 0
2 5 0 . 0 0
2 7 5 . 0 0
3 0 0 . 0 0
3 2 5 . 0 0
3 5 0 . 0 0
3 7 5 . 0 0
4 0 0 . 0 0
3 0 0 . 0 0 3 2 5 . 0 0 3 5 0 . 0 0 3 7 5 . 0 0 4 0 0 . 0 0 4 2 5 . 0 0 4 5 0 . 0 0 4 7 5 . 0 0 5 0 0 . 0 0
9 6 3 . 0 9
8 8 9 . 2 7
8 1 5 . 4 5
7 4 1 . 6 3
6 6 7 . 8 2
5 9 4 . 0 0
5 2 0 . 1 8
4 4 6 . 3 6
3 7 2 . 5 4
2 9 8 . 7 2
2 2 4 . 9 0
1 5 1 . 0 8
7 7 . 2 7
3 . 4 5
Peak Picking
Simplest data extraction method: finding the maximum intensity (z) in a given region (x,y).
Intensity vs concentration relationships can be developed by linear regression.
Wavelength variations may also provide information about organic matter character.
300 350 400 450 5000
200
400
600
800
1000
Wavelength (nm)In
tensity (
a.u
.)
Wa
ve
le
ng
th
(
nm
)
W a v e l e n g t h ( n m )
2 2 5 . 0 0
2 5 0 . 0 0
2 7 5 . 0 0
3 0 0 . 0 0
3 2 5 . 0 0
3 5 0 . 0 0
3 7 5 . 0 0
4 0 0 . 0 0
3 0 0 . 0 0 3 2 5 . 0 0 3 5 0 . 0 0 3 7 5 . 0 0 4 0 0 . 0 0 4 2 5 . 0 0 4 5 0 . 0 0 4 7 5 . 0 0 5 0 0 . 0 0
9 6 3 . 0 9
8 8 9 . 2 7
8 1 5 . 4 5
7 4 1 . 6 3
6 6 7 . 8 2
5 9 4 . 0 0
5 2 0 . 1 8
4 4 6 . 3 6
3 7 2 . 5 4
2 9 8 . 7 2
2 2 4 . 9 0
1 5 1 . 0 8
7 7 . 2 7
3 . 4 5
Bieroza et al.2009, STOTEN
Hudson et al.,2008, STOTEN
Baker et al., 2008,Chemosphere
Fluorescence Indices
Typically ratios of intensity at two x,y wavelength pairs.
They attempt to characterise the organic matter present. For example:
McKnight et al. (2001) FI: attempts to characterise microbial vs terrestrial organic matter.
Lower plot shows relationship with land cover (Wilson and Xenopolous, 2009).
Fluorescence Indices
Baker et al (2008): peak T/peak C intensity shown to correlate with organic matter function (e.g. hydrophobicity, aluminium adsoption and benzopyrenebinding).
Fluorescence Indices
Parlanti et al (2000) developed a index.
Ratio of ‘peak C’ fluorescence intensity at 380 nm to that between 420 and 435 nm.
For example, applied by Wilson and Xenopoulos (2009) to relate fluorescence to water quality parameters.
Fluorescence indices
Advantages:
Simple to use and understand
Quantifiable
Can be used to design in-situ fixed wavelength instruments
Disadvantages:
Choice of wavelength pairs often arbitrary
Wastes significant information
Extent to which indices are transferable is unclear
Regional Integration
Chen et al (2003) first proposed a simple integration of fluorescence within specified regions.
Advantages: uses whole EEM.
Disadvantages: choice and definitions of regions may not be transferrable, wavelength variability not considered.
Data Reduction
300 350 400 450 5000
200
400
600
800
1000
Wavelength (nm)
Inte
nsity (
a.u
.)
Wa
ve
le
ng
th
(
nm
)
W a v e l e n g t h ( n m )
2 2 5 . 0 0
2 5 0 . 0 0
2 7 5 . 0 0
3 0 0 . 0 0
3 2 5 . 0 0
3 5 0 . 0 0
3 7 5 . 0 0
4 0 0 . 0 0
3 0 0 . 0 0 3 2 5 . 0 0 3 5 0 . 0 0 3 7 5 . 0 0 4 0 0 . 0 0 4 2 5 . 0 0 4 5 0 . 0 0 4 7 5 . 0 0 5 0 0 . 0 0
9 6 3 . 0 9
8 8 9 . 2 7
8 1 5 . 4 5
7 4 1 . 6 3
6 6 7 . 8 2
5 9 4 . 0 0
5 2 0 . 1 8
4 4 6 . 3 6
3 7 2 . 5 4
2 9 8 . 7 2
2 2 4 . 9 0
1 5 1 . 0 8
7 7 . 2 7
3 . 4 5
Previously detailed techniques are all trying to reduce the data volume to something manageable, but they are doing so in a relatively arbitrary way.
Statistical / operational techniques are available that would reduce the data to manageable components. Principal Components Analysis
Principal Filter AnalysisArtificial Neural networks / Expert Systems
PCA
A method to simplify a data set by means of linear transformations that chose a new coordinate system in which the first axis (first principal component, PC1) describes the greatest variance within the dataset.
Example: Spencer et al (2007). PC1 correlates strongly positively with peaks C and A intensity and emission wavelengths, absorbance and DOC and strongly negatively with salinity and spectral slope. PC2 is correlated strongly positively with DON, TDN, NH4 and peak T and B intensity.
PCA - summary
Advantages:
A powerful exploratory data analysis tool, can reveal relationships in the data that wouldn’t otherwise be observed
Can be combined with any of the previous techniques (peak picking, regional integration, indices), as well as other water quality parameters
Can be performed in standard statistics packages.
Disadvantages:
Some statistics experience needed
Off-line technique
Only as good as the input data.
Data Handling Methods: AConclusions
This is just a brief overview of just a few of the techniques available. Further advanced techniques are available (expert systems, discriminant analysis, parallel factor analysis)
All approaches are a compromise between maximizing information extracted from an EEM and practical considerations (time, data available)
No one solution fits all datasets.
300 350 400 450 5000
200
400
600
800
1000
Wavelength (nm)In
tensity (
a.u
.)
Wa
ve
le
ng
th
(
nm
)
W a v e l e n g t h ( n m )
2 2 5 . 0 0
2 5 0 . 0 0
2 7 5 . 0 0
3 0 0 . 0 0
3 2 5 . 0 0
3 5 0 . 0 0
3 7 5 . 0 0
4 0 0 . 0 0
3 0 0 . 0 0 3 2 5 . 0 0 3 5 0 . 0 0 3 7 5 . 0 0 4 0 0 . 0 0 4 2 5 . 0 0 4 5 0 . 0 0 4 7 5 . 0 0 5 0 0 . 0 0
9 6 3 . 0 9
8 8 9 . 2 7
8 1 5 . 4 5
7 4 1 . 6 3
6 6 7 . 8 2
5 9 4 . 0 0
5 2 0 . 1 8
4 4 6 . 3 6
3 7 2 . 5 4
2 9 8 . 7 2
2 2 4 . 9 0
1 5 1 . 0 8
7 7 . 2 7
3 . 4 5