Chemo Metrics Application 070511

  • 8/6/2019 Chemo Metrics Application 070511

    Data Treatment

    Chemometrics is defined as the application of mathematical, statistical, graphical or symbolic methods to maximize the chemical information that can be extracted from the data

    Selection of Suitable Microwave Digestion Method

    PCA (Principal Component Analysis)

    SIMCA (Soft Independent Modeling of Class Analogies)

    PROMETHEE (Preference Ranking Organization METHod for Enrichment Evaluation)

    GAIA (Geometrical Analysis for Interactive Aid)

    Fuzzy Clustering

    (From Kokot, et al., 1992. Anal. Chim. Acta, 259, 267-279)

    Principal Component Analysis (PCA)

    A summarization and data reduction technique

    Examines the interrelationships among a large number of variables and then attempts to explain them in terms of their common underlying dimensions, referred to as components

    PCA

    Based on the derivation of linear combinations of the original variables to produce principal components characterized by scores and loadings

    $PC_{jk} = a_{j1} x_{k1} + a_{j2} x_{k2} + \dots + a_{jn} x_{kn}$

    where $PC_{jk}$ = the score for object k on component j, $a_{ji}$ = the loading of variable i on component j, $x_{ki}$ = the measured value of variable i on object k, and n = total number of original variables

    SCORES: projections of objects in a particular component

    LOADINGS: reflect the contribution of each variable to a particular component
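The score equation above can be illustrated with a minimal sketch. It assumes the data are autoscaled before decomposition (the slides do not say which pretreatment was used) and that NumPy is available. The function name `pca_scores_loadings` and the small data matrix are hypothetical.

```python
import numpy as np

def pca_scores_loadings(X, n_components=2):
    """Return (scores, loadings, explained_ratio) for an objects x variables array."""
    X = np.asarray(X, dtype=float)
    # Assumption: autoscale each variable (mean-center, divide by its standard deviation)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the scaled data: the rows of Vt are the loading vectors a_j
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:n_components]              # a_ji, shape (components, variables)
    scores = Z @ loadings.T                   # PC_jk, shape (objects, components)
    explained = (S ** 2) / np.sum(S ** 2)     # fraction of variance per component
    return scores, loadings, explained[:n_components]

# Hypothetical data: 5 objects (rows) measured on 3 variables (columns)
X = [[2.0, 4.1, 1.0],
     [3.0, 6.2, 0.9],
     [4.0, 7.9, 1.2],
     [5.0, 10.1, 0.8],
     [6.0, 12.0, 1.1]]
scores, loadings, explained = pca_scores_loadings(X)
print(scores)
print(loadings)
print(explained)
```

Each score is the sum of loading times (scaled) measurement over all variables, exactly as in the equation; plotting the first two columns of `scores` together with the rows of `loadings` would give a biplot.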

    PCA

    BIPLOT: displays scaled scores and loadings in a PC plane

    1st component: accounts for the largest amount of variation

    Subsequent components: account for decreasing amounts of data variance

    From Kokot, et al., 1992. Anal. Chim. Acta, 259, 267-279

    Extracted Information:

    The objects (methods of digestion) appear to cluster in at least two groups based on the six metal variables (Cu, Pb, Ni, Cr, Co, and Zn).

    Group I: methods 4Cb, 7Ab, and HPb. No hydrofluoric acid (HF) in the acid mixtures.

    Group II: methods that contain HF in their acid digest.

    The presence of HF plays a major role in the discrimination of methods into groups.

    An atypical method of digestion, 8Ab, appeared either as an outlier or as a single-member group.

    The metals Cr and Pb are the two most discriminating variables.

    From Kokot, et al., 1992. Anal. Chim. Acta, 259, 267-279

    Soft Independent Modeling of Class Analogies (SIMCA)

    Uses PCA to model the shape and position of the object formed by the samples in row space for class definition

    The shape of a class depends on the number of components used in the model

    To predict the classification of a future sample, it is necessary to determine what region of measurement space it occupies

    SIMCA

    Procedure:

    1. Compute the residual standard deviation (RSD) for the class as a whole (mean distance between the objects of a class and the class model)

    2. Compute the RSD for each object (orthogonal distance between the object and the class model)

    3. Compute the F value from the computed residuals. If Fcal < Fcrit, the unknown sample is a member of the class
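The three SIMCA steps can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the cited paper: it uses one common degrees-of-freedom convention, (n - A - 1)(p - A) for the class and (p - A) for an object, where n is the number of training objects, p the number of variables and A the number of components. The function names and the data are hypothetical, and the critical value is taken from an F table rather than computed.

```python
import numpy as np

def fit_class_model(X_train, n_components):
    """Fit a PCA class model; return its mean, loadings and class RSD (s0)."""
    X_train = np.asarray(X_train, dtype=float)
    n, p = X_train.shape
    mean = X_train.mean(axis=0)
    Xc = X_train - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                       # loadings, shape (p, A)
    E = Xc - Xc @ P @ P.T                         # residuals of the training objects
    dof = (n - n_components - 1) * (p - n_components)
    s0 = np.sqrt(np.sum(E ** 2) / dof)            # step 1: RSD of the class as a whole
    return mean, P, s0

def object_rsd(x, mean, P):
    """Step 2: RSD of one object from its orthogonal distance to the class model."""
    xc = np.asarray(x, dtype=float) - mean
    e = xc - P @ (P.T @ xc)
    return np.sqrt(np.sum(e ** 2) / (P.shape[0] - P.shape[1]))

def f_value(x, mean, P, s0):
    """Step 3: F = (object RSD)^2 / (class RSD)^2."""
    return (object_rsd(x, mean, P) / s0) ** 2

# Hypothetical training class: 6 objects, 3 variables, modeled with 1 component
train = [[1.0, 2.1, 3.0], [2.0, 3.9, 6.1], [3.0, 6.0, 9.0],
         [4.0, 8.1, 11.9], [5.0, 9.9, 15.1], [6.0, 12.0, 18.0]]
mean, P, s0 = fit_class_model(train, n_components=1)

f_crit = 4.46   # tabulated F(0.05; 2, 8) for this hypothetical case
f_near = f_value([3.5, 7.0, 10.5], mean, P, s0)   # lies close to the class model
f_far = f_value([3.5, 12.0, 4.0], mean, P, s0)    # lies far from the class model
print(f_near < f_crit, f_far < f_crit)
```

An object with Fcal below Fcrit is accepted as a class member; the sharpness of that decision depends on the number of components chosen for the class model.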

    Extracted Information:

    Only method 4C and probably 11A could be part of the training sets consisting of digestion methods with HF in their acid mixtures.

    This means that methods 4C and 11A could perform about as well as the digestion methods with HF included in their acid mixtures, based on the defined variables.

    From Kokot, et al., 1992. Anal. Chim. Acta, 259, 267-279

    Preference Ranking Organization METHod for Enrichment Evaluation (PROMETHEE) and Geometrical Analysis for Interactive Aid (GAIA)

    PROMETHEE

    Designed to rank a number of actions (objects) in the context of constraints present in or imposed on the data

    Ranking is performed according to a set of user-supplied preference conditions which are applied to the criteria (variables)

    Metal content (µg g-1)

    Method      Cu            Pb          Co            Zn
    2B          103           155         12.6          441
    4A          99.5          166         15            432
    4B          91.4          159         16            433
    6A          103           145         13.7          432
    8B          98            159         13            421
    8C          99            164         13            435
    NBS 2704    98.6 ± 5.0    161 ± 17    14.0 ± 0.6    438 ± 12

    Slide annotations: objects (methods), variables (metals), preference conditions

    Interpretation of flow chart

    Actions (methods of digestion) that are comparable are joined by one or more arrows.

    Any comparable action to the left of another is preferred.

    Any actions that are incomparable remain unconnected.

    Extracted Information:

    For the BSR set of data (denoted by a "b" in the label, e.g. 7Ab), methods 12b and 4Bb outranked the others, but they could not be compared with each other because each method performed differently on the six metal variables.

    For the NBS 2704 data (denoted by labels without a "b", e.g. 7A), the performance of method 8C is comparable to that of methods 4A and 8B. However, 8C is located to the left of 4A and 8B, so 8C is preferred over them.

    From Kokot, et al., 1992. Anal. Chim. Acta, 259, 267-279

    PROMETHEE II

    Applied to eliminate the indecisive result and to produce a simple ranking scale

    Compute the net outranking flow value: the difference between the associated positive and negative outranking flows

    Results are less reliable than those of PROMETHEE I
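The net outranking flow can be sketched in a few lines. The metal contents are the NBS 2704 values tabulated earlier in this deck; everything else is an assumption made for illustration, because the preference conditions of the original study are not recoverable from the slides. Here each criterion is the absolute deviation from the certified value (smaller is better), the preference function is the simple "usual" one (1 if strictly better, otherwise 0), and all four metals carry equal weight. The function name `promethee_flows` is hypothetical.

```python
certified = {"Cu": 98.6, "Pb": 161.0, "Co": 14.0, "Zn": 438.0}
results = {
    "2B": {"Cu": 103, "Pb": 155, "Co": 12.6, "Zn": 441},
    "4A": {"Cu": 99.5, "Pb": 166, "Co": 15, "Zn": 432},
    "4B": {"Cu": 91.4, "Pb": 159, "Co": 16, "Zn": 433},
    "6A": {"Cu": 103, "Pb": 145, "Co": 13.7, "Zn": 432},
    "8B": {"Cu": 98, "Pb": 159, "Co": 13, "Zn": 421},
    "8C": {"Cu": 99, "Pb": 164, "Co": 13, "Zn": 435},
}

def promethee_flows(results, certified, weights=None):
    """Return {action: (positive flow, negative flow, net flow)}."""
    metals = list(certified)
    if weights is None:
        weights = {m: 1.0 / len(metals) for m in metals}   # assumption: equal weights
    # Assumed criterion: absolute deviation from the certified value, to be minimized
    dev = {a: {m: abs(v[m] - certified[m]) for m in metals}
           for a, v in results.items()}
    actions = list(results)
    n = len(actions)

    def preference(a, b):
        # Usual preference function: full preference on a criterion if a is strictly better
        return sum(weights[m] for m in metals if dev[a][m] < dev[b][m])

    flows = {}
    for a in actions:
        plus = sum(preference(a, b) for b in actions if b != a) / (n - 1)
        minus = sum(preference(b, a) for b in actions if b != a) / (n - 1)
        flows[a] = (plus, minus, plus - minus)
    return flows

flows = promethee_flows(results, certified)
# PROMETHEE II: a complete ranking by decreasing net flow
for action in sorted(flows, key=lambda a: flows[a][2], reverse=True):
    print(action, round(flows[action][2], 3))
```

PROMETHEE I keeps the positive and negative flows separate, so two actions can remain incomparable; collapsing them into one net value forces a complete order, which is why the slide calls the PROMETHEE II result less reliable.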

    Extracted Information:

    The net flow values for the two most preferred BSR methods are very similar and are well above the values for the next methods, 11Bb, 6Ab, and 8Ab.

    For NBS 2704, method 8C had a considerably higher value than each of the next two methods.

    HF in the acid mixtures (methods 4Bb, 12b, and 11Bb) plays a major role in the digestion of the BSR sample.

    HCl in the acid mixtures (methods 8C, 8B, and 4A) determines the efficiency of digestion of the NBS 2704 sample.

    Figure: PROMETHEE II ranking for complete NBS 2704, BSR and polished combined data. (From Kokot, et al., 1992. Anal. Chim. Acta, 259, 267-279)

    GAIA

    Method for investigating the PROMETHEE results

    Net outranking flows are decomposed to make them suitable for PCA

    Biplot facilitates the interpretation of the significance of the criteria

    From Kokot, et al., 1992. Anal. Chim. Acta, 259, 267-279

    Extracted Information:

    The GAIA biplot shows the discrimination of the methods of digestion according to the acid digest composition (PC2) and the origin of the rock/sediment sample (PC1).

    The diagram is very similar to that of exploratory PCA; however, the cluster separation appears to be sharper.

    FUZZY CLUSTERING

    Attempts to assign a degree of class membership for a given object over several classes

    Classification is performed with the aid of a membership function

    $m(x) = 1 - c\,|x - a|^{p}$

    where a and c are constants and p is a positive exponent; the function is constructed with reference to the data

    Sum of the membership values for each object is 1

    Main advantage: Facilitates the distinction between objects that clearly belong to one cluster (membership value of 1 or close to 1) and those that are members of several clusters (membership value of 1/(no. of clusters)).
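The two membership properties stated above (values sum to 1, and an object shared equally among clusters gets 1/(no. of clusters)) can be checked with a small sketch. It uses the standard fuzzy c-means membership formula, which is a swapped-in, widely used choice and not necessarily the membership function of the cited study. The function name `fcm_memberships` and the fuzzifier default `m=2.0` are assumptions.

```python
def fcm_memberships(distances, m=2.0):
    """Fuzzy c-means memberships of one object, given its distance to each cluster centre."""
    # An object sitting exactly on one or more centres belongs fully to those clusters
    on_centre = [d == 0 for d in distances]
    if any(on_centre):
        return [1.0 / sum(on_centre) if z else 0.0 for z in on_centre]
    power = 2.0 / (m - 1.0)
    inverse = [d ** -power for d in distances]
    total = sum(inverse)
    # Normalizing by the total makes the memberships of the object sum to 1
    return [v / total for v in inverse]

clear_member = fcm_memberships([0.1, 10.0])      # very close to the first centre
shared_member = fcm_memberships([2.0, 2.0, 2.0]) # equidistant from three centres
print(clear_member)
print(shared_member)
```

An object near a single centre gets a membership close to 1 for that cluster, while an object equidistant from all centres gets 1/(no. of clusters) for each.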

    Fuzzy Clustering

    Extracted Information:

    Results are in good agreement with the other chemometrics procedures.

    For NBS 2704, methods 2B, 4C, 7A, and 11A are the important class members of one cluster. This cluster is characterized by the exclusion of HF in the acid mixtures.

    The second cluster is composed of methods 4A, 4B, 8B, and 8C. These methods have HF in their acid digest.

    In the case of BSR, a 3-cluster model is more appropriate for analysis. The use of a 2-cluster model is heavily influenced by the atypical method 8Ab.

    Some methods are in intermediate positions, e.g. 2A. They are classified as members of two clusters.

    From Kokot, et al., 1992. Anal. Chim. Acta, 259, 267-279

    Selection of Suitable Digestion Method

    The previous examples show that all the chemometrics procedures provide consistent information about outliers, groupings and trends.

    However, only the multicriteria decision-making method PROMETHEE provides the rank-order information that helps in the selection of a suitable microwave digestion method.

    SIMCA and FC methods are most preferred for the purposes of sample classification.

    Chemometrics

    Extraction of Latent Information

    Ordination Diagram

    Used in the determination of carrier substances for trace metals in sediments

    Involves simple correlation analysis on the set of major and trace elements

    The positive correlation coefficient matrices obtained are graphically pictured in a 2-dimensional diagram

    Interpretation of Ordination Diagram

    The proximity of two variables on the diagram is a measure of their statistical dependence.

    If a trace element is significantly correlated to one or several major elements, it is possible that the mineral phase containing the major elements is its carrier.

    The validity of this hypothesis should be verified by a chemical speciation analysis.
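The correlation step behind such a diagram can be sketched as follows. The concentrations are hypothetical, the helper names `pearson` and `positive_correlations` are illustrative, and keeping only pairs with r above a threshold is an assumption about how "positive correlation coefficients" would be selected; the slides do not give the significance test used.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def positive_correlations(data, threshold=0.0):
    """Return {(variable_1, variable_2): r} for every pair with r above the threshold."""
    names = list(data)
    pairs = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(data[a], data[b])
            if r > threshold:
                pairs[(a, b)] = r
    return pairs

# Hypothetical concentrations in five sediment samples
data = {
    "organic_C": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Cu": [10.0, 19.0, 31.0, 40.0, 52.0],
    "Al": [5.0, 4.0, 3.0, 2.0, 1.0],
}
pairs = positive_correlations(data, threshold=0.5)
print(pairs)
```

In this made-up example only organic carbon and Cu are positively correlated, which would place them close together on the diagram and suggest organic matter as a candidate carrier for Cu, pending speciation analysis.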

    Ordination Diagram (From Jaquet, et al. 1982. Hydrobiologia, 91, 139-146.)

    Three major carrier substances in the whole study area:

    1. Organic matter - Cd, Pb, Ag, Cu and Hg

    2. Phosphates - Zn and Sn

    3. Silicates - Ni, V, Co, Be, Cr and Zn

    Figure panel labels: contaminated, dolomitic silicate, mixed silicate, mixed carbonate, autochthonous carbonate

    Ordination Diagram

    Figure panel labels: autochthonous carbonate, mixed carbonate, mixed silicate, dolomitic silicate, contaminated

    Organic matter does not act as a carrier for any metal in facies 1 (autochthonous carbonate), unlike most of the other facies. This exceptional behaviour has been attributed to the fact that organic carbon in facies 1 is mostly autochthonous, whereas in other facies, particularly in facies 7, the allochthonous, anthropogenic organic matter predominates.

    (From Jaquet, et al. 1982. Hydrobiologia, 91, 139-146.)

    Linear Regression Analysis

    Develops linear equations from collected experimental data to make predictions about the values of a dependent variable based on the values of one or more independent variables.

    Simple Linear Regression: one independent variable is used to predict the value of the dependent variable

    Eqn.: $Y = a + bX$

    where Y = dependent variable

    a = constant; intercept

    b = slope; regression coefficient

    X = independent variable

    Multiple Linear Regression: more than one independent variable is used to predict the criterion.

    Eqn.: $Y = a + b_1 X_1 + b_2 X_2 + \dots + b_n X_n$
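For the simple case, the intercept a and slope b follow from ordinary least squares. The sketch below is a minimal illustration with made-up numbers; the function name `fit_simple_linear` is hypothetical.

```python
def fit_simple_linear(x, y):
    """Ordinary least-squares fit of Y = a + bX; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx          # slope (regression coefficient)
    a = my - b * mx        # intercept (constant)
    return a, b

# Hypothetical data lying exactly on Y = 1 + 2X
a, b = fit_simple_linear([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(a, b)
```

The multiple-regression case solves the same least-squares problem with one column per independent variable, which is usually done with a linear algebra routine rather than by hand.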

    Application: Normalization Procedures

    Metal: Grain size normalization

    Metal: Reference metal normalization

    Multi-element normalization

    Concept: Should the concentration of the metal be related to changing sediment particle size, the concentration will change with a constant relation to grain size or its proxy

    (Loring and Rantala, 1992. Earth Science Reviews, 235-283)

    Why do we use normalization procedures?

    To reduce or eliminate grain size effects on chemical data

    Identification of anomalous metal concentrations in sediments

    Determination of factors that control the trace metal distribution in sediments (multiple regression)

    Example: Metal: Reference Metal Normalization

    Al was used as a proxy for the granular variations of the aluminosilicate fractions

    Concentrations of metals covary with Al, except for Cd

    Data points outside the 95% confidence band were considered contaminated

    The slopes of these regression equations can be compared to the metal-to-aluminum ratios computed for average continental rocks and for average continental soils

    Figure label: 95% confidence band

    (From Windom, et al., 1989. Environ. Sci. Technol., 23, 314-320)
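The screening idea can be sketched as a regression of metal on Al fitted to baseline samples, followed by a check of whether new samples fall outside a 95% band. This sketch assumes a prediction band for individual samples, which may differ from the band drawn in the cited figure. All concentrations are hypothetical, the function name is illustrative, and the critical t value is supplied from a table (2.306 for 8 degrees of freedom, two-sided 95%).

```python
from math import sqrt

def outside_prediction_band(x, y, x_new, y_new, t_crit):
    """Fit y = a + b*x on baseline data and flag new points outside the band."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    # Residual standard deviation with n - 2 degrees of freedom
    s = sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    flags = []
    for x0, y0 in zip(x_new, y_new):
        # Half-width of the prediction band at x0
        half_width = t_crit * s * sqrt(1 + 1 / n + (x0 - mx) ** 2 / sxx)
        flags.append(abs(y0 - (a + b * x0)) > half_width)
    return a, b, flags

# Hypothetical baseline: Al (%) against a trace metal (µg g-1) in 10 clean samples
al = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
metal = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.2, 17.9, 20.0]

# Two hypothetical new samples with the same Al content but different metal content
a, b, flags = outside_prediction_band(al, metal, [5.0, 5.0], [10.2, 25.0], t_crit=2.306)
print(a, b, flags)
```

The second sample carries far more metal than its Al content predicts, so it falls outside the band and would be treated as a candidate for contamination.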

    Hierarchical Cluster Analysis (HCA)

    Seeks to minimize within-group variance and maximize between-group variance, and represents that information in the form of a two-dimensional plot called a dendrogram

    Result is a number of heterogeneous groups with homogeneous contents

    Classifies objects or variables into several mutually exclusive groups based on the similarity of the characteristics they possess

    Used to develop hypotheses about the nature of the data or to examine previously stated hypotheses
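The merging process that a dendrogram records can be sketched with a small agglomerative routine. For brevity this uses single linkage with Euclidean distance, which is a simpler criterion than the variance-based one described above and is not claimed to be the linkage used in the cited studies. The function name `agglomerate` and the coordinates are hypothetical.

```python
from math import dist

def agglomerate(points):
    """Single-linkage agglomerative clustering.

    points maps a label to a coordinate tuple. Returns the merge history as
    (members_of_cluster_1, members_of_cluster_2, merge_distance) tuples,
    which is the information a dendrogram draws.
    """
    clusters = [(name,) for name in points]
    history = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = min(dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        history.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history

# Hypothetical objects described by two variables each
points = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (5.0, 5.0), "D": (5.0, 6.5)}
history = agglomerate(points)
for left, right, height in history:
    print(left, right, round(height, 3))
```

Each tuple in `history` corresponds to one junction of the dendrogram, with the merge distance giving the height at which the two branches join.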

    HCA Dendrogram

    Cluster I: The group of Fe, Mn, Zn, Pb and Li. The presence of naturally occurring Li, Fe and Mn in this group suggests that the other elements (Zn and Pb) may also be of similar (natural) origin, or that they may have been distributed evenly in the coastal sediments by tidal activity. The oxides of Fe and Mn probably play an important role in their distribution.

    Cluster II: The group of organic carbon and Cu indicates the role of organic matter in the distribution of Cu.

    Figure labels: Cluster I, Cluster II, Cluster III, Cluster IV

    (From Angelidis and Aloupi, 2000. Mar. Poll. Bull., 77-82)

    Dendrogram

    Cluster III: The group of Al, Cr, and Ni. The fact that Cr and Ni appear in the same group as the naturally derived Al suggests that weathering of natural rocks may play an important role in the distribution of those metals in the sediments of the study area.

    Cluster IV: The group of Cd. Cadmium forms a group of its own, which indicates that the metal has a different distribution process compared to the other metals.

    Figure labels: Cluster III, Cluster IV

    (From Angelidis and Aloupi, 2000. Mar. Poll. Bull., 77-82)

    Principal Component Analysis (PCA)

    A summarization and data reduction technique

    Examines the interrelationships among a large number of variables and then attempts to explain them in terms of their common underlying dimensions, referred to as components

    Provides visual display of the data that is often more enlightening than comparison of only one or two variables at a time

    Used to delimit areas of most contaminated sediments and the relative importance of the major metal anthropogenic inputs

    PCA

    Spatial distribution of metals is explained by two PCs which account for 77.9% of the variance.

    Identified three end members:

    the clean Buzzards Bay sediments

    the less contaminated outer harbor sediments

    the contaminated inner harbor sediments

    The first PC has separated the clean Buzzards Bay samples from the contaminated samples in New Bedford Harbor.

    The second PC has further separated the samples from New Bedford Harbor based on the types of metals present in the sediments.

    (From Shine, et al., 1995. Environ. Sci. Technol., 29, 1781-1788)

    PCA

    Co, Mn and Ni define the clean Buzzards Bay sediments

    Zn and Pb define the outer portion of New Bedford Harbor

    Cu, Cd and Cr define the contaminated inner portion of New Bedford Harbor

    Each of these three clusters of metals has loadings that plot similarly to the corresponding geographical cluster in the score plot.

    (From Shine, et al., 1995. Environ. Sci. Technol., 29, 1781-1788)