Chemometric functions in Excel


01.12.08 1

Chemometric functions in Excel

Oxana Rodionova & Alexey Pomerantsev

Semenov Institute of Chemical Physics

rcs@chph.ras.ru

Distance Learning Course in Chemometrics for Technological and Natural-Science Mastership Education

• Unfulfilled need for chemometric education in Russia

• Few qualified specialists in chemometrics

• Large distances, e.g. Moscow – Barnaul is about 3000 km

• No modern chemometrics books in Russian

• No chemometric software available

• No support from officials: government, Academy, etc.

• Easily available everywhere => INTERNET

• Interactive layout: all calculations should be clear and repeatable

• Web-friendly environment for the calculations => EXCEL

• Need to make and use our own (free) software => EXCEL Add-In

Chemometric calculations in Excel

[Diagram: three layers, Excel user interface, VBA functions, and C++ DLL, with arrows labelled Data, Results, Input, and Calculations]

Provides the user with all the possibilities of the Excel interface: worksheet calculations, worksheet functions, charts, etc.

VBA helps to simplify routine work

All calculations are made "on the fly" and very fast

Installation

http://rcs.chph.ras.ru/down/sacs.zip

Chemometrics.xla: put in the AddIns folder (C:\Documents and Settings\<User>\Application Data\Microsoft\AddIns\)

Chemometrics.dll: put in your Windows folder (C:\WINDOWS\)

Load Chemometrics.xla via <Excel Options> <Add-Ins> in the open workbook

Matrix calculations in Excel

{=TRANSPOSE(B6:F10)}

{=MMULT(B6:F10,TRANSPOSE(Barr))}

B6:F10 is a cell range and Barr is a named range. Enter each array formula with Ctrl-Shift-Enter.
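For readers who want to check the two array formulas outside Excel, here is a minimal Python sketch of the same operations. The helper names `transpose` and `mmult` and the small matrices `A` and `B` are hypothetical stand-ins for the worksheet ranges; they are not part of the add-in.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def mmult(a, b):
    """Matrix product of a (n x k) and b (k x m), in the spirit of Excel's MMULT."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must agree")
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


# Hypothetical stand-ins for the worksheet ranges (assumed values).
A = [[1, 2, 3],
     [4, 5, 6]]   # plays the role of B6:F10
B = [[1, 0, 1],
     [0, 1, 1]]   # plays the role of Barr

print(transpose(A))            # [[1, 4], [2, 5], [3, 6]]
print(mmult(A, transpose(B)))  # [[4, 5], [10, 11]]
```

As in Excel, the product is defined only when the number of columns of the first argument equals the number of rows of the second.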

Principal Component Analysis (PCA)

$X = TP^{T} + E$

$X$ ($I \times J$): initial data

$T$ ($I \times A$): score matrix

$P$ ($J \times A$): loading matrix, so $P^{T}$ is $A \times J$

$E$ ($I \times J$): error matrix
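The decomposition into scores, loadings, and an error matrix can be illustrated with a short sketch. It uses the NIPALS algorithm, which is one common way to compute PCA; the slides do not say which algorithm the add-in's DLL uses, so that choice is an assumption. The function name `nipals_pca` and the small data matrix are hypothetical, and no centering or scaling is applied.

```python
import math


def nipals_pca(X, n_pcs, tol=1e-18, max_iter=500):
    """Return (T, P, E) with X = T P^T + E; X is a list of I rows of length J."""
    E = [[float(v) for v in row] for row in X]   # residual, starts as a copy of X
    I, J = len(E), len(E[0])
    T = [[0.0] * n_pcs for _ in range(I)]
    P = [[0.0] * n_pcs for _ in range(J)]
    for a in range(n_pcs):
        # Start from the residual column with the largest sum of squares.
        j0 = max(range(J), key=lambda j: sum(E[i][j] ** 2 for i in range(I)))
        t = [E[i][j0] for i in range(I)]
        p = [0.0] * J
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            if tt == 0.0:
                break                             # nothing left to explain
            p = [sum(E[i][j] * t[i] for i in range(I)) / tt for j in range(J)]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]             # loadings have unit length
            t_new = [sum(E[i][j] * p[j] for j in range(J)) for i in range(I)]
            change = sum((u - v) ** 2 for u, v in zip(t_new, t))
            t = t_new
            if change <= tol * sum(v * v for v in t):
                break
        for i in range(I):
            T[i][a] = t[i]
            for j in range(J):
                E[i][j] -= t[i] * p[j]            # deflate the residual
        for j in range(J):
            P[j][a] = p[j]
    return T, P, E


# Hypothetical rank-2 data: 4 samples, 3 variables.
X = [[1, 2, 0], [2, 5, 3], [1, 4, 6], [0, 1, 3]]
T, P, E = nipals_pca(X, n_pcs=2)

# Rebuild X from scores, loadings and residuals.
recon = [[sum(T[i][a] * P[j][a] for a in range(2)) + E[i][j] for j in range(3)]
         for i in range(4)]
max_residual = max(abs(v) for row in E for v in row)
print(max_residual < 1e-9)   # two PCs describe this rank-2 matrix
```

Because the example matrix has rank 2, two components leave an error matrix that is zero up to rounding; with real data the residual would carry the noise.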

Chemometrics XLA. PCA Scores

{=ScoresPCA(Xcal,5,1,Xtst)}

Xcal: calibration data

5: nPC

1: centering and/or weighting

Xtst: test data

Chemometrics XLA. PCA Loadings

{=TRANSPOSE(LoadingsPCA(Xcal,5,1))}

TRANSPOSE: Excel worksheet function

Xcal: calibration data

5: nPC

1: centering and/or weighting

List of chemometric functions

PCA
ScoresPCA <for calibration or test samples>
LoadingsPCA

PLS
ScoresPLS <X-scores for calibration or test samples>
UScoresPLS <Y-scores for calibration or test samples>
LoadingsPLS <P-loadings>
WLoadingsPLS
QLoadingsPLS

PLS2
ScoresPLS2 <X-scores for calibration or test samples>
UScoresPLS2 <Y-scores for calibration or test samples>
LoadingsPLS2 <P-loadings>
WLoadingsPLS2
QLoadingsPLS2

Options:

• Centering AND/OR scaling

• Number of PCs

ScoresPCA

ScoresPCA (rMatrix [, nPCs] [, nCentWeightX] [, rMatrixNew])

rMatrix: X data (calibration set)
nPCs: number of PCs (A)
nCentWeightX: centering and/or scaling (1 centering, 2 scaling, 3 both)
rMatrixNew: test set

X[I×J] => T[I×A]

Validation Rules

If rMatrixNew is omitted then only calibration scores are calculated

If rMatrixNew is specified then only test scores are calculated

If rMatrixNew coincides with rMatrix then cross-validation (10%-out) is calculated
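The three rules can be read as a simple dispatch on the optional argument. The sketch below is only an illustration: `validation_mode` and `cv_segments` are hypothetical helpers, and splitting the samples into ten contiguous blocks is an assumption, since the slides state only that the cross-validation is 10%-out.

```python
def validation_mode(r_matrix, r_matrix_new=None):
    """Return which kind of scores the rules above would produce."""
    if r_matrix_new is None:
        return "calibration"            # rMatrixNew omitted
    if r_matrix_new is r_matrix or r_matrix_new == r_matrix:
        return "cross-validation"       # rMatrixNew coincides with rMatrix
    return "test"                       # a different matrix was supplied


def cv_segments(n_samples, n_segments=10):
    """Split sample indices into nearly equal contiguous blocks (assumed layout)."""
    if n_samples == 0:
        return []
    n_segments = min(n_segments, n_samples)
    bounds = [k * n_samples // n_segments for k in range(n_segments + 1)]
    return [list(range(bounds[k], bounds[k + 1])) for k in range(n_segments)]


# Hypothetical data ranges.
X_cal = [[1.0, 2.0], [3.0, 4.0]]
X_tst = [[5.0, 6.0]]

print(validation_mode(X_cal))          # calibration
print(validation_mode(X_cal, X_tst))   # test
print(validation_mode(X_cal, X_cal))   # cross-validation

segments = cv_segments(32)             # e.g. 32 samples, about 10% left out per segment
print(len(segments))                   # 10
```

In each cross-validation round one segment would be left out, the model built on the rest, and scores computed for the left-out samples.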

LoadingsPCA

LoadingsPCA (rMatrix [, nPCs] [, nCentWeightX])

rMatrix: X data (calibration set)
nPCs: number of PCs (A)
nCentWeightX: centering and/or scaling (1 centering, 2 scaling, 3 both)

X[I×J] => P[J×A]

Exploratory Data Analysis

Case study 1: People

People

Dataset in Excel Workbook (People.xls)

Number of objects (n) = 32

Number of variables (m) = 12

Data Preprocessing

Aim: to transform the data into the most suitable form for data analysis

Autoscaling

mean centering + scaling = autoscaling
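A minimal sketch of autoscaling, column by column, may make the two steps concrete. The function name `autoscale` and the sample values are hypothetical, and dividing by the sample standard deviation (n - 1 in the denominator) is an assumption; some packages divide by n instead.

```python
import math


def autoscale(X):
    """Mean-center each column, then divide it by its standard deviation."""
    n = len(X)
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    # Sample standard deviation (n - 1); this choice is an assumption.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
            for c, m in zip(cols, means)]
    stds = [s if s > 0 else 1.0 for s in stds]   # leave constant columns unscaled
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


# Hypothetical data: two variables on very different scales.
data = [[180.0, 80.0],
        [170.0, 65.0],
        [160.0, 50.0]]

scaled = autoscale(data)
print(scaled)   # [[1.0, 1.0], [0.0, 0.0], [-1.0, -1.0]]
```

After autoscaling every column has zero mean and unit standard deviation, so variables measured in different units contribute comparably to the model.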

People: Scores & Loadings (PC1 vs. PC2)

[Figure: score plot t1 vs. t2 ("Map of Samples"), samples labelled FS, FN, MS, MN; loading plot p1 vs. p2 ("Map of Variables"), variables Height, Weight, Hairs, Shoes, Age, Income, Beer, Wine, Sex, Strength, Region, IQ]

People: Scores & Loadings (PC1 vs. PC3)

[Figure: score plot t1 vs. t3, samples labelled FS, FN, MS, MN; loading plot p1 vs. p3, variables Height, Weight, Hairs, Shoes, Age, Income, Beer, Wine, Sex, Strength, Region, IQ; a second t1 vs. t3 score plot with numeric sample labels (18 to 55)]

Case study 2: HPLC-DAD

Measurements

[Figure: surface plot of absorbance (AU) against time (1 to 29) and wavelength (220 to 334 nm)]

Dataset in Excel Workbook

X (30 × 28)

Pure compounds A and B

$X = CS^{T} + E$

[Figure: spectra of A and B (AU vs. wavelength λ, nm) and concentration profiles C(t) of A and B vs. time]

If we observe X, can we predict C and S?

Score plot

[Figure: score plot t1 vs. t2 with points labelled by time index 1 to 30, next to the concentration profiles C(t) of A and B vs. time]

Conclusions from the Score Plot

1. Linear regions = Pure compounds

2. Curved line = Co-elution

3. Closer to the origin = Lower intensity

4. Number of bends = Number of different compounds

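Factor analysis vs. PCA

$X = CS^{T} + E_1$, where $X$ and $E_1$ are $I \times J$, $C$ is $I \times 2$ and $S^{T}$ is $2 \times J$

$X = TP^{T} + E_2$, where $X$ and $E_2$ are $I \times J$, $T$ is $I \times A$ and $P^{T}$ is $A \times J$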

Scores and Loadings

[Figure: panel "S, P": spectra of A and B together with loadings p1 and p2 vs. wavelength; panel "C, T": concentration profiles of A and B together with scores t1 and t2 vs. time]

Procrustes transformation

$X \approx CS^{T}$

$X \approx TP^{T}$

$I = RR^{T}$ = Identity matrix

$X \approx T(RR^{T})P^{T} = (TR)(PR)^{T}$

$C \approx \hat{C} = TR \qquad S \approx \hat{S} = PR$

$R = R_{\text{stretch}} \times R_{\text{rotation}}$
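The identity behind the transformation can be checked numerically with a small sketch. One caveat, offered as a reading rather than as the authors' statement: RR^T = I holds for a pure rotation, whereas once stretching is included R is no longer orthogonal, and the identity that keeps the product unchanged is RR^{-1} = I, so the loadings are multiplied by the transposed inverse of R. All matrices and helper names below are hypothetical.

```python
import math


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Hypothetical scores (3 x 2) and orthonormal loadings (3 x 2).
T = [[1.0, 0.2], [2.0, -0.1], [0.5, 0.7]]
P = [[0.6, 0.8], [0.8, -0.6], [0.0, 0.0]]
X_model = matmul(T, transpose(P))

theta = math.radians(30)
R_rotation = [[math.cos(theta), -math.sin(theta)],
              [math.sin(theta), math.cos(theta)]]
R_stretch = [[2.0, 0.0], [0.0, 0.5]]

# Pure rotation: R R^T = I, so (TR)(PR)^T reproduces T P^T.
err_rot = max_abs_diff(
    matmul(matmul(T, R_rotation), transpose(matmul(P, R_rotation))), X_model)

# Stretch and rotation: use the transposed inverse of R for the loadings.
R = matmul(R_stretch, R_rotation)
C_hat = matmul(T, R)
S_hat = matmul(P, transpose(inverse_2x2(R)))
err_general = max_abs_diff(matmul(C_hat, transpose(S_hat)), X_model)

# Using PR with the non-orthogonal R does not reproduce T P^T.
err_naive = max_abs_diff(matmul(C_hat, transpose(matmul(P, R))), X_model)

print(err_rot < 1e-12, err_general < 1e-12, err_naive > 0.1)   # True True True
```

In both valid cases the modelled part of X is unchanged; only its factorization into concentration-like and spectrum-like matrices differs.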

Scores Transformation

[Figure: three score plots t1 vs. t2 with points labelled 1 to 30, showing the scores at successive stages of the transformation; one stage is labelled "Stretching"]

Procrustes analysis results

[Figure: concentration profiles C(t) and estimates Ĉ(t) vs. time for A and B; spectra S(λ) and estimates Ŝ(λ) vs. wavelength λ, nm, for A and B]

Conclusions

1. Scaling and centering are problem-dependent

2. In this example, number of PCs = number of different compounds

Regression

Principal Component Regression (PCR)

1) PCA: $X$ is decomposed into scores $T = (t_1, \dots, t_A)$ and loadings $P = (p_1, \dots, p_A)$

2) MLR: $y = Ta + e$
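The two PCR steps can be sketched in a few lines. This is an illustration under stated assumptions: PCA is computed with NIPALS using a fixed number of iterations, no centering is applied, the number of PCs does not exceed the rank of X, and the names `pca_nipals` and `pcr_fit` and the data are hypothetical. Because NIPALS scores are mutually orthogonal, the MLR step reduces to one projection per component.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def columns(M):
    return [list(c) for c in zip(*M)]


def pca_nipals(X, n_pcs, n_iter=200):
    """Step 1, PCA: return score vectors t_1..t_A and loading vectors p_1..p_A."""
    E = [[float(v) for v in row] for row in X]
    T, P = [], []
    for _ in range(n_pcs):
        t = max(columns(E), key=lambda c: dot(c, c))   # start from the largest column
        for _ in range(n_iter):                        # fixed count, for brevity
            p = [dot(c, t) for c in columns(E)]        # p proportional to E^T t
            norm = math.sqrt(dot(p, p))
            p = [v / norm for v in p]
            t = [dot(row, p) for row in E]             # t = E p
        E = [[e - ti * pj for e, pj in zip(row, p)] for row, ti in zip(E, t)]
        T.append(t)
        P.append(p)
    return T, P


def pcr_fit(X, y, n_pcs):
    """Step 2, MLR of y on the scores: y = T a + e."""
    T, _ = pca_nipals(X, n_pcs)
    a = [dot(t, y) / dot(t, t) for t in T]             # orthogonal scores: one projection each
    y_hat = [sum(ak * t[i] for ak, t in zip(a, T)) for i in range(len(y))]
    return a, y_hat


# Hypothetical rank-2 data; y is an exact linear function of the columns of X.
X = [[1, 2, 0], [2, 5, 3], [1, 4, 6], [0, 1, 3]]
y = [-1.0, 3.0, 9.0, 5.0]

a, y_hat = pcr_fit(X, y, n_pcs=2)
max_error = max(abs(u - v) for u, v in zip(y, y_hat))
print(max_error < 1e-8)   # the two-component model reproduces y here
```

With noisy data the fit would not be exact, and the number of components would be chosen by validation.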

Projection on Latent Structures (PLS)

[Diagram: X with scores T = (t1, ..., tA), loadings P = (p1, ..., pA) and weights W = (w1, ..., wA); Y with scores U = (u1, ..., uA) and loadings Q = (q1, ..., qA)]

$Y = TB + E$

PLS and PLS2

PLS: $y = Tb + e$, where $y$ has 1 column

PLS2: $Y = TB + E$, where $Y$ has $M$ columns
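For the single-response case, the quantities named in the function list (X-scores T, weights W, P-loadings, Q-loadings) can be illustrated with the classical NIPALS form of PLS1. This is a sketch under assumptions: the slides do not state which algorithm the add-in implements, no centering or scaling is applied, and the function name `pls1` and the data are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pls1(X, y, n_pcs):
    """PLS1 by NIPALS: return per-component T, W, P, q and the final y residual."""
    E = [[float(v) for v in row] for row in X]    # X residual
    f = [float(v) for v in y]                     # y residual
    T, W, P, q = [], [], [], []
    for _ in range(n_pcs):
        cols = [list(c) for c in zip(*E)]
        w = [dot(c, f) for c in cols]             # w proportional to E^T f
        norm = math.sqrt(dot(w, w))
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]            # t = E w
        tt = dot(t, t)
        p = [dot(c, t) / tt for c in cols]        # p = E^T t / (t^T t)
        qa = dot(f, t) / tt                       # q = f^T t / (t^T t)
        E = [[e - ti * pj for e, pj in zip(row, p)] for row, ti in zip(E, t)]
        f = [fi - qa * ti for fi, ti in zip(f, t)]
        T.append(t)
        W.append(w)
        P.append(p)
        q.append(qa)
    return T, W, P, q, f


# Hypothetical rank-2 data; y is an exact linear function of the columns of X.
X = [[1, 2, 0], [2, 5, 3], [1, 4, 6], [0, 1, 3]]
y = [-1.0, 3.0, 9.0, 5.0]

T, W, P, q, residual = pls1(X, y, n_pcs=2)
print(max(abs(v) for v in residual) < 1e-8)   # two components fit y here
print(abs(dot(T[0], T[1])) < 1e-8)            # X-scores are orthogonal
```

PLS2 follows the same pattern with a matrix Y, where each component additionally iterates between X-scores and Y-scores.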

ScoresPLS

ScoresPLS (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY] [, rMatrixXNew])

rMatrixX: X data (calibration set)
rMatrixY: Y data (calibration set)
nPCs: number of PCs (A)
nCentWeightX: centering and/or scaling of X (1 centering, 2 scaling, 3 both)
nCentWeightY: centering and/or scaling of Y (1 centering, 2 scaling, 3 both)
rMatrixXNew: X test set

X[I×J], Y[I×1] => T[I×A]

UScoresPLS

UScoresPLS (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY] [, rMatrixXNew] [, rMatrixYNew])

rMatrixX: X data (calibration set)
rMatrixY: Y data (calibration set)
nPCs: number of PCs (A)
nCentWeightX: centering and/or scaling of X (1 centering, 2 scaling, 3 both)
nCentWeightY: centering and/or scaling of Y (1 centering, 2 scaling, 3 both)
rMatrixXNew: X test set
rMatrixYNew: Y test set

X[I×J], Y[I×1] => U[I×A]

WLoadingsPLS

WLoadingsPLS (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY])

rMatrixX: X data (calibration set)
rMatrixY: Y data (calibration set)
nPCs: number of PCs (A)
nCentWeightX: centering and/or scaling of X (1 centering, 2 scaling, 3 both)
nCentWeightY: centering and/or scaling of Y (1 centering, 2 scaling, 3 both)

X[I×J], Y[I×1] => W[J×A]

LoadingsPLS

LoadingsPLS (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY])

rMatrixX: X data (calibration set)
rMatrixY: Y data (calibration set)
nPCs: number of PCs (A)
nCentWeightX: centering and/or scaling of X (1 centering, 2 scaling, 3 both)
nCentWeightY: centering and/or scaling of Y (1 centering, 2 scaling, 3 both)

X[I×J], Y[I×1] => P[J×A]

QLoadingsPLS

QLoadingsPLS (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY])

rMatrixX: X data (calibration set)
rMatrixY: Y data (calibration set)
nPCs: number of PCs (A)
nCentWeightX: centering and/or scaling of X (1 centering, 2 scaling, 3 both)
nCentWeightY: centering and/or scaling of Y (1 centering, 2 scaling, 3 both)

X[I×J], Y[I×1] => Q[1×A]

ScoresPLS2

ScoresPLS2 (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY] [, rMatrixXNew])

rMatrixX: X data (calibration set)
rMatrixY: Y data (calibration set)
nPCs: number of PCs (A)
nCentWeightX: centering and/or scaling of X (1 centering, 2 scaling, 3 both)
nCentWeightY: centering and/or scaling of Y (1 centering, 2 scaling, 3 both)
rMatrixXNew: X test set

X[I×J], Y[I×K] => T[I×A]

UScoresPLS2

UScoresPLS2 (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY] [, rMatrixXNew] [, rMatrixYNew])

rMatrixX: X data (calibration set)
rMatrixY: Y data (calibration set)
nPCs: number of PCs (A)
nCentWeightX: centering and/or scaling of X (1 centering, 2 scaling, 3 both)
nCentWeightY: centering and/or scaling of Y (1 centering, 2 scaling, 3 both)
rMatrixXNew: X test set
rMatrixYNew: Y test set

X[I×J], Y[I×K] => U[I×A]

LoadingsPLS2, WLoadingsPLS2, QLoadingsPLS2

LoadingsPLS2 (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY])

rMatrixX: X data (calibration set)
rMatrixY: Y data (calibration set)
nPCs: number of PCs (A)
nCentWeightX: centering and/or scaling of X (1 centering, 2 scaling, 3 both)
nCentWeightY: centering and/or scaling of Y (1 centering, 2 scaling, 3 both)

X[I×J], Y[I×K] => P[J×A] or W[J×A] or Q[K×A]

Seventh Winter Symposium on Chemometrics

near Tula city, February 2010

100 km