A NEW USE OF TARGET FACTOR ANALYSIS (TFA) John H. Kalivas, Kevin Higgins Department of Chemistry Idaho State University Pocatello, Idaho 83209 USA Erik Andries Department of Mathematics Central New Mexico Community College Albuquerque, New Mexico 87106 USA


A NEW USE OF TARGET FACTOR ANALYSIS (TFA)

John H. Kalivas, Kevin Higgins

Department of Chemistry

Idaho State University

Pocatello, Idaho 83209 USA

Erik Andries

Department of Mathematics

Central New Mexico Community College

Albuquerque, New Mexico 87106 USA


Classification Situation

• Numerous classification approaches
  – KNN, LDA, MD, ANN, SVM, …
• As the number of classes in a problem increases, classification can become more difficult
• Target factor analysis (TFA) and net analyte signal (NAS)
  – TFA and NAS both calculate analogous angles between a test sample vector and the respective spaces spanned by the library classes
  – Useful for binary or multiclass situations


Requirements

• Xi = m × n library information matrix for the ith class
  – m = number of samples
  – n = number of measurements
    • Wavelengths for spectra, other physical or chemical variables
  – Samples making up a library class must span the variances making up the class
    • Instrument profile, temperature effects, measurement process, others

• y = n × 1 test sample measurement vector


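Orthogonal Projection Spatial Angle (OPSA)

$$\sin\theta = \frac{\lVert (I - P)\,y \rVert_2}{\lVert y \rVert_2}, \quad \text{NAS selectivity}$$

$$\theta = \sin^{-1}\!\left( \frac{\lVert (I - P)\,y \rVert_2}{\lVert y \rVert_2} \right)$$

where

$$P = VV^{t}, \qquad X_i = USV^{t}$$

• Identical to TFA and NAS
  – Use same orthogonal projection

[Figure: geometric diagram with labels y, yLib, Xi, and θ]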
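The angle on this slide can be computed in a few lines. The following is a minimal sketch, not the authors' code: it assumes the library matrix has samples as rows, that the class space is spanned by the first d right singular vectors, and that y has one entry per measurement. The function name `opsa` and the toy arrays are hypothetical.

```python
import numpy as np

def opsa(X, y, d):
    """Return the angle (radians) between y and the span of the
    first d right singular vectors of the library matrix X."""
    # Rows of Vt are the right singular vectors (eigenvectors of X^t X).
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:d].T                       # n x d basis for the class space
    residual = y - V @ (V.T @ y)       # (I - VV^t) y without forming I
    sin_theta = np.linalg.norm(residual) / np.linalg.norm(y)
    # Clip guards against round-off pushing the ratio slightly above 1.
    return float(np.arcsin(np.clip(sin_theta, 0.0, 1.0)))

# Hypothetical toy class spanning the first two axes of a 3-D measurement space.
X_toy = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
theta_in = opsa(X_toy, np.array([2.0, 3.0, 0.0]), d=2)   # y inside the span
theta_out = opsa(X_toy, np.array([1.0, 0.0, 1.0]), d=2)  # y tilted out of the span
print(np.degrees(theta_in), np.degrees(theta_out))
```

In this toy case the first angle is essentially zero and the second is close to 45 degrees, since the second vector has equal components inside and outside the class space.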


Process

• No data preprocessing
• Perform SVD of each library class
• Retain d eigenvectors (class-wise), where 1 ≤ d ≤ k and k = rank(X) ≤ min(m, n)
• Compute OPSA, MD, and KNN for the test sample relative to each library class
  – Use leave-one-out cross-validation (LOOCV)
• The library class with the smallest angle or MD is the test sample classification
• KNN classification trends evaluated
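The OPSA part of this process might be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: each library class is a samples-by-measurements array, the held-out sample is removed from its own class before the SVD, and the number of retained eigenvectors per class is supplied by the caller. The helper names (`opsa`, `classify`, `loocv_accuracy`) and the toy data are hypothetical; MD and KNN are omitted.

```python
import numpy as np

def opsa(X, y, d):
    """Angle (radians) between y and the span of the first d right singular vectors of X."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:d].T
    ratio = np.linalg.norm(y - V @ (V.T @ y)) / np.linalg.norm(y)
    return float(np.arcsin(np.clip(ratio, 0.0, 1.0)))

def classify(libraries, dims, y):
    """Assign y to the library class giving the smallest angle."""
    angles = [opsa(X, y, d) for X, d in zip(libraries, dims)]
    return int(np.argmin(angles))

def loocv_accuracy(libraries, dims):
    """Leave each sample out of its own class in turn and classify it."""
    correct, total = 0, 0
    for c, X in enumerate(libraries):
        for i in range(X.shape[0]):
            reduced = np.delete(X, i, axis=0)   # class c without sample i
            trial = [reduced if j == c else L for j, L in enumerate(libraries)]
            correct += int(classify(trial, dims, X[i]) == c)
            total += 1
    return correct / total

# Hypothetical toy classes living in different 2-D subspaces of a 4-D measurement space.
class_a = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 0, 0], [2, 1, 0, 0]], dtype=float)
class_b = np.array([[0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 1, 1], [0, 0, 1, 2]], dtype=float)
accuracy = loocv_accuracy([class_a, class_b], dims=[2, 2])
print(accuracy)
```

No mean-centering or scaling is applied, in keeping with the "no data preprocessing" step; real spectra would of course not separate as cleanly as these toy classes.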


Assessment

• Accuracy = (TP + TN)/(TP + TN + FP + FN)
  – TP = true positives
  – TN = true negatives
  – FP = false positives
  – FN = false negatives
• Receiver operating characteristic (ROC)
  – True positive rate = sensitivity = TP/(TP + FN)
  – False positive rate = 1 − specificity = 1 − TN/(TN + FP)
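As a worked illustration of these definitions, the sketch below computes the measures for one class treated against all others. The one-versus-rest counting is an assumption about how per-class values would be tallied in a multiclass problem; the function names and the label lists are hypothetical and are not data from the study.

```python
def one_vs_rest_counts(true_labels, predicted_labels, target):
    """Count TP, TN, FP, FN for one class treated against all others."""
    tp = tn = fp = fn = 0
    for t, p in zip(true_labels, predicted_labels):
        if t == target and p == target:
            tp += 1
        elif t == target:
            fn += 1
        elif p == target:
            fp += 1
        else:
            tn += 1
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, and false positive rate."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,            # true positive rate
        "specificity": specificity,
        "false_positive_rate": 1.0 - specificity,
    }

# Hypothetical labels for six test samples drawn from three classes.
true_labels = [0, 0, 0, 1, 1, 2]
predicted = [0, 0, 1, 1, 1, 2]
result = metrics(*one_vs_rest_counts(true_labels, predicted, target=0))
print(result)
```

For class 0 in this toy example there are two true positives, one false negative, three true negatives, and no false positives.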


Determining Eigenvectors

• Numerous approaches exist to determine the minimum number of eigenvectors needed to span X
• Determination of rank by augmentation (DRAUG)
  – Malinowski ER. J. Chemom. 2011; 25: 323-328
• DRAUG distinguishes primary eigenvectors (chemical, instrumental, etc.) from secondary eigenvectors (experimental error), independent of the distribution of the experimental uncertainties


Plastic Data

• Six classes (six of seven commercial plastic types, 1-6)
  – Allen V, Kalivas JH, Rodriguez RG. Applied Spec. 1999; 53: 672-681
• Raman spectroscopy (850–1800 cm⁻¹, 1093 wavenumbers per spectrum)
  – Type 1 = polyethylene terephthalate (PET); 30 samples
  – Type 2 = high-density polyethylene (HDPE); 29 samples
  – Type 3 = polyvinyl chloride (PVC); 13 samples
  – Type 4 = low-density polyethylene (LDPE); 22 samples
  – Type 5 = polypropylene (PP); 23 samples
  – Type 6 = polystyrene (PS); 29 samples


Plastic Score and Scree Plots

[Figure: Score Plot, PC2 versus PC1, samples coded as Type 1 through Type 6]

[Figure: Scree Plot, fraction of variance versus number of eigenvectors]

• Unique clusters are not formed
• Most of the spectral variance is captured with the first eigenvector
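Quantities like those in the score and scree plots can be obtained from a single SVD. The sketch below is an assumption about how such plots are typically produced, not a description of the authors' procedure: it takes the fraction of variance to be cumulative, computed from squared singular values of the uncentered data, and takes the scores to be the leading left singular vectors. The function name and the demo matrix are hypothetical.

```python
import numpy as np

def scree_and_scores(X, n_scores=2):
    """Return the cumulative fraction of variance and the first n_scores score columns."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)   # fraction captured by the first 1, 2, ... eigenvectors
    return cumulative, U[:, :n_scores]            # columns correspond to PC1, PC2, ...

# Hypothetical 2 x 2 data matrix with singular values 4 and 3.
X_demo = np.array([[3.0, 0.0],
                   [0.0, 4.0]])
fraction, scores = scree_and_scores(X_demo)
print(fraction)
```

Here the first eigenvector accounts for 16/25 of the total sum of squares and both together account for all of it. Scores are sometimes scaled by the singular values instead; that choice changes the axis ranges but not the pattern of the points.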


Plastic Classification Results

[Figure: ROC Plot, true positive rate (sensitivity) versus false positive rate (1 − specificity), with an enlarged inset; numbers indicate number of eigenvectors]

[Figure: Total Accuracy Across All Classes, accuracy versus number of eigenvectors, for OPSA and MD]

[Figure: KNN, specificity, sensitivity, and accuracy versus number of nearest neighbors]

Library plastic^a   Accuracy (%)   Sensitivity (%)   Specificity (%)
                    OPSA   MD      OPSA   MD         OPSA   MD
Type 1 (9)          100    94      100    83         100    97
Type 2 (9)          100    97      100    93         100    99
Type 3 (4)          100    85      100    54         100    91
Type 4 (6)          100    86      100    59         100    92
Type 5 (9)          100    78      100    35         100    87
Type 6 (11)         100    98      100    93         100    99

^a Values in parentheses are the DRAUG eigenvector number rounded to the nearest whole number


Archeological Data

• Four classes (four archeological sources of obsidian)
  – Kowalski BR, Schatzki TF, Stross FH. Anal. Chem. 1972; 44: 2176-2180
• 10 trace metal concentrations from X-ray fluorescence spectroscopy (Fe, Ti, Ba, Ca, K, Mn, Rb, Sr, Y, and Zr)
  – Source 1 = 10 samples
  – Source 2 = 9 samples
  – Source 3 = 23 samples
  – Source 4 = 21 samples


Archeological Classification Results

[Figure: Score Plot, PC2 versus PC1, samples coded as Source 1 through Source 4]

[Figure: Scree Plot, fraction of variance versus number of eigenvectors]

[Figure: Total Accuracy Across All Classes, accuracy versus number of eigenvectors, for OPSA and MD]

[Figure: KNN, specificity, sensitivity, and accuracy versus number of nearest neighbors]

Library source^a   Accuracy (%)   Sensitivity (%)   Specificity (%)
                   OPSA   MD      OPSA   MD         OPSA   MD
Source 1 (2)       100    80      100    60         100    87
Source 2 (4)       100    100     100    100        100    100
Source 3 (4)       100    98      100    96         100    99
Source 4 (3)       100    100     100    100        100    100

^a Values in parentheses are the DRAUG eigenvector number rounded to the nearest whole number


Gasoil Data

• Three classes (three commercial sources of gasoil)
  – Wentzell P, Andrews D, Walsh J, Cooley J, Spencer P. Can. J. Chem. 1999; 77: 391-400
• Ultraviolet spectroscopy (200–400 nm, 572 wavelengths per spectrum)
  – Source 1 = 59 samples
  – Source 2 = 25 samples
  – Source 3 = 30 samples


Gasoil Classification Results

[Figure: Score Plot, PC2 versus PC1, samples coded as Source 1 through Source 3]

[Figure: Scree Plot, fraction of variance versus number of eigenvectors]

[Figure: Total Accuracy Across All Classes, accuracy versus number of eigenvectors, for OPSA and MD]

[Figure: KNN, specificity, sensitivity, and accuracy versus number of nearest neighbors]

Library source^a   Accuracy (%)   Sensitivity (%)   Specificity (%)
                   OPSA   MD      OPSA   MD         OPSA   MD
Source 1 (11)      100    100     100    100        100    100
Source 2 (8)       95     89      92     84         96     92
Source 3 (11)      98     82      97     73         98     87

^a Values in parentheses are the DRAUG eigenvector number rounded to the nearest whole number


Extra Virgin Olive Oil (EVOO) Data

• Six classes (six adulterant oils)
  – Poulli KI, Mousdis GA, Georgiou CA. Food Chem. 2007; 105: 369-375
• Synchronous fluorescence spectroscopy (250–400 nm at Δλ = 20 nm, 151 wavelengths per spectrum)
  – Adulterant 1 = corn
  – Adulterant 2 = olive-pomace
  – Adulterant 3 = soybean
  – Adulterant 4 = sunflower
  – Adulterant 5 = rapeseed
  – Adulterant 6 = walnut
• 31 samples each at 0.5 to 95 % adulterant


EVOO Classification Results

[Figure: Score Plot, PC2 versus PC1, samples coded as Corn, Olive-pomace, Rapeseed, Soybean, Sunflower, Walnut]

[Figure: Scree Plot, fraction of variance versus number of eigenvectors]

[Figure: Total Accuracy Across All Classes, accuracy versus number of eigenvectors, for OPSA and MD]

[Figure: KNN, specificity, sensitivity, and accuracy versus number of nearest neighbors]

Library adulterant^a   Accuracy (%)   Sensitivity (%)   Specificity (%)
                       OPSA   MD      OPSA   MD         OPSA   MD
Corn (8)               98     89      93     68         99     94
Olive-pomace (4)       100    92      100    77         100    95
Rapeseed (6)           93     88      81     65         96     93
Soybean (6)            100    93      100    80         100    96
Sunflower (6)          97     87      90     61         98     92
Walnut (4)             99     84      97     51         99     90


EVOO Concentrations

[Figure: Score Plot, PC2 versus PC1, samples coded as Corn, Olive-pomace, Rapeseed, Soybean, Sunflower, Walnut]

[Figure: Concentration Coded Score Plot, PC2 versus PC1, color scale in % Sunflower with tick values from 0.819672 to 92.4966]

Library adulterant^a   Minimum adulterant concentration (%)
                       OPSA     MD
Corn (8)               1.73     14.91
Olive-pomace (4)       0.85     14.53
Rapeseed (6)           15.50    20.64
Soybean (6)            1.05     17.06
Sunflower (6)          4.02     18.62
Walnut (4)             0.82     21.24

^a Values in parentheses are the DRAUG eigenvector number rounded to the nearest whole number


Summary

• The TFA or NAS angular measure OPSA outperforms MD and KNN over a variety of data sets
  – If y is normalized to unit length, the same results are obtained with the TFA residual, where ŷ is the projection of y onto the library class space:

$$\sin\theta = \frac{\lVert y - \hat{y} \rVert_2}{\lVert y \rVert_2}$$

• Class separation need not be obvious in the score plots
• Need to determine the number of eigenvectors (basis vectors) to characterize each library class
• Samples making up a library class need to span the variances making up that library class
  – Instrument profile
  – Temperature effects
  – Others
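The normalization remark in the summary can be checked with a short derivation. This is a sketch under two assumptions that the slides do not state explicitly: that the TFA residual is the 2-norm of the difference between y and its projection onto the library class space, and that the projection is the same orthogonal projection used for the OPSA, written here with the basis V of retained eigenvectors.

$$\hat{y} = VV^{t}y \quad\Longrightarrow\quad y - \hat{y} = (I - VV^{t})\,y$$

$$\lVert y - \hat{y} \rVert_2 = \lVert (I - VV^{t})\,y \rVert_2 = \lVert y \rVert_2 \,\sin\theta$$

$$\lVert y \rVert_2 = 1 \quad\Longrightarrow\quad \lVert y - \hat{y} \rVert_2 = \sin\theta$$

Because θ lies between 0 and π/2 and the sine is increasing on that interval, the library class with the smallest residual for a unit-length y is also the class with the smallest angle.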