
Optimization of Chromatogram Alignment Using A Class Separability Criterion

Gopal Yalla

Department of Mathematics and Computer Science

Department of Chemistry

College of the Holy Cross

April 28, 2015

Gopal Yalla (Holy Cross) Analysis of Biofuels April 28, 2015 1 / 38


Outline

1 Introduction to Chromatography

2 Theory and Techniques

3 Experimental Data

4 Data Preprocessing

5 Results

6 Extended Results

7 Acknowledgements


Gas Chromatography

The gas chromatograph (GC) is the main instrument used for separating the components of a mixture.

Two phases: mobile phase and stationary phase


Mass Spectrometry

The mass spectrometer (MS) identifies the amount and type of chemicals present in a sample.

Components are ionized and separated according to mass.

The mass spectrum is a definite pattern of the number of ions present at each mass level.


Chromatograms

GC + MS produces chromatograms.

x-axis displays retention time in the GC column

y-axis displays molecular abundance in the sample


Chromatographic Data Analysis

Peak Area Extraction

→ Judgement of the number and type of chemical components must be made by the user.

→ Straightforward, but time consuming.

→ Sacrifices interesting trends.

→ Difficult with complex data...


Peak Area Extraction (Con’t)


Alignment Issue

When dealing with multiple samples, fluctuations in peak height and peak location occur.

Without peak location alignment, trends determined by chemometric methods will be skewed or meaningless.


Alignment Techniques

Correlation Optimized Warping (COW): Given two parameters, segment size (m) and max warp (t), a chromatogram P is aligned to a target chromatogram T.

Dynamic Programming: Solves combinatorial optimization problems.

COW uses two matrices, F and U, of size (S + 1) × (L + 1).


COW Algorithm

Correlation Optimized Warping (COW): Given two parameters, segment size (m) and max warp (t), a chromatogram P is aligned to a target chromatogram T.

The choice of target chromatogram is based on the similarity index,

\[ SI_j = \prod_{n=1}^{N} \left| r(\mathbf{x}_j, \mathbf{x}_n) \right| , \]

where $r(\cdot, \cdot)$ represents Pearson's correlation coefficient.
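The similarity index is straightforward to compute with NumPy. The sketch below assumes the chromatograms are stored as the rows of an array `X`, and it assumes the target is the chromatogram with the largest index value, which the slide does not state explicitly; the function names are hypothetical.

```python
import numpy as np

def similarity_index(X):
    """SI_j = product over n of |r(x_j, x_n)| for each row x_j of X."""
    R = np.corrcoef(X)                  # N x N Pearson correlations between rows
    return np.prod(np.abs(R), axis=1)

def choose_target(X):
    """Index of the row with the largest similarity index (assumed selection rule)."""
    return int(np.argmax(similarity_index(X)))
```

The self-correlation term r(x_j, x_j) = 1 is included in the product and does not change the ranking.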

What is the optimal choice of COW parameters?


Nomenclature and Terminology

$a$ = scalars

$\mathbf{a}$ = column vectors

$\mathbf{A}$ = data matrices

Row index $n$ corresponds to sample chromatogram

Column index $m$ corresponds to retention time

$M$ total retention times

$N$ total chromatograms

$N_k$ total chromatograms in the $k$th class

$K$ total classes

$\mathbf{x}^{(Q)}_{kn}$ is the $n$th chromatogram in the $k$th class processed with correction method $Q$.


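Alignment Metrics: Warping Effect

Warping Effect = Simplicity + Peak Factor

Simplicity ([0, 1]): How close the data is to a rank 1 matrix

\[ \text{simplicity} = \sum_{r=1}^{R} \left( \mathrm{SVD}\left( \mathbf{X} \Big/ \sqrt{\sum_{k=1}^{K} \sum_{n=1}^{N_k} \sum_{m=1}^{M} x_{knm}^{2}} \right) \right)^{4} \]

Peak Factor ([0, 1]): How much the shape and peak area of chromatograms have been changed by warping

\[ \text{peak factor} = \frac{1}{N} \sum_{k=1}^{K} \sum_{n=1}^{N_k} \left( 1 - \min(c_{kn}, 1)^{2} \right) \]

where $c_{kn} = \left| \dfrac{\lVert \mathbf{x}^{(COW)}_{kn} \rVert - \lVert \mathbf{x}_{kn} \rVert}{\lVert \mathbf{x}_{kn} \rVert} \right|$ represents a relative error between the aligned and unaligned chromatograms.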
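Both terms of the warping effect reduce to a few NumPy calls. The sketch below assumes the unaligned and aligned chromatograms are stored row-wise in arrays `X_raw` and `X_cow` of the same shape, and it reads the SVD term as the singular values of the data matrix scaled to unit Frobenius norm; the function names are hypothetical.

```python
import numpy as np

def simplicity(X):
    """Sum of fourth powers of the singular values of X scaled to unit Frobenius norm."""
    X = np.asarray(X, dtype=float)
    s = np.linalg.svd(X / np.linalg.norm(X), compute_uv=False)
    return float(np.sum(s ** 4))

def peak_factor(X_raw, X_cow):
    """Mean of 1 - min(c, 1)^2, with c the relative change in each row norm."""
    raw = np.linalg.norm(np.asarray(X_raw, dtype=float), axis=1)
    cow = np.linalg.norm(np.asarray(X_cow, dtype=float), axis=1)
    c = np.abs((cow - raw) / raw)
    return float(np.mean(1.0 - np.minimum(c, 1.0) ** 2))

def warping_effect(X_raw, X_cow):
    """Simplicity of the aligned data plus the peak factor."""
    return simplicity(X_cow) + peak_factor(X_raw, X_cow)
```

A rank 1 matrix has a single nonzero singular value, so its simplicity is 1, and chromatograms whose norms are unchanged by warping give a peak factor of 1.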


Alignment Metric: Hotelling Trace Criterion

HTC incorporates both within-class and between-class variation in the data set.

↑ HTC

↓ HTC


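Hotelling Trace Criterion

• Define the sample mean vector and sample covariance matrix for the $k$th class as:

\[ \bar{\mathbf{x}}_k = \frac{1}{N_k} \sum_{n=1}^{N_k} \mathbf{x}_{kn} , \qquad \mathbf{S}_k = \frac{1}{N_k - 1} \sum_{n=1}^{N_k} (\mathbf{x}_{kn} - \bar{\mathbf{x}}_k)(\mathbf{x}_{kn} - \bar{\mathbf{x}}_k)^{t} . \]

• Let $P_k = N_k / N$ be the probability of occurrence of class $k$. The grand mean vector is given by:

\[ \bar{\bar{\mathbf{x}}} = \sum_{k=1}^{K} P_k \bar{\mathbf{x}}_k . \]

• The within-class scatter matrix and between-class scatter matrix are defined as:

\[ \mathbf{S}_{wc} = \sum_{k=1}^{K} P_k \mathbf{S}_k , \qquad \mathbf{S}_{bc} = \sum_{k=1}^{K} P_k (\bar{\mathbf{x}}_k - \bar{\bar{\mathbf{x}}})(\bar{\mathbf{x}}_k - \bar{\bar{\mathbf{x}}})^{t} . \]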
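The scatter matrices above translate directly into NumPy, and the criterion J = tr(S_wc^{-1} S_bc) follows from them. This is a minimal sketch, assuming the samples are the rows of `X` with one class label per row and that S_wc is invertible (which in practice requires fewer features than samples); the function name is hypothetical.

```python
import numpy as np

def hotelling_trace(X, labels):
    """J = tr(S_wc^{-1} S_bc) for samples in the rows of X with class labels."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    N = X.shape[0]
    priors, means, covs = [], [], []
    for c in np.unique(labels):
        Xc = X[labels == c]
        priors.append(len(Xc) / N)                           # P_k = N_k / N
        means.append(Xc.mean(axis=0))                        # class mean vector
        covs.append(np.atleast_2d(np.cov(Xc, rowvar=False)))  # S_k with 1/(N_k - 1)
    grand = sum(P * m for P, m in zip(priors, means))        # grand mean vector
    S_wc = sum(P * S for P, S in zip(priors, covs))
    S_bc = sum(P * np.outer(m - grand, m - grand) for P, m in zip(priors, means))
    # Solve S_wc Z = S_bc instead of forming the inverse explicitly.
    return float(np.trace(np.linalg.solve(S_wc, S_bc)))
```

Using `np.linalg.solve` rather than an explicit inverse is a numerical convenience; the value of J is the same.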


Hotelling Trace Criterion (Con’t)

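The HTC is defined as:

\[ J = \mathrm{tr}\left( \mathbf{S}_{wc}^{-1} \mathbf{S}_{bc} \right) \]

When $K = 2$, HTC reduces to the Mahalanobis distance

\[ J = (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)^{t} \mathbf{S}^{-1} (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2) \]

When $K = 2$ and $M = 1$, HTC reduces to the square of a $t$-statistic, scaled by a factor that depends on the sample size $N$:

\[ J \propto t^{2} \]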
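As a worked check of the two-class case, the derivation below starts from the scatter-matrix definitions with weights P_k = N_k / N. The last step additionally assumes equal class sizes N_1 = N_2 = N/2 and the usual pooled-variance two-sample t-statistic; the constants P_1 P_2 and 1/N follow from those assumptions and may differ from the normalization used on the slide.

```latex
\begin{align*}
\mathbf{d} &= \bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2, \qquad
\bar{\bar{\mathbf{x}}} = P_1 \bar{\mathbf{x}}_1 + P_2 \bar{\mathbf{x}}_2, \qquad P_1 + P_2 = 1 \\
\bar{\mathbf{x}}_1 - \bar{\bar{\mathbf{x}}} &= P_2 \mathbf{d}, \qquad
\bar{\mathbf{x}}_2 - \bar{\bar{\mathbf{x}}} = -P_1 \mathbf{d} \\
\mathbf{S}_{bc} &= P_1 P_2^{2}\, \mathbf{d}\mathbf{d}^{t} + P_2 P_1^{2}\, \mathbf{d}\mathbf{d}^{t}
  = P_1 P_2\, \mathbf{d}\mathbf{d}^{t} \\
J &= \mathrm{tr}\left( \mathbf{S}_{wc}^{-1} \mathbf{S}_{bc} \right)
  = P_1 P_2\, \mathbf{d}^{t} \mathbf{S}_{wc}^{-1} \mathbf{d} \\[1ex]
\text{For } M = 1,\ N_1 = N_2 = N/2: \quad
S_{wc} &= \tfrac{1}{2}\left( s_1^{2} + s_2^{2} \right) = s_p^{2}, \qquad
t^{2} = \frac{d^{2}}{s_p^{2}\left( \frac{1}{N_1} + \frac{1}{N_2} \right)} = \frac{N d^{2}}{4 s_p^{2}} \\
J &= \frac{d^{2}}{4 s_p^{2}} = \frac{t^{2}}{N}
\end{align*}
```

Under these assumptions J is the Mahalanobis distance between the class means scaled by P_1 P_2, and in one dimension it is the squared t-statistic divided by the total sample size.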


Experimental Data

5 Classes of Biodiesel:

Soy (6 different samples)

Canola (3 different samples)

Tallow (3 different samples)

Waste Grease (2 different samples)

Hybrid (1 sample)

Each sample was tested in 3 different runs

45 Total Chromatograms

Chemical Structure:

FAMEs (fatty acid methyl esters)

Variable length of carbon chain and number of double bonds.

Reaction Process:

Sample Chromatogram:


Data Preprocessing: Timeline

1 Baseline Correction

2 COW Alignment

3 Normalization & Mean Centering

4 Principal Component Transformation

5 Computed Metrics


Baseline Problem

Need to correct for the non-linear increase in baseline caused by:

Gradual increase in oven temperature

Column bleeding


Baseline Correction

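Use asymmetric least squares smoothing to determine the baseline vector $\mathbf{b}'$ that minimizes

\[ f(\mathbf{b}') = \lVert \mathbf{w}^{t} (\mathbf{b}' - \mathbf{x}_{kn}) \rVert^{2} + \lambda \lVert \mathbf{D} \mathbf{b}' \rVert^{2} \]

$\mathbf{w}$ is a vector of weights

$\lambda$ is a relaxation parameter

$\mathbf{D}$ is a second difference matrix

$\lVert \cdot \rVert$ is the Euclidean norm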
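For fixed weights the minimization has a closed-form solution. The sketch below assumes the first term is read as the weighted sum of squares, sum over i of w_i (b'_i - x_i)^2, which coincides with the squared-weight form when the weights are 0 or 1; setting the gradient to zero then gives the linear system (W + lambda D^t D) b' = W x with W = diag(w). The function name is hypothetical and the dense matrices are only practical for short signals.

```python
import numpy as np

def als_baseline(x, w, lam):
    """Solve (W + lam * D^t D) b = W x for the smooth baseline b."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    M = len(x)
    D = np.diff(np.eye(M), n=2, axis=0)   # (M - 2) x M second-difference matrix
    A = np.diag(w) + lam * D.T @ D
    return np.linalg.solve(A, w * x)
```

Points with weight 0 (peaks) are ignored by the fit, so the baseline is interpolated smoothly underneath them; a larger lam gives a stiffer baseline.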


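Baseline Correction: Finding Peaks

• Let

\[ \mathbf{x}_{kn} = \mathbf{s} + \mathbf{b} + \epsilon \]

where $\mathbf{s}$ is the true peak height, $\mathbf{b}$ is the true smooth baseline, and $\epsilon$ is normal random error with small deviation $\sigma_{\epsilon}$.

• Let $m_i$ be the median of the points in $\mathbf{x}_{kn}$ over an appropriate window centered at time index $i$, and let $\mathbf{m}$ be the vector of these medians.

→ $\mathbf{m} \approx \mathbf{b}$

→ $\mathbf{x}_{kn} \approx \mathbf{b} + \epsilon$

→ $\sigma_{\epsilon} \approx 1.4826 \times \mathrm{median}(|\mathbf{x}_{kn} - \mathbf{m}|)$.

\[ w_i = \begin{cases} 0 & \text{if } |x_{kni}| > m_i \pm 2\sigma_{\epsilon} \\ 1 & \text{if } |x_{kni}| \le m_i \pm 2\sigma_{\epsilon} \end{cases} . \]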
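The weight rule can be sketched with a running median and the scaled median absolute deviation. The code below is an illustration under assumptions: the condition is interpreted as flagging points whose distance from the local median exceeds 2 sigma, the window length of 21 points is arbitrary, and the function name is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def peak_weights(x, window=21):
    """Return (w, m, sigma): 0/1 weights, running median, robust noise estimate."""
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centered on each point")
    x = np.asarray(x, dtype=float)
    half = window // 2
    padded = np.pad(x, half, mode="edge")           # repeat edge values at the ends
    m = np.median(sliding_window_view(padded, window), axis=1)
    sigma = 1.4826 * np.median(np.abs(x - m))       # MAD scaled to a normal std. dev.
    w = (np.abs(x - m) <= 2.0 * sigma).astype(float)
    return w, m, sigma
```

Points far from the local median get weight 0 and are treated as peaks; the remaining points drive the baseline fit.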


Baseline Correction: Results

Using $\mathbf{b}'$ to estimate $\mathbf{b}$ gives

\[ \mathbf{x}^{(BC)}_{kn} = \mathbf{x}_{kn} - \mathbf{b}' \approx \mathbf{s} + \epsilon \]


Normalization and Mean Centering

• Each chromatogram $\mathbf{x}^{(BC,COW)}_{kn}$ should be normalized to account for variations in injection volume.

\[ \mathbf{x}^{(BC,COW,NORM)}_{kn} = \frac{\bar{A}}{A_{kn}} \cdot \mathbf{x}^{(BC,COW)}_{kn} \]

where $A_{kn}$ represents the total area of each chromatogram, and $\bar{A}$ is the average total area of all chromatograms.

• Each chromatogram should be mean centered to the origin.

\[ \mathbf{x}^{(BC,COW,NORM,MC)}_{kn} = \mathbf{x}^{(BC,COW,NORM)}_{kn} - \bar{\mathbf{x}}^{(BC,COW,NORM)} \]

where $\bar{\mathbf{x}}^{(BC,COW,NORM)}$ is the sample mean chromatogram.
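Both steps are one-liners on a data matrix. The sketch below assumes the aligned chromatograms are the rows of `X`, that the total area of a chromatogram is approximated by the sum of its intensities, and that every area is positive; the function name is hypothetical.

```python
import numpy as np

def normalize_and_center(X):
    """Scale each row to the average total area, then subtract the mean chromatogram."""
    X = np.asarray(X, dtype=float)
    areas = X.sum(axis=1)                          # A_kn, area taken as summed intensity
    X_norm = X * (areas.mean() / areas)[:, None]   # multiply row kn by A-bar / A_kn
    X_mc = X_norm - X_norm.mean(axis=0)            # remove the sample mean chromatogram
    return X_norm, X_mc
```

After normalization every chromatogram has the same total area, and after mean centering every retention-time column has zero mean.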


Principal Component Analysis

HTC was evaluated on the principal component transformed data.

Let $\mathbf{S}$ represent the sample covariance matrix of the entire set of preprocessed data, with eigenvalue decomposition:

\[ \mathbf{S} = \mathbf{U} \Lambda \mathbf{U}^{t} \]

Then $\mathbf{y}_{kn}$, the vector of PCs, is computed via the transformation

\[ \mathbf{y}_{kn} = \mathbf{U}^{t} \mathbf{x}^{(BC,COW,NORM,MC)}_{kn} \]

Eigenvalues correspond to how much variation is explained in each PC.
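The transformation can be sketched with a symmetric eigendecomposition. The code assumes the mean-centered chromatograms are the rows of `X_mc` and sorts the components by decreasing eigenvalue so that the first columns of the output are the leading PCs; the function name is hypothetical. For chromatograms with thousands of retention times, forming the full M x M covariance matrix is costly, and an SVD of the data matrix is a common alternative.

```python
import numpy as np

def pc_transform(X_mc):
    """Return (Y, eigvals, U) with S = U diag(eigvals) U^t and rows of Y equal to U^t x."""
    X_mc = np.asarray(X_mc, dtype=float)
    S = np.cov(X_mc, rowvar=False)          # sample covariance over all chromatograms
    eigvals, U = np.linalg.eigh(S)          # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # reorder so the largest comes first
    eigvals, U = eigvals[order], U[:, order]
    Y = X_mc @ U                            # row n of Y is (U^t x_n)^t
    return Y, eigvals, U
```

The fraction of variation explained by PC r is `eigvals[r] / eigvals.sum()`.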


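HTC Evaluated on PCs

• Let $\mathbf{z}_{kn} = (y_{kn1}, y_{kn2}, \cdots, y_{knL})^{t}$ denote the $L \times 1$ vector corresponding to the first $L$ PCs of $\mathbf{y}_{kn}$. The sample mean vector and sample covariance matrix for the $k$th class are given respectively by

\[ \bar{\mathbf{z}}_k = \frac{1}{N_k} \sum_{n=1}^{N_k} \mathbf{z}_{kn} , \qquad \mathbf{S}_k = \frac{1}{N_k - 1} \sum_{n=1}^{N_k} (\mathbf{z}_{kn} - \bar{\mathbf{z}}_k)(\mathbf{z}_{kn} - \bar{\mathbf{z}}_k)^{t} . \]

• The grand mean vector is given by

\[ \bar{\bar{\mathbf{z}}} = \sum_{k=1}^{K} P_k \bar{\mathbf{z}}_k . \]

• The within-class scatter matrix and between-class scatter matrix are defined as:

\[ \mathbf{S}_{wc} = \sum_{k=1}^{K} P_k \mathbf{S}_k , \qquad \mathbf{S}_{bc} = \sum_{k=1}^{K} P_k (\bar{\mathbf{z}}_k - \bar{\bar{\mathbf{z}}})(\bar{\mathbf{z}}_k - \bar{\bar{\mathbf{z}}})^{t} . \]

• HTC is given by

\[ J = \mathrm{tr}\left( \mathbf{S}_{wc}^{-1} \mathbf{S}_{bc} \right) \]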


Computed Metrics

Density Plots for Warp Effect & HTC:


Results: Principal Component Score Plots

Score plots shown: PC1 vs. PC2, PC1 vs. PC3, PC2 vs. PC3

Optimal COW parameters (segment length, max warp) for each metric:

Metric            (Segment Length, Max Warp)
Max Warp Effect   (26,15)
Max HTC (1 PC)    (64,3)
Max HTC (2 PC)    (55,8)
Max HTC (3 PC)    (70,6)

Classes plotted: soy, canola, tallow, waste grease, hybrid.

Summary of Results

Based on our data, HTC leads to better alignment than warping effect

→ Greater Euclidean distance between class means. Ratios for Segment Length/Max Warp (55,8) to (26,15):

Class          Soy    Canola  Tallow  Waste Grease
Soy            0      -       -       -
Canola         1.18   0       -       -
Tallow         1.13   1.09    0       -
Waste Grease   1.22   1.16    1.12    0

→ Smaller within-class variation. Ratios for Segment Length/Max Warp (55,8) to (26,15):

Class          1st Major Axis  2nd Major Axis
Soy            0.94            0.92
Canola         1.06            0.80
Tallow         0.86            1.30
Waste Grease   0.68            0.68

Clear parametric distinction.


Project Milestone!

1 Published Work in Journal of Chemometrics

Soares Edward J., Yalla Gopal R., O’Connor John B., Walsh Kevin A., and Hupp Amber M. (2015), Hotelling trace criterion as a figure of merit for the optimization of chromatogram alignment, J. Chemometrics, 29, pages 200-212.


More Complex Data: Biodiesel-Diesel Blends

210 chromatograms with three different attributes

→ Feedstock: Pure Diesel, Soy, Canola, IRE Tallow, Texas Tallow, Waste Grease

→ Diesel Type: Flynn, Hess, Shell, Sunoco

→ Blend Ratio: B2, B5, B10, B20


Diesel Results

Before Alignment:


After Alignment and Optimization:


Classification

B10 Biodiesel Samples

Shell Sunoco

Texas Tallow 12 5 (¶) 5 (*)

IRE Tallow 12 5 (¶) 5 (*)


Broader Impact

1 Determine chemical components that contribute the most to the energy content of fuel

→ Create synthetic biomaterial with energy content?

2 Forensic / Environmental Concerns

→ Determine origins and consequences of an oil spill


Future Work

1 Algorithmic Development

→ COW has very long computation time.

→ No parametric pattern

2 Larger Sample Size for HTC Results


Acknowledgements

Thank you for listening!

Professor Amber Hupp

Professor Kevin Walsh

Colette Houssan

Mike Comiskey

Department of Mathematics & Computer Science

Department of Chemistry


Acknowledgements

Journal of Chemometrics

University Syringe Program Grant from Hamilton Company (AMH).

Robert L. Ardizzone Fund for Junior Faculty Excellence (AMH).

College of the Holy Cross.

National Institute of Standards and Technology (NIST, Gaithersburg, MD)

Western Dubuque Biodiesel

ADM Company

Keystone Biofuels

TMT Biofuels

Texas Green Manufacturing

Iowa Renewable Energy


Thank you Professor Soares!

Couldn’t have done it without you Sauce!
