A T-Cell Cross-Regulation Binary Classifier
Work in Progress. Presented by Ian Wood to CASCI, 4/24/13. Advisor: Luis Rocha

Overview

• Motivation and Previous Work
• Implementation
• Preliminary Results
• Future Directions

Previous Work: Carneiro et al.

Image From: J. Carneiro et al., “When three is not a crowd: a Crossregulation model of the dynamics and repertoire selection of regulatory CD4+ T cells,” Immunological Reviews, vol. 216, pp. 48–68, 2007.

Previous Work: Carneiro et al.

$$\frac{dE}{dt} = p_E E_A - d_E E$$

$$\frac{dR}{dt} = p_R R_A - d_R R$$

Image From: J. Carneiro et al., “When three is not a crowd: a Crossregulation model of the dynamics and repertoire selection of regulatory CD4+ T cells,” Immunological Reviews, vol. 216, pp. 48–68, 2007.

Previous Work: Alaa Abi-Haidar

Image From: A. Abi-Haidar and L. M. Rocha, “Collective Classification of Textual Documents by Guided Self-Organization in T-Cell Cross-Regulation Dynamics,” Evolutionary Intelligence, in press, 2011.

Previous Work: Alaa Abi-Haidar

Image From: A. Abi-Haidar and L. M. Rocha, “Collective Classification of Textual Documents by Guided Self-Organization in T-Cell Cross-Regulation Dynamics,” Evolutionary Intelligence, in press, 2011.

• Red: Balanced training with cell death

• Green: Positive-only training with cell death

• Blue: Balanced training with cell death

• Yellow: Positive-only training with cell death

Previous Work: Alaa Abi-Haidar

Image From: A. Abi-Haidar and L. M. Rocha, “Collective Classification of Textual Documents by Guided Self-Organization in T-Cell Cross-Regulation Dynamics,” Evolutionary Intelligence, in press, 2011.

• Red: Training and testing documents ordered by time
• Green: Reinforced bias
• Blue: Documents out of order

My Implementation

Benefits/Limitations

Benefits
• C/C++ executes fast
• Individual objects can be tracked
• Interface structure allows implementations of parts to be swapped
• Can take advantage of hardware

Limitations
• Development time is slow
• System is not easily ported
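To illustrate what swapping implementations behind an interface can look like in C++, here is a minimal sketch. It is not the presenter's code: the `Binder` interface, both implementations, and `total_matches` are hypothetical names, binding is reduced to exact string matching between a document word and a T cell's feature, and the hash-map variant is only one guess at what a "mapped" binding step might mean.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical interface: counts how many T cells carry exactly this feature.
class Binder {
public:
    virtual ~Binder() = default;
    virtual int count_matches(const std::string& feature) const = 0;
};

// Implementation 1: scan every T cell on every query.
class LinearBinder : public Binder {
public:
    explicit LinearBinder(std::vector<std::string> cells) : cells_(std::move(cells)) {}
    int count_matches(const std::string& feature) const override {
        int n = 0;
        for (const auto& c : cells_) {
            if (c == feature) ++n;
        }
        return n;
    }
private:
    std::vector<std::string> cells_;
};

// Implementation 2: group cells by feature once, then answer each query
// with a single hash lookup.
class MappedBinder : public Binder {
public:
    explicit MappedBinder(const std::vector<std::string>& cells) {
        for (const auto& c : cells) ++counts_[c];
    }
    int count_matches(const std::string& feature) const override {
        auto it = counts_.find(feature);
        return it == counts_.end() ? 0 : it->second;
    }
private:
    std::unordered_map<std::string, int> counts_;
};

// Code written against the interface works with either implementation.
int total_matches(const Binder& binder, const std::vector<std::string>& document) {
    int total = 0;
    for (const auto& word : document) total += binder.count_matches(word);
    return total;
}
```

Because `total_matches` depends only on `Binder`, a caller can replace `LinearBinder` with `MappedBinder` (or a GPU-backed variant) without touching the rest of the pipeline.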

Implementation Misc.

Preprocessing
• Porter stemming
• Removed stop words: http://www.ranks.nl/resources/stopwords.html

Classification normalization
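The stop-word removal step can be sketched as follows. This is an illustration only: the function name `tokenize_without_stop_words` is hypothetical, the stop list is passed in by the caller rather than loaded from the URL above, tokens are split on any non-letter character (an assumption about tokenization), and Porter stemming is left out.

```cpp
#include <cctype>
#include <string>
#include <unordered_set>
#include <vector>

// Lowercases the text, splits it on non-letter characters, and drops every
// token that appears in stop_words. Stemming is not performed here.
std::vector<std::string> tokenize_without_stop_words(
        const std::string& text,
        const std::unordered_set<std::string>& stop_words) {
    std::vector<std::string> tokens;
    std::string current;
    // Emit the token collected so far unless it is empty or a stop word.
    auto flush = [&]() {
        if (!current.empty() && stop_words.count(current) == 0) {
            tokens.push_back(current);
        }
        current.clear();
    };
    for (unsigned char ch : text) {
        if (std::isalpha(ch)) {
            current.push_back(static_cast<char>(std::tolower(ch)));
        } else {
            flush();
        }
    }
    flush();  // handle a token that runs to the end of the text
    return tokens;
}
```

With the stop list `{"the", "to"}`, the sentence "The protein binds to the receptor." yields the tokens `protein`, `binds`, `receptor`.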

Aside: GPU processing

• Good for performing the same process across many elements in parallel
• Could allow for efficient binding of T cells without exact matching

Images From: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html

Binding Step in CUDA

Performance Comparison

Specs: NVIDIA GeForce GTX 560 Ti running on Windows 7
• 384 CUDA cores
• 1 GB dedicated memory, 4 GB available

Average binding time per document
• Inefficient sequential version: 128.5766 seconds
• CUDA parallel version: 1.650175 seconds
• Mapped sequential version: 0.061443669 seconds

Speedup1 = sequential / parallel = 77.92
Speedup2 = sequential / mapped sequential = 2093

Performance Comparison Cont.

[Figure: Binding Time. Time (seconds), 0 to 300, versus Number of TCells, 0 to 700,000. Series: Sequential, Parallel.]

Performance Comparison cont.

[Figure: Binding Time. Time (seconds), 0 to 3.5, versus Number of TCells, 0 to 700,000. Series: Mapped Sequential, Parallel.]

Preliminary Results

Most of my work thus far replicates Alaa Abi-Haidar's work.

Image From: A. Abi-Haidar and L. M. Rocha, “Collective Classification of Textual Documents by Guided Self-Organization in T-Cell Cross-Regulation Dynamics,” Evolutionary Intelligence, in press, 2011.

Fold Generation

Mixed vs. Separated
• Mixed: The training and testing sets are randomly interleaved. The first 10% (12) of the documents are labeled.
• Separated: The testing set is presented after training.

Ordered vs. Shuffled
• Ordered: Documents are ordered by month of publication (random within month).
• Shuffled: Documents are randomly shuffled without regard to date of publication.
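The four combinations of these two choices can be produced by one ordering routine. The sketch below is an assumption about how such a routine might look, not the presenter's code: `Doc` and `make_stream` are hypothetical names, each document is assumed to carry a publication month and a training/testing flag, and the labeling of the first 10% of documents in the mixed case is not modeled.

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Hypothetical document record.
struct Doc {
    int id;
    int month;      // month of publication, as an increasing integer
    bool training;  // true if the document belongs to the training set
};

// Returns the order in which documents are presented to the classifier.
//   ordered = true  : sorted by month, random within a month
//   ordered = false : fully shuffled
//   mixed   = true  : training and testing documents stay interleaved
//   mixed   = false : all training documents come before all testing ones
std::vector<Doc> make_stream(std::vector<Doc> docs, bool mixed, bool ordered,
                             std::mt19937& rng) {
    // Start from a random permutation in every case.
    std::shuffle(docs.begin(), docs.end(), rng);
    if (ordered) {
        // A stable sort keeps the random order among documents of one month.
        std::stable_sort(docs.begin(), docs.end(),
                         [](const Doc& a, const Doc& b) { return a.month < b.month; });
    }
    if (!mixed) {
        // Move training documents to the front, preserving relative order
        // inside each group.
        std::stable_partition(docs.begin(), docs.end(),
                              [](const Doc& d) { return d.training; });
    }
    return docs;
}
```

Passing a seeded `std::mt19937` makes a fold reproducible, which helps when comparing configurations across runs.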

Performance Measures

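$$acc = \frac{TP + TN}{TP + FP + TN + FN}$$

$$MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

$$F1 = \frac{2 \cdot TP}{2TP + FP + FN}$$

$$Prec = \frac{TP}{TP + FP}$$

$$Recall = \frac{TP}{TP + FN}$$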

Equations Source: Luis Rocha’s IARPA presentation
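These five measures follow directly from the four confusion-matrix counts. A minimal sketch, with the hypothetical struct `Confusion` and free functions named after each measure; zero denominators are not guarded against, so a degenerate classifier (for example, one that never predicts positive) produces NaN or infinity here:

```cpp
#include <cmath>

// Counts of true/false positives and negatives, stored as doubles so the
// ratios below use floating-point division.
struct Confusion {
    double tp, tn, fp, fn;
};

double accuracy(const Confusion& c) {
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn);
}

double precision(const Confusion& c) { return c.tp / (c.tp + c.fp); }

double recall(const Confusion& c) { return c.tp / (c.tp + c.fn); }

double f1(const Confusion& c) {
    return 2.0 * c.tp / (2.0 * c.tp + c.fp + c.fn);
}

// Matthews correlation coefficient.
double mcc(const Confusion& c) {
    const double denom = std::sqrt((c.tp + c.fp) * (c.tp + c.fn) *
                                   (c.tn + c.fp) * (c.tn + c.fn));
    return (c.tp * c.tn - c.fp * c.fn) / denom;
}
```

As a check against the reported results, the counts TP = 28, TN = 20, FP = 10, FN = 2 give an accuracy of 0.8, with the other measures rounding to precision 0.74, recall 0.93, MCC 0.62, and F1 0.82.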

Top Configurations Found So Far

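| nslot | eself | rself | enself | rnself | eunlab | runlab | edrate | rdrate | cond | condi | precision | accuracy | recall | mcc | f1 | tpos | tneg | fpos | fneg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 3 | 9 | 8 | 2 | 6 | 3 | 1 | 1 | 2 | 1 | 0.74 | 0.8 | 0.93 | 0.62 | 0.82 | 28 | 20 | 10 | 2 |
| 13 | 3 | 9 | 8 | 2 | 5 | 3 | 1 | 1 | 2 | 1 | 0.95 | 0.78 | 0.6 | 0.61 | 0.73 | 18 | 29 | 1 | 12 |
| 12 | 3 | 9 | 8 | 2 | 5 | 3 | 2 | 2 | 2 | 1 | 0.84 | 0.78 | 0.7 | 0.57 | 0.76 | 21 | 26 | 4 | 9 |
| 20 | 8 | 12 | 12 | 8 | 8 | 8 | 25 | 25 | 2 | 2 | 1 | 0.57 | 0.13 | 0.27 | 0.24 | 4 | 30 | 0 | 26 |
| 20 | 12 | 24 | 12 | 10 | 12 | 10 | 2 | 2 | 1 | 2 | 0.58 | 0.63 | 1 | 0.39 | 0.73 | 30 | 8 | 22 | 0 |
| 20 | 8 | 12 | 12 | 8 | 8 | 8 | 25 | 25 | 5 | 2 | 0.58 | 0.63 | 1 | 0.39 | 0.73 | 30 | 8 | 22 | 0 |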

Top Words

| Top Effector Words | Top Regulator Words | Top Weighted Effector Words | Top Weighted Regulator Words |
|---|---|---|---|
| high | sepharose | high | sepharose |
| protein | affiniti | indicateng | affiniti |
| conditions | hybrid | low | hybrid |
| low | pcr | conditions | pcr |
| indicateng | change | various | change |
| various | immunoprecipitated | protein | immunoprecipitated |
| shown | identifi | mani | identifi |
| title | pull | shown | pull |
| results | librari | title | interaction |
| also | fragment | results | interacts |
| keyword | interacts | however | librari |
| onli | interaction | onli | fragment |
| fig | ha | also | cloned |
| figure | ecl | keyword | identificateon |
| cell | identificateon | figure | ha |
| however | suppresses | cell | transcriptional |
| keywords | deletion | fig | deletion |
| introduction | lysate | keywords | beads |
| bodi | demonstrated | introduction | lysate |
| abstract | neither | see | member |

Trends in Conditions

Trends in Configurations

Best Distinguishing Features over Time

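| nslot | eself | rself | enself | rnself | eunself | runself | edrate | rdrate | cond | condi | precision | accuracy | recall | mcc | f1 | tpos | tneg | fpos | fneg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 3 | 9 | 8 | 2 | 6 | 3 | 1 | 1 | 2 | 1 | 0.74 | 0.8 | 0.93 | 0.62 | 0.82 | 28 | 20 | 10 | 2 |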

Most Deviating Features over Time

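| nslot | eself | rself | enself | rnself | eunself | runself | edrate | rdrate | cond | condi | precision | accuracy | recall | mcc | f1 | tpos | tneg | fpos | fneg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 3 | 9 | 8 | 2 | 6 | 3 | 1 | 1 | 2 | 1 | 0.74 | 0.8 | 0.93 | 0.62 | 0.82 | 28 | 20 | 10 | 2 |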

Most Deviating Features over Time cont.

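| nslot | eself | rself | enself | rnself | eunself | runself | edrate | rdrate | cond | condi | precision | accuracy | recall | mcc | f1 | tpos | tneg | fpos | fneg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 14 | 3 | 8 | 3 | 7 | 3 | 7 | 1 | 2 | 5 | 1 | 0 | 0.38 | 0 | -0.36 | -1 | 0 | 23 | 7 | 30 |
| 20 | 4 | 6 | 6 | 4 | 4 | 4 | 1 | 1 | 6 | 1 | 0 | 0.5 | 0 | 0 | 0 | 0 | 30 | 0 | 30 |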

Distribution of TCells

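| nslot | eself | rself | enself | rnself | eunself | runself | edrate | rdrate | cond | condi | precision | accuracy | recall | mcc | f1 | tpos | tneg | fpos | fneg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 3 | 9 | 8 | 2 | 6 | 3 | 1 | 1 | 2 | 1 | 0.74 | 0.8 | 0.93 | 0.62 | 0.82 | 28 | 20 | 10 | 2 |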

Distribution of Tcells cont.

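| nslot | eself | rself | enself | rnself | eunself | runself | edrate | rdrate | cond | condi | precision | accuracy | recall | mcc | f1 | tpos | tneg | fpos | fneg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 14 | 3 | 8 | 3 | 7 | 3 | 7 | 1 | 2 | 5 | 1 | 0 | 0.38 | 0 | -0.36 | -1 | 0 | 23 | 7 | 30 |
| 20 | 4 | 6 | 6 | 4 | 4 | 4 | 1 | 1 | 6 | 1 | 0 | 0.5 | 0 | 0 | 0 | 0 | 30 | 0 | 30 |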

Distribution of Tcells cont.

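| nslot | eself | rself | enself | rnself | eunself | runself | edrate | rdrate | cond | condi | precision | accuracy | recall | mcc | f1 | tpos | tneg | fpos | fneg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 6 | 12 | 6 | 5 | 6 | 5 | 1 | 1 | 1 | 2 | 0.67 | 0.7 | 0.8 | 0.41 | 0.73 | 24 | 18 | 12 | 6 |
| 10 | 6 | 12 | 6 | 5 | 6 | 5 | 1 | 1 | 2 | 1 | 0.5 | 0.5 | 1 | 0 | 0.67 | 30 | 0 | 30 | 0 |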

Future Directions

Sooner
• Exhaustive parameter search
• Effect of multiple iterations on distributions
• Characterizing sensitivity (analytically/artificial data)
• Other datasets and comparisons to other classifiers

Future Directions cont.

Later
• Proximity on APC based on proximity in text
• Bi-gram features
• Other binding function
  ▪ Substring
  ▪ Sequence comparison binding

References

J. Carneiro et al., “When three is not a crowd: a Crossregulation model of the dynamics and repertoire selection of regulatory CD4+ T cells,” Immunological Reviews, vol. 216, pp. 48–68, 2007.

A. Abi-Haidar and L. M. Rocha, “Collective Classification of Textual Documents by Guided Self-Organization in T-Cell Cross-Regulation Dynamics,” Evolutionary Intelligence, in press, 2011.