Work in Progress Presented by Ian Wood to CASCI, 4/24/13 Advisor: Luis Rocha

A T-Cell Cross-Regulation Binary Classifier


Overview

• Motivation and Previous Work
• Implementation
• Preliminary Results
• Future Directions

Previous Work: Carneiro et al.

Image From: J. Carneiro, et al., “When three is not a crowd: a Crossregulation model of the dynamics and repertoire selection of regulatory CD4+ T cells,” Immunological Reviews, vol. 216, pp. 48–68, 2007.




$$\frac{dE}{dt} = p_E E_A - d_E E$$

$$\frac{dR}{dt} = p_R R_A - d_R R$$
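The two rate equations can be explored numerically. The following is a minimal forward-Euler sketch, not the model as implemented by Carneiro et al. It assumes the activated populations E_A and R_A are supplied as constants, whereas in the cross-regulation model they depend on conjugate formation with APCs. All parameter values are hypothetical and chosen only for illustration.

```python
def euler_step(E, R, E_A, R_A, p_E, p_R, d_E, d_R, dt):
    """Advance dE/dt = p_E*E_A - d_E*E and dR/dt = p_R*R_A - d_R*R by one step."""
    dE = p_E * E_A - d_E * E
    dR = p_R * R_A - d_R * R
    return E + dt * dE, R + dt * dR


def integrate(E0, R0, E_A, R_A, p_E, p_R, d_E, d_R, dt=0.01, steps=5000):
    """Repeatedly apply euler_step, holding E_A and R_A fixed (an assumption)."""
    E, R = E0, R0
    for _ in range(steps):
        E, R = euler_step(E, R, E_A, R_A, p_E, p_R, d_E, d_R, dt)
    return E, R


# Hypothetical values, for illustration only.
E_final, R_final = integrate(E0=0.2, R0=1.0, E_A=0.5, R_A=0.25,
                             p_E=2.0, p_R=1.0, d_E=1.0, d_R=0.5)
```

With constant E_A and R_A, each population relaxes toward p·A/d, so the run above settles near E = 1.0 and R = 0.5.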


Previous Work: Alaa Abi-Haidar

Image From: A. Abi-Haidar and L. M. Rocha, “Collective Classification of Textual Documents by Guided Self-Organization in T-Cell Cross-Regulation Dynamics,” Evolutionary Intelligence, in press, 2011.



• Red: Balanced training with cell death

• Green: Positive-only training with cell death

• Blue: Balanced training with cell death

• Yellow: Positive-only training with cell death



• Red: Training and testing documents ordered by time
• Green: Reinforced bias
• Blue: Documents out of order


My Implementation

Benefits/Limitations

Benefits
• C/C++ executes fast
• Individual objects can be tracked
• Interface structure allows implementations of parts to be swapped
• Can take advantage of hardware

Limitations
• Development time is slow
• System is not easily ported

Implementation Misc.

Preprocessing
• Porter stemming
• Removed stop words: http://www.ranks.nl/resources/stopwords.html

Classification
• Normalization
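A rough sketch of the preprocessing stage, under stated assumptions: the stop-word set here is a tiny illustrative subset rather than the ranks.nl list, and the `stem` argument is a hypothetical hook where a Porter stemmer would be plugged in. The default is the identity function, which is a placeholder and not Porter stemming.

```python
import re

# Tiny illustrative subset; the slides point to the ranks.nl list instead.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "was", "were", "with"}


def preprocess(text, stem=lambda w: w, stop_words=STOP_WORDS):
    """Lowercase, keep alphabetic tokens, drop stop words, then stem each token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in stop_words]


features = preprocess("The proteins were identified in the library")
# features == ["proteins", "identified", "library"]
```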

Aside: GPU processing

• Good for performing the same process across many elements in parallel
• Could allow for efficient binding of T-cells without exact matching

Images From: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html

Binding Step in CUDA

Performance Comparison

Specs: NVIDIA GeForce GTX 560 Ti running on Windows 7
• 384 CUDA cores
• 1 GB dedicated memory, 4 GB available

Average binding time per document:
• Inefficient sequential version: 128.5766 seconds
• CUDA parallel version: 1.650175 seconds
• Mapped sequential version: 0.061443669 seconds

Speedup1 = sequential / parallel = 77.92
Speedup2 = sequential / mapped sequential = 2093
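The slides do not show the data structures behind the sequential versions. One plausible reading, offered here only as an assumption, is that the inefficient version scans every T-cell for each document feature while the mapped version looks T-cells up by the feature they recognize. The sketch below contrasts the two for exact matching; the `specificity` and `id` fields are hypothetical names.

```python
def bind_scan(features, tcells):
    """Naive binding: for each feature, scan the whole T-cell list for exact matches."""
    bound = []
    for f in features:
        for cell in tcells:
            if cell["specificity"] == f:
                bound.append((f, cell["id"]))
    return bound


def build_index(tcells):
    """Group T-cell ids by the feature they recognize."""
    index = {}
    for cell in tcells:
        index.setdefault(cell["specificity"], []).append(cell["id"])
    return index


def bind_mapped(features, index):
    """Mapped binding: one dictionary lookup per feature."""
    return [(f, cid) for f in features for cid in index.get(f, [])]


tcells = [
    {"id": 0, "specificity": "protein"},
    {"id": 1, "specificity": "pcr"},
    {"id": 2, "specificity": "protein"},
]
features = ["protein", "hybrid", "pcr"]
scan_result = bind_scan(features, tcells)
mapped_result = bind_mapped(features, build_index(tcells))
```

Both functions return the same pairs. The scan costs time proportional to features × T-cells, while the lookup costs roughly one hash probe per feature, which is the kind of gap that could explain a large sequential speedup.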

Performance Comparison cont.

[Figure: Binding Time, Sequential vs. Parallel. x-axis: Number of TCells (0 to 700,000); y-axis: Time (seconds), 0 to 300.]

Performance Comparison cont.

[Figure: Binding Time, Mapped Sequential vs. Parallel. x-axis: Number of TCells (0 to 700,000); y-axis: Time (seconds), 0 to 3.5.]

Preliminary Results

Most of my work thus far is replication of Alaa Abi-Haidar’s work


Fold Generation

Mixed vs. Separated
• Mixed: Training and testing sets are randomly interleaved. The first 10% (12) of the documents are labeled.
• Separated: Testing set is presented after training.

Ordered vs. Shuffled
• Ordered: Documents are ordered by month of publication (random within month).
• Shuffled: Documents are randomly shuffled without regard to date of publication.
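The four fold types (mixed or separated, ordered or shuffled) can be sketched as a single stream builder. This is one interpretation, not the implementation used for the results: it assumes 60 training and 60 testing documents (so that 10% is 12), hypothetical `id`, `month`, and `labeled` fields, and that the labeled head of a mixed stream is drawn from the training set.

```python
import random


def order_docs(docs, ordered, rng):
    """Return docs in presentation order.

    Shuffle first; if `ordered`, a stable sort by month then keeps the
    random order within each month.
    """
    docs = list(docs)
    rng.shuffle(docs)
    if ordered:
        docs.sort(key=lambda d: d["month"])
    return docs


def make_stream(train, test, mixed, ordered, rng, labeled_fraction=0.10):
    """Build the document stream for one fold."""
    if not mixed:
        # Separated: the whole training set, then the whole testing set.
        return order_docs(train, ordered, rng) + order_docs(test, ordered, rng)
    # Mixed: start with a labeled head (assumed to come from the training
    # set), then interleave the remaining training and testing documents.
    n_head = round(labeled_fraction * (len(train) + len(test)))
    train_ordered = order_docs(train, ordered, rng)
    head, rest = train_ordered[:n_head], train_ordered[n_head:]
    return head + order_docs(rest + test, ordered, rng)


# Assumed sizes: 60 labeled training and 60 unlabeled testing documents.
train = [{"id": i, "month": i % 6, "labeled": True} for i in range(60)]
test = [{"id": 60 + i, "month": i % 6, "labeled": False} for i in range(60)]
mixed_stream = make_stream(train, test, mixed=True, ordered=False,
                           rng=random.Random(0))
separated_stream = make_stream(train, test, mixed=False, ordered=True,
                               rng=random.Random(1))
```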

Performance Measures

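$$\mathrm{acc} = \frac{TP + TN}{TP + FP + TN + FN}$$

$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

$$F_1 = \frac{2 \cdot TP}{2\,TP + FP + FN}$$

$$\mathrm{Prec} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$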

Equations Source: Luis Rocha’s IARPA presentation
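As a worked check of the five measures, here is a small sketch that computes them from confusion counts. The counts used (tpos 28, tneg 20, fpos 10, fneg 2) come from one of the configurations reported in these slides. Returning 0.0 when a denominator is zero is an assumed convention, not something the slides specify.

```python
import math


def _ratio(num, den):
    """Return num/den, or 0.0 when the denominator is zero (assumed convention)."""
    return num / den if den else 0.0


def metrics(tp, tn, fp, fn):
    """Accuracy, MCC, F1, precision, and recall from confusion counts."""
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "mcc": _ratio(tp * tn - fp * fn, mcc_den),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
        "precision": _ratio(tp, tp + fp),
        "recall": _ratio(tp, tp + fn),
    }


m = metrics(tp=28, tn=20, fp=10, fn=2)
# Rounded to two places: precision 0.74, accuracy 0.80, recall 0.93,
# mcc 0.62, f1 0.82
```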

Top Configurations Found So Far

nslot  eself  rself  enself  rnself  eunlab  runlab  edrate  rdrate  cond  condi  precision  accuracy  recall  mcc    f1    tpos  tneg  fpos  fneg
12     3      9      8       2       6       3       1       1       2     1      0.74       0.8       0.93    0.62   0.82  28    20    10    2
13     3      9      8       2       5       3       1       1       2     1      0.95       0.78      0.6     0.61   0.73  18    29    1     12
12     3      9      8       2       5       3       2       2       2     1      0.84       0.78      0.7     0.57   0.76  21    26    4     9
20     8      12     12      8       8       8       25      25      2     2      1          0.57      0.13    0.27   0.24  4     30    0     26
20     12     24     12      10      12      10      2       2       1     2      0.58       0.63      1       0.39   0.73  30    8     22    0
20     8      12     12      8       8       8       25      25      5     2      0.58       0.63      1       0.39   0.73  30    8     22    0

Top Words

Top Effector Words  Top Regulator Words  Top Weighted Effector Words  Top Weighted Regulator Words
high                sepharose            high                         sepharose
protein             affiniti             indicateng                   affiniti
conditions          hybrid               low                          hybrid
low                 pcr                  conditions                   pcr
indicateng          change               various                      change
various             immunoprecipitated   protein                      immunoprecipitated
shown               identifi             mani                         identifi
title               pull                 shown                        pull
results             librari              title                        interaction
also                fragment             results                      interacts
keyword             interacts            however                      librari
onli                interaction          onli                         fragment
fig                 ha                   also                         cloned
figure              ecl                  keyword                      identificateon
cell                identificateon       figure                       ha
however             suppresses           cell                         transcriptional
keywords            deletion             fig                          deletion
introduction        lysate               keywords                     beads
bodi                demonstrated         introduction                 lysate
abstract            neither              see                          member

Trends in Conditions

Trends in Configurations

Best Distinguishing Features over Time

nslot  eself  rself  enself  rnself  eunself  runself  edrate  rdrate  cond  condi  precision  accuracy  recall  mcc    f1    tpos  tneg  fpos  fneg
12     3      9      8       2       6        3        1       1       2     1      0.74       0.8       0.93    0.62   0.82  28    20    10    2

Most Deviating Features over Time

nslot  eself  rself  enself  rnself  eunself  runself  edrate  rdrate  cond  condi  precision  accuracy  recall  mcc    f1    tpos  tneg  fpos  fneg
12     3      9      8       2       6        3        1       1       2     1      0.74       0.8       0.93    0.62   0.82  28    20    10    2

Most Deviating Features over Time cont.

nslot  eself  rself  enself  rnself  eunself  runself  edrate  rdrate  cond  condi  precision  accuracy  recall  mcc    f1    tpos  tneg  fpos  fneg
14     3      8      3       7       3        7        1       2       5     1      0          0.38      0       -0.36  -1    0     23    7     30
20     4      6      6       4       4        4        1       1       6     1      0          0.5       0       0      0     0     30    0     30

Distribution of TCells

nslot  eself  rself  enself  rnself  eunself  runself  edrate  rdrate  cond  condi  precision  accuracy  recall  mcc    f1    tpos  tneg  fpos  fneg
12     3      9      8       2       6        3        1       1       2     1      0.74       0.8       0.93    0.62   0.82  28    20    10    2

Distribution of Tcells cont.

nslot  eself  rself  enself  rnself  eunself  runself  edrate  rdrate  cond  condi  precision  accuracy  recall  mcc    f1    tpos  tneg  fpos  fneg
14     3      8      3       7       3        7        1       2       5     1      0          0.38      0       -0.36  -1    0     23    7     30
20     4      6      6       4       4        4        1       1       6     1      0          0.5       0       0      0     0     30    0     30

Distribution of Tcells cont.

nslot  eself  rself  enself  rnself  eunself  runself  edrate  rdrate  cond  condi  precision  accuracy  recall  mcc    f1    tpos  tneg  fpos  fneg
10     6      12     6       5       6        5        1       1       1     2      0.67       0.7       0.8     0.41   0.73  24    18    12    6
10     6      12     6       5       6        5        1       1       2     1      0.5        0.5       1       0      0.67  30    0     30    0

Future Directions

Sooner
• Exhaustive parameter search
• Effect of multiple iterations on distributions
• Characterizing sensitivity (analytically/artificial data)
• Other datasets and comparisons to other classifiers
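An exhaustive parameter search could be organized as a plain grid over the configuration parameters. The sketch below is hypothetical: the parameter names follow the configuration columns reported in these slides, but the value ranges are invented, and `toy_score` is a placeholder standing in for a full train/test run that would return a measure such as MCC.

```python
import itertools


def grid(param_ranges):
    """Yield every combination of parameter values as a dict."""
    names = list(param_ranges)
    for values in itertools.product(*(param_ranges[n] for n in names)):
        yield dict(zip(names, values))


def exhaustive_search(param_ranges, evaluate):
    """Return (best_score, best_config) over the full grid."""
    best = None
    for config in grid(param_ranges):
        score = evaluate(config)
        if best is None or score > best[0]:
            best = (score, config)
    return best


# Hypothetical ranges, for illustration only.
ranges = {"nslot": [10, 12, 14], "eself": [3, 6], "rself": [8, 9, 12]}


def toy_score(cfg):
    """Placeholder objective; a real run would train and test the classifier."""
    return -abs(cfg["nslot"] - 12) - abs(cfg["eself"] - 3) - abs(cfg["rself"] - 9)


best_score, best_config = exhaustive_search(ranges, toy_score)
```

The grid grows multiplicatively with each parameter (3 × 2 × 3 = 18 configurations here), so the full eleven-parameter space would need either coarse ranges or a parallel evaluation scheme.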

Future Directions cont.

Later
• Proximity on APC based on proximity in text
• Bi-gram features
• Other binding functions
  ▪ Substring
  ▪ Sequence comparison binding
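Two of the ideas above are easy to prototype. The sketch below shows one possible form of bi-gram features and of substring binding; both are assumptions about how these future directions might look, and the function names and the `min_len` threshold are hypothetical.

```python
def bigrams(tokens):
    """Adjacent token pairs, joined with an underscore."""
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


def binds_substring(receptor, feature, min_len=3):
    """True if the receptor string occurs inside the feature string.

    min_len guards against trivially short receptors matching everything.
    """
    return len(receptor) >= min_len and receptor in feature


pairs = bigrams(["pull", "down", "assay"])
# pairs == ["pull_down", "down_assay"]
```

Under substring binding, a T-cell specific to "interact" would bind both "interacts" and "interaction", two stems that currently appear as separate features in the top-word lists.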

References

J. Carneiro, et al., “When three is not a crowd: a Crossregulation model of the dynamics and repertoire selection of regulatory CD4+ T cells,” Immunological Reviews, vol. 216, pp. 48–68, 2007.

A. Abi-Haidar and L. M. Rocha, “Collective Classification of Textual Documents by Guided Self-Organization in T-Cell Cross-Regulation Dynamics,” Evolutionary Intelligence, in press, 2011.