A T-Cell Cross-Regulation Binary Classifier
Work in Progress
Presented by Ian Wood to CASCI, 4/24/13
Advisor: Luis Rocha
Overview
• Motivation and Previous Work
• Implementation
• Preliminary Results
• Future Directions
Previous Work: Carneiro et al.
Image From: J. Carneiro, et al., “When three is not a crowd: a Crossregulation model of the dynamics and repertoire selection of regulatory CD4+ T cells,” Immunological Reviews, vol. 216, pp. 48–68, 2007.
\[ \frac{dE}{dt} = p_E E_A - d_E E \]
\[ \frac{dR}{dt} = p_R R_A - d_R R \]
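As a minimal numerical sketch of the two rate equations for E and R, the C++ below takes one forward-Euler step. The struct and function names, the step size, and the choice to pass the activated populations E_A and R_A in as given values are assumptions for illustration; the slides do not say how E_A and R_A are computed.

```cpp
#include <cassert>

// Population sizes of effector (E) and regulatory (R) T cells.
struct Populations {
    double E;
    double R;
};

// Hypothetical parameter bundle: proliferation rates (pE, pR) and death rates (dE, dR).
struct Rates {
    double pE;
    double dE;
    double pR;
    double dR;
};

// One forward-Euler step of
//   dE/dt = pE * EA - dE * E
//   dR/dt = pR * RA - dR * R
// EA and RA (activated effector/regulatory cells) are supplied by the caller.
Populations eulerStep(Populations s, Rates k, double EA, double RA, double dt) {
    assert(dt > 0.0);
    Populations next;
    next.E = s.E + dt * (k.pE * EA - k.dE * s.E);
    next.R = s.R + dt * (k.pR * RA - k.dR * s.R);
    return next;
}
```

Calling `eulerStep` repeatedly with a small `dt` traces the population sizes over time; a smaller step trades speed for accuracy.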
Previous Work: Alaa Abi-Haidar
Image From: A. Abi-Haidar and L. M. Rocha, “Collective Classification of Textual Documents by Guided Self-Organization in T-Cell Cross-Regulation Dynamics,” Evolutionary Intelligence, in press, 2011.
• Red: Balanced training with cell death
• Green: Positive-only training with cell death
• Blue: Balanced training with cell death
• Yellow: Positive-only training with cell death
• Red: Training and testing documents ordered by time
• Green: Reinforced bias
• Blue: Documents out of order
My Implementation
Benefits/Limitations
Benefits
• C/C++ executes fast
• Individual objects can be tracked
• Interface structure allows implementations of parts to be swapped
• Can take advantage of hardware
Limitations
• Development time is slow
• System is not easily ported
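The "interface structure" benefit can be illustrated with a small sketch, assuming an abstract base class for one swappable part. All names here (`BindingRule`, `ExactMatchBinding`, `PrefixBinding`, `Binder`) are hypothetical and are not taken from the actual implementation; the prefix rule in particular is only there to show a second interchangeable part.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface: decides whether a T cell binds a presented feature.
class BindingRule {
public:
    virtual ~BindingRule() = default;
    virtual bool binds(const std::string& tcellFeature,
                       const std::string& presentedFeature) const = 0;
};

// One possible implementation: bind only on exact string equality.
class ExactMatchBinding : public BindingRule {
public:
    bool binds(const std::string& a, const std::string& b) const override {
        return a == b;
    }
};

// Another possible implementation: bind when the T cell feature is a prefix.
class PrefixBinding : public BindingRule {
public:
    bool binds(const std::string& a, const std::string& b) const override {
        return !a.empty() && b.compare(0, a.size(), a) == 0;
    }
};

// Client code depends only on the interface, so rules can be swapped freely.
class Binder {
public:
    explicit Binder(std::unique_ptr<BindingRule> rule) : rule_(std::move(rule)) {
        assert(rule_ != nullptr);
    }

    // Count how many T cells bind the presented feature under the current rule.
    std::size_t countBindings(const std::vector<std::string>& tcells,
                              const std::string& presented) const {
        std::size_t n = 0;
        for (const auto& t : tcells) {
            if (rule_->binds(t, presented)) ++n;
        }
        return n;
    }

private:
    std::unique_ptr<BindingRule> rule_;
};
```

Swapping the rule passed to `Binder` changes the binding behavior without touching the code that counts bindings.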
Implementation Misc.
Preprocessing
• Porter stemming
• Removed stop words: http://www.ranks.nl/resources/stopwords.html
Classification normalization
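A minimal sketch of the stop-word removal step, assuming whitespace tokenization, lowercasing, and a stop list kept as a sorted vector. The function names and the tiny stop list in the usage note are illustrative assumptions; the real list is the one at the URL above, and Porter stemming would be applied to each kept token afterwards.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Lowercase a token and drop every non-alphabetic character.
std::string normalizeToken(const std::string& token) {
    std::string out;
    for (unsigned char c : token) {
        if (std::isalpha(c)) out.push_back(static_cast<char>(std::tolower(c)));
    }
    return out;
}

// Split text on whitespace, normalize each token, and drop stop words.
// The stop list must be sorted because it is searched with binary search.
std::vector<std::string> removeStopWords(const std::string& text,
                                         const std::vector<std::string>& sortedStopWords) {
    assert(std::is_sorted(sortedStopWords.begin(), sortedStopWords.end()));
    std::vector<std::string> kept;
    std::istringstream in(text);
    std::string token;
    while (in >> token) {
        const std::string word = normalizeToken(token);
        const bool isStop = std::binary_search(sortedStopWords.begin(),
                                               sortedStopWords.end(), word);
        if (!word.empty() && !isStop) kept.push_back(word);
    }
    return kept;
}
```

For example, with the stop list `{"and", "of", "the"}`, the text "The binding of T-cells and APCs." keeps the tokens `binding`, `tcells`, and `apcs`.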
Aside: GPU processing
• Good for performing the same process across many elements in parallel
• Could allow for efficient binding of T cells without exact matching
Images From: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
Binding Step in CUDA
Performance Comparison
Specs:
• NVIDIA GeForce GTX 560 Ti running on Windows 7
• 384 CUDA cores
• 1 GB dedicated memory, 4 GB available
Average binding time per document:
• Inefficient sequential version: 128.5766 seconds
• CUDA parallel version: 1.650175 seconds
• Mapped sequential version: 0.061443669 seconds
Speedup1 = sequential/parallel = 77.92
Speedup2 = sequential/mapped sequential = 2093
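The slides do not show what distinguishes the "inefficient" and "mapped" sequential versions. One plausible reading, sketched below purely as an assumption, is that the inefficient version scans every T cell for every presented feature, while the mapped version looks each feature up in a hash map built once. All function names are hypothetical; `speedup` simply reproduces the ratio used for Speedup1 and Speedup2.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Naive scan: compare every presented feature against every T cell.
// Cost grows with (number of features) x (number of T cells).
std::size_t bindNaive(const std::vector<std::string>& tcells,
                      const std::vector<std::string>& presented) {
    std::size_t bound = 0;
    for (const auto& feature : presented) {
        for (const auto& cell : tcells) {
            if (cell == feature) ++bound;
        }
    }
    return bound;
}

// Build an index from feature to the number of T cells specific to it.
std::unordered_map<std::string, std::size_t>
buildIndex(const std::vector<std::string>& tcells) {
    std::unordered_map<std::string, std::size_t> index;
    for (const auto& cell : tcells) ++index[cell];
    return index;
}

// Mapped lookup: one hash lookup per presented feature.
std::size_t bindMapped(const std::unordered_map<std::string, std::size_t>& index,
                       const std::vector<std::string>& presented) {
    std::size_t bound = 0;
    for (const auto& feature : presented) {
        const auto it = index.find(feature);
        if (it != index.end()) bound += it->second;
    }
    return bound;
}

// Speedup as used on the slide: baseline time divided by improved time.
double speedup(double baselineSeconds, double improvedSeconds) {
    assert(improvedSeconds > 0.0);
    return baselineSeconds / improvedSeconds;
}
```

Both binding functions return the same count for the same inputs; only the amount of work differs. With the reported timings, `speedup(128.5766, 1.650175)` and `speedup(128.5766, 0.061443669)` give the two ratios quoted on the slide.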
Performance Comparison Cont.
[Figure: Binding Time. Series: Sequential, Parallel. X-axis: Number of TCells (0 to 700,000). Y-axis: Time (seconds), 0 to 300.]
Performance Comparison cont.
[Figure: Binding Time. Series: Mapped Sequential, Parallel. X-axis: Number of TCells (0 to 700,000). Y-axis: Time (seconds), 0 to 3.5.]
Preliminary Results
Most of my work thus far is replication of Al’s work
Image From: A. Abi-Haidar and L. M. Rocha, “Collective Classification of Textual Documents by Guided Self-Organization in T-Cell Cross-Regulation Dynamics,” Evolutionary Intelligence, in press, 2011.
Fold Generation
Mixed vs Separated
• Mixed – Training and testing sets are randomly interleaved. The first 10% (12) documents are labeled
• Separated – Testing set is presented after training
Ordered vs Shuffled
• Ordered – Documents are ordered by month of publication (random within month)
• Shuffled – Documents are randomly shuffled without regard to date of publication
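The fold options above can be sketched as follows. This is an illustrative reading only: the `Doc` record, the function names, and the way the two choices compose are assumptions, as is treating "the first 10% are labeled" as an integer percentage of the stream length.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical document record: an id, a publication month index,
// and whether the document belongs to the training set.
struct Doc {
    int id;
    int month;
    bool training;
};

// Ordered: sorted by month of publication, random order within a month.
// Shuffled: random order without regard to date.
std::vector<Doc> arrangeByTime(std::vector<Doc> docs, bool ordered, std::mt19937& rng) {
    std::shuffle(docs.begin(), docs.end(), rng);  // randomize first
    if (ordered) {
        // A stable sort keeps the shuffled (random) order inside each month.
        std::stable_sort(docs.begin(), docs.end(),
                         [](const Doc& a, const Doc& b) { return a.month < b.month; });
    }
    return docs;
}

// Separated: present all training documents before any testing document,
// preserving the relative order inside each group.
std::vector<Doc> presentTrainingFirst(std::vector<Doc> docs) {
    std::stable_partition(docs.begin(), docs.end(),
                          [](const Doc& d) { return d.training; });
    return docs;
}

// Mixed: the stream stays interleaved and only its first `percent` percent
// of documents are labeled.
std::size_t labeledCount(std::size_t total, std::size_t percent) {
    assert(percent <= 100);
    return total * percent / 100;
}
```

With 120 documents, `labeledCount(120, 10)` gives the 12 labeled documents mentioned for the mixed setting.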
Performance Measures
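\[ \mathrm{acc} = \frac{TP + TN}{TP + FP + TN + FN} \]
\[ \mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}} \]
\[ F_1 = \frac{2 \cdot TP}{2\,TP + FP + FN} \]
\[ \mathrm{Prec} = \frac{TP}{TP + FP} \]
\[ \mathrm{Recall} = \frac{TP}{TP + FN} \]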
Equations Source: Luis Rocha’s IARPA presentation
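The five measures can be computed directly from the confusion-matrix counts. The sketch below is one way to do it; the struct names and the convention of returning 0 when a denominator is zero are assumptions, not something the slides specify.

```cpp
#include <cassert>
#include <cmath>

// Confusion-matrix counts for a binary classifier.
struct Confusion {
    double tp;
    double tn;
    double fp;
    double fn;
};

struct Metrics {
    double accuracy;
    double precision;
    double recall;
    double f1;
    double mcc;
};

// Assumed convention: a ratio with a zero denominator is reported as 0.
static double safeDiv(double num, double den) {
    return den == 0.0 ? 0.0 : num / den;
}

Metrics evaluate(const Confusion& c) {
    assert(c.tp >= 0.0 && c.tn >= 0.0 && c.fp >= 0.0 && c.fn >= 0.0);
    Metrics m;
    m.accuracy = safeDiv(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn);
    m.precision = safeDiv(c.tp, c.tp + c.fp);
    m.recall = safeDiv(c.tp, c.tp + c.fn);
    m.f1 = safeDiv(2.0 * c.tp, 2.0 * c.tp + c.fp + c.fn);
    m.mcc = safeDiv(c.tp * c.tn - c.fp * c.fn,
                    std::sqrt((c.tp + c.fp) * (c.tp + c.fn) *
                              (c.tn + c.fp) * (c.tn + c.fn)));
    return m;
}
```

For the counts tpos = 28, tneg = 20, fpos = 10, fneg = 2, this yields precision 0.74, accuracy 0.80, recall 0.93, MCC 0.62, and F1 0.82 after rounding to two decimals.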
Top Configurations Found So Far
nslot  eself  rself  enself  rnself  eunlab  runlab  edrate  rdrate  cond  condi  precision  accuracy  recall  mcc    f1    tpos  tneg  fpos  fneg
12     3      9      8       2       6       3       1       1       2     1      0.74       0.8       0.93    0.62   0.82  28    20    10    2
13     3      9      8       2       5       3       1       1       2     1      0.95       0.78      0.6     0.61   0.73  18    29    1     12
12     3      9      8       2       5       3       2       2       2     1      0.84       0.78      0.7     0.57   0.76  21    26    4     9
20     8      12     12      8       8       8       25      25      2     2      1          0.57      0.13    0.27   0.24  4     30    0     26
20     12     24     12      10      12      10      2       2       1     2      0.58       0.63      1       0.39   0.73  30    8     22    0
20     8      12     12      8       8       8       25      25      5     2      0.58       0.63      1       0.39   0.73  30    8     22    0
Top Words
Top Effector Words  Top Regulator Words  Top Weighted Effector Words  Top Weighted Regulator Words
high                sepharose            high                         sepharose
protein             affiniti             indicateng                   affiniti
conditions          hybrid               low                          hybrid
low                 pcr                  conditions                   pcr
indicateng          change               various                      change
various             immunoprecipitated   protein                      immunoprecipitated
shown               identifi             mani                         identifi
title               pull                 shown                        pull
results             librari              title                        interaction
also                fragment             results                      interacts
keyword             interacts            however                      librari
onli                interaction          onli                         fragment
fig                 ha                   also                         cloned
figure              ecl                  keyword                      identificateon
cell                identificateon       figure                       ha
however             suppresses           cell                         transcriptional
keywords            deletion             fig                          deletion
introduction        lysate               keywords                     beads
bodi                demonstrated         introduction                 lysate
abstract            neither              see                          member
Trends in Conditions
Trends in Configurations
Best Distinguishing Features over Time
nslot  eself  rself  enself  rnself  eunself  runself  edrate  rdrate  cond  condi  precision  accuracy  recall  mcc    f1    tpos  tneg  fpos  fneg
12     3      9      8       2       6        3        1       1       2     1      0.74       0.8       0.93    0.62   0.82  28    20    10    2
Most Deviating Features over Time
nslot  eself  rself  enself  rnself  eunself  runself  edrate  rdrate  cond  condi  precision  accuracy  recall  mcc    f1    tpos  tneg  fpos  fneg
12     3      9      8       2       6        3        1       1       2     1      0.74       0.8       0.93    0.62   0.82  28    20    10    2
Most Deviating Features over Time cont.
nslot  eself  rself  enself  rnself  eunself  runself  edrate  rdrate  cond  condi  precision  accuracy  recall  mcc    f1    tpos  tneg  fpos  fneg
14     3      8      3       7       3        7        1       2       5     1      0          0.38      0       -0.36  -1    0     23    7     30
20     4      6      6       4       4        4        1       1       6     1      0          0.5       0       0      0     0     30    0     30
Distribution of TCells
nslot  eself  rself  enself  rnself  eunself  runself  edrate  rdrate  cond  condi  precision  accuracy  recall  mcc    f1    tpos  tneg  fpos  fneg
12     3      9      8       2       6        3        1       1       2     1      0.74       0.8       0.93    0.62   0.82  28    20    10    2
Distribution of Tcells cont.
nslot  eself  rself  enself  rnself  eunself  runself  edrate  rdrate  cond  condi  precision  accuracy  recall  mcc    f1    tpos  tneg  fpos  fneg
14     3      8      3       7       3        7        1       2       5     1      0          0.38      0       -0.36  -1    0     23    7     30
20     4      6      6       4       4        4        1       1       6     1      0          0.5       0       0      0     0     30    0     30
Distribution of Tcells cont.
nslot  eself  rself  enself  rnself  eunself  runself  edrate  rdrate  cond  condi  precision  accuracy  recall  mcc    f1    tpos  tneg  fpos  fneg
10     6      12     6       5       6        5        1       1       1     2      0.67       0.7       0.8     0.41   0.73  24    18    12    6
10     6      12     6       5       6        5        1       1       2     1      0.5        0.5       1       0      0.67  30    0     30    0
Future Directions
Sooner
• Exhaustive parameter search
• Effect of multiple iterations on distributions
• Characterizing sensitivity (analytically/artificial data)
• Other datasets and comparisons to other classifiers
Future Directions cont.
Later
• Proximity on APC based on proximity in text
• Bi-gram features
• Other binding function
  ▪ Substring
  ▪ Sequence comparison binding
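Two of the later directions are simple enough to sketch. The code below is a hypothetical illustration, not part of the current system: `bigrams` joins adjacent tokens with a space to form bi-gram features, and `substringBinds` treats a T cell as binding when its feature occurs anywhere inside the presented feature.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Build bi-gram features from a token sequence by joining adjacent tokens.
// A sequence with fewer than two tokens yields no bi-grams.
std::vector<std::string> bigrams(const std::vector<std::string>& tokens) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i + 1 < tokens.size(); ++i) {
        out.push_back(tokens[i] + " " + tokens[i + 1]);
    }
    return out;
}

// Substring binding: the T cell binds if its feature appears anywhere
// inside the presented feature (exact matching is the special case
// where both strings are equal).
bool substringBinds(const std::string& tcellFeature,
                    const std::string& presentedFeature) {
    assert(!tcellFeature.empty());
    return presentedFeature.find(tcellFeature) != std::string::npos;
}
```

Under this rule a T cell specific to `interact` would bind both `interacts` and `interaction`, which exact matching would treat as unrelated features.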
References
J. Carneiro, et al., “When three is not a crowd: a Crossregulation model of the dynamics and repertoire selection of regulatory CD4+ T cells,” Immunological Reviews, vol. 216, pp. 48–68, 2007.
A. Abi-Haidar and L. M. Rocha, “Collective Classification of Textual Documents by Guided Self-Organization in T-Cell Cross-Regulation Dynamics,” Evolutionary Intelligence, in press, 2011.