Convolutional and LSTM Neural Networks
Vanessa Jurtz
January 12, 2016
Contents
- Neural networks and GPUs
- Lasagne
- Peptide binding to MHC class II molecules
- Convolutional Neural Networks (CNN)
- Recurrent and LSTM Neural Networks
- Hyper-parameter optimization
- Training CNN and LSTM ensembles
Neural networks and GPUs
- A CPU (central processing unit) has few cores with lots of cache memory
- A GPU (graphics processing unit) has hundreds of cores, but each with little memory
- GPUs are very good for lots of small operations that can be heavily parallelized, like matrix multiplication
- training a neural network requires lots of matrix multiplications! (see the sketch below)
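To make the last point concrete, here is a minimal NumPy sketch (all sizes are made up): the forward pass of one fully connected layer is a single matrix multiplication, exactly the kind of operation a GPU parallelizes well.

    import numpy as np

    # Made-up sizes: a batch of 128 examples with 20 input features,
    # feeding a hidden layer of 64 neurons.
    batch_size, n_in, n_hidden = 128, 20, 64

    X = np.random.randn(batch_size, n_in)  # input batch
    W = np.random.randn(n_in, n_hidden)    # weight matrix
    b = np.zeros(n_hidden)                 # bias vector

    # One dense-layer forward pass: a matrix multiplication plus a
    # nonlinearity. Training repeats this (and its gradients) many times.
    H = np.tanh(X.dot(W) + b)
    print(H.shape)  # (128, 64)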
Lasagne
Why use Lasagne?
- Python
- it is highly flexible, as you will see in the afternoon exercise
- code can be run on a CPU or GPU (see the sketch below)
- my collaborators are using it too
Lasagne Documentation
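As a taste of the API, here is a minimal sketch of defining and compiling a network in Lasagne (layer sizes are illustrative, not from these slides); the same code runs on CPU or GPU depending on Theano's configuration:

    import theano
    import theano.tensor as T
    import lasagne

    # Symbolic input: a batch of vectors with 20 features (illustrative)
    input_var = T.matrix('inputs')

    # Stack layers: input -> dense hidden layer -> sigmoid output
    l_in = lasagne.layers.InputLayer(shape=(None, 20), input_var=input_var)
    l_hid = lasagne.layers.DenseLayer(
        l_in, num_units=64, nonlinearity=lasagne.nonlinearities.rectify)
    l_out = lasagne.layers.DenseLayer(
        l_hid, num_units=1, nonlinearity=lasagne.nonlinearities.sigmoid)

    # Compile the forward pass into a callable function
    prediction = lasagne.layers.get_output(l_out)
    predict_fn = theano.function([input_var], prediction)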
Peptide binding to MHC class II molecules
Figure adapted from http://events.embo.org/12-antigen/
- MHC II is only expressed in professional antigen presenting cells (dendritic cells, macrophages, B cells etc.)
- only a small fraction of peptides bind MHC II and are potential T cell epitopes
- polygenic and highly polymorphic
Peptides binding to MHC II molecules
Nielsen et al. 2010 "MHC Class II epitope predictive algorithms" Immunology
Peptides binding HLA-DR3, adapted from Kenneth Murphy "Janeway's Immunobiology" 8th edition, fig. 4.21, Garland Science
Why is this a hard problem?
- binding peptides have different lengths
- the binding core can lie anywhere within a peptide
- there is a large number of different MHC class II molecules
CNN - Convolutional Neural Networks
- a convolutional filter has a specified input size
- it slides over the input sequence, feeding into a different hidden neuron in each position
- the weights of the filter are fixed (shared) as it slides over the input and are updated in backpropagation
- usually multiple filters of the same and/or different size are used
CNN - Convolutional Neural Networks
Why could this be useful for predicting peptide binding to MHC class II?
- use filters of size 9 AA to focus on the binding core
- integrate information about all 9-mers (see the sketch below)
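One way to express this in Lasagne (a sketch under assumed settings: peptides encoded with 20 channels per position, e.g. sparse or BLOSUM encoding, and padded to a common length; the filter count is made up):

    import theano.tensor as T
    import lasagne

    MAX_PEP_LEN = 25  # assumed maximum peptide length after padding
    N_AA = 20         # one input channel per amino acid type

    # Conv1DLayer expects input of shape (batch, channels, positions)
    l_in = lasagne.layers.InputLayer(shape=(None, N_AA, MAX_PEP_LEN))

    # Filters of size 9 AA scan the peptide: each filter position lines
    # up with one position of a putative 9-mer binding core.
    l_conv = lasagne.layers.Conv1DLayer(
        l_in, num_filters=50, filter_size=9,
        nonlinearity=lasagne.nonlinearities.rectify)

    # Max-pooling over all positions integrates information about all
    # 9-mers: only the strongest filter response in the peptide is kept.
    l_pool = lasagne.layers.GlobalPoolLayer(l_conv, pool_function=T.max)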
CNN - Convolutional Neural Networks
- CNNs are known to work well for image analysis
- CIFAR example
http://deeplearning.net/tutorial/lenet.html
CNN - Convolutional Neural Networks
A biological example: subcellular localization of proteins
Sønderby et al. 2015 "Convolutional LSTM Networks for Subcellular Localization of Proteins" http://arxiv.org/abs/1503.01919
Recurrent Neural Networks
- have connections between the neurons in the hidden layer (see the recurrence below)
- have a memory: "learn what to store and what to ignore" (Graves, 2015)
Graves 2015 "Supervised Sequence Labelling with Recurrent Neural Networks"
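In equations (a standard simple-RNN recurrence in generic notation, consistent with Graves 2015): the hidden state at step t depends on the current input and the previous hidden state, which is what gives the network its memory.

    h_t = \tanh\left( W_{xh}\, x_t + W_{hh}\, h_{t-1} + b_h \right)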
LSTM memory blocks
Instead of a hidden neuron we use an LSTM memory block:
Graves 2015 "Supervised Sequence Labelling with Recurrent Neural Networks"
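For reference, a common formulation of the LSTM update equations (stated here without peephole connections; \sigma is the logistic sigmoid, \odot elementwise multiplication):

    \begin{align*}
    i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i) & \text{(input gate)} \\
    f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f) & \text{(forget gate)} \\
    c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) & \text{(cell state)} \\
    o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o) & \text{(output gate)} \\
    h_t &= o_t \odot \tanh(c_t) & \text{(block output)}
    \end{align*}

The gates learn what to write to the cell (input gate), what to erase (forget gate) and what to expose (output gate): exactly the "learn what to store and what to ignore" behaviour quoted above.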
LSTM Networks
Why make it more complicated?
Graves 2015 "Supervised Sequence Labelling with Recurrent Neural Networks"
LSTM memory blocks allow the network to store and access information over longer periods of time
LSTM Networks
Input and output can be of variable length!
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Example: peptide MHC class II binding is many to one (see the sketch below)
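A sketch of this many-to-one setup in Lasagne (sizes and encoding are assumptions): only_return_final=True keeps only the hidden state after the last residue, so the whole peptide is summarized into a single prediction.

    import lasagne

    MAX_PEP_LEN = 25  # assumed maximum peptide length after padding
    N_AA = 20         # one input feature per amino acid type

    # LSTMLayer expects input of shape (batch, sequence length, features)
    l_in = lasagne.layers.InputLayer(shape=(None, MAX_PEP_LEN, N_AA))

    # Many to one: read the whole peptide, return only the final state
    l_lstm = lasagne.layers.LSTMLayer(l_in, num_units=50,
                                      only_return_final=True)

    # One output per peptide, e.g. a rescaled binding affinity
    l_out = lasagne.layers.DenseLayer(
        l_lstm, num_units=1, nonlinearity=lasagne.nonlinearities.sigmoid)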
LSTM Networks
- LSTM networks are known to work well for natural language processing
- check out Andrej Karpathy's blog
- to read up: Graves 2015
LSTM Networks
Another biological application: protein secondary structure prediction
http://chemistry.berea.edu/~biochemistry/Anthony/Aequorin/Aequorin3.php
Sønderby et al. 2015 "Convolutional LSTM Networks for Subcellular Localization of Proteins" http://arxiv.org/abs/1503.01919
Hyper-parameter optimization
Parameters:
- weights
Hyper-parameters:
- weight initialization
- weight update
- activation function
- data encoding
- dropout
- learning rate
- number of CNN filters / number of LSTM memory blocks
- hidden layer size
Bergstra et al. 2012 "Random Search for Hyper-Parameter Optimization" Journal of Machine Learning Research
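A minimal sketch of random search in the spirit of Bergstra et al. (the search space and the train_and_evaluate function are hypothetical placeholders, not the actual settings behind these slides):

    import random

    # Hypothetical search space; sample each hyper-parameter independently
    search_space = {
        'learning_rate': lambda: 10 ** random.uniform(-4, -1),
        'dropout':       lambda: random.uniform(0.0, 0.5),
        'num_filters':   lambda: random.choice([10, 20, 50, 100]),
        'hidden_size':   lambda: random.choice([20, 60, 100]),
    }

    def random_search(train_and_evaluate, n_trials=100):
        """Draw random hyper-parameter settings and keep the best trial.
        train_and_evaluate is user-supplied, returning e.g. validation AUC."""
        best_score, best_params = float('-inf'), None
        for _ in range(n_trials):
            params = {name: draw() for name, draw in search_space.items()}
            score = train_and_evaluate(**params)
            if score > best_score:
                best_score, best_params = score, params
        return best_params, best_score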
Hyper-parameter optimization - architectures
CNN: [architecture diagram: the peptide feeds a convolutional layer (filter size varied: 9 AA, 9+3 AA); together with the MHC II pseudo sequence this feeds a fully connected layer and the output]
LSTM: [architecture diagram] (a code sketch of the CNN architecture follows below)
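A Lasagne sketch of the CNN architecture described above (all sizes are assumptions; the slides only give the layer order and the varied filter sizes, and merging the pooled peptide features with the MHC II pseudo sequence by concatenation is an assumption as well):

    import theano.tensor as T
    import lasagne

    MAX_PEP_LEN, N_AA, PSEUDO_LEN = 25, 20, 34  # assumed sizes

    # Peptide branch: convolution (filter size 9 AA shown; the varied
    # setting also combines 9 AA and 3 AA filters), then max-pooling
    l_pep = lasagne.layers.InputLayer(shape=(None, N_AA, MAX_PEP_LEN))
    l_conv = lasagne.layers.Conv1DLayer(l_pep, num_filters=20, filter_size=9)
    l_pool = lasagne.layers.GlobalPoolLayer(l_conv, pool_function=T.max)

    # MHC II pseudo sequence branch: a flat encoded input
    l_mhc = lasagne.layers.InputLayer(shape=(None, PSEUDO_LEN * N_AA))

    # Merge both branches, then fully connected layer and output
    l_merge = lasagne.layers.ConcatLayer([l_pool, l_mhc], axis=1)
    l_hid = lasagne.layers.DenseLayer(l_merge, num_units=60)
    l_out = lasagne.layers.DenseLayer(
        l_hid, num_units=1, nonlinearity=lasagne.nonlinearities.sigmoid)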
Hyper-parameter optimization - results
CNN:
[Plot: best AUC per number of trials (1 to 100 trials; AUC 0.4 to 0.9)]
LSTM:
[Plot: best AUC per number of trials (1 to 200 trials; AUC 0.4 to 0.9)]
single network performance (10% best nets for CNN + LSTM, entire ensemble of NetMHCIIpan-3.0)
[Bar plot: PCC (0.40 to 0.75) for CNN, LSTM and NetMHCIIpan-3.0 on validation and evaluation data]
Training CNN and LSTM ensembles
CNN ensemble of the 10 best hyper-parameter settings + 5 seeds each
Training CNN and LSTM ensembles
LSTM ensemble of the 2 best hyper-parameter settings + 2 seeds each (see the averaging sketch below)
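Predictions of the ensemble members can then simply be averaged, as in this sketch (predict_fns stands for the compiled forward-pass functions of the trained networks, e.g. 10 settings x 5 seeds = 50 CNNs):

    import numpy as np

    def ensemble_predict(predict_fns, X):
        """Average predictions over all member networks.
        predict_fns: compiled forward-pass functions, one per trained net."""
        preds = [fn(X) for fn in predict_fns]
        return np.mean(preds, axis=0)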
Future work
- combine CNN and LSTM in one network
- make an ensemble of different network architectures: CNN, LSTM, feed-forward
- try to visualize what the networks learn
- try to find a way to extract/visualize the binding core
Acknowledgements
- Casper Sønderby
- Søren Sønderby
- Ole Winther
- Morten Nielsen