Convolutional and LSTM Neural Networks
Vanessa Jurtz
January 12, 2016
Contents
- Neural networks and GPUs
- Lasagne
- Peptide binding to MHC class II molecules
- Convolutional Neural Networks (CNN)
- Recurrent and LSTM Neural Networks
- Hyper-parameter optimization
- Training CNN and LSTM ensembles
Neural networks and GPUs
- A CPU (central processing unit) has few cores with lots of cache memory
- A GPU (graphics processing unit) has hundreds of cores, but each with little memory
- GPUs are very good for lots of small operations that can be heavily parallelized, like matrix multiplication
- training a neural network requires lots of matrix multiplications! (see the sketch below)
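To make the last point concrete, here is a minimal NumPy sketch (all sizes are made up): the forward pass of one fully connected layer is a single matrix multiplication, exactly the kind of operation a GPU parallelizes well.

    import numpy as np

    # Made-up sizes: a batch of 128 examples with 20 input features,
    # feeding a hidden layer of 64 neurons.
    batch_size, n_in, n_hidden = 128, 20, 64

    X = np.random.randn(batch_size, n_in)  # input batch
    W = np.random.randn(n_in, n_hidden)    # weight matrix
    b = np.zeros(n_hidden)                 # bias vector

    # One dense-layer forward pass: a matrix multiplication plus a
    # nonlinearity. Training repeats this (and its gradients) many times.
    H = np.tanh(X.dot(W) + b)
    print(H.shape)  # (128, 64)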
Lasagne
Why use Lasagne?
- Python
- it is highly flexible, as you will see in the afternoon exercise
- code can be run on a CPU or GPU (see the sketch below)
- my collaborators are using it too
Lasagne Documentation
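As a taste of the API, here is a minimal sketch of defining and compiling a network in Lasagne (layer sizes are illustrative, not from these slides); the same code runs on CPU or GPU depending on Theano's configuration:

    import theano
    import theano.tensor as T
    import lasagne

    # Symbolic input: a batch of vectors with 20 features (illustrative)
    input_var = T.matrix('inputs')

    # Stack layers: input -> dense hidden layer -> sigmoid output
    l_in = lasagne.layers.InputLayer(shape=(None, 20), input_var=input_var)
    l_hid = lasagne.layers.DenseLayer(
        l_in, num_units=64, nonlinearity=lasagne.nonlinearities.rectify)
    l_out = lasagne.layers.DenseLayer(
        l_hid, num_units=1, nonlinearity=lasagne.nonlinearities.sigmoid)

    # Compile the forward pass into a callable function
    prediction = lasagne.layers.get_output(l_out)
    predict_fn = theano.function([input_var], prediction)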
Peptide binding to MHC class II molecules
Figure adapted from http://events.embo.org/12-antigen/
- MHC II is only expressed in professional antigen presenting cells (dendritic cells, macrophages, B cells etc.)
- only a small fraction of peptides bind MHC II and are potential T cell epitopes
- polygenic and highly polymorphic
Peptides binding to MHC II molecules
Nielsen et al. 2010 "MHC Class II epitope predictive algorithms" Immunology
Peptides binding HLA-DR3, adapted from Kenneth Murphy "Janeway's Immunobiology" 8th edition, fig. 4.21, Garland Science
Why is this a hard problem?
- binding peptides have different lengths
- the binding core can lie anywhere within a peptide
- there is a large number of different MHC class II molecules
CNN - Convolutional Neural Networks
- a convolutional filter has a specified input size
- it slides over the input sequence, feeding into a different hidden neuron in each position
- the weights of the filter are fixed (shared) as it slides over the input and are updated in backpropagation
- usually multiple filters of the same and/or different size are used
CNN - Convolutional Neural Networks
Why could this be useful for predicting peptide binding to MHC class II?
- use filters of size 9 AA to focus on the binding core
- integrate information about all 9-mers (see the sketch below)
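One way to express this in Lasagne (a sketch under assumed settings: peptides encoded with 20 channels per position, e.g. sparse or BLOSUM encoding, and padded to a common length; the filter count is made up):

    import theano.tensor as T
    import lasagne

    MAX_PEP_LEN = 25  # assumed maximum peptide length after padding
    N_AA = 20         # one input channel per amino acid type

    # Conv1DLayer expects input of shape (batch, channels, positions)
    l_in = lasagne.layers.InputLayer(shape=(None, N_AA, MAX_PEP_LEN))

    # Filters of size 9 AA scan the peptide: each filter position lines
    # up with one position of a putative 9-mer binding core.
    l_conv = lasagne.layers.Conv1DLayer(
        l_in, num_filters=50, filter_size=9,
        nonlinearity=lasagne.nonlinearities.rectify)

    # Max-pooling over all positions integrates information about all
    # 9-mers: only the strongest filter response in the peptide is kept.
    l_pool = lasagne.layers.GlobalPoolLayer(l_conv, pool_function=T.max)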
CNN - Convolutional Neural Networks
- CNNs are known to work well for image analysis
- CIFAR example
http://deeplearning.net/tutorial/lenet.html
CNN - Convolutional Neural Networks
A biological example: subcellular localization of proteins
Sønderby et al. 2015 "Convolutional LSTM Networks for Subcellular Localization of Proteins" http://arxiv.org/abs/1503.01919
Recurrent Neural Networks
- have connections between the neurons in the hidden layer (see the recurrence below)
- have a memory: "learn what to store and what to ignore" (Graves, 2015)
Graves 2015 "Supervised Sequence Labelling with Recurrent Neural Networks"
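In equations (a standard simple-RNN recurrence in generic notation, consistent with Graves 2015): the hidden state at step t depends on the current input and the previous hidden state, which is what gives the network its memory.

    h_t = \tanh\left( W_{xh}\, x_t + W_{hh}\, h_{t-1} + b_h \right)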
LSTM memory blocks
Instead of a hidden neuron we use an LSTM memory block:
Graves 2015 "Supervised Sequence Labelling with Recurrent Neural Networks"
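For reference, a common formulation of the LSTM update equations (stated here without peephole connections; \sigma is the logistic sigmoid, \odot elementwise multiplication):

    \begin{align*}
    i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i) & \text{(input gate)} \\
    f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f) & \text{(forget gate)} \\
    c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) & \text{(cell state)} \\
    o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o) & \text{(output gate)} \\
    h_t &= o_t \odot \tanh(c_t) & \text{(block output)}
    \end{align*}

The gates learn what to write to the cell (input gate), what to erase (forget gate) and what to expose (output gate): exactly the "learn what to store and what to ignore" behaviour quoted above.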
LSTM Networks
Why make it more complicated?
Graves 2015 "Supervised Sequence Labelling with Recurrent Neural Networks"
LSTM memory blocks allow the network to store and access information over longer periods of time
LSTM Networks
Input and output can be of variable length!
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Example: peptide MHC class II binding is many to one (see the sketch below)
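A sketch of this many-to-one setup in Lasagne (sizes and encoding are assumptions): only_return_final=True keeps only the hidden state after the last residue, so the whole peptide is summarized into a single prediction.

    import lasagne

    MAX_PEP_LEN = 25  # assumed maximum peptide length after padding
    N_AA = 20         # one input feature per amino acid type

    # LSTMLayer expects input of shape (batch, sequence length, features)
    l_in = lasagne.layers.InputLayer(shape=(None, MAX_PEP_LEN, N_AA))

    # Many to one: read the whole peptide, return only the final state
    l_lstm = lasagne.layers.LSTMLayer(l_in, num_units=50,
                                      only_return_final=True)

    # One output per peptide, e.g. a rescaled binding affinity
    l_out = lasagne.layers.DenseLayer(
        l_lstm, num_units=1, nonlinearity=lasagne.nonlinearities.sigmoid)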
LSTM Networks
- LSTM networks are known to work well for natural language processing
- check out Andrej Karpathy's blog
- to read up: Graves 2015
LSTM Networks
Another biological application: protein secondary structure prediction
http://chemistry.berea.edu/~biochemistry/Anthony/Aequorin/Aequorin3.php
Sønderby et al. 2015 "Convolutional LSTM Networks for Subcellular Localization of Proteins" http://arxiv.org/abs/1503.01919
Hyper-parameter optimization
Parameters:
- weights
Hyper-parameters:
- weight initialization
- weight update
- activation function
- data encoding
- dropout
- learning rate
- number of CNN filters / number of LSTM memory blocks
- hidden layer size
Bergstra et al. 2012 "Random Search for Hyper-Parameter Optimization" Journal of Machine Learning Research
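A minimal sketch of random search in the spirit of Bergstra et al. (the search space and the train_and_evaluate function are hypothetical placeholders, not the actual settings behind these slides):

    import random

    # Hypothetical search space; sample each hyper-parameter independently
    search_space = {
        'learning_rate': lambda: 10 ** random.uniform(-4, -1),
        'dropout':       lambda: random.uniform(0.0, 0.5),
        'num_filters':   lambda: random.choice([10, 20, 50, 100]),
        'hidden_size':   lambda: random.choice([20, 60, 100]),
    }

    def random_search(train_and_evaluate, n_trials=100):
        """Draw random hyper-parameter settings and keep the best trial.
        train_and_evaluate is user-supplied, returning e.g. validation AUC."""
        best_score, best_params = float('-inf'), None
        for _ in range(n_trials):
            params = {name: draw() for name, draw in search_space.items()}
            score = train_and_evaluate(**params)
            if score > best_score:
                best_score, best_params = score, params
        return best_params, best_score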
Hyper-parameter optimization - architectures
CNN: [architecture diagram: the peptide feeds a convolutional layer (filter size varied: 9 AA, 9+3 AA); together with the MHC II pseudo sequence this feeds a fully connected layer and the output]
LSTM: [architecture diagram] (a code sketch of the CNN architecture follows below)
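A Lasagne sketch of the CNN architecture described above (all sizes are assumptions; the slides only give the layer order and the varied filter sizes, and merging the pooled peptide features with the MHC II pseudo sequence by concatenation is an assumption as well):

    import theano.tensor as T
    import lasagne

    MAX_PEP_LEN, N_AA, PSEUDO_LEN = 25, 20, 34  # assumed sizes

    # Peptide branch: convolution (filter size 9 AA shown; the varied
    # setting also combines 9 AA and 3 AA filters), then max-pooling
    l_pep = lasagne.layers.InputLayer(shape=(None, N_AA, MAX_PEP_LEN))
    l_conv = lasagne.layers.Conv1DLayer(l_pep, num_filters=20, filter_size=9)
    l_pool = lasagne.layers.GlobalPoolLayer(l_conv, pool_function=T.max)

    # MHC II pseudo sequence branch: a flat encoded input
    l_mhc = lasagne.layers.InputLayer(shape=(None, PSEUDO_LEN * N_AA))

    # Merge both branches, then fully connected layer and output
    l_merge = lasagne.layers.ConcatLayer([l_pool, l_mhc], axis=1)
    l_hid = lasagne.layers.DenseLayer(l_merge, num_units=60)
    l_out = lasagne.layers.DenseLayer(
        l_hid, num_units=1, nonlinearity=lasagne.nonlinearities.sigmoid)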
Hyper-parameter optimization - results
CNN:
[Plot: best AUC per number of trials (1 to 100 trials; AUC 0.4 to 0.9)]
LSTM:
[Plot: best AUC per number of trials (1 to 200 trials; AUC 0.4 to 0.9)]
single network performance (10% best nets for CNN + LSTM, entire ensemble of NetMHCIIpan-3.0)
[Bar plot: PCC (0.40 to 0.75) for CNN, LSTM and NetMHCIIpan-3.0 on validation and evaluation data]
Training CNN and LSTM ensembles
CNN ensemble of the 10 best hyper-parameter settings + 5 seeds each
Training CNN and LSTM ensembles
LSTM ensemble of the 2 best hyper-parameter settings + 2 seeds each (see the averaging sketch below)
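Predictions of the ensemble members can then simply be averaged, as in this sketch (predict_fns stands for the compiled forward-pass functions of the trained networks, e.g. 10 settings x 5 seeds = 50 CNNs):

    import numpy as np

    def ensemble_predict(predict_fns, X):
        """Average predictions over all member networks.
        predict_fns: compiled forward-pass functions, one per trained net."""
        preds = [fn(X) for fn in predict_fns]
        return np.mean(preds, axis=0)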
Future work
- combine CNN and LSTM in one network
- make an ensemble of different network architectures: CNN, LSTM, feed-forward
- try to visualize what the networks learn
- try to find a way to extract/visualize the binding core
Acknowledgements
- Casper Sønderby
- Søren Sønderby
- Ole Winther
- Morten Nielsen