Not Just a Black Box: Interpretable Deep Learning for Genomics
Source: forum.stanford.edu/events/posterslides/NotJustaBlackBox...


Avanti Shrikumar1, Peyton Greenside2, Anshul Kundaje1,3

1 Stanford Computer Science, 2 Stanford Dept. of Biomedical Informatics, 3 Stanford Genetics

Our contributions

•  Novel algorithm (DeepLIFT) for explaining predictions of a given deep learning model for particular input examples

•  Novel algorithm (MoDISco) for extracting recurring patterns (motif discovery) using a deep learning model

Method: DeepLIFT (Deep Learning Important Features)

1.  Alipanahi, B., Delong, A., Weirauch, M., & Frey, B. (2015). Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol.
2.  Zhou, J., & Troyanskaya, O. (2015). Predicting effects of noncoding variants with deep learning–based sequence model. Nature Methods.
3.  Kelley, D., Snoek, J., & Rinn, J. (2015). Basset: Learning the regulatory code of the accessible genome with deep convolutional neural networks. doi:10.1101/028399
4.  Heinz, S., et al. (2016). Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities.
5.  Lim, L. S., et al. (2016). The pluripotency regulator Zic3 is a direct activator of the Nanog promoter in ESCs.
6.  Gagliardi, A., et al. (2016). A direct physical interaction between Nanog and Sox2 regulates embryonic stem cell self-renewal. PubMed - NCBI. Ncbi.nlm.nih.gov. Retrieved 30 January 2016, from https://www.ncbi.nlm.nih.gov/pubmed/23892456
7.  Kheradpour, P., & Kellis, M. (2014). Systematic discovery and characterization of regulatory motifs in ENCODE TF binding experiments. Nucleic Acids Research.

Results of logistic regression model trained to predict Nanog binding using the top 3 motif hits, per motif, per region

Visualizing individual pattern-detectors: DeepBind (Alipanahi et al.)

Superior motif discovery for Nanog
Positive set: 5,473 reproducible Nanog peaks in H1-ESC from ENCODE
Negative set: 258,987 H1 ESC DNase-seq peaks

Method: MoDISco (Motif Discovery from Importance Scores)

Compute behaviour under "reference":

i1 = 0, i2 = 0
h1 = max(0, i1 + 2 i2 + 1) = 1
h2 = max(0, i1 + 2 i2 - 1) = 0
y = h1 + h2 = 1

Use difference from reference to find importance scores:

i1 = -1, i2 = -1
h1 = max(0, i1 + 2 i2 + 1) = 0
h2 = max(0, i1 + 2 i2 - 1) = 0
y = h1 + h2 = 0

Gradients assign an importance of 0 to both inputs in the latter case, because the gradients of h1 and h2 are both 0. Using difference-from-reference, we see that h1 is 1 below its reference value (a change of -1); DeepLIFT assigns an importance of -1/3 to i1 and -2/3 to i2.
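The toy example above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes each hidden unit's change in output is shared among the inputs in proportion to their share of the change in that unit's pre-activation, which reproduces the -1/3 and -2/3 stated above. All function names are illustrative.

```python
def relu(x):
    return max(0.0, x)

WEIGHTS = (1.0, 2.0)  # both hidden units weight (i1, i2) by (1, 2)

def forward(i1, i2):
    """Toy network from the example: y = h1 + h2."""
    z1 = i1 + 2 * i2 + 1  # pre-activation of h1
    z2 = i1 + 2 * i2 - 1  # pre-activation of h2
    return (z1, z2), (relu(z1), relu(z2))

def gradient_scores(i1, i2):
    """Gradient of y w.r.t. (i1, i2); a ReLU passes no gradient when its input is <= 0."""
    (z1, z2), _ = forward(i1, i2)
    active = sum(1.0 for z in (z1, z2) if z > 0)
    return tuple(w * active for w in WEIGHTS)

def difference_from_reference_scores(inp, ref):
    """Split each hidden unit's change in output among the inputs (assumed proportional rule)."""
    z_in, h_in = forward(*inp)
    z_ref, h_ref = forward(*ref)
    scores = [0.0, 0.0]
    for zi, zr, hi, hr in zip(z_in, z_ref, h_in, h_ref):
        dz = zi - zr  # change in pre-activation
        dh = hi - hr  # change in activation
        if dz == 0:
            continue  # unit saw no change in input, so it hands out no credit in this sketch
        multiplier = dh / dz
        for k in range(2):
            scores[k] += WEIGHTS[k] * (inp[k] - ref[k]) * multiplier
    return tuple(scores)

grads = gradient_scores(-1.0, -1.0)
scores = difference_from_reference_scores((-1.0, -1.0), (0.0, 0.0))
print("gradients:", grads)
print("difference-from-reference scores:", scores)
```

The scores sum to the change in y between reference and input (0 - 1 = -1), whereas the gradients carry no signal because both ReLUs are inactive at the input.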

Results (DeepLIFT)

Reveal context-specific use of regulatory sequence

[Figure: panels for B-cells and Erythroid. Motif labels: Gata, Gata (Rev. Comp.), SPI1. Track labels: Gata1 ChIP-seq peak, SPI1 ChIP-seq peak, No SPI1 peak, No Gata1 ChIP-seq peak.]

Model architecture overview

Input: DNA sequence represented as ones and zeros

Sequence: C G A T A A C C G A T A T

Base  A C G T
C     0 1 0 0
G     0 0 1 0
A     1 0 0 0
T     0 0 0 1
A     1 0 0 0
A     1 0 0 0
C     0 1 0 0
C     0 1 0 0
G     0 0 1 0
A     1 0 0 0
T     0 0 0 1
A     1 0 0 0
T     0 0 0 1

Learned pattern detectors

Later layers build on patterns of previous layer

"Fully connected" layers incorporate all info together

Output: Accessible (+1) vs not accessible (0)

[Figure labels: Accessible in Erythroid; Accessible in B-cells; Computer vision]
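The input encoding shown above (one row per base, columns in the order A, C, G, T) can be reproduced with a short sketch. The function name is illustrative and not taken from the poster.

```python
ALPHABET = "ACGT"  # column order used in the encoding above

def one_hot_encode(sequence):
    """Return one row of four 0/1 values per base, in A, C, G, T order."""
    rows = []
    for base in sequence.upper():
        if base not in ALPHABET:
            raise ValueError(f"unsupported base: {base!r}")
        rows.append([1 if base == letter else 0 for letter in ALPHABET])
    return rows

sequence = "CGATAACCGATAT"
encoded = one_hot_encode(sequence)
for base, row in zip(sequence, encoded):
    print(base, "".join(str(v) for v in row))
```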

[Figure: logistic regression auROC (axis from 0.0 to 1.0) for five motif sets: All 5 ENCODE Nanog motifs; Canonical HOMER matches to 4 MoDISco motifs; All 32 de-novo HOMER motifs; Top 4 de-novo HOMER motifs; All 4 MoDISco motifs.]

(a) Obtain per-base importance scores using DeepLIFT

(b) Segment to "seqlets" of high importance

(c) "Autocomplete" seqlets using DeepLIFT information

(d) Compute distances between pairs of seqlets via cross-correlation

(e) Cluster seqlets using pairwise distances

(f) Aggregate clusters
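Step (d) can be illustrated with a rough sketch. The poster does not give the exact formula, so everything here is an assumption: seqlets are represented as lists of per-position score vectors (one value per base), similarity is the maximum over all alignments of the overlap dot product normalized by the two seqlets' norms, and distance is one minus that similarity. Function names are hypothetical.

```python
import math

def _dot(a, b):
    """Sum of elementwise products over two equally long lists of score rows."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

def max_cross_correlation(seqlet_a, seqlet_b):
    """Best normalized overlap score over all relative offsets (assumed similarity)."""
    norm = math.sqrt(_dot(seqlet_a, seqlet_a)) * math.sqrt(_dot(seqlet_b, seqlet_b))
    if norm == 0:
        return 0.0
    best = -1.0
    # offset = position in seqlet_a that lines up with position 0 of seqlet_b
    for offset in range(-(len(seqlet_b) - 1), len(seqlet_a)):
        start = max(0, offset)
        stop = min(len(seqlet_a), len(seqlet_b) + offset)
        overlap = sum(
            x * y
            for i in range(start, stop)
            for x, y in zip(seqlet_a[i], seqlet_b[i - offset])
        )
        best = max(best, overlap / norm)
    return best

def seqlet_distance(seqlet_a, seqlet_b):
    """Assumed conversion from similarity to distance."""
    return 1.0 - max_cross_correlation(seqlet_a, seqlet_b)

# Toy seqlets: b is a shifted copy of a, c uses a different base entirely.
a = [[1, 0, 0, 0], [0, 2, 0, 0], [0, 0, 1, 0]]
b = [[0, 0, 0, 0], [1, 0, 0, 0], [0, 2, 0, 0], [0, 0, 1, 0]]
c = [[0, 0, 0, 1], [0, 0, 0, 1]]
print(seqlet_distance(a, b), seqlet_distance(a, c))
```

A matrix of such pairwise distances is the kind of input that step (e) would cluster; taking the maximum over offsets makes the comparison insensitive to where a pattern sits within a seqlet.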

Motivation

•  Regulatory sequence involves complex hierarchical patterns that are difficult for existing computational methods to model

•  Deep Learning techniques show great promise in this area [1-3] but are considered uninterpretable "Black Boxes", limiting their usefulness for making biological discoveries

Example problem

•  Goal: learn key regulatory sequences governing hematopoiesis
•  Approach:

(1) Experimentally identify biochemically active regions in different cell lines across the hematopoietic lineage
(2) Train deep learning model to predict activity from sequence
(3) Interpret the model to learn key regulatory sequences

Peyton Greenside

Results (MoDISco)

4 motif clusters identified by MoDISco:

[Figure: 4/5 ENCODE Nanog motifs, each shown with the corresponding MoDISco motif. Labels: Zic3, Sox2, Oct4-Sox2-Nanog, Nanog.]

Co-binding between Zic3 and Nanog?

[Figure panels: Zic3 and Nanog separation; Oct-Sox-Nanog and Nanog separation; Shuffled Zic3 and Nanog separation. Orientation labels: ++ and -- orientation; +- and -+ orientation.]

Fusion motif from subclustering:

Individual examples:

Protein-protein interaction:

[Figure: columns are original; scores: 8; scores: 3; masked, 8->3; scores: 6; masked, 8->6. Methods compared: |grad| (Simonyan), Guided Backprop, gradient*input, integrated grads-10, DeepLIFT-RevealCancel.]

Proof-of-concept: morphing an "8" to a 3 or a 6. A deep learning model is trained to recognize handwritten digits from the MNIST database. Pixels are ranked by the difference in importance for the original class (e.g. 8) and the target class (e.g. 3 or 6), as computed by different methods. Up to 20% of the pixels that are more important to the original class than to the target class are erased.
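The masking procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: images and score maps are flat lists of equal length, "erased" is taken to mean setting a pixel to a background value of 0, and the 20% budget is taken relative to the total number of pixels. The function name and parameters are hypothetical.

```python
def mask_pixels(image, scores_original, scores_target, max_fraction=0.2, fill_value=0.0):
    """Erase the pixels that favor the original class most, up to a fixed budget."""
    diffs = [so - st for so, st in zip(scores_original, scores_target)]
    # Only pixels that are more important to the original class are candidates.
    candidates = [idx for idx, d in enumerate(diffs) if d > 0]
    candidates.sort(key=lambda idx: diffs[idx], reverse=True)
    budget = int(max_fraction * len(image))
    masked = list(image)
    for idx in candidates[:budget]:
        masked[idx] = fill_value
    return masked

# Toy 10-pixel "image": three pixels favor the original class, budget is 2 pixels.
image = [1.0] * 10
scores_original = [5, 4, 3, 0, 0, 0, 0, 0, 0, 0]
scores_target = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
masked = mask_pixels(image, scores_original, scores_target)
print(masked)
```

A better importance method should change the model's prediction from the original class toward the target class more strongly for the same pixel budget.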

"AND"/min operation:

y = min(i1, i2) → gradient 0 for either i1 or i2

h1 = i1 - i2
h2 = max(0, h1)
y = i1 - h2

y = i1 - max(0, i1 - i2) = min(i1, i2)

i1    i2         y
i1    i2 < i1    i1 - (i1 - i2) = i2
i1    i2 > i1    i1 - 0 = i1

By considering different orders for positive and negative terms, we can also improve the assignment of importance scores:

Consider i1 = 10, i2 = 6

y = i1 - max(0, i1 - i2) = 10 - max(0, 4) = 6

Breakdown of max(0, i1 - i2) = 4:

Standard breakdown: 4 = (10 from i1) + (-6 from i2)
Other possible breakdown: 4 = (4 from i1) + (0 from i2)
Average: 4 = (7 from i1) + (-3 from i2)

Breakdown of y:

Standard breakdown: y = (10 from i1) - [(10 from i1) - (6 from i2)] = 6 from i2
Average over both orders: y = (10 from i1) - [(7 from i1) + (-3 from i2)] = (3 from i1) + (3 from i2)
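The order-averaging arithmetic for i1 = 10, i2 = 6 can be reproduced with a small sketch. It assumes a reference of 0 for both inputs and non-negative inputs, so that i1 is the positive term and -i2 the negative term entering max(0, i1 - i2). Function names are illustrative and this is not the authors' implementation.

```python
def relu(x):
    return max(0.0, x)

def relu_breakdowns(pos, neg):
    """Credit for relu(pos + neg), with pos >= 0 and neg <= 0, under both term orders."""
    # Order 1 (the "standard" breakdown): add the positive term first, then the negative.
    standard = (relu(pos) - relu(0.0), relu(pos + neg) - relu(pos))
    # Order 2: add the negative term first, then the positive.
    other = (relu(pos + neg) - relu(neg), relu(neg) - relu(0.0))
    average = tuple(0.5 * (s + o) for s, o in zip(standard, other))
    return standard, other, average

def min_contributions(i1, i2):
    """Credit for y = i1 - max(0, i1 - i2) using the averaged breakdown."""
    _, _, (from_i1, from_i2) = relu_breakdowns(i1, -i2)
    return i1 - from_i1, -from_i2

standard, other, average = relu_breakdowns(10.0, -6.0)
print("standard:", standard)  # (credit to i1, credit to i2)
print("other:   ", other)
print("average: ", average)
print("y credit:", min_contributions(10.0, 6.0))
```

Averaging over both orders gives each input equal credit for y = min(10, 6) = 6, whereas the standard breakdown assigns everything to i2.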
