PACIFIC SYMPOSIUM ON BIOCOMPUTING 2018
ABSTRACT BOOK

Poster Presenters: Poster space is assigned by abstract page number. Please find the page that your abstract is on and put your poster on the poster board with the corresponding number (e.g., if your abstract is on page 50, put your poster on board #50).

Proceedings papers with oral presentations #2-39 are not assigned poster space.

Abstracts are organized first by session, then the last name of the first author. Presenting authors’ names are underlined.
TABLE OF CONTENTS

PROCEEDINGS PAPERS WITH ORAL PRESENTATION

APPLICATIONS OF GENETICS, GENOMICS AND BIOINFORMATICS IN DRUG DISCOVERY ..... 1

CHARACTERIZATION OF DRUG-INDUCED SPLICING COMPLEXITY IN PROSTATE CANCER CELL LINE USING LONG READ TECHNOLOGY ..... 2
Xintong Chen, Sander Houten, Kimaada Allette, Robert P. Sebra, Gustavo Stolovitzky, Bojan Losic
CELL-SPECIFIC PREDICTION AND APPLICATION OF DRUG-INDUCED GENE EXPRESSION PROFILES ..... 3
Rachel Hodos, Ping Zhang, Hao-Chih Lee, Qiaonan Duan, Zichen Wang, Neil R. Clark, Avi Ma’ayan, Fei Wang, Brian Kidd, Jianying Hu, David Sontag, Joel T. Dudley
LARGE-SCALE INTEGRATION OF HETEROGENEOUS PHARMACOGENOMIC DATA FOR IDENTIFYING DRUG MECHANISM OF ACTION ..... 4
Yunan Luo, Sheng Wang, Jinfeng Xiao, Jian Peng
CHEMICAL REACTION VECTOR EMBEDDINGS: TOWARDS PREDICTING DRUG METABOLISM IN THE HUMAN GUT MICROBIOME ..... 5
Emily K. Mallory, Ambika Acharya, Stefano E. Rensi, Peter J. Turnbaugh, Roselie A. Bright, Russ B. Altman
EXTRACTING A BIOLOGICALLY RELEVANT LATENT SPACE FROM CANCER TRANSCRIPTOMES WITH VARIATIONAL AUTOENCODERS ..... 6
Gregory P. Way, Casey S. Greene

CHALLENGES OF PATTERN RECOGNITION IN BIOMEDICAL DATA ORAL PRESENTATION ..... 7

LARGE-SCALE ANALYSIS OF DISEASE PATHWAYS IN THE HUMAN INTERACTOME ..... 8
Monica Agrawal, Marinka Zitnik, Jure Leskovec
MAPPING PATIENT TRAJECTORIES USING LONGITUDINAL EXTRACTION AND DEEP LEARNING IN THE MIMIC-III CRITICAL CARE DATABASE ..... 9
Brett K. Beaulieu-Jones, Patryk Orzechowski, Jason H. Moore
AUTOMATED DISEASE COHORT SELECTION USING WORD EMBEDDINGS FROM ELECTRONIC HEALTH RECORDS ..... 10
Benjamin S. Glicksberg, Riccardo Miotto, Kipp W. Johnson, Khader Shameer, Li Li, Rong Chen, Joel T. Dudley
FUNCTIONAL NETWORK COMMUNITY DETECTION CAN DISAGGREGATE AND FILTER MULTIPLE UNDERLYING PATHWAYS IN ENRICHMENT ANALYSES ..... 11
Lia X. Harrington, Gregory P. Way, Jennifer A. Doherty, Casey S. Greene
CAUSAL INFERENCE ON ELECTRONIC HEALTH RECORDS TO ASSESS BLOOD PRESSURE TREATMENT TARGETS: AN APPLICATION OF THE PARAMETRIC G FORMULA ..... 12
Kipp W. Johnson, Benjamin S. Glicksberg, Rachel Hodos, Khader Shameer, Joel T. Dudley
DATA-DRIVEN ADVICE FOR APPLYING MACHINE LEARNING TO BIOINFORMATICS PROBLEMS ..... 13
Randal S. Olson, William La Cava, Zairah Mustahsan, Akshay Varik, Jason H. Moore
HOW POWERFUL ARE SUMMARY-BASED METHODS FOR IDENTIFYING EXPRESSION-TRAIT ASSOCIATIONS UNDER DIFFERENT GENETIC ARCHITECTURES? ..... 14
Yogasudha C. Veturi, Marylyn D. Ritchie

DEMOCRATIZING HEALTH DATA FOR TRANSLATIONAL RESEARCH ..... 15

CLINGEN CANCER SOMATIC WORKING GROUP – STANDARDIZING AND DEMOCRATIZING ACCESS TO CANCER MOLECULAR DIAGNOSTIC DATA TO DRIVE TRANSLATIONAL RESEARCH ..... 16
Subha Madhavan, Deborah Ritter, Christine Micheel, Shruti Rao, Angshumoy Roy, Dmitriy Sonkin, Matthew McCoy, Malachi Griffith, Obi L. Griffith, Peter McGarvey, Shashikant Kulkarni, on behalf of the Clingen Somatic Working Group
A HEURISTIC METHOD FOR SIMULATING OPEN-DATA OF ARBITRARY COMPLEXITY THAT CAN BE USED TO COMPARE AND EVALUATE MACHINE LEARNING METHODS ..... 17
Jason H. Moore, Maksim Shestov, Peter Schmitt, Randal S. Olson
BEST PRACTICES AND LESSONS LEARNED FROM REUSE OF 4 PATIENT-DERIVED METABOLOMICS DATASETS IN ALZHEIMER'S DISEASE ..... 18
Jessica D. Tenenbaum, Colette Blach

IMAGING GENOMICS ..... 19

DISCRIMINATIVE BAG-OF-CELLS FOR IMAGING-GENOMICS ..... 20
Benjamin Chidester, Minh N. Do, Jian Ma
DEEP INTEGRATIVE ANALYSIS FOR SURVIVAL PREDICTION ..... 21
Chenglong Huang, Albert Zhang, Guanghua Xiao
GENOTYPE-PHENOTYPE ASSOCIATION STUDY VIA NEW MULTI-TASK LEARNING MODEL ..... 22
Zhouyuan Huo, Dinggang Shen, Heng Huang
CODON BIAS AMONG SYNONYMOUS RARE VARIANTS IS ASSOCIATED WITH ALZHEIMER’S DISEASE IMAGING BIOMARKER ..... 23
Jason E. Miller, Manu K. Shivakumar, Shannon L. Risacher, Andrew J. Saykin, Seunggeun Lee, Kwangsik Nho, Dokyoon Kim

PRECISION MEDICINE: FROM DIPLOTYPES TO DISPARITIES TOWARDS IMPROVED HEALTH AND THERAPIES ..... 24

SINGLE SUBJECT TRANSCRIPTOME ANALYSIS REPRODUCES SIGNED GENE SET FUNCTIONAL ACTIVATION SIGNALS FROM COHORT ANALYSIS OF MURINE RESPONSE TO HIGH FAT DIET ..... 25
Joanne Berghout, Qike Li, Nima Pouladi, Jianrong Li, Yves A. Lussier
USING SIMULATION AND OPTIMIZATION APPROACH TO IMPROVE OUTCOME THROUGH WARFARIN PRECISION TREATMENT ..... 26
Chih-Lin Chi, Lu He, Kourosh Ravvaz, John Weissert, Peter J. Tonellato
COALITIONAL GAME THEORY AS A PROMISING APPROACH TO IDENTIFY CANDIDATE AUTISM GENES ..... 27
Anika Gupta, Min Woo Sun, Kelley M. Paskov, Nate T. Stockham, Jae-Yoon Jung, Dennis P. Wall
CONSIDERATIONS FOR AUTOMATED MACHINE LEARNING IN CLINICAL METABOLIC PROFILING: ALTERED HOMOCYSTEINE PLASMA CONCENTRATION ASSOCIATED WITH METFORMIN EXPOSURE ..... 28
Alena Orlenko, Jason H. Moore, Patryk Orzechowski, Randal S. Olson, Junmei Cairns, Pedro J. Caraballo, Richard M. Weinshilboum, Liewei Wang, Matthew K. Breitenstein
ADDRESSING VITAL SIGN ALARM FATIGUE USING PERSONALIZED ALARM THRESHOLDS ..... 29
Sarah Poole, Nigam Shah
EMERGENCE OF PATHWAY-LEVEL COMPOSITE BIOMARKERS FROM CONVERGING GENE SET SIGNALS OF HETEROGENEOUS TRANSCRIPTOMIC RESPONSES ..... 30
Samir Rachid Zaim, Qike Li, A. Grant Schissler, Yves A. Lussier
ANALYZING METABOLOMICS DATA FOR ASSOCIATION WITH GENOTYPES USING TWO-COMPONENT GAUSSIAN MIXTURE DISTRIBUTIONS ..... 31
Jason Westra, Nicholas Hartman, Bethany Lake, Gregory Shearer, Nathan Tintle

READING BETWEEN THE GENES: COMPUTATIONAL MODELS TO DISCOVER FUNCTION AND/OR CLINICAL UTILITY FROM NONCODING DNA ..... 32

CONVERGENT DOWNSTREAM CANDIDATE MECHANISMS OF INDEPENDENT INTERGENIC POLYMORPHISMS BETWEEN CO-CLASSIFIED DISEASES IMPLICATE EPISTASIS AMONG NONCODING ELEMENTS ..... 33
Jiali Han, Jianrong Li, Ikbel Achour, Lorenzo Pesce, Ian Foster, Haiquan Li, Yves A. Lussier
NETWORK ANALYSIS OF PSEUDOGENE-GENE RELATIONSHIPS: FROM PSEUDOGENE EVOLUTION TO THEIR FUNCTIONAL POTENTIALS ..... 34
Travis S. Johnson, Sihong Li, Johnathan R. Kho, Kun Huang, Yan Zhang
LEVERAGING PUTATIVE ENHANCER-PROMOTER INTERACTIONS TO INVESTIGATE TWO-WAY EPISTASIS IN TYPE 2 DIABETES GWAS ..... 35
Elisabetta Manduchi, Alessandra Chesi, Molly A. Hall, Struan F.A. Grant, Jason H. Moore

TEXT MINING AND VISUALIZATION FOR PRECISION MEDICINE ..... 36

IMPROVING PRECISION IN CONCEPT NORMALIZATION ..... 37
Mayla Boguslav, K. Bretonnel Cohen, William A. Baumgartner Jr., Lawrence E. Hunter
VISAGE: INTEGRATING EXTERNAL KNOWLEDGE INTO ELECTRONIC MEDICAL RECORD VISUALIZATION ..... 38
Edward W. Huang, Sheng Wang, ChengXiang Zhai
ANNOTATING GENE SETS BY MINING LARGE LITERATURE COLLECTIONS WITH PROTEIN NETWORKS ..... 39
Sheng Wang, Jianzhu Ma, Michael Ku Yu, Fan Zheng, Edward W. Huang, Jiawei Han, Jian Peng, Trey Ideker

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

APPLICATIONS OF GENETICS, GENOMICS AND BIOINFORMATICS IN DRUG DISCOVERY ..... 40

PREDICTION OF PROTEIN-LIGAND INTERACTIONS FROM PAIRED PROTEIN SEQUENCE MOTIFS AND LIGAND SUBSTRUCTURES ..... 41
Peyton Greenside, Maureen Hillenmeyer, Anshul Kundaje
LOSS-OF-FUNCTION OF NEUROPLASTICITY-RELATED GENES CONFERS RISK FOR HUMAN NEURODEVELOPMENTAL DISORDERS ..... 42
Milo R. Smith, Benjamin S. Glicksberg, Li Li, Rong Chen, Hirofumi Morishita, Joel T. Dudley
DIFFUSION MAPPING OF DRUG TARGETS ON DISEASE SIGNALING NETWORK ELEMENTS REVEALS DRUG COMBINATION STRATEGIES ..... 43
Jielin Xu, Kelly Regan, Siyuan Deng, William E. Carson III, Philip R.O. Payne, Fuhai Li

CHALLENGES OF PATTERN RECOGNITION IN BIOMEDICAL DATA ..... 44

OWL-NETS: TRANSFORMING OWL REPRESENTATIONS FOR IMPROVED NETWORK INFERENCE ..... 45
Tiffany J. Callahan, William A. Baumgartner Jr., Michael Bada, Adrianne L. Stefanski, Ignacio Tripodi, Elizabeth K. White, Lawrence E. Hunter
AN ULTRA-FAST AND SCALABLE QUANTIFICATION PIPELINE FOR TRANSPOSABLE ELEMENTS FROM NEXT GENERATION SEQUENCING DATA ..... 46
Hyun-Hwan Jeong, Hari Krishna Yalamanchili, Caiwei Guo, Joshua M. Shulman, Zhandong Liu
IMPROVING THE EXPLAINABILITY OF RANDOM FOREST CLASSIFIER – USER CENTERED APPROACH ..... 47
Dragutin Petkovic, Russ B. Altman, Mike Wong, Arthur Vigil
TREE-BASED METHODS FOR CHARACTERIZING TUMOR DENSITY HETEROGENEITY ..... 48
Katherine Shoemaker, Brian P. Hobbs, Karthik Bharath, Chaan S. Ng, Veerabhadran Baladandayuthapani

DEMOCRATIZING HEALTH DATA FOR TRANSLATIONAL RESEARCH ..... 49

IDENTIFYING NATURAL HEALTH PRODUCT AND DIETARY SUPPLEMENT INFORMATION WITHIN ADVERSE EVENT REPORTING SYSTEMS ..... 50
Vivekanand Sharma, Indra Neil Sarkar
DEMOCRATIZING DATA SCIENCE THROUGH DATA SCIENCE TRAINING ..... 51
John Darrell Van Horn, Lily Fierro, Jeana Kamdar, Jonathan Gordon, Crystal Stewart, Avnish Bhattrai, Sumiko Abe, Xiaoxiao Lei, Caroline O’Driscoll, Aakanchha Sinha, Priyambada Jain, Gully Burns, Kristina Lerman, José Luis Ambite

IMAGING GENOMICS ..... 52

HERITABILITY ESTIMATES ON RESTING STATE FMRI DATA USING THE ENIGMA ANALYSIS PIPELINE ..... 53
Bhim M. Adhikari, Neda Jahanshad, Dinesh Shukla, David C. Glahn, John Blangero, Richard C. Reynolds, Robert W. Cox, Els Fieremans, Jelle Veraart, Dmitry S. Novikov, Thomas E. Nichols, L. Elliot Hong, Paul M. Thompson, Peter Kochunov
MRI TO MGMT: PREDICTING METHYLATION STATUS IN GLIOBLASTOMA PATIENTS USING CONVOLUTIONAL RECURRENT NEURAL NETWORKS ..... 54
Lichy Han, Maulik R. Kamdar
BUILDING TRANS-OMICS EVIDENCE: USING IMAGING AND ‘OMICS’ TO CHARACTERIZE CANCER PROFILES ..... 55
Arunima Srivastava, Chaitanya Kulkarni, Parag Mallick, Kun Huang, Raghu Machiraju

PRECISION MEDICINE: FROM DIPLOTYPES TO DISPARITIES TOWARDS IMPROVED HEALTH AND THERAPIES ..... 56

LOCAL ANCESTRY TRANSITIONS MODIFY SNP-TRAIT ASSOCIATIONS ..... 57
Alexandra E. Fish, Dana C. Crawford, John A. Capra, William S. Bush
EVALUATION OF PREDIXCAN FOR PRIORITIZING GWAS ASSOCIATIONS AND PREDICTING GENE EXPRESSION ..... 58
Binglan Li, Shefali S. Verma, Yogasudha C. Veturi, Anurag Verma, Yuki Bradford, David W. Haas, Marylyn D. Ritchie

READING BETWEEN THE GENES: COMPUTATIONAL MODELS TO DISCOVER FUNCTION AND/OR CLINICAL UTILITY FROM NONCODING DNA ..... 59

PAN-CANCER ANALYSIS OF EXPRESSED SOMATIC NUCLEOTIDE VARIANTS IN LONG INTERGENIC NON-CODING RNA ..... 60
Travers Ching, Lana X. Garmire

TEXT MINING AND VISUALIZATION FOR PRECISION MEDICINE ..... 61

GENEDIVE: A GENE INTERACTION SEARCH AND VISUALIZATION TOOL TO FACILITATE PRECISION MEDICINE ..... 62
Paul Previde, Brook Thomas, Mike Wong, Emily K. Mallory, Dragutin Petkovic, Russ B. Altman, Anagha Kulkarni

POSTER PRESENTATIONS

APPLICATIONS OF GENETICS, GENOMICS AND BIOINFORMATICS IN DRUG DISCOVERY ..... 63

CELL-SPECIFIC PREDICTION AND APPLICATION OF DRUG-INDUCED GENE EXPRESSION PROFILES ..... 64
Rachel Hodos, Ping Zhang, Hao-Chih Lee, Qiaonan Duan, Zichen Wang, Neil R. Clark, Avi Ma’ayan, Fei Wang, Brian Kidd, Jianying Hu, David Sontag, Joel T. Dudley
SYSTEMATIC DISCOVERY OF GENOMIC MARKERS FOR CLINICAL OUTCOMES THROUGH COMBINED ANALYSIS OF CLINICAL AND GENOMIC DATA ..... 65
Jinho Kim, Hongui Cha, Hyun-Tae Shin, Boram Lee, Jae Won Yun, Joon Ho Kang, Woong-Yang Park
IDENTIFICATION OF A PREDICTIVE GENE SIGNATURE FOR DIFFERENTIATING THE EFFECTS OF CIGARETTE SMOKING ..... 66
Gang Liu, Justin Li, G.L. Prasad
THE EXTREME MEMORY® CHALLENGE: A SEARCH FOR THE HERITABLE FOUNDATIONS OF EXCEPTIONAL MEMORY ..... 67
Mary A. Pyc, Douglas Fenger, Philip Cheung, J. Steven de Belle, Tim Tully
EXTRACTING A BIOLOGICALLY RELEVANT LATENT SPACE FROM CANCER TRANSCRIPTOMES WITH VARIATIONAL AUTOENCODERS ..... 68
Gregory P. Way, Casey S. Greene

CHALLENGES OF PATTERN RECOGNITION IN BIOMEDICAL DATA ORAL PRESENTATION ..... 69

LARGE-SCALE ANALYSIS OF DISEASE PATHWAYS IN THE HUMAN INTERACTOME ..... 70
Monica Agrawal, Marinka Zitnik, Jure Leskovec
PROFILING OF SOMATIC ALTERATIONS IN BRCA1-LIKE BREAST TUMORS ..... 71
Youdinghuan Chen, Yue Wang, Lucas A. Salas, Todd W. Miller, Jonathan D. Marotti, Nicole P. Jenkins, Arminja N. Kettenbach, Chao Cheng, Brock C. Christensen
USING ARTIFICIAL INTELLIGENCE IN DIGITAL PATHOLOGY TO CLASSIFY MELANOCYTIC LESIONS ..... 72
Steven N. Hart, W. Flotte, A.P. Norgan, K.K. Shah, Z.R. Buchan, K.B. Geiersbach, T. Mounajjed, T.J. Flotte
A MACHINE LEARNING APPROACH TO STUDY COMMON GENE EXPRESSION PATTERNS ..... 73
Mingze He, Carolyn J. Lawrence-Dill

GENERAL ..... 74

DATABASE-FREE METAGENOMIC ANALYSIS WITH AKRONYMER ..... 75
Gabriel Al-Ghalith, Abigail Johnson, Pajau Vangay, Dan Knights
SOFTWARE COMPARISON FOR PREPROCESSING GC/LC-MS-BASED METABOLOMICS DATA ..... 76
Julian Aldana, Monica Cala Molina, Martha Zuluaga
GATEKEEPER: A NEW HARDWARE ARCHITECTURE FOR ACCELERATING PRE-ALIGNMENT IN DNA SHORT READ MAPPING ..... 77
Mohammed Alser, Hasan Hassan, Hongyi Xin, Oğuz Ergin, Onur Mutlu, Can Alkan
MODELING THE ENHANCER ACTIVITY THROUGH THE COMBINATION OF EPIGENETIC FACTORS ..... 78
Min Gyun Bae, Taeyeop Lee, Jaeho Oh, Jun Hyeong Lee, Jung Kyoon Choi
FREQUENCY AND PROPERTIES OF MOSAIC SOMATIC MUTATIONS IN A NORMAL DEVELOPING BRAIN ..... 79
Taejeong Bae, Jessica Mariani, Livia Tomasini, Bo Zhou, Alexander E. Urban, Alexej Abyzov, Flora M. Vaccarino
CYCLONOVO: DE NOVO SEQUENCING ALGORITHM DISCOVERS NOVEL CYCLIC PEPTIDE NATURAL PRODUCTS IN SUNFLOWER AND CYANOBACTERIA USING TANDEM MASS SPECTROMETRY DATA ..... 80
Bahar Behsaz, Hosein Mohimani, Alexey Gurevich, Andrey Prjibelski, Mark F. Fisher, Larry Smarr, Pieter C. Dorrestein, Joshua S. Mylne, Pavel A. Pevzner
FUNCTIONAL ANNOTATION OF GENOMIC VARIANTS IN STUDIES OF LATE-ONSET ALZHEIMER’S DISEASE ..... 81
Mariusz Butkiewicz, Jonathan L. Haines, William S. Bush
OCTAD: AN OPEN CANCER THERAPEUTIC DISCOVERY WORKSPACE IN THE ERA OF PRECISION MEDICINE ..... 82
Bin Chen, Benjamin S. Glicksberg, William Zeng, Yuying Chen, Ke Liu
DEEP LEARNING PREDICTS TUBERCULOSIS DRUG RESISTANCE STATUS FROM WHOLE-GENOME SEQUENCING DATA ..... 83
Michael L. Chen, Isaac S. Kohane, Andrew L. Beam, Maha Farhat
DESIGNING PREDICTION MODEL FOR HYPERURICEMIA WITH VARIOUS MACHINE LEARNING TOOLS USING HEALTH CHECK-UP EHR DATABASE ..... 84
Eun Kyung Choe, Sang Woo Lee
RICK: RNA INTERACTIVE COMPUTING KIT ..... 85
Galina A. Erikson, Ling Huang, Maxim Shokhirev
PRIVATE INFORMATION LEAKAGE IN FUNCTIONAL GENOMICS EXPERIMENTS: QUANTIFICATION AND LINKING ..... 86
Gamze Gursoy, Mark Gerstein
CARPE D.I.E.M: A DATA INTEGRATION EXPECTATION MAP FOR THE POTENTIAL OF MULTI-‘OMICS INTEGRATION IN COMPLEX DISEASE ..... 87
Tia Tate Hudson, ClarLynda Williams-DeVane
IMPROVING GENE FUSION DETECTION ACCURACY WITH FUSION CONTIG REALIGNMENT IN TARGETED TUMOR SEQUENCING ..... 88
Jin Hyun Ju, Xiao Chen, June Snedecor, Han-Yu Chuang, Ben Mishkanian, Sven Bilke
SPARSE REGRESSION FOR NETWORK GRAPHS AND ITS APPLICATION TO GENE NETWORKS OF THE BRAIN ..... 89
Hideko Kawakubo, Yusuke Matsui, Teppei Shimamura
GRIM-FILTER: FAST SEED LOCATION FILTERING IN DNA READ MAPPING USING PROCESSING-IN-MEMORY TECHNOLOGIES ..... 90
Jeremie S. Kim, Damla S. Cali, Hongyi Xin, Donghyuk Lee, Saugata Ghose, Mohammed Alser, Hasan Hassan, Oğuz Ergin, Can Alkan, Onur Mutlu
MULTI-CLASS CLASSIFICATION STRATEGY FOR SUPPORT VECTOR MACHINES USING WEIGHTED VOTING AND VOTING DROP ..... 91
Sungho Kim, Taehun Kim
GENOME-WIDE ANALYSIS OF TRANSCRIPTIONAL AND CYTOKINE RESPONSE VARIABILITY IN ACTIVATED HUMAN IMMUNE CELLS ..... 92
Sarah Kim-Hellmuth, Matthias Bechheim, Benno Pütz, Pejman Mohammadi, Johannes Schumacher, Veit Hornung, Bertram Müller-Myhsok, Tuuli Lappalainen
PREDICTING FATIGUE SEVERITY IN ONCOLOGY PATIENTS ONE WEEK FOLLOWING CHEMOTHERAPY ..... 93
Kord M. Kober, Xiao Hu, Bruce A. Cooper, Steven M. Paul, Christine Miaskowski
SINGLE-MOLECULE PROTEIN IDENTIFICATION BY SUB-NANOPORE SENSORS ..... 94
Mikhail Kolmogorov, Eamonn Kennedy, Zhuxin Dong, Gregory Timp, Pavel A. Pevzner
GENE EXPRESSION PROFILE OF OSTEOARTHRITIS AFFECTED FINGER JOINTS ..... 95
Milica Krunic, Klaus Bobacz, Arndt von Haeseler
DISCOVERY AND PRIORITIZATION OF DE NOVO MUTATIONS IN AUTISM SPECTRUM DISORDER ..... 96
Taeyeop Lee, Jaeho Oh, Min Gyun Bae, Jun Hyeong Lee, Jung Kyoon Choi
CROSSTALKER: AN OPEN NETWORK AND PATHWAY ANALYSIS PLATFORM ..... 97
Sean Maxwell, Mark R. Chance
SIGNATURES OF NON–SMALL-CELL LUNG CANCER RELAPSE PATIENTS: DIFFERENTIAL EXPRESSION ANALYSIS AND GENE NETWORK ANALYSIS ..... 98
Abigail E. Moore, Brandon Zheng, Patricia M. Watson, Robert C. Wilson, Dennis K. Watson, Paul E. Anderson
RANKING BIOLOGICAL FEATURES BY DIFFERENTIAL ABUNDANCE ..... 99
Soumyashant Nayak, Nicholas Lahens, Eun Ji Kim, Gregory Grant
SYSTEMATIC ANALYSIS OF OBESITY ASSOCIATED VARIATIONS THROUGH MACHINE LEARNING BASED ON GENOMICS AND EPIGENOMICS ..... 100
Jaeho Oh, Jun Hyeong Lee, Taeyeop Lee, Min Gyun Bae, Jung Kyoon Choi
SPARSE REGRESSION MODELING OF DRUG RESPONSE WITH A LOCALIZED ESTIMATION FRAMEWORK ..... 101
Teppei Shimamura, Hideko Kawakubo, Hyunha Nam, Yusuke Matsui
PDBMAP: A PIPELINE AND DATABASE FOR MAPPING GENETIC VARIATION INTO PROTEIN STRUCTURES AND HOMOLOGY MODELS ..... 102
R. Michael Sivley, John A. Capra, William S. Bush
REPETITIVE RNA AND GENOMIC INSTABILITY IN HIGH-GRADE SEROUS OVARIAN CANCER PROGRESSION AND DEVELOPMENT ..... 103
James R. Torpy, Nenad Bartonicek, David D.L. Bowtell, Marcel E. Dinger
DIMENSION REDUCTION OF GENOME-WIDE SEQUENCING DATA BASED ON LINKAGE DISEQUILIBRIUM STRUCTURE ..... 104
Yun Joo Yoo, Suh-Ryung Kim, Sun Ah Kim, Shelley B. Bull
THE MULTIPLE GENE ISOFORM TEST ..... 105
Yao Yu, Chad D. Huff

IMAGING GENOMICS ..... 106

GENETIC ANALYSIS OF CEREBRAL BLOOD FLOW IMAGING PHENOTYPES IN ALZHEIMER’S DISEASE ..... 107
Xiaohui Yao, Shannon L. Risacher, Kwangsik Nho, Andrew J. Saykin, Heng Huang, Ze Wang, Li Shen
PBRM1 MUTATIONS ARE ASSOCIATED WITH TISSUE MORPHOLOGICAL CHANGES IN KIDNEY CANCER ..... 108
Jun Cheng, Jie Zhang, Zhi Han, Liang Cheng, Qianjin Feng, Kun Huang
IMAGE GENOMICS OF INTRA-TUMOR HETEROGENEITY USING DEEP NEURAL NETWORKS ..... 109
Hui Qu, Subhajyoti De, Dimitris Metaxas
THE NEUROIMAGING INFORMATICS TOOLS AND RESOURCES COLLABORATORY (NITRC) AND ITS IMAGING GENOMICS DOMAIN ..... 110
Li Shen, David Kennedy, Christian Haselgrove, Abby Paulson, Nina Preuss, Robert Buccigrossi, Matthew Travers, Albert Crowley, and The NITRC Team
IDENTIFYING THE GIST OF CNNS: FINDING INTERPRETABLE SIGNATURES OF HISTOLOGY IMAGE MODELS BUILT USING NEURAL NETWORKS ..... 111
Arunima Srivastava, Chaitanya Kulkarni, Kun Huang, Parag Mallick, Raghu Machiraju

PRECISION MEDICINE: FROM DIPLOTYPES TO DISPARITIES TOWARDS IMPROVED HEALTH AND THERAPIES ..... 112

EXPLORING THE POTENTIAL OF EXOME SEQUENCING IN NEWBORN SCREENING ..... 113
Steven E. Brenner, Aashish N. Adhikari, Yaqiong Wang, Robert J. Currier, Renata C. Gallagher, Robert L. Nussbaum, Yangyun Zou, Uma Sunderam, Joseph Sheih, Flavia Chen, Mark Kvale, Sean D. Mooney, Raj Srinivasan, Barbara A. Koenig, Pui Kwok, Jennifer M. Puck, The NBSeq Project
A METHOD FOR IMPROVED VARIANT CALLING AT HOMOPOLYMER MARGINS (AND ELSEWHERE) ..... 114
J. Buckley, M. Hiemenz, J. Biegel, T. Triche, A. Ryutov, D. Maglinte, D. Ostrow, X. Gai
EFFICIENT SURVIVAL MULTIFACTOR DIMENSIONALITY REDUCTION METHOD FOR DETECTING GENE-GENE INTERACTION ..... 115
Jiang Gui, Xuemei Ji, Christopher I. Amos
BIOINFORMATICS PROCESSING STRATEGIES FOR EFFICIENT SEQUENCING DATA STORAGE USING GVCF BANDING ..... 116
Nicholas B. Larson, Shannon K. McDonnell, Iain F. Horton, Saurabh Baheti, Jeanette E. Eckel-Passow, Steven N. Hart
IDENTIFICATION OF A NOVEL TSC2 MUTATION IN A PATIENT WITH TUBEROUS SCLEROSIS COMPLEX ..... 117
Jae-Hyung Lee, Su-Kyeong Hwang, Jung-eun Yang, Chae-Seok Lim, Jin-A Lee, Kyungmin Lee, Bong-Kiun Kaang, Yong-Seok Lee
CONSIDERATIONS FOR AUTOMATED MACHINE LEARNING IN CLINICAL METABOLIC PROFILING: ALTERED HOMOCYSTEINE PLASMA CONCENTRATION ASSOCIATED WITH METFORMIN EXPOSURE ..... 118
Alena Orlenko, Jason H. Moore, Patryk Orzechowski, Randal S. Olson, Junmei Cairns, Pedro J. Caraballo, Richard M. Weinshilboum, Liewei Wang, Matthew K. Breitenstein
PHARMGKB: NEW WEBSITE RELEASE 2017 ..... 119
Michelle Whirl-Carrillo, Ryan M. Whaley, Mark Woon, Katrin Sangkuhi, Li Gong, Julia Barbarino, Caroline Thorn, Rachel Huddart, Maria Alvarellos, Jill Robinson, Russ B. Altman, Teri E. Klein

READING BETWEEN THE GENES: COMPUTATIONAL MODELS TO DISCOVER FUNCTION AND/OR CLINICAL UTILITY FROM NONCODING DNA ..... 120

NETWORK ANALYSIS OF PSEUDOGENE-GENE RELATIONSHIPS: FROM PSEUDOGENE EVOLUTION TO THEIR FUNCTIONAL POTENTIALS ..... 121
Travis S. Johnson, Sihong Li, Johnathan R. Kho, Kun Huang, Yan Zhang
RANDOM WALKS ON MUTUAL MICRORNA-TARGET GENE INTERACTION NETWORK IMPROVE THE PREDICTION OF DISEASE-ASSOCIATED MICRORNAS ..... 122
Duc-Hau Le, Lieven Verbeke, Le Hoang Son, Dinh-Toi Chu, Van-Huy Pham

TEXT MINING AND VISUALIZATION FOR PRECISION MEDICINE ..... 123

MINING ELECTRONIC HEALTH RECORDS FOR PATIENT-CENTERED OUTCOMES TO GUIDE TREATMENT PATHWAY DECISIONS FOLLOWING PROSTATE CANCER DIAGNOSIS ..... 124
Selen Bozkurt, Jung In Park, Daniel L. Rubin, James D. Brooks, Tina Hernandez-Boussard
GDMINER: A BIO TEXT MINING SYSTEM FOR GENE-DISEASE RELATION ANALYSIS ..... 125
Soo Jun Park, Jihyun Kim, Soo Young Cho, Charny Park, Young Seek Lee

WORKSHOP ..... 126

MACHINE LEARNING AND DEEP ANALYTICS FOR BIOCOMPUTING: CALL FOR BETTER EXPLAINABILITY ..... 126

METHODS FOR EXAMINING DATA QUALITY IN HEALTHCARE INTEGRATED DATA REPOSITORIES ..... 127
Vojtech Huser, Michael G. Kahn, Jeffrey S. Brown, Ramkiran Gouripeddi
MULTI-CLASS CLASSIFICATION STRATEGY FOR SUPPORT VECTOR MACHINES USING WEIGHTED VOTING AND VOTING DROP ..... 128
Sungho Kim, Taehun Kim
A TOPOLOGY-BASED APPROACH TO QUANTIFY NETWORK PERTURBATION SCORES FOR ASSESSMENT OF DIFFERENT TOBACCO PRODUCT CLASSES ..... 129
Quynh T. Tran, Lee Larcombe, Subhashini Arimilli, G.L. Prasad

AUTHOR INDEX ..... 130
1
APPLICATIONS OF GENETICS, GENOMICS AND BIOINFORMATICS IN DRUG DISCOVERY
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
2
CHARACTERIZATION OF DRUG-INDUCED SPLICING COMPLEXITY IN PROSTATE CANCER CELL LINE USING LONG READ TECHNOLOGY
Xintong Chen¹, Sander Houten¹, Kimaada Allette¹, Robert P. Sebra¹, Gustavo Stolovitzky¹,², Bojan Losic¹
¹Icahn School of Medicine at Mount Sinai, ²IBM
Presenting author: Bojan Losic

We characterize the transcriptional splicing landscape of a prostate cancer cell line treated with a previously identified synergistic drug combination. We use a combination of third generation long-read RNA sequencing technology and short-read RNAseq to create a high-fidelity map of expressed isoforms and fusions to quantify splicing events triggered by treatment. We find strong evidence for drug-induced, coherent splicing changes which disrupt the function of oncogenic proteins, and detect novel transcripts arising from previously unreported fusion events.
3
CELL-SPECIFIC PREDICTION AND APPLICATION OF DRUG-INDUCED GENE EXPRESSION PROFILES
Rachel Hodos¹,², Ping Zhang³, Hao-Chih Lee¹, Qiaonan Duan¹, Zichen Wang¹, Neil R. Clark¹, Avi Ma’ayan¹, Fei Wang³,⁴, Brian Kidd¹, Jianying Hu³, David Sontag⁵, Joel T. Dudley¹
¹Icahn School of Medicine at Mount Sinai, ²New York University, ³IBM T.J. Watson Research Center, ⁴Cornell University, ⁵Massachusetts Institute of Technology
Presenting author: Rachel Hodos

Gene expression profiling of in vitro drug perturbations is useful for many biomedical discovery applications including drug repurposing and elucidation of drug mechanisms. However, limited data availability across cell types has hindered our capacity to leverage or explore the cell-specificity of these perturbations. While recent efforts have generated a large number of drug perturbation profiles across a variety of human cell types, many gaps remain in this combinatorial drug-cell space. Hence, we asked whether it is possible to fill these gaps by predicting cell-specific drug perturbation profiles using available expression data from related conditions, i.e. from other drugs and cell types. We developed a computational framework that first arranges existing profiles into a three-dimensional array (or tensor) indexed by drugs, genes, and cell types, and then uses either local (nearest-neighbors) or global (tensor completion) information to predict unmeasured profiles. We evaluate prediction accuracy using a variety of metrics, and find that the two methods have complementary performance, each superior in different regions in the drug-cell space. Predictions achieve correlations of 0.68 with true values, and maintain accurate differentially expressed genes (AUC 0.81). Finally, we demonstrate that the predicted profiles add value for making downstream associations with drug targets and therapeutic classes.
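
The local (nearest-neighbors) idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `knn_predict_profile`, the correlation-based similarity, the weighting scheme, and the toy data are all assumptions made for illustration. Profiles are stored in a drugs × genes × cell types array with NaN marking unmeasured profiles, and a missing profile is estimated from the drugs that behave most similarly in the other cell types.

```python
import numpy as np

def knn_predict_profile(tensor, drug, cell, k=2):
    """Estimate the unmeasured profile tensor[drug, :, cell].

    tensor has shape (n_drugs, n_genes, n_cells); unmeasured profiles are NaN.
    Similarity between drugs is the Pearson correlation of their profiles in
    the cell types where both are measured (an illustrative choice).
    """
    n_drugs, _, n_cells = tensor.shape
    scored = []
    for other in range(n_drugs):
        # A neighbor must itself be measured in the target cell type.
        if other == drug or np.isnan(tensor[other, :, cell]).any():
            continue
        shared = [c for c in range(n_cells)
                  if c != cell
                  and not np.isnan(tensor[drug, :, c]).any()
                  and not np.isnan(tensor[other, :, c]).any()]
        if not shared:
            continue
        a = tensor[drug][:, shared].ravel()
        b = tensor[other][:, shared].ravel()
        scored.append((np.corrcoef(a, b)[0, 1], other))
    if not scored:
        raise ValueError("no measured neighbors for this drug/cell pair")
    scored.sort(reverse=True)
    top = scored[:k]
    # Negative correlations get zero weight; fall back to a plain mean.
    weights = np.array([max(r, 0.0) for r, _ in top])
    if weights.sum() == 0:
        weights = np.ones(len(top))
    profiles = np.stack([tensor[o, :, cell] for _, o in top])
    return weights @ profiles / weights.sum()

# Toy data (hypothetical): drug 1 mirrors drug 0, drug 2 is its opposite.
rng = np.random.default_rng(0)
base = rng.normal(size=(5, 3))            # 5 genes x 3 cell types
tensor = np.stack([base, base, -base])    # 3 drugs
tensor[0, :, 2] = np.nan                  # drug 0 unmeasured in cell type 2
pred = knn_predict_profile(tensor, drug=0, cell=2, k=1)
```

A global method such as tensor completion would instead fit a low-rank model to all measured entries at once; the abstract reports that the two approaches are complementary.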
4
LARGE-SCALE INTEGRATION OF HETEROGENEOUS PHARMACOGENOMIC DATA FOR IDENTIFYING DRUG MECHANISM OF ACTION
Yunan Luo, Sheng Wang, Jinfeng Xiao, Jian Peng
University of Illinois at Urbana-Champaign
Yunan, Luo
A variety of large-scale pharmacogenomic data, such as perturbation experiments and sensitivity profiles, enable the systematic identification of drug mechanisms of action (MoAs), which is a crucial task in the era of precision medicine. However, integrating these complementary pharmacogenomic datasets is inherently challenging due to the wild heterogeneity, high dimensionality and noisy nature of these datasets. In this work, we develop Mania, a novel method for the scalable integration of large-scale pharmacogenomic data. Mania first constructs a drug-drug similarity network through integrating multiple heterogeneous data sources, including drug sensitivity, drug chemical structure, and perturbation assays. It then learns a compact vector representation for each drug to simultaneously encode its structural and pharmacogenomic properties. Extensive experiments demonstrate that Mania achieves substantially improved performance in both MoA and target prediction, compared to predictions based on individual data sources as well as a state-of-the-art integrative method. Moreover, Mania identifies drugs that target frequently mutated cancer genes, which provides novel insights into drug repurposing.
5
CHEMICAL REACTION VECTOR EMBEDDINGS: TOWARDS PREDICTING DRUG METABOLISM IN THE HUMAN GUT MICROBIOME
Emily K. Mallory1, Ambika Acharya1, Stefano E. Rensi1, Peter J. Turnbaugh2, Roselie A. Bright3, Russ B. Altman1
1Stanford University, 2University of California San Francisco, 3Food and Drug Administration
Emily, Mallory
Bacteria in the human gut have the ability to activate, inactivate, and reactivate drugs with both intended and unintended effects. For example, the drug digoxin is reduced to the inactive metabolite dihydrodigoxin by the gut Actinobacterium E. lenta, and patients colonized with high levels of drug-metabolizing strains may have limited response to the drug. Understanding the complete space of drugs that are metabolized by the human gut microbiome is critical for predicting bacteria-drug relationships and their effects on individual patient response. Discovery and validation of drug metabolism via bacterial enzymes has yielded >50 drugs after nearly a century of experimental research. However, there are limited computational tools for screening drugs for potential metabolism by the gut microbiome. We developed a pipeline for comparing and characterizing chemical transformations using continuous vector representations of molecular structure learned using unsupervised representation learning. We applied this pipeline to chemical reaction data from MetaCyc to characterize the utility of vector representations for chemical reaction transformations. After clustering molecular and reaction vectors, we performed enrichment analyses and queries to characterize the space. We detected enriched enzyme names, Gene Ontology terms, and Enzyme Commission (EC) classes within reaction clusters. In addition, we queried reactions against drug-metabolite transformations known to be metabolized by the human gut microbiome. The top results for these known drug transformations contained similar substructure modifications to the original drug pair. This work enables high-throughput screening of drugs and their resulting metabolites against chemical reactions common to gut bacteria.
6
EXTRACTING A BIOLOGICALLY RELEVANT LATENT SPACE FROM CANCER TRANSCRIPTOMES WITH VARIATIONAL AUTOENCODERS
Gregory P. Way, Casey S. Greene
University of Pennsylvania
Gregory, Way
The Cancer Genome Atlas (TCGA) has profiled over 10,000 tumors across 33 different cancer types for many genomic features, including gene expression levels. Gene expression measurements capture substantial information about the state of each tumor. Certain classes of deep neural network models are capable of learning a meaningful latent space. Such a latent space could be used to explore and generate hypothetical gene expression profiles under various types of molecular and genetic perturbation. For example, one might wish to use such a model to predict a tumor's response to specific therapies or to characterize complex gene expression activations existing in differential proportions in different tumors. Variational autoencoders (VAEs) are a deep neural network approach capable of generating meaningful latent spaces for image and text data. In this work, we sought to determine the extent to which a VAE can be trained to model cancer gene expression, and whether or not such a VAE would capture biologically relevant features. In the following report, we introduce a VAE trained on TCGA pan-cancer RNA-seq data, identify specific patterns in the VAE encoded features, and discuss potential merits of the approach. We name our method "Tybalt" after an instigative, cat-like character who sets a cascading chain of events in motion in Shakespeare's Romeo and Juliet. From a systems biology perspective, Tybalt could one day aid in cancer stratification or predict specific activated expression patterns that would result from genetic changes or treatment effects.
7
CHALLENGES OF PATTERN RECOGNITION IN BIOMEDICAL DATA ORAL PRESENTATION
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
8
LARGE-SCALE ANALYSIS OF DISEASE PATHWAYS IN THE HUMAN INTERACTOME
Monica Agrawal, Marinka Zitnik, Jure Leskovec
Stanford University
Marinka, Zitnik
Discovering disease pathways, which can be defined as sets of proteins associated with a given disease, is an important problem that has the potential to provide clinically actionable insights for disease diagnosis, prognosis, and treatment. Computational methods aid the discovery by relying on protein-protein interaction (PPI) networks. They start with a few known disease-associated proteins and aim to find the rest of the pathway by exploring the PPI network around the known disease proteins. However, the success of such methods has been limited, and failure cases have not been well understood. Here we study the PPI network structure of 519 disease pathways. We find that 90% of pathways do not correspond to single well-connected components in the PPI network. Instead, proteins associated with a single disease tend to form many separate connected components/regions in the network. We then evaluate state-of-the-art disease pathway discovery methods and show that their performance is especially poor on diseases with disconnected pathways. Thus, we conclude that network connectivity structure alone may not be sufficient for disease pathway discovery. However, we show that higher-order network structures, such as small subgraphs of the pathway, provide a promising direction for the development of new methods.
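The notion of a pathway splitting into separate connected components can be made concrete with a minimal sketch. It assumes the PPI network is given as an undirected edge list and restricts it to the disease proteins; the function name `pathway_components` and the toy protein names are hypothetical, and this is not the authors' implementation.

```python
from collections import deque


def pathway_components(ppi_edges, disease_proteins):
    """Connected components of the PPI subgraph induced by the disease proteins."""
    proteins = set(disease_proteins)
    # Keep only edges whose two endpoints are both disease proteins.
    neighbors = {p: set() for p in proteins}
    for a, b in ppi_edges:
        if a in proteins and b in proteins:
            neighbors[a].add(b)
            neighbors[b].add(a)
    components, seen = [], set()
    for start in sorted(proteins):
        if start in seen:
            continue
        # Breadth-first search from each protein not yet assigned to a component.
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in neighbors[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
        components.append(component)
    return components


# Toy network with hypothetical protein names (not real interaction data).
# X1 is not a disease protein, so the path through it does not connect P3 and P4.
toy_edges = [("P1", "P2"), ("P2", "P3"), ("P3", "X1"), ("X1", "P4"), ("P4", "P5")]
toy_pathway = ["P1", "P2", "P3", "P4", "P5", "P6"]
components = pathway_components(toy_edges, toy_pathway)
largest_fraction = max(len(c) for c in components) / len(toy_pathway)
```

In this toy case the pathway falls into three components, and the largest one covers half of the pathway proteins, which is the kind of fragmentation the abstract reports for most disease pathways.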
9
MAPPING PATIENT TRAJECTORIES USING LONGITUDINAL EXTRACTION AND DEEP LEARNING IN THE MIMIC-III CRITICAL CARE DATABASE
Brett K. Beaulieu-Jones, Patryk Orzechowski, Jason H. Moore
University of Pennsylvania
Brett, Beaulieu-Jones
Electronic Health Records (EHRs) contain a wealth of patient data useful to biomedical researchers. At present, both the extraction of data and methods for analyses are frequently designed to work with a single snapshot of a patient’s record. Health care providers often perform and record actions in small batches over time. By extracting these care events, a sequence can be formed providing a trajectory for a patient’s interactions with the health care system. These care events also offer a basic heuristic for the level of attention a patient receives from health care providers. We show that it is possible to learn meaningful embeddings from these care events using two deep learning techniques, unsupervised autoencoders and long short-term memory networks. We compare these methods to traditional machine learning methods which require a point-in-time snapshot to be extracted from an EHR.
10
AUTOMATED DISEASE COHORT SELECTION USING WORD EMBEDDINGS FROM ELECTRONIC HEALTH RECORDS
Benjamin S. Glicksberg, Riccardo Miotto, Kipp W. Johnson, Khader Shameer, Li Li, Rong Chen, Joel T. Dudley
Icahn School of Medicine at Mount Sinai
Benjamin, Glicksberg
Accurate and robust cohort definition is critical to biomedical discovery using Electronic Health Records (EHR). Similar to prospective study designs, high-quality EHR-based research requires rigorous selection criteria to designate case/control status particular to each disease. Electronic phenotyping algorithms, which are manually built and validated per disease, have been successful in filling this need. However, these approaches are time-consuming, so algorithms have been developed for only a relatively small number of diseases. Methodologies that automatically learn features from EHRs have been used for cohort selection as well. To date, however, there has been no systematic analysis of how these methods perform against current gold standards. Accordingly, this paper compares the performance of a state-of-the-art automated feature learning method in extracting research-grade cohorts for five diseases against their established electronic phenotyping algorithms. In particular, we use word2vec to create unsupervised embeddings of the phenotype space within an EHR system. Using medical concepts as a query, we then rank patients by their proximity in the embedding space and automatically extract putative disease cohorts via a distance threshold. Experimental evaluation shows promising results with average F-score of 0.57 and AUC-ROC of 0.98. However, we noticed that results varied considerably between diseases, thus necessitating further investigation and/or phenotype-specific refinement of the approach before being readily deployed across all diseases.
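The ranking-and-threshold step can be sketched as follows. This is a minimal illustration that assumes each patient is already summarized by one embedding vector and that proximity is measured by cosine distance; the abstract does not specify either choice, and `select_cohort`, the toy vectors, and the threshold value are hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; 0 means the two vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def select_cohort(query_vec, patient_vecs, threshold):
    """Rank patients by distance to the query and keep those within the threshold."""
    ranked = sorted(
        (cosine_distance(query_vec, vec), pid) for pid, vec in patient_vecs.items()
    )
    return [pid for dist, pid in ranked if dist <= threshold]


# Toy 2-D embeddings (assumed values for illustration only).
toy_query = [1.0, 0.0]
toy_patients = {
    "patient_a": [0.9, 0.1],
    "patient_b": [0.0, 1.0],
    "patient_c": [0.7, 0.7],
}
cohort = select_cohort(toy_query, toy_patients, threshold=0.3)
```

Here `patient_a` and `patient_c` fall within the threshold and `patient_b` does not. In practice the threshold would need to be tuned per disease, which is consistent with the variation between diseases noted in the abstract.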
11
FUNCTIONAL NETWORK COMMUNITY DETECTION CAN DISAGGREGATE AND FILTER MULTIPLE UNDERLYING PATHWAYS IN ENRICHMENT ANALYSES
Lia X. Harrington1, Gregory P. Way2, Jennifer A. Doherty3, Casey S. Greene2
1Geisel School of Medicine at Dartmouth, 2University of Pennsylvania, 3University of Utah
Lia, Harrington
Differential expression experiments or other analyses often end in a list of genes. Pathway enrichment analysis is one method to discern important biological signals and patterns from noisy expression data. However, pathway enrichment analysis may perform suboptimally in situations where there are multiple implicated pathways, such as in the case of genes that define subtypes of complex diseases. Our simulation study shows that in this setting, standard overrepresentation analysis identifies many false positive pathways along with the true positives. These false positives hamper investigators’ attempts to glean biological insights from enrichment analysis. We develop and evaluate an approach that combines community detection over functional networks with pathway enrichment to reduce false positives. Our simulation study demonstrates that a large reduction in false positives can be obtained with a small decrease in power. Though we hypothesized that multiple communities might underlie previously described subtypes of high-grade serous ovarian cancer and applied this approach, our results do not support this hypothesis. In summary, applying community detection before enrichment analysis may ease interpretation for complex gene sets that represent multiple distinct pathways.
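For readers unfamiliar with standard overrepresentation analysis, the usual test is a one-sided hypergeometric test on the overlap between a gene list and a pathway. The sketch below assumes that formulation; the abstract does not state which test the authors used, and the function name and numbers are illustrative.

```python
from math import comb


def overrepresentation_pvalue(n_background, n_pathway, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_background genes, n_pathway of which belong to the pathway."""
    total = comb(n_background, n_selected)
    upper = min(n_pathway, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy example: 10 background genes, a 5-gene pathway, and a 5-gene list
# that consists entirely of pathway genes.
p_value = overrepresentation_pvalue(10, 5, 5, 5)
```

With these toy numbers the p-value is 1/252, since only one of the 252 possible 5-gene lists lies entirely inside the pathway. The community detection approach described above would apply such a test separately to each detected gene community instead of to the full list.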
12
CAUSAL INFERENCE ON ELECTRONIC HEALTH RECORDS TO ASSESS BLOOD PRESSURE TREATMENT TARGETS: AN APPLICATION OF THE PARAMETRIC G FORMULA
Kipp W. Johnson1, Benjamin S. Glicksberg1, Rachel Hodos1,2, Khader Shameer1, Joel T. Dudley1
1Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai; 2Courant Institute of Mathematical Sciences, New York University
Kipp, Johnson
Hypertension is a major risk factor for ischemic cardiovascular disease and cerebrovascular disease, which are respectively the first and second most common causes of morbidity and mortality across the globe. To alleviate the risks of hypertension, there are a number of effective antihypertensive drugs available. However, the optimal treatment blood pressure goal for antihypertensive therapy remains an area of controversy. The results of the recent Systolic Blood Pressure Intervention Trial (SPRINT), which found benefits for intensive lowering of systolic blood pressure, have been debated for several reasons. We aimed to assess the benefits of treating to four different blood pressure targets and to compare our results to those of SPRINT using a method for causal inference called the parametric g formula. We applied this method to blood pressure measurements obtained from the electronic health records of approximately 200,000 patients who visited the Mount Sinai Hospital in New York, NY. We simulated the effect of four clinically relevant dynamic treatment regimes, assessing the effectiveness of treating to four different blood pressure targets: 150 mmHg, 140 mmHg, 130 mmHg, and 120 mmHg. In contrast to current American Heart Association guidelines and in concordance with SPRINT, we find that targeting 120 mmHg systolic blood pressure is significantly associated with decreased incidence of major adverse cardiovascular events. Causal inference methods applied to electronic health records are a powerful and flexible technique and medicine may benefit from their increased usage.
13
DATA-DRIVEN ADVICE FOR APPLYING MACHINE LEARNING TO BIOINFORMATICS PROBLEMS
Randal S. Olson, William La Cava, Zairah Mustahsan, Akshay Varik, Jason H. Moore
University of Pennsylvania
William, La Cava
As the bioinformatics field grows, it must keep pace not only with new data but with new algorithms. Here we contribute a thorough analysis of 13 state-of-the-art, commonly used machine learning algorithms on a set of 165 publicly available classification problems in order to provide data-driven algorithm recommendations to current researchers. We present a number of statistical and visual comparisons of algorithm performance and quantify the effect of model selection and algorithm tuning for each algorithm and dataset. The analysis culminates in the recommendation of five algorithms with hyperparameters that maximize classifier performance across the tested problems, as well as general guidelines for applying machine learning to supervised classification problems.
14
HOW POWERFUL ARE SUMMARY-BASED METHODS FOR IDENTIFYING EXPRESSION-TRAIT ASSOCIATIONS UNDER DIFFERENT GENETIC ARCHITECTURES?
Yogasudha C. Veturi, Marylyn D. Ritchie
Biomedical and Translational Informatics Institute, Geisinger
Yogasudha, Veturi
Transcriptome-wide association studies (TWAS) have recently been employed as an approach that can draw upon the advantages of genome-wide association studies (GWAS) and gene expression studies to identify genes associated with complex traits. Unlike standard GWAS, summary level data suffices for TWAS and offers improved statistical power. Two popular TWAS methods include either (a) imputing the cis genetic component of gene expression from smaller sized studies (using multi-SNP prediction or MP) into much larger effective sample sizes afforded by GWAS (TWAS-MP) or (b) using summary-based Mendelian randomization (TWAS-SMR). Although these methods have been effective at detecting functional variants, it remains unclear how extensive variability in the genetic architecture of complex traits and diseases impacts TWAS results. Our goal was to investigate the different scenarios under which these methods yielded enough power to detect significant expression-trait associations. In this study, we conducted extensive simulations based on 6000 randomly chosen, unrelated Caucasian males from Geisinger’s MyCode population to compare the power to detect cis expression-trait associations (within 500 kb of a gene) using the above-described approaches. To test TWAS across varying genetic backgrounds we simulated gene expression and phenotype using different quantitative trait loci per gene and cis-expression/trait heritability under genetic models that differentiate the effect of causality from that of pleiotropy. For each gene, on a training set ranging from 100 to 1000 individuals, we either (a) estimated regression coefficients with gene expression as the response using five different methods: LASSO, elastic net, Bayesian LASSO, Bayesian spike-slab, and Bayesian ridge regression or (b) performed eQTL analysis. We then sampled with replacement 50,000, 150,000, and 300,000 individuals respectively from the testing set of the remaining 5000 individuals and conducted GWAS on each set. Subsequently, we integrated the GWAS summary statistics derived from the testing set with the weights (or eQTLs) derived from the training set to identify expression-trait associations using (a) TWAS-MP, (b) TWAS-SMR, (c) eQTL-based GWAS, or (d) standalone GWAS. Finally, we examined the power to detect functionally relevant genes using the different approaches under the considered simulation scenarios. In general, we observed great similarities among TWAS-MP methods although the Bayesian methods resulted in improved power in comparison to LASSO and elastic net as the trait architecture grew more complex while training sample sizes and expression heritability remained small. Finally, we observed high power under causality but very low to moderate power under pleiotropy.
15
DEMOCRATIZING HEALTH DATA FOR TRANSLATIONAL RESEARCH
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
16
CLINGEN CANCER SOMATIC WORKING GROUP – STANDARDIZING AND DEMOCRATIZING ACCESS TO CANCER MOLECULAR DIAGNOSTIC DATA TO DRIVE TRANSLATIONAL RESEARCH
Subha Madhavan1, Deborah Ritter2, Christine Micheel3, Shruti Rao1, Angshumoy Roy2, Dmitriy Sonkin4, Matthew McCoy1, Malachi Griffith5, Obi L. Griffith5, Peter McGarvey1, Shashikant Kulkarni2, on behalf of the ClinGen Somatic Working Group
1Innovation Center for Biomedical Informatics, Georgetown University, Washington D.C.; 2Baylor College of Medicine and Texas Children's Hospital, Houston, TX; 3Vanderbilt University School of Medicine, Nashville, TN; 4National Cancer Institute, Rockville, MD; 5The McDonnell Genome Institute, Washington University, St. Louis, MO
Subha, Madhavan
A growing number of academic and community clinics are conducting genomic testing to inform treatment decisions for cancer patients. In the last 3-5 years, there has been a rapid increase in clinical use of next generation sequencing (NGS) based cancer molecular diagnostic (MolDx) testing. The increasing availability and decreasing cost of tumor genomic profiling means that physicians can now make treatment decisions armed with patient-specific genetic information. Accumulating research in the cancer biology field indicates that there is significant potential to improve cancer patient outcomes by effectively leveraging this rich source of genomic data in treatment planning. To achieve truly personalized medicine in oncology, it is critical to catalog cancer sequence variants from MolDx testing for their clinical relevance along with treatment information and patient outcomes, and to do so in a way that supports large-scale data aggregation and new hypothesis generation. One critical challenge to encoding variant data is adopting a standard of annotation of those variants that are clinically actionable. Through the NIH-funded Clinical Genome Resource (ClinGen), in collaboration with NLM’s ClinVar database and >50 academic and industry based cancer research organizations, we developed the Minimal Variant Level Data (MVLD) framework to standardize reporting and interpretation of drug associated alterations. We are currently involved in collaborative efforts to align the MVLD framework with parallel, complementary sequence variants interpretation clinical guidelines from the Association of Molecular Pathologists (AMP) for clinical labs. In order to truly democratize access to MolDx data for care and research needs, these standards must be harmonized to support sharing of clinical cancer variants. Here we describe the processes and methods developed within the ClinGen’s Somatic WG in collaboration with over 60 cancer care and research organizations as well as CLIA-certified, CAP-accredited clinical testing labs to develop standards for cancer variant interpretation and sharing.
Keywords: ClinGen, Somatic variants, predictive biomarkers, MVLD, data sharing
17
A HEURISTIC METHOD FOR SIMULATING OPEN-DATA OF ARBITRARY COMPLEXITY THAT CAN BE USED TO COMPARE AND EVALUATE MACHINE LEARNING METHODS
Jason H. Moore, Maksim Shestov, Peter Schmitt, Randal S. Olson
Institute for Biomedical Informatics, University of Pennsylvania
Jason, Moore
A central challenge of developing and evaluating artificial intelligence and machine learning methods for regression and classification is access to data that illuminates the strengths and weaknesses of different methods. Open data plays an important role in this process by making it easy for computational researchers to access real data for this purpose. Genomics has in some cases taken a leading role in the open data effort, starting with DNA microarrays. While real data from experimental and observational studies is necessary for developing computational methods, it is not sufficient. This is because it is not possible to know what the ground truth is in real data. Real data must be accompanied by simulated data where the balance between signal and noise is known and can be directly evaluated. Unfortunately, there is a lack of methods and software for simulating data with the kind of complexity found in real biological and biomedical systems. We present here the Heuristic Identification of Biological Architectures for simulating Complex Hierarchical Interactions (HIBACHI) method and prototype software for simulating complex biological and biomedical data. Further, we introduce new methods for developing simulation models that generate data that specifically allows discrimination between different machine learning methods.
18
BEST PRACTICES AND LESSONS LEARNED FROM REUSE OF 4 PATIENT-DERIVED METABOLOMICS DATASETS IN ALZHEIMER'S DISEASE
Jessica D. Tenenbaum, Colette Blach
Duke University
Jessica, Tenenbaum
The importance of open data has been increasingly recognized in recent years. Although the sharing and reuse of clinical data for translational research lags behind best practices in biological science, a number of patient-derived datasets exist and have been published, enabling translational research spanning multiple scales from molecular to organ level, and from patients to populations. In seeking to replicate metabolomic biomarker results in Alzheimer’s disease our team identified three independent cohorts in which to compare findings. Accessing the datasets associated with these cohorts, understanding their content and provenance, and comparing variables between studies was a valuable exercise in exploring the principles of open data in practice. It also helped inform steps taken to make the original datasets available for use by other researchers. In this paper we describe best practices and lessons learned in attempting to identify, access, understand, and analyze these additional datasets to advance research reproducibility, as well as steps taken to facilitate sharing of our own data.
19
IMAGING GENOMICS
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
20
DISCRIMINATIVE BAG-OF-CELLS FOR IMAGING-GENOMICS
Benjamin Chidester1, Minh N. Do2, Jian Ma1
1Carnegie Mellon University, 2University of Illinois at Urbana-Champaign
Benjamin, Chidester
Connecting genotypes to image phenotypes is crucial for a comprehensive understanding of cancer. To learn such connections, new machine learning approaches must be developed for the better integration of imaging and genomic data. Here we propose a novel approach called Discriminative Bag-of-Cells (DBC) for predicting genomic markers using imaging features, which addresses the challenge of summarizing histopathological images by representing cells with learned discriminative types, or codewords. We also developed a reliable and efficient patch-based nuclear segmentation scheme using convolutional neural networks from which nuclear and cellular features are extracted. Applying DBC on TCGA breast cancer samples to predict basal subtype status yielded a class-balanced accuracy of 70% on a separate test partition of 213 patients. As datasets of imaging and genomic data become increasingly available, we believe DBC will be a useful approach for screening histopathological images for genomic markers. Source code of nuclear segmentation and DBC are available at: https://github.com/bchidest/DBC.
21
DEEP INTEGRATIVE ANALYSIS FOR SURVIVAL PREDICTION
Chenglong Huang1, Albert Zhang2, Guanghua Xiao3
1Colleyville Heritage High School, 2Highland Park High School, 3University of Texas Southwestern Medical Center
Chenglong, Huang
Survival prediction is very important in medical treatment. However, recent leading research is challenged by two factors: 1) the datasets usually come with multi-modality; and 2) sample sizes are relatively small. To solve the above challenges, we developed a deep survival learning model to predict patients’ survival outcomes by integrating multi-view data. The proposed network contains two sub-networks, one view-specific and one common sub-network. We designated one CNN-based and one FCN-based sub-network to efficiently handle pathological images and molecular profiles, respectively. Our model first explicitly maximizes the correlation among the views and then transfers feature hierarchies from view commonality and specifically fine-tunes on the survival prediction task. We evaluate our method on real lung and brain tumor datasets to demonstrate the effectiveness of the proposed model using data with multiple modalities across different tumor types.
22
GENOTYPE-PHENOTYPE ASSOCIATION STUDY VIA NEW MULTI-TASK LEARNING MODEL
Zhouyuan Huo1, Dinggang Shen2, Heng Huang1
1University of Pittsburgh, 2University of North Carolina at Chapel Hill
Heng, Huang
Research on the associations between genetic variations and imaging phenotypes is developing with the advance in high-throughput genotype and brain image techniques. Regression analysis of single nucleotide polymorphisms (SNPs) and imaging measures as quantitative traits (QTs) has been proposed to identify the quantitative trait loci (QTL) via multi-task learning models. Recent studies consider the interlinked structures within SNPs and imaging QTs through group lasso, e.g. ℓ2,1-norm, leading to better predictive results and insights into SNPs. However, group sparsity is not enough for representing the correlation between multiple tasks and ℓ2,1-norm regularization is not robust either. In this paper, we propose a new multi-task learning model to analyze the associations between SNPs and QTs. We suppose that low-rank structure is also beneficial to uncover the correlation between genetic variations and imaging phenotypes. Finally, we conduct regression analysis of SNPs and QTs. Experimental results show that our model is more accurate in prediction than compared methods and presents new insights into SNPs.
23
CODON BIAS AMONG SYNONYMOUS RARE VARIANTS IS ASSOCIATED WITH ALZHEIMER’S DISEASE IMAGING BIOMARKER
Jason E. Miller1, Manu K. Shivakumar2, Shannon L. Risacher2, Andrew J. Saykin2, Seunggeun Lee3, Kwangsik Nho2, Dokyoon Kim1,4
1Geisinger Health System, 2Indiana University School of Medicine, 3University of Michigan, 4Pennsylvania State University
Jason, Miller
Alzheimer’s disease (AD) is a neurodegenerative disorder with few biomarkers even though it impacts a relatively large portion of the population and is predicted to affect significantly more individuals in the future. Neuroimaging has been used in concert with genetic information to improve our understanding of how AD arises and how it can potentially be diagnosed. Additionally, evidence suggests synonymous variants can have a functional impact on gene regulatory mechanisms, including those related to AD. Some synonymous codons are preferred over others, leading to a codon bias. The bias can arise with respect to codons that are more or less frequently used in the genome. A bias can also result from optimal and non-optimal codons, which have stronger and weaker codon-anticodon interactions, respectively. Although association tests have been utilized before to identify genes associated with AD, it remains unclear how codon bias plays a role and if it can improve rare variant analysis. In this work, rare variants from whole-genome sequencing from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort were binned into genes using BioBin. An association analysis of the genes with an AD-related neuroimaging biomarker was performed using SKAT-O. While using all synonymous variants we did not identify any genome-wide significant associations, using only synonymous variants that affected codon frequency we identified several genes as significantly associated with the imaging phenotype. Additionally, significant associations were found using only rare variants that contain an optimal codon among minor alleles and a non-optimal codon in the major allele. These results suggest that codon bias may play a role in AD and that it can be used to improve detection power in rare variant association analysis.
24
PRECISION MEDICINE: FROM DIPLOTYPES TO DISPARITIES TOWARDS IMPROVED HEALTH AND THERAPIES
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
25
SINGLE SUBJECT TRANSCRIPTOME ANALYSIS REPRODUCES SIGNED GENE SET FUNCTIONAL ACTIVATION SIGNALS FROM COHORT ANALYSIS OF MURINE RESPONSE TO HIGH FAT DIET
Joanne Berghout, Qike Li, Nima Pouladi, Jianrong Li, Yves A. Lussier
University of Arizona
Joanne, Berghout
Analysis of single-subject transcriptome response data is an unmet need of precision medicine, made challenging by the high dimension, dynamic nature and difficulty in extracting meaningful signals from biological or stochastic noise. We have proposed a method for single subject analysis that uses a mixture model for transcript fold-change clustering from isogenic paired samples, followed by integration of these distributions with Gene Ontology Biological Processes (GO-BP) to reduce dimension and identify functional attributes. We then extended these methods to develop functional signing metrics for gene set process regulation by incorporating biological repressor relationships encoded in GO as negatively_regulates edges. Results revealed reproducible and biologically meaningful signals from analysis of a single subject’s response, opening the door to future transcriptomic studies where subject and resource availability are currently limiting. We used inbred mouse strains fed different diets to provide isogenic biological replicates, permitting rigorous validation of our method. We compared significant genotype-specific GO-BP term results for overlap and rank order across three replicates per genotype, and cross-methods to reference standards (limma+FET, SAM+FET, and GSEA). All single-subject analytics findings were robust and highly reproducible (median area under the ROC curve = 0.96, n = 24 genotypes x 3 replicates), providing confidence and validation of this approach for analyses in single subjects. R code is available online at http://www.lussiergroup.org/publications/PathwayActivity
26
USING SIMULATION AND OPTIMIZATION APPROACH TO IMPROVE OUTCOME THROUGH WARFARIN PRECISION TREATMENT
Chih-Lin Chi1, Lu He2, Kourosh Ravvaz3, John Weissert3, Peter J. Tonellato4,5
1School of Nursing & Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA; 2Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA; 3Aurora Health Care, Milwaukee, WI, USA; 4Department of Biomedical Informatics, Department of Pathology, Harvard Medical School, Boston, MA, USA; 5Zilber School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, WI, USA
Chih-Lin, Chi
We apply a treatment simulation and optimization approach to develop decision support guidance for warfarin precision treatment plans. Simulations include the use of ~1,500,000 clinical avatars (simulated patients) generated by an integrated data-driven and domain-knowledge based Bayesian Network Modeling approach. Subsequently, we simulate 30-day individual patient response to warfarin treatment of five clinical and genetic treatment plans followed by both individual and sub-population based optimization. Sub-population optimization (compared to individual optimization) provides a cost-effective and realistic means of implementation of a precision-driven treatment plan in practical settings. In this project, we use the property of minimal entropy to minimize overall adverse risks for the largest possible patient sub-populations and we temper the results by considering both transparency and ease of implementation. Finally, we discuss the improved outcome of the precision treatment plan based on the sub-population optimized decision support rules.
27
COALITIONAL GAME THEORY AS A PROMISING APPROACH TO IDENTIFY CANDIDATE AUTISM GENES
Anika Gupta, Min Woo Sun, Kelley M. Paskov, Nate T. Stockham, Jae-Yoon Jung, Dennis P. Wall
Stanford University
Dennis, Wall
Despite mounting evidence for the strong role of genetics in the phenotypic manifestation of Autism Spectrum Disorder (ASD), the specific genes responsible for the variable forms of ASD remain undefined. ASD may be best explained by a combinatorial genetic model with varying epistatic interactions across many small effect mutations. Coalitional or cooperative game theory is a technique that studies the combined effects of groups of players, known as coalitions, seeking to identify players who tend to improve the performance (here, the relationship to a specific disease phenotype) of any coalition they join. This method has been previously shown to boost biologically informative signal in gene expression data but to date has not been applied to the search for cooperative mutations among putative ASD genes. We describe our approach to highlight genes relevant to ASD using coalitional game theory on alteration data of 1,965 fully sequenced genomes from 756 multiplex families. Alterations were encoded into binary matrices for ASD (case) and unaffected (control) samples, indicating likely gene-disrupting, inherited mutations in altered genes. To determine individual gene contributions given an ASD phenotype, a “player” metric, referred to as the Shapley value, was calculated for each gene in the case and control cohorts. Sixty-seven genes were found to have significantly elevated player scores and likely represent significant contributors to the genetic coordination underlying ASD. Using network and cross-study analysis, we found that these genes are involved in biological pathways known to be affected in the autism cases and that a subset directly interact with several genes known to have strong associations to autism. These findings suggest that coalitional game theory can be applied to large-scale genomic data to identify hidden yet influential players in complex polygenic disorders such as autism.
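As a reminder of what the Shapley value measures, the sketch below computes it exactly for a tiny game by averaging each player's marginal contribution over all orderings. This is the textbook definition, not the authors' procedure for genome-scale data (exact enumeration is infeasible for thousands of genes); the gene names and the toy value function are hypothetical.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for player in order:
            with_player = coalition | {player}
            # Marginal contribution of this player when joining the coalition.
            totals[player] += value(with_player) - value(coalition)
            coalition = with_player
    return {p: totals[p] / len(orderings) for p in players}


def toy_value(coalition):
    """Hypothetical game: the coalition 'scores' only if gene_a and gene_b co-occur."""
    return 1.0 if {"gene_a", "gene_b"} <= coalition else 0.0


scores = shapley_values(["gene_a", "gene_b", "gene_c"], toy_value)
```

In this toy game `gene_a` and `gene_b` each receive 0.5 and `gene_c` receives 0, reflecting that credit is shared between the two genes whose joint presence drives the outcome.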
28
CONSIDERATIONSFORAUTOMATEDMACHINELEARNINGINCLINICALMETABOLICPROFILING:ALTEREDHOMOCYSTEINEPLASMACONCENTRATION
ASSOCIATEDWITHMETFORMINEXPOSURE
AlenaOrlenko1,JasonH.Moore1,PatrykOrzechowski1,RandalS.Olson1,JunmeiCairns2,PedroJ.Caraballo2,RichardM.Weinshilboum2,LieweiWang2,MatthewK.Breitenstein1
1UniversityofPennsylvania,2MayoClinic
Matthew Breitenstein
With the maturation of metabolomics science and the proliferation of biobanks, clinical metabolic profiling is an increasingly opportunistic frontier for advancing translational clinical research. Automated Machine Learning (AutoML) approaches provide an exciting opportunity to guide feature selection in agnostic metabolic profiling endeavors, where potentially thousands of independent data points must be evaluated. In previous research, AutoML using high-dimensional data of varying types has been demonstrably robust, outperforming traditional approaches. However, considerations for application in clinical metabolic profiling remain to be evaluated, particularly regarding the robustness of AutoML in identifying and adjusting for common clinical confounders. In this study, we present a focused case study regarding AutoML considerations for using the Tree-based Pipeline Optimization Tool (TPOT) in metabolic profiling of exposure to metformin in a biobank cohort. First, we propose a tandem rank-accuracy measure to guide agnostic feature selection and corresponding threshold determination in clinical metabolic profiling endeavors. Second, while AutoML, using default parameters, showed a potential lack of sensitivity to low-effect confounding clinical covariates, we demonstrated residual training and adjustment of metabolite features as an easily applicable approach to ensure AutoML adjustment for potential confounding characteristics. Finally, we present increased homocysteine with long-term exposure to metformin as a potentially novel, non-replicated metabolite association suggested by TPOT; an association not identified in parallel clinical metabolic profiling endeavors. While warranting independent replication, our tandem rank-accuracy measure suggests homocysteine to be the metabolite feature with the largest effect, and corresponding priority for further translational clinical research. Residual training and adjustment for a potential confounding effect by BMI only slightly modified the suggested association. Increased homocysteine is thought to be associated with vitamin B12 deficiency – evaluation for potential clinical relevance is suggested. While considerations for clinical metabolic profiling are recommended, including adjustment approaches for clinical confounders, AutoML presents an exciting tool to enhance clinical metabolic profiling and advance translational research endeavors.
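The residual adjustment mentioned in this abstract can be illustrated with a minimal sketch. It assumes that "adjustment of metabolite features" means regressing each metabolite on the confounder (here BMI) and passing the residuals to the learner; that reading, the function name `residualize`, and the toy values are assumptions for illustration, not the authors' published procedure.

```python
from statistics import mean


def residualize(feature, confounder):
    """Return residuals of `feature` after a simple least-squares fit on `confounder`."""
    x_bar, y_bar = mean(confounder), mean(feature)
    sxx = sum((x - x_bar) ** 2 for x in confounder)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(confounder, feature))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # What remains is the part of the feature not linearly explained by the confounder.
    return [y - (intercept + slope * x) for x, y in zip(confounder, feature)]


# Hypothetical toy values, not data from the study.
bmi = [22.0, 25.0, 28.0, 31.0, 34.0]
homocysteine = [8.1, 9.0, 9.4, 10.6, 11.2]
adjusted_homocysteine = residualize(homocysteine, bmi)
```

The adjusted values are, by construction, uncorrelated with BMI in the sample, so a downstream model cannot use the metabolite as a proxy for a linear BMI effect.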
29
ADDRESSING VITAL SIGN ALARM FATIGUE USING PERSONALIZED ALARM THRESHOLDS
Sarah Poole, Nigam Shah
Stanford University
Sarah Poole
Alarm fatigue, a condition in which clinical staff become desensitized to alarms due to the high frequency of unnecessary alarms, is a major patient safety concern. Alarm fatigue is particularly prevalent in the pediatric setting, due to the high level of variation in vital signs with patient age. Existing studies have shown that the current default pediatric vital sign alarm thresholds are inappropriate, and lead to a larger than necessary alarm load. This study leverages a large database containing over 190 patient-years of heart rate data to accurately identify the 1st and 99th percentiles of an individual’s heart rate on their first day of vital sign monitoring. These percentiles are then used as personalized vital sign thresholds, which are evaluated by comparing to non-default alarm thresholds used in practice, and by using the presence of major clinical events to infer alarm labels. Using the proposed personalized thresholds would decrease low and high heart rate alarms by up to 50% and 44% respectively, while maintaining sensitivity of 62% and increasing specificity to 49%. The proposed personalized vital sign alarm thresholds will reduce alarm fatigue, thus contributing to improved patient outcomes, shorter hospital stays, and reduced hospital costs.
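As a rough illustration of percentile-based thresholds, the sketch below takes the 1st and 99th percentiles of a first-day heart rate series and counts how many readings would trigger an alarm. The function names, the synthetic readings, and the fixed default limits of 70 and 90 bpm are all hypothetical; the abstract does not describe how the study estimated its percentiles.

```python
from statistics import quantiles


def personalized_thresholds(first_day_hr):
    """Return (low, high) as the 1st and 99th percentiles of first-day heart rates."""
    cuts = quantiles(first_day_hr, n=100, method="inclusive")  # 99 cut points
    return cuts[0], cuts[98]


def count_alarms(heart_rates, low, high):
    """Count readings strictly outside the [low, high] band."""
    return sum(1 for hr in heart_rates if hr < low or hr > high)


# Synthetic readings: values cycling through 60..100 bpm plus a few outliers.
first_day = [60 + (i % 41) for i in range(1000)] + [40] * 3 + [150] * 3

low, high = personalized_thresholds(first_day)
personalized_alarm_count = count_alarms(first_day, low, high)
default_alarm_count = count_alarms(first_day, 70, 90)  # hypothetical fixed limits
```

On this synthetic series the personalized band only flags the injected outliers, while the fixed band flags a large share of ordinary readings.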
30
EMERGENCE OF PATHWAY-LEVEL COMPOSITE BIOMARKERS FROM CONVERGING GENE SET SIGNALS OF HETEROGENEOUS TRANSCRIPTOMIC RESPONSES
Samir Rachid Zaim, Qike Li, A. Grant Schissler, Yves A. Lussier
The University of Arizona
Yves Lussier
Recent precision medicine initiatives have led to the expectation of improved clinical decision-making anchored in genomic data science. However, over the last decade, only a handful of new single-gene product biomarkers have been translated to clinical practice (FDA approved) in spite of considerable discovery efforts deployed and a plethora of transcriptomes available in the Gene Expression Omnibus. With this modest outcome of current approaches in mind, we developed a pilot simulation study to demonstrate the untapped benefits of developing disease detection methods for cases where the true signal lies at the pathway level, even if the pathway’s gene expression alterations may be heterogeneous across patients. In other words, we relaxed the cross-patient homogeneity assumption from the transcript level (cohort assumptions of deregulated gene expression) to the pathway level (assumptions of deregulated pathway expression). Furthermore, we have expanded previous single-subject (SS) methods into cohort analyses to illustrate the benefit of accounting for an individual’s variability in cohort scenarios. We compare SS and cohort-based (CB) techniques under 54 distinct scenarios, each with 1,000 simulations, to demonstrate that the emergence of a pathway-level signal occurs through the summative effect of its altered gene expression, heterogeneous across patients. Studied variables include pathway gene set size, fraction of expressed gene responsive within gene set, fraction of expressed gene responsive up- vs down-regulated, and cohort size. We demonstrated that our SS approach was uniquely suited to detect signals in heterogeneous populations in which individuals have varying levels of baseline risks that are simultaneously confounded by patient-specific “genome-by-environment” interactions (G×E). Area under the precision-recall curve of the SS approach far surpassed that of the CB (1st quartile, median, 3rd quartile: SS = 0.94, 0.96, 0.99; CB = 0.50, 0.52, 0.65). We conclude that single-subject pathway detection methods are uniquely suited for consistently detecting pathway dysregulation by the inclusion of a patient’s individual variability. http://www.lussiergroup.org/publications/PathwayMarker/
31
ANALYZING METABOLOMICS DATA FOR ASSOCIATION WITH GENOTYPES USING TWO-COMPONENT GAUSSIAN MIXTURE DISTRIBUTIONS
Jason Westra1,2, Nicholas Hartman3, Bethany Lake4, Gregory Shearer5, Nathan Tintle1
1Dordt College, 2Iowa State University, 3Cornell University, 4Elon University, 5The Pennsylvania State University
Jason Westra
Standard approaches to evaluate the impact of single nucleotide polymorphisms (SNP) on quantitative phenotypes use linear models. However, these normal-based approaches may not optimally model phenotypes which are better represented by Gaussian mixture distributions (e.g., some metabolomics data). We develop a likelihood ratio test on the mixing proportions of two-component Gaussian mixture distributions and consider more restrictive models to increase power in light of a priori biological knowledge. Data was simulated to validate the improved power of the likelihood ratio test and the restricted likelihood ratio test over a linear model and a log transformed linear model. Then, using real data from the Framingham Heart Study, we analyzed 20,315 SNPs on chromosome 11, demonstrating that the proposed likelihood ratio test identifies SNPs well known to participate in the desaturation of certain fatty acids. Our study both validates the approach of increasing power by using the likelihood ratio test that leverages Gaussian mixture models, and creates a model with improved sensitivity and interpretability.
32
READING BETWEEN THE GENES: COMPUTATIONAL MODELS TO DISCOVER FUNCTION AND/OR CLINICAL UTILITY FROM NONCODING DNA
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
33
CONVERGENT DOWNSTREAM CANDIDATE MECHANISMS OF INDEPENDENT INTERGENIC POLYMORPHISMS BETWEEN CO-CLASSIFIED DISEASES IMPLICATE EPISTASIS AMONG NONCODING ELEMENTS
Jiali Han1, Jianrong Li1, Ikbel Achour1, Lorenzo Pesce2, Ian Foster2, Haiquan Li3, Yves A. Lussier3
1Center for Biomedical Informatics and Biostatistics (CB2) and Departments of Medicine and of Systems and Industrial Engineering, The University of Arizona, Tucson, AZ 85721, USA; 2Computation Institute, Argonne National Laboratory and University of Chicago, Chicago, IL 60637, USA; 3CB2, BIO5 Institute, UACC, and Dept of Medicine, The University of Arizona, Tucson, AZ 85721, USA
Haiquan Li
Eighty percent of DNA outside protein coding regions was shown to be biochemically functional by the ENCODE project, enabling studies of their interactions. Studies have since explored how convergent downstream mechanisms arise from independent genetic risks of one complex disease. However, the cross-talk and epistasis between intergenic risks associated with distinct complex diseases have not been comprehensively characterized. Our recent integrative genomic analysis unveiled downstream biological effectors of disease-specific polymorphisms buried in intergenic regions, and we then validated their genetic synergy and antagonism in distinct GWAS. We extend this approach to characterize convergent downstream candidate mechanisms of distinct intergenic SNPs across distinct diseases within the same clinical classification. We construct a multipartite network consisting of 467 diseases organized in 15 classes, 2,358 disease-associated SNPs, 6,301 SNP-associated mRNAs by eQTL, and mRNA annotations to 4,538 Gene Ontology mechanisms. Functional similarity between two SNPs (similar SNP pairs) is imputed using a nested information theoretic distance model for which p-values are assigned by conservative scale-free permutation of network edges without replacement (node degrees constant). At FDR≤5%, we prioritized 3,870 associated intergenic SNP pairs, among which 755 are associated with distinct diseases sharing the same disease class, implicating 167 intergenic SNPs, 14 classes, 230 mRNAs, and 134 GO terms. Co-classified SNP pairs were more likely to be prioritized as compared to those of distinct classes, confirming a noncoding genetic underpinning to clinical classification (odds ratio ~3.8; p≤10E-25). The prioritized pairs were also enriched in regions bound to the same/interacting transcription factors and/or interacting in long-range chromatin interactions suggestive of epistasis (odds ratio ~2,500; p≤10E-25). This prioritized network implicates complex epistasis between intergenic polymorphisms of co-classified diseases and offers a roadmap for a novel therapeutic paradigm: repositioning medications that target proteins within downstream mechanisms of intergenic disease-associated SNPs. Supplementary information and software: http://lussiergroup.org/publications/disease_class
34
NETWORK ANALYSIS OF PSEUDOGENE-GENE RELATIONSHIPS: FROM PSEUDOGENE EVOLUTION TO THEIR FUNCTIONAL POTENTIALS
Travis S. Johnson1, Sihong Li1, Johnathan R. Kho2, Kun Huang3, Yan Zhang1
1Ohio State University, 2Georgia Institute of Technology, 3Indiana University
Travis Johnson
Pseudogenes are fossil relatives of genes. Pseudogenes have long been thought of as “junk DNAs”, since they do not code for proteins in normal tissues. Although most of the human pseudogenes do not have noticeable functions, ~20% of them exhibit transcriptional activity. There has been evidence showing that some pseudogenes adopted functions as lncRNAs and work as regulators of gene expression. Furthermore, pseudogenes can even be “reactivated” in some conditions, such as cancer initiation. Some pseudogenes are transcribed in specific cancer types, and some are even translated into proteins as observed in several cancer cell lines. All the above have shown that pseudogenes could have functional roles or potentials in the genome. Evaluating the relationships between pseudogenes and their gene counterparts could help us reveal the evolutionary path of pseudogenes and associate pseudogenes with functional potentials. It also provides an insight into the regulatory networks involving pseudogenes with transcriptional and even translational activities. In this study, we develop a novel approach integrating graph analysis, sequence alignment and functional analysis to evaluate pseudogene-gene relationships, and apply it to human gene homologs and pseudogenes. We generated a comprehensive set of 445 pseudogene-gene (PGG) families from the original 3,281 gene families (13.56%). Of these, 438 (98.4% of PGG families, 13.3% of all gene families) were non-trivial (containing more than one pseudogene). Each PGG family contains multiple genes and pseudogenes with high sequence similarity. For each family, we generate a sequence alignment network and phylogenetic trees recapitulating the evolutionary paths. We find evidence supporting the evolution history of the olfactory family (both genes and pseudogenes) in human, which also supports the validity of our analysis method. Next, we evaluate these networks with respect to the gene ontology, from which we identify functions enriched in these pseudogene-gene families and infer the functional impact of pseudogenes involved in the networks. This demonstrates the application of our PGG network database in the study of pseudogene function in a disease context.
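The family counts quoted in this abstract can be checked with a few lines of arithmetic. The variable names are illustrative; only the three counts come from the abstract.

```python
# Counts as reported in the abstract.
total_gene_families = 3281
pgg_families = 445
nontrivial_pgg_families = 438


def percent(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


pgg_share = percent(pgg_families, total_gene_families)
nontrivial_share_of_pgg = percent(nontrivial_pgg_families, pgg_families)
nontrivial_share_of_all = percent(nontrivial_pgg_families, total_gene_families)
```

Rounded to the precision used in the text, these reproduce the reported 13.56%, 98.4%, and 13.3%.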
35
LEVERAGING PUTATIVE ENHANCER-PROMOTER INTERACTIONS TO INVESTIGATE TWO-WAY EPISTASIS IN TYPE 2 DIABETES GWAS
Elisabetta Manduchi1,2, Alessandra Chesi2, Molly A. Hall1, Struan F.A. Grant2, Jason H. Moore1
1University of Pennsylvania, 2The Children’s Hospital of Philadelphia
Elisabetta Manduchi
We utilized evidence for enhancer-promoter interactions from functional genomics data in order to build biological filters to narrow down the search space for two-way Single Nucleotide Polymorphism (SNP) interactions in Type 2 Diabetes (T2D) Genome Wide Association Studies (GWAS). This has led us to the identification of a reproducible statistically significant SNP pair associated with T2D. As more functional genomics data are being generated that can help identify potentially interacting enhancer-promoter pairs in larger collections of tissues/cells, this approach has implications for investigation of epistasis from GWAS in general.
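To make the idea of a biological filter concrete, the sketch below compares an exhaustive two-way SNP search with one restricted to SNP pairs that span an interacting enhancer-promoter pair. The region identifiers, SNP names, and data layout are hypothetical; the abstract does not specify how the authors encoded their filters.

```python
from itertools import combinations, product


def exhaustive_pairs(snps):
    """All unordered SNP pairs: the unfiltered two-way search space."""
    return list(combinations(sorted(snps), 2))


def filtered_pairs(enhancer_snps, promoter_snps, interactions):
    """Keep only pairs with one SNP in an enhancer and one in its interacting promoter."""
    pairs = set()
    for enhancer_id, promoter_id in interactions:
        for snp_a, snp_b in product(enhancer_snps[enhancer_id], promoter_snps[promoter_id]):
            pairs.add(tuple(sorted((snp_a, snp_b))))
    return sorted(pairs)


# Hypothetical toy annotation: SNPs grouped by the regulatory region they fall in.
enhancer_snps = {"E1": ["snp1", "snp2"], "E2": ["snp3"]}
promoter_snps = {"P1": ["snp4"], "P2": ["snp5", "snp6"]}
interactions = [("E1", "P1"), ("E2", "P2")]  # putative enhancer-promoter contacts

all_snps = [s for group in (enhancer_snps, promoter_snps) for snps in group.values() for s in snps]
search_space = exhaustive_pairs(all_snps)
candidate_pairs = filtered_pairs(enhancer_snps, promoter_snps, interactions)
```

Even in this toy case the filter cuts the number of interaction tests from 15 to 4, which is the kind of reduction in multiple-testing burden the abstract relies on.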
36
TEXT MINING AND VISUALIZATION FOR PRECISION MEDICINE
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
37
IMPROVING PRECISION IN CONCEPT NORMALIZATION
Mayla Boguslav, K. Bretonnel Cohen, William A. Baumgartner Jr., Lawrence E. Hunter
Computational Bioscience Program, University of Colorado School of Medicine
Mayla Boguslav
Most natural language processing applications exhibit a trade-off between precision and recall. In some use cases for natural language processing, there are reasons to prefer to tilt that trade-off toward high precision. Relying on the Zipfian distribution of false positive results, we describe a strategy for increasing precision, using a variety of both pre-processing and post-processing methods. They draw on both knowledge-based and frequentist approaches to modeling language. Based on an existing high-performance biomedical concept recognition pipeline and a previously published manually annotated corpus, we apply this hybrid rationalist/empiricist strategy to concept normalization for eight different ontologies. Which approaches did and did not improve precision varied widely between the ontologies.
38
VISAGE: INTEGRATING EXTERNAL KNOWLEDGE INTO ELECTRONIC MEDICAL RECORD VISUALIZATION
Edward W. Huang, Sheng Wang, ChengXiang Zhai
University of Illinois at Urbana-Champaign
Edward Huang
In this paper, we present VisAGE, a method that visualizes electronic medical records (EMRs) in a low-dimensional space. Effective visualization of new patients allows doctors to view similar, previously treated patients and to identify the new patients' disease subtypes, reducing the chance of misdiagnosis. However, EMRs are typically incomplete or fragmented, resulting in patients who are missing many available features being placed near unrelated patients in the visualized space. VisAGE integrates several external data sources to enrich EMR databases to solve this issue. We evaluated VisAGE on a dataset of Parkinson's disease patients. We qualitatively and quantitatively show that VisAGE can more effectively cluster patients, which allows doctors to better discover patient subtypes and thus improve patient care.
39
ANNOTATING GENE SETS BY MINING LARGE LITERATURE COLLECTIONS WITH PROTEIN NETWORKS
Sheng Wang1, Jianzhu Ma2, Michael Ku Yu2, Fan Zheng2, Edward W. Huang1, Jiawei Han1, Jian Peng1, Trey Ideker2
1Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA, 2School of Medicine, University of California San Diego, San Diego, CA, USA
Jianzhu Ma
Analysis of patient genomes and transcriptomes routinely recognizes new gene sets associated with human disease. Here we present an integrative natural language processing system which infers common functions for a gene set through automatic mining of the scientific literature with biological networks. This system links genes with associated literature phrases and combines these links with protein interactions in a single heterogeneous network. Multiscale functional annotations are inferred based on network distances between phrases and genes and then visualized as an ontology of biological concepts. To evaluate this system, we predict functions for gene sets representing known pathways and find that our approach achieves substantial improvement over the conventional text-mining baseline method. Moreover, our system discovers novel annotations for gene sets or pathways without previously known functions. Two case studies demonstrate how the system is used in discovery of new cancer-related pathways with ontological annotations.
40
APPLICATIONS OF GENETICS, GENOMICS AND BIOINFORMATICS IN DRUG DISCOVERY
PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS
41
PREDICTION OF PROTEIN-LIGAND INTERACTIONS FROM PAIRED PROTEIN SEQUENCE MOTIFS AND LIGAND SUBSTRUCTURES
Peyton Greenside, Maureen Hillenmeyer, Anshul Kundaje
Stanford University
Peyton Greenside
Identification of small molecule ligands that bind to proteins is a critical step in drug discovery. Computational methods have been developed to accelerate the prediction of protein-ligand binding, but often depend on 3D protein structures. As only a limited number of protein 3D structures have been resolved, the ability to predict protein-ligand interactions without relying on a 3D representation would be highly valuable. We use an interpretable confidence-rated boosting algorithm to predict protein-ligand interactions with high accuracy from ligand chemical substructures and protein 1D sequence motifs, without relying on 3D protein structures. We compare several protein motif definitions, assess generalization of our model’s predictions to unseen proteins and ligands, demonstrate recovery of well established interactions and identify globally predictive protein-ligand motif pairs. By bridging biological and chemical perspectives, we demonstrate that it is possible to predict protein-ligand interactions using only motif-based features and that interpretation of these features can reveal new insights into the molecular mechanics underlying each interaction. Our work also lays a foundation to explore more predictive feature sets and sophisticated machine learning approaches as well as other applications, such as predicting unintended interactions or the effects of mutations.
42
LOSS-OF-FUNCTION OF NEUROPLASTICITY-RELATED GENES CONFERS RISK FOR HUMAN NEURODEVELOPMENTAL DISORDERS
Milo R. Smith, Benjamin S. Glicksberg, Li Li, Rong Chen, Hirofumi Morishita, Joel T. Dudley
Icahn School of Medicine at Mount Sinai
Milo Smith
High and increasing prevalence of neurodevelopmental disorders places enormous personal and economic burdens on society. Given the growing realization that the roots of neurodevelopmental disorders often lie in early childhood, there is an urgent need to identify childhood risk factors. Neurodevelopment is marked by periods of heightened experience-dependent neuroplasticity wherein neural circuitry is optimized by the environment. If these critical periods are disrupted, development of normal brain function can be permanently altered, leading to neurodevelopmental disorders. Here, we aim to systematically identify human variants in neuroplasticity-related genes that confer risk for neurodevelopmental disorders. Historically, this knowledge has been limited by a lack of techniques to identify genes related to neurodevelopmental plasticity in a high-throughput manner and a lack of methods to systematically identify mutations in these genes that confer risk for neurodevelopmental disorders. Using an integrative genomics approach, we determined loss-of-function (LOF) variants in putative plasticity genes, identified from transcriptional profiles of brain from mice with elevated plasticity, that were associated with neurodevelopmental disorders. From five shared differentially expressed genes found in two mouse models of juvenile-like elevated plasticity (juvenile wild-type or adult Lynx1-/- relative to adult wild-type) that were also genotyped in the Mount Sinai BioMe Biobank, we identified multiple associations between LOF genes and increased risk for neurodevelopmental disorders across 10,510 patients linked to the Mount Sinai Electronic Medical Records (EMR), including epilepsy and schizophrenia. This work demonstrates a novel approach to identify neurodevelopmental risk genes and points toward a promising avenue to discover new drug targets to address the unmet therapeutic needs of neurodevelopmental disease.
43
DIFFUSION MAPPING OF DRUG TARGETS ON DISEASE SIGNALING NETWORK ELEMENTS REVEALS DRUG COMBINATION STRATEGIES
Jielin Xu1, Kelly Regan1, Siyuan Deng1, William E. Carson III2, Philip R.O. Payne3, Fuhai Li1
1Department of Biomedical Informatics, The Ohio State University; 2Comprehensive Cancer Center, The Ohio State University; 3Institute for Informatics, Washington University in St. Louis
Fuhai Li
The emergence of drug resistance to traditional chemotherapy and newer targeted therapies in cancer patients is a major clinical challenge. Reactivation of the same or compensatory signaling pathways is a common class of drug resistance mechanisms. Employing drug combinations that inhibit multiple modules of reactivated signaling pathways is a promising strategy to overcome and prevent the onset of drug resistance. However, with thousands of available FDA-approved and investigational compounds, it is infeasible to experimentally screen millions of possible drug combinations with limited resources. Therefore, computational approaches are needed to constrain the search space and prioritize synergistic drug combinations for preclinical studies. In this study, we propose a novel approach for predicting drug combinations through investigating potential effects of drug targets on the disease signaling network. We first construct a disease signaling network by integrating gene expression data with disease-associated driver genes. Individual drugs that can partially perturb the disease signaling network are then selected based on a drug-disease network “impact matrix”, which is calculated using network diffusion distance from drug targets to signaling network elements. The selected drugs are subsequently clustered into communities (subgroups), which are proposed to share similar mechanisms of action. Finally, drug combinations are ranked according to maximal impact on signaling sub-networks from distinct mechanism-based communities. Our method is advantageous compared to other approaches in that it does not require large amounts of drug dose response data, drug-induced “omics” profiles or clinical efficacy data, which are not often readily available. We validate our approach using a BRAF-mutant melanoma signaling network and combinatorial in vitro drug screening data, and report drug combinations with diverse mechanisms of action and opportunities for drug repositioning.
44
CHALLENGES OF PATTERN RECOGNITION IN BIOMEDICAL DATA
PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS
45
OWL-NETS: TRANSFORMING OWL REPRESENTATIONS FOR IMPROVED NETWORK INFERENCE
Tiffany J. Callahan1, William A. Baumgartner Jr.1, Michael Bada1, Adrianne L. Stefanski1, Ignacio Tripodi2, Elizabeth K. White1, Lawrence E. Hunter1
1University of Colorado Denver Anschutz Medical Campus, 2University of Colorado Boulder
Tiffany Callahan
Our knowledge of the biological mechanisms underlying complex human disease is largely incomplete. While Semantic Web technologies, such as the Web Ontology Language (OWL), provide powerful techniques for representing existing knowledge, well-established OWL reasoners are unable to account for missing or uncertain knowledge. The application of inductive inference methods, like machine learning and network inference, is vital for extending our current knowledge. Therefore, robust methods which facilitate inductive inference on rich OWL-encoded knowledge are needed. Here, we propose OWL-NETS (NEtwork Transformation for Statistical learning), a novel computational method that reversibly abstracts OWL-encoded biomedical knowledge into a network representation tailored for network inference. Using several examples built with the Open Biomedical Ontologies, we show that OWL-NETS can leverage existing ontology-based knowledge representations and network inference methods to generate novel, biologically-relevant hypotheses. Further, the lossless transformation of OWL-NETS allows for seamless integration of inferred edges back into the original knowledge base, extending its coverage and completeness.
46
AN ULTRA-FAST AND SCALABLE QUANTIFICATION PIPELINE FOR TRANSPOSABLE ELEMENTS FROM NEXT GENERATION SEQUENCING DATA
Hyun-Hwan Jeong, Hari Krishna Yalamanchili, Caiwei Guo, Joshua M. Shulman, Zhandong Liu
College of Medicine, Jan and Dan Duncan Neurological Research Institute
Hyun-Hwan Jeong
Transposable elements (TEs) are DNA sequences which are capable of moving from one location to another and represent a large proportion (45%) of the human genome. TEs have functional roles in a variety of biological phenomena such as cancer, neurodegenerative disease, and aging. Rapid development in RNA-sequencing technology has enabled us, for the first time, to study the activity of TE at the systems level. However, efficient TE analysis tools are not yet developed. In this work, we developed SalmonTE, a fast and reliable pipeline for the quantification of TEs from RNA-seq data. We benchmarked our tool against TEtranscripts, a widely used TE quantification method, and three other quantification methods using several RNA-seq datasets from Drosophila melanogaster and human cell-line. We achieved 20 times faster execution speed without compromising the accuracy. This pipeline will enable the biomedical research community to quantify and analyze TEs from large amounts of data and lead to novel TE centric discoveries.
47
IMPROVING THE EXPLAINABILITY OF RANDOM FOREST CLASSIFIER – USER CENTERED APPROACH
Dragutin Petkovic1,3, Russ B. Altman2, Mike Wong3, Arthur Vigil4
1Computer Science Department, San Francisco State University (SFSU), 1600 Holloway Ave., San Francisco CA 94132, [email protected]; 2Department of Bioengineering, Stanford University, 443 Via Ortega Drive, Stanford, CA 94305-4145; 3SFSU Center for Computing for Life Sciences, 1600 Holloway Ave., San Francisco, CA 94132; 4Twist Bioscience, 455 Mission Bay Boulevard South, San Francisco, CA 94158
Dragutin Petkovic
Machine Learning (ML) methods are now influencing major decisions about patient care, new medical methods, and drug development, and their use and importance are rapidly increasing in all areas. However, these ML methods are inherently complex and often difficult to understand and explain, resulting in barriers to their adoption and validation. Our work (RFEX) focuses on enhancing Random Forest (RF) classifier explainability by developing easy to interpret explainability summary reports from trained RF classifiers as a way to improve the explainability for (often non-expert) users. RFEX is implemented and extensively tested on Stanford FEATURE data where RF is tasked with predicting functional sites in 3D molecules based on their electrochemical signatures (features). In developing the RFEX method we apply a user-centered approach driven by explainability questions and requirements collected by discussions with interested practitioners. We performed formal usability testing with 13 expert and non-expert users to verify RFEX usefulness. Analysis of the RFEX explainability report and user feedback indicates its usefulness in significantly increasing explainability and user confidence in RF classification on FEATURE data. Notably, RFEX summary reports easily reveal that one needs very few (from 2-6 depending on a model) top ranked features to achieve 90% or better of the accuracy when all 480 features are used.
Keywords: Random Forest, Explainability, Interpretability, Stanford FEATURE
48
TREE-BASED METHODS FOR CHARACTERIZING TUMOR DENSITY HETEROGENEITY
Katherine Shoemaker1, Brian P. Hobbs2, Karthik Bharath3, Chaan S. Ng2, Veerabhadran Baladandayuthapani2
1Rice University, 2MD Anderson Cancer Center, 3University of Nottingham
Katherine Shoemaker
Solid lesions emerge within diverse tissue environments, making their characterization and diagnosis a challenge. With the advent of cancer radiomics, a variety of techniques have been developed to transform images into quantifiable feature sets producing summary statistics that describe the morphology and texture of solid masses. Relying on empirical distribution summaries as well as grey-level co-occurrence statistics, several approaches have been devised to characterize tissue density heterogeneity. This article proposes a novel decision-tree based approach which quantifies the tissue density heterogeneity of a given lesion through its resultant distribution of tree-structured dissimilarity metrics computed with least common ancestor trees under repeated pixel re-sampling. The methodology, based on statistics derived from Galton-Watson trees, produces metrics that are minimally correlated with existing features, adding new information to the feature space and improving quantitative characterization of the extent to which a CT image conveys heterogeneous density distribution. We demonstrate its practical application through a diagnostic study of adrenal lesions. Integrating the proposed with existing features identifies classifiers of three important lesion types: malignant from benign (AUC = 0.78), functioning from non-functioning (AUC = 0.93) and calcified from non-calcified (AUC of 1).
49
DEMOCRATIZING HEALTH DATA FOR TRANSLATIONAL RESEARCH
PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS
50
IDENTIFYING NATURAL HEALTH PRODUCT AND DIETARY SUPPLEMENT INFORMATION WITHIN ADVERSE EVENT REPORTING SYSTEMS
Vivekanand Sharma, Indra Neil Sarkar
Center for Biomedical Informatics, Brown University
Vivekanand Sharma
Data on safety and efficacy issues associated with natural health products and dietary supplements (NHP&S) remains largely cloistered within domain specific databases or embedded within general biomedical data sources. A major challenge in leveraging analytic approaches on such data is due to the inefficient ability to retrieve relevant data, which includes a general lack of interoperability among related sources. This study developed a thesaurus of NHP&S ingredient terms that can be used by existing biomedical natural language processing (NLP) tools for extracting information of interest. This process was evaluated relative to intervention name strings sampled from the United States Food and Drug Administration Adverse Event Reporting System (FAERS). A use case was used to demonstrate the potential to utilize FAERS for monitoring NHP&S adverse events. The results from this study provide insights on approaches for identifying additional knowledge from extant repositories of knowledge, and potentially as information that can be included into larger curation efforts.
51
DEMOCRATIZING DATA SCIENCE THROUGH DATA SCIENCE TRAINING
John Darrell Van Horn1, Lily Fierro2, Jeana Kamdar1, Jonathan Gordon2, Crystal Stewart1, Avnish Bhattrai1, Sumiko Abe1, Xiaoxiao Lei1, Caroline O’Driscoll1, Aakanchha Sinha2, Priyambada Jain2, Gully Burns2, Kristina Lerman2, José Luis Ambite2
1USC Mark and Mary Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, 2025 Zonal Avenue, SHN, Los Angeles, CA 90033, Phone: 323-442-7246; 2Information Sciences Institute, University of Southern California, Marina del Rey, CA, USA
John Van Horn
The biomedical sciences have experienced an explosion of data which promises to overwhelm many current practitioners. Without easy access to data science training resources, biomedical researchers may find themselves unable to wrangle their own datasets. In 2014, to address the challenges posed by such a data onslaught, the National Institutes of Health (NIH) launched the Big Data to Knowledge (BD2K) initiative. To this end, the BD2K Training Coordinating Center (TCC; bigdatau.org) was funded to facilitate both in-person and online learning, and open up the concepts of data science to the widest possible audience. Here, we describe the activities of the BD2K TCC and its focus on the construction of the Educational Resource Discovery Index (ERuDIte), which identifies, collects, describes, and organizes online data science materials from BD2K awardees, open online courses, and videos from scientific lectures and tutorials. ERuDIte now indexes over 9,500 resources. Given the richness of online training materials and the constant evolution of biomedical data science, computational methods applying information retrieval, natural language processing, and machine learning techniques are required, in effect using data science to inform training in data science. In so doing, the TCC seeks to democratize novel insights and discoveries brought forth via large-scale data science training.
52
IMAGING GENOMICS
PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS
53
HERITABILITY ESTIMATES ON RESTING STATE FMRI DATA USING THE ENIGMA ANALYSIS PIPELINE
Bhim M. Adhikari1, Neda Jahanshad2, Dinesh Shukla1, David C. Glahn3, John Blangero4, Richard C. Reynolds5, Robert W. Cox5, Els Fieremans6, Jelle Veraart6, Dmitry S. Novikov6, Thomas E. Nichols7, L. Elliot Hong1, Paul M. Thompson2, Peter Kochunov1
1Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, MD, USA; 2Imaging Genetics Center, Stevens Institute for Neuroimaging & Informatics, Keck School of Medicine of USC, Marina del Rey, CA, USA; 3Department of Psychiatry, Yale University, School of Medicine, New Haven, CT, USA; 4Genomics Computing Center, University of Texas at Rio Grande Valley, USA; 5National Institute of Mental Health, Bethesda, MD, USA; 6Center for Biomedical Imaging, Department of Radiology, New York University School of Medicine, NY, USA; 7Department of Statistics, University of Warwick, Coventry, CV4 7AL, UK
Peter Kochunov
Big data initiatives such as the Enhancing NeuroImaging Genetics through Meta-Analysis consortium (ENIGMA) combine data collected by independent studies worldwide to achieve more generalizable estimates of effect sizes and more reliable and reproducible outcomes. Such efforts require harmonized image analysis protocols to extract phenotypes consistently. This harmonization is particularly challenging for resting state fMRI due to the wide variability of acquisition protocols and scanner platforms; this leads to site-to-site variance in quality, resolution and temporal signal-to-noise ratio (tSNR). An effective harmonization should provide optimal measures for data of different qualities. We developed a multi-site rsfMRI analysis pipeline to allow research groups around the world to process rsfMRI scans in a harmonized way, to extract consistent and quantitative measurements of connectivity and to perform coordinated statistical tests. We used the single-modality ENIGMA rsfMRI preprocessing pipeline based on model-free Marchenko-Pastur PCA based denoising to verify and replicate resting state network heritability estimates. We analyzed two independent cohorts, GOBS (Genetics of Brain Structure) and HCP (the Human Connectome Project), which collected data using conventional and connectomics oriented fMRI protocols, respectively. We used seed-based connectivity and dual-regression approaches to show that the rsfMRI signal is consistently heritable across twenty major functional network measures. Heritability values of 20-40% were observed across both cohorts.
54
MRI TO MGMT: PREDICTING METHYLATION STATUS IN GLIOBLASTOMA PATIENTS USING CONVOLUTIONAL RECURRENT NEURAL NETWORKS
Lichy Han, Maulik R. Kamdar
Program in Biomedical Informatics, Stanford University School of Medicine
Lichy Han
Glioblastoma Multiforme (GBM), a malignant brain tumor, is among the most lethal of all cancers. Temozolomide is the primary chemotherapy treatment for patients diagnosed with GBM. The methylation status of the promoter or the enhancer regions of the O6-methylguanine methyltransferase (MGMT) gene may impact the efficacy and sensitivity of temozolomide, and hence may affect overall patient survival. Microscopic genetic changes may manifest as macroscopic morphological changes in the brain tumors that can be detected using magnetic resonance imaging (MRI), which can serve as noninvasive biomarkers for determining methylation of MGMT regulatory regions. In this research, we use a compendium of brain MRI scans of GBM patients collected from The Cancer Imaging Archive (TCIA) combined with methylation data from The Cancer Genome Atlas (TCGA) to predict the methylation state of the MGMT regulatory regions in these patients. Our approach relies on a bi-directional convolutional recurrent neural network architecture (CRNN) that leverages the spatial aspects of these 3-dimensional MRI scans. Our CRNN obtains an accuracy of 67% on the validation data and 62% on the test data, with precision and recall both at 67%, suggesting the existence of MRI features that may complement existing markers for GBM patient stratification and prognosis. We have additionally presented our model via a novel neural network visualization platform, which we have developed to improve interpretability of deep learning MRI-based classification models.
55
BUILDING TRANS-OMICS EVIDENCE: USING IMAGING AND ‘OMICS’ TO CHARACTERIZE CANCER PROFILES
Arunima Srivastava1, Chaitanya Kulkarni1, Parag Mallick2, Kun Huang3, Raghu Machiraju1
1The Ohio State University, 2Stanford University, 3Indiana University School of Medicine
Arunima Srivastava
Utilization of single modality data to build predictive models in cancer results in a rather narrow view of most patient profiles. Some clinical facets relate strongly to histology image features (e.g. tumor stages), whereas others are associated with genomic and proteomic variations (e.g. cancer subtypes and disease aggression biomarkers). We hypothesize that there are coherent “trans-omics” features that characterize varied clinical cohorts across multiple sources of data, leading to more descriptive and robust disease characterization. In this work, for 105 breast cancer patients from the TCGA (The Cancer Genome Atlas), we consider four clinical attributes (AJCC Stage, Tumor Stage, ER-Status and PAM50 mRNA Subtypes), and build predictive models using three different modalities of data (histopathological images, transcriptomics and proteomics). Following this, we identify critical multi-level features that drive successful classification of patients for the various different cohorts. To build predictors for each data type, we employ widely used “best practice” techniques including CNN-based (convolutional neural network) classifiers for histopathological images and regression models for proteogenomic data. While, as expected, histology images outperformed molecular features when predicting cancer stages, and transcriptomics held superior discriminatory power for ER-Status and PAM50 subtypes, there exist a few cases where all data modalities exhibited comparable performance. Further, we also identified sets of key genes and proteins whose expression and abundance correlate across each clinical cohort including (i) tumor severity and progression (incl. GABARAP), (ii) ER-status (incl. ESR1) and (iii) disease subtypes (incl. FOXC1). Thus, we quantitatively assess the efficacy of different data types to predict critical breast cancer patient attributes and improve disease characterization.
56
PRECISION MEDICINE: FROM DIPLOTYPES TO DISPARITIES TOWARDS IMPROVED HEALTH AND THERAPIES
PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS
57
LOCAL ANCESTRY TRANSITIONS MODIFY SNP-TRAIT ASSOCIATIONS
Alexandra E. Fish1, Dana C. Crawford2, John A. Capra1, William S. Bush2
1Vanderbilt University, 2Case Western Reserve University
William, Bush
Genomic maps of local ancestry identify ancestry transitions – points on a chromosome where recent recombination events in admixed individuals have joined two different ancestral haplotypes. These events bring together alleles that evolved within separate continental populations, providing a unique opportunity to evaluate the joint effect of these alleles on health outcomes. In this work, we evaluate the impact of genetic variants in the context of nearby local ancestry transitions within a sample of nearly 10,000 adults of African ancestry with traits derived from electronic health records. Genetic data was located using the Metabochip, and used to derive local ancestry. We develop a model that captures the effect of both single variants and local ancestry, and use it to identify examples where local ancestry transitions significantly interact with nearby variants to influence metabolic traits. In our most compelling example, we find that the minor allele of rs16890640 occurring on a European background with a downstream local ancestry transition to African ancestry results in significantly lower mean corpuscular hemoglobin and volume. This finding represents a new way of discovering genetic interactions, and is supported by molecular data that suggest changes to local ancestry may impact local chromatin looping.
58
EVALUATION OF PREDIXCAN FOR PRIORITIZING GWAS ASSOCIATIONS AND PREDICTING GENE EXPRESSION
Binglan Li1, Shefali S. Verma1,2, Yogasudha C. Veturi2, Anurag Verma1,2, Yuki Bradford2, David W. Haas3,4, Marylyn D. Ritchie1,2
1The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA; 2Biomedical and Translational Informatics Institute, Danville, PA, USA; 3Department of Medicine, Pharmacology, Pathology, Microbiology & Immunology, Vanderbilt University School of Medicine, Nashville, TN, USA; 4Department of Internal Medicine, Meharry Medical College, Nashville, TN, USA
Binglan, Li
Genome-wide association studies (GWAS) have been successful in facilitating the understanding of genetic architecture behind human diseases, but this approach faces many challenges. To identify disease-related loci with modest to weak effect size, GWAS requires very large sample sizes, which can be computationally burdensome. In addition, the interpretation of discovered associations remains difficult. PrediXcan was developed to help address these issues. With built-in SNP-expression models, PrediXcan is able to predict the expression of genes that are regulated by putative expression quantitative trait loci (eQTLs), and these predicted expression levels can then be used to perform gene-based association studies. This approach reduces the multiple testing burden from millions of variants down to several thousand genes. But most importantly, the identified associations can reveal the genes that are under regulation of eQTLs and consequently involved in disease pathogenesis. In this study, two of the most practical functions of PrediXcan were tested: 1) predicting gene expression, and 2) prioritizing GWAS results. We tested the prediction accuracy of PrediXcan by comparing the predicted and observed gene expression levels, and also looked into some potential influential factors and a filter criterion with the aim of improving PrediXcan performance. As for GWAS prioritization, predicted gene expression levels were used to obtain gene-trait associations, and background regions of significant associations were examined to decrease the likelihood of false positives. Our results showed that 1) PrediXcan predicted gene expression levels accurately for some but not all genes; 2) including more putative eQTLs into prediction did not improve the prediction accuracy; and 3) integrating predicted gene expression levels from the two PrediXcan whole blood models did not eliminate false positives. Still, PrediXcan was able to prioritize GWAS associations that were below the genome-wide significance threshold in GWAS, while retaining GWAS significant results. This study suggests several ways to consider PrediXcan’s performance that will be of value to eQTL and complex human disease research.
59
READING BETWEEN THE GENES: COMPUTATIONAL MODELS TO DISCOVER FUNCTION AND/OR CLINICAL UTILITY FROM NONCODING DNA
PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS
60
PAN-CANCER ANALYSIS OF EXPRESSED SOMATIC NUCLEOTIDE VARIANTS IN LONG INTERGENIC NON-CODING RNA
Travers Ching1,2, Lana X. Garmire1,2
1Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, HI 96822, USA; 2Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI 96813, USA
Lana, Garmire
Long intergenic non-coding RNAs have been shown to play important roles in cancer. However, because lincRNAs are a relatively new class of RNAs compared to protein-coding mRNAs, the mutational landscape of lincRNAs has not been as extensively studied. Here we characterize expressed somatic nucleotide variants within lincRNAs using 12 cancer RNA-Seq datasets in TCGA. We build machine-learning models to discriminate somatic variants from germline variants within lincRNA regions (AUC 0.987). We build another model to differentiate lincRNA somatic mutations from background regions (AUC 0.72) and find several molecular features that are strongly associated with lincRNA mutations, including copy number variation, conservation, substitution type and histone marker features.
61
TEXT MINING AND VISUALIZATION FOR PRECISION MEDICINE
PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS
62
GENEDIVE: A GENE INTERACTION SEARCH AND VISUALIZATION TOOL TO FACILITATE PRECISION MEDICINE
Paul Previde1, Brook Thomas1, Mike Wong1, Emily K. Mallory2, Dragutin Petkovic1, Russ B. Altman2, Anagha Kulkarni1
1San Francisco State University, 2Stanford University
Anagha, Kulkarni
Obtaining relevant information about gene interactions is critical for understanding disease processes and treatment. With the rise in text mining approaches, the volume of such biomedical data is rapidly increasing, thereby creating a new problem for the users of this data: information overload. A tool for efficient querying and visualization of biomedical data that helps researchers understand the underlying biological mechanisms for diseases and drug responses, and ultimately helps patients, is sorely needed. To this end we have developed GeneDive, a web-based information retrieval, filtering, and visualization tool for large volumes of gene interaction data. GeneDive offers various features and modalities that guide the user through the search process to efficiently reach the information of their interest. GeneDive currently processes over three million gene-gene interactions with response times within a few seconds. For over half of the curated gene sets sourced from four prominent databases, more than 80% of the gene set members are recovered by GeneDive. In the near future, GeneDive will seamlessly accommodate other interaction types, such as gene-drug and gene-disease interactions, thus enabling full exploration of topics such as precision medicine. The GeneDive application and information about its underlying system architecture are available at http://www.genedive.net.
63
APPLICATIONS OF GENETICS, GENOMICS AND BIOINFORMATICS IN DRUG DISCOVERY
POSTER PRESENTATIONS
64
CELL-SPECIFIC PREDICTION AND APPLICATION OF DRUG-INDUCED GENE EXPRESSION PROFILES
Rachel Hodos1,2, Ping Zhang3, Hao-Chih Lee1, Qiaonan Duan1, Zichen Wang1, Neil R. Clark1, Avi Ma’ayan1, Fei Wang3,4, Brian Kidd1, Jianying Hu3, David Sontag5, Joel T. Dudley1
1Icahn School of Medicine at Mount Sinai, 2New York University, 3IBM T.J. Watson Research Center, 4Cornell University, 5Massachusetts Institute of Technology
Rachel, Hodos
Gene expression profiling of in vitro drug perturbations is useful for many biomedical discovery applications including drug repurposing and elucidation of drug mechanisms. However, limited data availability across cell types has hindered our capacity to leverage or explore the cell-specificity of these perturbations. While recent efforts have generated a large number of drug perturbation profiles across a variety of human cell types, many gaps remain in this combinatorial drug-cell space. Hence, we asked whether it is possible to fill these gaps by predicting cell-specific drug perturbation profiles using available expression data from related conditions, i.e. from other drugs and cell types. We developed a computational framework that first arranges existing profiles into a three-dimensional array (or tensor) indexed by drugs, genes, and cell types, and then uses either local (nearest-neighbors) or global (tensor completion) information to predict unmeasured profiles. We evaluate prediction accuracy using a variety of metrics, and find that the two methods have complementary performance, each superior in different regions in the drug-cell space. Predictions achieve correlations of 0.68 with true values, and maintain accurate differentially expressed genes (AUC 0.81). Finally, we demonstrate that the predicted profiles add value for making downstream associations with drug targets and therapeutic classes.
65
SYSTEMATIC DISCOVERY OF GENOMIC MARKERS FOR CLINICAL OUTCOMES THROUGH COMBINED ANALYSIS OF CLINICAL AND GENOMIC DATA
Jinho Kim1, Hongui Cha2, Hyun-Tae Shin2, Boram Lee2, Jae Won Yun2, Joon Ho Kang3, Woong-Yang Park1
1Samsung Medical Center, 2Sungkyunkwan University, 3Sungkyunkwan University School of Medicine
Jinho, Kim
Molecular profiling is a key component of precision medicine for cancer, as it provides targetable genes or pathways to prevent tumor growth. In this regard, more and more cancer clinics employ clinical sequencing platforms and are accumulating clinicogenomics data. However, it has not been systematically studied how genomic alterations, in particular variants in DNA, can help in predicting clinical outcomes. Here we describe systematic analyses to gain biological insights from a cancer genome databank associated with the clinical information. We established a large databank of clinical and genomic information through our NGS-based clinical sequencing platform, CancerSCAN. We identified novel clinically relevant variant markers which are potentially implicated in patient survival and response to chemotherapeutic agents. Finally, we build a multigene model to predict clinical outcome. The model correctly captured clinically relevant somatic variants and was validated using an independent cohort. Our study provides a valuable resource to realize precision oncology.
66
IDENTIFICATION OF A PREDICTIVE GENE SIGNATURE FOR DIFFERENTIATING THE EFFECTS OF CIGARETTE SMOKING
Gang Liu1, Justin Li2, G.L. Prasad1
1RAI Services Company, P.O. Box 1487, Winston-Salem, NC 27102, USA; 2AccuraScience, 5721 Merle Hay Road, Johnston, IA 50131, USA
Background: Chronic cigarette smoking adversely impacts multiple organs and is a major risk factor for several diseases such as cancer, cardiovascular diseases, and chronic obstructive pulmonary disease (COPD). Because smoking-related diseases often develop over a long period, it is useful to investigate the effects of smoking in healthy individuals to understand the pre-clinical changes that lead to disease states. Those early molecular events could be further developed into biomarkers that are indicative of the adverse effects of smoking. Several classes of different tobacco products, including electronic cigarettes (E-cigs), are currently marketed in the USA, and their impact on consumers has not yet been fully understood. Given that there is no epidemiology data available for these new classes of tobacco products, an understanding of the early molecular and cellular changes in healthy consumers could help to differentiate the effects of cigarettes and other classes of tobacco products. Toward that end, in this study, we aim to develop predictive gene signatures that can be used to differentiate smokers from non-tobacco consumers.
Methods: The data we used for identification of gene signatures were derived from blood-based genome-wide expression profiles from 40 smokers and 40 non-tobacco consumers enrolled in a cross-sectional biomarker study. We systematically evaluated the performance of several machine learning algorithms. These algorithms are combinations of four classification methods, including Support Vector Machine (SVM), and four feature selection methods including Recursive Feature Elimination (RFE). Each gene expression signature model was constructed using a two-layer cross-validation scheme. They were evaluated using accuracy and Matthews correlation coefficient (MCC), which are performance evaluation metrics widely used in machine learning techniques.
Results: Our results suggest that SVM combined with RFE outperforms the 15 other algorithms we have tested. This led to identification of a 32-gene signature with high sensitivity and specificity. In addition, this new gene signature achieves excellent validation results (accuracy: 0.87, MCC: 0.7) when evaluated using another independent microarray dataset from smokers and non-smokers. The genes in the 32-gene signature include previously reported gene biomarkers such as GPR15, SASH1, and LRRN3, and also consist of additional novel genes associated with inflammation, liver injury, and arachidonic acid metabolism. We are currently working to further refine and validate this gene signature using other publicly available smoking-related gene expression datasets and the polymerase chain reaction-based assay.
Conclusions: We have described a high-performing 32-gene signature that enables prediction of molecular changes in healthy smokers. This gene signature could aid in differentiating the effects of additional classes of tobacco products such as E-cigs.
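The two evaluation metrics named in the Methods, accuracy and MCC, can be computed from a binary confusion matrix. The following is a minimal sketch, not the authors' code; the function names and the toy labels are illustrative assumptions, with 1 standing for a smoker and 0 for a non-tobacco consumer.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded as 1 and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of samples classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical labels, for illustration only (not data from the study).
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(accuracy(*counts), mcc(*counts))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the two classes are of unequal size.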
67
THE EXTREME MEMORY® CHALLENGE: A SEARCH FOR THE HERITABLE FOUNDATIONS OF EXCEPTIONAL MEMORY
Mary A. Pyc, Douglas Fenger, Philip Cheung, J. Steven de Belle, Tim Tully
Dart NeuroScience
Douglas, Fenger
We are interested in discovering candidate targets for drug therapies to enhance cognitive vitality in humans throughout life, and to remediate memory deficits associated with brain injury and brain-related diseases such as Alzheimer’s and Parkinson’s diseases. We implemented a Genome-Wide Association Study (GWAS) to identify genetic loci varying among individuals who possess exceptional and normal memory abilities. These genes and those in associated networks will inform drug discovery and development. Our first step is to identify exceptional members of the population. Thus, we have created an online memory test – the Extreme Memory Challenge (XMC, accessible at http://www.extremememorychallenge.com) – to screen through an unlimited number of subjects to find individuals with exceptional memory consolidation abilities. A subset of subjects were validated by a battery of secondary memory tasks and provided saliva samples from which we can isolate DNA for GWAS. To date, 26,348 participants from 187 nations have been screened (with 16,486 completing both sessions). The sample is primarily Caucasian (58%) and post-secondary school-educated (64%), with an average age of 34 years and equal numbers of each gender. The average forgetting rate across sessions was 10%. The secondary screening involved memory, IQ, attentional control, and personality measures. Analyses are underway to determine the relationship between exceptional memory and genetics.
68
EXTRACTING A BIOLOGICALLY RELEVANT LATENT SPACE FROM CANCER TRANSCRIPTOMES WITH VARIATIONAL AUTOENCODERS
Gregory P. Way, Casey S. Greene
Genomics and Computational Biology Graduate Program, Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104 USA
Gregory, Way
The Cancer Genome Atlas (TCGA) has profiled over 10,000 tumors across 33 different cancer-types for many genomic features, including gene expression levels. Gene expression measurements capture substantial information about the state of each tumor. Certain classes of deep neural network models are capable of learning a meaningful latent space. Such a latent space could be used to explore and generate hypothetical gene expression profiles under various types of molecular and genetic perturbation. For example, one might wish to use such a model to predict a tumor's response to specific therapies or to characterize complex gene expression activations existing in differential proportions in different tumors. Variational autoencoders (VAEs) are a deep neural network approach capable of generating meaningful latent spaces for image and text data. In this work, we sought to determine the extent to which a VAE can be trained to model cancer gene expression, and whether or not such a VAE would capture biologically-relevant features. In the following report, we introduce a VAE trained on TCGA pan-cancer RNA-seq data, identify specific patterns in the VAE encoded features, and discuss potential merits of the approach. We name our method "Tybalt" after an instigative, cat-like character who sets a cascading chain of events in motion in Shakespeare's Romeo and Juliet. From a systems biology perspective, Tybalt could one day aid in cancer stratification or predict specific activated expression patterns that would result from genetic changes or treatment effects.
69
CHALLENGES OF PATTERN RECOGNITION IN BIOMEDICAL DATA ORAL PRESENTATION
POSTER PRESENTATIONS
70
LARGE-SCALE ANALYSIS OF DISEASE PATHWAYS IN THE HUMAN INTERACTOME
Monica Agrawal1, Marinka Zitnik1, Jure Leskovec1,2
1Department of Computer Science, Stanford University; 2Chan Zuckerberg Biohub, San Francisco, CA
Marinka, Zitnik
Discovering disease pathways, which can be defined as sets of proteins associated with a given disease, is an important problem that has the potential to provide clinically actionable insights for disease diagnosis, prognosis, and treatment. Computational methods aid the discovery by relying on protein-protein interaction (PPI) networks. They start with a few known disease-associated proteins and aim to find the rest of the pathway by exploring the PPI network around the known disease proteins. However, the success of such methods has been limited, and failure cases have not been well understood. Here we study the PPI network structure of 519 disease pathways. We find that 90% of pathways do not correspond to single well-connected components in the PPI network. Instead, proteins associated with a single disease tend to form many separate connected components/regions in the network. We then evaluate state-of-the-art disease pathway discovery methods and show that their performance is especially poor on diseases with disconnected pathways. Thus, we conclude that network connectivity structure alone may not be sufficient for disease pathway discovery. However, we show that higher-order network structures, such as small subgraphs of the pathway, provide a promising direction for the development of new methods.
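The observation that pathway proteins split into several connected components can be checked on any interaction network by restricting the graph to the pathway members and running a graph traversal. The sketch below assumes an undirected edge list and uses made-up protein names; it illustrates the measurement only and is not the authors' analysis pipeline.

```python
from collections import deque


def pathway_components(edges, pathway):
    """Connected components of the PPI subgraph induced by the pathway proteins.

    Only edges with both endpoints in the pathway are kept, so two pathway
    proteins linked solely through a non-pathway protein end up in
    different components.
    """
    members = set(pathway)
    adjacency = {protein: set() for protein in members}
    for u, v in edges:
        if u in members and v in members and u != v:
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen = set()
    components = []
    for start in sorted(members):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:  # breadth-first search from an unvisited protein
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return components


# Hypothetical toy network: protein C is not part of the pathway.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F"), ("D", "G")]
pathway = ["A", "B", "D", "E", "F"]
components = pathway_components(edges, pathway)
largest_fraction = max(len(c) for c in components) / len(pathway)
print(components, largest_fraction)
```

In this toy case the pathway falls into three components, and the largest one covers only 40% of the pathway proteins, which is the kind of fragmentation the abstract reports.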
71
PROFILING OF SOMATIC ALTERATIONS IN BRCA1-LIKE BREAST TUMORS
Youdinghuan Chen1,2,3, Yue Wang3,4, Lucas A. Salas1, Todd W. Miller3,7, Jonathan D. Marotti5, Nicole P. Jenkins2, Arminja N. Kettenbach2,3,7, Chao Cheng3,4,7, Brock C. Christensen1,3,7
1Department of Epidemiology, 2Department of Biochemistry and Cell Biology, 3Department of Molecular and Systems Biology, 4Department of Genetics, 5Department of Pathology and Laboratory Medicine, 6Department of Biomedical Data Science at Geisel School of Medicine, Dartmouth, Lebanon, NH 03756; 7Norris Cotton Cancer Center, Dartmouth-Hitchcock Medical Center, Lebanon, NH 03756
Youdinghuan, Chen
Germline or somatic mutation in BRCA1 is associated with an increased risk of breast cancer and more aggressive tumor subtypes. BRCA1-deficient tumor cells have defective homologous recombination (HR) DNA repair, exhibiting genome instability and aneuploidy. HR deficiency can also arise in tumors in the absence of BRCA1 mutation. An HR-deficient, BRCA1-like phenotype has been referred to as “BRCAness.” BRCA1-like cancers exhibit worse prognosis but are selectively sensitive to chemotherapeutic treatments (e.g. platinum-based alkylating agents). However, the molecular landscapes of BRCA1-like breast tumors remain largely unknown, in part because they are less common in the general population. By applying a copy number-based classifier, we observed that >30% of The Cancer Genome Atlas (TCGA) breast tumors are BRCA1-like even though only ~3% of tumors analyzed carry a BRCA1 mutation or promoter hypermethylation. Separately, a differential analysis controlling for hormone receptor status, subject age, tumor stage and purity revealed a significant increase in DNA methyltransferase 1 (DNMT1) protein expression in BRCA1-like tumors. In addition, differentially methylated gene sets in BRCA1-like tumors indicated a strong enrichment in developmental signaling and a moderate involvement in gene transcription. Profiling of concomitant somatic alteration landscapes in BRCA1-like breast tumors provides alternative strategies to identify this subset of tumors and insights into novel potential therapeutic approaches.
72
USING ARTIFICIAL INTELLIGENCE IN DIGITAL PATHOLOGY TO CLASSIFY MELANOCYTIC LESIONS
Steven N. Hart, W. Flotte, A.P. Norgan, K.K. Shah, Z.R. Buchan, K.B. Geiersbach, T. Mounajjed, T.J. Flotte
Mayo Clinic, 200 First St. SW, Rochester, MN 55901
Steven, Hart
Examination of hematoxylin and eosin (H&E) stained slides by light microscopy has been the cornerstone of histopathology for over a century. During microscopic examination, a pathologist uses salient clinical information, pattern matching and feature recognition (shape, color, structure, etc.) to render a diagnosis. Recently, whole-slide image (WSI) scanners have made it possible to fully digitize pathology slides. In addition to enabling long term slide preservation and facilitating slide sharing for collaboration or second opinions, digitization of pathology slides allows for the development and utilization of Artificial Intelligence (AI)-driven diagnostic tools. We conducted a pilot study to test the ability of an AI convolutional neural network (CNN) to distinguish between two types of melanocytic lesions, Conventional and Spitz nevi. We sought to determine the added value of pathologist-assisted training by comparing training effectiveness of complete slide analysis versus training on pathologist-selected image patches. Images were classified by a deep CNN using Google’s TensorFlow framework. We found significant improvement in classification accuracy when the model was trained from the pathologist-curated image set. These data provide strong evidence for the continued development of AI-driven diagnostic tools in digital pathology, and highlight the added value of domain experts when building AI workflows. Future directions of this work include expanding the number of melanocytic lesions recognized by this tool, and enhancing its clinical performance through incorporation of molecular, demographic, and outcomes data.
73
A MACHINE LEARNING APPROACH TO STUDY COMMON GENE EXPRESSION PATTERNS
Mingze He1,2, Carolyn J. Lawrence-Dill1,2,3
1Bioinformatics and Computational Biology Program, Iowa State University, Ames, Iowa, USA, 50011; 2Department of Genetics, Development and Cell Biology, Iowa State University, Ames, Iowa, USA 50011; 3Department of Agronomy, Iowa State University, Ames, Iowa, USA 50011
Mingze, He
The gene expression landscape changes under certain circumstances, such as stress responses. The main difficulties in predicting common expression patterns among groups of genes lie in locating reliable gene markers and developing novel statistical approaches. We first build a shared gene ontology (GO) correlated grouping database by natural language processing (NLP). Further, we test and apply a mixture of supervised and unsupervised machine learning algorithms to compare principal components of expression patterns across species. We found several surprising common expression patterns between maize genes and human tumor cell lines if G-quadruplex (G4) is used as a gene classifier. In particular, G4-carrying genes related to the response to reactive oxygen species (ROS) show a significant clustering of maize under cold and UV stress with human tumor cell lines. This result implies that G4 regulates nearby genes under similar stress situations.
74
GENERAL
POSTER PRESENTATIONS
75
DATABASE-FREE METAGENOMIC ANALYSIS WITH AKRONYMER
Gabriel Al-Ghalith1, Abigail Johnson2, Pajau Vangay1, Dan Knights3
1Bioinformatics and Computational Biology - University of Minnesota; 2The Biotechnology Institute - University of Minnesota; 3Department of Computer Science and Engineering - University of Minnesota
Gabriel, Al-Ghalith
Microbiome research is characterized by the comparison of microbial community census data inferred from biological samples. To create these censuses, metagenomic DNA is typically clustered, aligned, or otherwise annotated to form a set of features with which to evaluate and compare microbial communities. These features may take different forms. Amplicon-based studies may use reference-based approaches and/or clustering of similar reads to distill a representative set of features such as operational taxonomic units. Shotgun-based approaches can result in finer-grained, less biased taxonomic resolution, but often rely on reference databases or classifiers trained on known microbial entities. While taxonomy and other database annotations are useful for interpretation, they may mask useful sequence-level information for comparing samples to each other. In particular, whenever there is not enough sequence data from particular organisms in the reference database (or raw reads) to identify them reliably, information about these organisms can be lost or misattributed. This causes many environments to be difficult or even impossible to compare with current methods. Further, clustering or reference-based analyses are typically computationally demanding. We present a complementary (or alternative) strategy for microbiome comparison in the software aKronyMer. It uses a novel, probability-adjusted deterministic k-mer distance metric and an ultrafast non-heuristic Nei-Saitou-based tree clustering algorithm to rapidly calculate alpha diversity, beta diversity, and sample inter-relatedness trees with either amplicon or shotgun sequence data directly without a database. It is robust to low-depth sequencing, it recovers person-specific signatures with fewer than 100,000 shotgun reads per sample in a dataset of 34 healthy individuals, and it recapitulates other expected trends in public datasets. Additionally, aKronyMer can be used to infer phylogenetic trees from amplicon data in seconds on a laptop, create a whole-genome phylogenomic tree from all ~100,000 RefSeq microbial genomes in a few hours on a desktop, denoise reads during processing, and support other potential applications.
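To illustrate what a database-free, k-mer based comparison of two samples looks like, the sketch below computes a plain Jaccard distance between the k-mer sets of two read collections. This is a deliberately simpler stand-in and not aKronyMer's probability-adjusted metric, whose definition is not given in the abstract; the function names, the value of k and the toy reads are assumptions made for illustration.

```python
def kmers(sequence, k):
    """Set of all length-k substrings of one sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def sample_profile(reads, k):
    """Union of the k-mer sets of every read in a sample."""
    profile = set()
    for read in reads:
        profile |= kmers(read, k)
    return profile


def jaccard_distance(profile_a, profile_b):
    """1 - |A intersect B| / |A union B|; two empty profiles have distance 0."""
    union = profile_a | profile_b
    if not union:
        return 0.0
    return 1.0 - len(profile_a & profile_b) / len(union)


# Hypothetical reads, for illustration only.
sample_1 = ["ACGT"]
sample_2 = ["CGTA"]
distance = jaccard_distance(sample_profile(sample_1, 2), sample_profile(sample_2, 2))
print(distance)
```

A matrix of such pairwise distances between samples is the usual input to a tree-building or beta-diversity step, with no reference database consulted at any point.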
76
SOFTWARE COMPARISON FOR PREPROCESSING GC/LC-MS-BASED METABOLOMICS DATA
Julian Aldana1, Monica Cala Molina1, Martha Zuluaga2
1Department of Chemistry, Grupo de Investigación en Química Analítica y Bioanalítica (GABIO), Universidad de los Andes, Bogotá DC, Colombia; 2Department of Chemistry, Grupo de Investigación en Cromatografía y Técnicas Afines (GICTA), Universidad de Caldas, Manizales, Colombia
Julian, Aldana
Metabolomics data preprocessing is the first step from raw instrument output to biological inference, and it is crucial for the discovery of metabolic signatures related to a particular physiopathological state of an organism. Moreover, data handling of gas chromatography/mass spectrometry (GC/MS) and liquid chromatography/mass spectrometry (LC/MS) datasets is challenging due to their size, complexity and noise. Therefore, data preprocessing is performed as a multi-step task that involves filtering, peak detection, deconvolution, and alignment, which can be carried out using a wide variety of algorithms and software packages. Given the lack of a single preprocessing software as a benchmark, the goal of this study is to compare the performance in the preprocessing of GC/LC-MS data between open source platforms (MZmine 2, XCMS Online and MetaboAnalyst 3.0) and commercial software (MassHunter Profinder 8.0 and Metaboliteplot). For this purpose, datasets were collected from the analysis of replicate samples from a plasma pool, and we followed a workflow in each software package, adjusting the parameters in a similar way to allow the comparison. Then, the data generated was analyzed to determine the number of features, coefficient of variation and peak area. As a result, significant differences were found in the quantitative performance of the evaluated preprocessing packages for both GC and LC-MS datasets. Finally, this comparison allowed us to evaluate the magnitude of the preprocessing effect on the final output in MS-based metabolomic data, and how the results of different software can be compared with each other.
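The coefficient of variation used to compare packages is the sample standard deviation of a feature's peak areas across replicate injections divided by their mean. A minimal sketch follows; the feature names, peak areas and the 30% cut-off are assumptions chosen for illustration and do not come from the study.

```python
import statistics


def coefficient_of_variation(peak_areas):
    """Sample standard deviation divided by the mean of replicate peak areas."""
    mean = statistics.mean(peak_areas)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(peak_areas) / mean


def reproducible_features(feature_table, max_cv):
    """Names of features whose CV across replicates does not exceed max_cv."""
    return sorted(
        name
        for name, areas in feature_table.items()
        if coefficient_of_variation(areas) <= max_cv
    )


# Hypothetical peak areas for three features measured in three replicates.
feature_table = {
    "feature_1": [8.0, 10.0, 12.0],    # CV = 0.2
    "feature_2": [10.0, 10.0, 10.0],   # CV = 0.0
    "feature_3": [1.0, 10.0, 19.0],    # CV = 0.9
}
print(reproducible_features(feature_table, max_cv=0.3))
```

Running the same calculation on the feature table exported by each package gives two of the quantities the abstract compares: the number of detected features and their replicate-to-replicate variability.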
77
GATEKEEPER: A NEW HARDWARE ARCHITECTURE FOR ACCELERATING PRE-ALIGNMENT IN DNA SHORT READ MAPPING
Mohammed Alser1, Hasan Hassan2, Hongyi Xin3, Oğuz Ergin4, Onur Mutlu2, Can Alkan1
1Bilkent University, 2ETH Zurich, 3Carnegie Mellon University, 4TOBB University of Economics and Technology
Onur, Mutlu
Motivation: Until today, it remains challenging to sequence the entire DNA molecule as a whole. In the era of high throughput DNA sequencing (HTS) technologies, genomes are sequenced relatively quickly but result in an excessive number of small DNA segments (called short reads, which are about 75-300 base pairs long). Resulting reads do not have any information about which part of the genome they come from; hence the biggest challenge in genome analysis is to determine the origin of each of the billions of short reads within a reference genome to construct the donor’s complete genome. Identifying the potential origin of each read, called alignment, is typically performed using quadratic-time dynamic programming algorithms. These optimal alignment algorithms are unavoidable and essential for providing accurate information about the quality of the alignment. In recent works [1-4], researchers observed that the majority of candidate locations in the reference genome do not align with a given read due to high dissimilarity. Calculating the alignment of such incorrect candidate locations wastes execution time and incurs a significant computational burden. Therefore, it is crucial to develop a fast and effective heuristic method that can detect incorrect candidate locations and eliminate them before invoking computationally costly alignment algorithms.
Results: We propose GateKeeper, a new hardware accelerator that functions as a pre-alignment step that quickly filters out most incorrect candidate locations. GateKeeper is the first design to accelerate pre-alignment using Field-Programmable Gate Arrays (FPGAs), which can perform pre-alignment much faster than software. When implemented on a single FPGA chip, GateKeeper maintains high accuracy (on average >96%) while providing, on average, 90-fold and 130-fold speedup over the state-of-the-art software pre-alignment techniques, Adjacency Filter and Shifted Hamming Distance (SHD), respectively. The addition of GateKeeper as a pre-alignment step can reduce the verification time of the mrFAST mapper by a factor of 10.
Availability: GateKeeper is open-source and freely available online at https://github.com/BilkentCompGen/GateKeeper. An extended version of this work appears in [1].
References:
[1] Alser, M., et al., GateKeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping. Bioinformatics, 2017. 33(21): p. 3355-3363.
[2] Xin, H., et al., Shifted Hamming Distance: A Fast and Accurate SIMD-Friendly Filter to Accelerate Alignment Verification in Read Mapping. Bioinformatics, 2015. 31(10): p. 1553-1560.
[3] Xin, H., et al., Accelerating read mapping with FastHASH. BMC Genomics, 2013. 14(Suppl 1): p. S13.
[4] Kim, J., et al., Genome Read In-Memory (GRIM) Filter: Fast Location Filtering in DNA Read Mapping using Emerging Memory Technologies, to appear in BMC Genomics, 2018.
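The idea of a pre-alignment filter can be shown in software with a much-simplified check in the spirit of shifted Hamming comparisons: a read base counts as explained if it matches the reference at the same offset or at any offset shifted by up to the allowed number of edits. The sketch below is an assumption-laden illustration written for this purpose; it is neither the GateKeeper hardware design nor the published SHD algorithm, and the function names are hypothetical.

```python
def unexplained_positions(read, ref_window, max_edits):
    """Count read positions matching the reference under no shift in [-max_edits, max_edits]."""
    count = 0
    for i, base in enumerate(read):
        explained = False
        for shift in range(-max_edits, max_edits + 1):
            j = i + shift
            if 0 <= j < len(ref_window) and ref_window[j] == base:
                explained = True
                break
        if not explained:
            count += 1
    return count


def passes_prefilter(read, ref_window, max_edits):
    """Keep a candidate location only if few read positions remain unexplained.

    Candidates that pass still need a full dynamic-programming alignment;
    the filter only aims to discard clearly dissimilar locations cheaply.
    """
    return unexplained_positions(read, ref_window, max_edits) <= max_edits


# Hypothetical read and candidate reference windows.
read = "ACGTACGT"
print(passes_prefilter(read, "ACGTACGT", 0))  # identical window
print(passes_prefilter(read, "CCCCCCCC", 2))  # highly dissimilar window
```

Each position is tested independently, so the same comparison maps naturally onto bitwise operations, which is what makes this style of filter attractive for SIMD units and FPGAs.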
78
MODELING THE ENHANCER ACTIVITY THROUGH THE COMBINATION OF EPIGENETIC FACTORS
Min Gyun Bae, Taeyeop Lee, Jaeho Oh, Jun Hyeong Lee, Jung Kyoon Choi
Department of Bio Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Republic of Korea
Min Gyun, Bae
Epigenome maps allow us to predict thousands of putative regulatory regions such as promoters, insulators and enhancers in various cell lines through in vivo epigenomic signatures and are widely used for studying gene regulation of developmental processes and disease. In particular, super-enhancers, which consist of clusters of active enhancers predicted from H3K27ac signal, are known to regulate nearby genes that are important in controlling and defining cell identity. However, the combination of transcription factors that regulates enhancer activity has not yet been studied. In this study, we used massively parallel reporter assay (MPRA) data, which measure the quantitative activity of regulatory regions, to identify enhancers. Through 5-nucleotide resolution tiling of overlapping MPRA constructs with a probabilistic graphical model, we estimated the high resolution activity spanning 15000 putative regulatory regions in the K562 and HepG2 cell lines. According to the ratio of activity at the boundary and center of each regulatory region, we identified thousands of enhancer candidates. Using these enhancers, we developed a random forest model to identify the epigenetic differences using about 300 histone modifications and transcription factors in the Encyclopedia of DNA Elements (ENCODE). Through a performance test by area under the curve (AUC), we confirmed that our model accurately predicted the enhancers. In conclusion, we identified enhancers through a high-throughput reporter assay and found the epigenetic features through random forest modelling.
79
FREQUENCY AND PROPERTIES OF MOSAIC SOMATIC MUTATIONS IN A NORMAL DEVELOPING BRAIN
Taejeong Bae1, Jessica Mariani2, Livia Tomasini2, Bo Zhou3, Alexander E. Urban3, Alexej Abyzov1, Flora M. Vaccarino2
1Mayo Clinic, 2Yale University, 3Stanford University
Alexej, Abyzov
As mounting evidence indicates, each cell in the human body has its own genome, a phenomenon called somatic mosaicism. Few studies have been conducted to understand post-zygotic accumulation of mutations in cells of the healthy human body. Starting from single cells, directly obtained from three fetal brains, we established 31 separate colonies of neuronal progenitor cells, and carried out whole-genome sequencing on DNA from each colony. The clonal nature of these colonies allows a high-resolution analysis of the genomes of the founder progenitor cells without being confounded by the artifacts of in vitro single cell whole genome amplification. Across the three brains we detected 200 to 400 non-germline SNVs per clone. Validation experiments (with PCR, digital droplet PCR, and capture deep sequencing) revealed high specificity (>95%) and sensitivity (>80%) of the SNVs as well as confirmed the presence of over a hundred SNVs in the original brain tissues, thereby proving that the detected SNVs represent genuine mosaic variants present in neuronal progenitors. The per-cell number of mosaic SNVs increased linearly with brain age, allowing us to estimate the mutation rate at about 8.6 SNVs per cell division. Dozens of SNVs were genotyped in multiple different regions of a brain and even in blood, suggesting that they have occurred prior to gastrulation. Using these SNVs, we reconstructed cell lineages for the first five post-zygotic cleavages and calculated a mutation rate of ~1.3 SNVs per division per daughter cell. Comparison of mutation spectra revealed a shift towards oxidative damage-related mutations in neurogenesis. Both neurogenesis and early embryogenesis exhibit drastically more mutagenesis than adulthood. On a coarse-grained scale mosaic SNVs were distributed uniformly across the genome and were enriched in mutational signatures observed in medulloblastoma, neuroblastoma, as well as in a signature observed in all cancers and in de novo variants and which, as we previously hypothesized, is a hallmark of normal cell proliferation. Correlations with histone marks further strengthened the similarity of mosaic mutations in normal fetal brain with somatic mutations reported for brain cancers. On a smaller scale SNVs were mostly benign, showed no association with any GO category and tended to avoid DNAse hypersensitive sites. These findings reveal a large degree of somatic mosaicism in the developing human brain, link de novo and cancer mutations to normal mosaicism and set a baseline for mosaic genome variation related to human brain development and function.
80
CYCLONOVO: DE NOVO SEQUENCING ALGORITHM DISCOVERS NOVEL CYCLIC PEPTIDE NATURAL PRODUCTS IN SUNFLOWER AND CYANOBACTERIA USING TANDEM MASS SPECTROMETRY DATA
Bahar Behsaz1, Hosein Mohimani2, Alexey Gurevich3, Andrey Prjibelski3, Mark F. Fisher4, Larry Smarr2, Pieter C. Dorrestein5, Joshua S. Mylne4, Pavel A. Pevzner2
1Bioinformatics and Systems Biology Program, University of California at San Diego, La Jolla, USA; 2Department of Computer Science and Engineering, University of California at San Diego, La Jolla, USA; 3Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, St. Petersburg State University, St Petersburg, Russia; 4The University of Western Australia, School of Molecular Sciences and ARC Centre of Excellence in Plant Energy Biology, Crawley, Australia; 5Department of Pharmacology, University of California at San Diego, La Jolla, USA
Bahar, Behsaz
Cyclopeptides represent an important class of natural products with an unparalleled track record in pharmacology: many antibiotics, antitumor agents, and immunosuppressors are cyclopeptides. While billions of tandem mass spectra of natural products have been deposited to the Global Natural Products Social (GNPS) molecular network, the discovery of novel cyclopeptides from this gold mine of spectral data remains challenging. As a result, only a small fraction of spectra in the GNPS molecular network have been identified so far. To address this bottleneck, we developed the CycloNovo algorithm for de novo cyclopeptide sequencing based on the concept of de Bruijn graphs, the workhorse of modern genome sequencing algorithms. Given a spectral dataset, CycloNovo first identifies a subset of this dataset that may represent cyclic and branch-cyclic peptides by analyzing the spectral convolution of each spectrum. Afterward, it attempts to de novo sequence each spectrum of putative cyclic or branch-cyclic peptides. The CycloNovo pipeline includes (i) computing the spectral convolution of each spectrum, and extracting the set of masses that represent putative amino acids in the unknown PNP, (ii) computing compositions of masses that match the precursor mass of the spectrum, (iii) constructing potential 5-mers for each composition with high score against the spectrum, (iv) constructing a de Bruijn graph with those 5-mers, (v) traversing the de Bruijn graph and generating candidate sequences, and (vi) computing the Peptide-Spectrum-Match (PSM) score for each candidate sequence. CycloNovo revealed many still unknown cyclopeptides (hundreds of novel cyclopeptide families), illustrating that currently known cyclopeptides represent just a small fraction of cyclopeptides whose spectra are already deposited into public databases such as GNPS. CycloNovo addresses the challenge of analyzing the “dark matter of cyclopeptidome” by applying de Bruijn graphs to cyclopeptide sequencing. It correctly sequenced many known cyclopeptides in a blind experiment and reconstructed novel cyclopeptides originating from plants and cyanobacteria that were further validated using RNA-seq data and genome mining, the first cyclopeptides discovered in a completely automated de novo fashion. Our analysis of the human microbiome is the first demonstration that numerous bioactive cyclopeptides from consumed plants remain stable in the proteolytic human gut environment and thus are expected to interact with the human microbiome. In addition, it revealed a large number of still unknown cyclopeptides in the human gut that are either a part of the human diet or are products of the human gut’s microbiome.
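Steps (iv) and (v) of the pipeline can be illustrated with a toy de Bruijn graph. The sketch below is not the CycloNovo implementation: the function names are hypothetical, residues are represented by single letters instead of masses, the 5-mers are taken directly from a known cyclic sequence rather than scored against a spectrum, and the traversal assumes every node has exactly one successor.

```python
from collections import defaultdict


def cyclic_kmers(seq, k):
    """All k-mers of a cyclic sequence, one starting at each position."""
    n = len(seq)
    return [tuple(seq[(i + j) % n] for j in range(k)) for i in range(n)]


def de_bruijn(kmers):
    """Nodes are (k-1)-mers; each k-mer adds an edge prefix -> suffix."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def walk_cycle(graph, start):
    """Follow the single outgoing edge of each node until returning to start.

    Assumes an unambiguous graph (one successor per node); a real
    traversal would have to enumerate branching paths.
    """
    path = [start]
    node = graph[start][0]
    while node != start:
        path.append(node)
        node = graph[node][0]
    # Spell the cyclic sequence from the first residue of every node.
    return [n[0] for n in path]


peptide = list("ABCDEFG")          # toy cyclic peptide, letters stand in for masses
kmers = cyclic_kmers(peptide, 5)   # the abstract uses 5-mers
graph = de_bruijn(kmers)
candidate = walk_cycle(graph, kmers[0][:-1])
print("".join(candidate))
```

In practice a graph built from noisy 5-mers would branch, so several candidate sequences would be generated and ranked by their PSM scores, as step (vi) describes.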
81
FUNCTIONAL ANNOTATION OF GENOMIC VARIANTS IN STUDIES OF LATE-ONSET ALZHEIMER’S DISEASE
Mariusz Butkiewicz, Jonathan L. Haines, William S. Bush
Institute for Computational Biology and Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH USA
William, Bush
Annotation of genomic variants is an increasingly important and complex part of sequence-based genomic analyses. Computational predictions of variant function are routinely incorporated into gene-based analyses of rare variants, though to date most studies use limited information for assessing variant function that is often agnostic of the disease being studied. In this work, we outline an annotation process motivated by the Alzheimer’s Disease Sequencing Project, illustrate the impact of including tissue-specific transcript sets and sources of gene regulatory information, and assess the potential impact of changing genomic builds on the annotation process. While these factors only impact a small proportion of total variant annotations (~5%), they influence the potential analysis of a large fraction of genes (~25%). Variant annotation is available for bulk download, and individual variant annotations are also available via the NIAGADS GenomicsDB.
82
OCTAD: AN OPEN CANCER THERAPEUTIC DISCOVERY WORKSPACE IN THE ERA OF PRECISION MEDICINE
Bin Chen, Benjamin S. Glicksberg, William Zeng, Yuying Chen, Ke Liu
Institute for Computational Health Sciences, University of California, San Francisco, 550 16th Street, San Francisco, California 94143, USA
Bin, Chen
Rapidly decreasing costs of RNA sequencing have enabled large-scale profiling of cancer tumor samples with precisely defined clinical and molecular features (e.g., low-grade IDH1 mutant glioma). Identifying drugs targeting a specific subset of cancer patients, particularly those that do not respond to conventional treatments, is critically important for translational research. Many studies have demonstrated the utility of a systems-based approach that connects cancers to efficacious drugs through gene expression signatures to prioritize drugs from a large drug library. From our previous work on liver cancer, Ewing’s sarcoma, and basal cell carcinoma, we have shown that the success of this approach is made possible by critical procedures, such as quality control of tumor samples, selection of appropriate reference tissues, evaluation of disease signatures, and weighting of cancer cell lines. There is a plethora of relevant datasets and analysis modules that are publicly available, yet are isolated in distinct silos, making it tedious to implement this approach in translational research. As such, we present the current protocol, which we envision as a best practice to prioritize drugs for further experimental evaluation, primarily based on open transcriptomic datasets and the free open-source R language and Bioconductor packages. In this project, we retrieved patient tumor samples based on specified clinical and/or molecular features from the Genomic Data Commons Data Portal using an API. We then created a gene expression signature for these samples using normalized RNA-Seq counts processed in the UCSC Xena project, where all RNA-Seq samples from TCGA, TARGET, and GTEx were aligned and normalized using the same pipeline. We evaluated the quality of samples based on their purity and correlation with cancer cell lines. The reference tissue samples were selected based on their profile similarity with GTEx samples. We evaluated each disease signature via a cross-validation approach. We then created drug signatures using a similar procedure from large-scale, open access platforms, namely the LINCS L1000 library, which consists of over 20,000 compounds. Our pipeline can then compute and assess the reversal potency between the disease signature and each drug signature. The drugs that present high reversal potency are prioritized as drug hits. Finally, we performed enrichment analysis of drug hits to identify compelling enriched targets and pathways. For our pilot study, we use IDH1 mutant oligodendroglioma as a case study, where the efficacy of over 300 LINCS compounds was measured in three relevant cell lines. We have shown that our predictions are corroborated by the experimental data.
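The abstract does not define how reversal potency is scored. One common way to express the idea, used here purely as an assumption, is a rank correlation between a disease signature and a drug signature over shared genes, where a strongly negative value means the drug pushes expression in the opposite direction to the disease. The function names, gene labels and values below are hypothetical, and ties are not handled.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = float(rank)
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def reversal_score(disease_sig, drug_sig):
    """Spearman correlation over shared genes; negative means reversal."""
    genes = sorted(set(disease_sig) & set(drug_sig))
    return pearson(ranks([disease_sig[g] for g in genes]),
                   ranks([drug_sig[g] for g in genes]))


# Hypothetical log fold changes, not values from the study.
disease = {"G1": 2.0, "G2": 1.0, "G3": -1.0, "G4": -2.0}
drug_a = {"G1": -1.5, "G2": -0.5, "G3": 0.7, "G4": 1.9}   # opposes the disease
drug_b = {"G1": 1.2, "G2": 0.4, "G3": -0.3, "G4": -0.9}   # mimics the disease

score_a = reversal_score(disease, drug_a)
score_b = reversal_score(disease, drug_b)
print(score_a, score_b)
```

Under this assumed scoring, drugs would be sorted by ascending score so that the strongest reversers come first.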
83
DEEP LEARNING PREDICTS TUBERCULOSIS DRUG RESISTANCE STATUS FROM WHOLE-GENOME SEQUENCING DATA
Michael L. Chen, Isaac S. Kohane, Andrew L. Beam, Maha Farhat
Department of Biomedical Informatics, Harvard Medical School, Boston, MA
Michael, Chen
Background
The diagnosis of multidrug resistant and extensively drug resistant tuberculosis is a global health priority. There is a pressing need for a rapid and comprehensive drug susceptibility test that can circumvent the limited scope of conventional methods and the associated long wait times. We sought to implement the first deep learning framework as a predictive diagnostic tool for Mycobacterium tuberculosis (MTB) drug resistance.
Methods
Using a large public dataset of 3,601 MTB strains that underwent targeted or whole genome sequencing and conventional drug resistance phenotyping, we built the first-of-its-kind multitask wide and deep neural network (WDNN) architecture to predict phenotypic drug resistance to 11 anti-tubercular drugs. We compared performance of the proposed WDNN to regularized logistic regression and random forest models using five-fold cross validation. We conducted permutation tests for evaluating feature importance and a t-distributed stochastic neighbor embedding (t-SNE) to visualize the high dimensional model output on the full dataset.
Results
The multitask WDNN achieved state-of-the-art predictive performance compared to regularized logistic regression and random forest: the average sensitivities and specificities, respectively, for all 11 drugs were 87.1% and 93.7% (multitask WDNN), 85.4% and 93.8% (random forest), and 82.2% and 93.9% (regularized logistic regression). The multitask WDNN achieved a higher sum of specificity and sensitivity for 9 of the 11 drugs compared to both the random forest and regularized logistic regression. We show considerable performance gains in our current multitask WDNN with respect to our previously reported random forest model, noting improvements of up to 54.0% in the sum of specificity and sensitivity. Patterns in susceptibility status emerged between drugs after applying t-SNE that correlate well with what is known about the order of MTB drug resistance acquisition. Novel t-SNE findings included major cluster differences between pyrazinamide and other first-line drugs and increased amounts of resistance clusters for capreomycin compared to other second-line drugs. Notable findings in the feature importance analyses included expected shared resistance-associated mutations between drugs and provided new insight into potential mechanistic relationships. Capreomycin exclusively shared 10 features with first-line drugs, highlighting potential avenues for future research into the diagnostic similarities between capreomycin and other subtypes of anti-tubercular drugs.
Conclusion
Our proposed architecture provides a unified model of drug resistance across 11 anti-tubercular drugs and shows considerable performance gains over simpler methods. Deep learning has a clear role in improving identification of drug resistant MTB strains and holds promise in bringing sequencing technologies closer to the bedside.
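The models above are compared by sensitivity, specificity and their sum for each drug. As a minimal sketch of how those quantities are computed from binary labels (1 = resistant, 0 = susceptible), assuming hypothetical function names and toy labels that are not data from the study:

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = resistant."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def sens_plus_spec(y_true, y_pred):
    """Sum of sensitivity and specificity, the comparison metric in the text."""
    sens, spec = sensitivity_specificity(y_true, y_pred)
    return sens + spec


# Toy phenotypes and predictions for a single hypothetical drug.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec, sens_plus_spec(y_true, y_pred))
```

In a multitask setting the same calculation would be repeated once per drug, and the per-drug sums compared across models.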
84
DESIGNING PREDICTION MODEL FOR HYPERURICEMIA WITH VARIOUS MACHINE LEARNING TOOLS USING HEALTH CHECK-UP EHR DATABASE
Eun Kyung Choe1, Sang Woo Lee2
1Department of Surgery, Seoul National University College of Medicine; 2Network Division, Samsung Electronics
Eun Kyung, Choe
Hyperuricemia is an elevated uric acid level in the blood. It can lead to gout and nephrolithiasis, and has also been implicated as an indicator for diseases such as metabolic syndrome, diabetes mellitus, cardiovascular disease, and chronic renal disease. The aim of the present study is to design a prediction model for hyperuricemia from a health check-up EHR database using various machine learning tools. From 2005 to 2015, self-paying individuals underwent comprehensive health check-ups. Input factors were age, gender, body mass index (BMI), blood pressure, waist circumference, white blood cell count, hemoglobin, glucose level, cholesterol, GOT/GPT, GGT, creatinine, triglyceride, urine albumin, smoking/alcohol habit, and diabetes/hypertension/dyslipidemia medication status, which are the factors covered by national health insurance. The output factor was uric acid level, which is not included in the national health check-up. All of the data were extracted from the EHR database and text mining was performed. We designed a prediction model for hyperuricemia using machine learning tools such as a linear regression model (LR), support vector machine (SVM), classification tree model (CT) and neural network model (NN). Machine learning was performed with MATLAB R2016b (The MathWorks, Natick, MA). The prediction power of each model was evaluated by calculating the area under the curve (AUC), sensitivity, specificity and accuracy. A total of 55,227 persons were included in the analysis. The median age was 52 years (range 21-95 years) and 53.5% of persons were male. There were 10,586 (19.2%) persons whose uric acid level was in the hyperuricemia range. BMI was higher in the hyperuricemia group (25.2 +/- 3.0 vs. normal uric acid group, 23.4 +/- 2.9, p<0.001) and alcohol drinking was more common in the hyperuricemia group (67.8% vs. normal uric acid group, 52.4%, p<0.001). Sorting the results by the accuracy of each machine learning model, the CT showed the highest accuracy of 0.954 (AUC=0.886; sensitivity=0.792; specificity=0.981) compared to SVM of 0.892 (AUC=0.630; sensitivity=0.261; specificity=0.999), NN of 0.859 (AUC=0.770; sensitivity=0.09; specificity=0.991) and LR of 0.857 (AUC=0.761; sensitivity=0.033; specificity=0.997). This study used a health check-up EHR database to predict a disease status (hyperuricemia) using various machine learning tools. Since the amount of EHR data is increasing rapidly, the data included in the database could be used as biomarkers to predict disease status or high risk conditions by building a prediction model with machine learning tools. However, since the optimal analysis tool or protocol is not well established and the over-fitting problem is not yet solved, more training and research in various populations should be done in future studies for replication.
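Because only 19.2% of the cohort is hyperuricemic, accuracy alone can look high even when sensitivity is very low. The identity accuracy = prevalence x sensitivity + (1 - prevalence) x specificity makes this explicit. The sketch below uses the prevalence reported in the abstract together with a hypothetical classifier that labels everyone as normal; it does not attempt to reproduce the accuracies reported for the four models, whose evaluation split is not described.

```python
def accuracy_from_rates(prevalence, sensitivity, specificity):
    """Accuracy implied by prevalence, sensitivity and specificity."""
    return prevalence * sensitivity + (1.0 - prevalence) * specificity


# Prevalence of hyperuricemia reported in the abstract.
prevalence = 0.192

# Hypothetical "always predict normal" classifier: it finds no cases
# (sensitivity 0) but never raises a false alarm (specificity 1).
baseline_accuracy = accuracy_from_rates(prevalence, 0.0, 1.0)
print(round(baseline_accuracy, 3))  # accuracy with no cases detected at all
```

This is why sensitivity and AUC are worth reading alongside accuracy when comparing the models.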
85
RICK: RNA INTERACTIVE COMPUTING KIT
Galina A. Erikson, Ling Huang, Maxim Shokhirev
Salk Institute for Biological Studies
Galina, Erikson
The advent of massively parallel sequencing of RNA (RNA-Seq) enables fast and inexpensive global measurement of thousands of genes across biological perturbations involving drug treatment, genetic mutations, and time series. To facilitate comparison, many tools have been developed; however, most of these tools require extensive programming and bioinformatics knowledge, and little is available for the scientist who wants to analyze their own RNA-Seq data but lacks bioinformatics expertise. The RNA Interactive Computing Kit (RICK) aims to fill this gap by providing an interactive web workspace designed to facilitate RNA-Seq analysis and visualization. RICK accepts as input a file with raw read counts for each transcript and sample, performs sample clustering, visualizes the global gene expression with heatmaps, runs principal component analysis and prepares print-ready figures. Users can add and remove samples and regenerate new figures on the fly. For differential gene expression, users have the option to use edgeR, DESeq2 or the combination of all, and to filter the results based on adjusted p-value and fold change. RICK is able to use the DE results from the previous step to identify the significantly altered KEGG pathways or enriched GO terms using the gage or GOseq pathway analysis package with visualization. Users also have the option to upload their customized gene/background gene list to do a DAVID-like analysis. RICK supports RNA-Seq based research by providing a workflow that requires no bioinformatics skills. RICK is freely available at rick.salk.edu.
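Filtering differential expression results on adjusted p-value and fold change, as described above, amounts to a simple threshold rule. The sketch below is an illustration only, not RICK's code: the field names, the default thresholds and the example rows are all hypothetical.

```python
def filter_de(results, max_padj=0.05, min_abs_log2fc=1.0):
    """Keep genes passing both the adjusted p-value and fold-change cutoffs."""
    return [
        row["gene"]
        for row in results
        if row["padj"] <= max_padj and abs(row["log2fc"]) >= min_abs_log2fc
    ]


# Hypothetical differential expression table.
de_results = [
    {"gene": "A", "padj": 0.01, "log2fc": 2.0},    # significant, large change
    {"gene": "B", "padj": 0.20, "log2fc": 3.0},    # large change, not significant
    {"gene": "C", "padj": 0.001, "log2fc": -0.3},  # significant, small change
    {"gene": "D", "padj": 0.04, "log2fc": -1.5},   # significant, down-regulated
]

selected = filter_de(de_results)
print(selected)
```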
86
PRIVATE INFORMATION LEAKAGE IN FUNCTIONAL GENOMICS EXPERIMENTS: QUANTIFICATION AND LINKING
Gamze Gursoy, Mark Gerstein
Program in Computational Biology and Bioinformatics, Yale University
Gamze, Gursoy
The success of the ENCODE (Encyclopedia of DNA Elements) project opened the doors to a deeper understanding of the functional genome through genome-wide experimental assays. Although identifying individuals using DNA variants from whole genome or exome sequencing data is a major privacy and security concern, no study on genomic privacy has focused on the quantity of information in functional genomic experiments such as ChIP-Seq, RNA-Seq and Hi-C, since the majority of this data is partial and biased. Here, we quantify the amount of leaked genotype information in different functional genomic assays at varying coverages. We show that sequencing data from functional genomics assays provides enough private information to be able to link these samples to a panel of individuals with known genotypes.
87
CARPE D.I.E.M: A DATA INTEGRATION EXPECTATION MAP FOR THE POTENTIAL OF MULTI-’OMICS INTEGRATION IN COMPLEX DISEASE
Tia Tate Hudson, ClarLynda Williams-DeVane
North Carolina Central University, Durham, NC, USA
Tia, Hudson
Advances in high throughput technologies and the availability of multi-’omics data present the opportunity for more holistic understandings of biological regulation in complex diseases and disparities. The complexity and disparate nature of various diseases requires the development of equally complex models with multiple layers of biological information. This, however, requires the integration of biological, computational, and statistical domains. Currently, there exist major gaps in availability and knowledge amongst the three domains. Typically, biologists experience problems with processing and analyzing biological data and therefore seek data scientists for more customized analysis. In contrast, some data scientists lack a thorough understanding of the regulation and complex interactions of various systems giving rise to varying complex phenotypes. This generally results in less comprehensive analysis and an overall narrow understanding of complex disease phenotypes, which can only be thoroughly understood when various levels of ’omic interactions are considered as a whole. Thus, developing the most comprehensive biological models must consider the multiple appropriate layers of genomic, epigenomic, transcriptomic, proteomic, and metabolomic regulation, as well as the potential role environmental and social factors play at each ’omic level. Historically, diverse data types have been considered independently, while combinations of two or more data types have been utilized less frequently. Singular analyses of independent ’omic contributions to disease often neglect the intricate interactions among the distinct levels giving rise to these complex traits. Although environmental and social factors have a major role in the disparate nature of diverse diseases, many diseases result from mutual alterations in assorted pathways and biological processes, including gene mutations, epigenetic changes, and modifications in gene regulation. Therefore, the various phenotypes in diverse diseases represent a major example of the need for integrated biological models for complex trait analysis. In this study, we present the Data Integration Expectation Map (D.I.E.M), where we explore the scientific value of integrating various ’omic data combinations that can reveal mechanisms of biological regulation in disease disparities. Our goal is to convey the potential for integration of genomic, epigenomic, transcriptomic, proteomic, and metabolomic data for improving our understanding of the complexity and nature of disparity in complex disease traits. In doing so, this map will address the holes in the various domains necessary for integrated data analysis and interpretation. D.I.E.M will also reveal the expected outcomes for each ’omic data type and the various combinations that may or may not divulge a holistic view into complex disease phenotypes. With that, we expect to gain a greater understanding of physiological processes contributing to disparities as well as the role each ’omic interaction plays in screening, diagnosis, and prognosis of disease.
88
IMPROVING GENE FUSION DETECTION ACCURACY WITH FUSION CONTIG REALIGNMENT IN TARGETED TUMOR SEQUENCING
JinHyun Ju, Xiao Chen, June Snedecor, Han-Yu Chuang, Ben Mishkanian, Sven Bilke
Illumina Inc., 5200 Illumina Way, San Diego, CA 92122, USA
JinHyun, Ju
Gene fusions have been identified as driver mutations in multiple cancer types, and a number of drugs targeting specific fusions have been developed as treatment options. Therefore, the ability to identify fusions from tumor samples has become critical for the selection of appropriate treatments for patients. Previously, gene fusions have been detected by targeted approaches such as polymerase chain reaction (PCR) or Fluorescent In Situ Hybridization (FISH). These methods not only require prior knowledge of the fusion, but are also labor intensive and inefficient. Newer methods utilizing RNA sequencing (RNA-seq) that are able to detect multiple types of gene fusions with no prior knowledge required have been introduced with the emergence of next-generation sequencing technology. One critical challenge in using RNA-seq data for gene fusion detection is false positive findings introduced by aligner-specific biases or regions with sequence similarity in the genome. This problem becomes more apparent in clinical settings where the abundance of fusion transcripts can be limited by the composition and heterogeneity of the tumor sample. To avoid the critical risk of failing to detect a potentially treatable gene fusion, imposing a stringent detection threshold becomes difficult in these situations, leading to the inclusion of fusions based on relatively low read evidence. To address this problem, we describe a novel fusion filtering method based on fusion contig realignment that is designed to identify spurious false positive fusions. Our method can be used together with any assembly-based fusion calling method that constructs a contig sequence for each reported fusion. The first step is to realign the fusion contigs with the Basic Local Alignment Search Tool (BLAST), which is relatively more flexible in finding alternative alignment results with high sequence similarity. Subsequently, we determine whether a specific fusion call can be supported by evidence found in BLAST alignments. Specifically, we aim to filter out fusions that can be explained by regions originating from a single gene or genomic region, or have weak support on either side of the fusion in BLAST alignments. In our preliminary analysis of 1171 fusion calls in 322 samples, 111 out of 161 false positive calls (68%) were filtered out while no calls from the total of 1010 true positives were filtered out.
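The effect of the filter on precision follows directly from the counts reported above. The short calculation below uses only those counts; the variable and function names are illustrative.

```python
# Counts reported in the abstract.
total_calls = 1171
true_positives = 1010
false_positives = 161
filtered_false_positives = 111
filtered_true_positives = 0

# The reported counts are internally consistent.
assert true_positives + false_positives == total_calls


def precision(tp, fp):
    """Fraction of reported calls that are true positives."""
    return tp / (tp + fp)


precision_before = precision(true_positives, false_positives)
precision_after = precision(
    true_positives - filtered_true_positives,
    false_positives - filtered_false_positives,
)
print(round(precision_before, 3), round(precision_after, 3))
```

With 50 false positives remaining and no true positives lost, precision rises from roughly 86% to roughly 95% while recall among the known true positives is unchanged.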
89
SPARSE REGRESSION FOR NETWORK GRAPHS AND ITS APPLICATION TO GENE NETWORKS OF THE BRAIN
Hideko Kawakubo, Yusuke Matsui, Teppei Shimamura
Nagoya University
Hideko, Kawakubo
Recent rare variant analyses of single nucleotide variations (SNVs) and copy number variations (CNVs) have identified dozens of candidate genes that may contribute to neurogenetic disorders such as autism and schizophrenia. However, it is unclear whether and how these disease-causing genes are associated with cellular mechanisms in the brain. This is a challenging problem, since the brain contains hundreds of distinct cell types, each of which has unique morphologies, projections, and functions, and thus disease-causing genes may contribute to different behavioral abnormalities of distinct cell types in the nervous system. In order to identify candidate cell types of the brain related to a complex genetic disorder, we propose a statistical method called graph oriented sparse learning (GOSPEL).
90
GRIM-FILTER: FAST SEED LOCATION FILTERING IN DNA READ MAPPING USING PROCESSING-IN-MEMORY TECHNOLOGIES
Jeremie S. Kim1,2, Damla S. Cali1, Hongyi Xin3, Donghyuk Lee1,4, Saugata Ghose1, Mohammed Alser5, Hasan Hassan2,6, Oğuz Ergin6, Can Alkan5, Onur Mutlu2,1
1ECE Department, Carnegie Mellon University; 2CS Department, ETH Zurich; 3CS Department, Carnegie Mellon University; 4NVIDIA Research; 5CE Department, Bilkent University; 6CE Department, TOBB University of Economics and Technology
Jeremie, Kim
Seed location filtering is critical in DNA read mapping, a process where billions of DNA fragments (reads) sampled from a donor are mapped onto a reference genome in order to identify the genomic variants of the donor. State-of-the-art read mappers determine the original location of a read sequence within a reference genome in 3 generalized steps. A read mapper 1) quickly generates possible mapping locations for seeds (i.e., smaller segments) within a read, 2) extracts the reference sequence at each of the mapping locations, and 3) determines the similarity score between the read and its associated reference sequences with a computationally-expensive algorithm (i.e., sequence alignment). With the similarity scores across all possible locations, the read mapper can determine the original location of the read sequence. The differences between the read sequence and the matching reference sequence indicate the genomic variants of the donor, which can be further analyzed for preventative care or diagnosis. A seed location filter (e.g., FastHASH [2], SHD [3], GateKeeper [4]) comes into play before sequence alignment (step 3) and reduces the number of unnecessary alignments. A seed location filter efficiently determines whether a candidate mapping location would result in an incorrect mapping before performing the computationally-expensive sequence alignment step for that location. In the ideal case, a seed location filter would discard all poorly matching locations prior to alignment such that there is no wasted computation on unnecessary alignments. We propose a novel seed location filtering algorithm, GRIM-Filter, optimized to exploit 3D-stacked memory systems that integrate computation within a logic layer stacked under memory layers, to perform processing-in-memory (PIM). GRIM-Filter quickly filters seed locations by 1) introducing a new representation of coarse-grained segments of the reference genome, and 2) using massively-parallel in-memory operations to identify read presence within each coarse-grained segment. Our evaluations show that for a sequence alignment error tolerance of 0.05, GRIM-Filter 1) reduces the false negative rate of filtering by 5.59x to 6.41x, compared to the best previous seed location filtering algorithm, and 2) provides an end-to-end read mapper speedup of 1.81x to 3.65x, compared to a state-of-the-art read mapper employing the best previous seed location filtering algorithm [2]. This work will appear at the 16th Asia Pacific Bioinformatics Conference in January 2018 [1]. The preliminary version of the full article is at https://arxiv.org/pdf/1711.01177.pdf.
[1] Kim, Jeremie S, et al. "GRIM-Filter: Fast Seed Location Filtering in DNA Read Mapping Using Processing-in-Memory Technologies." To appear in BMC Genomics (2018).
[2] Xin, Hongyi, et al. "Accelerating read mapping with FastHASH." BMC Genomics (2013).
[3] Xin, Hongyi, et al. "Shifted Hamming distance: a fast and accurate SIMD-friendly filter to accelerate alignment verification in read mapping." Bioinformatics (2015).
[4] Alser, Mohammed, et al. "GateKeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping." Bioinformatics (2017).
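The idea of testing read presence against coarse-grained segments of the reference can be sketched in software. The code below is not the GRIM-Filter implementation: Python sets stand in for the per-segment bit representation and in-memory operations, and the function names, bin size, k-mer length, overlap and hit threshold are all hypothetical choices made for the example.

```python
def kmer_set(seq, k):
    """Set of all k-mers occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_bins(reference, bin_size, k, overlap):
    """Record which k-mers occur in each coarse-grained reference segment.

    Each bin is extended by `overlap` bases so that reads spanning a
    bin boundary are still represented in the bin where they start.
    """
    return [
        kmer_set(reference[start:start + bin_size + overlap], k)
        for start in range(0, len(reference), bin_size)
    ]


def candidate_bins(read, bins, k, min_hits):
    """Indices of bins containing at least min_hits of the read's k-mers."""
    read_kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
    return [
        index
        for index, present in enumerate(bins)
        if sum(kmer in present for kmer in read_kmers) >= min_hits
    ]


# Toy reference made of three 10-base segments.
reference = "ACGTACGTAC" + "TTTTTTTTTT" + "GGCCGGCCGG"
read = "GGCCGG"
bins = build_bins(reference, bin_size=10, k=4, overlap=len(read) - 1)

# Only locations in the surviving bins would be passed on to alignment.
survivors = candidate_bins(read, bins, k=4, min_hits=3)
print(survivors)
```

Lowering the threshold lets more bins through (fewer discarded true locations, more wasted alignments), which is the trade-off an error tolerance controls.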
91
MULTI-CLASS CLASSIFICATION STRATEGY FOR SUPPORT VECTOR MACHINES USING WEIGHTED VOTING AND VOTING DROP
Sungho Kim, Taehun Kim
Yeungnam University, DGIST
Sungho, Kim
Several multi-class strategies for Support Vector Machines (SVMs) have been developed to perform multi-class classification, such as One Versus One, One Versus All and Directed Acyclic Graph. These strategies do not reflect the distance between the input data and the hyper-plane that separates two classes. This is not reasonable when the input data is placed near the hyper-plane. The proposed Weighted Voting resolves this problem by weighting the voting values according to the distance from the boundary, and the performance of the SVMs is further enhanced with the proposed Voting Drop. The proposed Weighted Voting is based on the voting method, which is carried out by accumulating votes and then choosing the most voted class. Weighted Voting weights each vote by the distance from the boundary and the margin. The second proposed method, Voting Drop, concerns how votes are accumulated. The conventional voting method accumulates every vote, but this can be a problem because some SVMs respond redundantly. Because the SVM is a binary classifier, each SVM learns only about two classes. Therefore, an SVM has no discernment for the classes it has not learned, and when it predicts data belonging to a non-learned class, it responds redundantly. Such an irrelevant SVM casts an incorrect vote that confuses the decision. To resolve this problem, the Voting Drop method drops the redundant votes by removing the irrelevant SVM. The algorithm finds the irrelevant SVM and then drops the votes it cast. An irrelevant SVM is found by finding the least voted class, because the least voted class can be regarded as irrelevant to the input data. As shown in the experiments, reflecting the distance from the hyper-plane and the discernment of the hyper-plane, and removing the redundant SVM’s votes, leads to higher performance. The proposed methods can be used for a range of classification tasks.
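The two ideas can be sketched for a one-versus-one setup. The abstract does not give the exact weighting formula, so the sketch below assumes a vote weight equal to the absolute decision value clipped at the margin (1.0), and assumes that Voting Drop removes every pairwise SVM involving the least voted class before recounting. All names and decision values are hypothetical.

```python
def weighted_votes(decisions, classes, skip=None):
    """Accumulate distance-weighted votes from pairwise SVM decision values.

    decisions maps a class pair (i, j) to a decision value f: f > 0 votes
    for i, otherwise for j. The weight min(|f|, 1.0) is an assumed
    formula. Pairwise SVMs involving `skip` are ignored.
    """
    votes = {c: 0.0 for c in classes if c != skip}
    for (i, j), f in decisions.items():
        if skip in (i, j):
            continue
        winner = i if f > 0 else j
        votes[winner] += min(abs(f), 1.0)
    return votes


def predict_with_drop(decisions, classes):
    """Drop the votes of SVMs tied to the least voted class, then recount."""
    first_pass = weighted_votes(decisions, classes)
    least_voted = min(first_pass, key=first_pass.get)
    second_pass = weighted_votes(decisions, classes, skip=least_voted)
    return max(second_pass, key=second_pass.get)


# Hypothetical decision values for three classes; plain voting would
# give each class exactly one vote, which is a three-way tie.
decisions = {(0, 1): -0.05, (0, 2): 0.9, (1, 2): -0.1}
classes = [0, 1, 2]

votes = weighted_votes(decisions, classes)
prediction = predict_with_drop(decisions, classes)
print(votes, prediction)
```

In this toy case the two SVMs that respond weakly near their boundaries contribute little, so the confident decision for class 0 determines the result.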
92
GENOME-WIDE ANALYSIS OF TRANSCRIPTIONAL AND CYTOKINE RESPONSE VARIABILITY IN ACTIVATED HUMAN IMMUNE CELLS
Sarah Kim-Hellmuth1,2, Matthias Bechheim3, Benno Pütz2, Pejman Mohammadi1,4, Johannes Schumacher5, Veit Hornung3,6, Bertram Müller-Myhsok2, Tuuli Lappalainen1,4
1New York Genome Center, New York, NY, USA; 2Max-Planck-Institute of Psychiatry, Munich, Germany; 3Institute of Molecular Medicine, University of Bonn, Bonn, Germany; 4Department of Systems Biology, Columbia University, New York, NY, USA; 5Institute of Human Genetics, University of Bonn, Bonn, Germany; 6Gene Center and Department of Biochemistry, Ludwig-Maximilians-University Munich, Munich, Germany
Sarah, Kim-Hellmuth
The immune system plays a major role in human health and disease. Understanding variability of immune responses on the population level and how it relates to susceptibility to diseases is vital. In this study, we aimed to characterize the genetic contribution to interindividual variability of immune response using genome-wide association and functional genomics approaches. For this purpose, we studied genetic associations to cellular (gene expression) and molecular (cytokine) phenotypes in primary human cells activated with diverse microbial ligands. We isolated monocytes of 134 individuals and stimulated them with three bacterial and viral components (LPS, MDP, and ppp-dsRNA). We performed transcriptome profiling at three time points (0 min/90 min/6 h) and genome-wide SNP genotyping. In addition, we profiled five cytokines produced by peripheral blood mononuclear cells activated by five components from the same individuals to perform a genome-wide association study. Comparing expression quantitative trait loci (eQTLs) under baseline and upon immune stimulation revealed 417 immune response specific eQTLs (reQTLs). We characterized the dynamics of genetic regulation on early and late immune response, and observed an enrichment of reQTLs in distal cis-regulatory elements. Analysis of signs of recent positive selection and the direction of the effect of the derived allele of reQTLs on immune response suggested an evolutionary trend towards enhanced immune response. Furthermore, multivariate GWAS analysis of cytokine responses to diverse stimuli revealed 159 genome-wide significant loci; however, only a small number of these could be reliably linked to potentially causal eQTLs in monocytes. Finally, given the central role of inflammation in many diseases, we examined reQTLs as a potential mechanism underlying genetic associations to complex diseases. We uncovered novel reQTL effects in multiple GWAS loci, and showed a stronger enrichment of response than constant eQTLs in GWAS signals of several autoimmune diseases. These results indicate a substantial, disease-specific role of environmental interactions with microbial ligands in genetic risk to complex autoimmune diseases. While tissue-specificity of molecular effects of GWAS variants is increasingly appreciated, our results suggest that innate immune stimulation is a key cellular state to consider in future eQTL studies as well as in targeted functional follow-up of GWAS loci.
93
PREDICTING FATIGUE SEVERITY IN ONCOLOGY PATIENTS ONE WEEK FOLLOWING CHEMOTHERAPY
Kord M. Kober, Xiao Hu, Bruce A. Cooper, Steven M. Paul, Christine Miaskowski
University of California San Francisco
Kord, Kober
Effective symptom management is a critical component of cancer treatment. Computational tools that predict the course and severity of these symptoms have the potential to assist oncology clinicians to personalize the patient’s treatment regimen more efficiently and provide more aggressive and timely interventions. Cancer-related fatigue (CRF) is the most common symptom associated with cancer and its treatments. CRF has a negative impact on the patients’ ability to tolerate treatments and on their quality of life. One of the limitations to effective treatment of CRF is the lack of a valid and reliable model to predict the severity of CRF. The objective of this pilot study was to generate a predictive model for fatigue severity 1 week after chemotherapy (CTX) administration (T2) using 28 demographic and clinical characteristics that were collected just prior to CTX administration (T1) in a sample of 1042 cancer patients undergoing CTX. In this pilot study, we used support vector regression (SVR) with a polynomial kernel to predict the severity of evening fatigue between two different time points during a cycle of CTX. Patients with missing data were removed, leaving a total of 689 for this analysis. Training and testing groups consisted of 518 and 171 patients, respectively. We used 10-times 10-fold cross-validation root-mean-square error (RMSE) to assess the fit of the predictive model. Our model achieved an RMSE/mean of 0.269. The five predictors with the highest importance were: evening fatigue at T1, morning fatigue at T1, attentional function, sleep disturbance, and performance status. The five predictors with the lowest importance were: living alone, caregiver to adult, level of education, cycle length, and number of metastatic sites. Overall, clinical characteristics associated with cancer and its treatment, including cancer diagnosis, had low importance in the model. These findings suggest that the experience and mechanisms of CRF may be general and not cancer specific. This type of predictive model can be used to identify high risk patients, educate patients about their symptom experience, and improve the timing of pre-emptive and personalized symptom management interventions. These results suggest that the integration of demographic and clinical data can enhance clinical prognostic prediction, which will contribute to the development of precision cancer medicine. Our methods are generalizable to other types of symptoms.
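The fit statistic reported above, RMSE/mean, is read here as the root-mean-square error divided by the mean of the observed outcome; that reading is an assumption, since the abstract does not define it. A minimal sketch with hypothetical fatigue scores (not study data):

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_over_mean(y_true, y_pred):
    """RMSE normalized by the mean of the observed values (assumed definition)."""
    return rmse(y_true, y_pred) / (sum(y_true) / len(y_true))


# Hypothetical observed and predicted fatigue scores at T2.
observed = [2.0, 4.0, 6.0]
predicted = [3.0, 4.0, 5.0]

normalized_error = rmse_over_mean(observed, predicted)
print(round(normalized_error, 3))
```

Normalizing by the mean makes the error comparable across symptom scales with different ranges, which matters if the approach is extended to other symptoms.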
94
SINGLE-MOLECULE PROTEIN IDENTIFICATION BY SUB-NANOPORE SENSORS
Mikhail Kolmogorov1, Eamonn Kennedy2, Zhuxin Dong2, Gregory Timp2, Pavel A. Pevzner1
1Department of Computer Science and Engineering, University of California San Diego, USA; 2Electrical Engineering and Biological Science, University of Notre Dame, USA
Mikhail, Kolmogorov
Recent advances in top-down mass spectrometry enabled identification of intact proteins, but this technology still faces challenges. For example, top-down mass spectrometry suffers from a lack of sensitivity since the ion counts for a single fragmentation event are often low. In contrast, nanopore technology is exquisitely sensitive to single intact molecules, but so far it has only been successfully applied to DNA sequencing. Here, we explore the potential of sub-nanopores for single-molecule protein identification (SMPI) and describe an algorithm for identification of the electrical current blockade signal (nanospectrum) resulting from the translocation of a denatured, linearly charged protein through a sub-nanopore. The analysis of identification p-values suggests that the current technology is already sufficient for matching nanospectra against small protein databases, e.g., protein identification in bacterial proteomes.
95
GENE EXPRESSION PROFILE OF OSTEOARTHRITIS AFFECTED FINGER JOINTS
Milica Krunic1, Klaus Bobacz2, Arndt von Haeseler3
1Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, University of Vienna, Medical University of Vienna, Vienna, Austria; 2Department of Internal Medicine III, Division of Rheumatology, Medical University of Vienna, Vienna, Austria; 3Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, Vienna, Austria
Milica, Krunic
Osteoarthritis (OA) is a joint disease that can affect any joint. However, the non-weight-bearing joints most frequently affected by OA are the hand joints. The most common clinical presentation of hand OA is pain and loss of hand strength, which restricts the ability of people to perform daily activities. Multiple factors can contribute to the development of hand OA, of which the most frequently observed are age, gender, genetics, obesity, occupation, and repetitive joint usage. OA in proximal interphalangeal (PIP) and distal interphalangeal (DIP) joints is considered to be the most common cause of hand pain nowadays. To the best of our knowledge, there is no published research that addresses in detail the unclear genetic etiology of finger OA. Since cartilage is one of the most commonly damaged tissues in OA, the aim of our study was to explore the gene expression profile of chondrocytes sampled from two finger joints, PIP and DIP, and to investigate which pathways and gene ontology terms were altered in patients affected by this disease.
96
DISCOVERY AND PRIORITIZATION OF DE NOVO MUTATIONS IN AUTISM SPECTRUM DISORDER
Taeyeop Lee, Jaeho Oh, Min Gyun Bae, Jun Hyeong Lee, Jung Kyoon Choi
Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Republic of Korea
Presenting author: Taeyeop Lee
Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by impaired social interaction and restricted and repetitive behaviors. Previous studies have reported that the genetic contribution or heritability is as high as 80% in ASD. In order to elucidate the genetic architecture of ASD, many researchers performed extensive studies and discovered some significant findings. Currently, hundreds of different genes have been unveiled, mostly through identification of related rare variants. Rare genetic variants, both inherited and de novo, are proposed to be causal in ~30% of ASD patients. In comparison, common genetic variants are estimated to contribute to approximately 50% of ASD etiology. However, no specific common risk variant has been found to date, possibly due to insufficient sample size. Here, we report a whole genome sequencing study of ASD patients to discover and characterize de novo mutations in an Asian population. By sequencing 101 autism trios and unaffected siblings, we located causal variants in 74 candidate genes. The variants included not only loss of function and missense variants, but also intronic and intergenic non-coding variants. The candidate gene set showed significant overlap with known autism, intellectual disability, and chromatin related gene sets. Furthermore, to prioritize the non-coding de novo mutations, we developed a deep learning framework based on >2,000 functional features. The features included DNase I hypersensitive sites, histone modification profiles, disease pathways, and transcription factor binding sites, where the nonlinear combinations of the features indicate the causal probability of a non-coding variant. The performance of the model was evaluated with area under curve (AUC) and F1 score. Our results suggest that de novo variants are related to important ASD risk genes, and that noncoding de novo variants have a non-zero effect in ASD.
97
CROSSTALKER: AN OPEN NETWORK AND PATHWAY ANALYSIS PLATFORM
Sean Maxwell, Mark R. Chance
Case Western Reserve University and NeoProteomics, Cleveland, Ohio
Presenting author: Mark Chance
Introduction: Network analysis methods have become commonplace research tools due to their proven ability to interrogate and organize lists of molecular targets of interest identified by basic statistics alone, and use of network analysis to refine classifier feature sets has been shown to provide superior performance compared to targets identified singly. We introduce Crosstalker as a freeware platform for academic use that is web based and incorporates multiple public interaction and gene set databases to perform network analysis, enrichment testing and visualization in a modern HTML5+JS interface. The use of open databases and algorithms coupled to convenient user choices allows cross comparison of findings and permits easy replication of results by any laboratory, improving reproducibility and rigor.
Methods: Lists of seed molecules are mapped onto a reference interaction network selected by the user and a random walk with restarts (RWR) is performed using the seed molecules as the restart nodes. The RWR scores are adjusted to z-scores using Monte-Carlo estimated score distributions for each node in the interaction network, and we have optimized the Monte-Carlo estimation parameters using analytic methods and computational testing. Assuming the z-scores follow a normal distribution, the adjusted scores are used to select nodes whose probability of achieving the same or higher RWR score by chance is p<0.001. The resulting molecules are tested for enrichments against user selected gene set databases and used to induce result subnetworks from the reference network. The induced subnetworks are visualized with options to annotate nodes (molecules) and edges (interactions).
Results: Computational experiments using inputs generated by combining annotated sets of functionally related molecules with unrelated “noise molecules” showed that adjusting proximity scores by null-distribution improved predictions of functionally related molecules over rank-only methods when the inputs contained more noise molecules than annotated molecules. Choices of multiple interaction networks (like BioGRID, BioPlex or COXPRESdb) enable testing of different hypotheses within the same interface, such as co-expression or direct/indirect physical interactions of related molecules. The optimized algorithms used by the computational portion of the software facilitate analysis times under 1 minute, minimizing wait times and maximizing the number of concurrent users the system can support.
Novel Aspect: Analytically verified Monte-Carlo estimation parameters. Multiple options for interaction networks and gene sets. Web-based with options to export results and data in open (JSON, CSV) and binary (XLSX) formats. Free for academic use.
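The Methods paragraph of this abstract (random walk with restarts from seed nodes, then z-scores against Monte-Carlo null distributions, then a one-sided p<0.001 cut-off) can be illustrated with a small sketch. Everything below is an assumption made for illustration, not Crosstalker's implementation: the restart probability of 0.5, the fixed iteration count, drawing random seed sets of the same size as the null model, and the names `rwr`, `rwr_z_scores`, and `Z_CUTOFF` are all hypothetical.

```python
import random
import statistics


def rwr(adj, seeds, restart=0.5, iters=100):
    """Random walk with restarts on an undirected graph {node: [neighbors]}."""
    nodes = list(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(iters):
        nxt = {n: restart * p0[n] for n in nodes}  # mass returned to the seeds
        for n in nodes:
            if adj[n]:
                share = (1 - restart) * p[n] / len(adj[n])
                for m in adj[n]:
                    nxt[m] += share  # remaining mass spread over neighbors
        p = nxt
    return p


def rwr_z_scores(adj, seeds, n_random=200, seed=0):
    """Z-score each node's RWR score against scores from random seed sets."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    observed = rwr(adj, set(seeds))
    null = {n: [] for n in nodes}
    for _ in range(n_random):
        scores = rwr(adj, set(rng.sample(nodes, len(seeds))))
        for n in nodes:
            null[n].append(scores[n])
    z = {}
    for n in nodes:
        sd = statistics.pstdev(null[n])
        z[n] = (observed[n] - statistics.mean(null[n])) / sd if sd > 0 else 0.0
    return z


# One-sided normal cut-off corresponding to p < 0.001 (about 3.09).
Z_CUTOFF = statistics.NormalDist().inv_cdf(1 - 0.001)

# Toy path graph a-b-c-d-e with a single seed; real networks are far larger.
toy = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
selected = [n for n, v in rwr_z_scores(toy, ["a"]).items() if v > Z_CUTOFF]
```

Normalizing each node against its own null distribution is what keeps hub nodes, which score highly for almost any seed set, from dominating a rank-only list.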
98
SIGNATURES OF NON–SMALL-CELL LUNG CANCER RELAPSE PATIENTS: DIFFERENTIAL EXPRESSION ANALYSIS AND GENE NETWORK ANALYSIS
Abigail E. Moore1, Brandon Zheng2, Patricia M. Watson3, Robert C. Wilson3, Dennis K. Watson3, Paul E. Anderson4
1Department of Natural Science, Hampshire College, Amherst, MA 01002, USA; 2Department of Biology, Bard College, Annandale-On-Hudson, NY 12504, USA; 3Department of Pathology and Laboratory Medicine, Medical University of South Carolina, Charleston, SC 29425, USA; 4Department of Computer Science, College of Charleston, Charleston, SC 29424, USA
Presenting author: Abigail Moore
Background: Lung cancer is both the second most represented cancer diagnosis and the leading cause of cancer death within the United States. Despite the high occurrence of non–small-cell lung cancer (NSCLC), 30% to 55% of patients relapse after curative resection, and the 5-year relative survival rate is 15% to 21%. The high costs of cancer medication and cancer drug failures are impacted by biomarker programs, which help select patients who may benefit from a given drug.
Methods: NSCLC RNA samples were taken from 38 patients, and clinical outcomes were determined by the American College of Surgeons Oncology Group. Of these patients, 20 were diagnosed as disease-free, and 18 as relapse patients within 3 years of surgical resection. RNA-Seq libraries were paired-end sequenced on HiScanSQ and HiSeq 2500 systems. Read quality was determined by FastQC, and adapters and low-quality reads were trimmed with Trimmomatic. Trimmed paired-end reads were aligned to the human genome (HG38, UCSC) with RSEM. Aligned reads were input into the R/Bioconductor EBSeq package to perform median normalization and differential expression analysis. Differentially expressed genes were analyzed for over-representation of protein complexes, gene ontology terms and pathways via ConsensusPathDB.
Results: Empirical Bayesian methods identified 122 differentially expressed genes (FDR<0.05). Many lung cancer-related genes were recognized, such as BAMBI, CPS1, CD70, SHISA3, and WNT11. Also identified were novel genes with upregulated expression in relapse patients: LILRA2, ALOX12, TSPAN-11, and CADM3, which are involved in immune response, arachidonic acid metabolism, cell surface receptor signaling, and cell-cell adhesion, respectively. Novel genes with downregulated expression in relapse patients were identified, including MCCC1, MRGPRF, PRR4, and SLC7A14, which are associated with biotin metabolism, signal transduction, cell adhesion, and negative regulation of phosphatase activity, respectively. A hypergeometric test revealed over-representation of gene ontology terms for biological processes related to cancer development: positive regulation of cell proliferation (p=4.66e-06), lipoxygenase pathway (p=6.95e-05), and beta-amyloid metabolic process (p=0.000531). Only one protein complex-based set was over-represented: G protein-coupled receptor ligand. Accordingly, six GPCR-related pathways were over-represented (p-values from 6.77e-05 to 0.000196). Over-representation of other cancer-related pathways was found, including prostaglandin synthesis and regulation (p=8.8e-05), fluoxetine metabolism pathway (p=0.000217), and arachidonic acid metabolism (p=0.000243).
Conclusions: Identifying NSCLC patients at risk of recurrence is crucial in cancer research. Our analyses identified 122 differentially expressed genes among disease-free and relapse NSCLC patients, including known lung cancer-related genes and new candidate biomarker genes that are involved in the diverse processes related to NSCLC development. Future research in alternative splicing and the development of a predictive model based on our results could support a new method for identifying individual recurrence risk.
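The hypergeometric over-representation test mentioned in the Results can be written out directly from its definition. The sketch below is illustrative only: the function name `hypergeom_pvalue` is hypothetical, and the background size, term size, and overlap in the usage lines are made-up numbers, not values from the study (only the 122 differentially expressed genes come from the abstract).

```python
from math import comb


def hypergeom_pvalue(population, annotated, drawn, overlap):
    """Upper-tail probability P(X >= overlap) for a hypergeometric draw.

    population: number of background genes
    annotated:  background genes carrying the term or pathway annotation
    drawn:      number of differentially expressed genes
    overlap:    differentially expressed genes carrying the annotation
    """
    upper = min(annotated, drawn)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(population, drawn)


# Hypothetical example: 20,000 background genes, a term with 500 members,
# 122 differentially expressed genes, 15 of which carry the term.
p = hypergeom_pvalue(20000, 500, 122, 15)
```

A small p-value means the gene list contains more members of the term than random sampling from the background would be expected to produce.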
99
RANKING BIOLOGICAL FEATURES BY DIFFERENTIAL ABUNDANCE
Soumyashant Nayak, Nicholas Lahens, Eun Ji Kim, Gregory Grant
University of Pennsylvania
Presenting author: Soumyashant Nayak
We often want to rank features by their differential abundance between two populations. In RNA-Seq for example, we obtain quantified values for tens of thousands of genes across a wide spectrum of expression intensities. A naive ranking by fold-change leads to several issues. One of them is the division-by-zero issue which happens when the change is from 0 to a positive quantity. This problem is usually dealt with by using a pseudo-count of 1. Fold changes from smaller numbers however can tend to dominate the top of ranking lists in case of discrete data like RNA-Seq. Therefore, one might wonder whether a change from 1 to 2 (fold change of 2) is to be considered more significant than a change from 100 to 190 (fold change of 1.9). We systematically study this issue at both theoretical and empirical levels. We conclude that in RNA-Seq data there is an optimal value of the pseudo-count which yields the best significance comparisons. We formulate the necessary foundational mathematics in terms of a philosophical axiomatic framework to enable the systematic exploration of the ranking problem. Additionally we demonstrate how the use of pseudo-counts actually integrates fold-change and difference and this observation can be used to obtain the advantages of both methods, while minimizing the disadvantages.
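The effect of the pseudo-count on ranking can be seen with the two changes quoted in the abstract. The sketch below assumes the common definition of a pseudo-count fold change, (b + c) / (a + c); the function name `fold_change` is hypothetical and the abstract does not state the authors' exact formula.

```python
def fold_change(before, after, pseudo=1.0):
    """Fold change with a pseudo-count added to both values."""
    return (after + pseudo) / (before + pseudo)


changes = {"1 -> 2": (1, 2), "100 -> 190": (100, 190)}

# Without a pseudo-count the small-count change ranks first (2.0 vs 1.9).
raw = {k: fold_change(a, b, pseudo=0.0) for k, (a, b) in changes.items()}

# With a pseudo-count of 1 the order flips (1.5 vs about 1.89).
adjusted = {k: fold_change(a, b, pseudo=1.0) for k, (a, b) in changes.items()}

print(max(raw, key=raw.get), "|", max(adjusted, key=adjusted.get))
```

Since (b + c) / (a + c) = 1 + (b - a) / (a + c), a pseudo-count c that is small relative to a gives a ranking close to plain fold change, while a very large c gives a ranking close to the plain difference b - a. This is one way to read the statement that pseudo-counts integrate fold-change and difference.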
100
SYSTEMATIC ANALYSIS OF OBESITY ASSOCIATED VARIATIONS THROUGH MACHINE LEARNING BASED ON GENOMICS AND EPIGENOMICS
Jaeho Oh, Jun Hyeong Lee, Taeyeop Lee, Min Gyun Bae, Jung Kyoon Choi
Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Republic of Korea
Presenting author: Jaeho Oh
Obesity, one of the major global health concerns, is a metabolic disorder resulting from both behavioral and heritable causes. Various solutions, such as diet, exercise, surgery and drug therapies, have been proposed but these failed to provide long-term effects. Many researchers performed genome-wide association studies (GWAS) to identify disease-associated genomic regions, but interpretation of the data poses a great challenge. Numerous GWAS analysis studies report that FTO is the region most closely associated with obesity, but the mechanism remains unresolved. According to one recent paper, ‘outside variants’, defined as SNPs that are in weak LD with GWAS risk SNPs and influence the target gene’s regulatory circuitry in combination, should be further investigated. The ‘outside variant’ approach suggests that not only statistically significant GWAS SNPs but also other SNPs may be biologically meaningful. To develop an obesity-related model and unravel the mechanism through the ‘outside variant’ approach, we used the imputed GWAS data of 14,122 subjects with BMI information. To select functional epigenetic regions, we used histone modification ChIP-seq data from adipocytes and obesity-associated tissues and extracted a SNP set that is highly related to FTO. By performing regression between SNPs and FTO SNPs, we found SNPs with high explanatory power for obesity in the functional epigenetic region. Our results suggest that the ‘outside variant’ analysis, along with several epigenetic data, is a novel approach to discover a set of SNPs, including SNPs that appear statistically insignificant, that affect obesity.
101
SPARSE REGRESSION MODELING OF DRUG RESPONSE WITH A LOCALIZED ESTIMATION FRAMEWORK
Teppei Shimamura, Hideko Kawakubo, Hyunha Nam, Yusuke Matsui
Division of Systems Biology, Nagoya University Graduate School of Medicine, Japan
Presenting author: Teppei Shimamura
A major challenge in pharmacogenomic studies is differences in the clinical characterization of patients and their reactions, which makes it difficult to identify clinically meaningful gene-drug interactions and predict drug response for each patient. In this study, we consider a localized regression model for each sample to predict a drug response with a set of main effects and second-order interactions for oncogenic alterations for patients. We propose a sparse modeling of interactions with localized estimation framework (SMILE) for this task. We take a regularization approach to inducing strong hierarchy in the sense that an interaction coefficient can have a non-zero estimate only if both of the corresponding main effect coefficients are non-zero. We incorporate two different constraints into the group lasso and the lasso within the framework of local likelihood, to determine the type of structure such as strong hierarchy and enhance sparsity on the interaction coefficients, which enables us to generate an interpretable localized interaction model for each sample. SMILE can be formulated as the solution to a convex optimization problem, which we solve using the alternating direction method of multipliers (ADMM). We then demonstrate the performance of our proposed method in a simulation study and on a pharmacogenomic dataset.
102
PDBMAP: A PIPELINE AND DATABASE FOR MAPPING GENETIC VARIATION INTO PROTEIN STRUCTURES AND HOMOLOGY MODELS
R. Michael Sivley1, John A. Capra2, William S. Bush3
1Department of Biomedical Informatics, Vanderbilt Genetics Institute, Vanderbilt University; 2Department of Biological Sciences, Vanderbilt Genetics Institute, Vanderbilt University; 3Department of Population and Quantitative Health Sciences, Institute for Computational Biology, Case Western Reserve University
Presenting author: Robert Sivley
Rare genetic variants identified from sequencing studies are often grouped by genes, functional domains, and other annotations to increase power in trait association tests and identify shared phenotypic effects. However, association tests rarely consider variants’ orientation in their functional context: three-dimensional (3D) protein structures. Various tools have been developed for visualizing specific variants in the context of individual protein structures; however, these tools do not support a complete, systematic mapping of variants identified in sequencing studies into all available solved and computationally predicted protein structures. We describe PDBMap, a computational pipeline to efficiently map human genetic variation generated by sequencing studies into the structome. We also present the complete mapping of missense variants from the 1000 Genomes Project, Genome Aggregation Database (gnomAD, N=3,010,061), Catalogue of Somatic Mutations in Cancer (COSMIC, N=1,104,417), ClinVar (N=56,235), and the Alzheimer's Disease Sequencing Project (ADSP, N=891,849) into solved protein structures from the Protein Data Bank (N=31,688) and computationally predicted homology models from ModBase (N=186,802). Source code is available from https://github.com/capralab/pdbmap and downloads are available at http://astrid.icompbio.net.
103
REPETITIVE RNA AND GENOMIC INSTABILITY IN HIGH-GRADE SEROUS OVARIAN CANCER PROGRESSION AND DEVELOPMENT
James R. Torpy1, Nenad Bartonicek1, David D. L. Bowtell2, Marcel E. Dinger1
1Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst 2010, Sydney, Australia; 2Peter MacCallum Cancer Centre, East Melbourne, Victoria 3002, Australia
Presenting author: James Torpy
Ovarian cancer is a highly complex disease with a range of different histological subtypes. This highly lethal disease is estimated to be the fifth most common cause of death from cancer in females, with a five-year relative survival rate of 46.2%. High-grade serous ovarian cancer (HGSOC), characterized by widespread genomic instability, accounts for 70-80% of ovarian cancer deaths, and survival rates have not improved significantly for the last few decades. Furthermore, the underlying cause of around 1/3 of HGSOC cases cannot be explained. Evidence suggests that RNA derived from repetitive regions of the genome plays a role in genomic instability and development of cancers such as high-grade serous ovarian cancer, and may play a role in the unexplained HGSOC cases. Aberrant expression of centromere-derived RNA causes dysfunctional chromosomal segregation during mitosis and aneuploidy. Telomere-derived RNA maintains telomeres, preventing chromosomal fusion, breakage and subsequent rearrangement of the chromosomes. Retrotransposable elements such as LINE1s and Alus insert into different genomic locations, disrupting sequences and causing rearrangements such as duplications, inversions and translocations. We have analysed over 120 HGSOC case and control RNA-sequencing data sets of primary samples from the Australian Ovarian Cancer Study, comparing differences in expression of repetitive RNA transcripts across multiple HGSOC subtypes and controls. We found a range of differentially expressed repetitive RNA species including LINE1, Alu and centromere-derived RNA which may be contributing to genomic instability in these tumours. In order to investigate the potential causes of the differences in repeat RNA levels, their expression was correlated with expression of a range of methyltransferases such as DNMT1 and DNMT3A-C that are known to regulate methylation at repetitive heterochromatin, controlling RNA expression from these regions. Expression of RNAi-associated factors such as Dicer was also assessed as these factors can contribute to repetitive RNA regulation.
104
DIMENSION REDUCTION OF GENOME-WIDE SEQUENCING DATA BASED ON LINKAGE DISEQUILIBRIUM STRUCTURE
Yun Joo Yoo1, Suh-Ryung Kim1, Sun Ah Kim2, Shelley B. Bull3
1Department of Mathematics Education, Seoul National University; 2Department of Statistics, Seoul National University; 3Prosserman Centre for Health Research, The Lunenfeld-Tanenbaum Research Institute
Presenting author: Yun Joo Yoo
Genetic association analysis using high-density genome-wide sequencing data consisting of single nucleotide polymorphism (SNP) genotypes can benefit from various dimension reduction strategies for several reasons. First, the genome-wide significance level for individual SNP tests should be determined considering the correlation structure of genotype data. Adjustment for Type I error inflation due to multiple hypothesis testing can be sought based on the dimension reduction methods. Second, increased Type I error may be reduced as the number of variables in the analysis decreases by dimension reduction. Third, the computational burden can be reduced as the complexity of the analysis model is reduced. Fourth, the power of association tests can be increased by combining multiple signals in a group as a result of the dimension reduction strategy. We developed a genome partitioning method by clustering SNPs into blocks based on linkage disequilibrium structure. The algorithm uses a graph modeling of communities of highly correlated SNPs and applies a clique partitioning algorithm to the graph to partition SNPs into blocks. We applied the algorithm to 1000 Genomes Project data, and obtained 162K, 173K, and 334K blocks including singleton blocks in the autosomal regions of 22 chromosomes for Asian, European, and African data respectively. The average values of the LD measure r^2 (the squared Pearson correlation coefficient of two additively coded genotype variables) within blocks are 0.465, 0.437 and 0.329 for Asian, European, and African data, whereas the average r^2 values between consecutive blocks are 0.156, 0.145, and 0.098 for the three populations. We evaluated the Type I error and the power gain from these partitions for several multi-SNP association tests using simulated data based on 1000 Genomes Project data. Compared to other clustering methods, several tests using local dimension reduction strategies combined with genome-wide dimension reduction showed better power. We also developed a local dimension reduction method for genome-wide sequencing data especially targeting the multi-collinearity issue of dense SNP genotype data to be analyzed by multiple regression analysis. This method clusters SNPs in multi-collinearity by examining the variance inflation factor (VIF), and replaces each such group by principal components. The algorithm proceeds iteratively until all VIF values are under a threshold value. When we compared the power between the analysis based on original data and the analysis based on the dimension reduced data using VIF evaluation, we observed a power gain in quadratic-type tests such as the Wald test.
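The two quantities this abstract relies on, the LD measure r^2 between additively coded genotypes and the variance inflation factor, can be computed as follows. This is a simplified sketch: the function names are hypothetical, the genotype vectors are toy data, and the VIF shown is the two-predictor special case VIF = 1 / (1 - r^2); the general VIF regresses each SNP on all other SNPs in the model, and the clique partitioning step of the authors' method is not reproduced here.

```python
def r_squared(g1, g2):
    """Squared Pearson correlation of two additively coded (0/1/2) genotype vectors."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    var1 = sum((a - m1) ** 2 for a in g1)
    var2 = sum((b - m2) ** 2 for b in g2)
    return cov * cov / (var1 * var2)


def vif_two_predictors(r2):
    """VIF of either predictor when exactly two predictors are in the model."""
    return 1.0 / (1.0 - r2)


# Toy genotypes (minor-allele counts per individual) for two nearby SNPs.
snp_a = [0, 1, 2, 1, 0, 2, 1, 0]
snp_b = [0, 1, 2, 2, 0, 2, 1, 1]

r2 = r_squared(snp_a, snp_b)
vif = vif_two_predictors(r2)
```

SNP pairs with high r^2 give a large VIF, which is the signal the method uses to group collinear SNPs and replace them with principal components.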
105
THE MULTIPLE GENE ISOFORM TEST
Yao Yu, Chad D. Huff
Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
Presenting author: Chad Huff
Gene-based association tests aggregate multiple variants in a gene to evaluate statistical evidence for rare variant association. Typically, these tests include variants from all coding exons in a gene, irrespective of gene isoform. For genes with multiple isoforms, this is often approximately equivalent to a test of the largest isoform, which is not necessarily optimal. Because smaller isoforms tend to be enriched for the core functional domains of a gene, they may also be enriched for pathogenic variants or larger variant effect sizes. To address the opportunities presented by isoform-specific patterns of disease susceptibility, we introduce the Multiple Gene Isoform Test (MGIT). MGIT employs a permutation approach to test each isoform of a gene, summarizing the contribution of each transcript to calculate a single gene-level p-value, without the need to explicitly model correlation between transcripts. MGIT can be applied in conjunction with any gene-based association test to assess gene-level significance and to identify isoforms that may be enriched for protein domains impacting disease risk. To demonstrate the utility of MGIT, we report results from a gene-based association test (VAAST) involving 783 breast cancer cases, 322 skin cutaneous melanoma cases, and 3,607 controls of European ancestry. For two established cancer genes, we observed a two-fold and three-fold reduction in p-value with MGIT relative to a whole-gene test, for MITF in melanoma and BRCA1 in breast cancer, respectively. In contrast, for other established cancer genes, we observed either no change in p-value (RAD51B and BRCA2 in breast cancer and MC1R, MTAP, and BRCA2 in melanoma) or a modest attenuation of association signal (CHEK2 in breast cancer). In the case of BRCA1, the difference in the MGIT association signal was primarily driven by rare, predicted damaging missense variants, which exhibited large differences in effect size between the smallest and largest isoforms. MGIT is implemented in the software package XPAT, with support for VAAST, SKAT-O, and 27 additional gene-based association tests.
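One generic way to obtain a single gene-level p-value from several correlated isoform-level tests by permutation is a max-statistic scheme, sketched below. This is not MGIT's actual algorithm, which the abstract does not specify: the carrier-frequency statistic, the max-over-isoforms summary, and the names `isoform_stat` and `gene_level_p` are assumptions chosen only to illustrate why permuting phenotype labels removes the need to model correlation between transcripts.

```python
import random


def isoform_stat(genotypes, labels, variants):
    """Carrier frequency in cases minus controls for one isoform's variant set.

    genotypes: list of sets of variant ids carried by each individual
    labels:    list of 1 (case) / 0 (control), same order as genotypes
    """
    n_case = sum(labels)
    n_ctrl = len(labels) - n_case
    case = sum(1 for g, y in zip(genotypes, labels) if y == 1 and g & variants)
    ctrl = sum(1 for g, y in zip(genotypes, labels) if y == 0 and g & variants)
    return case / n_case - ctrl / n_ctrl


def gene_level_p(genotypes, labels, isoforms, n_perm=999, seed=0):
    """Permutation p-value for the best-scoring isoform of a gene."""
    rng = random.Random(seed)
    observed = max(isoform_stat(genotypes, labels, v) for v in isoforms.values())
    perm = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # break the genotype-phenotype link
        best = max(isoform_stat(genotypes, perm, v) for v in isoforms.values())
        if best >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: hypothetical variant ids, 20 carrier cases and 20 non-carrier controls.
genotypes = [{"v1"}] * 20 + [set()] * 20
labels = [1] * 20 + [0] * 20
isoforms = {"short": {"v1"}, "long": {"v1", "v2"}}
p_gene = gene_level_p(genotypes, labels, isoforms, n_perm=199)
```

Because every permutation recomputes all isoform statistics on the same shuffled labels, the overlap between isoforms is reflected in the null distribution automatically.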
106
IMAGING GENOMICS
POSTER PRESENTATIONS
107
GENETIC ANALYSIS OF CEREBRAL BLOOD FLOW IMAGING PHENOTYPES IN ALZHEIMER’S DISEASE
Xiaohui Yao1, Shannon L. Risacher2, Kwangsik Nho2, Andrew J. Saykin2, Heng Huang3, Ze Wang4, Li Shen2
1School of Informatics and Computing, Indiana University, Indianapolis; 2Department of Radiology and Imaging Sciences, Indiana University School of Medicine; 3Department of Electrical and Computer Engineering, University of Pittsburgh; 4Department of Radiology, Lewis Katz School of Medicine, Temple University
Presenting author: Heng Huang
Cerebral blood flow (CBF) provides a means to assess the neuronal and neurovascular consequences of Alzheimer’s disease (AD) pathology. Both AD specific and non-specific CBF changes may be driven by unique or common genetic factors. To identify genetic variants associated with AD pathogenesis, we performed a targeted analysis to examine association between 4,033 SNPs of 24 AD candidate genes and CBF phenotypes measured by arterial spin labeling (ASL) magnetic resonance imaging (MRI) in four brain regions of interest (ROIs) including left angular, right angular, left temporal and right temporal gyri. Participants include 258 non-Hispanic Caucasian subjects from the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. Targeted genetic association analysis of CBF on each ROI was performed using linear regression under an additive genetic model in PLINK, where age, gender and APOE ɛ4 status were included as covariates. Post-hoc analysis used Bonferroni correction for adjusting both the genetic and CBF measures. GATES was used to calculate gene-level p-values. The additive effects of the identified genetic variants from the above association analysis were also assessed at each voxel using SPM12 under one-way ANOVA test with age, gender and APOE ɛ4 status as covariates. The single nucleotide polymorphism (SNP) level analysis identified a novel locus in INPP5D (inositol polyphosphate-5-phosphatase D) significantly associated with left angular gyrus (L-AG) CBF. In gene-based analysis, both INPP5D and CD2AP (CD2 associated protein) were associated with L-AG CBF. The discovered INPP5D locus explained 8.29% variance of left angular CBF after adjusting for age, gender and APOE ɛ4 status. Further analyses on an independent subset of the ADNI samples (N=906) revealed that the minor allele of the locus was associated with lower cerebrospinal fluid t-tau/Aβ1-42 ratio. INPP5D functions as a negative regulator in the immune system and a number of inflammatory responses, and has been found to inhibit TREM2 signaling. The identified CBF risk factor has the potential to provide novel insights for better revealing the complex molecular mechanisms of AD. It warrants further investigation whether the risk factor is associated with the AD pathophysiology, the vascular pathophysiology, and/or their interaction.
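A Bonferroni correction over both the SNPs and the CBF phenotypes amounts to dividing the significance level by the total number of tests. The sketch below assumes that every one of the 4,033 SNPs was tested against each of the four ROIs and that the family-wise level was 0.05; neither detail is stated explicitly in the abstract, and the function name `bonferroni_threshold` is hypothetical.

```python
def bonferroni_threshold(alpha, n_snps, n_phenotypes):
    """Per-test significance threshold controlling the family-wise error rate."""
    return alpha / (n_snps * n_phenotypes)


# 4,033 SNPs x 4 ROIs = 16,132 tests under the stated assumptions.
threshold = bonferroni_threshold(0.05, 4033, 4)
print(f"{threshold:.2e}")
```

A SNP-ROI association would be declared significant only if its regression p-value falls below this threshold.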
108
PBRM1 MUTATIONS ARE ASSOCIATED WITH TISSUE MORPHOLOGICAL CHANGES IN KIDNEY CANCER
Jun Cheng1, Jie Zhang2, Zhi Han2, Liang Cheng2, Qianjin Feng1, Kun Huang2
1Southern Medical University, 2Indiana University School of Medicine
Presenting author: Kun Huang
Background: Clear cell renal cell carcinoma (CCRC) is the most common kidney cancer. With the accumulation of large scale genomic data, genes with mutations that are common to CCRC patients have been identified. For instance, VHL has mutations in almost 49.9% of the CCRC patients in The Cancer Genome Atlas (TCGA) project followed by PBRM1, MUC4 and SETD2. While some of these genes have been established as driver genes for CCRC (e.g., VHL and SETD2), the functional implications of their mutations are still being characterized. Previous studies often focused on the effects of the mutations on molecular levels such as gene/microRNA expression and DNA methylation. In this study we aim to characterize the morphological changes at cellular and tissue levels associated with these mutations.
Methods: Mutational status and histopathological imaging data for 448 CCRC patients were obtained from TCGA through the NCI Genomic Data Commons. There are six genes with mutations in more than 7% of the patients: VHL, PBRM1, MUC4, SETD2, BAP1, and MTOR. The imaging features were then extracted using a computational pipeline we have previously developed. Our pipeline consists of three steps: nucleus segmentation, cell-level feature extraction, and aggregating cell-level features into patient-level features. Ten types of cell-level features were extracted including nuclear area (area), lengths of major and minor axes of cell nucleus and their ratio (major, minor, and ratio), mean pixel values of nucleus in RGB three channels respectively (rMean, gMean, and bMean), and mean, maximum, and minimum distances (distMean, distMax, and distMin) to neighboring nuclei in Delaunay triangulation graph. Finally, all cell-level features from the same patient were aggregated into patient-level features using a bag-of-visual-words model with K-means (K=10) algorithm for learning words. Five additional parameters were calculated for each type of feature: mean, standard deviation, skewness, kurtosis, and entropy. Thus there are 150 image features in total. For each selected gene, the features were compared between patients with and without mutations using Mann-Whitney-U tests.
Results: While there are imaging features with p-value less than 0.05 for every gene, multiple testing correction (BH FDR) suggested that only PBRM1 mutations are associated with significantly different imaging features (69 features with q-value<0.05). Among them ‘distMax_bin2’, ‘distMin_bin3’, ‘ratio_bin9’ show significant increases in the mutation group while ‘distMean_std’, ‘major_std’ and ‘ratio_std’ show significant decreases.
Discussion and Conclusion: The above results suggest that tumor cells in the patients with PBRM1 mutations are more compact and their nuclei shapes are more homogeneous and closer to a round shape. These results are consistent with visual inspection and a previous report that PBRM1 mutation leads to decrease of extracellular matrix gene expression and thus a reduction of stroma.
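The Benjamini-Hochberg (BH) adjustment used here to turn per-feature p-values into q-values can be sketched as below. The function name `benjamini_hochberg` and the example p-values are illustrative assumptions, not data from the study. The feature count follows from the description: 10 feature types, each summarized by 10 visual-word bins plus 5 distribution statistics, gives 10 x 15 = 150 features per gene comparison.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


n_features = 10 * (10 + 5)  # feature types x (visual-word bins + summary statistics)

# Hypothetical p-values for four features of one gene comparison.
example_q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [i for i, v in enumerate(example_q) if v < 0.05]
```

In the study's setting the input would be the 150 Mann-Whitney p-values for one gene, and the features with q-value below 0.05 would be reported.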
109
IMAGE GENOMICS OF INTRA-TUMOR HETEROGENEITY USING DEEP NEURAL NETWORKS
Hui Qu1, Subhajyoti De2, Dimitris Metaxas1
1Rutgers University, 2Cancer Institute of New Jersey
Presenting author: Dimitris Metaxas
Intra-tumor heterogeneity, i.e. genetic, molecular, and phenotypic differences between tumor cells within a single tumor, is a major challenge for clinical management of cancer patients, contributing to therapeutic failure, disease relapses and drug resistance. While recent findings suggest that there is extensive intra-tumor genetic heterogeneity in all major cancer types, it remains to be understood how that relates to intra-tumor heterogeneity at the pathway and cell phenotype level. We have developed an innovative computational framework based on neural networks to identify cellular features from histological slides and then associate them with genomic and pathway-level features in a multi-scale model, before applying it to a cohort of 469 bladder cancer samples which has genomic, transcriptomic, pathway, and histological imaging data. In brief, our method first uses a Tumor Segmentation Network (TSN) and Nuclei Segmentation Network (NCN) to identify tumor cell regions and tumor nuclei in the histological slides. For tumor segmentation, we first extracted tumor and normal patches from the whole slide images of 40 patients, then trained a TSN to classify any patch into tumor or normal. Given any other whole slide image, the trained model can identify all tumor patches, which form the tumor regions after morphological operations. The segmented tumor regions and nuclei are then used to compute q-statistic, and also alpha and beta diversity measures which reflect the extent of local and regional intra-tumor phenotypic heterogeneity. Benchmarking against pathologically curated estimates indicates that this approach has high accuracy in identifying tumor cell features in a heterogeneous tumor. We then integrate imaging and genomics data to predict aspects of phenotypic heterogeneity based on cancer-related mutations and gene expression using uni- and multivariate approaches such as Relation Network (RN). Our preliminary results are consistent with biological knowledge. For example, we estimated the number of subclones in each tumor based on mutation data, and observed that indeed the samples with a high number of subclones have high phenotypic heterogeneity scores. We also estimated mRNA expression level of Ki67, a marker of cell growth, and observed that the samples with higher q-statistic also had higher Ki67 expression, suggesting that certain patterns of intra-tumor heterogeneity correlate with tumor cell growth rates. Multi-scale analysis integrating genetic, pathway- and phenotypic heterogeneity will provide fundamental insights into “functional” variability within and across cancers, helping to refine precision medicine approaches to improve clinical management of cancer patients.
110
THE NEUROIMAGING INFORMATICS TOOLS AND RESOURCES COLLABORATORY (NITRC) AND ITS IMAGING GENOMICS DOMAIN
Li Shen1, David Kennedy2, Christian Haselgrove2, Abby Paulson3, Nina Preuss3, Robert Buccigrossi3, Matthew Travers3, Albert Crowley3, and The NITRC Team3
1Department of Radiology and Imaging Sciences, Indiana University School of Medicine; 2Department of Psychiatry, University of Massachusetts Medical School; 3TCG, Inc.
Presenting author: Li Shen
Aim of Investigation: Neuroimaging Informatics Tools and Resources Collaboratory (NITRC) is a neuroinformatics knowledge environment for MR, PET/SPECT, CT, EEG/MEG, optical imaging, clinical neuroinformatics, computational neuroscience, and imaging genomics tools and resources. We encourage researchers to list their Imaging Genomics tools at the NITRC website www.nitrc.org.
Methods: Initiated in 2006 through the NIH Blueprint for Neuroscience Research, NITRC’s mission is to foster a user-friendly knowledge environment for the neuroinformatics community. In 2012, NITRC added Imaging Genomics to its broadened scientific scope. By continuing to identify existing software tools and resources valuable to this community, NITRC’s goal is to support its researchers dedicated to enhancing, adopting, distributing, and contributing to the evolution of neuroinformatics analysis software, data, and compute resources.
Results: Located on the web at www.nitrc.org, the Resources Registry (NITRC-R) promotes software tools and resources, vocabularies, test data, and databases, thereby extending the impact of previously funded, neuroimaging informatics contributions to a broader community. NITRC-R gives researchers greater and more efficient access to the tools and resources they need, better categorizing and organizing existing tools and resources, facilitating interactions between researchers and developers, and promoting better use through enhanced documentation and tutorials, all while directing the most recent upgrades, forums, and updates. As of 11/2017, over 970 public resources are listed on NITRC-R, where the Imaging Genomics domain includes 60 resources such as ADNI, TCGA, ENIGMA, UK Biobank, and others. NITRC-Image Repository (NITRC-IR) makes 8,285 imaging sessions publicly available at no charge, and NITRC Computational Environment (NITRC-CE) provides cloud-based computation services downloadable to your machines or via commercial cloud providers such as Amazon Web Services and Microsoft Azure.
Conclusions: In summary, NITRC is now an established knowledge environment for the neuroimaging community where tools and resources are presented in a coherent and synergistic environment. With its expanded scope into imaging genomics, NITRC aims to become a trusted source for identification of resources in this highly active and promising domain bridging advanced neuroimaging and genomics. We encourage the imaging genomics research community to continue providing valuable resources, design and content feedback and to utilize these resources in support of data sharing requirements, software dissemination and cost-effective computational performance.
Acknowledgements: Funded by the NIH Blueprint for Neuroscience Research, NIBIB, NIDA, NIMH, and NINDS.
111
IDENTIFYING THE GIST OF CNNS: FINDING INTERPRETABLE SIGNATURES OF HISTOLOGY IMAGE MODELS BUILT USING NEURAL NETWORKS
Arunima Srivastava1, Chaitanya Kulkarni1, Kun Huang2, Parag Mallick3, Raghu Machiraju1
1The Ohio State University, 2Indiana University School of Medicine, 3Stanford University
Presenting author: Arunima Srivastava
Convolutional Neural Networks (CNNs) have gained steady popularity as the selected method of histology image analysis and subsequent disease modeling. Since CNNs are purely data driven learning models, they have an edge over morphology driven (pre-selected) tissue image features that may be biased and difficult to generalize. Morphological features, namely tissue texture, structure, nuclei size and shape, presence of fibroblasts and lymphocytes etc., might not be comprehensive enough for different datasets, but they do provide an inherently interpretable characterization of the histology. While CNNs and their subsequent features prove to be powerful classifiers, they fail to provide an explanation for this classification, as the features are ONLY interpretable by the CNNs themselves. Translating “under the hood” activities of a CNN would endeavor to make it more generalizable while the final model will not only be able to effectively classify whole slide tissue images, it will also have the potential to educate us on the nuances of the histological data. This work aims to use both types of interpretable (morphological) and powerful but un-interpretable (CNN based) features to derive a signature for successful CNN models, which help relate them to known biological attributes and shed light on components that are critical to the various subtypes under investigation. We use a stratified breast cancer histology classification dataset from the BioImaging (2015) Challenge that contains sample images from four different kinds of breast tissue (Normal, Benign lesion, In-situ carcinoma and Invasive carcinoma). By following a two-pronged approach of modeling the same dataset using CNNs (using the GoogLeNet architecture) and morphological features (using CellProfiler, a biological image analytics tool), it was possible to infer an interpretable signature of features utilized by the CNN. We additionally explore the possibility of combining these two techniques to extract a more powerful and precise classification. This work summarizes the need for understanding the widely trusted models built using deep learning, and adds a layer of biological context to a technique that functioned as a classification only approach till now.
112
PRECISION MEDICINE: FROM DIPLOTYPES TO DISPARITIES TOWARDS IMPROVED HEALTH AND THERAPIES
POSTER PRESENTATIONS
113
EXPLORING THE POTENTIAL OF EXOME SEQUENCING IN NEWBORN SCREENING
Steven E. Brenner1, Aashish N. Adhikari1, Yaqiong Wang1, Robert J. Currier2, Renata C. Gallagher3, Robert L. Nussbaum4, Yangyun Zou1, Uma Sunderam5, Joseph Sheih3, Flavia Chen3, Mark Kvale3, Sean D. Mooney6, Raj Srinivasan5, Barbara A. Koenig3, Pui Kwok3, Jennifer M. Puck3, The NBSeq Project
1University of California-Berkeley, 2California Department of Public Health, 3University of California-San Francisco, 4Invitae, 5Tata Consultancy Services, 6University of Washington
Steven, Brenner
The NBSeq project is evaluating effectiveness of whole exome sequencing (WES) for detecting inborn errors of metabolism (IEM) for newborn screening (NBS). De-identified archived dried blood spots from MS/MS true positive and false positive cases previously identified in the California NBS were studied. 18 out of 137 affected individuals lacked two rare potentially damaging single nucleotide variants or short indels in genes responsible for their Mendelian disorders. The sensitivity of causal mutation detection in 137 Phase I NBSeq exomes varied across disorders; all affected PKU cases were predicted correctly, but several cases of other IEMs were missed. In some cases, exomes also confidently identified disorders different from the metabolic center diagnoses, suggesting that sequencing information would have been valuable for proper clinical diagnoses in those cases. Deeper analysis of the data was undertaken to assess sources of discrepancy between sequencing results, MS/MS call, and clinical diagnosis. Copy number variation (CNV) calling tools were evaluated on NBSeq exomes for ability to resolve some of these exome false negatives. CNV tools can both miss CNVs in exomes and report them spuriously. We optimized tools for our data and filtered out genes (PRODH, HCFC1, ETFA) harboring common CNVs (identified from CNV calls on the 1000 genomes project exomes). This identified deletions in the correct genes for 4 of the 32 exome false negatives using XHMM: 2 isovaleric acidemia cases, 1 methylmalonic acidemia case and 1 OTC deficiency case. We also systematically reviewed every variant in 78 metabolic disorder genes annotated by HGMD or ClinVar as pathogenic or likely pathogenic with 1000 genomes MAF > 0.1%. Our re-assessment of the primary literature for 59 such variants found that only 18 were reportable (many still VUS) and the rest we excluded from the pipeline. Literature review also helped identify 8 cases diagnosed with short-chain acyl-CoA dehydrogenase (SCAD) deficiency but not flagged by exomes. All 8 individuals harbored a common (1000 Genomes MAF: 18.2%) ACADS allele (c.625A>G) present in several NBSeq exomes, which sometimes confers a partial biochemical phenotype but not clinical disease. For assessment, we treated these individuals as unaffected. Incorporation of CNV detection and variant curation into our analysis pipeline improved overall sensitivity from 77.9% to 87.6% on the 137 affected Phase I NBSeq samples. This updated pipeline will be run on additional NBSeq exomes to assess the potential role for WES in NBS. While still not sufficiently specific alone for screening of most IEMs, WES can facilitate timely and more precise case resolution.
114
A METHOD FOR IMPROVED VARIANT CALLING AT HOMOPOLYMER MARGINS (AND ELSEWHERE)
J. Buckley, M. Hiemenz, J. Biegel, T. Triche, A. Ryutov, D. Maglinte, D. Ostrow, X. Gai
Center for Personalized Medicine, Children's Hospital of Los Angeles
Jonathan, Buckley
All sequencing technologies are subject to read errors which, in the context of variant calling (particularly low variant-allele-frequency (VAF) variant calling), can yield miscalls. Read errors are most problematic when genomic context (such as proximity to homopolymers) influences the error rate. The Center of Personalized Medicine at Children's Hospital of Los Angeles (CHLA) recently collaborated with Thermo-Fisher (TF) in development of a clinical pediatric cancer panel for somatic variant detection (OncoKids™), using TF's Ion Torrent sequencing platform. The test needed to identify variants in tumor sub-clones and in samples with an admixture of tumor and normal cells, both situations that can yield low VAFs. Our challenge was to optimize variant calling at homopolymer margins, and other genomic loci with a high background error rate (noise). The TF approach was to identify problematic loci and to either limit base calls to reads from one strand (when errors clustered mostly on the other strand), or 'blacklist' the locus altogether. While this approach was conservative, avoiding most false positives, it resulted in unacceptable false negative rates, particularly for InDels. Given the deep coverage (over 1000x in many regions), it seemed likely that a more nuanced approach might yield accurate calls, even in the presence of substantial noise. This presentation outlines an algorithm (Local Adjustment for Background, or LAB) developed at CHLA that uses a reference dataset (filtering out true positives) to establish the noise distribution at each locus. The noise distribution varies greatly across the panel genes, from essentially error-free loci to loci in which the majority of reads show a spurious base substitution or InDel. While proximity to a homopolymer is a strong determinant of noise, non-homopolymer regions can also have high noise and many homopolymers yield relatively clean data. Variant calls are made through comparison of the observed VAF with the locus-specific VAF distribution in the reference. Optionally, the reference set can be limited to samples of the same type as the test sample (e.g. FFPE). Adjustments may be made for samples with globally increased error rates. In regions of complex InDel patterns, a statistical model tests for shifts in these patterns, indicative of a true variant. An important component is a GUI that provides a visual representation of the basis for a call, and options such as strand-specific analysis. Application to samples with known SNVs and InDels (Acrometrix 'ground truth' samples) resulted in improvement in InDel calls from 65% to 100%. The presentation will describe the calling pipeline, with illustrative examples, and present comparative performance data.
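The idea of comparing an observed VAF with a locus-specific noise distribution can be illustrated with a minimal sketch. This is not the LAB algorithm itself, whose statistical details the abstract does not give; it assumes a simple empirical p-value against reference VAFs, and the function names, the significance threshold `alpha` and all numbers are hypothetical.

```python
def empirical_p_value(observed_vaf, reference_vafs):
    """Fraction of reference (noise) VAFs at this locus that are >= the observed VAF.

    The add-one correction keeps the p-value from ever being exactly zero.
    """
    exceed = sum(1 for v in reference_vafs if v >= observed_vaf)
    return (exceed + 1) / (len(reference_vafs) + 1)


def call_variant(observed_vaf, reference_vafs, alpha=0.05):
    """Call a variant when the observed VAF is unlikely under the locus noise."""
    return empirical_p_value(observed_vaf, reference_vafs) < alpha


# Hypothetical reference VAFs for two loci (illustrative numbers only).
clean_locus = [0.000, 0.001, 0.000, 0.002, 0.001] * 8  # 40 samples, low noise
noisy_locus = [0.08, 0.12, 0.10, 0.15, 0.09] * 8       # 40 samples, high noise

# The same observed VAF is judged against each locus's own background.
same_vaf = 0.05
clean_call = call_variant(same_vaf, clean_locus)
noisy_call = call_variant(same_vaf, noisy_locus)
```

Under these assumed numbers, a 5% VAF stands out at the clean locus but is indistinguishable from background at the noisy one, which is the behaviour a per-locus threshold is meant to give and a single global VAF cutoff cannot.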
115
EFFICIENT SURVIVAL MULTIFACTOR DIMENSIONALITY REDUCTION METHOD FOR DETECTING GENE-GENE INTERACTION
Jiang Gui, Xuemei Ji, Christopher I. Amos
Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth, Lebanon, NH 03756
The problem of identifying SNP-SNP interactions in case-control studies has been studied extensively and a number of new techniques have been developed. Little progress has been made, however, in the analysis of SNP-SNP interactions in relation to censored survival data. We present an extension of the two-class multifactor dimensionality reduction (MDR) algorithm that enables detection and characterization of epistatic SNP-SNP interactions in the context of survival outcome. The proposed Efficient Survival MDR (ES-MDR) method handles censored data by modifying MDR's constructive induction algorithm to use the log-rank test. We applied ES-MDR to genetic data of over 470,000 SNPs from the OncoArray Consortium. We used onset age of lung cancer and case-control (n=27,312) status as the survival outcome and divided the data into training and testing sets. We also adjusted for subject's age, gender and smoking status. From the training set, we identified an interaction between SNPs from the BRCA1 and IL17RC genes as the top model associated with lung cancer onset age. This result was validated in the testing set. ES-MDR is capable of detecting interaction models with weak main effects. These epistatic models tend to be dropped by traditional regression approaches.
116
BIOINFORMATICS PROCESSING STRATEGIES FOR EFFICIENT SEQUENCING DATA STORAGE USING GVCF BANDING
Nicholas B. Larson, Shannon K. McDonnell, Iain F. Horton, Saurabh Baheti, Jeanette E. Eckel-Passow, Steven N. Hart
Mayo Clinic
Nicholas, Larson
An emerging challenge in the era of next-generation sequencing (NGS) is efficient data storage practices, particularly for file formats that accommodate ad hoc construction of analysis-ready datasets. The Variant Call Format (VCF) is the predominant file type used for storing and analyzing NGS-based genetic variant information. However, it presents multiple practical limitations when merging individual files for multi-sample representations. Recent development of the gVCF file format by GATK addresses many of these concerns by characterizing same-as-reference segments of the genome as interval entries defined by a shared genotype quality (GQ) score. Current default settings to generate this intermediate file format result in a new data entry at each base pair position the GQ shifts, presenting cost-benefit considerations of improved and computationally efficient multi-sample genotyping at the expense of large intermediate files. However, additional options allow for contiguous entries to be merged if they fall within a predefined GQ bin, a process known as banding. We hypothesized that substantial gVCF file size reduction could be attained for whole-genome sequencing (WGS) through the use of coarse GQ banding options, although the impact of this approach on output quality of multi-sample variant calling is currently unknown. To investigate the properties of gVCF banding on genotyping integrity, we processed 50 WGS samples as well as 50 whole-exome sequencing (WES) samples from the Mayo Clinic Biobank under a variety of GQ banding settings (default, intervals of 10, {0,20,60}, {0,20}). These single-sample gVCF files were subsequently merged and joint genotyped under varying combinations of banding options, separately by sequencing application, and output genotypes for chromosome 22 were compared for concordance with results using complete information (i.e., no banding). Overall, WGS samples exhibited substantially smaller gVCF files, with {0,20} banding resulting in a mean file size reduction of 87% (range: 84-90%) relative to default settings. Genotype concordance exceeded 99.9% under all comparisons, while we additionally observed more variable positions emitted as coarser bin definitions were applied. Comparable findings were observed for WES data. Our results highlight impressive improvements in NGS variant call data storage efficiency gained by coarse banding options for gVCF output, with minimal impact on accompanying genotyping quality.
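The effect of banding on the number of reference entries can be sketched as follows. This is an illustrative toy rather than GATK's implementation: the function names, the choice to keep the minimum GQ per merged block, and the per-base GQ values are all assumptions made for the example.

```python
def gq_band(gq, band_starts):
    """Return the lower bound of the band containing gq.

    band_starts is a sorted list of band lower bounds, e.g. [0, 20, 60].
    """
    lower = band_starts[0]
    for start in band_starts:
        if gq >= start:
            lower = start
    return lower


def band_reference_blocks(per_base_gq, band_starts):
    """Merge contiguous positions whose GQ falls in the same band.

    per_base_gq: list of (position, gq) pairs for same-as-reference positions.
    Returns a list of (start, end, min_gq) blocks.
    """
    blocks = []
    for pos, gq in per_base_gq:
        band = gq_band(gq, band_starts)
        last = blocks[-1] if blocks else None
        if last and last["band"] == band and last["end"] == pos - 1:
            # Same band and adjacent position: extend the current block.
            last["end"] = pos
            last["min_gq"] = min(last["min_gq"], gq)
        else:
            blocks.append({"start": pos, "end": pos, "band": band, "min_gq": gq})
    return [(b["start"], b["end"], b["min_gq"]) for b in blocks]


# Hypothetical per-base GQ values over six consecutive reference positions.
gqs = [(100, 35), (101, 42), (102, 18), (103, 25), (104, 61), (105, 70)]
fine = band_reference_blocks(gqs, list(range(0, 100)))  # one band per GQ value
coarse = band_reference_blocks(gqs, [0, 20])            # two bands: <20 and >=20
```

With one band per GQ value, every GQ shift opens a new entry (six entries here), whereas the coarse {0,20} setting collapses the same positions into three blocks. The trade-off is that each merged block carries only a summary GQ rather than the per-base values.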
117
IDENTIFICATION OF A NOVEL TSC2 MUTATION IN A PATIENT WITH TUBEROUS SCLEROSIS COMPLEX
Jae-Hyung Lee1, Su-Kyeong Hwang2, Jung-eun Yang3, Chae-Seok Lim3, Jin-A Lee4, Kyungmin Lee5, Bong-Kiun Kaang3, Yong-Seok Lee6
1Kyung Hee University, 2Kyungpook National University Hospital, 3Seoul National University, 4Hannam University, 5Kyungpook National University Graduate School of Medicine, 6Seoul National University College of Medicine
Yong-Seok, Lee
Tuberous sclerosis complex (TSC) is a neurocutaneous disorder characterized by multiple symptoms including neuropsychological deficits such as seizures, intellectual disability, and autism. TSC is inherited in an autosomal dominant pattern and is caused by mutations in either the TSC1 or TSC2 genes, which result in the hyperactivation of the mammalian target of rapamycin (mTOR) signaling pathway. In this study, we identified a novel small deletion mutation in TSC2 by performing whole exome sequencing in a Korean patient, who exhibited multiple TSC-associated symptoms including frequent seizures, intellectual disability, language delays, and social problems. In addition, we validated the functional significance of the novel mutation by examining the effect of the deletion mutant on mTOR pathway activation. Recent studies have suggested that mTOR inhibitors such as rapamycin can be effective to treat TSC-associated deficits in rodent models of TSC. Accordingly, we found that everolimus treatment has beneficial effects on SEGA size and autism related behaviors in the patient.
118
CONSIDERATIONS FOR AUTOMATED MACHINE LEARNING IN CLINICAL METABOLIC PROFILING: ALTERED HOMOCYSTEINE PLASMA CONCENTRATION ASSOCIATED WITH METFORMIN EXPOSURE
Alena Orlenko1, Jason H. Moore1, Patryk Orzechowski1,2, Randal S. Olson1, Junmei Cairns3, Pedro J. Caraballo3, Richard M. Weinshilboum3, Liewei Wang3, Matthew K. Breitenstein1
1University of Pennsylvania; 2AGH University of Science and Technology, Krakow, Poland; 3Mayo Clinic
Alena, Orlenko
With the maturation of metabolomics science and proliferation of biobanks, clinical metabolic profiling is an increasingly opportunistic frontier for advancing translational clinical research. Automated Machine Learning (AutoML) approaches provide exciting opportunity to guide feature selection in agnostic metabolic profiling endeavors, where potentially thousands of independent data points must be evaluated. In previous research, AutoML using high-dimensional data of varying types has been demonstrably robust, outperforming traditional approaches. However, considerations for application in clinical metabolic profiling remain to be evaluated, particularly regarding the robustness of AutoML to identify and adjust for common clinical confounders. In this study, we present a focused case study regarding AutoML considerations for using the Tree-Based Optimization Tool (TPOT) in metabolic profiling of exposure to metformin in a biobank cohort. First, we propose a tandem rank-accuracy measure to guide agnostic feature selection and corresponding threshold determination in clinical metabolic profiling endeavors. Second, while AutoML, using default parameters, demonstrated potential to lack sensitivity to low-effect confounding clinical covariates, we demonstrated residual training and adjustment of metabolite features as an easily applicable approach to ensure AutoML adjustment for potential confounding characteristics. Finally, we present increased homocysteine with long-term exposure to metformin as a potentially novel, non-replicated metabolite association suggested by TPOT; an association not identified in parallel clinical metabolic profiling endeavors. While warranting independent replication, our tandem rank-accuracy measure suggests homocysteine to be the metabolite feature with largest effect, and corresponding priority for further translational clinical research. Residual training and adjustment for a potential confounding effect by BMI only slightly modified the suggested association. Increased homocysteine is thought to be associated with vitamin B12 deficiency; evaluation for potential clinical relevance is suggested. While considerations for clinical metabolic profiling are recommended, including adjustment approaches for clinical confounders, AutoML presents an exciting tool to enhance clinical metabolic profiling and advance translational research endeavors.
119
PHARMGKB: NEW WEBSITE RELEASE 2017
Michelle Whirl-Carrillo1, Ryan M. Whaley1, Mark Woon1, Katrin Sangkuhi1, Li Gong1, Julia Barbarino1, Caroline Thorn1, Rachel Huddart1, Maria Alvarellos1, Jill Robinson1, Russ B. Altman2, Teri E. Klein3
1Department of Biomedical Data Science, Stanford University; 2Department of Bioengineering, Medicine and Genetics, Stanford University; 3Department of Biomedical Data Science and Medicine, Stanford University
PharmGKB is the largest publicly available resource for pharmacogenomics (PGx) discovery and implementation. Its mission is to collect, curate, integrate and disseminate knowledge about how human genetic variation influences drug response. The PharmGKB website allows users to select and view information via search, filter and browse options. Data is also available by direct download through the website and through the PharmGKB API. PharmGKB launched a new and improved user interface in September 2017. The new website offers benefits such as a display that works on mobile and small screen devices, improved searching and filtering capabilities, and faster page load speeds. While the look of PharmGKB has changed, all the content that was available previously is still available, including:
• 5500 annotated genetic variants
• 14,000 curated peer-reviewed PGx articles
• 125 evidence-based pharmacokinetic and pharmacodynamics pathways
• 60 reviews of key PGx genes (very important pharmacogenes)
• 450 curated drug labels
• 90 gene-drug pairs with curated genotype-based drug dosing guidelines
The website features an online tutorial that users can access by following the screen prompts. For more information, please visit PharmGKB at http://www.pharmgkb.org.
120
READING BETWEEN THE GENES: COMPUTATIONAL MODELS TO DISCOVER FUNCTION AND/OR CLINICAL UTILITY FROM NONCODING DNA
POSTER PRESENTATIONS
121
NETWORK ANALYSIS OF PSEUDOGENE-GENE RELATIONSHIPS: FROM PSEUDOGENE EVOLUTION TO THEIR FUNCTIONAL POTENTIALS
Travis S. Johnson1, Sihong Li1, Johnathan R. Kho2, Kun Huang3, Yan Zhang1
1Ohio State University, 2Georgia Institute of Technology, 3Indiana University
Travis, Johnson
Pseudogenes are fossil relatives of genes. Pseudogenes have long been thought of as "junk DNAs", since they do not code proteins in normal tissues. Although most of the human pseudogenes do not have noticeable functions, ~20% of them exhibit transcriptional activity. There has been evidence showing that some pseudogenes adopted functions as lncRNAs and work as regulators of gene expression. Furthermore, pseudogenes can even be "reactivated" in some conditions, such as cancer initiation. Some pseudogenes are transcribed in specific cancer types, and some are even translated into proteins as observed in several cancer cell lines. All the above have shown that pseudogenes could have functional roles or potentials in the genome. Evaluating the relationships between pseudogenes and their gene counterparts could help us reveal the evolutionary path of pseudogenes and associate pseudogenes with functional potentials. It also provides an insight into the regulatory networks involving pseudogenes with transcriptional and even translational activities. In this study, we develop a novel approach integrating graph analysis, sequence alignment and functional analysis to evaluate pseudogene-gene relationships, and apply it to human gene homologs and pseudogenes. We generated a comprehensive set of 445 pseudogene-gene (PGG) families from the original 3,281 gene families (13.56%). Of these, 438 (98.4% PGG, 13.3% total) were non-trivial (containing more than one pseudogene). Each PGG family contains multiple genes and pseudogenes with high sequence similarity. For each family, we generate a sequence alignment network and phylogenetic trees recapitulating the evolutionary paths. We find evidence supporting the evolution history of olfactory family (both genes and pseudogenes) in human, which also supports the validity of our analysis method. Next, we evaluate these networks in respect to the gene ontology from which we identify functions enriched in these pseudogene-gene families and infer functional impact of pseudogenes involved in the networks. This demonstrates the application of our PGG network database in the study of pseudogene function in disease context.
122
RANDOM WALKS ON MUTUAL MICRORNA-TARGET GENE INTERACTION NETWORK IMPROVE THE PREDICTION OF DISEASE-ASSOCIATED MICRORNAS
Duc-Hau Le1, Lieven Verbeke2, Le Hoang Son3, Dinh-Toi Chu4, Van-Huy Pham5
1Vinmec Research Institute of Stem Cell and Gene Technology, 458 Minh Khai, Hai Ba Trung, Hanoi, Vietnam; 2Department of Information Technology, Ghent University - imec, Ghent, Belgium; 3VNU University of Science, Vietnam National University, Hanoi, Vietnam; 4Faculty of Biology, Hanoi National University of Education, Hanoi, Vietnam; 5Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam
Duc-Hau, Le
Background: MicroRNAs (miRNAs) have been shown to play an important role in pathological initiation, progression and maintenance. Because identification in the laboratory of disease-related miRNAs is not straightforward, numerous network-based methods have been developed to predict novel miRNAs in silico. Homogeneous networks (in which every node is a miRNA) based on the targets shared between miRNAs have been widely used to predict their role in disease phenotypes. Although such homogeneous networks can predict potential disease-associated miRNAs, they do not consider the roles of the target genes of the miRNAs. Here, we introduce a novel method based on a heterogeneous network that not only considers miRNAs but also the corresponding target genes in the network model.
Results: Instead of constructing homogeneous miRNA networks, we built heterogeneous miRNA networks consisting of both miRNAs and their target genes, using databases of known miRNA-target gene interactions. In addition, as recent studies demonstrated reciprocal regulatory relations between miRNAs and their target genes, we considered these heterogeneous miRNA networks to be undirected, assuming mutual miRNA-target interactions. Next, we introduced a novel method (RWRMTN) operating on these mutual heterogeneous miRNA networks to rank candidate disease-related miRNAs using a random walk with restart (RWR) based algorithm. Using both known disease-associated miRNAs and their target genes as seed nodes, the method can identify additional miRNAs involved in the disease phenotype. Experiments indicated that RWRMTN outperformed two existing state-of-the-art methods: RWRMDA, a network-based method that also uses a RWR on homogeneous (rather than heterogeneous) miRNA networks, and RLSMDA, a machine learning-based method. Interestingly, we could relate this performance gain to the emergence of "disease modules" in the heterogeneous miRNA networks used as input for the algorithm. Moreover, we could demonstrate that RWRMTN is stable, performing well when using both experimentally validated and predicted miRNA-target gene interaction data for network construction. Finally, using RWRMTN, we identified 76 novel miRNAs associated with 23 disease phenotypes which were present in a recent database of known disease-miRNA associations.
Conclusions: Summarizing, using random walks on mutual miRNA-target networks improves the prediction of novel disease-associated miRNAs because of the existence of "disease modules" in these networks.
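A random walk with restart on an undirected miRNA-target network can be sketched in a few lines. This is a generic RWR under stated assumptions, not the RWRMTN implementation: the toy network, the node ordering, the restart probability of 0.5 and the function name are all hypothetical.

```python
import numpy as np


def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change drops below tol.

    adjacency: symmetric 0/1 matrix of an undirected network.
    seeds: indices of seed nodes, which share the restart mass equally.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    col_sums = adjacency.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # avoid division by zero for isolated nodes
    W = adjacency / col_sums           # column-normalised transition matrix
    p0 = np.zeros(adjacency.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy heterogeneous network: nodes 0-2 are miRNAs, nodes 3-4 are target genes.
# Edges (undirected): m0-g3, m1-g3, m1-g4, m2-g4.
A = np.zeros((5, 5))
for u, v in [(0, 3), (1, 3), (1, 4), (2, 4)]:
    A[u, v] = A[v, u] = 1.0

# Seeds: a known disease miRNA (node 0) and a known disease gene (node 3).
scores = random_walk_with_restart(A, seeds=[0, 3])
```

In this toy network the candidate miRNA at node 1, which shares a target gene with the seeds, receives a higher score than node 2, which is only reachable through a second gene. Ranking non-seed miRNAs by such scores is the general principle behind RWR-based prioritisation.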
123
TEXT MINING AND VISUALIZATION FOR PRECISION MEDICINE
POSTER PRESENTATIONS
124
MINING ELECTRONIC HEALTH RECORDS FOR PATIENT-CENTERED OUTCOMES TO GUIDE TREATMENT PATHWAY DECISIONS FOLLOWING PROSTATE CANCER DIAGNOSIS
Selen Bozkurt1, Jung In Park2, Daniel L. Rubin3, James D. Brooks4, Tina Hernandez-Boussard5
1Akdeniz University Faculty of Medicine Department of Biostatistics and Medical Informatics, Antalya, Turkey; 2Stanford University Department of Medicine (Biomedical Informatics); 3Stanford University Department of Radiology; 4Stanford University Department of Urology; 5Stanford University Department of Medicine (Biomedical Informatics)
Tina, Hernandez-Boussard
Electronic health records (EHRs) have potential for novel discovery of patient-centered outcomes that can be used to improve healthcare delivery. However, a significant amount of data stored in EHRs is hidden in clinical narratives as unstructured text. For prostate cancer patients, these clinical narratives contain a large amount of information. Previous work suggests that structured data regarding dysfunctions after treatment for prostate cancer are not consistently captured in the EHR and thus cannot be reliably extracted for clinical and research purposes. Therefore, in this preliminary study we propose a rule-based natural language processing pipeline to extract patient-centered outcomes related to the presence of urinary, bowel and erectile dysfunction following treatment of prostate cancer from the free text of the EHR notes. We developed a lexicon of terms related to urinary, bowel or erectile dysfunctions based on domain knowledge, prior experience in the field, and review of medical notes. A reference standard of 100 randomly selected documents for each outcome from inpatient admissions was annotated by a research nurse to identify all related concepts as: present, negated, historical, and discussed risk. We developed a rule-based natural language processing (NLP) pipeline which uses dictionary mapping combined with the ConText algorithm. We trained our NLP pipeline using 1,336 documents and tested it on 20 documents to determine agreement with the human reference standard; standard precision, recall and overall accuracy rates were used as metrics to quantify the automatic annotation performance. The precision, recall, and accuracy scores for the urinary incontinence annotations against the reference standard output created by a domain expert were 62.5%, 100% and 76.9%, respectively. For most of the misclassified cases, which were annotated as presence of urinary incontinence by the NLP algorithm but not by the expert, medication information included in the term dictionary caused ambiguity regarding phenotype classification. For the erectile dysfunction annotations, precision was 100%, recall was 75% and overall accuracy was 90%. On the other hand, since no bowel dysfunction was reported in the randomly selected test set, evaluation metrics were not calculated. In this preliminary study, we have shown that it is possible to identify the patient-centered outcomes from the free text of EHRs using natural language processing. Using EHRs to assess patient-centered outcomes promotes population-based assessments of these valued yet difficult to assess outcomes and will enable detailed sensitivity and subgroup analysis. Such results will allow clinicians to individualize care for their patients. The results will also provide desperately needed evidence-based criteria for patient-centered outcomes. These criteria can be used in research studies, in clinical practice, and to develop practice guidelines. Future work will create a larger number of well-annotated datasets and combine our rule-based approach with machine learning techniques.
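The reported metrics follow the usual confusion-matrix definitions, sketched below. The abstract does not report the underlying counts, so the true/false positive and negative counts used here are hypothetical, chosen only because they reproduce the stated percentages.

```python
def precision_recall_accuracy(tp, fp, fn, tn):
    """Standard confusion-matrix metrics; undefined ratios are returned as NaN."""
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy


# Hypothetical counts (not from the abstract) that match the reported figures.
urinary = precision_recall_accuracy(tp=5, fp=3, fn=0, tn=5)
erectile = precision_recall_accuracy(tp=3, fp=0, fn=1, tn=6)

# With no positive cases in the test set, precision and recall are undefined.
no_positives = precision_recall_accuracy(tp=0, fp=0, fn=0, tn=20)
```

Under these assumed counts the urinary result shows a system that misses nothing but over-calls (three false positives), while the erectile result shows the opposite pattern. The last call illustrates why metrics cannot be computed for an outcome that never occurs in the test set.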
125
GDMINER: A BIO TEXT MINING SYSTEM FOR GENE-DISEASE RELATION ANALYSIS
Soo Jun Park1, Jihyun Kim2, Soo Young Cho2, Charny Park2, Young Seek Lee3
1Electronics and Telecommunications Research Institute, 2National Cancer Center, 3Hanyang University
Soo Jun, Park
Researchers in biology and medicine often visit PubMed to find literature for their studies. While the keyword search in PubMed may be a popular tool to retrieve information, it is limiting as it only provides a small number of results. The keyword search does not allow the user to sift through decades worth of research and extract all corresponding studies as needed. This poster presentation will provide solutions through a bio text mining system called GDMiner that identifies biological entities, extracts the relationships among those entities, and discovers associations between genes and diseases. When GDMiner collects abstracts from PubMed (PubMed collector), an automatic named entity recognizer sorts the information into 40 biological categories (Entity Recognizer). GDMiner then extracts relations from the biomedical categories (Relation Extractor) by using natural language processing techniques, like Part-of-Speech (POS) tagging and syntactic parsing. The display features graphs and tables showing the extracted relations. For example, a gene-disease association data query can be mined by analyzing the relations between genes and diseases. The system consists of the following three parts: PubMed collector, relation extractor and relation analyzer. The PubMed collector requests abstracts matching a query given by a user and fetches them. The relation extractor divides abstracts into sentences and recognizes biomedical named entities in sentences. Then, the relation analyzer extracts relational events among recognized entities. Relations are extracted by syntactic analysis, not by co-occurrence information. Our system parses sentences syntactically in the form of Penn Treebank syntactic tags and extracts relations by analyzing the parsing results. Our rules are simple and few because the syntactic tag set has fewer tags than the POS tag set, but they are not limited to particular relation types. The relation viewer accumulates extracted relations and visualizes them in graphs and tables. If the number of nodes in the generated relationship network is small, it is easy for the user to find the relationship between desired bio objects (named entities). However, if the generated network is very large, it is very difficult to find the relations. Our system helps the user find the relation between the desired bio objects by creating a small sub-network using the search and filtering function. There is a rapidly growing interest in properly utilizing biomedical literature within the research community, and the rate at which the biomedical literature is accumulating is accelerating worldwide. Not only preserving data but also the way in which researchers extract information is important in aiding future biological studies and discoveries. Implementing an automated system is necessary in keeping up with the growth and providing accuracy in finding analogous information to a researcher's search.
126
WORKSHOP
MACHINE LEARNING AND DEEP ANALYTICS FOR BIOCOMPUTING: CALL FOR BETTER EXPLAINABILITY
POSTER PRESENTATIONS
127
METHODS FOR EXAMINING DATA QUALITY IN HEALTHCARE INTEGRATED DATA REPOSITORIES
Vojtech Huser1, Michael G. Kahn2, Jeffrey S. Brown3, Ramkiran Gouripeddi4
1National Library of Medicine, National Institutes of Health, 8600 Rockville Pk, Bld 38a, Bethesda, MD, 20852, USA. Email: [email protected]; 2Department of Pediatrics, University of Colorado, 13001 East 17th Place, MS-F563, Aurora, CO 80045 USA. Email: [email protected]; 3Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, 401 Park Drive, Suite 401 East, Boston, MA 02215 USA. Email: [email protected] [email protected]; 4University of Utah, School of Medicine, Salt Lake City, 84102, Utah, USA. Email: [email protected]
Vojtech, Huser
This paper summarizes the content of the workshop focused on data quality. The first speaker (VH) described the data quality infrastructure and data quality evaluation methods currently in place within the Observational Health Data Sciences and Informatics (OHDSI) consortium. The speaker described in detail a data quality tool called Achilles Heel and the latest developments for extending this tool. Interim results of an ongoing Data Quality study within the OHDSI consortium were also presented. The second speaker (MK) described lessons learned and new data quality checks developed by the PEDSnet pediatric research network. The last two speakers (JB, RG) described tools developed by the Sentinel Initiative and University of Utah's service oriented framework. Throughout and at the end, the workshop discussed how data quality assessment can be advanced by combining the best features of each network.
128
MULTI-CLASS CLASSIFICATION STRATEGY FOR SUPPORT VECTOR MACHINES USING WEIGHTED VOTING AND VOTING DROP
Sungho Kim, Taehun Kim
Yeungnam University, DGIST
Sungho, Kim
Several strategies have been developed for Support Vector Machines (SVMs) to perform multi-class classification, such as One Versus One, One Versus All and Directed Acyclic Graph. These strategies do not reflect the distance between the input data and the hyper-plane that separates two classes. This is not reasonable when the input data is placed near the hyper-plane. The proposed Weighted Voting resolves this problem by weighting the voting values according to the distance from the boundary, and the proposed Voting Drop further enhances the performance of the SVMs. The proposed Weighted Voting is based on the voting method. The voting method is carried out by accumulating votes, then choosing the most voted class. The proposed Weighted Voting method weights each voting value by reflecting the distance from the boundary and the margin. The second proposed method, Voting Drop, concerns how votes are accumulated. The conventional voting method accumulates every vote, but this can be a problem because there are redundantly responding SVMs. Because the SVM is a binary classifier, each SVM learns only about two classes. Therefore, an SVM has no discernment for the non-learned classes. This is why, when an SVM predicts data belonging to a non-learned class, the SVM responds redundantly. Such an irrelevant SVM casts an incorrect vote that confuses the decision. To resolve this problem, the Voting Drop method drops the redundant votes by removing the irrelevant SVM. The algorithm finds the irrelevant SVM, then drops the votes caused by it. The way to find an irrelevant SVM is to find the least voted class, because the least voted class can be thought of as a class irrelevant to the input data. As shown in the experiments, reflecting the distance from the hyper-plane and the discernment of the hyper-plane, and removing the redundant SVMs' votes, leads to higher performance. The proposed methods can be used for a range of classification tasks.
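The difference between plain and distance-weighted voting over one-versus-one SVM outputs can be sketched as below. This is an illustrative reading of the weighted voting idea only, not the authors' formulation: the weight `min(|f(x)|, 1)` (distance capped at the margin), the decision values and the function names are assumptions, and the Voting Drop step is not modelled.

```python
def majority_vote(decisions, n_classes):
    """Each pairwise SVM casts one full vote for the class it predicts."""
    votes = [0.0] * n_classes
    for (i, j), value in decisions.items():
        winner = i if value > 0 else j
        votes[winner] += 1.0
    return votes


def weighted_vote(decisions, n_classes):
    """Each vote is scaled by the distance from the boundary, capped at the margin.

    Assumes decision values are scaled so that |f(x)| = 1 lies on the margin.
    """
    votes = [0.0] * n_classes
    for (i, j), value in decisions.items():
        winner = i if value > 0 else j
        votes[winner] += min(abs(value), 1.0)
    return votes


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical decision values f(x) of three pairwise SVMs for one input;
# a positive value favours the first class of the pair.
decisions = {(0, 1): 0.05, (0, 2): 0.10, (1, 2): 2.5}

plain_winner = argmax(majority_vote(decisions, 3))
weighted_winner = argmax(weighted_vote(decisions, 3))
```

In this assumed example, class 0 collects two votes from SVMs whose input lies almost on the boundary, so plain voting picks class 0, whereas weighted voting favours class 1, which is supported by a single confident decision. Whether that is the better answer depends on the data; the sketch only shows how near-boundary votes lose influence.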
129
A TOPOLOGY-BASED APPROACH TO QUANTIFY NETWORK PERTURBATION SCORES FOR ASSESSMENT OF DIFFERENT TOBACCO PRODUCT CLASSES
Quynh T. Tran1, Lee Larcombe2, Subhashini Arimilli3, G.L. Prasad1
1Reynolds American Inc. Services Company - Winston Salem NC - USA 27105; 2Applied Exomics Ltd - Stevenage UK SG1 2FX; 3Wake Forest Baptist Health - Winston Salem NC USA 27104
Quynh, Tran
Background: Chronic cigarette smoking is known to cause immune suppression, which in turn contributes to increased susceptibility to cancer. However, there is limited information on the effects of non-combustible tobacco products, such as moist snuff. To better understand the molecular changes that result from consumption of different tobacco products, global profiling techniques have been extensively utilized. A limitation of such approaches is that differential gene expression alone may be insufficient to identify both the source of perturbation and the extent to which perturbations propagate through a network of interacting genes. Systems biology tools support the analyses and integration of complex datasets, and provide a holistic view of the underlying biological changes. Hence, we implemented a network-based analysis tool to elucidate molecular changes that arise from the use of different tobacco products.
Methods: We developed an analytical approach to quantify and visualize gene-level perturbation scores of a pre-identified network. This approach differentiates biological effects of multiple treatments, using genome-scale expression data and considering interactome-wide effects. We utilized a microarray gene expression dataset of peripheral blood mononuclear cells treated with aqueous extracts of whole smoke conditioned medium (WS-CM) and smokeless tobacco extract (STE) prepared from 3R4F cigarettes and 2S3 moist snuff reference tobacco products, respectively, at baseline and after stimulation with toll-like receptor (TLR) agonists. The analytical pipeline takes normalized gene expression values and performs the following steps: 1) generates gene-level network scores using a weighted topology approach considering both the gene expression data and the full human interactome information available in IntAct (a literature curated molecular interaction database); 2) derives gene-level perturbation scores for each treatment condition compared to its baseline; and 3) calculates a single impact score for each exposure condition and creates a network graph to be visualized using Cytoscape.
Results: The pipeline was applied to calculate impact scores under each stimulation and each treatment condition for an inflammatory response network, signaling through triggering receptor expressed on myeloid cells 1 (TREM1). Samples stimulated with TLR agonists had higher scores or more perturbation compared to non-stimulated samples. Those exposed to higher WS-CM doses received higher scores compared to lower doses of WS-CM. Samples exposed to STE received a lower score, suggesting STE treatment perturbed the TREM1 network to a lower degree than WS-CM. On the other hand, the classical differential gene expression analysis did not identify significant changes in gene expression for STE treated samples stimulated with TLR agonists, compared to untreated cells.
Conclusions: In summary, this network scoring methodology suggests that, under these conditions, STE exerts less perturbation on select immune networks compared to combustible tobacco products. These scores potentially serve as tools to differentiate the biological effects resulting from different tobacco classes.
130
AUTHORINDEX
A
Abe,Sumiko·51Abyzov,Alexej·79Acharya,Ambika·5Achour,Ikbel·33Adhikari,AashishN.·113Adhikari,BhimM.·53Agrawal,Monica·8,70Aldana,Julian·76Al-Ghalith,Gabriel·75Alkan,Can·77,90Allette,Kimaada·2Alser,Mohammed·77,90Altman,RussB.·5,47,62,119Alvarellos,Maria·119Ambite,JoséLuis·51Amos,ChristopherI.·115Anderson,PaulE.·98Arimilli,Subhashini·129
B
Bada, Michael · 45
Bae, Min Gyun · 78, 96, 100
Bae, Taejeong · 79
Baheti, Saurabh · 116
Baladandayuthapani, Veerabhadran · 48
Barbarino, Julia · 119
Bartonicek, Nenad · 103
Baumgartner Jr., William A. · 37, 45
Beam, Andrew L. · 83
Beaulieu-Jones, Brett K. · 9
Bechheim, Matthias · 92
Behsaz, Bahar · 80
Berghout, Joanne · 25
Bharath, Karthik · 48
Bhattrai, Avnish · 51
Biegel, J. · 114
Bilke, Sven · 88
Blach, Colette · 18
Blangero, John · 53
Bobacz, Klaus · 95
Boguslav, Mayla · 37
Bowtell, David D. L. · 103
Bozkurt, Selen · 124
Bradford, Yuki · 58
Breitenstein, Matthew K. · 28, 118
Brenner, Steven E. · 113
Bright, Roselie A. · 5
Brooks, James D. · 124
Brown, Jeffrey S. · 127
Buccigrossi, Robert · 110
Buchan, Z. R. · 72
Buckley, J. · 114
Bull, Shelley B. · 104
Burns, Gully · 51
Bush, William S. · 57, 81, 102
Butkiewicz, Mariusz · 81
C
Cairns, Junmei · 28, 118
Cali, Damla S. · 90
Callahan, Tiffany J. · 45
Capra, John A. · 57, 102
Caraballo, Pedro J. · 28, 118
Carson III, William E. · 43
Cha, Hongui · 65
Chance, Mark R. · 97
Chen, Bin · 82
Chen, Flavia · 113
Chen, Michael L. · 83
Chen, Rong · 10, 42
Chen, Xiao · 88
Chen, Xintong · 2
Chen, Youdinghuan · 71
Chen, Yuying · 82
Cheng, Chao · 71
Cheng, Jun · 108
Cheng, Liang · 108
Chesi, Alessandra · 35
Cheung, Philip · 67
Chi, Chih-Lin · 26
Chidester, Benjamin · 20
Ching, Travers · 60
Cho, Soo Young · 125
Choe, Eun Kyung · 84
Choi, Jung Kyoon · 78, 96, 100
Christensen, Brock C. · 71
Chu, Dinh-Toi · 122
Chuang, Han-Yu · 88
Clark, Neil R. · 3, 64
Cohen, K. Bretonnel · 37
Cooper, Bruce A. · 93
Cox, Robert W. · 53
Crawford, Dana C. · 57
Crowley, Albert · 110
Currier, Robert J. · 113
D
de Belle, J. Steven · 67
De, Subhajyoti · 109
Deng, Siyuan · 43
Dinger, Marcel E. · 103
Do, Minh N. · 20
Doherty, Jennifer A. · 11
Dong, Zhuxin · 94
Dorrestein, Pieter C. · 80
Duan, Qiaonan · 3, 64
Dudley, Joel T. · 3, 10, 12, 42, 64
E
Eckel-Passow, Jeanette E. · 116
Ergin, Oğuz · 77, 90
Erikson, Galina A. · 85
F
Farhat, Maha · 83
Feng, Qianjin · 108
Fenger, Douglas · 67
Fieremans, Els · 53
Fierro, Lily · 51
Fish, Alexandra E. · 57
Fisher, Mark F. · 80
Flotte, T. J. · 72
Flotte, W. · 72
Foster, Ian · 33
G
Gai, X. · 114
Gallagher, Renata C. · 113
Garmire, Lana X. · 60
Geiersbach, K. B. · 72
Gerstein, Mark · 86
Ghose, Saugata · 90
Glahn, David C. · 53
Glicksberg, Benjamin S. · 10, 12, 42, 82
Gong, Li · 119
Gordon, Jonathan · 51
Gouripeddi, Ramkiran · 127
Grant, Gregory · 99
Grant, Struan F. A. · 35
Greene, Casey S. · 6, 11, 68
Greenside, Peyton · 41
Griffith, Malachi · 16
Griffith, Obi L. · 16
Gui, Jiang · 115
Guo, Caiwei · 46
Gupta, Anika · 27
Gurevich, Alexey · 80
Gursoy, Gamze · 86
H
Haas, David W. · 58
Haines, Jonathan L. · 81
Hall, Molly A. · 35
Han, Jiali · 33
Han, Jiawei · 39
Han, Lichy · 54
Han, Zhi · 108
Harrington, Lia X. · 11
Hart, Steven N. · 72, 116
Hartman, Nicholas · 31
Haselgrove, Christian · 110
Hassan, Hasan · 77, 90
He, Lu · 26
He, Mingze · 73
Hernandez-Boussard, Tina · 124
Hiemenz, M. · 114
Hillenmeyer, Maureen · 41
Hobbs, Brian P. · 48
Hodos, Rachel · 3, 12, 64
Hong, L. Elliot · 53
Hornung, Veit · 92
Horton, Iain F. · 116
Houten, Sander · 2
Hu, Jianying · 3, 64
Hu, Xiao · 93
Huang, Chenglong · 21
Huang, Edward W. · 38, 39
Huang, Heng · 22, 107
Huang, Kun · 34, 55, 108, 111, 121
Huang, Ling · 85
Huddart, Rachel · 119
Hudson, Tia Tate · 87
Huff, Chad D. · 105
Hunter, Lawrence E. · 37, 45
Huo, Zhouyuan · 22
Huser, Vojtech · 127
Hwang, Su-Kyeong · 117
I
Ideker, Trey · 39
J
Jahanshad, Neda · 53
Jain, Priyambada · 51
Jenkins, Nicole P. · 71
Jeong, Hyun-Hwan · 46
Ji, Xuemei · 115
Johnson, Abigail · 75
Johnson, Kipp W. · 10, 12
Johnson, Travis S. · 34, 121
Ju, Jin Hyun · 88
Jung, Jae-Yoon · 27
K
Kaang, Bong-Kiun · 117
Kahn, Michael G. · 127
Kamdar, Jeana · 51
Kamdar, Maulik R. · 54
Kang, Joon Ho · 65
Kawakubo, Hideko · 89, 101
Kennedy, David · 110
Kennedy, Eamonn · 94
Kettenbach, Arminja N. · 71
Kho, Johnathan R. · 34, 121
Kidd, Brian · 3, 64
Kim, Dokyoon · 23
Kim, Eun Ji · 99
Kim, Jeremie S. · 90
Kim, Jihyun · 125
Kim, Jinho · 65
Kim, Suh-Ryung · 104
Kim, Sun Ah · 104
Kim, Sungho · 91, 128
Kim, Taehun · 91, 128
Kim-Hellmuth, Sarah · 92
Klein, Teri E. · 119
Knights, Dan · 75
Kober, Kord M. · 93
Kochunov, Peter · 53
Koenig, Barbara A. · 113
Kohane, Isaac S. · 83
Kolmogorov, Mikhail · 94
Krunic, Milica · 95
Kulkarni, Anagha · 62
Kulkarni, Chaitanya · 55, 111
Kulkarni, Shashikant · 16
Kundaje, Anshul · 41
Kvale, Mark · 113
Kwok, Pui · 113
L
La Cava, William · 13
Lahens, Nicholas · 99
Lake, Bethany · 31
Lappalainen, Tuuli · 92
Larcombe, Lee · 129
Larson, Nicholas B. · 116
Lawrence-Dill, Carolyn J. · 73
Le, Duc-Hau · 122
Lee, Boram · 65
Lee, Donghyuk · 90
Lee, Hao-Chih · 3, 64
Lee, Jae-Hyung · 117
Lee, Jin-A · 117
Lee, Jun Hyeong · 78, 96, 100
Lee, Kyungmin · 117
Lee, Sang Woo · 84
Lee, Seunggeun · 23
Lee, Taeyeop · 78, 96, 100
Lee, Yong-Seok · 117
Lee, Young Seek · 125
Lei, Xiaoxiao · 51
Lerman, Kristina · 51
Leskovec, Jure · 8, 70
Li, Binglan · 58
Li, Fuhai · 43
Li, Haiquan · 33
Li, Jianrong · 25, 33
Li, Justin · 66
Li, Li · 10, 42
Li, Qike · 25, 30
Li, Sihong · 34, 121
Lim, Chae-Seok · 117
Liu, Gang · 66
Liu, Ke · 82
Liu, Zhandong · 46
Losic, Bojan · 2
Luo, Yunan · 4
Lussier, Yves A. · 25, 30, 33
M
Ma, Jian · 20
Ma, Jianzhu · 39
Ma’ayan, Avi · 3, 64
Machiraju, Raghu · 55, 111
Madhavan, Subha · 16
Maglinte, D. · 114
Mallick, Parag · 55, 111
Mallory, Emily K. · 5, 62
Manduchi, Elisabetta · 35
Mariani, Jessica · 79
Marotti, Jonathan D. · 71
Matsui, Yusuke · 89, 101
Maxwell, Sean · 97
McCoy, Matthew · 16
McDonnell, Shannon K. · 116
McGarvey, Peter · 16
Metaxas, Dimitris · 109
Miaskowski, Christine · 93
Micheel, Christine · 16
Miller, Jason E. · 23
Miller, Todd W. · 71
Miotto, Riccardo · 10
Mishkanian, Ben · 88
Mohammadi, Pejman · 92
Mohimani, Hosein · 80
Molina, Monica Cala · 76
Mooney, Sean D. · 113
Moore, Abigail E. · 98
Moore, Jason H. · 9, 13, 17, 28, 35, 118
Morishita, Hirofumi · 42
Mounajjed, T. · 72
Müller-Myhsok, Bertram · 92
Mustahsan, Zairah · 13
Mutlu, Onur · 77, 90
Mylne, Joshua S. · 80
N
Nam, Hyunha · 101
Nayak, Soumyashant · 99
Ng, Chaan S. · 48
Nho, Kwangsik · 23, 107
Nichols, Thomas E. · 53
Norgan, A. P. · 72
Novikov, Dmitry S. · 53
Nussbaum, Robert L. · 113
O
O’Driscoll, Caroline · 51
Oh, Jaeho · 78, 96, 100
Olson, Randal S. · 13, 17, 28, 118
Orlenko, Alena · 28, 118
Orzechowski, Patryk · 9, 28, 118
Ostrow, D. · 114
P
Park, Charny · 125
Park, Jung In · 124
Park, Soo Jun · 125
Park, Woong-Yang · 65
Paskov, Kelley M. · 27
Paul, Steven M. · 93
Paulson, Abby · 110
Payne, Philip R. O. · 43
Peng, Jian · 4, 39
Pesce, Lorenzo · 33
Petkovic, Dragutin · 47, 62
Pevzner, Pavel A. · 80, 94
Pham, Van-Huy · 122
Poole, Sarah · 29
Pouladi, Nima · 25
Prasad, G. L. · 66, 129
Preuss, Nina · 110
Previde, Paul · 62
Prjibelski, Andrey · 80
Puck, Jennifer M. · 113
Pütz, Benno · 92
Pyc, Mary A. · 67
Q
Qu, Hui · 109
R
Rachid Zaim, Samir · 30
Rao, Shruti · 16
Ravvaz, Kourosh · 26
Regan, Kelly · 43
Rensi, Stefano E. · 5
Reynolds, Richard C. · 53
Risacher, Shannon L. · 23, 107
Ritchie, Marylyn D. · 14, 58
Ritter, Deborah · 16
Robinson, Jill · 119
Roy, Angshumoy · 16
Rubin, Daniel L. · 124
Ryutov, A. · 114
S
Salas, Lucas A. · 71
Sangkuhi, Katrin · 119
Sarkar, Indra Neil · 50
Saykin, Andrew J. · 23, 107
Schissler, A. Grant · 30
Schmitt, Peter · 17
Schumacher, Johannes · 92
Sebra, Robert P. · 2
Shah, K. K. · 72
Shah, Nigam · 29
Shameer, Khader · 10, 12
Sharma, Vivekanand · 50
Shearer, Gregory · 31
Sheih, Joseph · 113
Shen, Dinggang · 22
Shen, Li · 107, 110
Shestov, Maksim · 17
Shimamura, Teppei · 89, 101
Shin, Hyun-Tae · 65
Shivakumar, Manu K. · 23
Shoemaker, Katherine · 48
Shokhirev, Maxim · 85
Shukla, Dinesh · 53
Shulman, Joshua M. · 46
Sinha, Aakanchha · 51
Sivley, R. Michael · 102
Smarr, Larry · 80
Smith, Milo R. · 42
Snedecor, June · 88
Son, Le Hoang · 122
Sonkin, Dmitriy · 16
Sontag, David · 3, 64
Srinivasan, Raj · 113
Srivastava, Arunima · 55, 111
Stefanski, Adrianne L. · 45
Stewart, Crystal · 51
Stockham, Nate T. · 27
Stolovitzky, Gustavo · 2
Sun, Min Woo · 27
Sunderam, Uma · 113
T
Tenenbaum, Jessica D. · 18
Thomas, Brook · 62
Thompson, Paul M. · 53
Thorn, Caroline · 119
Timp, Gregory · 94
Tintle, Nathan · 31
Tomasini, Livia · 79
Tonellato, Peter J. · 26
Torpy, James R. · 103
Tran, Quynh T. · 129
Travers, Matthew · 110
Triche, T. · 114
Tripodi, Ignacio · 45
Tully, Tim · 67
Turnbaugh, Peter J. · 5
U
Urban, Alexander E. · 79
V
Vaccarino, Flora M. · 79
Van Horn, John Darrell · 51
Vangay, Pajau · 75
Varik, Akshay · 13
Veraart, Jelle · 53
Verbeke, Lieven · 122
Verma, Anurag · 58
Verma, Shefali S. · 58
Veturi, Yogasudha C. · 14, 58
Vigil, Arthur · 47
von Haeseler, Arndt · 95
W
Wall, Dennis P. · 27
Wang, Fei · 3, 64
Wang, Liewei · 28, 118
Wang, Sheng · 4, 38, 39
Wang, Yaqiong · 113
Wang, Yue · 71
Wang, Ze · 107
Wang, Zichen · 3, 64
Watson, Dennis K. · 98
Watson, Patricia M. · 98
Way, Gregory P. · 6, 11, 68
Weinshilboum, Richard M. · 28, 118
Weissert, John · 26
Westra, Jason · 31
Whaley, Ryan M. · 119
Whirl-Carrillo, Michelle · 119
White, Elizabeth K. · 45
Williams-DeVane, ClarLynda · 87
Wilson, Robert C. · 98
Wong, Mike · 47, 62
Woon, Mark · 119
X
Xiao, Guanghua · 21
Xiao, Jinfeng · 4
Xin, Hongyi · 77, 90
Xu, Jielin · 43
Y
Yalamanchili, Hari Krishna · 46
Yang, Jung-eun · 117
Yao, Xiaohui · 107
Yoo, Yun Joo · 104
Yu, Michael Ku · 39
Yu, Yao · 105
Yun, Jae Won · 65
Z
Zeng, William · 82
Zhai, ChengXiang · 38
Zhang, Albert · 21
Zhang, Jie · 108
Zhang, Ping · 3, 64
Zhang, Yan · 34, 121
Zheng, Brandon · 98
Zheng, Fan · 39
Zhou, Bo · 79
Zitnik, Marinka · 8, 70
Zou, Yangyun · 113
Zuluaga, Martha · 76