PACIFIC SYMPOSIUM ON BIOCOMPUTING 2017
ABSTRACT BOOK
Poster Presenters: Poster space is assigned by abstract page number. Please find the page that your abstract is on and put your poster on the poster board with the corresponding number (e.g., if your abstract is on page 50, put your poster on board #50).
Proceedings papers with oral presentations (#2-39) are not assigned poster space.
Papers are organized first by session, then by the last name of the first author. Presenting authors' names are underlined.
TABLE OF CONTENTS

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
COMPUTATIONAL APPROACHES TO UNDERSTANDING THE EVOLUTION OF MOLECULAR FUNCTION ..... 1
IDENTIFICATION AND ANALYSIS OF BACTERIAL GENOMIC METABOLIC SIGNATURES ..... 2
Nathan Bowerman, Nathan Tintle, Matthew DeJongh, Aaron A. Best
WHEN SHOULD WE NOT TRANSFER FUNCTIONAL ANNOTATION BETWEEN SEQUENCE PARALOGS? ..... 3
Mengfei Cao, Lenore J. Cowen
PROSNET: INTEGRATING HOMOLOGY WITH MOLECULAR NETWORKS FOR PROTEIN FUNCTION PREDICTION ..... 4
Sheng Wang, Meng Qu, Jian Peng
ON THE POWER AND LIMITS OF SEQUENCE SIMILARITY BASED CLUSTERING OF PROTEINS INTO FAMILIES ..... 5
Christian Wiwie, Richard Röttger

IMAGING GENOMICS ..... 6
INTEGRATIVE ANALYSIS FOR LUNG ADENOCARCINOMA PREDICTS MORPHOLOGICAL FEATURES ASSOCIATED WITH GENETIC VARIATIONS ..... 7
Chao Wang, Hai Su, Lin Yang, Kun Huang
IDENTIFICATION OF DISCRIMINATIVE IMAGING PROTEOMICS ASSOCIATIONS IN ALZHEIMER'S DISEASE VIA A NOVEL SPARSE CORRELATION MODEL ..... 8
Jingwen Yan, Shannon L. Risacher, Kwangsik Nho, Andrew J. Saykin, Li Shen
ENFORCING CO-EXPRESSION IN MULTIMODAL REGRESSION FRAMEWORK ..... 9
Pascal Zille, Vince D. Calhoun, Yu-Ping Wang

METHODS TO ENSURE THE REPRODUCIBILITY OF BIOMEDICAL RESEARCH ..... 10
EXPLORING THE REPRODUCIBILITY OF PROBABILISTIC CAUSAL MOLECULAR NETWORK MODELS ..... 11
Ariella Cohain, Aparna A. Divaraniya, Kuixi Zhu, Joseph R. Scarpa, Andrew Kasarskis, Jun Zhu, Rui Chang, Joel T. Dudley, Eric E. Schadt
REPRODUCIBLE DRUG REPURPOSING: WHEN SIMILARITY DOES NOT SUFFICE ..... 12
Emre Guney
EMPOWERING MULTI-COHORT GENE EXPRESSION ANALYSIS TO INCREASE REPRODUCIBILITY ..... 13
Winston A. Haynes, Francesco Vallania, Charles Liu, Erika Bongen, Aurelie Tomczak, Marta Andres-Terrè, Shane Lofgren, Andrew Tam, Cole A. Deisseroth, Matthew D. Li, Timothy E. Sweeney, Purvesh Khatri
RABIX: AN OPEN-SOURCE WORKFLOW EXECUTOR SUPPORTING RECOMPUTABILITY AND INTEROPERABILITY OF WORKFLOW DESCRIPTIONS ..... 14
Gaurav Kaushik, Sinisa Ivkovic, Janko Simonovic, Nebojsa Tijanic, Brandi Davis-Dusenbery, Deniz Kural
DATA SHARING AND CLINICAL GENETIC TESTING: SUCCESSES AND CHALLENGES ..... 15
Shan Yang, Melissa Cline, Can Zhang, Benedict Paten, Stephen E. Lincoln
PATTERNS IN BIOMEDICAL DATA – HOW DO WE FIND THEM? ..... 16
LEARNING ATTRIBUTES OF DISEASE PROGRESSION FROM TRAJECTORIES OF SPARSE LAB VALUES ..... 17
Vibhu Agarwal, Nigam H. Shah
COMPUTER AIDED IMAGE SEGMENTATION AND CLASSIFICATION FOR VIABLE AND NON-VIABLE TUMOR IDENTIFICATION IN OSTEOSARCOMA ..... 18
Harish Babu Arunachalam, Rashika Mishra, Bogdan Armaselu, Ovidiu Daescu, Maria Martinez, Patrick Leavey, Dinesh Rakheja, Kevin Cederberg, Anita Sengupta, Molly Ni'Suilleabhain
MISSING DATA IMPUTATION IN THE ELECTRONIC HEALTH RECORD USING DEEPLY LEARNED AUTOENCODERS ..... 19
Brett K. Beaulieu-Jones, Jason H. Moore, The Pooled Resource Open-Access ALS Clinical Trials Consortium
DEVELOPMENT AND PERFORMANCE OF TEXT-MINING ALGORITHMS TO EXTRACT SOCIOECONOMIC STATUS FROM DE-IDENTIFIED ELECTRONIC HEALTH RECORDS ..... 20
Brittany M. Hollister, Nicole A. Restrepo, Eric Farber-Eger, Dana C. Crawford, Melinda C. Aldrich, Amy Non
DEMO DASHBOARD: VISUALIZING AND UNDERSTANDING GENOMIC SEQUENCES USING DEEP NEURAL NETWORKS ..... 21
Jack Lanchantin, Ritambhara Singh, Beilun Wang, Yanjun Qi
PREDICTIVE MODELING OF HOSPITAL READMISSION RATES USING ELECTRONIC MEDICAL RECORD-WIDE MACHINE LEARNING: A CASE-STUDY USING MOUNT SINAI HEART FAILURE COHORT ..... 22
Khader Shameer, Kipp W. Johnson, Alexandre Yahi, Riccardo Miotto, Li Li, Doran Ricks, Jebakumar Jebakaran, Patricia Kovatch, Partho P. Sengupta, Annetine Gelijns, Alan Moskovitz, Bruce Darrow, David L. Reich, Andrew Kasarskis, Nicholas P. Tatonetti, Sean Pinney, Joel T. Dudley
METHODS FOR CLUSTERING TIME SERIES DATA ACQUIRED FROM MOBILE HEALTH APPS ..... 23
Nicole Tignor, Pei Wang, Nicholas Genes, Linda Rogers, Steven G. Hershman, Erick R. Scott, Micol Zweig, Yu-Feng Yvonne Chan, Eric E. Schadt
A NEW RELEVANCE ESTIMATOR FOR THE COMPILATION AND VISUALIZATION OF DISEASE PATTERNS AND POTENTIAL DRUG TARGETS ..... 24
Modest von Korff, Tobias Fink, Thomas Sander
DISCOVERY OF FUNCTIONAL AND DISEASE PATHWAYS BY COMMUNITY DETECTION IN PROTEIN-PROTEIN INTERACTION NETWORKS ..... 25
Stephen J. Wilson, Angela D. Wilkins, Chih-Hsu Lin, Rhonald C. Lua, Olivier Lichtarge

PRECISION MEDICINE: FROM GENOTYPES AND MOLECULAR PHENOTYPES TOWARDS IMPROVED HEALTH AND THERAPIES ..... 26
OPENING THE DOOR TO THE LARGE SCALE USE OF CLINICAL LAB MEASURES FOR ASSOCIATION TESTING: EXPLORING DIFFERENT METHODS FOR DEFINING PHENOTYPES ..... 27
Christopher R. Bauer, Daniel Lavage, John Snyder, Joseph Leader, J. Matthew Mahoney, Sarah A. Pendergrass
TEMPORAL ORDER OF DISEASE PAIRS AFFECTS SUBSEQUENT DISEASE TRAJECTORIES: THE CASE OF DIABETES AND SLEEP APNEA ..... 28
Mette Beck, David Westergaard, Leif Groop, Soren Brunak
HUMAN KINASES DISPLAY MUTATIONAL HOTSPOTS AT COGNATE POSITIONS WITHIN CANCER ..... 29
Jonathan Gallion, Angela D. Wilkins, Olivier Lichtarge
MUSE: A MULTI-LOCUS SAMPLING-BASED EPISTASIS ALGORITHM FOR QUANTITATIVE GENETIC TRAIT PREDICTION ..... 30
Dan He, Laxmi Parida
DIFFERENTIAL PATHWAY DEPENDENCY DISCOVERY ASSOCIATED WITH DRUG RESPONSE ACROSS CANCER CELL LINES ..... 31
Gil Speyer, Divya Mahendra, Hai J. Tran, Jeff Kiefer, Stuart L. Schreiber, Paul A. Clemons, Harshil Dhruv, Michael Berens, Seungchan Kim
A METHYLATION-TO-EXPRESSION FEATURE MODEL FOR GENERATING ACCURATE PROGNOSTIC RISK SCORES AND IDENTIFYING DISEASE TARGETS IN CLEAR CELL KIDNEY CANCER ..... 32
Jeffrey A. Thompson, Carmen J. Marsit
DE NOVO MUTATIONS IN AUTISM IMPLICATE THE SYNAPTIC ELIMINATION NETWORK ..... 33
Guhan Ram Venkataraman, Chloe O'Connell, Fumiko Egawa, Dorna Kashef-Haghighi, Dennis Paul Wall
IDENTIFYING GENETIC ASSOCIATIONS WITH VARIABILITY IN METABOLIC HEALTH AND BLOOD COUNT LABORATORY VALUES: DIVING INTO THE QUANTITATIVE TRAITS BY LEVERAGING LONGITUDINAL DATA FROM AN EHR ..... 34
Shefali S. Verma, Anastasia M. Lucas, Daniel R. Lavage, Joseph B. Leader, Raghu Metpally, Sarathbabu Krishnamurthy, Frederick Dewey, Ingrid Borecki, Alexander Lopez, John Overton, John Penn, Jeffrey Reid, Sarah A. Pendergrass, Gerda Breitwieser, Marylyn D. Ritchie
STRATEGIES FOR EQUITABLE PHARMACOGENOMIC-GUIDED WARFARIN DOSING AMONG EUROPEAN AND AFRICAN AMERICAN INDIVIDUALS IN A CLINICAL POPULATION ..... 35
Laura Wiley, Jacob VanHouten, David Samuels, Melinda Aldrich, Dan Roden, Josh Peterson, Joshua Denny

SINGLE-CELL ANALYSIS AND MODELLING OF CELL POPULATION HETEROGENEITY ..... 36
PRODUCTION OF A PRELIMINARY QUALITY CONTROL PIPELINE FOR SINGLE NUCLEI RNA-SEQ AND ITS APPLICATION IN THE ANALYSIS OF CELL TYPE DIVERSITY OF POST-MORTEM HUMAN BRAIN NEOCORTEX ..... 37
Brian Aevermann, Jamison McCorrison, Pratap Venepally, Rebecca Hodge, Trygve Bakken, Jeremy Miller, Mark Novotny, Danny N. Tran, Francisco Diez-Fuertes, Lena Christiansen, Fan Zhang, Frank Steemers, Roger S. Lasken, Ed Lein, Nicholas Schork, Richard H. Scheuermann
TRACING CO-REGULATORY NETWORK DYNAMICS IN NOISY, SINGLE-CELL TRANSCRIPTOME TRAJECTORIES ..... 38
Pablo Cordero, Joshua M. Stuart
AN UPDATED DEBARCODING TOOL FOR MASS CYTOMETRY WITH CELL TYPE-SPECIFIC AND CELL SAMPLE-SPECIFIC STRINGENCY ADJUSTMENT ..... 39
Kristin I. Fread, William D. Strickland, Garry P. Nolan, Eli R. Zunder
PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS
IMAGING GENOMICS ..... 40
ADAPTIVE TESTING OF SNP-BRAIN FUNCTIONAL CONNECTIVITY ASSOCIATION VIA A MODULAR NETWORK ANALYSIS ..... 41
Chen Gao, Junghi Kim, Wei Pan
EXPLORING BRAIN TRANSCRIPTOMIC PATTERNS: A TOPOLOGICAL ANALYSIS USING SPATIAL EXPRESSION NETWORKS ..... 42
Zhana Kuncheva, Michelle L. Krishnan, Giovanni Montana

PATTERNS IN BIOMEDICAL DATA – HOW DO WE FIND THEM? ..... 43
A DEEP LEARNING APPROACH FOR CANCER DETECTION AND RELEVANT GENE IDENTIFICATION ..... 44
Padideh Danaee, Reza Ghaeini, David Hendrix
GENOME-WIDE INTERACTION WITH SELECTED TYPE 2 DIABETES LOCI REVEALS NOVEL LOCI FOR TYPE 2 DIABETES IN AFRICAN AMERICANS ..... 45
Jacob M. Keaton, Jacklyn N. Hellwege, Maggie C. Y. Ng, Nicholette D. Palmer, James S. Pankow, Myriam Fornage, James G. Wilson, Adolfo Correa, Laura J. Rasmussen-Torvik, Jerome I. Rotter, Yii-Der I. Chen, Kent D. Taylor, Stephen S. Rich, Lynne E. Wagenknecht, Barry I. Freedman, Donald W. Bowden
META-ANALYSIS OF CONTINUOUS PHENOTYPES IDENTIFIES A GENE SIGNATURE THAT CORRELATES WITH COPD DISEASE STATUS ..... 46
Madeleine Scott, Francesco Vallania, Purvesh Khatri
LEARNING PARSIMONIOUS ENSEMBLES FOR UNBALANCED COMPUTATIONAL GENOMICS PROBLEMS ..... 47
Ana Stanescu, Gaurav Pandey
NETWORK MAP OF ADVERSE HEALTH EFFECTS AMONG VICTIMS OF INTIMATE PARTNER VIOLENCE ..... 48
Kathleen Whiting, Larry Y. Liu, Mehmet Koyutürk, Gunnur Karakurt

PRECISION MEDICINE: FROM GENOTYPES AND MOLECULAR PHENOTYPES TOWARDS IMPROVED HEALTH AND THERAPIES ..... 49
A POWERFUL METHOD FOR INCLUDING GENOTYPE UNCERTAINTY IN TESTS OF HARDY-WEINBERG EQUILIBRIUM ..... 50
Andrew Beck, Alexander Luedtke, Keli Liu, Nathan Tintle
MICRORNA-AUGMENTED PATHWAYS (MIRAP) AND THEIR APPLICATIONS TO PATHWAY ANALYSIS AND DISEASE SUBTYPING ..... 51
Diana Diaz, Michele Donato, Tin Nguyen, Sorin Draghici
FREQUENT SUBGRAPH MINING OF PERSONALIZED SIGNALING PATHWAY NETWORKS GROUPS PATIENTS WITH FREQUENTLY DYSREGULATED DISEASE PATHWAYS AND PREDICTS PROGNOSIS ..... 52
Arda Durmaz, Tim A. D. Henderson, Douglas Brubaker, Gurkan Bebek
CERNA SEARCH METHOD IDENTIFIED A MET-ACTIVATED SUBGROUP AMONG EGFR DNA AMPLIFIED LUNG ADENOCARCINOMA PATIENTS ..... 53
Halla Kabat, Leo Tunkle, Inhan Lee
IMPROVED PERFORMANCE OF GENE SET ANALYSIS ON GENOME-WIDE TRANSCRIPTOMICS DATA WHEN USING GENE ACTIVITY STATE ESTIMATES ..... 54
Thomas Kamp, Micah Adams, Craig Disselkoen, Nathan Tintle
METHYLDMV: SIMULTANEOUS DETECTION OF DIFFERENTIAL DNA METHYLATION AND VARIABILITY WITH CONFOUNDER ADJUSTMENT ..... 55
Pei Fen Kuan, Junyan Song, Shuyao He
IDENTIFY CANCER DRIVER GENES THROUGH SHARED MENDELIAN DISEASE PATHOGENIC VARIANTS AND CANCER SOMATIC MUTATIONS ..... 56
Meng Ma, Changchang Wang, Benjamin Glicksberg, Eric E. Schadt, Shuyu Li, Rong Chen
IDENTIFYING CANCER SPECIFIC METABOLIC SIGNATURES USING CONSTRAINT-BASED MODELS ..... 57
André Schultz, Sanket Mehta, Chenyue W. Hu, Fieke W. Hoff, Terzah M. Horton, Steven M. Kornblau, Amina A. Qutub

SINGLE-CELL ANALYSIS AND MODELLING OF CELL POPULATION HETEROGENEITY ..... 58
MAPPING NEURONAL CELL TYPES USING INTEGRATIVE MULTI-SPECIES MODELING OF HUMAN AND MOUSE SINGLE CELL RNA SEQUENCING ..... 59
Travis Johnson, Zachary Abrams, Yan Zhang, Kun Huang
A SPATIOTEMPORAL MODEL TO SIMULATE CHEMOTHERAPY REGIMENS FOR HETEROGENEOUS BLADDER CANCER METASTASES TO THE LUNG ..... 60
Kimberly R. Kanigel Winner, James C. Costello
SCALABLE VISUALIZATION FOR HIGH-DIMENSIONAL SINGLE-CELL DATA ..... 61
Juho Kim, Nate Russell, Jian Peng

POSTER PRESENTATIONS
COMPUTATIONAL APPROACHES TO UNDERSTANDING THE EVOLUTION OF MOLECULAR FUNCTION ..... 62
CLUSTER-BASED GENOTYPE-ENVIRONMENT-PHENOTYPE CORRELATION ALGORITHM ..... 63
Ernesto Borrayo, Ryoko Machida-Hirano
QUANTITATING TRANSLATIONAL CONTROL: MRNA ABUNDANCE-DEPENDENT AND INDEPENDENT CONTRIBUTIONS ..... 64
Jingyi Jessica Li, Guo-Liang Chew, Mark D. Biggin
PROSNET: INTEGRATING HOMOLOGY WITH MOLECULAR NETWORKS FOR PROTEIN FUNCTION PREDICTION ..... 65
Sheng Wang, Meng Qu, Jian Peng

GENERAL ..... 66
IDENTIFICATION OF DIFFERENTIALLY PHOSPHORYLATED MODULES IN PROTEIN INTERACTION NETWORKS ..... 67
Marzieh Ayati, Danica Wiredja, Daniela Schlatzer, Goutham Narla, Mark Chance, Mehmet Koyutürk
CLUSTERING METHOD FOR PRIORITIZING BREAST CANCER RISK GENES AND MIRNAS ..... 68
Yongsheng Bai, Naureen Aslam, Ali Salman
FUSIONDB: ASSESSING MICROBIAL DIVERSITY AND ENVIRONMENTAL PREFERENCES VIA FUNCTIONAL SIMILARITY ..... 69
Chengsheng Zhu, Yannick Mahlich, Yana Bromberg
THE GEORGE M. O’BRIEN KIDNEY TRANSLATIONAL CORE CENTER AT THE UNIVERSITY OF MICHIGAN ..... 70
Frank C. Brosius, Wenjun Ju, Keith Bellovich, Zeenat Bhat, Crystal Gadegbeku, Debbie Gipson, Jennifer Hawkins, Julia Herzog, Susan Massengill, Richard C. McEachin, Subramaniam Pennathur, Kalyani Perumal, Roger Wiggins, Matthias Kretzler
MINING DIRECTIONAL DRUG INTERACTION EFFECTS ON MYOPATHY USING THE FAERS DATABASE ..... 71
Danai Chasioti, Xiaohui Yao, Pengyue Zhang, Xia Ning, Lang Li, Li Shen
DECIPHERING NEURONAL BROAD HISTONE H3K4ME3 DOMAINS ASSOCIATED WITH GENE-REGULATORY NETWORKS AND CONSERVED EPIGENOMIC LANDSCAPES IN THE HUMAN BRAIN ..... 72
Aslihan Dincer, Eric E. Schadt, Bin Zhang, Joel T. Dudley, Davin Gavin, Schahram Akbarian
NORMALIZATION TECHNIQUES AND MACHINE LEARNING CLASSIFICATION FOR ASSIGNING MOLECULAR SUBSETS IN AUTOIMMUNE DISEASE AND CANCER ..... 73
Jennifer M. Franks, Guoshuai Cai, Jaclyn N. Taroni, Michael L. Whitfield
MULTI-OMICS DATA INTEGRATION TO STRATIFY POPULATION IN HEPATOCELLULAR CARCINOMA ..... 74
Kumardeep Chaudhary, Olivier Poirion, Liangqun Lu, Lana Garmire
TOWARDS STANDARDS-BASED CLINICAL DATA WEB APPLICATION LEVERAGING SHINY R AND HL7 FHIR ..... 75
Na Hong, Naresh Prodduturi, Chen Wang, Guoqian Jiang
A DATA LAKE PLATFORM OF CONTEXTUAL BIOLOGICAL INFORMATION FOR AGILE TRANSLATIONAL RESEARCH ..... 76
Austin Huang, Dmitri Bichko, Mathieu Boespflug, Edsko de Vries, Facundo Dominguez, Daniel Ziemek
GENOME READ IN-MEMORY (GRIM) FILTER: FAST LOCATION FILTERING IN DNA READ MAPPING USING EMERGING MEMORY TECHNOLOGIES ..... 77
Jeremie Kim, Damla Senol, Hongyi Xin, Donghyuk Lee, Mohammed Alser, Hasan Hassan, Oguz Ergin, Can Alkan, Onur Mutlu
BCL-2 FAMILY MEMBERS AS REGULATORS OF RESPONSIVENESS TO BORTEZOMIB IN A MULTIPLE MYELOMA MODEL ..... 78
Melissa E. Ko, Charis Teh, Christopher S. Playter, Eli R. Zunder, Daniel H. Gray, Wendy J. Fantl, Sylvia K. Plevritis, Garry P. Nolan
BIOMEDICAL TEXT-MINING APPLICATIONS FOR THE SYSTEM DEEPDIVE ..... 79
Emily K. Mallory, Chris Re, Russ B. Altman
PROFILING ADAPTIVE IMMUNE REPERTOIRES ACROSS MULTIPLE HUMAN TISSUES BY RNA SEQUENCING ..... 80
Serghei Mangul, Igor Mandric, Harry Taegyun Yang, Dennis Montoya, Nicolas Strauli, Jeremy Rotman, Benjamin Statz, Will Van Der Wey, Alex Zelikovsky, Roberto Spreafico, Maura Rossetti, Sagiv Shifman, Mark Ansel, Noah Zaitlen, Eleazar Eskin
THE CMH VARIANT WAREHOUSE - A CATALOG OF GENETIC VARIATION IN PATIENTS OF A CHILDREN'S HOSPITAL ..... 81
Neil Miller, Greyson Twist, Byunggil Yoo, Andrea Gaedigk
MUTPRED2 AND ITS APPLICATION TO THE INFERENCE OF MOLECULAR SIGNATURES OF DISEASE ..... 82
Vikas Pejaver, Lilia M. Iakoucheva, Sean D. Mooney, Predrag Radivojac
HIV-TRACE: MONITORING THE HIV EPIDEMIC IN NEAR REAL TIME USING LARGE NATIONAL AND GLOBAL SCALE MOLECULAR EPIDEMIOLOGY ..... 83
Sergei Pond, Steven Weaver, Joel Wertheim, Andrew J. Leigh Brown
THE EXTREME MEMORY® CHALLENGE: A SEARCH FOR THE HERITABLE FOUNDATIONS OF EXCEPTIONAL MEMORY ..... 84
Mary A. Pyc, Emily Giron, Philip Cheung, Douglas Fenger, J. Steven de Belle, Tim Tully
RESCUE THE MISSING VARIANTS - LESSONS LEARNED FROM LARGE SEQUENCING PROJECTS ..... 85
Yingxue Ren, Joseph S. Reddy, Vivekananda Sarangi, Jason P. Sinnwell, Steve G. Younkin, Nilüfer Ertekin-Taner, Owen A. Ross, Rosa Rademakers, Shannon K. McDonnell, Joanna M. Biernacka, Yan W. Asmann
TOWARD EFFECTIVE MICRORNA QUANTIFICATION FROM SMALL RNA-SEQ ..... 86
Pamela Russell, Richard Radcliffe, Brian Vestal, Wen Shi, Pratyaydipta Rudra, Laura Saba, Katerina Kechris
NANOPORE SEQUENCING TECHNOLOGY AND TOOLS: COMPUTATIONAL ANALYSIS OF THE CURRENT STATE, BOTTLENECKS AND FUTURE DIRECTIONS ..... 87
Damla Senol, Jeremie Kim, Saugata Ghose, Can Alkan, Onur Mutlu
DETECTING OUTLIERS FROM MULTIDIMENSIONAL DATA WITH APPLICATION IN CANCER ..... 88
Kyle Smith, Subhajyoti De, Debashis Ghosh
HUEMR: INTUITIVE MINING OF ELECTRONIC MEDICAL RECORDS ..... 89
Abiodun Otolorin, Nana Osafo, William Southerland
DECIPHERING LUNG ADENOCARCINOMA MORPHOLOGY AND PROGNOSIS BY INTEGRATING OMICS AND HISTOPATHOLOGY ..... 90
Kun-Hsing Yu, Gerald J. Berry, Daniel L. Rubin, Christopher Ré, Russ B. Altman, Michael Snyder
EXPLORING DEEP LEARNING FOR COPY NUMBER VARIATION DETECTION WITH NGS DATA ..... 91
Yao-zhong Zhang, Rui Yamaguchi, Seiya Imoto, Satoru Miyano

IMAGING GENOMICS ..... 92
PERIPHERAL EPIGENETIC ASSOCIATIONS WITH BRAIN GRAY MATTER IN SCHIZOPHRENIA ..... 93
Dongdong Lin, Vince D. Calhoun, Juan R. Bustillo, Nora Perrone-Bizzozero, Jingyu Liu
THE INTERPLAY BETWEEN OLIGO-TARGET SPECIFIC AND GENOME-WIDE OFF-TARGET INTERACTIONS ..... 94
Olga V. Matveeva, Nafisa N. Nazipova, Aleksey Y. Ogurtsov, Svetlana A. Shabalina

PATTERNS IN BIOMEDICAL DATA – HOW DO WE FIND THEM? ..... 95
WARS2 IMPLICATED AS A COMMON MODIFIER OF METFORMIN METABOLITE BIOMARKERS IN A BIOBANK COHORT ..... 96
Alyssa I. Clay, Richard M. Weinshilboum, K. Sreekumaran Nair, Rima F. Kaddurah-Daouk, Liewei Wang, Matthew K. Breitenstein
ESTIMATION OF FALSE NEGATIVE RATES VIA EMBEDDING SIMULATED EVENTS ..... 97
Stephen V. Gliske, Katy L. Lau, Benjamin H. Brinkman, Greg A. Worrell, Cris G. Fink, William C. Stacey
INTEGRATIVE, INTERPRETABLE DEEP LEARNING FRAMEWORKS FOR REGULATORY GENOMICS AND EPIGENOMICS ..... 98
Chuan Sheng Foo, Avanti Shrikumar, Johnny Israeli, Peyton Greenside, Chris Probert, Anna Scherbina, Rahul Mohan, Nathan Boley, Anshul Kundaje
VISUALIZATION OF COMPLEX DISEASES AND RELATED GENE SETS ..... 99
Modest von Korff, Tobias Fink, Thomas Sander
PRECISION MEDICINE: FROM GENOTYPES AND MOLECULAR PHENOTYPES TOWARDS IMPROVED HEALTH AND THERAPIES ..... 100
FINDINGS FROM THE FOURTH CRITICAL ASSESSMENT OF GENOME INTERPRETATION, A COMMUNITY EXPERIMENT TO EVALUATE PHENOTYPE PREDICTION ..... 101
Steven E. Brenner, Gaia Andreoletti, Roger A Hoskins, John Moult, CAGI Participants
ASTROLABE: EXPANSION TO CYP2C9 AND CYP2C1 ..... 102
Andrea Gaedigk, Greyson P. Twist, Sarah Soden, Emily G. Farrow, Neil A. Miller
HUMAN KINASES DISPLAY MUTATIONAL HOTSPOTS AT COGNATE POSITIONS WITHIN CANCER ..... 103
Jonathan Gallion, Angela D. Wilkins, Olivier Lichtarge
SCOTCH: A NOVEL METHOD TO DETECT INSERTIONS AND DELETIONS FROM NGS DATA ..... 104
Rachel Goldfeder, Euan Ashley
MAYO OMICS REPOSITORY FOR TRANSLATIONAL MEDICINE ..... 105
Iain Horton, Jeanette Eckel-Passow, Steven Hart, Shannon McDonnell, David Mead, Gay Reed, Greg Dougherty, Jason Ross, Julie Swank, Mark Myers, Mathieu Wiepert, Rama Volety, Tony Stai, Yaxiong Lin, Robert Freimuth
PHARMACOGENOMICS CLINICAL ANNOTATION TOOL (PHARMCAT) ..... 106
T. E. Klein, M. Whirl-Carrillo, R. M. Whaley, M. Woon, K. Sangkuhl, Lester G. Carter, H. M. Dunnenberger, P. E. Empey, A. T. Frase, R. R. Freimuth, A. Gaedigk, A. Gordon, C. Haidar, J. K. Hicks, J. M. Hoffman, M. T. Lee, N. Miller, S. D. Mooney, T. N. Person, J. F. Peterson, M. V. Relling, S. A. Scott, G. Twist, A. Verma, M. S. Williams, C. Wu, W. Yang, M. D. Ritchie
PCSK9 MODULATING VARIANTS IN FAMILIAL HYPERCHOLESTEROLEMIA ..... 107
Sarathbabu Krishnamurthy, Diane Smelser, Manickam Kandamurugu, Joseph Leader, Noura S. Abul-Husn, Alan R. Shuldiner, David H. Ledbetter, Frederick E. Dewey, David J. Carey, Michael F. Murray, Raghu P. R. Metpally
INTEGRATIVE NETWORK ANALYSIS OF PROSTATE TISSUE LINCRNA-MRNA EXPRESSION PROFILES REVEALS POTENTIAL REGULATORY MECHANISMS OF PROSTATE CANCER RISK LOCI ..... 108
Nicholas B. Larson, Shannon McDonnell, Zach Fogarty, Melissa Larson, John Cheville, Shaun Riska, Saurabh Baheti, Asha A. Nair, Daniel O’Brien, Jaime Davila, Daniel Schaid, Stephen N. Thibodeau
INTEGRATED ANALYSIS OF GENOMICS, PROTEOMICS, AND PHOSPHOPROTEOMICS IN CELLS AND TUMOR SAMPLES ..... 109
Jason E. McDermott, Tao Liu, Samuel Payne, Vladislav Petyuk, Richard Smith, Philipp Mertins, Steven Carr, Karin Rodland
NETDX: PATIENT CLASSIFICATION USING INTEGRATED PATIENT SIMILARITY NETWORKS ..... 110
Shraddha Pai, Shirley Hui, Ruth Isserlin, Hussam Kaka, Gary D. Bader
PREVALENCE AND DETECTION OF LOW-ALLELE-FRACTION VARIANTS IN CLINICAL CANCER SAMPLES ..... 111
Hyun-Tae Shin, Jae Won Yun, Nayoung K. D. Kim, Yoon-La Choi, Woong-Yang Park, Peter J. Park
A METHYLATION-TO-EXPRESSION FEATURE MODEL FOR GENERATING ACCURATE PROGNOSTIC RISK SCORES AND IDENTIFYING DISEASE TARGETS ..... 112
Jeffrey A. Thompson, Carmen J. Marsit
CYP2D6 DIPLOTYPE CALLING FROM WGS USING ASTROLABE: UPDATE ..... 113
Andrea Gaedigk, Greyson P. Twist, Sarah Soden, Emily G. Farrow, Neil A. Miller
INTEGRATION, INTERPRETATION AND DISPLAY OF MULTI-OMIC DATA FOR PRECISION MEDICINE ..... 114
David S. Wishart, Ana Marcu, An Chi Guo, Ash Anwar, Solveig Johannessen, Craig Knox, Michael Wilson, Christoph H. Borchers, Pieter Cullis, Robert Fraser
BIOTHINGS APIS: LINKED HIGH-PERFORMANCE APIS FOR BIOLOGICAL ENTITIES ..... 115
Jiwen Xin, Cyrus Afrasiabi, Sebastien Lelong, Ginger Tsueng, Sean D. Mooney, Andrew I. Su, Chunlei Wu

SINGLE-CELL ANALYSIS AND MODELLING OF CELL POPULATION HETEROGENEITY ..... 116
SINGLE CELL SIGNALING STATES REVEAL INDUCTION OF NON-GENETIC VARIATION IN RESISTANCE TO TRAIL-INDUCED APOPTOSIS ..... 117
Reema Baskar, Harris Fienberg, Garry Nolan, Sean Bendall
A NOVEL K-NEAREST NEIGHBORS APPROACH TO COMPARE MULTIPLE BIOLOGICAL CONDITIONS IN SINGLE CELL DATA ..... 118
Tyler J. Burns, Garry P. Nolan, Nikolay Samusik
SINGLE-CELL RNA SEQUENCING IN PRIMARY GLIOBLASTOMA: IMPROVING ANALYSIS OF HETEROGENEOUS SAMPLES BY INCORPORATING QUANTIFICATION OF UNCERTAINTY ..... 119
Wendy Marie Ingram, Debdipto Misra, Nicholas F. Marko, Marylyn Ritchie
REGISTRATION OF FLOW CYTOMETRY DATA USING SWIFT CLUSTER TEMPLATES TO REMOVE CHANNEL-SPECIFIC OR CLUSTER-SPECIFIC VARIATION ..... 120
Jonathan A. Rebhahn, Sally A. Quataert, Gaurav Sharma, Tim R. Mosmann

WORKSHOP: NO BOUNDARY THINKING IN BIOINFORMATICS ..... 121
ENABLING RICHER DATA INTEGRATION FOR GENOMIC EPIDEMIOLOGY ..... 122
E. Griffiths, D. Dooley, C. Bertelli, J. Adam, F. Bristow, T. Matthews, A. Petkau, M. Courtot, J. A. Carriço, A. Keddy, R. Beiko, L. M. Schriml, E. Taboada, M. Graham, G. Van Domselaar, W. Hsiao, F. Brinkman

AUTHOR INDEX ..... 123
1
COMPUTATIONAL APPROACHES TO UNDERSTANDING THE EVOLUTION OF MOLECULAR FUNCTION
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
2
IDENTIFICATION AND ANALYSIS OF BACTERIAL GENOMIC METABOLIC SIGNATURES
Nathan Bowerman¹, Nathan Tintle², Matthew DeJongh³, Aaron A. Best¹
¹Department of Biology, Hope College; ²Department of Mathematics and Statistics, Dordt College; ³Department of Computer Science, Hope College
Aaron Best
With continued rapid growth in the number and quality of fully sequenced and accurately annotated bacterial genomes, we have unprecedented opportunities to understand metabolic diversity. We selected 101 diverse and representative completely sequenced bacteria and carried out a manual curation effort to identify 846 unique metabolic variants present in these bacteria. The presence or absence of these variants acts as a metabolic signature for each bacterium, which can then be used to understand similarities and differences between and across bacterial groups. We propose a novel and robust method of summarizing metabolic diversity using metabolic signatures and use this method to generate a metabolic tree that clusters metabolically similar organisms. Analysis of the resulting metabolic tree confirms strong associations with well-established biological results and gives direct insight into the particular metabolic variants that are most predictive of metabolic diversity. The positive results of this manual curation effort and method development suggest that future work should expand the set of bacteria to which this approach is applied and use the resulting tree to test broad questions about metabolic diversity and complexity across the bacterial tree of life.
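The abstract does not state how metabolic signatures are turned into a tree. As one hedged illustration of the general idea, the sketch below treats each organism's signature as the set of variants it carries, scores pairs of organisms with Jaccard distance, and builds a tree by average-linkage (UPGMA-style) agglomerative clustering. Both the distance and the linkage are assumptions swapped in for illustration, not the authors' method, and the organism and variant names are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B|; two empty signatures count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(c1: list[str], c2: list[str], signatures: dict[str, set[str]]) -> float:
    """Mean pairwise Jaccard distance between the members of two clusters."""
    total = sum(jaccard_distance(signatures[x], signatures[y]) for x in c1 for y in c2)
    return total / (len(c1) * len(c2))


def build_tree(signatures: dict[str, set[str]]):
    """Agglomerative (UPGMA-style) clustering; returns the tree as nested tuples."""
    # Each cluster is a pair: (subtree, list of member organisms).
    clusters = [(name, [name]) for name in signatures]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = average_linkage(clusters[i][1], clusters[j][1], signatures)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Hypothetical organisms and variant IDs, for illustration only.
signatures = {
    "org_A": {"v1", "v2", "v3"},
    "org_B": {"v1", "v2", "v3", "v4"},
    "org_C": {"v5", "v6"},
    "org_D": {"v5", "v6", "v7"},
}
tree = build_tree(signatures)
print(tree)
```

Average linkage is used here only because it is a common default for presence/absence profiles; other distances or linkage rules can produce different trees from the same signatures.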
3
WHEN SHOULD WE NOT TRANSFER FUNCTIONAL ANNOTATION BETWEEN SEQUENCE PARALOGS?
Mengfei Cao, Lenore J. Cowen
Tufts University
Lenore Cowen
Current automated computational methods to assign functional labels to unstudied genes often involve transferring annotation from orthologous or paralogous genes; however, such genes can evolve divergent functions, making such transfer inappropriate. We consider the problem of determining when it is correct to make such an assignment between paralogs. We construct a benchmark dataset of two types of similar paralogous gene pairs in the well-studied model organism S. cerevisiae: one set of pairs where single deletion mutants have very similar phenotypes (implying similar functions), and another set of pairs where single deletion mutants have very divergent phenotypes (implying different functions). State-of-the-art methods for this problem determine the evolutionary history of the paralogs with reference to multiple related species. Here, we ask a first and simpler question: to what extent can any computational method with access only to data from a single species solve this problem? We consider divergence data (at both the amino acid and nucleotide levels) and network data (based on the yeast protein-protein interaction network, as captured in BioGRID), and ask whether we can extract features from these data that distinguish between these sets of paralogous gene pairs. We find that the best features come from measures of sequence divergence; however, simple network measures based on degree, centrality, shortest path, diffusion state distance (DSD), or shared neighborhood in the yeast protein-protein interaction (PPI) network also contain some signal. In general, one should not transfer function if sequence divergence is too high. Further improvements in classification will need to come from more computationally expensive but much more powerful evolutionary methods that incorporate ancestral states and measure evolutionary divergence over multiple species based on evolutionary trees.
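Diffusion state distance (DSD), one of the network features mentioned in this abstract, compares two proteins by where short random walks starting from them tend to go. A minimal sketch, assuming the formulation in which each node's vector holds its expected number of visits to every node over walk lengths 0 to k and DSD is the L1 distance between two such vectors; the value of k, the function names, and the toy graph are illustrative assumptions. In a paralog study, a value like this would be computed for each gene pair and used as one classification feature.

```python
def he_vector(graph: dict[str, list[str]], source: str, k: int) -> dict[str, float]:
    """Expected visits to each node over random-walk steps t = 0..k starting at `source`.

    Assumes every node has at least one neighbor (no isolated nodes).
    """
    current = {source: 1.0}  # walk distribution at the current step
    totals = {source: 1.0}   # running sum of the distributions for t = 0..k
    for _ in range(k):
        nxt: dict[str, float] = {}
        for node, p in current.items():
            share = p / len(graph[node])  # uniform move to each neighbor
            for nb in graph[node]:
                nxt[nb] = nxt.get(nb, 0.0) + share
        current = nxt
        for node, p in current.items():
            totals[node] = totals.get(node, 0.0) + p
    return totals


def dsd(graph: dict[str, list[str]], u: str, v: str, k: int = 5) -> float:
    """L1 distance between the k-step expected-visit vectors of u and v."""
    hu, hv = he_vector(graph, u, k), he_vector(graph, v, k)
    return sum(abs(hu.get(n, 0.0) - hv.get(n, 0.0)) for n in set(hu) | set(hv))


# Toy path graph a - b - c (hypothetical protein names).
ppi = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(dsd(ppi, "a", "b", k=1), dsd(ppi, "a", "c", k=1))
```

With k = 1, neighbors a and b end up closer (DSD 1.0) than the two endpoints a and c (DSD 2.0), even though a and c share their only neighbor.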
4
PROSNET: INTEGRATING HOMOLOGY WITH MOLECULAR NETWORKS FOR PROTEIN FUNCTION PREDICTION
Sheng Wang, Meng Qu, Jian Peng
University of Illinois Urbana-Champaign
Sheng Wang
Automated annotation of protein function has become a critical task in the post-genomic era. Network-based approaches and homology-based approaches have been widely used and recently tested in large-scale community-wide assessment experiments. It is natural to integrate network data with homology information to further improve the predictive performance. However, integrating these two heterogeneous, high-dimensional and noisy datasets is non-trivial. In this work, we introduce a novel protein function prediction algorithm, ProSNet. An integrated heterogeneous network is first built to include molecular networks of multiple species and link together homologous proteins across multiple species. Based on this integrated network, a dimensionality reduction algorithm is introduced to obtain compact low-dimensional vectors to encode proteins in the network. Finally, we develop machine learning classification algorithms that take the vectors as input and make predictions by transferring annotations both within each species and across different species. Extensive experiments on five major species demonstrate that our integration of homology with molecular networks substantially improves the predictive performance over existing approaches.
5
ON THE POWER AND LIMITS OF SEQUENCE SIMILARITY BASED CLUSTERING OF PROTEINS INTO FAMILIES
Christian Wiwie, Richard Röttger
University of Southern Denmark
Richard Röttger
Over the last decades, we have observed an ongoing tremendous growth of available sequencing data fueled by the advancements in wet-lab technology. The sequencing information is only the beginning of the actual understanding of how organisms survive and prosper. It is, for instance, equally important to unravel the proteomic repertoire of an organism. A classical computational approach for detecting protein families is a sequence-based similarity calculation coupled with a subsequent cluster analysis. In this work we have intensively analyzed various clustering tools on a large scale. We used the data to investigate the behavior of the tools' parameters, underlining the diversity of the protein families. Furthermore, we trained regression models for predicting the expected performance of a clustering tool for an unknown data set and aimed to also suggest optimal parameters in an automated fashion. Our analysis demonstrates the benefits and limitations of the clustering of proteins with low sequence similarity, indicating that each protein family requires its own distinct set of tools and parameters. All results, a tool prediction service, and additional supporting material are also available online at http://proteinclustering.compbio.sdu.dk/
6
IMAGING GENOMICS
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
7
INTEGRATIVE ANALYSIS FOR LUNG ADENOCARCINOMA PREDICTS MORPHOLOGICAL FEATURES ASSOCIATED WITH GENETIC VARIATIONS
Chao Wang1, Hai Su2, Lin Yang2, Kun Huang1
1 The Ohio State University, 2 University of Florida
Kun Huang
Lung cancer is one of the deadliest cancers, and lung adenocarcinoma (LUAD) is the most common histological type of lung cancer. However, LUAD is highly heterogeneous due to genetic differences as well as phenotypic differences such as cellular and tissue morphology. In this paper, we systematically examine the relationships between histological features and gene transcription. Specifically, we calculated 283 morphological features from histology images for 201 LUAD patients from the TCGA project and identified the morphological feature with the strongest correlation with patient outcome. We then modeled this morphological feature with multiple co-expressed gene clusters using Lasso regression. Many of the gene clusters are highly associated with genetic variations, specifically DNA copy number variations, implying that genetic variations play important roles in the development of cancer morphology. As far as we know, our finding is the first to directly link genetic variations and functional genomics to LUAD histology. These observations will lead to new insight into lung cancer development and potential new integrative biomarkers for predicting patient prognosis and response to treatments.
8
IDENTIFICATION OF DISCRIMINATIVE IMAGING PROTEOMICS ASSOCIATIONS IN ALZHEIMER'S DISEASE VIA A NOVEL SPARSE CORRELATION MODEL
Jingwen Yan, Shannon L. Risacher, Kwangsik Nho, Andrew J. Saykin, Li Shen
Indiana University
Jingwen Yan
Brain imaging and protein expression, from both cerebrospinal fluid and blood plasma, have been found to provide complementary information in predicting the clinical outcomes of Alzheimer's disease (AD). However, the underlying associations that contribute to such a complementary relationship have not been studied previously. In this work, we perform an imaging proteomics association analysis to explore how these data types are related to each other. Traditional association models, such as Sparse Canonical Correlation Analysis (SCCA), cannot guarantee the selection of only disease-relevant biomarkers and associations. We therefore propose a novel discriminative SCCA (denoted DSCCA) model with new penalty terms that account for disease status information. Given brain imaging, proteomic and diagnostic data, the proposed model can perform a joint association and multi-class discrimination analysis, so that we can not only identify disease-relevant multimodal biomarkers but also reveal strong associations between them. On a real imaging proteomic data set, the empirical results show that DSCCA and traditional SCCA have comparable association performance. However, in a further classification analysis, the canonical variables of imaging and proteomic data obtained by DSCCA demonstrate much more discrimination power toward multiple pairs of diagnosis groups than those obtained by SCCA.
9
ENFORCING CO-EXPRESSION IN MULTIMODAL REGRESSION FRAMEWORK
Pascal Zille1, Vince D. Calhoun2, Yu-Ping Wang1
1 Tulane University, 2 University of New Mexico
Pascal Zille
We consider the problem of multimodal data integration for the study of complex neurological diseases (e.g. schizophrenia). Among the challenges arising in such situations, estimating the link between genetic and neurological variability within a population sample has been a promising direction. A wide variety of statistical models have arisen from such applications. For example, Lasso regression and its multitask extension are often used to fit a multivariate linear relationship between given phenotype(s) and associated observations. Other approaches, such as canonical correlation analysis (CCA), are widely used to extract relationships between sets of variables from different modalities. In this paper, we propose an exploratory multivariate method combining these two methods. More specifically, we rely on a 'CCA-type' formulation in order to regularize the classical multimodal Lasso regression problem. The underlying motivation is to extract discriminative variables that are also co-expressed across modalities. We first evaluate the method on a simulated dataset, and further validate it using Single Nucleotide Polymorphism (SNP) and functional Magnetic Resonance Imaging (fMRI) data for the study of schizophrenia.
10
METHODS TO ENSURE THE REPRODUCIBILITY OF BIOMEDICAL RESEARCH
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
11
EXPLORING THE REPRODUCIBILITY OF PROBABILISTIC CAUSAL MOLECULAR NETWORK MODELS
Ariella Cohain, Aparna A. Divaraniya, Kuixi Zhu, Joseph R. Scarpa, Andrew Kasarskis, Jun Zhu, Rui Chang, Joel T. Dudley, Eric E. Schadt
Icahn Institute and Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai
Ariella Cohain
Network reconstruction algorithms are increasingly being employed in biomedical and life sciences research to integrate large-scale, high-dimensional data informing on living systems. One particular class of probabilistic causal networks being applied to model the complexity and causal structure of biological data is Bayesian networks (BNs). BNs provide an elegant mathematical framework not only for inferring causal relationships among many different molecular and higher order phenotypes, but also for incorporating highly diverse priors that provide an efficient path for incorporating existing knowledge. While significant methodological developments have broadly enabled the application of BNs to generate and validate meaningful biological hypotheses, the reproducibility of BNs in this context has not been systematically explored. In this study, we aim to determine the criteria for generating reproducible BNs in the context of transcription-based regulatory networks. We utilize two unique tissues from independent datasets: whole blood from the GTEx Consortium and liver from the Stockholm-Tartu Atherosclerosis Reverse Network Engineering Team (STARNET) study. We evaluated the reproducibility of the BNs by creating networks on data subsampled at different levels from each cohort and comparing these networks to the BNs constructed using the complete data. To help validate our results, we used simulated networks at varying sample sizes. Our study indicates that the reproducibility of BNs in biological research is an issue worthy of further consideration, especially in light of the many publications that now employ findings from such constructs without appropriate attention paid to reproducibility. We find that while edge-to-edge reproducibility is strongly dependent on sample size, identification of more highly connected key driver nodes in BNs can be carried out with high confidence across a range of sample sizes.
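The sketch below illustrates the comparison described above, assuming each network is a list of directed (parent, child) edges. Edge recall measures edge-to-edge reproducibility. Top-k out-degree overlap serves as a simple, hypothetical proxy for "key driver" reproducibility; the authors' key driver analysis is more involved. The helper names and toy edges are illustrative only.

```python
from collections import Counter


def edge_recall(full_edges, sub_edges) -> float:
    """Fraction of edges in the full-data network that the subsampled network recovers."""
    full = set(full_edges)
    if not full:
        return 0.0
    return len(full & set(sub_edges)) / len(full)


def top_hubs(edges, k):
    """The k nodes with the highest out-degree (ties broken by name for determinism)."""
    out_degree = Counter(src for src, _dst in edges)
    ranked = sorted(out_degree, key=lambda n: (-out_degree[n], n))
    return set(ranked[:k])


def hub_overlap(full_edges, sub_edges, k) -> float:
    """Share of the full network's top-k hubs that are also top-k hubs in the subsample."""
    return len(top_hubs(full_edges, k) & top_hubs(sub_edges, k)) / k


full = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "E"), ("C", "E")]
sub = [("A", "B"), ("A", "D"), ("A", "E"), ("C", "F")]
print(edge_recall(full, sub))     # 0.4: only 2 of 5 edges reproduced
print(hub_overlap(full, sub, 1))  # 1.0: the hub "A" is reproduced
```

The toy example mirrors the qualitative finding in the abstract. Individual edges are recovered poorly, while the most connected node is stable.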
12
REPRODUCIBLE DRUG REPURPOSING: WHEN SIMILARITY DOES NOT SUFFICE
Emre Guney
Joint IRB-BSC-CRG Program in Computational Biology - Institute for Research in Biomedicine (IRB) Barcelona
Emre Guney
Repurposing existing drugs for new uses has attracted considerable attention over the past years. To identify potential candidates that could be repositioned for a new indication, many studies make use of chemical, target, and side effect similarity between drugs to train classifiers. Despite the promising prediction accuracies of these supervised computational models, their use in practice, such as for rare diseases, is hindered by the assumption that there are already known and similar drugs for a given condition of interest. In this study, using publicly available data sets, we question the prediction accuracies of supervised approaches based on drug similarity when the drugs in the training and the test set are completely disjoint. We first build a Python platform to generate reproducible similarity-based drug repurposing models. Next, we show that, while a simple chemical, target, and side effect similarity based machine learning method can achieve good performance on the benchmark data set, the prediction performance drops sharply when the drugs in the folds of the cross validation do not overlap and the similarity information within the training and test sets is used independently. These intriguing results suggest revisiting the assumptions underlying the validation scenarios of similarity-based methods, and underline the need for unsupervised approaches to identify novel drug uses inside the unexplored pharmacological space. The digital notebook containing the Python code to replicate our analysis, including the machine-learning-based drug repurposing platform and the proposed disjoint cross fold generation method, is freely available at github.com/emreg00/repurpose.
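The key idea of disjoint cross-validation is that all of a drug's pairs fall in a single fold, so no drug seen in training appears at test time. The minimal sketch below assumes the data are (drug, disease) pairs. The function name and the round-robin assignment of shuffled drugs are illustrative; they are not taken from the github.com/emreg00/repurpose code.

```python
import random


def disjoint_drug_folds(pairs, n_folds=5, seed=0):
    """Split (drug, disease) pairs into folds so that each drug's pairs all land in one fold."""
    drugs = sorted({drug for drug, _disease in pairs})
    rng = random.Random(seed)
    rng.shuffle(drugs)
    # Assign shuffled drugs to folds round-robin; every pair follows its drug.
    fold_of = {drug: i % n_folds for i, drug in enumerate(drugs)}
    folds = [[] for _ in range(n_folds)]
    for pair in pairs:
        folds[fold_of[pair[0]]].append(pair)
    return folds


# Hypothetical data: 10 drugs, each paired with 2 diseases.
pairs = [(f"drug{d}", f"dis{k}") for d in range(10) for k in range(2)]
folds = disjoint_drug_folds(pairs, n_folds=5)
print([len(f) for f in folds])  # [4, 4, 4, 4, 4]
```

Splitting at the drug level rather than the pair level is what prevents a similar, already-seen drug from "leaking" its label into the test fold.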
13
EMPOWERING MULTI-COHORT GENE EXPRESSION ANALYSIS TO INCREASE REPRODUCIBILITY
Winston A. Haynes, Francesco Vallania, Charles Liu, Erika Bongen, Aurelie Tomczak, Marta Andres-Terrè, Shane Lofgren, Andrew Tam, Cole A. Deisseroth, Matthew D. Li, Timothy E. Sweeney, Purvesh Khatri
Stanford University
Winston Haynes
A major contributor to the scientific reproducibility crisis has been that the results from homogeneous, single-center studies do not generalize to heterogeneous, real world populations. Multi-cohort gene expression analysis has helped to increase reproducibility by aggregating data from diverse populations into a single analysis. To make the multi-cohort analysis process more feasible, we have assembled an analysis pipeline which implements rigorously studied meta-analysis best practices. We have compiled and made publicly available the results of our own multi-cohort gene expression analysis of 103 diseases, spanning 615 studies and 36,915 samples, through a novel and interactive web application. As a result, we have made both the process of and the results from multi-cohort gene expression analysis more approachable for non-technical users.
14
RABIX: AN OPEN-SOURCE WORKFLOW EXECUTOR SUPPORTING RECOMPUTABILITY AND INTEROPERABILITY OF WORKFLOW DESCRIPTIONS
Gaurav Kaushik, Sinisa Ivkovic, Janko Simonovic, Nebojsa Tijanic, Brandi Davis-Dusenbery, Deniz Kural
Seven Bridges Genomics
Gaurav Kaushik
As biomedical data has become increasingly easy to generate in large quantities, the methods used to analyze it have proliferated rapidly. Reproducible and reusable methods are required to learn from large volumes of data reliably. To address this issue, numerous groups have developed workflow specifications or execution engines, which provide a framework with which to perform a sequence of analyses. One such specification is the Common Workflow Language, an emerging standard which provides a robust and flexible framework for describing data analysis tools and workflows. In addition, reproducibility can be furthered by executors or workflow engines which interpret the specification and enable additional features, such as error logging, file organization, optimizations to computation and job scheduling, and allow for easy computing on large volumes of data. To this end, we have developed the Rabix Executor, an open-source workflow engine for the purposes of improving reproducibility through reusability and interoperability of workflow descriptions.
15
DATA SHARING AND CLINICAL GENETIC TESTING: SUCCESSES AND CHALLENGES
Shan Yang1, Melissa Cline2, Can Zhang2, Benedict Paten2, Stephen E. Lincoln1
1 Invitae, 2 University of California Santa Cruz
Stephen Lincoln
Open sharing of clinical genetic data promises to both monitor and eventually improve the reproducibility of variant interpretation among clinical testing laboratories. A significant public data resource has been developed by the NIH ClinVar initiative, which includes submissions from hundreds of laboratories and clinics worldwide. We analyzed a subset of ClinVar data focused on specific clinical areas and found high reproducibility (>90% concordance) among labs, although challenges for the community are clearly identified in this data set. We further review results for the commonly tested BRCA1 and BRCA2 genes, which show even higher concordance, although the significant fragmentation of data into different silos presents an ongoing challenge now being addressed by the BRCA Exchange. We encourage all laboratories and clinics to contribute to these important resources.
16
PATTERNS IN BIOMEDICAL DATA – HOW DO WE FIND THEM?
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
17
LEARNING ATTRIBUTES OF DISEASE PROGRESSION FROM TRAJECTORIES OF SPARSE LAB VALUES
Vibhu Agarwal1, Nigam H. Shah2
1 Biomedical Informatics Training Program, Stanford University, 2 The Center for Biomedical Informatics Research, Stanford University
Vibhu Agarwal
There is heterogeneity in the manifestation of diseases; therefore, it is essential to understand the patterns of progression of a disease in a given population, both for disease management and for clinical research. Disease status is often summarized by repeated recordings of one or more physiological measures. As a result, historical values of these physiological measures for a population sample can be used to characterize disease progression patterns. We use a method for clustering sparse functional data to identify sub-groups within a cohort of patients with chronic kidney disease (CKD), based on the trajectories of their creatinine measurements. We demonstrate through a proof-of-principle study how the two sub-groups that display distinct patterns of disease progression may be compared on the clinical attributes that correspond to the maximum difference in progression patterns. The key attributes that distinguish the two sub-groups appear to have support in the published literature and in clinical practice related to CKD.
18
COMPUTER AIDED IMAGE SEGMENTATION AND CLASSIFICATION FOR VIABLE AND NON-VIABLE TUMOR IDENTIFICATION IN OSTEOSARCOMA
Harish Babu Arunachalam1, Rashika Mishra1, Bogdan Armaselu1, Ovidiu Daescu1, Maria Martinez1, Patrick Leavey1, Dinesh Rakheja2, Kevin Cederberg2, Anita Sengupta2, Molly Ni'Suilleabhain2
1 University of Texas at Dallas, 2 University of Texas Southwestern Medical Center
Harish Babu Arunachalam
Osteosarcoma is one of the most common types of bone cancer in children. To gauge the extent of cancer treatment response in the patient after surgical resection, H&E stained image slides are manually evaluated by pathologists to estimate the percentage of necrosis, a time-consuming process prone to observer bias and inaccuracy. Digital image analysis is a potential method to automate this process, thus saving time and providing a more accurate evaluation. The slides are scanned with an Aperio ScanScope, converted to digital Whole Slide Images (WSIs) and stored in SVS format. These are high-resolution images, of the order of 10^9 pixels, allowing up to 40X magnification. This paper proposes an image segmentation and analysis technique for segmenting tumor and non-tumor regions in histopathological WSIs of osteosarcoma data sets. Our approach is a combination of pixel-based and object-based methods which utilize tumor properties such as nuclei cluster, density, and circularity to classify tumor regions as viable and non-viable. A K-Means clustering technique is used for tumor isolation using color normalization, followed by a multi-threshold Otsu segmentation technique to further classify tumor regions as viable and non-viable. Then a flood-fill algorithm is applied to cluster similar pixels into cellular objects and compute cluster data for further analysis of the regions under study. To the best of our knowledge, this is the first comprehensive solution that is able to produce such a classification for osteosarcoma. The results are very conclusive in identifying viable and non-viable tumor regions. In our experiments, the accuracy of the discussed approach is 100% for viable tumor and coagulative necrosis identification, while it is around 90% for fibrosis and acellular/hypocellular tumor osteoid, across all the sampled data sets used. We expect the developed software to lead to a significant increase in accuracy and a decrease in inter-observer variability in the assessment of necrosis by pathologists, as well as a reduction in the time pathologists spend on such assessments.
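Otsu's method picks the grey-level threshold that maximizes the between-class variance of the image histogram. The sketch below shows the classic single-threshold version for 8-bit values. The paper uses a multi-threshold variant, which extends the same search over several thresholds. The function name and toy pixels are illustrative.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t maximizing between-class variance; class 0 is values <= t.

    Assumes integer intensities in the range 0..levels-1.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = 0    # number of pixels in class 0 (values <= t)
    sum0 = 0  # intensity sum of class 0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so no split at this threshold
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance up to the constant factor 1 / total**2.
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


pixels = [10] * 50 + [200] * 50  # two well-separated intensity populations
print(otsu_threshold(pixels))    # 10: everything <= 10 falls in the dark class
```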
19
MISSING DATA IMPUTATION IN THE ELECTRONIC HEALTH RECORD USING DEEPLY LEARNED AUTOENCODERS
Brett K. Beaulieu-Jones1, Jason H. Moore2, The Pooled Resource Open-Access ALS Clinical Trials Consortium
1 Genomics and Computational Biology Graduate Group, Computational Genetics Lab, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania; 2 Computational Genetics Lab, Institute for Biomedical Informatics, University of Pennsylvania
Brett Beaulieu-Jones
Electronic health records (EHRs) have become a vital source of patient outcome data, but the widespread prevalence of missing data presents a major challenge. Different causes of missing data in EHR data may introduce unintentional bias. Here, we compare the effectiveness of popular multiple imputation strategies with a deeply learned autoencoder using the Pooled Resource Open-Access ALS Clinical Trials Database (PRO-ACT). To evaluate performance, we examined imputation accuracy for known values simulated to be either missing completely at random or missing not at random. We also compared ALS disease progression prediction across different imputation models. Autoencoders showed strong performance for imputation accuracy and contributed to the strongest disease progression predictor. Finally, we show that despite clinical heterogeneity, ALS disease progression appears homogeneous, with time from onset being the most important predictor.
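The evaluation protocol above can be sketched as follows: hide known values completely at random, impute them, and score only the hidden entries. The sketch assumes a numeric matrix with `None` marking missing entries. It uses column-mean imputation as a simple baseline and does not implement the autoencoder. The helper names are hypothetical.

```python
import math
import random


def mask_mcar(matrix, frac, seed=0):
    """Hide a fraction of observed entries completely at random; return masked copy and positions."""
    rng = random.Random(seed)
    observed = [(i, j) for i, row in enumerate(matrix) for j, v in enumerate(row) if v is not None]
    hidden = rng.sample(observed, int(frac * len(observed)))
    masked = [list(row) for row in matrix]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def mean_impute(matrix):
    """Baseline imputer: replace each missing entry with its column mean."""
    means = []
    for j in range(len(matrix[0])):
        col = [row[j] for row in matrix if row[j] is not None]
        means.append(sum(col) / len(col) if col else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in matrix]


def rmse_on_hidden(truth, imputed, hidden):
    """Root-mean-square error restricted to the entries that were deliberately hidden."""
    return math.sqrt(sum((truth[i][j] - imputed[i][j]) ** 2 for i, j in hidden) / len(hidden))


truth = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
masked, hidden = mask_mcar(truth, frac=0.25)
imputed = mean_impute(masked)
print(rmse_on_hidden(truth, imputed, hidden))
```

A missing-not-at-random scenario would replace `mask_mcar` with a rule that depends on the values themselves, for example hiding only the largest entries.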
20
DEVELOPMENT AND PERFORMANCE OF TEXT-MINING ALGORITHMS TO EXTRACT SOCIOECONOMIC STATUS FROM DE-IDENTIFIED ELECTRONIC HEALTH RECORDS
Brittany M. Hollister1, Nicole A. Restrepo2, Eric Farber-Eger3, Dana C. Crawford2, Melinda C. Aldrich4, Amy Non5
1 Vanderbilt Genetic Institute, Vanderbilt University; 2 Institute for Computational Biology and Department of Epidemiology and Biostatistics, Case Western Reserve University; 3 Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University; 4 Department of Thoracic Surgery and Division of Epidemiology, Vanderbilt University Medical Center; 5 Department of Anthropology, University of California San Diego
Brittany Hollister
Socioeconomic status (SES) is a fundamental contributor to health, and a key factor underlying racial disparities in disease. However, SES data are rarely included in genetic studies, due in part to the difficulty of collecting these data when studies were not originally designed for that purpose. The emergence of large clinic-based biobanks linked to electronic health records (EHRs) provides research access to large patient populations with longitudinal phenotype data captured in structured fields such as billing codes, procedure codes, and prescriptions. SES data, however, are often not explicitly recorded in structured fields, but rather in the free text of clinical notes and communications. The content and completeness of these data vary widely by practitioner. To enable gene-environment studies that consider SES as an exposure, we sought to extract SES variables for racial/ethnic minority adult patients (n=9,977) in BioVU, the Vanderbilt University Medical Center biorepository linked to de-identified EHRs. We developed several measures of SES using information available within the de-identified EHR, including broad categories of occupation, education, insurance status, and homelessness. Two hundred patients were randomly selected for manual review to develop a set of seven algorithms for extracting SES information from de-identified EHRs. The algorithms consist of 15 categories of information, with 830 unique search terms. SES data extracted by manual review of 50 randomly selected records were compared to data produced by the algorithm, resulting in positive predictive values of 80.0% (education), 85.4% (occupation), 87.5% (unemployment), 63.6% (retirement), 23.1% (uninsured), 81.8% (Medicaid), and 33.3% (homelessness), suggesting that some categories of SES data are easier to extract in this EHR than others. The SES data extraction approach developed here will enable future EHR-based genetic studies to integrate SES information into statistical analyses. Ultimately, incorporating measures of SES into genetic studies will help elucidate the impact of the social environment on disease risk and outcomes.
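The sketch below is a toy example of keyword-based extraction, scored against manual review by positive predictive value: PPV = TP / (TP + FP). The search terms and records are hypothetical stand-ins for the paper's 830 terms. The example also shows how a negated mention ("not homeless") produces a false positive for naive keyword matching.

```python
import re

# Hypothetical search terms per SES category (not the paper's term list).
SEARCH_TERMS = {
    "homelessness": ["homeless", "shelter"],
    "unemployment": ["unemployed", "laid off"],
}


def extract_categories(note: str) -> set:
    """Flag every category whose search terms appear as whole words in the note."""
    text = note.lower()
    found = set()
    for category, terms in SEARCH_TERMS.items():
        if any(re.search(r"\b" + re.escape(term) + r"\b", text) for term in terms):
            found.add(category)
    return found


def positive_predictive_value(predicted: dict, reviewed: dict, category: str) -> float:
    """PPV = TP / (TP + FP) over the records the algorithm flagged for `category`."""
    flagged = [rid for rid, cats in predicted.items() if category in cats]
    if not flagged:
        return float("nan")
    tp = sum(1 for rid in flagged if category in reviewed.get(rid, set()))
    return tp / len(flagged)


notes = {
    "r1": "Patient is homeless, stays at shelter.",
    "r2": "Lives with daughter; not homeless.",
    "r3": "Recently laid off.",
}
reviewed = {"r1": {"homelessness"}, "r2": set(), "r3": {"unemployment"}}
predicted = {rid: extract_categories(note) for rid, note in notes.items()}
print(positive_predictive_value(predicted, reviewed, "homelessness"))  # 0.5
```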
21
DEMO DASHBOARD: VISUALIZING AND UNDERSTANDING GENOMIC SEQUENCES USING DEEP NEURAL NETWORKS
Jack Lanchantin, Ritambhara Singh, Beilun Wang, Yanjun Qi
University of Virginia
Jack Lanchantin
Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard), which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method finds a test sequence's saliency map, which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, because recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, which indicate the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs best among the three architectures. The visualization techniques indicate that the CNN-RNN makes predictions by modeling both motifs and the dependencies among them.
22
PREDICTIVE MODELING OF HOSPITAL READMISSION RATES USING ELECTRONIC MEDICAL RECORD-WIDE MACHINE LEARNING: A CASE-STUDY USING MOUNT SINAI HEART FAILURE COHORT
Khader Shameer1,2, Kipp W. Johnson1,2, Alexandre Yahi7, Riccardo Miotto1,2, Li Li1,2, Doran Ricks3, Jebakumar Jebakaran4, Patricia Kovatch1,4, Partho P. Sengupta5, Annetine Gelijns8, Alan Moskovitz8, Bruce Darrow5, David L. Reich6, Andrew Kasarskis1, Nicholas P. Tatonetti7, Sean Pinney5, Joel T. Dudley1,2,8*
1 Department of Genetics and Genomics, Icahn Institute of Genomics and Multiscale Biology; 2 Institute of Next Generation Healthcare, Mount Sinai Health System, NY; 3 Decision Support, Mount Sinai Health System, NY; 4 Mount Sinai Data Warehouse, Icahn Institute of Genomics and Multiscale Biology, NY; 5 Zena and Michael A. Wiener Cardiovascular Institute, Icahn School of Medicine at Mount Sinai, NY; 6 Department of Anesthesiology, Icahn School of Medicine at Mount Sinai, NY; 7 Departments of Biomedical Informatics, Systems Biology and Medicine, Columbia University Medical Center, NY; 8 Population Health Science and Policy, Mount Sinai Health System, NY
*Corresponding Author, Email: joel.dudley@mssm.edu
Khader Shameer
Reduction of preventable hospital readmissions that result from chronic or acute conditions like stroke, heart failure, myocardial infarction and pneumonia remains a significant challenge for improving outcomes and decreasing the cost of healthcare delivery in the United States. Patient readmission rates are relatively high for conditions like heart failure (HF) despite the implementation of high-quality healthcare delivery operation guidelines created by regulatory authorities. Multiple predictive models are currently available to evaluate potential 30-day readmission rates of patients. Most of these models are hypothesis driven and repeatedly assess the predictive abilities of the same set of biomarkers as predictive features. In this manuscript, we discuss our attempt to develop a data-driven, electronic-medical-record-wide (EMR-wide) feature selection approach and subsequent machine learning to predict readmission probabilities. We assessed a large repertoire of variables from the electronic medical records of heart failure patients at a single center. The cohort included 1,068 patients, of whom 178 were readmitted within a 30-day interval (16.66% readmission rate). A total of 4,205 variables were extracted from the EMR, including diagnosis codes (n=1,763), medications (n=1,028), laboratory measurements (n=846), surgical procedures (n=564) and vital signs (n=4). We designed a multistep modeling strategy using the Naïve Bayes algorithm. In the first step, we created individual models to classify the cases (readmitted) and controls (non-readmitted). In the second step, features contributing to predictive risk from the independent models were combined into a composite model using a correlation-based feature selection (CFS) method. All models were trained and tested using a 5-fold cross-validation method, with 70% of the cohort used for training and the remaining 30% for testing. Compared to existing predictive models for HF readmission rates (AUCs in the range of 0.6-0.7), results from our EMR-wide predictive model (AUC=0.78; Accuracy=83.19%) and phenome-wide feature selection strategies are encouraging and reveal the utility of such data-driven machine learning. Fine tuning of the model, replication using multi-center cohorts, and a prospective clinical trial to evaluate its clinical utility would help the adoption of the model as a clinical decision system for evaluating readmission status.
23
METHODS FOR CLUSTERING TIME SERIES DATA ACQUIRED FROM MOBILE HEALTH APPS
Nicole Tignor1, Pei Wang1, Nicholas Genes1, Linda Rogers1, Steven G. Hershman2, Erick R. Scott1, Micol Zweig1, Yu-Feng Yvonne Chan1, Eric E. Schadt1
1 Icahn School of Medicine at Mount Sinai, 2 LifeMap Solutions
Nicole Tignor
In our recent Asthma Mobile Health Study (AMHS), thousands of asthma patients across the country contributed medical data through the iPhone Asthma Health App on a daily basis for an extended period of time. The collected data included daily self-reported asthma symptoms, symptom triggers, and real time geographic location information. The AMHS is just one of many studies occurring in the context of the many thousands of mobile health apps now aimed at improving wellness and better managing chronic disease conditions, leveraging the passive and active collection of data from mobile, handheld smart devices. The ability to identify patient groups or patterns of symptoms that might predict adverse outcomes, such as asthma exacerbations or hospitalizations, from these types of large, prospectively collected data sets would be of significant general interest. However, conventional clustering methods cannot be applied to these types of longitudinally collected data, especially survey data actively collected from app users, given heterogeneous patterns of missing values due to: 1) varying survey response rates among different users, 2) varying survey response rates over time for each user, and 3) non-overlapping periods of enrollment among different users. To handle such a complicated missing data structure, we proposed a probability imputation model to infer missing data. We also employed a consensus clustering strategy in tandem with the multiple imputation procedure. Through simulation studies under a range of scenarios reflecting real data conditions, we identified favorable performance of the proposed method over other strategies that impute the missing values through low-rank matrix completion. When applying the proposed method to study asthma triggers and symptoms collected as part of the AMHS, we identified several patient groups with distinct phenotype patterns. With further validation, the methods described in this paper might be used to identify clinically important patterns in large data sets with complicated missing data structures, improving the ability to use such data sets to identify at-risk populations for potential intervention.
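Consensus clustering summarizes many clusterings (here, one per imputed data set) as the fraction of runs in which each pair of subjects falls in the same cluster. The minimal sketch below assumes the cluster labels from each run are already available. The final step, grouping pairs whose consensus reaches a threshold, is a simple hypothetical choice and not necessarily the authors' method.

```python
def consensus_matrix(labelings):
    """Fraction of runs in which each pair of subjects is assigned to the same cluster."""
    runs = len(labelings)
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / runs for c in row] for row in counts]


def consensus_groups(cm, threshold=0.5):
    """Join subjects whose consensus is >= threshold (union-find); return sorted groups."""
    n = len(cm)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if cm[i][j] >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical labels for 4 subjects from clustering 4 imputed data sets.
labelings = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 1, 1]]
cm = consensus_matrix(labelings)
print(consensus_groups(cm))  # [[0, 1], [2, 3]]
```

Label values need not match across runs (run 3 swaps 0 and 1), because only co-membership is counted.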
24
A NEW RELEVANCE ESTIMATOR FOR THE COMPILATION AND VISUALIZATION OF DISEASE PATTERNS AND POTENTIAL DRUG TARGETS
Modest von Korff, Tobias Fink, Thomas Sander
Research Information Management, Actelion Pharmaceuticals Ltd.
Modest von Korff
A new computational method is presented to extract disease patterns from heterogeneous and text-based data. For this study, 22 million PubMed records were mined for co-occurrences of gene name synonyms and disease MeSH terms. The resulting publication counts were transferred into a matrix M_data. In this matrix, a disease was represented by a row and a gene by a column. Each field in the matrix represented the publication count for a co-occurring disease–gene pair. A second matrix with identical dimensions, M_relevance, was derived from M_data. To create M_relevance, the values from M_data were normalized. The normalized values were multiplied by the column-wise calculated Gini coefficient. This multiplication resulted in a relevance estimator for every gene in relation to a disease. From M_relevance, the similarities between all row vectors were calculated. The resulting similarity matrix S_relevance related 5,000 diseases by the relevance estimators calculated for 15,000 genes. Three diseases were analyzed in detail to validate the disease patterns and the relevant genes. Cytoscape was used to visualize and analyze M_relevance and S_relevance together with the genes and diseases. In summary, the relevance estimator introduced here was able to detect valid disease patterns and to identify genes that encode key proteins and potential targets for drug discovery projects.
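The relevance estimator can be sketched as below. The abstract does not say how M_data was normalized, so two choices here are assumptions: row-wise normalization (dividing by each disease's total count) and cosine similarity between rows. The Gini coefficient uses the standard mean-absolute-difference formula, G = Σ_i Σ_j |x_i - x_j| / (2 n Σ x). In the toy matrix, a gene mentioned evenly with every disease gets Gini 0 and so drops out of the similarity.

```python
import math


def gini(values):
    """Gini coefficient: 0 for perfectly even values, larger when counts are concentrated."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    abs_diffs = sum(abs(a - b) for a in values for b in values)
    return abs_diffs / (2 * n * total)


def relevance_matrix(m_data):
    """Row-normalize counts (an assumption), then weight each gene column by its Gini."""
    n_cols = len(m_data[0])
    col_gini = [gini([row[j] for row in m_data]) for j in range(n_cols)]
    m_rel = []
    for row in m_data:
        total = sum(row)
        normalized = [v / total if total else 0.0 for v in row]
        m_rel.append([v * g for v, g in zip(normalized, col_gini)])
    return m_rel


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_matrix(m_rel):
    """Disease-by-disease similarity of relevance profiles (S_relevance analogue)."""
    return [[cosine(r1, r2) for r2 in m_rel] for r1 in m_rel]


# Hypothetical counts: rows = diseases D1..D3, columns = genes G1..G3.
m_data = [[5, 0, 3], [4, 0, 3], [0, 6, 3]]
s = similarity_matrix(relevance_matrix(m_data))
print(round(s[0][1], 3), s[0][2])  # 1.0 0.0
```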
25
DISCOVERY OF FUNCTIONAL AND DISEASE PATHWAYS BY COMMUNITY DETECTION IN PROTEIN-PROTEIN INTERACTION NETWORKS
Stephen J. Wilson, Angela D. Wilkins, Chih-Hsu Lin, Rhonald C. Lua, Olivier Lichtarge
Baylor College of Medicine
Stephen Wilson
Advances in cellular, molecular, and disease biology depend on the comprehensive characterization of gene interactions and pathways. Traditionally, these pathways are curated manually, limiting their efficient annotation and, potentially, reinforcing field-specific bias. Here, in order to test objective and automated identification of functionally cooperative genes, we compared a novel algorithm with three established methods to search for communities within gene interaction networks. Communities identified by the novel approach and by one of the established methods overlapped significantly (q < 0.1) with control pathways. With respect to disease, these communities were biased toward genes with pathogenic variants in ClinVar (p << 0.01), and genes from the same community were often co-expressed, including in breast cancers. The interesting subset of novel communities, defined by poor overlap with control pathways, also contained co-expressed genes, consistent with a possible functional role. This work shows that community detection based on topological features of networks suggests new, biologically meaningful groupings of genes that, in turn, point to health- and disease-relevant hypotheses.
26
PRECISION MEDICINE: FROM GENOTYPES AND MOLECULAR PHENOTYPES TOWARDS IMPROVED HEALTH AND THERAPIES
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
27
OPENING THE DOOR TO THE LARGE SCALE USE OF CLINICAL LAB MEASURES FOR ASSOCIATION TESTING: EXPLORING DIFFERENT METHODS FOR DEFINING PHENOTYPES
Christopher R. Bauer, Daniel Lavage, John Snyder, Joseph Leader, J. Matthew Mahoney, Sarah A. Pendergrass
Geisinger Health System, University of Vermont
Christopher Bauer
The past decade has seen exponential growth in the numbers of sequenced and genotyped individuals and a corresponding increase in our ability to collect and catalogue phenotypic data for use in the clinic. We now face the challenge of integrating these diverse data in new ways that can provide useful diagnostics and precise medical interventions for individual patients. One of the first steps in this process is to accurately map the phenotypic consequences of genetic variation in human populations. The most common approach for this is the genome-wide association study (GWAS). While this technique is relatively simple to implement for a given phenotype, the choice of how to define a phenotype is critical. It is becoming increasingly common for each individual in a GWAS cohort to have a large profile of quantitative measures. The standard approach is to test for associations with one measure at a time; however, there are many justifiable ways to define a set of phenotypes, and the genetic associations that are revealed will vary based on these definitions. Some phenotypes may only show a significant genetic association signal when considered together, such as through principal components analysis (PCA). Combining correlated measures may increase the power to detect association by reducing the noise present in individual variables, and may reduce the multiple hypothesis testing burden. Here we show that PCA and k-means clustering are two complementary methods for identifying novel genotype-phenotype relationships within a set of quantitative human traits derived from the Geisinger Health System electronic health record (EHR). Using a diverse set of approaches for defining phenotypes may yield more insights into the genetic architecture of complex traits, and the findings presented here highlight a clear need for further investigation into other methods for defining the most relevant phenotypes in a set of variables. As EHR data continue to grow, addressing these issues will become increasingly important in our efforts to use genomic data effectively in medicine.
28
TEMPORALORDEROFDISEASEPAIRSAFFECTSSUBSEQUENTDISEASETRAJECTORIES:THECASEOFDIABETESANDSLEEPAPNEA
MetteBeck1,DavidWestergaard1,LeifGroop2,SorenBrunak1
1NovoNordiskFoundationCenterforProteinResearch;2LundUniversityDiabetesCentre,DepartmentofClinicalSciences
Mette Beck
Most studies of disease etiologies focus on one disease only and not the full spectrum of multimorbidities that many patients have. Some disease pairs have shared causal origins, others represent common follow-on diseases, while yet other co-occurring diseases may manifest themselves in random order of appearance. We discuss these different types of disease co-occurrences, and use the two diseases “sleep apnea” and “diabetes” to showcase an approach that can be applied to any disease pair. We benefit from seven million electronic medical records covering the entire population of Denmark for more than 20 years. Sleep apnea is the most common sleep-related breathing disorder, and it has previously been shown to be bidirectionally linked to diabetes, meaning that each disease increases the risk of acquiring the other. We confirm that there is no significant temporal relationship, as approximately half of patients with both diseases are diagnosed with diabetes first. However, we also show that patients diagnosed with diabetes before sleep apnea have a higher disease burden compared to patients diagnosed with sleep apnea before diabetes. The study clearly demonstrates that it is not only the diagnoses in the patient’s disease history that are important, but also the specific order in which these diagnoses are given that matters in terms of outcome. We suggest that this should be considered for patient stratification.
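The observation that roughly half of co-diagnosed patients receive the diabetes diagnosis first corresponds to a simple test of temporal direction. A minimal sketch, assuming only the count of patients by which diagnosis came first, is an exact two-sided binomial test against p = 0.5; this is an illustration and not necessarily the test used in the study. Probabilities are computed in log space so that larger counts do not overflow.

```python
from math import exp, lgamma, log


def log_binom_pmf(k, n, p):
    """Log of the binomial probability mass function, computed with lgamma."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def order_test(n_a_first, n_total, p=0.5):
    """Two-sided exact binomial test: does disease A precede disease B more often than chance?

    Sums the probabilities of all outcomes that are no more likely than the observed count.
    """
    probs = [exp(log_binom_pmf(k, n_total, p)) for k in range(n_total + 1)]
    observed = probs[n_a_first]
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))
```

A call such as `order_test(n_diabetes_first, n_both)` on the cohort counts (not reported in the abstract) would return a large p-value when the order is close to 50/50. SciPy's `scipy.stats.binomtest` offers the same test. The study's second finding, a difference in disease burden by order, would need a separate comparison.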
29
HUMAN KINASES DISPLAY MUTATIONAL HOTSPOTS AT COGNATE POSITIONS WITHIN CANCER
Jonathan Gallion, Angela D. Wilkins, Olivier Lichtarge
Baylor College of Medicine
Jonathan Gallion
The discovery of driver genes is a major pursuit of cancer genomics, usually based on observing the same mutation in different patients. But the heterogeneity of cancer pathways plus the high background mutational frequency of tumor cells often cloud the distinction between less frequent drivers and innocent passenger mutations. Here, to overcome these disadvantages, we grouped together mutations from close kinase paralogs under the hypothesis that cognate mutations may functionally favor cancer cells in similar ways. Indeed, we find that kinase paralogs often bear mutations to the same substituted amino acid at the same aligned positions and with a large predicted Evolutionary Action. Functionally, these high Evolutionary Action, non-random mutations affect known kinase motifs, but strikingly, they do so differently among different kinase types and cancers, consistent with differences in selective pressures. Taken together, these results suggest that cancer pathways may flexibly distribute a dependence on a given functional mutation among multiple close kinase paralogs. The recognition of this “mutational delocalization” of cancer drivers among groups of paralogs is a new phenomenon that may help better identify relevant mechanisms and therefore eventually guide personalized therapy.
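The grouping step described above, pooling mutations from paralogs that hit the same aligned position with the same substituted amino acid, can be sketched as follows. The alignment map (from gene residue numbers to alignment columns) is an assumed input, and the gene names in any usage are hypothetical; Evolutionary Action scoring, which the authors apply on top, is not reproduced.

```python
def cognate_hotspots(mutations, alignment_map, min_paralogs=2):
    """Group cancer mutations from kinase paralogs by shared alignment column and substitution.

    mutations: iterable of (gene, residue_position, substituted_amino_acid).
    alignment_map: {(gene, residue_position): alignment_column}, e.g. derived from a
        kinase-domain multiple sequence alignment (assumed input).
    Returns {(column, amino_acid): number of distinct paralogs mutated there},
    keeping only sites hit in at least `min_paralogs` paralogs.
    """
    genes_at_site = {}
    for gene, pos, aa in mutations:
        column = alignment_map.get((gene, pos))
        if column is None:
            continue  # residue falls outside the aligned region
        genes_at_site.setdefault((column, aa), set()).add(gene)
    return {site: len(genes) for site, genes in genes_at_site.items()
            if len(genes) >= min_paralogs}
```

Counting distinct paralogs rather than raw mutations keeps a single recurrently mutated gene from looking like a cognate hotspot.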
30
MUSE: A MULTI-LOCUS SAMPLING-BASED EPISTASIS ALGORITHM FOR QUANTITATIVE GENETIC TRAIT PREDICTION
Dan He, Laxmi Parida
IBM Thomas J. Watson Research Center
Dan He
Quantitative genetic trait prediction based on high-density genotyping arrays plays an important role in plant and animal breeding, as well as in the genetic epidemiology of complex diseases. The prediction can be very helpful for developing breeding strategies and is crucial for translating findings in genetics to precision medicine. Epistasis, the phenomenon in which SNPs interact with each other, has been studied extensively in genome-wide association studies (GWAS) but has received relatively less attention for quantitative genetic trait prediction. As the number of possible interactions is generally extremely large, even considering only pairwise interactions is very challenging. To our knowledge, there is no solid solution yet for utilizing epistasis to improve genetic trait prediction. In this work, we studied the multi-locus epistasis problem, where interactions among more than two SNPs are considered. We developed an efficient algorithm, MUSE, to improve genetic trait prediction with the help of multi-locus epistasis. MUSE is sampling-based, and we proposed a few different sampling strategies. Our experiments on real data showed that MUSE is not only efficient but also effective at improving genetic trait prediction. MUSE also achieved very significant improvements on a real plant dataset as well as a real human dataset.
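The abstract does not describe MUSE's sampling strategies, so the following is only a generic illustration of the sampling idea: draw random multi-locus SNP combinations instead of enumerating all of them, turn each into a product feature, and rank combinations by how strongly that feature tracks the trait. The function names, the product encoding, and the correlation score are assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def sample_epistatic_features(G, y, order=3, n_samples=200, top=5, seed=0):
    """Randomly sample `order`-SNP combinations and rank them by |corr(product feature, y)|.

    G: genotype matrix, one list of 0/1/2 codes per individual.
    y: quantitative trait values, one per individual.
    """
    rng = random.Random(seed)
    n_snps = len(G[0])
    scores = {}
    for _ in range(n_samples):
        combo = tuple(sorted(rng.sample(range(n_snps), order)))
        if combo in scores:
            continue  # combination already evaluated
        feature = [math.prod(row[j] for j in combo) for row in G]
        scores[combo] = abs(pearson(feature, y))
    return sorted(scores, key=scores.get, reverse=True)[:top]
```

In a prediction setting, the selected interaction features would be added to a regression model alongside the main SNP effects and evaluated by cross-validation rather than by in-sample correlation.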
31
DIFFERENTIAL PATHWAY DEPENDENCY DISCOVERY ASSOCIATED WITH DRUG RESPONSE ACROSS CANCER CELL LINES
Gil Speyer1, Divya Mahendra1, Hai J. Tran1, Jeff Kiefer1, Stuart L. Schreiber2, Paul A. Clemons2, Harshil Dhruv1, Michael Berens1, Seungchan Kim1
1The Translational Genomics Research Institute, 2Broad Institute of Harvard and MIT
Seungchan Kim
The effort to personalize treatment plans for cancer patients involves the identification of drug treatments that can effectively target the disease while minimizing the likelihood of adverse reactions. In this study, the gene-expression profiles of 810 cancer cell lines and their response data to 368 small molecules from the Cancer Therapeutics Research Portal (CTRP) are analyzed to identify pathways with significant rewiring between genes, or differential gene dependency, between sensitive and non-sensitive cell lines. Identified pathways and their corresponding differential dependency networks are further analyzed to discover essentiality and specificity mediators of cell line response to drugs/compounds. For analysis we use the previously published method EDDY (Evaluation of Differential DependencY). EDDY first constructs likelihood distributions of gene-dependency networks, aided by known gene-gene interactions, for two given conditions, for example, sensitive cell lines vs. non-sensitive cell lines. These sets of networks yield a divergence value between the two distributions of network likelihoods that can be assessed for significance using permutation tests. The resulting differential dependency networks were then further analyzed to identify genes, termed mediators, which may play important roles in biological signaling in cell lines that are sensitive or non-sensitive to the drugs. Establishing statistical correspondence between compounds and mediators can improve understanding of known gene dependencies associated with drug response while also discovering new dependencies. Millions of compute hours resulted in thousands of these statistical discoveries. EDDY identified 8,811 statistically significant pathways, leading to 26,822 compound-pathway-mediator triplets. By incorporating the STITCH and STRING databases, we could construct supporting evidence networks for 14,415 compound-pathway-mediator triplets. The results of this analysis are presented in a searchable website to aid researchers in studying potential molecular mechanisms underlying cells’ drug response, as well as in designing experiments for the purpose of personalized treatment regimens.
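The permutation logic behind a differential-dependency test can be illustrated on a single gene pair. This is a simplified stand-in, not EDDY: instead of comparing likelihood distributions over whole dependency networks, it uses the change in Pearson correlation between sensitive and non-sensitive cell lines as the statistic, and builds its null distribution by shuffling the sensitivity labels. Function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def dependency_change(gene_a, gene_b, sensitive):
    """Absolute change in gene-gene correlation between sensitive and non-sensitive lines."""
    s = [i for i, flag in enumerate(sensitive) if flag]
    ns = [i for i, flag in enumerate(sensitive) if not flag]
    r_s = pearson([gene_a[i] for i in s], [gene_b[i] for i in s])
    r_ns = pearson([gene_a[i] for i in ns], [gene_b[i] for i in ns])
    return abs(r_s - r_ns)


def permutation_pvalue(gene_a, gene_b, sensitive, n_perm=999, seed=0):
    """Permutation p-value from shuffling the sensitivity labels across cell lines."""
    rng = random.Random(seed)
    observed = dependency_change(gene_a, gene_b, sensitive)
    labels = list(sensitive)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if dependency_change(gene_a, gene_b, labels) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Adding 1 to numerator and denominator keeps the estimated p-value away from exactly zero, which matters when thousands of pathways are tested and then corrected for multiple comparisons.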
32
A METHYLATION-TO-EXPRESSION FEATURE MODEL FOR GENERATING ACCURATE PROGNOSTIC RISK SCORES AND IDENTIFYING DISEASE TARGETS IN CLEAR CELL KIDNEY CANCER
Jeffrey A. Thompson1, Carmen J. Marsit2
1Dartmouth College, 2Emory University
Jeffrey Thompson
Many researchers now have multiple high-dimensional molecular and clinical datasets available when studying a disease. As we enter this multi-omic era of data analysis, new approaches that combine different levels of data (e.g., at the genomic and epigenomic levels) are required to fully capitalize on this opportunity. In this work, we outline a new approach to multi-omic data integration, which combines molecular and clinical predictors as part of a single analysis to create a prognostic risk score for clear cell renal cell carcinoma. The approach integrates data in multiple ways, yet creates models that are relatively straightforward to interpret and that perform well. Furthermore, the proposed process of data integration captures relationships in the data that represent highly disease-relevant functions.
33
DE NOVO MUTATIONS IN AUTISM IMPLICATE THE SYNAPTIC ELIMINATION NETWORK
Guhan Ram Venkataraman1, Chloe O'Connell1, Fumiko Egawa2, Dorna Kashef-Haghighi1, Dennis Paul Wall1
1Stanford University, 2St. George's University
Fumiko Egawa
Autism has been shown to have a major genetic risk component; documented autism has repeatedly been shown to run in families across generations. While inherited risk plays an important role in autism, de novo (germline) mutations have also been implicated in autism risk. Here we find that autism de novo variants verified and published in the literature are significantly enriched, after Bonferroni correction, in a gene set implicated in synaptic elimination. Additionally, several of the genes in this synaptic elimination set that were enriched in protein-protein interactions (CACNA1C, SHANK2, SYNGAP1, NLGN3, NRXN1, and PTEN) have been previously confirmed as genes that confer risk for the disorder. The results demonstrate that autism-associated de novo variants are linked to proper synaptic pruning and density, hinting at the etiology of autism and suggesting pathophysiology for downstream correction and treatment.
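Gene-set enrichment claims of this kind are commonly evaluated with a one-sided hypergeometric test followed by a Bonferroni correction. The abstract does not state the test, background size, gene-set size, or number of tests, so the sketch below is an assumed formulation with placeholder counts, written with exact integer arithmetic.

```python
from math import comb


def hypergeom_sf(k, M, K, N):
    """P(X >= k) for X ~ Hypergeometric(M background genes, K in the gene set, N genes drawn)."""
    upper = min(K, N)
    hits = sum(comb(K, i) * comb(M - K, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def gene_set_enrichment(n_overlap, n_background, n_set, n_variant_genes, n_tests=1):
    """One-sided enrichment p-value and its Bonferroni-corrected value (capped at 1)."""
    p = hypergeom_sf(n_overlap, n_background, n_set, n_variant_genes)
    return p, min(1.0, p * n_tests)
```

Here `n_variant_genes` would be the number of genes carrying verified de novo variants and `n_overlap` the number of those that fall in the synaptic elimination set; `n_tests` is the number of gene sets examined.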
34
IDENTIFYING GENETIC ASSOCIATIONS WITH VARIABILITY IN METABOLIC HEALTH AND BLOOD COUNT LABORATORY VALUES: DIVING INTO THE QUANTITATIVE TRAITS BY LEVERAGING LONGITUDINAL DATA FROM AN EHR
Shefali S. Verma1, Anastasia M. Lucas1, Daniel R. Lavage1, Joseph B. Leader1, Raghu Metpally2, Sarathbabu Krishnamurthy1, Frederick Dewey1, Ingrid Borecki1, Alexander Lopez3, John Overton3, John Penn3, Jeffrey Reid3, Sarah A. Pendergrass1, Gerda Breitwieser2, Marylyn D. Ritchie1
1Department of Biomedical and Translational Informatics, Geisinger Health System, Danville, PA; 2Department of Functional and Molecular Genomics, Geisinger Health System, Danville, PA;
3Regeneron Genetics Center, Tarrytown, NY
Shefali Setia Verma
A wide range of patient health data is recorded in electronic health records (EHR). These data include diagnoses, surgical procedures, clinical laboratory measurements, and medication information. Together this information reflects the patient’s medical history. Many studies have efficiently used these data from the EHR to find associations that are clinically relevant, either by utilizing International Classification of Diseases, version 9 (ICD-9) codes or laboratory measurements, or by designing phenotype algorithms to accurately extract case and control status from the EHR. Here we developed a strategy to utilize longitudinal quantitative trait data from the EHR at Geisinger Health System, focusing on outpatient metabolic and complete blood panel data as a starting point. The Comprehensive Metabolic Panel (CMP) and the Complete Blood Count (CBC) are part of routine care and provide a comprehensive picture, from high-level screening, of patients’ overall health and disease. We randomly split our data into two datasets to allow for discovery and replication. We first conducted a genome-wide association study (GWAS) with median values of 25 different clinical laboratory measurements to identify variants from HumanOmniExpressExome BeadChip data that are associated with these measurements. We identified 687 variants that were associated, and replicated, with the tested clinical measurements at p < 5×10⁻⁸. Since longitudinal data from the EHR provide a record of a patient’s medical history, we used this information to further investigate the ICD-9 codes that might be associated with differences in the variability of the measurements in the longitudinal dataset. We identified low- and high-variance patients by looking at changes within their individual longitudinal EHR laboratory results for each of the 25 clinical lab values (thus creating 50 groups: a high-variance and a low-variance group for each lab variable). We then performed a PheWAS analysis with ICD-9 diagnosis codes, separately in the high-variance group and the low-variance group for each lab variable. We found 717 PheWAS associations that replicated at a p-value less than 0.001. Next, we evaluated the results of this study by comparing the association results between the high- and low-variance groups.
For example, we found 39 SNPs (in multiple genes) associated with ICD-9 250.01 (type I diabetes) in patients with high variance of plasma glucose levels, but not in patients with low variance in plasma glucose levels. Another example is the association of 4 SNPs in UMOD with chronic kidney disease in patients with high variance for aspartate aminotransferase (discovery p-value: 8.71×10⁻⁹; replication p-value: 2.03×10⁻⁶). In general, we see a pattern of many more statistically significant associations in patients with high variance in the quantitative lab variables, in comparison with the low-variance group, across all 25 laboratory measurements. This study is one of the first of its kind to utilize quantitative trait variance from longitudinal laboratory data to find associations among genetic variants and clinical phenotypes obtained from an EHR, integrating laboratory values and diagnosis codes to understand the genetic complexities of common diseases.
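The two per-patient summaries described above, a median value used as the GWAS phenotype and a high/low variance label used to split patients before PheWAS, can be sketched with the standard library. The abstract does not say how within-patient change was quantified or where the high/low cutoff lies, so the sample variance, the median split, and the minimum of three measurements are all assumptions.

```python
from statistics import median, variance


def summarize_lab_history(lab_values, min_measurements=3):
    """Per-patient summaries of one longitudinal lab test.

    lab_values: {patient_id: [measurement, measurement, ...]} for a single lab test.
    Returns (medians, groups): the per-patient median (a GWAS phenotype) and a
    'high'/'low' label split at the cohort median of per-patient variances.
    The minimum number of measurements and the median split are assumptions.
    """
    eligible = {pid: vals for pid, vals in lab_values.items() if len(vals) >= min_measurements}
    medians = {pid: median(vals) for pid, vals in eligible.items()}
    variances = {pid: variance(vals) for pid, vals in eligible.items()}
    cutoff = median(variances.values())
    groups = {pid: "high" if v > cutoff else "low" for pid, v in variances.items()}
    return medians, groups
```

Running this once per lab test yields the two groups per test (50 in total for 25 tests), each of which would then be analyzed separately against ICD-9 codes.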
35
STRATEGIES FOR EQUITABLE PHARMACOGENOMIC-GUIDED WARFARIN DOSING AMONG EUROPEAN AND AFRICAN AMERICAN INDIVIDUALS IN A CLINICAL POPULATION
Laura Wiley1, Jacob VanHouten2, David Samuels2, Melinda Aldrich3, Dan Roden2, Josh Peterson2, Joshua Denny2
1University of Colorado, 2Vanderbilt University, 3Vanderbilt University Medical Center
Laura Wiley
The blood thinner warfarin has a narrow therapeutic range and high inter- and intra-patient variability in therapeutic doses. Several studies have shown that pharmacogenomic variants help predict stable warfarin dosing. However, retrospective studies and randomized controlled trials that employ dosing algorithms incorporating pharmacogenomic variants show that these algorithms underperform in African Americans. This study sought to determine whether 1) including additional variants associated with warfarin dose in African Americans, 2) predicting within single ancestry groups rather than a combined population, or 3) using percentage African ancestry rather than observed race would improve warfarin dosing algorithms in African Americans. Using BioVU, the Vanderbilt University Medical Center biobank linked to electronic medical records, we compared 25 modeling strategies to existing algorithms using a cohort of 2,181 warfarin users (1,928 whites, 253 blacks). We found that approaches incorporating additional variants increased model accuracy, but not in clinically significant ways. Race stratification increased model fidelity for African Americans, but the improvement was small and not likely to be clinically significant. Use of percent African ancestry improved model fit in the context of race misclassification.
36
SINGLE-CELL ANALYSIS AND MODELLING OF CELL POPULATION HETEROGENEITY
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
37
PRODUCTION OF A PRELIMINARY QUALITY CONTROL PIPELINE FOR SINGLE NUCLEI RNA-SEQ AND ITS APPLICATION IN THE ANALYSIS OF CELL TYPE DIVERSITY OF POST-MORTEM HUMAN BRAIN NEOCORTEX
Brian Aevermann1, Jamison McCorrison1, Pratap Venepally1, Rebecca Hodge2, Trygve Bakken2, Jeremy Miller2, Mark Novotny1, Danny N. Tran1, Francisco Diez-Fuertes3, Lena Christiansen4, Fan Zhang4, Frank Steemers4, Roger S. Lasken1, Ed Lein2, Nicholas Schork1, Richard H. Scheuermann1
1J. Craig Venter Institute, 2Allen Institute for Brain Science, 3Instituto de Salud Carlos III, 4Illumina, Inc.
Richard Scheuermann
Next-generation sequencing of the RNA content of single cells or single nuclei (sc/nRNA-seq) has become a powerful approach to understand the cellular complexity and diversity of multicellular organisms and environmental ecosystems. However, because the procedure begins with a relatively small amount of starting material, pushing the limits of the laboratory procedures required, careful approaches to sample quality control (QC) are essential to reduce the impact of technical noise and sample bias in downstream analysis applications. Here we present a preliminary framework for sample-level quality control that is based on collecting a series of quantitative laboratory and data metrics, which are used as features for constructing QC classification models with random forest machine learning approaches. We have applied this initial framework to a dataset comprising 2,272 single-nucleus RNA-seq results and determined that ~79% of samples were of high quality. Removal of the poor-quality samples from downstream analysis was found to improve the cell type clustering results. In addition, this approach identified quantitative features related to the proportion of unique or duplicate reads and the proportion of reads remaining after quality trimming as useful features for pass/fail classification. The construction and use of classification models for the identification of poor-quality samples provides an objective and scalable approach to sc/nRNA-seq quality control.
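The features the abstract singles out (the proportion of unique or duplicate reads and the proportion of reads surviving quality trimming) are simple ratios of read counts. The sketch below computes them per nucleus and assembles a feature matrix; the exact metric definitions and input layout are assumptions, and the random forest itself is not reimplemented here.

```python
def qc_features(total_reads, unique_reads, reads_after_trimming):
    """Per-nucleus QC features of the kind highlighted in the abstract.

    All inputs are read counts for one nucleus; the metric definitions are assumptions.
    The unique and duplicate fractions are complementary views of the same quantity.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    frac_unique = unique_reads / total_reads
    return {
        "frac_unique": frac_unique,
        "frac_duplicate": 1.0 - frac_unique,
        "frac_retained_after_trimming": reads_after_trimming / total_reads,
    }


FEATURE_ORDER = ("frac_unique", "frac_duplicate", "frac_retained_after_trimming")


def feature_matrix(samples):
    """Turn {sample_id: (total, unique, trimmed)} into (sorted ids, feature rows)."""
    ids = sorted(samples)
    rows = [[qc_features(*samples[s])[name] for name in FEATURE_ORDER] for s in ids]
    return ids, rows
```

Paired with pass/fail labels (for example from manual review, an assumption here), these rows could be given to scikit-learn's `RandomForestClassifier` via `fit(rows, labels)`; its `feature_importances_` attribute would then indicate which metrics drive the classification.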
38
TRACING CO-REGULATORY NETWORK DYNAMICS IN NOISY, SINGLE-CELL TRANSCRIPTOME TRAJECTORIES
Pablo Cordero, Joshua M. Stuart
UC Santa Cruz Genomics Institute, University of California, Santa Cruz
Pablo Cordero
The availability of gene expression data at the single-cell level makes it possible to probe the molecular underpinnings of complex biological processes such as differentiation and oncogenesis. Promising new methods have emerged for reconstructing a progression 'trajectory' from static single-cell transcriptome measurements. However, it remains unclear how to adequately model the appreciable level of noise in these data to elucidate gene regulatory network rewiring. Here, we present a framework called Single Cell Inference of MorphIng Trajectories and their Associated Regulation (SCIMITAR) that infers progressions from static single-cell transcriptomes by employing a continuous parametrization of Gaussian mixtures in high-dimensional curves. SCIMITAR yields rich models from the data that highlight genes with expression and co-expression patterns associated with the inferred progression. Further, SCIMITAR extracts regulatory states from the implicated trajectory-evolving co-expression networks. We benchmark the method on simulated data to show that it yields accurate cell ordering and gene network inferences. Applied to a single-cell human fetal neuron dataset, SCIMITAR finds progression-associated genes in cornerstone neural differentiation pathways that are missed by standard differential expression tests. Finally, by leveraging the rewiring of gene-gene co-expression relations across the progression, the method reveals the rise and fall of co-regulatory states and trajectory-dependent gene modules. These analyses implicate new transcription factors in neural differentiation, including putative co-factors for the multi-functional NFAT pathway.
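The idea of co-expression that changes along a trajectory can be illustrated with a much cruder tool than SCIMITAR's continuous Gaussian-mixture model: order cells by an already inferred pseudotime and compute a gene pair's correlation in sliding windows. This is an assumed simplification for intuition only; the window and step sizes are arbitrary.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def windowed_coexpression(pseudotime, gene_a, gene_b, window=20, step=10):
    """Correlation of two genes in sliding windows of cells ordered by pseudotime.

    Returns a list of (pseudotime_at_window_center, correlation) pairs.
    """
    order = sorted(range(len(pseudotime)), key=lambda i: pseudotime[i])
    result = []
    for start in range(0, len(order) - window + 1, step):
        idx = order[start:start + window]
        r = pearson([gene_a[i] for i in idx], [gene_b[i] for i in idx])
        result.append((pseudotime[idx[window // 2]], r))
    return result
```

Hard windows make the estimate jumpy and sensitive to window size; a model that treats the covariance as a smooth function of position along the curve, as the abstract describes, avoids choosing such windows.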
39
AN UPDATED DEBARCODING TOOL FOR MASS CYTOMETRY WITH CELL TYPE-SPECIFIC AND CELL SAMPLE-SPECIFIC STRINGENCY ADJUSTMENT
Kristin I. Fread1, William D. Strickland2, Garry P. Nolan3, Eli R. Zunder1
1Department of Biomedical Engineering, University of Virginia; 2Department of Biomedical Sciences, University of Virginia; 3Department of Microbiology and Immunology, Stanford University
Eli Zunder
Pooled sample analysis by mass cytometry barcoding carries many advantages: reduced antibody consumption, increased sample throughput, removal of cell doublets, reduction of cross-contamination by sample carryover, and the elimination of tube-to-tube variability in antibody staining. A single-cell debarcoding algorithm was previously developed to improve the accuracy and yield of sample deconvolution, but this method was limited to using fixed parameters for debarcoding stringency filtering, which could introduce cell-specific or sample-specific bias in cell yield in scenarios where barcode staining intensity and variance are not uniform across the pooled samples. To address this issue, we have updated the algorithm to output debarcoding parameters for every cell in the sample-assigned FCS files, which allows these parameters to be visualized and analyzed with flow cytometry analysis software. This strategy can be used to detect cell type-specific and sample-specific effects on the underlying cell data that arise during the debarcoding process. An additional benefit of this strategy is the decoupling of barcode stringency filtering from the debarcoding and sample assignment process. This is accomplished by removing the stringency filters during sample assignment, and then filtering after the fact with 1- and 2-dimensional gating on the debarcoding parameters that are output with the FCS files. These data exploration strategies serve as an important quality check for barcoded mass cytometry datasets, and allow cell type- and sample-specific stringency adjustment that can remove bias in cell yield introduced during the debarcoding process.
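The decoupling described above, assigning every event first, exporting its debarcoding parameter, and applying stringency cutoffs afterwards per sample, can be sketched as two separate functions. The sketch assumes a k-of-n barcoding scheme and uses a single separation-style parameter (the gap between the k-th and (k+1)-th highest normalized barcode intensities); the published tool's exact parameters may differ, and the sample and channel names are hypothetical.

```python
def assign_cell(intensities, codes, k=3):
    """Assign one event to a sample and report its barcode separation.

    intensities: normalized barcode-channel intensities for the event.
    codes: {sample_name: frozenset of the k channel indices that are 'on'} (assumed k-of-n scheme).
    Returns (sample_or_None, separation), where separation is the gap between the k-th and
    (k+1)-th highest intensities. No stringency cutoff is applied at this stage.
    """
    ranked = sorted(range(len(intensities)), key=lambda i: intensities[i], reverse=True)
    top = frozenset(ranked[:k])
    separation = intensities[ranked[k - 1]] - intensities[ranked[k]]
    sample = next((name for name, chans in codes.items() if chans == top), None)
    return sample, separation


def post_hoc_filter(assignments, min_separation):
    """Keep assigned events whose separation passes a per-sample cutoff.

    min_separation: {sample_name: cutoff}, so each sample (or cell type) gets its own stringency.
    """
    return [(s, sep) for s, sep in assignments
            if s is not None and sep >= min_separation.get(s, 0.0)]
```

Because the separation travels with each event, a dimly stained sample can be given a looser cutoff than a brightly stained one without re-running the assignment step.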
40
IMAGING GENOMICS
PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS
41
ADAPTIVE TESTING OF SNP-BRAIN FUNCTIONAL CONNECTIVITY ASSOCIATION VIA A MODULAR NETWORK ANALYSIS
Chen Gao, Junghi Kim, Wei Pan
Division of Biostatistics, School of Public Health, University of Minnesota
Wei Pan
Due to its high dimensionality and high noise levels, analysis of a large brain functional network may be neither powerful nor easy to interpret; instead, decomposition of a large network into smaller subcomponents called modules may be more promising, as suggested by some empirical evidence. For example, alteration of brain modularity is observed in patients suffering from various types of brain malfunctions. Although several methods exist for estimating brain functional networks, such as the sample correlation matrix or graphical lasso for a sparse precision matrix, it is still difficult to extract modules from such network estimates. Motivated by these considerations, we adapt a weighted gene co-expression network analysis (WGCNA) framework to resting-state fMRI (rs-fMRI) data to identify modular structures in brain functional networks. Modular structures are identified by using topological overlap matrix (TOM) elements in hierarchical clustering. We propose applying a new adaptive test built on the proportional odds model (POM) that can be applied to a high-dimensional setting, where the number of variables (p) can exceed the sample size (n), in addition to the usual p < n setting. We applied our proposed methods to the ADNI data to test for associations between a genetic variant and either the whole brain functional network or its various subcomponents, using various connectivity measures. We uncovered several modules based on the control cohort, and some of them were marginally associated with the APOE4 variant and several other SNPs; however, due to the small sample size of the ADNI data, larger studies are needed.
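The abstract names two WGCNA ingredients: a weighted adjacency built from correlations and the topological overlap matrix used for clustering. A pure-Python sketch of the standard unsigned formulation follows, where the nodes would be brain regions and the correlations would come from rs-fMRI time series. The soft-thresholding power `beta` is an assumed value, and the proportional-odds adaptive test is not shown.

```python
def soft_threshold_adjacency(corr, beta=6):
    """Unsigned WGCNA-style adjacency a_ij = |r_ij| ** beta, with the diagonal set to 0."""
    n = len(corr)
    return [[0.0 if i == j else abs(corr[i][j]) ** beta for j in range(n)] for i in range(n)]


def topological_overlap(adj):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij).

    l_ij = sum over nodes u other than i and j of a_iu * a_uj (shared neighbors),
    k_i = connectivity of node i. The diagonal is set to 1.
    """
    n = len(adj)
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
                tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom
```

The dissimilarity 1 - TOM could then be passed to a hierarchical clustering routine such as SciPy's `scipy.cluster.hierarchy.linkage`, and the resulting tree cut into modules.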
42
EXPLORING BRAIN TRANSCRIPTOMIC PATTERNS: A TOPOLOGICAL ANALYSIS USING SPATIAL EXPRESSION NETWORKS
Zhana Kuncheva1, Michelle L. Krishnan2, Giovanni Montana2
1Imperial College London, 2King's College London
Zhana Kuncheva
Characterizing the transcriptome architecture of the human brain is fundamental to gaining an understanding of brain function and disease. A number of recent studies have investigated patterns of brain gene expression obtained from extensive anatomical coverage across the entire human brain, using experimental data generated by the Allen Human Brain Atlas (AHBA) project. In this paper, we propose a new representation of a gene's transcription activity that explicitly captures the pattern of spatial co-expression across different anatomical brain regions. For each gene, we define a Spatial Expression Network (SEN), a network quantifying co-expression patterns among several anatomical locations. Network similarity measures are then employed to quantify the topological resemblance between pairs of SENs and to identify naturally occurring clusters. Using network-theoretical measures, we detected three large clusters featuring distinct topological properties. We then evaluate whether the topological diversity of the SENs reflects significant differences in biological function through a gene ontology analysis. We report evidence suggesting that one of the three SEN clusters consists of genes specifically involved in the nervous system, including genes related to brain disorders, while the remaining two clusters are representative of immunity, transcription, and translation. These findings are consistent with previous studies showing that brain gene clusters are generally associated with one of these three major biological processes.
43
PATTERNS IN BIOMEDICAL DATA – HOW DO WE FIND THEM?
PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS
44
A DEEP LEARNING APPROACH FOR CANCER DETECTION AND RELEVANT GENE IDENTIFICATION
Padideh Danaee, Reza Ghaeini, David Hendrix
Oregon State University
Padideh Danaee
Cancer detection from gene expression data continues to pose a challenge due to the high dimensionality and complexity of these data. After decades of research there is still uncertainty in the clinical diagnosis of cancer and the identification of tumor-specific markers. Here we present a deep learning approach to cancer detection, and to the identification of genes critical for the diagnosis of breast cancer. First, we used a Stacked Denoising Autoencoder (SDAE) to extract deep functional features from high-dimensional gene expression profiles. Next, we evaluated the performance of the extracted representation through supervised classification models to verify the usefulness of the new features in cancer detection. Lastly, we identified a set of highly interactive genes by analyzing the SDAE connectivity matrices. Our results and analysis illustrate that these highly interactive genes could be useful biomarkers for the detection of breast cancer that deserve further study.
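The abstract does not specify how the SDAE connectivity matrices were analyzed. One plausible reading, sketched here purely as an assumption, is to multiply the encoder weight matrices into a single genes-by-top-layer matrix and rank genes by their summed absolute connectivity. Nonlinear activations are ignored, so this is an approximation, and the function names are hypothetical.

```python
def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def rank_genes_by_connectivity(weights, top=10):
    """Rank input genes by summed absolute connectivity to the deepest SDAE layer.

    weights: encoder weight matrices [W1, W2, ...]; W1 has one row per gene.
    Multiplying them ignores the nonlinear activations, so this is only an approximation.
    """
    M = weights[0]
    for W in weights[1:]:
        M = matmul(M, W)
    scores = [sum(abs(v) for v in row) for row in M]
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:top]
```

The returned indices refer to rows of the first weight matrix, so they map back to gene identifiers in the order the expression profiles were fed to the network.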
45
GENOME-WIDE INTERACTION WITH SELECTED TYPE 2 DIABETES LOCI REVEALS NOVEL LOCI FOR TYPE 2 DIABETES IN AFRICAN AMERICANS
Jacob M. Keaton1, Jacklyn N. Hellwege1, Maggie C.Y. Ng1, Nicholette D. Palmer1, James S. Pankow2, Myriam Fornage3, James G. Wilson4, Adolfo Correa4, Laura J. Rasmussen-Torvik5, Jerome I. Rotter6, Yii-Der I. Chen6, Kent D. Taylor6, Stephen S. Rich7, Lynne E. Wagenknecht1, Barry I. Freedman1, Donald W. Bowden1
1Wake Forest School of Medicine, 2University of Minnesota, 3University of Texas Health Science Center at Houston, 4University of Mississippi Medical Center, 5Northwestern University Feinberg School of Medicine, 6Harbor-UCLA Medical Center, 7University of Virginia
Jacob Keaton
Type 2 diabetes (T2D) is the result of metabolic defects in insulin secretion and insulin sensitivity, yet most T2D loci identified to date influence insulin secretion. We hypothesized that T2D loci, particularly those affecting insulin sensitivity, can be identified through interaction with known T2D loci implicated in insulin secretion. To test this hypothesis, single nucleotide polymorphisms (SNPs) nominally associated with acute insulin response to glucose (AIRg), a dynamic measure of first-phase insulin secretion, and previously associated with T2D in genome-wide association studies (GWAS) were identified in African Americans from the Insulin Resistance Atherosclerosis Family Study (IRASFS; n = 492 subjects). These SNPs were tested for interaction, individually and jointly as a genetic risk score (GRS), using GWAS data from five cohorts (ARIC, CARDIA, JHS, MESA, WFSM; n = 2,725 cases, 4,167 controls) with T2D as the outcome. In single-variant analyses, suggestively significant (P_interaction < 5×10⁻⁶) interactions were observed at several loci, including DGKB (rs978989), CDK18 (rs12126276), CXCL12 (rs7921850), HCN1 (rs6895191), FAM98A (rs1900780), and MGMT (rs568530). Notable beta-cell GRS interactions included two SNPs at the DGKB locus (rs6976381; rs6962498). These data support the hypothesis that additional genetic factors contributing to T2D risk can be identified by interactions with insulin secretion loci.
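The joint test described above combines the selected insulin-secretion SNPs into a genetic risk score and then asks whether each genome-wide SNP modifies the GRS effect on T2D. A minimal sketch of the two building blocks follows; the per-SNP weights, the model form, and the omission of covariates (such as age, sex, or ancestry components) are assumptions, since the abstract does not give them.

```python
def genetic_risk_score(dosages, weights):
    """Weighted GRS: sum of risk-allele dosages (0, 1, 2) times per-SNP weights."""
    if len(dosages) != len(weights):
        raise ValueError("one weight per SNP is required")
    return sum(d * w for d, w in zip(dosages, weights))


def interaction_row(snp_dosage, grs):
    """Design-matrix row for logit P(T2D) = b0 + b1*SNP + b2*GRS + b3*SNP*GRS.

    A test of b3 = 0 gives the interaction p-value; covariates are omitted here.
    """
    return [1.0, snp_dosage, grs, snp_dosage * grs]
```

Rows built this way for every individual could be fitted with a logistic regression routine such as statsmodels' `Logit`, reading the interaction p-value from the coefficient of the last column.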
46
META-ANALYSIS OF CONTINUOUS PHENOTYPES IDENTIFIES A GENE SIGNATURE THAT CORRELATES WITH COPD DISEASE STATUS
Madeleine Scott1, Francesco Vallania2, Purvesh Khatri3
1Stanford Medical School, Stanford University, Stanford, California; 2Stanford Institute for Immunity, Transplantation, and Infection, Stanford University, Stanford, California; 3Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California
Purvesh Khatri
The utility of multi-cohort two-class meta-analysis to identify robust differentially expressed gene signatures has been well established. However, many biomedical applications, such as gene signatures of disease progression, require one-class analysis. Here we describe an R package, MetaCorrelator, that can identify a reproducible transcriptional signature that is correlated with a continuous disease phenotype across multiple datasets. We successfully applied this framework to extract a pattern of gene expression that can predict lung function in patients with chronic obstructive pulmonary disease (COPD) in both peripheral blood mononuclear cells (PBMCs) and tissue. Our results point to a dysregulation in the oxidation state of the lungs of patients with COPD, and underscore the classically recognized inflammatory state that underlies this disease.
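A standard way to combine one gene's correlation with a continuous phenotype across several datasets is a fixed-effect meta-analysis of Fisher z-transformed correlations. The abstract does not state how MetaCorrelator pools datasets (it may, for example, use a random-effects model), so the sketch below is a generic stand-in, shown in Python for consistency with the other examples.

```python
from math import atanh, erfc, sqrt, tanh


def meta_correlation(studies):
    """Fixed-effect meta-analysis of one gene's correlation with a continuous phenotype.

    studies: list of (r, n) pairs, one per dataset, with |r| < 1 and n > 3.
    Each r is Fisher z-transformed and weighted by n - 3 (the inverse of its variance).
    Returns (pooled_r, two_sided_p).
    """
    weights = [n - 3 for _, n in studies]
    z_pooled = sum(w * atanh(r) for (r, _), w in zip(studies, weights)) / sum(weights)
    z_stat = z_pooled * sqrt(sum(weights))  # z_pooled divided by its standard error
    p_value = erfc(abs(z_stat) / sqrt(2))
    return tanh(z_pooled), p_value
```

Applied gene by gene, with a multiple-testing correction over genes, this yields a ranked list from which a signature correlated with lung function could be drawn.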
47
LEARNING PARSIMONIOUS ENSEMBLES FOR UNBALANCED COMPUTATIONAL GENOMICS PROBLEMS
Ana Stanescu, Gaurav Pandey
Icahn School of Medicine at Mount Sinai
Gaurav Pandey
Prediction problems in biomedical sciences are generally quite difficult, partially due to incomplete knowledge of how the phenomenon of interest is influenced by the variables and measurements used for prediction, as well as a lack of consensus regarding the ideal predictor(s) for specific problems. In these situations, a powerful approach to improving prediction performance is to construct ensembles that combine the outputs of many individual base predictors; such ensembles have been successful for many biomedical prediction tasks. Moreover, selecting a parsimonious ensemble can be of even greater value for biomedical sciences, where it is not only important to learn an accurate predictor, but also to interpret what novel knowledge it can provide about the target problem. Ensemble selection is a promising approach for this task because of its ability to select a collectively predictive subset, often a relatively small one, of all input base predictors. One of the most well-known algorithms for ensemble selection, CES (Caruana et al.'s Ensemble Selection), generally performs well in practice, but faces several challenges due to the difficulty of choosing the right values of its various parameters. Since the choices made for these parameters are usually ad hoc, good performance of CES is difficult to guarantee for a variety of problems or datasets. To address these challenges with CES and other such algorithms, we propose a novel heterogeneous ensemble selection approach based on the paradigm of reinforcement learning (RL), which offers a more systematic and mathematically sound methodology for exploring the many possible combinations of base predictors that can be selected into an ensemble. We develop three RL-based strategies for constructing ensembles and analyze their results on two unbalanced computational genomics problems, namely the prediction of protein function and of splice sites in eukaryotic genomes. We show that the resultant ensembles are indeed substantially more parsimonious than the full set of base predictors, yet still offer almost the same classification power, especially for larger datasets. The RL ensembles also yield a better combination of parsimony and predictive performance than CES.
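For reference, the CES baseline the abstract compares against is a greedy forward selection with replacement: each round adds whichever base predictor most improves the averaged ensemble on a held-out selection set. The sketch below is a simplified variant that stops as soon as no addition helps (the original algorithm has further options, such as bagging and initialization with the best models), and it does not implement the authors' RL strategies. Accuracy is used only to keep the example small; for unbalanced problems a metric such as the F-measure or AUC would be more appropriate.

```python
def accuracy(y, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the 0/1 label."""
    return sum((s >= threshold) == bool(t) for t, s in zip(y, scores)) / len(y)


def ensemble_selection(preds, y, metric, n_rounds=10):
    """Greedy forward ensemble selection with replacement, in the style of Caruana et al.

    preds: {model_name: predicted scores on a held-out selection set}.
    Each round adds the model (repeats allowed) whose inclusion most improves `metric` on the
    averaged scores; this simplified variant stops early when no model improves it.
    Returns (list of chosen model names, final metric value).
    """
    chosen, total = [], [0.0] * len(y)
    best = float("-inf")
    for _ in range(n_rounds):
        pick, pick_score = None, best
        for name, p in preds.items():
            avg = [(t + v) / (len(chosen) + 1) for t, v in zip(total, p)]
            score = metric(y, avg)
            if score > pick_score:
                pick, pick_score = name, score
        if pick is None:
            break  # no base predictor improves the ensemble
        chosen.append(pick)
        total = [t + v for t, v in zip(total, preds[pick])]
        best = pick_score
    return chosen, best
```

The number of rounds, the stopping rule, and the selection set are exactly the kind of ad hoc parameters the abstract identifies as a weakness of CES.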
48
NETWORK MAP OF ADVERSE HEALTH EFFECTS AMONG VICTIMS OF INTIMATE PARTNER VIOLENCE
Kathleen Whiting1, Larry Y. Liu2, Mehmet Koyutürk2, Gunnur Karakurt2
1Uniformed Services University, 2Case Western Reserve University
Gunnur Karakurt
Intimate partner violence (IPV) is a serious problem with devastating health consequences. Screening procedures may overlook relationships between IPV and negative health effects. To identify IPV-associated women’s health issues, we mined national, aggregated, de-identified electronic health record data and compared female health issues in domestic abuse (DA) versus non-DA records, identifying terms significantly more frequent in the DA group. After coding these terms into 28 broad categories, we developed a network map to determine the strength of relationships between categories in the context of DA, finding that acute conditions are strongly connected to cardiovascular, gastrointestinal, gynecological, and neurological conditions among victims.
49
PRECISION MEDICINE: FROM GENOTYPES AND MOLECULAR PHENOTYPES TOWARDS IMPROVED HEALTH AND THERAPIES
PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS
50
A POWERFUL METHOD FOR INCLUDING GENOTYPE UNCERTAINTY IN TESTS OF HARDY-WEINBERG EQUILIBRIUM
Andrew Beck1, Alexander Luedtke2, Keli Liu3, Nathan Tintle4
1University of Michigan, 2University of California-Berkeley, 3Harvard University, 4Dordt College
Nathan Tintle
The use of posterior probabilities to summarize genotype uncertainty is pervasive across genotype, sequencing, and imputation platforms. Prior work in many contexts has shown the utility of incorporating genotype uncertainty (posterior probabilities) in downstream statistical tests. Typical approaches to incorporating genotype uncertainty when testing Hardy-Weinberg equilibrium tend to lack calibration in the type I error rate, especially as genotype uncertainty increases. We propose a new approach in the spirit of genomic control that properly calibrates the type I error rate, while yielding improved power to detect deviations from Hardy-Weinberg equilibrium. We demonstrate the improved performance of our method on both simulated and real genotypes.
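The simplest way to fold posterior probabilities into a Hardy-Weinberg test is to replace hard genotype counts with expected counts (sums of posteriors) in the usual 1-df chi-square test. The abstract notes that approaches of this kind tend to be miscalibrated as uncertainty grows, so the sketch below illustrates that baseline, not the authors' calibrated method; the `inflation` argument is a hypothetical hook for a genomic-control-style divisor whose estimation the abstract does not describe.

```python
from math import erfc, sqrt


def hwe_test_from_posteriors(posteriors, inflation=1.0):
    """1-df chi-square HWE test on expected genotype counts (sums of posterior probabilities).

    posteriors: one (P(AA), P(Aa), P(aa)) tuple per individual.
    inflation: hypothetical genomic-control-style divisor for the statistic (1.0 = none).
    Returns (chi2, p_value).
    """
    n = len(posteriors)
    observed = [sum(p[g] for p in posteriors) for g in range(3)]
    p_a = (2 * observed[0] + observed[1]) / (2 * n)
    if p_a in (0.0, 1.0):
        return 0.0, 1.0  # monomorphic site: no test possible
    expected = [n * p_a ** 2, 2 * n * p_a * (1 - p_a), n * (1 - p_a) ** 2]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected)) / inflation
    p_value = erfc(sqrt(chi2 / 2))  # survival function of a chi-square with 1 df
    return chi2, p_value
```

With certain genotypes (posteriors of 0 or 1) this reduces to the ordinary chi-square HWE test; the miscalibration arises once posteriors are spread across genotypes, which shrinks the variability of the expected counts.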
51
MICRORNA-AUGMENTED PATHWAYS (MIRAP) AND THEIR APPLICATIONS TO PATHWAY ANALYSIS AND DISEASE SUBTYPING
Diana Diaz1, Michele Donato2, Tin Nguyen1, Sorin Draghici1
1Wayne State University, 2Stanford University Medical Center
Sorin Draghici
MicroRNAs play important roles in the development of many complex diseases. Because of their importance, the analysis of signaling pathways including miRNA interactions holds the potential for unveiling the mechanisms underlying such diseases. However, current signaling pathway databases are limited to interactions between genes and ignore miRNAs. Here, we use the information on miRNA targets to build a database of miRNA-augmented pathways (mirAP), and we show its application in the contexts of integrative pathway analysis and disease subtyping. Our miRNA-mRNA integrative pathway analysis pipeline incorporates a topology-aware approach that we previously implemented. Our integrative disease subtyping pipeline takes into account survival data, gene and miRNA expression, and knowledge of the interactions among genes. We demonstrate the advantages of our approach by analyzing nine sample-matched datasets that provide both miRNA and mRNA expression. We show that integrating miRNAs into pathway analysis results in greater statistical power and provides a more comprehensive view of the underlying phenomena. We also compare our disease subtyping method with the state-of-the-art integrative analysis by analyzing a colorectal cancer database from TCGA. The colorectal cancer subtypes identified by our approach are significantly different in terms of their survival expectation. These miRNA-augmented pathways offer a more comprehensive view and a deeper understanding of biological pathways. A better understanding of the molecular processes associated with patients' survival can lead to a better prognosis and an appropriate treatment for each subtype.
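The augmentation step, adding miRNA-to-gene edges to a gene-gene signaling pathway using miRNA target information, can be sketched as a small graph operation. The edge-tuple representation, the "repression" label, and the rule of adding edges only for targets already present in the pathway are assumptions; the mirAP database's actual format is not described in the abstract.

```python
def augment_pathway(edges, mirna_targets):
    """Add miRNA -> gene repression edges to a gene-gene signaling pathway.

    edges: set of (source, target, interaction_type) tuples for one pathway.
    mirna_targets: {miRNA: set of target genes}, e.g. from a target database (assumed input).
    Only targets that are already nodes of the pathway receive an edge.
    """
    nodes = {gene for src, dst, _ in edges for gene in (src, dst)}
    augmented = set(edges)
    for mirna, targets in mirna_targets.items():
        for gene in set(targets) & nodes:
            augmented.add((mirna, gene, "repression"))
    return augmented
```

A topology-aware analysis can then propagate a miRNA's measured change through its repression edges in the same way it propagates gene-gene signals.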
52
FREQUENT SUBGRAPH MINING OF PERSONALIZED SIGNALING PATHWAY NETWORKS GROUPS PATIENTS WITH FREQUENTLY DYSREGULATED DISEASE PATHWAYS AND PREDICTS PROGNOSIS
Arda Durmaz, Tim A.D. Henderson, Douglas Brubaker, Gurkan Bebek
Case Western Reserve University
Gurkan Bebek
Motivation: Large-scale genomics studies have generated comprehensive molecular characterizations of numerous cancer types. Subtypes for many tumor types have been established; however, these classifications are based on molecular characteristics of small gene sets with limited power to detect dysregulation at the patient level. We hypothesize that frequent graph mining of pathways, to gather pathways functionally relevant to tumors, can characterize tumor types and provide opportunities for personalized therapies.
Results: In this study we present an integrative omics approach to group patients based on their altered pathway characteristics and show prognostic differences within breast cancer (p < 9.57E−10) and glioblastoma multiforme (p < 0.05) patients. We were able to validate this approach in secondary RNA-Seq datasets with p < 0.05 and p < 0.01, respectively. We also performed pathway enrichment analysis to further investigate the biological relevance of dysregulated pathways. We compared our approach with network-based classifier algorithms and showed that our unsupervised approach generates more robust and biologically relevant clustering, whereas previous approaches failed to report specific functions for similar patient groups or to classify patients into prognostic groups.
Conclusions: These results could serve as a means to improve prognosis for future cancer patients, and to provide opportunities for improved treatment options and personalized interventions. The proposed novel graph mining approach is able to integrate PPI networks with gene expression in a biologically sound approach and to cluster patients into clinically distinct groups. We have utilized breast cancer and glioblastoma multiforme datasets from microarray and RNA-Seq platforms and identified disease mechanisms differentiating samples.
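Full frequent subgraph mining (for example, gSpan-style algorithms) enumerates connected subgraphs of any size, which is beyond a short example. The sketch below restricts patterns to single edges and connected pairs of edges, counts how many patients' dysregulated networks contain each pattern, and turns the frequent ones into binary patient profiles that could feed a clustering step. How the personalized networks are built from expression and PPI data is not shown, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_patterns(patient_networks, min_support):
    """Count single edges and connected edge pairs across patients' dysregulated networks.

    patient_networks: {patient_id: set of edges (u, v)}; each edge should be written in a
    consistent orientation (e.g. u < v). A pair counts only if its two edges share a node.
    Returns {frozenset_of_edges: support} for patterns seen in >= min_support patients.
    """
    support = Counter()
    for edges in patient_networks.values():
        patterns = {frozenset([e]) for e in edges}
        patterns |= {frozenset(pair) for pair in combinations(edges, 2)
                     if set(pair[0]) & set(pair[1])}
        support.update(patterns)
    return {p: c for p, c in support.items() if c >= min_support}


def pattern_profiles(patient_networks, patterns):
    """Binary profile per patient (1 = pattern present), usable as input for clustering."""
    ordered = sorted(patterns, key=sorted)
    return {pid: [int(p <= edges) for p in ordered] for pid, edges in patient_networks.items()}
```

Patients with similar profiles share frequently dysregulated sub-networks, which is the intuition behind grouping them before comparing survival between groups.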
53
CERNA SEARCH METHOD IDENTIFIED A MET-ACTIVATED SUBGROUP AMONG EGFR DNA AMPLIFIED LUNG ADENOCARCINOMA PATIENTS
Halla Kabat, Leo Tunkle, Inhan Lee
miRcore
Inhan Lee
Given the diverse molecular pathways involved in tumorigenesis, identifying subgroups among cancer patients is crucial in precision medicine. While most targeted therapies rely on DNA mutation status in tumors, responses to such therapies vary due to the many molecular processes involved in propagating DNA changes to proteins (which constitute the usual drug targets). Though RNA expression has been extensively used to categorize tumors, identifying clinically important subgroups remains challenging given the difficulty of discerning subgroups within all possible RNA-RNA networks. It is thus essential to incorporate multiple types of data. Recently, RNA was found to regulate other RNA through a common microRNA (miR). These regulating and regulated RNAs are referred to as competing endogenous RNAs (ceRNAs). However, global correlations between mRNA and miR expression across all samples have not reliably yielded ceRNAs. In this study, we developed a ceRNA-based method to identify subgroups of cancer patients by combining DNA copy number variation, mRNA expression, and microRNA (miR) expression data with biological knowledge. Clinical data are used to validate the identified subgroups and ceRNAs. Since ceRNAs are causal, ceRNA-based subgroups may present clinical relevance. Using lung adenocarcinoma data from The Cancer Genome Atlas (TCGA) as an example, we focused on EGFR amplification status, since a targeted therapy for EGFR exists. We hypothesized that global correlations between mRNA and miR expression across all patients would not reveal important subgroups and that clustering of potential ceRNAs might define molecular pathway-relevant subgroups. Using experimentally validated miR-target pairs, we identified EGFR and MET as potential ceRNAs for miR-133b in lung adenocarcinoma. The EGFR-MET up and miR-133b down subgroup showed a higher death rate than the EGFR-MET down and miR-133b up subgroup. Although transactivation between MET and EGFR has been identified previously, our result is the first to propose ceRNA as one of its underlying mechanisms. Furthermore, since MET amplification has been seen in cases of resistance to EGFR-targeted therapy, the EGFR-MET up and miR-133b down subgroup may fall into the drug non-response group, which would preclude EGFR-targeted therapy.
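The ceRNA relationship suggests a simple screen. Two mRNAs that share a miRNA are candidate ceRNAs if they are positively correlated with each other and each is negatively correlated with the shared miRNA. The abstract reports that such correlations computed globally across all samples were unreliable. A screen like this would therefore be applied within candidate patient subgroups, for example EGFR-amplified samples. The thresholds, function names, and input layout below are hypothetical. The copy number, clinical, and clustering steps of the authors' method are not shown.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cerna_candidates(mrna, mir, targets, min_pos=0.5, max_neg=-0.3):
    """Return (gene_a, gene_b, miRNA) triples consistent with ceRNA behavior.

    mrna, mir: dict name -> expression values over the same samples.
    targets: dict miRNA -> set of experimentally validated target genes.
    min_pos, max_neg: hypothetical correlation cutoffs.
    """
    hits = []
    for m, genes in targets.items():
        present = sorted(g for g in genes if g in mrna)
        for a, b in combinations(present, 2):
            if (pearson(mrna[a], mrna[b]) >= min_pos
                    and pearson(mrna[a], mir[m]) <= max_neg
                    and pearson(mrna[b], mir[m]) <= max_neg):
                hits.append((a, b, m))
    return hits
```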
54
IMPROVED PERFORMANCE OF GENE SET ANALYSIS ON GENOME-WIDE TRANSCRIPTOMICS DATA WHEN USING GENE ACTIVITY STATE ESTIMATES
Thomas Kamp, Micah Adams, Craig Disselkoen, Nathan Tintle
Dordt College
Nathan Tintle
Gene set analysis methods continue to be a popular and powerful way of evaluating genome-wide transcriptomics data. These approaches require a priori grouping of genes into biologically meaningful sets, followed by downstream analyses at the set (instead of gene) level. Gene set analysis methods have been shown to yield more powerful statistical conclusions than single-gene analyses, due both to reduced multiple testing penalties and to potentially larger observed effects from aggregating effects across multiple genes in the set. Traditionally, gene set analysis methods have been applied directly to normalized, log-transformed transcriptomics data. Recently, efforts have been made to transform transcriptomics data to scales yielding more biologically interpretable results. For example, recently proposed models transform log-transformed transcriptomics data to a confidence metric (ranging between 0 and 100%) that a gene is active (roughly speaking, that the gene product is part of an active cellular mechanism). In this manuscript, we demonstrate, on both real and simulated transcriptomics data, that tests for differential expression between sets of genes are typically more powerful when using gene activity state estimates as opposed to log-transformed gene expression data. Our analysis suggests further exploration of techniques to transform transcriptomics data to meaningful quantities for improved downstream inference.
55
METHYLDMV: SIMULTANEOUS DETECTION OF DIFFERENTIAL DNA METHYLATION AND VARIABILITY WITH CONFOUNDER ADJUSTMENT
Pei Fen Kuan, Junyan Song, Shuyao He
Stony Brook University
Pei Fen Kuan
DNA methylation has emerged as a promising epigenetic marker for disease diagnosis. Both the differential mean (DM) and differential variability (DV) in methylation have been shown to contribute to transcriptional aberration and disease pathogenesis. The presence of confounding factors in large-scale EWAS may affect the methylation values and hamper accurate marker discovery. In this paper, we propose a flexible framework called methylDMV, which allows for confounding factor adjustment and enables simultaneous characterization and identification of CpGs exhibiting DM only, DV only, and both DM and DV. The proposed framework also allows for prioritization and selection of candidate features to be included in the prediction algorithm. We illustrate the utility of methylDMV in several TCGA datasets. An R package, methylDMV, implementing our proposed method is available at http://www.ams.sunysb.edu/~pfkuan/softwares.html#methylDMV.
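The DM/DV distinction can be made concrete for a single CpG. The sketch computes a Welch t statistic for a difference in mean methylation. It then computes a second Welch t statistic on absolute deviations from each group's median, in the spirit of the Brown-Forsythe test, for a difference in variability. Finally it labels the CpG using a hypothetical cutoff. These are standard stand-in statistics, not methylDMV's model. They also omit the confounder adjustment that is central to methylDMV.

```python
from math import sqrt
from statistics import mean, median, variance

def welch_t(x, y):
    """Welch's two-sample t statistic (unequal variances)."""
    return (mean(x) - mean(y)) / sqrt(variance(x) / len(x) + variance(y) / len(y))

def abs_dev_from_median(values):
    """Absolute deviations from the group median (Brown-Forsythe style)."""
    m = median(values)
    return [abs(v - m) for v in values]

def classify_cpg(cases, controls, t_cut=3.0):
    """Label a CpG 'DM+DV', 'DM', 'DV', or 'none'; t_cut is a hypothetical cutoff."""
    dm = abs(welch_t(cases, controls)) >= t_cut
    dv = abs(welch_t(abs_dev_from_median(cases),
                     abs_dev_from_median(controls))) >= t_cut
    if dm and dv:
        return "DM+DV"
    if dm:
        return "DM"
    if dv:
        return "DV"
    return "none"
```

With real data, thresholding raw statistics would be replaced by p-values with multiple-testing control.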
56
IDENTIFY CANCER DRIVER GENES THROUGH SHARED MENDELIAN DISEASE PATHOGENIC VARIANTS AND CANCER SOMATIC MUTATIONS
Meng Ma1, Changchang Wang2, Benjamin Glicksberg1, Eric E. Schadt1, Shuyu Li1, Rong Chen1
1Icahn School of Medicine at Mount Sinai, 2Anhui University
Shuyu Li
Genomic sequencing studies in the past several years have yielded a large number of cancer somatic mutations. A major challenge remains in delineating the small fraction of somatic mutations that are oncogenic drivers from a background of predominantly passenger mutations. Although computational tools have been developed to predict the functional impact of mutations, their utility is limited. In this study, we applied an alternative approach that identifies potentially novel cancer drivers as those somatic mutations that overlap with known pathogenic mutations in Mendelian diseases. We hypothesize that these shared mutations are more likely to be cancer drivers because they have established molecular mechanisms for impacting protein function. We first show that the overlap between somatic mutations in COSMIC and pathogenic genetic variants in HGMD is associated with high mutation frequency in cancers and is enriched for known cancer genes. We then attempted to identify putative tumor suppressors based on the number of distinct HGMD/COSMIC overlapping mutations in a given gene, and our results suggest that ion channels, collagens and Marfan syndrome associated genes may represent new classes of tumor suppressors. To elucidate potentially novel oncogenes, we identified those HGMD/COSMIC overlapping mutations that are not only highly recurrent but also mutually exclusive from previously characterized oncogenic mutations in each specific cancer type. Taken together, our study represents a novel approach to discover new cancer genes from the vast amount of cancer genome sequencing data.
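The overlap step can be sketched as a set intersection on variant keys. A count of distinct overlapping mutations per gene follows, which is the quantity the abstract uses to rank putative tumor suppressors. The (gene, position, ref, alt) key, the record layout, the function name, and the toy records are hypothetical. Real COSMIC and HGMD records would first need to be harmonized to the same genome build and variant notation.

```python
from collections import Counter

def overlap_counts(somatic, pathogenic):
    """Count distinct somatic/pathogenic overlapping variants per gene.

    somatic, pathogenic: iterables of (gene, position, ref, alt) tuples.
    Returns a Counter mapping gene -> number of distinct shared variants
    (genes with no shared variant report 0).
    """
    # Converting to sets removes duplicate records before intersecting.
    shared = set(somatic) & set(pathogenic)
    return Counter(gene for gene, _pos, _ref, _alt in shared)
```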
57
IDENTIFYING CANCER SPECIFIC METABOLIC SIGNATURES USING CONSTRAINT-BASED MODELS
André Schultz1, Sanket Mehta1, Chenyue W. Hu1, Fieke W. Hoff2, Terzah M. Horton3, Steven M. Kornblau2, Amina A. Qutub1
1Rice University, 2University of Texas MD Anderson Cancer Center, 3Baylor College of Medicine and Texas Children's Hospital
André Schultz
Cancer metabolism differs remarkably from the metabolism of healthy surrounding tissues, and it is extremely heterogeneous across cancer types. While these metabolic differences provide promising avenues for cancer treatments, much work remains to be done in understanding how metabolism is rewired in malignant tissues. To that end, constraint-based models provide a powerful computational tool for the study of metabolism at the genome scale. To generate meaningful predictions, however, these generalized human models must first be tailored to specific cell or tissue subtypes. Here we first present two improved algorithms for (1) the generation of these context-specific metabolic models based on omics data, and (2) Monte-Carlo sampling of the metabolic model flux space. By applying these methods to generate and analyze context-specific metabolic models of diverse solid cancer cell line data and primary pediatric leukemia patient biopsies, we demonstrate how the methodology presented in this study can generate insights into the rewiring differences across solid tumors and blood cancers.
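Monte-Carlo sampling of a flux space is commonly done with hit-and-run style samplers. The sketch is a plain hit-and-run sampler over a bounded polytope {x : A x <= b}. It is not the authors' improved algorithm. It assumes the steady-state equality constraints S v = 0 have already been eliminated, for example by working in a null-space basis, so only inequality bounds remain. The function name and argument layout are hypothetical.

```python
import random

def hit_and_run(A, b, x0, n_samples, seed=0):
    """Sample points from the bounded polytope {x : A x <= b}.

    A: list of constraint rows; b: list of upper bounds; x0: a strictly
    feasible starting point. Returns a list of sampled points.
    """
    rng = random.Random(seed)
    x = list(x0)
    dim = len(x)
    samples = []
    for _ in range(n_samples):
        # Draw a random direction uniformly on the unit sphere.
        d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = sum(v * v for v in d) ** 0.5
        d = [v / norm for v in d]
        # Find the feasible segment x + t*d, t in [t_min, t_max].
        t_min, t_max = float("-inf"), float("inf")
        for row, bound in zip(A, b):
            ad = sum(r * v for r, v in zip(row, d))
            slack = bound - sum(r * v for r, v in zip(row, x))
            if ad > 1e-12:
                t_max = min(t_max, slack / ad)
            elif ad < -1e-12:
                t_min = max(t_min, slack / ad)
        # Move to a uniformly random point on that segment.
        t = rng.uniform(t_min, t_max)
        x = [xi + t * di for xi, di in zip(x, d)]
        samples.append(x)
    return samples
```

On a unit square (`A = [[1, 0], [-1, 0], [0, 1], [0, -1]]`, `b = [1, 0, 1, 0]`) every sample stays inside the square up to floating-point rounding.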
58
SINGLE-CELL ANALYSIS AND MODELLING OF CELL POPULATION HETEROGENEITY
PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS
59
MAPPING NEURONAL CELL TYPES USING INTEGRATIVE MULTI-SPECIES MODELING OF HUMAN AND MOUSE SINGLE CELL RNA SEQUENCING
Travis Johnson, Zachary Abrams, Yan Zhang, Kun Huang
Ohio State University
Travis Johnson
Mouse brain transcriptomic studies are important for understanding the structural heterogeneity of the brain. However, it is not well understood how cell types in the mouse brain relate to human brain cell types on a cellular level. We propose that it is possible, with single-cell granularity, to find concordant genes between mouse and human, and that these genes can be used to separate cell types across species. We show that a set of concordant genes can be algorithmically derived from a combination of human and mouse single-cell sequencing data. Using this gene set, we show that similar cell types shared between mouse and human cluster together. Furthermore, we find that previously unclassified human cells can be mapped to the glial/vascular cell type by integrating mouse cell type expression profiles.
60
A SPATIOTEMPORAL MODEL TO SIMULATE CHEMOTHERAPY REGIMENS FOR HETEROGENEOUS BLADDER CANCER METASTASES TO THE LUNG
Kimberly R. Kanigel Winner1, James C. Costello2
1Computational Bioscience Program, Department of Pharmacology, University of Colorado Cancer Center; 2University of Colorado Anschutz Medical Campus
Kimberly Kanigel Winner
Tumors are composed of heterogeneous populations of cells. Somatic genetic aberrations are one form of heterogeneity that allows clonal cells to adapt to chemotherapeutic stress, thus providing a path for resistance to arise. In silico modeling of tumors provides a platform for rapid, quantitative, inexpensive experiments on how compositional heterogeneity contributes to drug resistance. Accordingly, we have built a spatiotemporal model of a lung metastasis originating from a primary bladder tumor. The model incorporates in vivo drug concentrations of first-line chemotherapy, resistance data from bladder cancer cell lines, vascular density of lung metastases, and gains in resistance in cells that survive chemotherapy. In metastatic bladder cancer, a first-line drug regimen includes six cycles of gemcitabine plus cisplatin (GC) delivered simultaneously on day 1, and gemcitabine on day 8, in each 21-day cycle. The interaction between gemcitabine and cisplatin has been shown to be synergistic in vitro and results in better outcomes in patients. Our model shows that during simulated treatment with this regimen, GC synergy does begin to kill cells that are more resistant to cisplatin, but repopulation by resistant cells occurs. Post-regimen populations are mixtures of the original, seeded resistant clones and/or new clones that have gained resistance to cisplatin, gemcitabine, or both drugs. The emergence of a tumor with increased resistance is qualitatively consistent with the five-year survival of 6.8% for patients with metastatic transitional cell carcinoma of the urinary bladder treated with a GC regimen. The model can be further used to explore the parameter space for clinically relevant variables. These include the timing of drug delivery to optimize cell death and patient-specific data such as vascular density, rates of resistance gain, disease progression, and molecular profiles; the model can also be expanded to include data on toxicity. The model is specific to bladder cancer, which has not previously been modeled in this context, but it can be adapted to represent other cancers.
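The GC regimen described above can be encoded as a dosing calendar, the kind of input a treatment simulation needs. The sketch lists (day, drugs) events for six 21-day cycles: gemcitabine plus cisplatin on day 1 and gemcitabine alone on day 8 of each cycle. The function name and tuple format are hypothetical, and dose amounts are deliberately omitted.

```python
def gc_schedule(cycles=6, cycle_length=21):
    """Return (treatment_day, drugs) events; day 1 is the first treatment day."""
    events = []
    for c in range(cycles):
        start = 1 + c * cycle_length  # day 1 of this cycle
        events.append((start, ("gemcitabine", "cisplatin")))
        events.append((start + 7, ("gemcitabine",)))  # day 8 of this cycle
    return events
```

The last event of the default regimen falls on treatment day 113 (day 8 of cycle 6).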
61
SCALABLE VISUALIZATION FOR HIGH-DIMENSIONAL SINGLE-CELL DATA
Juho Kim, Nate Russell, Jian Peng
University of Illinois at Urbana-Champaign
Juho Kim
Single-cell analysis can reveal the states of individual cells and enable us to construct new models for the analysis of heterogeneous tissues. State-of-the-art technologies for single-cell analysis have been developed to measure the properties of single cells and detect hidden information. They can measure dozens of features simultaneously in each cell. However, due to the high dimensionality, heterogeneous complexity and sheer volume of single-cell data, its interpretation is challenging. Thus, new methods to overcome high dimensionality are necessary. Here, we present a computational tool that allows efficient visualization of high-dimensional single-cell data in a low-dimensional (2D or 3D) space while preserving the similarity structure between single cells. We first construct a network that represents the similarity structure between the high-dimensional representations of single cells, and then embed this network into a low-dimensional space through an efficient online optimization method based on the idea of negative sampling. Using this approach, we can preserve the high-dimensional structure of single-cell data in an embedded low-dimensional space that facilitates visual analyses of the data.
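The first step described above, building a network that captures similarity between cells, is often done with a k-nearest-neighbor graph. The sketch builds one by brute force using Euclidean distance. The choice of kNN, the distance, k, and the function name are assumptions, and the negative-sampling embedding step is not shown. Brute force costs O(n²) distance evaluations, so at real single-cell scale it would be replaced by an approximate neighbor search.

```python
from math import dist  # Python 3.8+

def knn_graph(cells, k):
    """Return an undirected edge set linking each cell to its k nearest neighbors.

    cells: list of equal-length feature vectors (one per cell).
    Edges are (i, j) index pairs with i < j.
    """
    edges = set()
    for i, xi in enumerate(cells):
        # Sort all other cells by distance and keep the k closest.
        neighbors = sorted(
            (dist(xi, xj), j) for j, xj in enumerate(cells) if j != i
        )[:k]
        for _d, j in neighbors:
            edges.add((min(i, j), max(i, j)))
    return edges
```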
62
COMPUTATIONAL APPROACHES TO UNDERSTANDING THE EVOLUTION OF MOLECULAR FUNCTION
POSTER PRESENTATIONS
63
CLUSTER-BASED GENOTYPE-ENVIRONMENT-PHENOTYPE CORRELATION ALGORITHM
Ernesto Borrayo, Ryoko Machida-Hirano
Gene Research Center, University of Tsukuba
Ernesto Borrayo
The interactions between genotype and environment give rise to phenotypic plasticity. However, these interactions are dynamic and complex. What is considered a phenotype in one evaluation can be considered an environmental condition in another, since that earlier phenotype shapes the conditions for the new one. Likewise, from a given perspective, a particular piece of genetic material can be considered an environmental condition for other loci. These concepts show that the “one gene, one trait” rationale is the exception rather than the rule. To adequately predict the phenotype expected at any biological level, the specific interaction between environment and genotype should therefore be analyzed carefully. To infer the degree of influence of both a genotype and an environment on certain phenotypic traits, we developed a cluster-based algorithm that shows how far phenotypic traits can be explained by the genotype or by the environmental conditions. Although this approach is still far from considering all possible aspects that may explain a phenotypic condition, it is a first approach to analyzing genotype-environment-phenotype interactions in a comprehensive manner. To test the algorithm, we analyzed synthetic data together with real genetic, environmental and agromorphological trait data of Theobroma cacao and Sechium edule. We expect that further exploration of different classifiers will help to adequately predict phenotypic expression at different biological levels, with significant applications in diverse fields such as crop improvement, genomics, clinical diagnosis/prognosis/treatment and metabolomics. We also expect it to enhance our understanding of genomics, metabolomics and adaptation/evolutionary processes.
64
QUANTITATING TRANSLATIONAL CONTROL: MRNA ABUNDANCE-DEPENDENT AND INDEPENDENT CONTRIBUTIONS
Jingyi Jessica Li1, Guo-Liang Chew2, Mark D. Biggin3
1Department of Statistics and Department of Human Genetics, UCLA; 2Computational Biology Program, Fred Hutchinson Cancer Research Center; 3Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory
Jingyi Jessica Li
Translation rate per mRNA molecule correlates positively with mRNA abundance. As a result, protein levels do not scale linearly with mRNA levels, but instead scale with the abundance of mRNA raised to the power of an “amplification exponent”. Here we show that to quantitate translational control it is necessary to decompose the translation rate into two components. One component, TRmD, depends on the mRNA level and defines the amplification exponent. The other component, TRmIND, is independent of mRNA amount and impacts the correlation coefficient between protein and mRNA levels. We show that in S. cerevisiae TRmD represents ~30% of the variance in translation and results in an amplification exponent of ~1.20–1.27. TRmIND constitutes the remaining 70% of the variance in translation and explains <5% of the variance in protein expression. When protein degradation is also considered, the correlation between the abundances of protein and mRNA is R²(prot–RNA) > 0.92. We also investigate which mRNA sequence elements explain the variance in TRmD and TRmIND. We find that TRmIND is most strongly determined by the length of the open reading frame, while TRmD is more strongly determined by an A-rich, highly unfolded element that spans nucleotides −35 to +28 relative to the initiating AUG codon. This implies that TRmIND is under different evolutionary selective pressures than TRmD. Our work introduces methods for correctly scaling mRNA and protein abundance data using internally controlled standards. It provides quite different and more accurate estimates of translational control than any previous estimates. By decomposing translation rates, we also provide insights into the mRNA sequence dependencies of translation that would not otherwise be apparent.
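The power-law relationship can be made concrete. If protein ≈ c · mRNA^b, then log(protein) = log(c) + b · log(mRNA), so the amplification exponent b is the slope of a least-squares line fitted to log-log data. The sketch estimates b this way. It ignores the TRmD/TRmIND decomposition and the scaling corrections the abstract emphasizes, and the function name is hypothetical.

```python
from math import log

def amplification_exponent(mrna, protein):
    """OLS slope of log(protein) on log(mRNA); all values must be positive."""
    xs = [log(m) for m in mrna]
    ys = [log(p) for p in protein]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx
```

Exact power-law data such as `protein = [3 * m ** 1.25 for m in mrna]` recovers b = 1.25. With real data, noise in the mRNA measurements biases an OLS slope toward zero, which is one reason careful scaling of the abundance data matters.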
65
PROSNET: INTEGRATING HOMOLOGY WITH MOLECULAR NETWORKS FOR PROTEIN FUNCTION PREDICTION
Sheng Wang, Meng Qu, Jian Peng
University of Illinois Urbana-Champaign
Sheng Wang
Automated annotation of protein function has become a critical task in the post-genomic era. Network-based approaches and homology-based approaches have been widely used and recently tested in large-scale community-wide assessment experiments. It is natural to integrate network data with homology information to further improve predictive performance. However, integrating these two heterogeneous, high-dimensional and noisy datasets is non-trivial. In this work, we introduce a novel protein function prediction algorithm, ProSNet. An integrated heterogeneous network is first built to include molecular networks of multiple species and link together homologous proteins across species. Based on this integrated network, a dimensionality reduction algorithm is introduced to obtain compact low-dimensional vectors that encode the proteins in the network. Finally, we develop machine learning classification algorithms that take the vectors as input and make predictions by transferring annotations both within each species and across different species. Extensive experiments on five major species demonstrate that our integration of homology with molecular networks substantially improves predictive performance over existing approaches.
66
GENERAL
POSTERPRESENTATIONS
67
IDENTIFICATION OF DIFFERENTIALLY PHOSPHORYLATED MODULES IN PROTEIN INTERACTION NETWORKS
Marzieh Ayati, Danica Wiredja, Daniela Schlatzer, Goutham Narla, Mark Chance, Mehmet Koyutürk
Case Western Reserve University
Mehmet Koyutürk
Advances in high-throughput omics technologies have revolutionized our understanding of the genomic underpinnings of cancer. However, many challenges remain in understanding how patients with common driver mutations may display diverging phosphoproteomic responses to the same treatment. Thus, an examination of the signaling landscape will provide essential molecular information for modeling personalized patient treatment design. However, integrative bioinformatics approaches to identify phosphoproteomics-based molecular states are in their infancy. To address this challenge, we adapt our algorithm MoBaS, which was originally developed to identify phenotype-associated subnetworks in the context of genome-wide association studies. MoBaS takes as input a PPI network and a score for each protein indicating the protein's differential phosphorylation level. It then identifies protein subnetworks that are (i) composed of densely interacting proteins, and (ii) enriched in proteins with high scores. MoBaS also assesses the statistical significance of the identified subnetworks using permutation tests that effectively handle multiple hypothesis testing. We apply MoBaS to compare and contrast the drug-induced global signaling alterations of two KRAS-mutated non-small cell lung cancer (NSCLC) cell lines, A549 and H358, treated with a novel activator of the tumor suppressor Protein Phosphatase 2A (PP2A) versus DMSO control. Applying kinase enrichment analysis to the identified subnetworks, we identify Aurora KB as a key kinase differentially regulated between the two cell lines in response to our compound. Further corroborating this finding, we show that Aurora KB is downregulated at the protein and mRNA levels by our treatment in A549 but not in H358.
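Permutation testing of a subnetwork can be illustrated in its simplest form. The subnetwork's score is the sum of its proteins' scores, and it is compared with the same statistic for random protein sets of equal size drawn from the whole network. The scoring rule, the function name, and this plain node-label null model are assumptions for illustration. MoBaS's actual subnetwork score also rewards dense interaction, and its multiple-testing handling is not shown.

```python
import random

def permutation_pvalue(scores, subnetwork, n_perm=1000, seed=0):
    """Empirical p-value for the summed score of a subnetwork.

    scores: dict protein -> differential phosphorylation score.
    subnetwork: collection of proteins (all must be keys of scores).
    The null distribution sums the scores of random protein sets of the
    same size, which is equivalent to permuting scores across proteins.
    """
    rng = random.Random(seed)
    values = list(scores.values())
    observed = sum(scores[p] for p in subnetwork)
    size = len(subnetwork)
    hits = 0
    for _ in range(n_perm):
        if sum(rng.sample(values, size)) >= observed:
            hits += 1
    # The +1 terms keep the estimate away from an impossible p = 0.
    return (hits + 1) / (n_perm + 1)
```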
68
CLUSTERING METHOD FOR PRIORITIZING BREAST CANCER RISK GENES AND MIRNAS
Yongsheng Bai, Naureen Aslam, Ali Salman
Indiana State University
Yongsheng Bai
Background: MicroRNAs (miRNAs) are short RNA molecules that interact with their target mRNAs through 3' untranslated regions (UTRs). The Cancer Genome Atlas (TCGA) project, initiated in 2006, has sequenced tissue collections with matched tumor and normal samples from 11,000 patients in 33 cancer types and subtypes, including 10 rare cancers. There is an urgent need to develop innovative methodologies and tools that can cluster mRNA-miRNA interaction pairs into groups and characterize the functional consequences of cancer risk genes while analyzing tumor and normal samples simultaneously. Rationale: An undirected graph can be used to represent gene and miRNA relationships in an interaction network. Specifically, interactions between genes and miRNAs are rendered as a bipartite graph with genes and miRNAs as vertices and their calculated correlations as edges. Our hypothesis is: if a highly scored gene/miRNA cluster in a given tumor sample shows significantly altered regulation relative to a similar gene/miRNA cluster in the corresponding non-tumor sample, the cluster is biologically significant. Results: We developed a mathematical model to identify clusters of significant mRNA and miRNA interaction pairs and to decipher the mRNA and miRNA regulation network using TCGA miRNA sequencing and mRNA sequencing data. We ran the cluster detection algorithm, implemented in Python 3, on TCGA Breast Invasive Carcinoma (BRCA) transcriptome (both RNA-Seq and miRNA-Seq) datasets. Different cluster sizes (or bins) and different selections of miRNA and mRNA pairs for creating clusters generate different cluster topologies, and therefore different numbers of common clusters between tumor and normal samples. We ran 1,000 different random selections of target pairs to generate different cluster topologies and combined all results to obtain 105,850 distinct candidate clusters for prioritization. Conclusions: We believe our methodology for identifying cancer driver genes in personal genomes, from which clinicians seek to develop better treatment strategies, is valuable to the field. Our proposed method should be applicable across a range of diseases and cancers.
69
FUSIONDB: ASSESSING MICROBIAL DIVERSITY AND ENVIRONMENTAL PREFERENCES VIA FUNCTIONAL SIMILARITY
Chengsheng Zhu1, Yannick Mahlich1,2,3,4, Yana Bromberg1,4
1Department of Biochemistry and Microbiology, School of Environmental and Biological Sciences, Rutgers University, New Brunswick, NJ, USA; 2Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), TUM, Garching, Germany; 3Department of Informatics, Bioinformatics & Computational Biology - I12, TUM, Garching, Germany; 4Institute of Advanced Study (TUM-IAS), Garching, Germany
Yana Bromberg
Summary: Microbial functional diversification is driven by environmental factors. In some cases, microbes differ more across environments than across taxa. Here we introduce fusionDB, a novel database of microbial functional similarities, indexed by available environmental preferences. fusionDB entries represent nearly fourteen hundred taxonomically distinct bacteria annotated with available metadata: habitat, temperature, and oxygen use. Each microbe is encoded as a set of functions represented by its proteome, and individual microbes are connected via common functions. Database searches produce easily visualizable XML-formatted network files of selected organisms, along with their shared functions. fusionDB thus provides a fast means of associating specific environmental factors with organism functions. Availability: http://bromberglab.org/databases/fusiondb and as an SQL dump by request. Contact: [email protected], [email protected]
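fusionDB's representation (microbes as sets of functions, connected through shared functions) can be sketched as a weighted network. The Jaccard weighting, the `min_shared` cutoff, and the function name are hypothetical choices for illustration. They are not necessarily how fusionDB scores similarity.

```python
from itertools import combinations

def function_network(microbes, min_shared=1):
    """Build edges between microbes that share functions.

    microbes: dict microbe -> set of function identifiers.
    Returns dict (m1, m2) -> (shared_functions, jaccard) for pairs sharing
    at least min_shared functions.
    """
    edges = {}
    for a, b in combinations(sorted(microbes), 2):
        shared = microbes[a] & microbes[b]
        if len(shared) >= min_shared:
            union = microbes[a] | microbes[b]
            edges[(a, b)] = (shared, len(shared) / len(union))
    return edges
```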
70
THE GEORGE M. O’BRIEN KIDNEY TRANSLATIONAL CORE CENTER AT THE UNIVERSITY OF MICHIGAN
Frank C. Brosius1, Wenjun Ju1, Keith Bellovich2, Zeenat Bhat3, Crystal Gadegbeku4, Debbie Gipson1, Jennifer Hawkins1, Julia Herzog1, Susan Massengill5, Richard C. McEachin1, Subramaniam Pennathur1, Kalyani Perumal6, Roger Wiggins1, Matthias Kretzler1
1University of Michigan, 2Renaissance Renal Research Institute, 3Wayne State University, 4Temple University, 5Levine Children’s Hospital, 6University of Illinois at Chicago
Richard McEachin
Recent advances have allowed the development of molecular maps that define chronic kidney disease (CKD) in new, accurate and personalized ways. These developments make it possible to predict outcomes and response to therapy and to identify key molecular targets for treatment of CKD in individual patients. Identifying such targets requires close collaboration between teams of investigators to collect and annotate samples from well-characterized CKD subjects. In addition, technologies are needed that support information exchange, robust databanks, and data integration to define key pathways driving CKD pathogenesis. The O'Brien Kidney Translational Core Center at the University of Michigan provides such biobanking, databank structure and bioinformatic support to basic and clinical investigators, allowing them to pursue critical precision medicine investigations of humans with CKD. The Clinical Phenotyping and Biobank Core has enrolled over 1200 patients with CKD from 5 sites and banked their samples and clinical information, providing a valuable resource for efficient discovery. Multiple specific research studies have now successfully utilized these resources. The Applied Systems Biology Core and its online analytical tool, Nephroseq, have assisted hundreds of investigators around the world in analyzing large transcriptomic datasets and in other systems-level biological studies of patients with CKD. The Center’s Bioinformatics Core provides access to computational applications and skilled professional support in bioinformatics and biostatistics, and will now be providing back-end maintenance of Nephroseq. The Administrative Core directs pilot and small grants, student training and discount programs with the goal of helping new and established researchers utilize systems biological and translational research tools. Together these cores provide comprehensive translational research support for novel research into the classification and treatment of chronic kidney diseases. All interested academic investigators around the world are invited to make use of these services and to contact us for information and consultation.
71
MINING DIRECTIONAL DRUG INTERACTION EFFECTS ON MYOPATHY USING THE FAERS DATABASE
Danai Chasioti1, Xiaohui Yao1, Pengyue Zhang2, Xia Ning3, Lang Li2, Li Shen4
1IUPUI School of Informatics and Computing; 2Center for Computational Biology and Bioinformatics, Department of Medical and Molecular Genetics, Indiana University School of Medicine; 3IUPUI Department of Computer Science; 4Center for Neuroimaging, Department of Radiology and Imaging Sciences, Indiana University School of Medicine
Li Shen
Background: Mining high-order drug-drug interaction (DDI) induced adverse drug effects (ADEs) from electronic health record (EHR) databases is an emerging area, and very few studies have explored the relationships between DDIs. To bridge this gap, we study a novel pharmacovigilance problem: mining directional drug interaction effects on myopathy using the FDA Adverse Event Reporting System (FAERS) database. Method: The analysis was performed on a case–control dataset extracted from the FAERS database. The dataset contains 1,763 drugs and includes 136,860 myopathy events and 3,940,587 control events. Given two sets of drug combinations D1 and D2 (a superset of D1), we define the directional ADE effect from D1 to D2 as the altered ADE risk associated with the change from taking D1 to taking D2. The ADE risks were estimated using odds ratios (ORs). To address both computational and statistical challenges, this study focused on computing ORs for frequent D2's (i.e., those whose number of occurrences is at least a user-specified minimum support). The Apriori algorithm was employed to identify frequent D2's. Results: Using a minimum support of 1000, we identified 764 frequent drugs, 7036 frequent 2-drug combinations, and 4280 frequent 3-drug combinations. The top ten ADE ORs range from 4.1 to 5.6 for single drugs, from 12.6 to 21.5 for two-drug combinations, and from 14.8 to 19.5 for three-drug combinations. The top ten directional ADE ORs between one drug and two drugs range from 13.5 to 28.2; those between one drug and three drugs range from 13.1 to 20.3; and those between two drugs and three drugs range from 11.3 to 34.4. Multiple promising directional ADE findings were identified. For example, the risk of myopathy is 28.2 times higher when gadopentetate dimeglumine is added on top of gadobenate dimeglumine. Both drugs are gadolinium-based contrast agents (GBCAs) used in magnetic resonance imaging. GBCAs have been shown to be associated with nephrogenic systemic fibrosis (NSF), which may present as progressive myopathy. Conclusion: Directional drug interactions capture the ADE risks introduced by additional drugs taken on top of a set of baseline drugs. They provide novel and valuable pharmacovigilance knowledge with the potential to impact clinical decision support. Mining frequent patterns using Apriori is a promising approach for effective discovery of high-order directional drug interaction effects.
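The two computational pieces can be sketched minimally, assuming each report is a set of drugs plus a myopathy flag. The first is an Apriori-style, level-wise support count for drug combinations. The second is a directional odds ratio. The abstract does not spell out the exact 2×2 table, so here the directional OR from D1 to D2 is assumed to compare reports containing all of D2 against reports containing D1 but not all of D2. The function names and toy data are hypothetical, and no continuity correction is applied.

```python
from itertools import combinations

def apriori(reports, min_support, max_size=3):
    """Return {frozenset(drugs): support} for frequent drug combinations.

    reports: list of (drug_set, is_case) pairs; only the drug sets are used.
    """
    baskets = [frozenset(drugs) for drugs, _ in reports]
    frequent = {}
    current = {frozenset([d]) for b in baskets for d in b}
    size = 1
    while current and size <= max_size:
        counts = {c: sum(1 for b in baskets if c <= b) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-sets into (k+1)-sets whose k-subsets are all frequent.
        size += 1
        current = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == size and all(
                frozenset(s) in level for s in combinations(union, size - 1)
            ):
                current.add(union)
    return frequent

def directional_or(reports, d1, d2):
    """Assumed directional ADE odds ratio from baseline set d1 to superset d2.

    Exposed: reports containing every drug in d2.
    Reference: reports containing d1 but not every drug in d2.
    """
    d1, d2 = frozenset(d1), frozenset(d2)
    if not d1 < d2:
        raise ValueError("d2 must be a strict superset of d1")
    exp_case = exp_ctrl = ref_case = ref_ctrl = 0
    for drugs, is_case in reports:
        drugs = frozenset(drugs)
        if d2 <= drugs:
            exp_case += is_case
            exp_ctrl += not is_case
        elif d1 <= drugs:
            ref_case += is_case
            ref_ctrl += not is_case
    # Raises ZeroDivisionError if a 2x2 cell is empty (no correction applied).
    return (exp_case * ref_ctrl) / (exp_ctrl * ref_case)
```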
72
DECIPHERING NEURONAL BROAD HISTONE H3K4ME3 DOMAINS ASSOCIATED WITH GENE-REGULATORY NETWORKS AND CONSERVED EPIGENOMIC LANDSCAPES IN THE HUMAN BRAIN
Aslihan Dincer1, Eric E. Schadt2, Bin Zhang2, Joel T. Dudley2, Davin Gavin3, Schahram Akbarian4
1Department of Neuroscience, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York; 2Department of Genetics and Genomic Sciences, Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York; 3Department of Psychiatry, Jesse Brown Veterans Affairs Medical Center, Chicago; 4Department of Psychiatry, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York
Aslihan Dincer
Only a few histone modifications have been mapped in the human brain. Trimethylation of histone H3 at lysine 4 (H3K4me3) is a chromatin modification known to mark the transcription start sites (TSS) of active gene promoters. Regulators of the H3K4me3 mark are significantly associated with the genetic risk architecture of common neurodevelopmental diseases, including schizophrenia and autism. Here, through integrative computational analysis of epigenomic and transcriptomic data based on next-generation sequencing, we investigated H3K4me3 landscapes of FACS-sorted neuronal and non-neuronal nuclei in human postmortem, non-human primate (chimpanzee and macaque) and mouse prefrontal cortex (PFC), and blood. We characterized the broad H3K4me3 histone domains from human PFC in the context of cell-type-specific regulation, association with neuronal and non-neuronal gene expression, and potential implications for normal and diseased development. We first addressed the occurrence and biological significance of the broad H3K4me3 histone domains in three different cell types (NeuN+ PFC neurons, NeuN- PFC cells, and nucleated blood cells). We then identified novel regulators of these three cell types by focusing on the top 5% broadest H3K4me3 peaks (by length in base pairs). In PFC neurons, the broadest peaks ranged in size from 3.9 to 12 kb, with extremely broad peaks (~10 kb or broader) related to synaptic function and GABAergic signaling (DLX1, ELFN1, GAD1, LINC00966). The broadest neuronal peaks showed distinct motif signatures and were centrally positioned in prefrontal gene Bayesian regulatory networks. Approximately 120 of the broadest H3K4me3 peaks in human PFC neurons, including many genes related to glutamatergic and dopaminergic signaling, were fully conserved in chimpanzee, macaque and mouse cortical neurons. Exploring the spread and breadth of lysine methylation marks in specific cell types could provide novel insights into the epigenetic mechanisms of normal and diseased brain development, aging, and the evolution of neuronal genomes.
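Selecting the "top 5% broadest peaks" is a ranking step. The sketch ranks peaks by length and keeps the top 5% by count, rounded up. The (chrom, start, end) tuple layout, the rounding rule, and the function name are assumptions. Ties at the cutoff are broken by sort order, and other percentile conventions can shift the cutoff slightly.

```python
from math import ceil

def broadest_peaks(peaks, top_fraction=0.05):
    """Return the top_fraction broadest peaks, broadest first.

    peaks: list of (chrom, start, end) with end > start (lengths in base pairs).
    """
    ranked = sorted(peaks, key=lambda p: p[2] - p[1], reverse=True)
    n_keep = max(1, ceil(len(ranked) * top_fraction))
    return ranked[:n_keep]
```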
73
NORMALIZATION TECHNIQUES AND MACHINE LEARNING CLASSIFICATION FOR ASSIGNING MOLECULAR SUBSETS IN AUTOIMMUNE DISEASE AND CANCER
Jennifer M. Franks1,2, Guoshuai Cai1, Jaclyn N. Taroni3,4, Michael L. Whitfield1,2
1Department of Molecular and Systems Biology; 2Program in Quantitative Biomedical Sciences, Geisel School of Medicine at Dartmouth; 3Department of Systems Pharmacology and Translational Therapeutics; 4Institute for Translational Medicine and Therapeutics, University of Pennsylvania Perelman School of Medicine
Jennifer Franks
Systemic sclerosis (SSc) is a complex connective tissue disease involving skin and internal organ fibrosis, vascular damage, and immunologic abnormalities. To characterize disease heterogeneity and molecular pathogenesis, transcriptomic studies using intrinsic gene expression analyses have elucidated common biological processes in subsets of SSc patients. Four intrinsic subsets characterized by distinct molecular signatures have been validated in multiple independent cohorts. Technical biases inherent to different gene expression profiling platforms present a unique problem when analyzing data generated from multiple studies. While microarray and RNA-seq data have been shown to be highly correlated, differences in overall processing and quantification result in distinct data distributions. Here, we introduce an accurate and reproducible classifier for SSc molecular subsets and a method to normalize data when platform-specific artifacts arise. We used three independent, well-characterized and validated experimental microarray datasets (Hinchcliff et al., 2013; Milano et al., 2008; Pendergrass et al., 2012) to train a supervised classifier using three-fold cross-validation repeated ten times, achieving an average accuracy of >88%. Data from other platforms, including RNA-seq, are analyzed for platform-based bias using guided PCA (Reese et al., 2013). We developed a method to eliminate platform bias by normalizing on a gene-by-gene basis, using the microarray training data as the target distribution. We find that this method successfully removes platform-specific effects from the data. Following normalization, each sample is assigned to a molecular subset based on support vector machine (SVM) classification. Our preliminary analyses find that these methods work extremely well on a validation RNA-seq dataset in SSc (100% accuracy, n = 12, Li et al., in preparation). We also applied our methods to breast cancer DNA microarray and RNA-seq data from The Cancer Genome Atlas (TCGA) (Cancer Genome Atlas, 2012), where five intrinsic gene expression subsets have been previously identified and described with PAM50 (Parker et al., 2009). Tumor and tumor-adjacent normal biopsies of breast cancer for which intrinsic subtype information was available were used to train and test an SVM and to evaluate our normalization technique. We achieve 93% accuracy in assigning subtypes for normalized RNA-seq data using our classifier trained exclusively on microarray data. Until recently, clinical trials and diagnosing physicians have not considered molecular heterogeneity in the context of immunosuppressive therapy; this heterogeneity may explain why only select SSc patients improve (Martyanov & Whitfield, 2016). Advancing personalized medicine by using intrinsic molecular subsets may prove particularly beneficial to this field. With our newly developed techniques, we can successfully leverage information from validated expression data in new analyses despite the different platforms used for gene expression profiling.
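One common way to normalize "on a gene-by-gene basis using the microarray training data as the target distribution" is per-gene quantile mapping. Each new sample's value for a gene is replaced by the training value at the same relative rank. The sketch assumes that approach, since the abstract does not state the exact transform. It uses linear interpolation between sorted target values, its function name is hypothetical, and it does not handle ties or missing values.

```python
def quantile_map(new_values, target_values):
    """Map one gene's values onto the empirical distribution of target_values.

    new_values: the gene's expression in the new platform (e.g. RNA-seq samples).
    target_values: the same gene's expression in the training (microarray) data.
    Each new value is replaced by the target value at the same relative rank,
    interpolating linearly between neighboring sorted target values.
    """
    target = sorted(target_values)
    n, m = len(new_values), len(target)
    order = sorted(range(n), key=lambda i: new_values[i])
    mapped = [0.0] * n
    for rank, i in enumerate(order):
        q = rank / (n - 1) if n > 1 else 0.5  # relative rank in [0, 1]
        pos = q * (m - 1)
        lo = int(pos)
        hi = min(lo + 1, m - 1)
        frac = pos - lo
        mapped[i] = target[lo] * (1 - frac) + target[hi] * frac
    return mapped
```

In practice this would be applied independently to each gene (each row of the expression matrix) before the samples are passed to the classifier.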
74
MULTI-OMICS DATA INTEGRATION TO STRATIFY POPULATION IN HEPATOCELLULAR CARCINOMA
Kumardeep Chaudhary, Olivier Poirion, Liangqun Lu, Lana Garmire
University of Hawaii Cancer Center, Honolulu
Lana Garmire
The high mortality rate of hepatocellular carcinoma (HCC) is due in part to the vast heterogeneity of the cancer. Identifying robust molecular subgroups of HCC helps to guide precise targeted therapeutics. This could be realized by integrating different layers of omics data from the same cohort. To achieve this, we present a deep learning (DL) based method to inspect the different subpopulations of patients with HCC from TCGA. We obtained data for 360 HCC patients in TCGA with 3 omics data types (RNA-seq, miRNA-seq and methylation). To identify the different subpopulations, our pipeline implements a DL-based autoencoder, identifies hidden-layer features linked to survival, and performs k-means clustering using these new features. To assign new samples to the identified subpopulations, we performed supervised classification using a support vector machine (SVM). To assess the performance of the model, we used a 5-fold cross-validation scheme to estimate c-index and Brier scores. We also split the data 60:40 into training and test sets 10 times to assess the significance of Cox-PH regression in the test dataset. Finally, we inferred the cluster labels of two external cohorts based on their gene expression data. The autoencoder framework combined the 3 omics as input features (~40,000) and produced 100 transformed new features. Among these new features, we identified 36 features significantly linked with survival, which were further used to infer 2 optimal clusters of patients with significant survival differences. Using cross-validation, we obtained average c-index and Brier score values of 0.70 and 0.20, respectively, for the test sets. Cox-PH regression also showed significant survival estimation on the test samples. Finally, our framework was validated on two external datasets: 221 HCC samples from a GEO study and 230 HCC samples from the LIRI-JP (RIKEN) cohort. Moreover, we showed that each of the individual omic feature sets can be used successfully to infer the 2 survival profiles; however, the combination of the 3 omics is more powerful. We also compared the DL methodology with new features produced by PCA instead. The two subpopulations differed significantly in clinical and molecular characteristics (survival, pathways, and driver mutation profiles). This is the first study to employ deep learning as a robust framework to identify non-linear combinations of multi-omics features for identifying subclasses of HCC patients. Using multi-omics datasets, our pipeline successfully combines these different features and identifies two HCC subpopulations exhibiting different survival profiles. We then used this model in combination with supervised machine-learning approaches to predict HCC subpopulation assignment for test and validation datasets.
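The c-index reported above measures how well predicted risk orders patients by survival time. A minimal Python sketch of Harrell's concordance index follows, assuming the common definition in which only pairs anchored by an observed event are comparable (ties in time are skipped, ties in risk count as half); it is not the authors' evaluation code.

```python
def concordance_index(times, risks, events):
    """Harrell's concordance index (c-index).

    times:  observed survival or censoring times
    risks:  predicted risk scores (higher = expected to die sooner)
    events: 1 if death was observed, 0 if censored

    A pair (i, j) is comparable when patient i has an observed event and a
    strictly shorter time than patient j. It is concordant when the model
    gives i the higher risk; tied risks count as half a concordant pair.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

For example, `concordance_index([2, 5, 9, 12], [0.9, 0.4, 0.6, 0.1], [1, 1, 0, 1])` returns 0.8: four of the five comparable pairs are ordered correctly. A value of 0.5 corresponds to random ordering.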
75
TOWARDS STANDARDS-BASED CLINICAL DATA WEB APPLICATION LEVERAGING SHINY R AND HL7 FHIR
Na Hong, Naresh Prodduturi, Chen Wang, Guoqian Jiang
Department of Health Sciences Research, Mayo Clinic, Rochester, MN
Guoqian Jiang
Introduction: The Fast Healthcare Interoperability Resources (FHIR) is an emerging clinical data standard developed by HL7 that enables the representation and exchange of electronic health record (EHR) data in a standard structure. FHIR is highly implementable, building on a RESTful service architecture and multiple flexible data exchange formats. Shiny is a web application framework with a simplified web deployment mechanism that enables powerful R functions to support graphical and interactive analysis. Therefore, with the goal of building reusable and extensible clinical statistics and analysis applications, we aim to design, develop and evaluate a flexible framework using the HL7 FHIR standard and the R-powered web application framework Shiny. Methods: We first established a local FHIR server to manage our clinical data. This part of the work focused on the analysis and implementation of the FHIR data models (i.e., core resources) and data exchange formats (e.g., XML and JSON), and on invoking the open-source HAPI FHIR API. Second, we designed two analysis workflows focused on patient-centered data analysis and cohort-based data analysis, respectively. Following the workflow design, we developed an open application platform, ShinyFHIR, using the Shiny web framework and the established FHIR server. Results: We built a local FHIR server using the HAPI DSTU2 API. In total, 140 patient records, 476 observation records, 496 condition records and 107 procedure records were populated into the FHIR server for testing. With the support of R packages including ‘jsonlite’, ‘dygraphs’ and ‘timeline’, our platform can be used for a variety of clinical data analysis use cases, including patient blood pressure observation timeline analysis and patient cohort gender/age distribution statistics. The results show that the ShinyFHIR integration approach makes web-based interactive statistical analysis of standardized FHIR-based clinical data feasible. Discussion: Implementations of FHIR have already attracted much interest from healthcare practitioners. Our ShinyFHIR implementation provides a useful framework that is complementary to other FHIR-based applications (e.g., SMART on FHIR). ShinyFHIR is designed to visualize FHIR-conformant data by capturing user experiences and habits, and it offers rapid support for clinical research while drawing on the extensive statistical power of R. However, several issues remain to be solved in the future, such as support for FHIR extensions and custom models, and system performance enhancement. In this study, we described our efforts in building a standardized clinical statistics and analysis application leveraging Shiny. We believe that the designed workflows can be applied to other EHR data that follow the FHIR standard, and that other publicly available FHIR servers can be used to validate the utility of our framework.
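ShinyFHIR itself is written in R; the sketch below uses Python only to illustrate the two FHIR interactions that a blood pressure timeline depends on: a RESTful search for a patient's Observations, and extraction of dated values from the returned Bundle. The base URL is hypothetical, the LOINC code 8480-6 (systolic blood pressure) is chosen for illustration, and only single-valued Observations (`valueQuantity`) are handled; blood pressure recorded as a panel with components would need extra parsing.

```python
from urllib.parse import urlencode

# Hypothetical base URL of a local HAPI FHIR DSTU2 server; replace with your own.
BASE_URL = "http://localhost:8080/baseDstu2"


def observation_search_url(patient_id: str, loinc_code: str) -> str:
    """FHIR RESTful search: one patient's Observations for one LOINC code, sorted by date."""
    params = {
        "patient": patient_id,
        "code": f"http://loinc.org|{loinc_code}",  # token search parameter: system|code
        "_sort": "date",
    }
    return f"{BASE_URL}/Observation?{urlencode(params)}"


def observation_timeline(bundle: dict) -> list:
    """Extract (effectiveDateTime, value) points from a searchset Bundle (parsed JSON).

    Only Observations that carry both a valueQuantity and an effectiveDateTime are kept.
    """
    points = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        quantity = resource.get("valueQuantity")
        when = resource.get("effectiveDateTime")
        if resource.get("resourceType") == "Observation" and quantity and when:
            points.append((when, float(quantity["value"])))
    # ISO 8601 strings written in the same format sort chronologically
    return sorted(points)
```

In an application, the URL would be fetched with any HTTP client (requesting JSON), and the parsed Bundle passed to `observation_timeline` to feed a time-series plot.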
76
A DATA LAKE PLATFORM OF CONTEXTUAL BIOLOGICAL INFORMATION FOR AGILE TRANSLATIONAL RESEARCH
Austin Huang1, Dmitri Bichko1, Mathieu Boespflug2, Edsko de Vries3, Facundo Dominguez2, Daniel Ziemek1
1Pfizer, 2Tweag I/O, 3Well-Typed
Austin Huang
Researchers need to aggregate contextual biological information in order to interpret experimental and clinical study results. These needs vary greatly depending on the scientific question. Creating large-scale, structured data repositories requires substantial investment that is not amenable to the rapidly evolving needs of translational research. On the other hand, while performing data analyses using ad hoc collections of local data files (Excel sheets, CSV tables, etc.) allows rapid and flexible execution, it also creates technical debt. In the long term, these workflows result in missed opportunities to accumulate institutional knowledge and are associated with poor reproducibility. We have implemented a data platform that achieves the benefits of more principled handling of data persistence with minimal analyst overhead. This is achieved by automating schema inference, metadata curation, versioning, and RESTful service production through a simple, Git-like ingestion tool. Data scientists can retrieve data via familiar client-language APIs such as dplyr in R. The platform is built on open-source database (Postgres, with an architecture that allows alternative back ends) and functional programming (Haskell, PostgREST) technologies. Our objective is to accelerate data sharing and discoverability within analyst teams and to drastically reduce the effort of persisting data in a systematic way. We therefore provide a technology foundation for rapid data service production and for improving reproducibility and reusability in data analyses.
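The abstract names automated schema inference as one of the platform's ingestion steps but does not describe how it works. As a hypothetical illustration only, a column-type inferrer for CSV input might try progressively looser Postgres types, as in the Python sketch below; `infer_schema` and its type-promotion order are assumptions, not the platform's implementation.

```python
import csv
import io


def infer_column_type(values):
    """Narrowest Postgres type that accepts every non-empty value in a column."""
    non_empty = [v for v in values if v not in ("", None)]

    def all_parse(cast):
        try:
            for v in non_empty:
                cast(v)
        except ValueError:
            return False
        return True

    if non_empty and all_parse(int):
        return "bigint"
    if non_empty and all_parse(float):
        return "double precision"
    return "text"  # fallback, also used for all-empty columns


def infer_schema(csv_text: str) -> dict:
    """Map each CSV column name to an inferred Postgres column type."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {col: infer_column_type([row[col] for row in rows]) for col in reader.fieldnames or []}
```

The inferred mapping could then drive a `CREATE TABLE` statement and, in a design like the one described, be stored alongside version metadata so that later ingestions of the same file can be checked for schema drift.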
77
GENOME READ IN-MEMORY (GRIM) FILTER: FAST LOCATION FILTERING IN DNA READ MAPPING USING EMERGING MEMORY TECHNOLOGIES
Jeremie Kim1, Damla Senol1, Hongyi Xin2, Donghyuk Lee1,3, Mohammed Alser4, Hasan Hassan5, Oguz Ergin5, Can Alkan4, Onur Mutlu1,6
1Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA; 2Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA; 3NVIDIA Research, Austin, TX; 4Department of Computer Engineering, Bilkent University, Ankara, Turkey; 5Department of Computer Engineering, TOBB University of Economics and Technology, Söğütözü, Ankara, Turkey; 6Department of Computer Science, Systems Group, ETH Zürich, Switzerland
Jeremie Kim
High-throughput sequencing (HTS) technology has resulted in a massive influx of available genetic data. HTS can sequence genomes relatively quickly, producing many short DNA sequences (reads); with state-of-the-art methods, analyzing these reads to characterize the donor’s genome can still take multiple days. The first step of genome analysis, read mapping, determines the origins of billions of reads within a reference genome in order to identify the donor’s genomic variants. Hash-table based read mappers are a common type of comprehensive read mapper. They operate by fetching, from a pre-generated hash table, the potential mapping locations of a read in the reference genome. These locations are then verified by local alignment, a computationally expensive dynamic programming algorithm that determines the similarity between the read and the potential mapping segment of the reference genome. Alignment has traditionally been the computational bottleneck of read mapping, but many recent works have proposed a new step, called Location-Filtering, to alleviate this bottleneck.
Location-Filtering is a critical step where many incorrect potential locations from the hash-table are discarded before local alignment verifies such locations. FastHASH, SHD, and GateKeeper propose variations of Location-Filtering that discard only incorrect locations to reduce end-to-end runtime of hash-table based read mapping. Location-Filtering is now the computational bottleneck of read mapping.
Our goal is to create an efficient Location-Filter that quickly discards as many incorrect (false positive) locations as possible before alignment, while retaining a zero false negative rate. Efficiently filtering incorrect mappings before alignment significantly improves the throughput and latency of hash-table based read mapping. We propose a novel filtering algorithm that quickly eliminates from consideration reference genome segments where alignment would yield no matches. Our algorithm’s novelty mainly stems from its design to exploit 3D-stacked memory systems. 3D-stacked memory is an emerging technology that tightly integrates computation and high-capacity memory in a single die stack, thereby enabling concurrent processing of large data chunks at low latency and high bandwidth. The key ideas of our design are 1) a new representation of coarse-grained reference genome segments that allows the genome to be operated on in parallel using bitwise operations and 2) exploiting the parallel computation capability of 3D-stacked memory to run massively parallel in-memory operations on the new genome representation. We call the resulting filter GRIM-Filter.
This work shows how GRIM-Filter can be used with any hash-table based read mapping algorithm and how it effectively exploits the processing-in-memory capabilities of 3D-stacked memory. We show that, when running with a 5% error tolerance, GRIM-Filter reduces false positive locations by 5.59x-6.41x and provides a 1.81x-3.65x end-to-end speedup over the state-of-the-art read mapper mrFAST with FastHASH.
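GRIM-Filter's key idea is a coarse-grained, bitwise representation of reference genome segments. The Python sketch below is a simplified software analogue under stated assumptions: the reference is split into overlapping bins, each bin stores one bit per possible k-mer, and a candidate bin is kept only if enough of the read's k-mers are present. The k-mer length, bin size, and threshold rule (a q-gram lemma bound: each edit can destroy at most k of the read's k-mers) are illustrative choices, not the published GRIM-Filter parameters, and nothing here models 3D-stacked memory.

```python
BASES = "ACGT"


def kmer_index(kmer: str) -> int:
    """2-bit encode a k-mer into an integer bit position (0 .. 4**k - 1)."""
    idx = 0
    for base in kmer:
        idx = idx * 4 + BASES.index(base)
    return idx


def build_bin_bitvectors(reference: str, bin_size: int, k: int, overlap: int) -> list:
    """One bitvector (a Python int) per bin; bit i is set if k-mer i occurs in the bin.

    Consecutive bins overlap by `overlap` bases so that a mapping straddling a
    bin boundary still lies entirely inside some bin.
    """
    bitvectors = []
    for start in range(0, len(reference), bin_size):
        segment = reference[start:start + bin_size + overlap]
        bv = 0
        for i in range(len(segment) - k + 1):
            bv |= 1 << kmer_index(segment[i:i + k])
        bitvectors.append(bv)
    return bitvectors


def bin_passes(read: str, bin_bv: int, k: int, max_edits: int) -> bool:
    """Keep a candidate bin only if enough of the read's k-mers occur in it.

    A match with at most `max_edits` edits shares at least
    (len(read) - k + 1) - max_edits * k k-mers with the read, so using that
    bound as the threshold never discards a true mapping location.
    """
    n_kmers = len(read) - k + 1
    present = sum((bin_bv >> kmer_index(read[i:i + k])) & 1 for i in range(n_kmers))
    return present >= n_kmers - max_edits * k
```

As long as bins overlap by at least the read length plus the allowed edits, a bin containing a true mapping is never discarded, so only alignment-free bit lookups and a count stand between the hash-table candidates and the expensive alignment step. In hardware, these per-bin lookups are the operations that would run in parallel inside memory.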
78
BCL-2 FAMILY MEMBERS AS REGULATORS OF RESPONSIVENESS TO BORTEZOMIB IN A MULTIPLE MYELOMA MODEL
Melissa E. Ko1,2, Charis Teh3,4, Christopher S. Playter5, Eli R. Zunder6, Daniel H. Gray4,7, Wendy J. Fantl8, Sylvia K. Plevritis9, Garry P. Nolan2
1Cancer Biology Program, Stanford School of Medicine, Stanford, CA; 2Baxter Laboratory for Stem Cell Biology, Stanford School of Medicine, Stanford, CA; 3Molecular Genetics of Cancer Division, Immunology Division, The Walter and Eliza Hall Institute, Parkville, VIC, Australia; 4Department of Medical Biology, The University of Melbourne, Parkville, VIC, Australia; 5Department of Biological Sciences, Purdue University, Lafayette, IN; 6Department of Biomedical Engineering, University of Virginia, Charlottesville, VA; 7The Walter and Eliza Hall Institute, Parkville, VIC, Australia; 8Department of Obstetrics and Gynecology, Stanford School of Medicine, Stanford, CA; 9Department of Radiology, Stanford School of Medicine, Stanford, CA
Melissa Ko
Survival rates for B cell malignancies have steadily improved over the last five decades, reaching levels of over 50% as a result of therapeutic agents such as dexamethasone, bortezomib, and lenalidomide. However, despite their success in producing clinical responses, the cellular mechanisms by which these agents kill tumor cells are poorly understood. We hypothesized that the Bcl-2 family of proteins, which are known to control the initiation of apoptosis and are frequently dysregulated in cancerous B cells such as multiple myeloma, can influence responsiveness to these therapeutic agents. Thus, with a focus on multiple myeloma, we aimed to comprehensively profile individual cells for their expression levels of Bcl-2 family members simultaneously with activated intracellular signaling proteins upon exposure to drugs used to treat B-cell malignancies. We applied single-cell mass cytometry to investigate the interplay of pro-survival and pro-apoptotic Bcl-2 family members in MM1S B lymphoblastic cells exposed to different drugs. This dataset was analyzed with FLOW-MAP, a computational tool developed in the Nolan Lab that organizes high-dimensional single-cell data into an interpretable 2D graph structure. FLOW-MAP enabled the apoptotic progression of individual cells to be visualized and showed changes in the expression levels of Bcl-2 family members and signaling factors across cells with different drug sensitivities. Our extensive study revealed heterogeneous responses of cell subsets to therapeutic agents used to treat multiple myeloma patients. For example, our results showed that bortezomib, a proteasome inhibitor approved for the treatment of multiple myeloma, potently induces apoptosis within 24 hours, to a greater extent than the other treatments. Induction of apoptosis in single cells treated with bortezomib coincided with a selective reduction of a subset of pro-survival Bcl-2 members. Furthermore, our analysis suggests that a metric reflecting the balance of pro-survival and pro-apoptotic Bcl-2 proteins may best separate and predict cells with differential sensitivity to bortezomib. This paradigm is supported by statistical modeling, in which we developed a classifier of bortezomib-resistant vs. sensitive cells, using either Bcl-2 family information or a single Bcl-2 score, with significant accuracy. Our study provides a general framework for understanding the differential sensitivity of tumor populations to anti-cancer drugs. Our results are likely to identify previously unknown death-inducing mechanisms as well as pinpoint potential synergies between standard-of-care therapies and newly developed therapies, such as Bcl-2 family inhibitors.
79
BIOMEDICAL TEXT-MINING APPLICATIONS FOR THE SYSTEM DEEPDIVE
Emily K. Mallory, Chris Re, Russ B. Altman
Stanford University
Emily Mallory
A complete repository of biomedical relationships is key for understanding the processes underlying both human disease and drug response. After decades of experimental research, the majority of known biomedical relationships exist solely in textual form in the literature and are thus computationally inaccessible. While curated databases have experts manually annotate relevant relationships or interactions from text, these databases struggle to keep up with the rapid growth of the biomedical literature. To address the need for biomedical relationship extraction, there have been numerous biological entity and relationship extraction challenges; however, extraction systems in the biomedical space tend to be task specific and do not provide a general framework for quickly developing future systems to address new extraction tasks. In this work, we developed multiple entity and relationship applications (called “extractors”) for the system DeepDive to extract biomedical relationships from full-text articles. DeepDive is a trained system for extracting information from a variety of sources, including text. Application developers create features and training examples, and DeepDive assigns a probability that a given entity or relationship is correct or true in the original sentence. We developed entity extractors for genes, drugs, and diseases, and relationship extractors for gene-gene, gene-disease, and gene-drug relationships. We previously evaluated the gene-gene work with a corpus of articles from three PLOS journals, and we are currently evaluating the other two relationship extractors on a corpus from PubMed Central. The precision of our entity extractors ranged from 80% to 90%. For the task of extracting gene-gene relationships, our system achieved 76% precision and 49% recall in extracting direct and indirect interactions previously curated by the Database of Interacting Proteins (DIP). For randomly curated extractions, the system achieved between 62% and 83% precision, depending on whether interactions were direct or indirect and whether precision was assessed at the sentence or document level. Our current gene-disease and gene-drug extractors achieved over 70% precision on a random subset of documents from over 340,000 full-text articles in the PubMed Central Open Access Subset. We are currently tuning these extractors to increase performance. This work will enable not only full-text literature extraction of biomedical relationships, but also the development of computational methods based on these relationships.
80
PROFILING ADAPTIVE IMMUNE REPERTOIRES ACROSS MULTIPLE HUMAN TISSUES BY RNA SEQUENCING
Serghei Mangul1, Igor Mandric2, Harry Taegyun Yang1, Dennis Montoya1, Nicolas Strauli3, Jeremy Rotman1, Benjamin Statz1, Will Van Der Wey1, Alex Zelikovsky2, Roberto Spreafico1, Maura Rossetti1, Sagiv Shifman1, Mark Ansel3, Noah Zaitlen3, Eleazar Eskin1
1University of California Los Angeles, 2Georgia State University, 3University of California San Francisco
Serghei Mangul
Assay-based approaches provide a detailed view of the adaptive immune system by profiling T- and B-cell receptors. However, these methods come at a high cost and lack the scale of regular RNA sequencing (RNA-seq). We developed ImReP, a novel computational method that uses RNA-seq data to profile the adaptive immune repertoire. ImReP is able to quantify individual immune responses from RNA-seq data based on the recombination landscape of genes encoding B- and T-cell receptors. We applied ImReP to 8,555 samples from 544 individuals and 53 diverse human tissues and reconstructed complementarity-determining region 3 (CDR3), the most variable part of the antigen-binding site. We assembled 3.8 million distinct CDR3 sequences. Analyzing this dataset, we identified the normal, healthy adaptive immune profile of different tissues. We describe the variation in immune profiles and the distribution of clonal lineages across individuals and tissues. Based on the immune profiles generated by ImReP, we were able to identify inflammation and various diseases, as confirmed by histological images. The atlas of T and B cell repertoires, freely available at https://sergheimangul.wordpress.com/atlas-of-t-and-b-cell-repertoires/, is the largest resource in terms of the number of CDR3 sequences and tissue types involved. We anticipate that this resource will enhance future studies in areas such as immunology and advance the development of therapies for human diseases. ImReP is freely available at https://sergheimangul.wordpress.com/imrep/.
81
THE CMH VARIANT WAREHOUSE - A CATALOG OF GENETIC VARIATION IN PATIENTS OF A CHILDREN'S HOSPITAL
Neil Miller1, Greyson Twist1, Byunggil Yoo1, Andrea Gaedigk2
1Center for Pediatric Genomic Medicine, Children's Mercy, Kansas City; 2Division of Clinical Pharmacology & Therapeutic Innovation, Children's Mercy, Kansas City, School of Medicine, University of Missouri-Kansas City
Neil Miller
Advances in high-throughput DNA sequencing have enabled the comprehensive identification of individual genetic variation on an unprecedented scale, powering the diagnosis of disease and personalized treatment. As the ability to detect genetic variation has grown, clinicians and researchers struggle to interpret the functional significance of the millions of variants found in each individual genome. The Variant Warehouse at the Center for Pediatric Genomic Medicine (CPGM) at Children’s Mercy, Kansas City, is a resource containing a record of over 160 million genomic variants detected in more than 5000 patients sequenced by the Center since 2011. Each variant has been characterized by the CPGM’s Rapid Understanding of Nucleotide Effect Software (RUNES) pipeline, which records database cross-references, predicted functional consequences and a variant classification score (1-5) based on preliminary guidelines from the American College of Medical Genetics and Genomics (ACMG). Additionally, a local allele frequency is calculated for each variant every 6 hours, enabling clinicians and researchers to rapidly identify rare variants. Despite extensive cross-referencing with databases such as dbSNP, ClinVar, ExAC and COSMIC, the CMH variant warehouse contains a significant number of novel variants not present in external databases. 59% of the total variants in the warehouse are novel, with a local allele frequency of less than 0.25%. Of these, 1% are category 1-3 variants expected to have some functional impact. We have observed 82,578 variants among a panel of 58 pharmacogenes (including CPIC genes), of which 59% are novel and 2% are category 1-3 variants. The amount of novelty observed in this patient population suggests that efforts to comprehensively catalog human variation remain a work in progress and that analysis of variant data will require interpreting novel variants for the foreseeable future. These observations are increasingly relevant in pharmacogenomics applications, where drug compatibility is determined through association with known haplotypes; in this context, the presence of novel and rare variants must be anticipated and accounted for in automated haplotype determination. The CMH variant warehouse is publicly available at http://warehouse.cmh.edu. Tools to search and view variants by gene, category and allele frequency are provided, as well as bulk downloads of data. Programmatic access to data is provided through implementations of the Global Alliance for Genomics and Health variant annotation API.
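The local allele frequency described above can be illustrated with a short Python sketch. It assumes diploid sites and that every sequenced patient was genotyped at each site, so the denominator is twice the cohort size; the warehouse's actual handling of missing calls, sex chromosomes, and related samples is not described in the abstract. Variant IDs in the example are hypothetical; the 0.25% cut-off is taken from the text.

```python
from collections import Counter


def local_allele_frequencies(calls, n_samples):
    """Alternate-allele frequency per variant within the local cohort.

    calls: iterable of (variant_id, alt_allele_count) pairs, one per sample
           carrying the variant (1 = heterozygous, 2 = homozygous alternate).
    Assumes diploid sites and that all n_samples were genotyped, so each
    site contributes 2 * n_samples alleles to the denominator.
    """
    alt_counts = Counter()
    for variant_id, alt_count in calls:
        alt_counts[variant_id] += alt_count
    return {v: c / (2 * n_samples) for v, c in alt_counts.items()}


def rare_variants(freqs, threshold=0.0025):
    """Variants whose local allele frequency is below the threshold (0.25% by default)."""
    return sorted(v for v, f in freqs.items() if f < threshold)
```

For example, in a cohort of 5000 patients, a variant seen as a single heterozygous call has a local frequency of 1 / 10,000 = 0.01% and would be flagged as rare, whereas one seen in 30 heterozygous patients (0.3%) would not.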
82
MUTPRED2 AND ITS APPLICATION TO THE INFERENCE OF MOLECULAR SIGNATURES OF DISEASE
Vikas Pejaver1, Lilia M. Iakoucheva2, Sean D. Mooney3, Predrag Radivojac1
1Department of Computer Science and Informatics, School of Informatics and Computing, Indiana University Bloomington; 2Department of Psychiatry, University of California San Diego;
3Department of Biomedical Informatics and Medical Education, University of Washington, Seattle
Predrag Radivojac
Over the past decade, several methods have been developed for the computational prioritization of missense mutations. However, identifying the effects of such mutations on protein structure and function remains a major challenge. Previously, we developed MutPred, a random forest-based model for the classification of pathogenic missense variants and the automated inference of molecular mechanisms of disease. Here, we build on our previous work and present MutPred2 as an improved approach for these tasks. For pathogenicity prediction, MutPred2 particularly benefits from a larger and heterogeneous training set, the inclusion of new features, the encoding of local sequence context and the use of a neural network ensemble. Through cross-validation experiments and a test on an independent dataset, we show that MutPred2 outperforms MutPred and other state-of-the-art methods. In particular, we observe that MutPred2 predicts fewer pathogenic mutations than PolyPhen-2 when applied to homozygous mutations from healthy individuals. Additionally, MutPred2 has over 50 built-in structural and functional property predictors, which greatly increase the number of possible downstream consequences that can be associated with a given amino acid substitution. We introduce a novel ranking approach that uses a positive-unlabeled learning framework to derive posterior probabilities for the disruption of these properties and, thus, infer the most likely molecular mechanism of pathogenicity. We then demonstrate the utility of MutPred2 in two situations. First, we identify prominent structural and functional signatures in a dataset of mostly Mendelian diseases (from MutPred2’s training set) and recapitulate known associations between these diseases and ordered and structured regions of proteins. We also make novel predictions about the role of allosteric residues in such diseases. Second, we apply MutPred2 to a dataset of de novo mutations from patients diagnosed with neuropsychiatric disorders, along with healthy siblings as controls. On this dataset, MutPred2 pathogenicity scores alone are sufficient to distinguish between neuropsychiatric cases and controls, without any additional gene-based or variant-based filtering. We also observe that disruptions in protein-protein interactions (PPIs), phosphorylation and acetylation are frequent mechanisms, suggesting that neuropsychiatric disorders are largely characterized by a breakdown in molecular signaling. Finally, we identify candidate mutations predicted to disrupt PPIs and validate them experimentally.
83
HIV-TRACE: MONITORING THE HIV EPIDEMIC IN NEAR REAL TIME USING LARGE NATIONAL AND GLOBAL SCALE MOLECULAR EPIDEMIOLOGY
Sergei Pond1, Steven Weaver1, Joel Wertheim2, Andrew J. Leigh Brown3
1Temple University, 2University of California San Diego, 3University of Edinburgh
Sergei Pond
Many pathogens, including HIV, propagate along sexual and social contact networks. It is now clear that HIV transmission networks belong to the scale-free family, and the spread of infections in scale-free networks is critically enhanced by highly connected individuals, or “hubs”. The structure of the transmission network has major implications for interrupting an epidemic. Since pathogen transmission networks are not observed directly, they are inferred and characterized based on indirect measurements, and methods to do this properly remain an open research challenge. Because of their rapid and host-specific evolution and chronic disease states, HIV sequence isolates are essentially unique to each infected person. This sequence uniqueness can be used to confirm or reject the hypothesis that two individuals are “linked” by a recent transmission or belong to the same transmission cluster. There are ~1,000,000 HIV sequences isolated from different individuals over the last 4 decades. National and international surveillance and drug resistance programs are generating high-resolution sequencing data on hundreds of thousands of isolates annually. We developed the HIV Transmission Cluster Engine (HIV-TRACE) to make the process of cluster (and network) inference automated, fast, convenient, and more robust. It is an efficient open-source application designed to scale well and enable near real-time inference and analysis of large networks: it can process 100,000 sequences in ~15-30 minutes on a 64-core back-end system. HIV-TRACE (hiv-trace.org) is an open-source web application built on robust and popular modern libraries. User interaction and result visualization are done entirely in the browser; processing is done asynchronously on a server back end. Components and versions of HIV-TRACE are used by the CDC (VARS, HICSB), Canadian public health officials, the NYC Department of Public Health, the San Diego Primary Infection Cohort, and the UK Drug Resistance Database. We illustrate the utility of HIV-TRACE on four real-world examples of essential questions in public health and epidemiology of HIV-1: (1) Are there rapidly growing transmission clusters, and what is driving their growth? (2) How does HIV spread at different geographic scales, and among different risk groups? (3) How can treatment and intervention be deployed in optimal ways to reduce incidence and prevalence? (4) Can vaccine and prevention efficacy be measured more accurately using network-level information?
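Cluster inference of the kind HIV-TRACE automates can be sketched as three steps: compute pairwise genetic distances, link pairs at or below a threshold, and report connected components as putative transmission clusters. The abstract does not state which distance model or threshold HIV-TRACE uses, so the Python sketch below substitutes a simple gap-ignoring p-distance and leaves the threshold as a parameter; it is a toy illustration, not the HIV-TRACE algorithm.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences,
    skipping positions where either sequence has a gap ('-')."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 1.0


def transmission_clusters(seqs: dict, threshold: float) -> list:
    """Link every pair within `threshold`; clusters are the connected
    components of that graph (found with union-find). Singletons are dropped."""
    parent = {name: name for name in seqs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    names = list(seqs)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if p_distance(seqs[a], seqs[b]) <= threshold:
                parent[find(a)] = find(b)  # merge the two components

    groups = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return [g for g in groups.values() if len(g) > 1]
```

Note that clusters are transitive: two sequences farther apart than the threshold can still share a cluster through an intermediate sequence. The all-pairs loop is quadratic in the number of sequences, so processing 100,000 sequences in minutes, as reported above, requires a far more optimized implementation.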
84
THE EXTREME MEMORY® CHALLENGE: A SEARCH FOR THE HERITABLE FOUNDATIONS OF EXCEPTIONAL MEMORY
Mary A. Pyc, Emily Giron, Philip Cheung, Douglas Fenger, J. Steven de Belle, Tim Tully
Dart NeuroScience
Douglas Fenger
We are interested in discovering new candidate targets for drug therapies to enhance cognitive vitality in humans throughout life, and to remediate memory deficits associated with brain injury and brain-related diseases such as Alzheimer’s and Parkinson’s. To achieve our goal, we need a comprehensive and objective understanding of the human genome’s contribution to variation in memory performance in healthy individuals. We are implementing a genome-wide association study (GWAS) to identify genetic loci that vary between individuals with exceptional and normal memory abilities. These genes, and those in associated networks, will inform drug discovery and development. Our first step is to identify exceptional members of the population. Thus, we have created an online memory test, the Extreme Memory Challenge (XMC, accessible at http://www.extremememorychallenge.com), to conveniently screen an unlimited number of subjects and find individuals with exceptional memory consolidation abilities. Identified subjects (1) are validated by a battery of secondary memory tasks and (2) provide saliva samples from which we can isolate DNA for GWAS. Ten pilot experiments were conducted to parameterize the XMC screen. Participants learned face-name pairs for a delayed recall test. After initial study, each name was presented and participants were asked to select the correct face among four (distractors were other faces paired with different names). One day later, participants completed a final test trial. We are primarily interested in forgetting across sessions, as this provides an estimate of consolidation across a 24-hour interval. Pilot studies indicated that the optimal protocol should include 30 face-name pairs presented at a 4-second rate. To date, 17,849 participants from 176 nations have been screened in the XMC. Of these, 11,311 have completed both sessions. Individuals in our sample are most frequently Caucasian (55%), post-secondary-school educated (63%), self-reportedly most alert in the morning (51%), and right-handed (89.5%). The average age was 34, and the gender distribution was split evenly. The forgetting rate (decrease in performance from day 1 to day 2) was 10%. We have identified 49 individuals with perfect performance on day 2 of the test and 24 with exceptional consolidation abilities (defined as 3 SDs from the mean). We have begun the genomics phase of the study with 33 individuals who have completed additional behavioral testing.
85
RESCUE THE MISSING VARIANTS - LESSONS LEARNED FROM LARGE SEQUENCING PROJECTS
Yingxue Ren1, Joseph S. Reddy1, Vivekananda Sarangi2, Jason P. Sinnwell2, Steve G. Younkin3, Nilüfer Ertekin-Taner3, Owen A. Ross3, Rosa Rademakers3, Shannon K. McDonnell2, Joanna M. Biernacka2, Yan W. Asmann1
1Department of Health Sciences Research, Mayo Clinic, Jacksonville, FL; 2Department of Health Sciences Research, Mayo Clinic, Rochester, MN; 3Department of Neuroscience, Mayo Clinic, Jacksonville, FL
Yingxue Ren
Identifying novel disease variants through next-generation sequencing (NGS) has been a fruitful practice in medical research in recent years, leading to the discovery of new disease mechanisms as well as therapeutic strategies. The GATK Best Practices have since been established to provide general recommendations on the core processing steps required to go from raw reads to final variant call sets. However, with sample sizes increasing drastically in today’s sequencing experiments, many default variant calling strategies and the choice of tools call for closer examination. Our study used the whole exome sequencing data provided by the Alzheimer's Disease Sequencing Project (ADSP) to test different variant calling strategies and tools involved in the variant discovery workflow in the context of sample size. We first investigated the impact of using different sequence aligners on variant call sets while keeping the default GATK settings of the variant calling and QC steps identical. We selected 1952 samples to align with both BWA and NovoAlign, and compared the variant call sets in 50, 100, 200, 500, 1000 and 1952 samples. We discovered that the percentage of variants unique to one aligner increased dramatically with increasing sample size. At a sample size of 1952, the unique variants generated by BWA and NovoAlign accounted for more than 20% of all called variants. These unique variants have good variant quality metrics: ~80% have a Genotype Quality (GQ) score of 60 or above, and their distribution of B allele concentration (BAC) centers around 0.5 and 1, consistent with what is expected of diploid genomes. Moreover, over 96% of the unique variants have a population B allele frequency (BAF) of less than 0.01, indicating that these variants are rare in the population. Together, these metrics suggest that these unique variants should be included in downstream variant analysis. In addition to the aligner comparison, we also evaluated single-sample variant calling versus the default strategy (single-sample variant calling followed by joint multi-sample genotyping) in 50, 100, 500, 2000, and 5000 samples. Our data showed that, with increasing sample size, the single-sample calling strategy added an increasing percentage of unique variants. At a sample size of 5000, single-sample calling added 58,884 variants, accounting for 5.55% of the total variants called by the two strategies. Of these unique variants, 7331 passed Variant Quality Score Recalibration (VQSR) and had a GQ of 60 or above in at least 5 samples. Our study identified a large number of good-quality variants from the ADSP exome sequencing project that were missed by using one aligner or by using the multi-sample genotyping strategy alone. Our findings reveal the relationship between bioinformatics pipelines and biomedical research results, and suggest that alternative variant calling strategies may be beneficial for optimal variant discovery in the face of today’s large sequencing scale.
86
TOWARD EFFECTIVE MICRORNA QUANTIFICATION FROM SMALL RNA-SEQ
Pamela Russell1, Richard Radcliffe2, Brian Vestal1, Wen Shi1, Pratyaydipta Rudra1, Laura Saba2, Katerina Kechris1
1Department of Biostatistics and Informatics, Colorado School of Public Health; 2Department of Pharmaceutical Sciences, University of Colorado Skaggs School of Pharmacy and Pharmaceutical Sciences
Presenting author: Pamela Russell
Extensive work has led to robust quantification methods for RNA-seq data derived primarily from large RNAs. Many studies have used these methods "out of the box" to estimate microRNA (miRNA) expression from small RNA-seq data. However, these methods do not effectively address issues particular to miRNAs. First, reference bias is amplified by the short length of sequencing reads derived from miRNAs (~22 nt). That is, with shorter reads, a true mismatch between a sample and the reference can lead to incorrect alignments or an inability to align reads at all, creating a count bias toward samples carrying the reference allele. With longer reads, single mismatches have less impact on alignment algorithms. Second, any bias for individual miRNAs has a larger overall impact because the repertoire of miRNAs is relatively small compared with that of mRNAs. Inaccurate counts for a handful of miRNAs can significantly alter overall library counts and thus affect normalization. We refer to this issue as repertoire bias. Also, most miRNA studies seek to identify functional mature miRNA molecules, regardless of the genomic position from which they were originally transcribed and of small non-functional differences between miRNAs of the same family. Tools designed for large RNAs do not address the repetitive nature and family structure of miRNAs, and by default return estimated counts for multiple targets that should be considered equivalent under typical miRNA study paradigms. Genome-based methods often map miRNA reads to multiple loci encoding the same mature miRNA. Methods based on mapping directly to a miRNA database do not suffer from multiple alignments due to identical regions of the genome, but they typically do distinguish among members of each miRNA family. Both sources of multiple mappings can lead to misleading counts when the goal is to elucidate function. Here we explore all these issues in the context of commonly used methods. We then propose a new high-throughput approach that (1) incorporates individual genetic variation into the reference sequence used for alignment, reducing reference bias, and (2) assigns each read to a single functional group such as a miRNA family. We demonstrate the accuracy of this approach compared with other popular methods using a dataset derived from 206 mouse brain samples. Funded by NIH/NIAAA AA016597, R01 AA021131 and R24 AA013162.
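The two parts of the proposed approach, personalizing the reference with a sample's own variants and collapsing counts to functional groups, can be illustrated with a toy sketch. It assumes, purely for illustration, that reads are exact copies of mature miRNA sequences, that sample-specific SNPs are given as (sequence name, 0-based position, alternate base) tuples, and that a family map is supplied. The function names `personalize` and `count_by_family` and all sequences are hypothetical; this is not the authors' pipeline, which aligns reads rather than matching them exactly.

```python
from collections import Counter

def personalize(reference, snps):
    """Return a copy of `reference` with each (name, 0-based position, alt base) SNP applied."""
    personal = dict(reference)
    for name, pos, alt in snps:
        seq = personal[name]
        personal[name] = seq[:pos] + alt + seq[pos + 1:]
    return personal

def count_by_family(reads, reference, family_of):
    """Count reads per functional group (e.g. miRNA family).

    A read that exactly matches several reference sequences is still counted once if all
    matches belong to the same family; reads matching nothing, or several families, are
    reported as unassigned.
    """
    counts, unassigned = Counter(), 0
    for read in reads:
        families = {family_of[name] for name, seq in reference.items() if seq == read}
        if len(families) == 1:
            counts[families.pop()] += 1
        else:
            unassigned += 1
    return counts, unassigned

# Hypothetical toy data: two members of family "mirX" and one member of family "mirY".
reference = {"mirX-1": "ACGTACGTAC", "mirX-2": "ACGTACGTAA", "mirY-1": "TTTTGGGGCC"}
family_of = {"mirX-1": "mirX", "mirX-2": "mirX", "mirY-1": "mirY"}
sample_snps = [("mirY-1", 0, "A")]   # this sample carries A instead of T at position 0
reads = ["ACGTACGTAC", "ACGTACGTAA", "ATTTGGGGCC", "GGGG"]

print(count_by_family(reads, reference, family_of))                            # generic reference
print(count_by_family(reads, personalize(reference, sample_snps), family_of))  # personalized
```

With the generic reference, the read carrying the sample's variant is left unassigned (reference bias); with the personalized reference it is counted toward mirY, and both mirX members are reported under a single family.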
87
NANOPORE SEQUENCING TECHNOLOGY AND TOOLS: COMPUTATIONAL ANALYSIS OF THE CURRENT STATE, BOTTLENECKS AND FUTURE DIRECTIONS
Damla Senol1, Jeremie Kim1, Saugata Ghose1, Can Alkan2, Onur Mutlu1,3
1Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, USA; 2Department of Computer Engineering, Bilkent University, Bilkent, Ankara, Turkey; 3Department of Computer Science, Systems Group, ETH Zürich, Switzerland
Presenting author: Damla Senol
Nanopore sequencing, a promising single-molecule DNA sequencing technology, exhibits many attractive qualities and, in time, could potentially surpass current sequencing technologies. Nanopore sequencing promises higher throughput, lower cost, and increased read length, and it does not require a prior amplification step. Nanopore sequencers rely solely on the electrochemical structure of the different nucleotides for identification, measuring the change in ionic current as long strands of single-stranded DNA (ssDNA) pass through nano-scale protein pores. Biological nanopores for DNA sequencing were first proposed in the 1990s, but they only recently became commercially available, in May 2014, from Oxford Nanopore Technologies (ONT). The first commercial nanopore sequencing device, MinION, is an inexpensive, pocket-sized, portable, high-throughput sequencing apparatus that produces real-time data. These properties enable new potential applications of genome sequencing, such as rapid surveillance of Ebola, Zika or other epidemics, near-patient testing, and other applications that require real-time data analysis. In addition, this technology is capable of generating very long reads (~50,000 bp) with minimal sample preparation. Despite all these advantageous characteristics, it has one major drawback: high error rates. To provide higher accuracy and higher speed, in May 2016 ONT released a new version of MinION with a new nanopore chemistry called R9, which replaced the previous version, R7. Although R9 chemistry improves data accuracy, the tools used for nanopore sequence analysis remain of critical importance, as they must overcome the high error rates of the technology. Our goal in this work is to comprehensively analyze tools for nanopore sequence analysis, with a focus on understanding the advantages, disadvantages, and bottlenecks of the various tools. To this end, we rigorously examine multiple steps in the nanopore genome analysis pipeline. The first step, basecalling, translates the raw signal output of MinION into nucleotides to generate DNA sequences.
Currently, Nanocall and Nanonet are publicly available nanopore basecallers. The second step performs genome assembly with assemblers designed for noisy long reads. Using only the basecalled DNA reads, assemblers generate longer contiguous fragments called draft assemblies. Currently, Canu and Miniasm are the commonly used long-read assemblers. After this step, an improved consensus sequence is generated from the draft assembly with Nanopolish, and a complete whole genome is obtained. We analyze the five aforementioned nanopore sequencing tools in terms of their speed and accuracy, with the goals of determining their bottlenecks and finding improvements to these tools. We also discuss potential future work on nanopore basecallers and assemblers, to take better advantage of nanopore sequencing and to overcome its current disadvantage of high error rates.
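Accuracy comparisons of basecallers and assemblers ultimately compare a called sequence against a trusted reference. As a minimal sketch, assuming accuracy is expressed as identity = 1 - (edit distance / reference length) (one common convention, not necessarily the metric used in this work), the Levenshtein distance can be computed with the standard dynamic program. Real evaluations align reads or contigs to the reference with dedicated aligners rather than comparing whole sequences, and the sequences below are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))                # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev[-1]

def identity(called, truth):
    """Identity of a called sequence against the truth: 1 - edit distance / len(truth)."""
    return 1.0 - edit_distance(called, truth) / len(truth)

truth = "ACGTTGCAACGT"
called = "ACGTGCAACGTT"   # hypothetical basecall: one deletion and one insertion
print(edit_distance(called, truth), round(identity(called, truth), 3))
```

The quadratic dynamic program is fine for short examples; for whole reads or genomes, banded or aligner-based approaches are used instead.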
88
DETECTING OUTLIERS FROM MULTIDIMENSIONAL DATA WITH APPLICATION IN CANCER
Kyle Smith1, Subhajyoti De2, Debashis Ghosh1
1University of Colorado, 2Rutgers University
Presenting author: Kyle Smith
Outliers, which are very different from the typical cases in a cohort, bring unexpected challenges for decision making in many different disciplines. The issue is more acute in oncology, since most types of cancer are highly heterogeneous diseases. Even within a given cancer subtype, patients show extensive variation in their molecular profiles and clinical outcomes. Even within a cohort of cancer patients who apparently have the same biomarkers and received identical treatment, there are exceptional responders and exceptional non-responders, who are outliers. It is suspected that their atypical molecular and clinical profiles contribute to their exceptional response. While identifying such outlier cases can benefit precision medicine initiatives, methods to detect them from multidimensional data have received limited attention. Here, we propose a novel framework to identify outlier cancer patients with atypical profiles from multidimensional genomic data. We argue that detection of outlier patients with atypical profiles can help identify exceptional responders and tailor precision medicine initiatives in oncology.
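The abstract does not describe the proposed framework, so the following is only a generic baseline for flagging atypical patients in a patients-by-features matrix, not the authors' method: a robust z-score per feature (median and scaled median absolute deviation), with a patient flagged when enough of their features are extreme. The cut-off of 3.5, the `min_features` rule and the data are hypothetical choices for illustration.

```python
from statistics import median

def robust_z_scores(matrix):
    """Per-feature robust z-scores for a patients x features matrix (list of rows)."""
    n_features = len(matrix[0])
    scores = [[0.0] * n_features for _ in matrix]
    for j in range(n_features):
        column = [row[j] for row in matrix]
        med = median(column)
        mad = median(abs(x - med) for x in column)
        # 1.4826 * MAD estimates the standard deviation for normally distributed data;
        # fall back to 1.0 when MAD is zero to avoid dividing by zero.
        scale = 1.4826 * mad if mad > 0 else 1.0
        for i, x in enumerate(column):
            scores[i][j] = (x - med) / scale
    return scores

def flag_outliers(matrix, threshold=3.5, min_features=1):
    """Indices of patients with at least `min_features` features beyond +/- threshold."""
    return [i for i, row in enumerate(robust_z_scores(matrix))
            if sum(abs(z) > threshold for z in row) >= min_features]

# Hypothetical values for 5 patients x 2 genes; patient 4 is atypical for gene 0.
patients = [[1.0, 10.0], [1.1, 10.2], [0.9, 9.8], [1.0, 10.1], [5.0, 10.0]]
print(flag_outliers(patients))
```

Median and MAD are used instead of mean and standard deviation so that the outliers being sought do not inflate the scale estimate and mask themselves.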
89
HUEMR: INTUITIVE MINING OF ELECTRONIC MEDICAL RECORDS
Abiodun Otolorin1, Nana Osafo2, William Southerland2
1Department of Community and Family Medicine, Howard University, Washington, DC; 2Department of Biochemistry & Molecular Biology and the Center for Computational Biology and Bioinformatics, Howard University, Washington, DC
Presenting author: William Southerland
Despite the widespread adoption of electronic medical record systems and advances in genomics, a major barrier to research endeavors is the lack of intuitive, user-friendly interactive tools that enable researchers to access and analyze data readily. Innovative tools have been developed to address this problem. However, we hypothesized that an interactive data visualization tool capable of stand-alone or plugin functionality, which also leverages common data query methodologies, would contribute to research efforts requiring interrogation of clinical research databases. Howard University Hospital (HUH) is a tertiary academic medical center with over 50,000 emergency department visits and 8,000 inpatient admissions per year, and it primarily provides care to the minority population in the District of Columbia metropolitan area. Using de-identified HUH electronic medical records data, an HUH clinical research database was developed. Additionally, the Howard University electronic Medical Records (HUeMR) query tool was developed as a web-based client-server application using JavaScript and PHP. HUeMR may function in stand-alone or plugin mode. Its graphical interface was built using Google Charts, an interactive open source visualization library. HUeMR supports complex Boolean search operations specified through an interactive query tool. The ontology is presented using linked drop-down menus, and query construction is displayed in natural language form. Data are displayed using editable interactive charts. Multiple rows of charts may be created that contain different types of data concepts. Queries may be refined by clicking on the charts and then selecting one or more additional query parameters. Diagnoses based on ICD codes or keywords may also be searched. These features are illustrated in a diabetes use-case investigation. In summary, HUeMR is a secure data analytics tool that can be used in stand-alone or plugin mode to query clinical research databases. It has a highly interactive user interface that allows rapid data analysis for cohort discovery. This work was supported by grant #5G12MD007597 from the National Institute on Minority Health and Health Disparities of the NIH.
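Combining Boolean query construction with a natural-language rendering of the query can be illustrated with a small expression tree. This Python sketch is only an illustration of the idea (HUeMR itself is written in JavaScript and PHP); the `Condition`/`Group` classes, field names such as `icd_prefix`, and the example records are hypothetical.

```python
import operator
from dataclasses import dataclass, field
from typing import List, Union

OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}
WORDS = {"=": "is", ">": "is greater than", "<": "is less than"}

@dataclass
class Condition:
    field_name: str
    op: str          # one of the keys of OPS
    value: object

@dataclass
class Group:
    combinator: str  # "AND" or "OR"
    children: List[Union["Condition", "Group"]] = field(default_factory=list)
    negate: bool = False

def describe(node):
    """Render a query tree as natural-language text."""
    if isinstance(node, Condition):
        return f"{node.field_name} {WORDS[node.op]} {node.value}"
    joiner = " and " if node.combinator == "AND" else " or "
    text = "(" + joiner.join(describe(child) for child in node.children) + ")"
    return "not " + text if node.negate else text

def matches(node, record):
    """Evaluate a query tree against one record (a dict of field values)."""
    if isinstance(node, Condition):
        return OPS[node.op](record[node.field_name], node.value)
    results = [matches(child, record) for child in node.children]
    hit = all(results) if node.combinator == "AND" else any(results)
    return not hit if node.negate else hit

# Hypothetical diabetes-style query: ICD prefix E11 and age over 40.
query = Group("AND", [Condition("icd_prefix", "=", "E11"), Condition("age", ">", 40)])
records = [{"icd_prefix": "E11", "age": 55}, {"icd_prefix": "E11", "age": 30},
           {"icd_prefix": "I10", "age": 60}]
print(describe(query))
print([r for r in records if matches(query, r)])
```

Keeping the query as a tree means the same structure can drive both the displayed sentence and the actual filtering (or, in a real system, SQL generation), so the two cannot drift apart.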
90
DECIPHERING LUNG ADENOCARCINOMA MORPHOLOGY AND PROGNOSIS BY INTEGRATING OMICS AND HISTOPATHOLOGY
Kun-Hsing Yu1, Gerald J. Berry2, Daniel L. Rubin1, Christopher Ré3, Russ B. Altman1, Michael Snyder4
1Biomedical Informatics Program, Stanford University; 2Department of Pathology, Stanford University; 3Department of Computer Science, Stanford University; 4Department of Genetics, Stanford University
Presenting author: Kun-Hsing Yu
Adenocarcinoma accounts for more than 40% of lung malignancies, and microscopic pathology evaluation is indispensable to its diagnosis. However, how histopathology findings relate to molecular abnormalities remains largely unknown. To address this problem, we obtained hematoxylin and eosin stained whole-slide histopathology images, pathology reports, RNA-sequencing, and proteomics data of 538 lung adenocarcinoma patients from The Cancer Genome Atlas. We profiled gene expression, protein expression and modifications, and extracted more than 9,000 objective features from the histopathology images of each patient. We successfully predicted histology grade with transcriptomics and proteomics signatures (area under the curve > 0.75) and identified the associated molecular pathways, such as cell cycle regulation, which provide biological insights into tumor cell differentiation grades. We further built an integrative histopathology-transcriptomics model that generated superior prognostic predictions for stage I patients (P < 0.01) compared with gene expression or histopathology analysis alone. These results suggest that the integration of histopathology and omics studies can reveal the molecular mechanisms of pathology findings and enhance clinical prognostic prediction, which will contribute to the development of precision cancer medicine. Our methods are generalizable to other types of malignancy or diseases.
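The grade-prediction result is reported as an area under the ROC curve above 0.75. As a reminder of what that number measures, here is a minimal sketch (not the authors' code) that computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half; the scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC = P(score of random positive > score of random negative), ties counted as 1/2."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted probabilities of high grade (label 1) vs low grade (label 0).
print(roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))
```

This pairwise form is quadratic in the number of patients, which is negligible for a cohort of a few hundred; a rank-based (Mann-Whitney) formula gives the same value in O(n log n).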
91
EXPLORING DEEP LEARNING FOR COPY NUMBER VARIATION DETECTION WITH NGS DATA
Yao-zhong Zhang, Rui Yamaguchi, Seiya Imoto, Satoru Miyano
Institute of Medical Science, University of Tokyo
Presenting author: Yao-zhong Zhang
Copy number variations (CNVs) are an important type of genetic variation widely used for profiling cancer and other complex diseases. Accurate detection and summarization of CNVs help identify onco-targets and cancer subtypes for precision medicine. When NGS data are used for CNV detection, various heterogeneous biases, such as GC-content bias and other noise, need to be properly handled. This becomes especially important for CNV detection on single-cell NGS data. In this study, we extend traditional HMM approaches for CNV detection with deep learning. We extract a feature representation that integrates information from read counts and observable genomic sequences, use it as the new observable sequence of genomic bins, and iteratively train a DNN-HMM model for CNV detection. We compare our method with other HMM-based CNV detection methods.
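In an HMM-based CNV caller, the final step decodes the most likely copy-number state for each genomic bin. A minimal Viterbi sketch follows, assuming per-bin emission log-likelihoods are already available. In a hybrid DNN-HMM they would typically come from the network (for example, log posterior minus log state prior), but the three-state model, transition probabilities, and emission values below are hypothetical and not taken from this work.

```python
import math

def viterbi(log_emissions, log_trans, log_init):
    """Most likely hidden-state path for a discrete HMM, computed in log space.

    log_emissions[t][s]: log-likelihood of bin t under state s
    log_trans[r][s]:     log P(state s at bin t | state r at bin t-1)
    log_init[s]:         log P(state s at the first bin)
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emissions[0][s] for s in range(n_states)]
    backpointers = []
    for t in range(1, len(log_emissions)):
        best_prev, new_score = [], []
        for s in range(n_states):
            prev = max(range(n_states), key=lambda r: score[r] + log_trans[r][s])
            best_prev.append(prev)
            new_score.append(score[prev] + log_trans[prev][s] + log_emissions[t][s])
        backpointers.append(best_prev)
        score = new_score
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical 3-state model: index 0, 1, 2 = copy number 1, 2, 3; "sticky" transitions.
stay, switch = math.log(0.9), math.log(0.05)
log_trans = [[stay if r == s else switch for s in range(3)] for r in range(3)]
log_init = [math.log(1 / 3)] * 3
# Stand-in for per-bin network output: each bin strongly favors one state.
favored = [1, 1, 1, 2, 2, 1]
log_emissions = [[math.log(0.98) if s == f else math.log(0.01) for s in range(3)]
                 for f in favored]
copy_numbers = [state + 1 for state in viterbi(log_emissions, log_trans, log_init)]
print(copy_numbers)
```

The "sticky" transition matrix is what lets an HMM smooth noisy per-bin evidence into contiguous CNV segments: a short excursion is only called when the emission evidence outweighs the cost of two state switches.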
92
IMAGING GENOMICS
POSTER PRESENTATIONS
93
PERIPHERAL EPIGENETIC ASSOCIATIONS WITH BRAIN GRAY MATTER IN SCHIZOPHRENIA
Dongdong Lin1, Vince D. Calhoun2, Juan R. Bustillo3, Nora Perrone-Bizzozero4, Jingyu Liu1
1The Mind Research Network and Lovelace Biomedical and Environmental Research Institute, Albuquerque; 2Dept. of Electronic and Computer Engineering, University of New Mexico, Albuquerque; 3Dept. of Psychiatry, University of New Mexico, Albuquerque; 4Dept. of Neurosciences, University of New Mexico, Albuquerque
Presenting author: Jingyu Liu
Epigenetic regulation by DNA methylation and histone modification has been increasingly recognized for its relevance to schizophrenia (SZ). Beyond genetic variation, epigenetics, through regulation of gene transcription and expression, can potentially explain the 'missing' heritability and mediate the effect of genetic risk in disease. Specific to DNA methylation, recent studies have demonstrated that 6-7% of CpG sites across the genome show significant correspondence between brain and blood, supporting the investigation of easily accessible tissues for brain and mental disorders. In this study, we analyzed DNA methylation of 163 CpG sites from saliva and whole-brain gray matter density of 108 SZ patients and 105 healthy controls. We are aware of cellularity differences between blood and saliva; to the best of our knowledge, no detailed saliva-brain correspondence study has been done apart from a general comparison of overall patterns, which indicates that saliva may be a closer indicator of brain than blood. The 163 CpG sites are located within the 108 schizophrenia risk regions reported by the Psychiatric Genomics Consortium schizophrenia working group, and also showed strong cross-tissue similarity in the genome-wide methylation study of blood and brain tissues by Hannon et al. Quality control and normalization of methylation data were implemented using the minfi R package to remove batch effects and cell-type proportion effects. Gray matter density maps were segmented with SPM12 using a smoothing kernel of 8 mm³. We applied independent component analysis to both the brain imaging data and the methylation data, and extracted 25 gray matter networks and 15 methylation components. Among them, two methylation components were significantly correlated with three gray matter networks (false discovery rate < 0.05). The first methylation component comprised two CpG sites within and near the gene ZSCAN12, and was associated with a bilateral middle/superior temporal network (r = 0.25) and a bilateral superior frontal network (r = -0.24). The higher the methylation component, the lower the gray matter density in the superior frontal gyrus and the higher the density in the middle temporal gyrus. Moreover, SZ patients showed significant gray matter reduction in the superior frontal gyrus (p = 7.9 × 10⁻⁵). The second methylation component consisted of CpG sites from two chromosome regions (Chr. 10 AS3MT and NT5C2 genes, and Chr. 12 ARL6IP4 and OGFOD2 genes), and was associated with caudate and thalamus regions. All analyses were controlled for age and gender. Although we did not find SZ-specific methylation differences within SZ risk regions, our results suggest that DNA methylation patterns in saliva are associated with brain gray matter variation, and that some of this variation is related to schizophrenia. The main limitations of this study are 1) the lack of replication data to verify the findings, and 2) the lack of direct verification of saliva and brain tissue correspondence.
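The component-network associations are reported at a false discovery rate below 0.05 across many tests (15 methylation components by 25 gray matter networks would give 375 correlations if all pairs were tested). The abstract does not name the correction procedure; assuming the widely used Benjamini-Hochberg step-up procedure, a minimal sketch with hypothetical p-values is:

```python
def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure at FDR q.

    Sort p-values ascending, find the largest rank k with p_(k) <= k * q / m,
    and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])

# Hypothetical p-values from correlating component pairs.
p = [0.01, 0.035, 0.025, 0.005, 0.5]
print(benjamini_hochberg(p))
```

For `[0.02, 0.049, 0.049]`, only the largest rank passes its threshold (0.049 <= 3 × 0.05 / 3), yet all three hypotheses are rejected; this is what "step-up" means.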
94
THE INTERPLAY BETWEEN OLIGO-TARGET SPECIFIC AND GENOME-WIDE OFF-TARGET INTERACTIONS
Olga V. Matveeva1, Nafisa N. Nazipova2, Aleksey Y. Ogurtsov3, Svetlana A. Shabalina3
1Biopolymer Design LLC, Acton, MA; 2Institute of Mathematical Problems of Biology, Pushchino, Moscow Region, Russia; 3National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD
Presenting author: Svetlana Shabalina
Many techniques of molecular biology involve the interaction of specific oligonucleotides with DNA or RNA as a basic step. DNA targeting by single-guide RNAs (sgRNAs) for genome editing, oligonucleotide array gene expression monitoring or antisense-mediated gene down-regulation, and Genomic Comparison Hybridization (GCH) array experiments are examples of techniques involving RNA-DNA and DNA-DNA interactions. RNAi approaches with siRNA and shRNA molecules are based on RNA-RNA interactions. The main problem of any oligo-probe experiment is that specific oligo-target interactions, based on fully paired duplexes, are usually accompanied by non-specific parallel reactions, in which the oligo-probe can interact with many partially paired DNA or RNA sequences. The interplay between specific and genome-wide off-target interactions is poorly studied despite its crucial role in the efficacy of these techniques. In this study, we investigated the oligo-probe characteristics that are responsible for this interplay and that most improve oligo-probe design. We defined the specificity of interaction as the ratio between oligo-target specific and genome-wide off-target interactions. Microarray databases derived from GCH experiments on Affymetrix platforms, containing two different types of probes, were used for an analysis based on the thermodynamic features and nucleotide sequences of the oligo-probes. The first type of oligo-probe does not have a specific target in the genome, and its hybridization signals derive from genome-wide cross-hybridization alone. The second type includes oligonucleotides that have a specific target in the genomic DNA, and their signals derive from specific and cross-hybridization components combined in a total signal. The analysis revealed that hybridization specificity was negatively affected by low stability of the fully paired oligo-target duplex, stable probe self-folding, G-rich content (including GGG motifs), and a low sequence Symmetrical Complexity (SC) score. The SC-score characterizes nucleotide composition symmetry and a probe's vulnerability to off-target interactions. Filtering out probes with these characteristics significantly increases hybridization specificity, by decreasing genome-wide cross-hybridization or by increasing specific interactions. On average, the selected oligo-probes have three times higher hybridization specificity than the probes that were filtered out of the analysis by applying the suggested cut-off thresholds to the described parameters. Multiple regression models with the described parameters were successfully applied to predict interaction specificity and off-target effects, and supported the parameter choice (P < 0.001). We also compared the probe characteristics selected for the analysis in microarray databases with applicable features of siRNA/shRNA design from our earlier studies. We applied all selected oligonucleotide features and described parameters to new sets of sgRNAs. Our study examined the thermodynamics and sequence-intrinsic properties of sgRNA-DNA duplexes and analyzed additional selection criteria that are critical for guide efficacy. Finally, we identify universal features of oligo-probes, si/shRNAs and guides for optimal design, including the SC-score.
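Two of the sequence-level criteria named above, G-rich content and GGG motifs, are simple to compute, and the specificity ratio can be written down directly. The sketch below assumes that the cross-hybridization component of a specific probe's total signal can be estimated separately (for example, from target-free probes of similar composition); the G-fraction cut-off of 0.4 and the probe sequences are hypothetical. Duplex stability, self-folding and the SC-score are omitted because they require thermodynamic models or definitions not given here.

```python
def probe_features(seq):
    """Simple composition features of an oligo-probe sequence."""
    seq = seq.upper()
    return {"g_fraction": seq.count("G") / len(seq), "has_ggg": "GGG" in seq}

def passes_filter(seq, max_g_fraction=0.4):
    """Hypothetical filter: reject G-rich probes and probes containing a GGG motif."""
    features = probe_features(seq)
    return features["g_fraction"] <= max_g_fraction and not features["has_ggg"]

def specificity(total_signal, cross_signal):
    """Ratio of specific to off-target signal, taking specific = total - cross-hybridization."""
    return (total_signal - cross_signal) / cross_signal

probes = ["ACGTACGTAC", "AGGGTACATC", "GCGAGTGAGC"]
print([p for p in probes if passes_filter(p)])
print(specificity(total_signal=400.0, cross_signal=100.0))
```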
95
PATTERNS IN BIOMEDICAL DATA – HOW DO WE FIND THEM?
POSTER PRESENTATIONS
96
WARS2 IMPLICATED AS A COMMON MODIFIER OF METFORMIN METABOLITE BIOMARKERS IN A BIOBANK COHORT
Alyssa I. Clay1, Richard M. Weinshilboum2, K. Sreekumaran Nair3, Rima F. Kaddurah-Daouk4, Liewei Wang2, Matthew K. Breitenstein1
1Division of Epidemiology, Mayo Clinic; 2Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic; 3Division of Endocrinology, Mayo Clinic; 4Duke University
Presenting author: Matthew Breitenstein
Background: Metformin is one of the most widely prescribed drugs worldwide and a first-line treatment for type 2 diabetes mellitus (T2D). Metformin has many mechanisms of action, with varying levels of understanding. Metformin is being evaluated as a potential chemoprevention agent for cancer treatment, and inhibition of angiogenesis is one of its effects being strongly pursued. However, contradictory evidence exists for a potential mechanism of angiogenesis inhibition (Carcinogenesis 2014;(35)5). Building on our prior work that identified a stratum of statistically correlated metabolites, we aimed to identify overlapping metformin pharmacogenomic (PGx) SNP associations, using pharmacometabolomics-informed PGx paired with an agnostic computational approach. Methods: To elucidate overlapping PGx signals of metformin exposure, we included metabolites (n=5) with correlated plasma concentrations, adjusted for metformin exposure, in a biobank cohort-based case-control study. Cases (n=274) had T2D and were exposed to metformin monotherapy; healthy controls (n=274) had no known drug exposures. Cases and controls were matched by age and gender, and adjusted for BMI and batch. Concentrations of a panel of amino acid metabolites (n=42) were quantitatively measured using tandem liquid chromatography-mass spectrometry from fasting platelet-poor plasma samples collected in EDTA. Genotyping was performed on the 700k SNP Illumina OmniExpress array platform from 250 ng of DNA. Normalized metabolite concentrations were used as endpoints to inform genome-wide associations. Results: Increased plasma concentrations of leucine (t=4.47, p<0.0001), isoleucine (t=4.63, p<0.0001), and valine (t=4.48, p<0.0001) were observed with exposure to metformin. Variant rs17023164 (MAF=0.31), in the Tryptophanyl tRNA Synthetase 2, Mitochondrial (WARS2) gene region of chromosome 1 and an eQTL for WARS2 in fibroblasts, was a common downward modifier of leucine (β=-11.69, p=1.79e-7), isoleucine (β=-6.99, p=2.40e-6), and valine (β=-14.55, p=1.04e-5) with metformin exposure. No SNPs in neighboring gene regions were in high LD (R^2>0.5) with rs17023164. Conclusion: Increased plasma concentrations of leucine, valine, and isoleucine were observed with metformin exposure. A common variant, rs17023164 in WARS2, was identified as a strong downward modifier of these metabolites with metformin exposure. Independently, WARS2 has been proposed as a determinant of angiogenesis (Nat Commun 2016;(7)12061). We posit a hypothesis: modification, by WARS2 variants, of metabolite biomarker concentrations associated with metformin exposure is a potential link between metformin and angiogenesis. Functional characterization of a potential mechanism for metformin inhibition of angiogenesis, modified by WARS2, is ongoing.
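The per-SNP estimates above (for example β = -11.69 for leucine) come from regressing metabolite concentration on genotype. A stripped-down sketch of that kind of association, assuming an additive dosage coding, no covariates (the study itself matched on age and gender and adjusted for BMI and batch) and hypothetical data, computes the ordinary least-squares slope and the minor allele frequency:

```python
def minor_allele_frequency(dosages):
    """MAF from per-subject alternate-allele dosages (0, 1 or 2)."""
    alt_freq = sum(dosages) / (2 * len(dosages))
    return min(alt_freq, 1 - alt_freq)

def additive_beta(dosages, phenotype):
    """OLS slope of phenotype on dosage (additive model, no covariates)."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotype) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotype))
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    return sxy / sxx

# Hypothetical data: each extra copy of the allele lowers the metabolite level by 2 units.
dosages = [0, 0, 1, 1, 2, 2]
metabolite = [10.0, 10.0, 8.0, 8.0, 6.0, 6.0]
print(minor_allele_frequency(dosages), additive_beta(dosages, metabolite))
```

A negative slope corresponds to a "downward modifier" in the abstract's terms; a full analysis would add covariates and compute standard errors and p-values, typically with a statistics library.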
97
ESTIMATION OF FALSE NEGATIVE RATES VIA EMBEDDING SIMULATED EVENTS
Stephen V. Gliske1, Katy L. Lau1, Benjamin H. Brinkman2, Greg A. Worrell2, Cris G. Fink3, William C. Stacey1
1University of Michigan, 2Mayo Clinic, 3Ohio Wesleyan University
Presenting author: Stephen Gliske
Automated event detection is the goal of many types of data-driven pattern recognition methods. One of the general challenges for these analyses is quantifying and correcting for false negative detections, i.e., cases where the event (pattern) is present in the data but was not detected. Estimating the false positive rate is much easier, as human review of a subsample of detected events is sufficient. However, determining the false negative rate by human review would require manually searching through the raw data, which is impractical, if not completely infeasible. This challenge is not unique to biomedical data and is commonly addressed in high energy physics with an approach called embedding. It is applicable to any analysis where at least one of the signal or background can be modeled well by simulations. By placing simulated events at known locations, one can then run the automated detector and report the fraction of embedded events that were detected. We present the first application of embedding to neurological data, specifically the automated detection of a biomarker of epilepsy (high frequency oscillations) recorded in intracranial electroencephalogram (EEG) data. The false negative rate is found to be consistent both across recording channels and across patients.
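The embedding procedure can be sketched in a few lines: add a simulated event template to a background recording at known positions, run the detector, and count how many embedded events it recovered. The detector here is a toy amplitude threshold, and the template, background, and tolerance window are hypothetical; in the study the detector is an automated high-frequency-oscillation detector applied to real intracranial EEG.

```python
def embed(background, template, positions):
    """Return a copy of `background` with `template` added starting at each position."""
    signal = list(background)
    for start in positions:
        for k, value in enumerate(template):
            signal[start + k] += value
    return signal

def threshold_detector(signal, threshold):
    """Toy detector: start index of every run of samples with |x| > threshold."""
    starts, inside = [], False
    for i, x in enumerate(signal):
        above = abs(x) > threshold
        if above and not inside:
            starts.append(i)
        inside = above
    return starts

def false_negative_rate(detections, positions, event_length, tolerance=0):
    """Fraction of embedded events with no detection inside their (padded) window."""
    missed = 0
    for start in positions:
        window = range(start - tolerance, start + event_length + tolerance)
        if not any(d in window for d in detections):
            missed += 1
    return missed / len(positions)

background = [0.0] * 1000                 # hypothetical quiet background
template = [3.0, -3.0, 3.0, -3.0]         # hypothetical oscillatory event
positions = [100, 400, 700]
signal = embed(background, template, positions)
for threshold in (2.0, 4.0):
    detections = threshold_detector(signal, threshold)
    print(threshold, false_negative_rate(detections, positions, len(template)))
```

Because only the embedded positions are scored, detections of genuine events already present in the background do not affect the estimate; with real data the background would be an actual recording rather than zeros.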
98
INTEGRATIVE, INTERPRETABLE DEEP LEARNING FRAMEWORKS FOR REGULATORY GENOMICS AND EPIGENOMICS
Chuan Sheng Foo, Avanti Shrikumar, Johnny Israeli, Peyton Greenside, Chris Probert, Anna Scherbina, Rahul Mohan, Nathan Boley, Anshul Kundaje
Stanford University
Presenting author: Anshul Kundaje
We present generalizable and interpretable supervised deep learning frameworks to predict the regulatory and epigenetic state of putative functional genomic elements by integrating raw DNA sequence with diverse chromatin assays such as ATAC-seq, DNase-seq or MNase-seq. First, we develop novel multi-channel, multi-modal CNNs that integrate DNA sequence and chromatin accessibility profiles (DNase-seq or ATAC-seq) to predict in vivo binding sites of a diverse set of transcription factors (TFs) across cell types with high accuracy. Our integrative models provide significant improvements over other state-of-the-art methods, including recently published deep learning TF binding models. Next, we train multi-task, multi-modal deep CNNs to simultaneously predict multiple histone modifications and combinatorial chromatin state at regulatory elements by integrating DNA sequence, RNA-seq and ATAC-seq, or a combination of DNase-seq and MNase-seq. Our models achieve high prediction accuracy even across cell types, revealing a fundamental predictive relationship between chromatin architecture and histone modifications. Finally, we develop DeepLIFT (Deep Linear Importance Feature Tracker), a novel interpretation engine for extracting predictive and biologically meaningful patterns from deep neural networks (DNNs) for diverse genomic data types. DeepLIFT can integrate the combined effects of multiple cooperating filters and compute importance scores that account for redundant patterns. We apply DeepLIFT to our models to obtain unified TF sequence affinity models, infer high-resolution point binding events of TFs, dissect regulatory sequence grammars involving homodimeric and heterodimeric binding with co-factors, learn predictive chromatin architectural features, and unravel the sequence and architectural heterogeneity of regulatory elements.
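DeepLIFT assigns each input a contribution relative to a reference input, such that the contributions sum to the difference between the output and the reference output ("summation-to-delta"). The sketch below implements only the Rescale rule for a hypothetical one-hidden-layer ReLU network in plain Python. DeepLIFT itself also handles convolutional layers and offers other rules, so treat this as an illustration of the idea rather than the authors' implementation; the weights and inputs are made up.

```python
def forward(x, W1, b1, w2, b2):
    """One hidden ReLU layer and a linear output; returns (pre-activations, activations, output)."""
    z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    h = [max(0.0, zj) for zj in z]
    y = sum(wj * hj for wj, hj in zip(w2, h)) + b2
    return z, h, y

def deeplift_rescale(x, x_ref, W1, b1, w2, b2):
    """Input contributions under the Rescale rule; they sum to y(x) - y(x_ref)."""
    z, h, y = forward(x, W1, b1, w2, b2)
    z0, h0, y0 = forward(x_ref, W1, b1, w2, b2)
    contributions = [0.0] * len(x)
    for j, row in enumerate(W1):
        dz = z[j] - z0[j]
        # Rescale rule: the ReLU multiplier is delta-output / delta-input; when the input
        # does not change, fall back to the gradient (the unit then contributes 0 in total).
        m = (h[j] - h0[j]) / dz if dz != 0 else (1.0 if z[j] > 0 else 0.0)
        for i, w in enumerate(row):
            contributions[i] += w2[j] * m * w * (x[i] - x_ref[i])
    return contributions, y - y0

# Hypothetical network: 2 inputs, 1 hidden unit with bias -1, all-zero reference input.
contribs, delta = deeplift_rescale([2.0, 1.0], [0.0, 0.0], [[1.0, 1.0]], [-1.0], [1.0], 0.0)
print(contribs, delta)
```

In this example the plain gradient-times-input attribution would give contributions of 2 and 1, summing to 3, whereas the actual output change is 2; the Rescale multiplier of 2/3 redistributes exactly that change across the inputs.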
99
VISUALIZATION OF COMPLEX DISEASES AND RELATED GENE SETS
Modest von Korff, Tobias Fink, Thomas Sander
Actelion Pharmaceuticals Ltd., Allschwil, Switzerland
Presenting author: Modest von Korff
The relations between genes and diseases form complex patterns. Visualization of these patterns enables the scientist to obtain an overview of the most important gene–disease relations. These gene–disease relations are of high importance in drug discovery. Proteins encoded by disease-related genes are potential targets for new drugs or may become biomarkers for disease diagnosis. Both a novel drug target and a biomarker should be highly specific for the targeted disease. In our publication for this conference, we introduce a relevance estimator. This relevance estimator is a measure of the specificity of a gene–disease relationship that also takes into consideration all other known gene–disease relationships. We analyzed gene–disease relationships from 22 million PubMed records and obtained a matrix with relevance estimators for about 5,000 diseases and 15,000 genes. This relevance matrix enabled us to express the similarity between diseases with simple vector-based distance measures. A meaningful disease–gene–disease visualization, consisting of several layers, was derived from these disease–disease similarity measures and the relevance estimators. The multidimensional visualizations presented here give an overview of complex diseases such as asthma, Alzheimer's disease and hypertension.
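Once each disease is represented by a vector of per-gene relevance values, any standard vector similarity can compare diseases. As a minimal sketch, assuming cosine similarity (the abstract says only "simple vector-based distance measures") and a hypothetical three-gene relevance matrix, the nearest disease can be found as follows; the relevance estimator itself is not reproduced here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def most_similar(relevance, disease):
    """Name of the disease whose relevance vector is closest (by cosine) to `disease`."""
    target = relevance[disease]
    return max((name for name in relevance if name != disease),
               key=lambda name: cosine_similarity(target, relevance[name]))

# Hypothetical relevance values for three diseases over the same three genes.
relevance = {
    "disease_A": [0.9, 0.1, 0.0],
    "disease_B": [0.8, 0.2, 0.0],
    "disease_C": [0.0, 0.1, 0.9],
}
print(most_similar(relevance, "disease_A"))
```

Cosine similarity ignores overall vector magnitude, so a heavily studied disease with many large relevance values is not automatically "closer" to everything else; whether that is desirable depends on how the relevance estimator is normalized.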
100
PRECISION MEDICINE: FROM GENOTYPES AND MOLECULAR PHENOTYPES TOWARDS IMPROVED HEALTH AND THERAPIES
POSTER PRESENTATIONS
101
FINDINGS FROM THE FOURTH CRITICAL ASSESSMENT OF GENOME INTERPRETATION, A COMMUNITY EXPERIMENT TO EVALUATE PHENOTYPE PREDICTION
Steven E. Brenner1, Gaia Andreoletti1, Roger A Hoskins1, John Moult2, CAGI Participants
1University of California, Berkeley; 2IBBR, University of Maryland, Rockville, MD
Presenting author: Steven Brenner
The Critical Assessment of Genome Interpretation (CAGI, \'kā-jē\) is a community experiment to objectively assess computational methods for predicting the phenotypic impacts of genomic variation. CAGI participants are provided with genetic variants and make predictions of the resulting phenotypes. These predictions are evaluated against experimental characterizations by independent assessors.
The fourth CAGI experiment concluded this year. It included 11 challenges, which reflected: non-synonymous variants and their biochemical impact measured by targeted assays; noncoding regulatory variants and their impact on gene expression; research exomes for prediction of complex traits; personal genomes and trait profiles; and clinical sequences and associated referring indications.
There were notable discoveries throughout the CAGI experiment, and general themes emerged. The independent assessment found that top missense prediction methods are highly statistically significant, but individual variant accuracy is limited. Moreover, missense methods tend to correlate better with each other than with experiments (for reasons that may reflect both the predictive methods and the assays themselves). However, there may be potential for missense interpretation at the extremes of the distribution. Structure-based missense methods excel in a few cases, while evolution-based methods have more consistent performance. Bespoke approaches often enhance performance.
In the clinical studies, predictors were able to identify causal variants that had been overlooked by the clinical laboratory, and it appears that physicians may not always order the most relevant genetic test for their patients. CAGI data show that running multiple uncalibrated methods and considering their consensus often provides undue confidence, owing to their correlation; we therefore advise against running multiple uncalibrated variant interpretation tools in clinical analysis.
The results showed that predicting complex traits from exomes is fraught. Interpretation of non-coding variants shows promise but is not at the level of missense interpretation. Beyond this, creating a genetic study that provides a reliable gold standard is remarkably difficult. However, there were notable improvements in the ability to match genomes to trait profiles.
Complete information about CAGI may be found at https://genomeinterpretation.org.
102
ASTROLABE: EXPANSION TO CYP2C9 AND CYP2C19
Andrea Gaedigk1, Greyson P. Twist2, Sarah Soden2, Emily G. Farrow2, Neil A. Miller2
1Division of Clinical Pharmacology & Therapeutic Innovation, Children's Mercy, Kansas City, School of Medicine, University of Missouri-Kansas City; 2Center for Pediatric Genomic Medicine, Children's Mercy, Kansas City
Presenting author: Andrea Gaedigk
Background: CYP2C9 and CYP2C19 are highly polymorphic pharmacogenes that metabolize numerous drugs. Both genes have CPIC guidelines, underscoring their clinical relevance. To facilitate haplotype calling and translation into phenotype, we have developed a probabilistic scoring system, Astrolabe (initially called Constellation; Twist et al. 2016, Gen Med 1:15007), that enables automated CYP2D6 diplotype calling from whole genome sequencing. We report here the extension of Astrolabe to CYP2C9 and CYP2C19. Methods: The study was approved by the Institutional Review Board of Children’s Mercy and included 85 subjects (7 HapMap; 78 patients/parents). Allele definitions follow the P450 Nomenclature Database (cypalleles.ki.se/) with some modifications. Exons and 100 bp of flanking introns were used for Astrolabe calls, as well as regions -2990 to -440 of CYP2C9 and -1063 to -180 of CYP2C19, which harbor SNPs defining CYP2C9*8 and CYP2C19*27, respectively. All but 3 subjects were genotyped for CYP2C9*2, *3, *5 and *8 and CYP2C19*2-*4, *17, *27 and *35 using TaqMan assays to validate Astrolabe calls. WGS data were reanalyzed with the DRAGEN Bio-IT processor (Edico Genome) to improve variant call quality. To account for haplotype and diplotype combinations not observed in our sample set, simulations of all possible diplotype combinations were performed using the ART read simulator and the DRAGEN analysis pipeline. Astrolabe is available at https://www.childrensmercy.org/genomesoftwareportal/. Results: To maximize Astrolabe call accuracy, intron regions were adjusted to include informative SNPs while excluding those that occur on numerous haplotypes and/or are not part of a defined allele. The CYP2C9 exon 1 region, for example, was limited to 57 bp of intron 1 to exclude 251T>C, which is present in 1155/3540 subjects (CMH variant warehouse database). This SNP defines CYP2C9*29, but it interfered with Astrolabe calls by overcalling CYP2C9*29 in the absence of its key SNP (33437C>A). Optimized calling target regions were then used to compare Astrolabe with genotype calls. Astrolabe correctly called 68/75 (90.67%) and 71/75 (94.67%) of subjects for CYP2C9 and CYP2C19, respectively. Among the alleles detected by both Astrolabe and genotyping were CYP2C9*2, *3 and *8 and CYP2C19*2, *17, *27 and *35. Astrolabe also identified subjects carrying the rare CYP2C9*9 and *11 and CYP2C19*15 alleles, which were not covered by genotyping. Astrolabe correctly called 1077/1128 simulated CYP2C19 diplotypes (95% recall; 45 missed and 6 multiple calls). All missed calls were *12 called as *1. For CYP2C9, Astrolabe correctly called 2186/2278 simulated diplotypes (95% recall; 61 missed and 31 multiple calls). All missed calls were *25 called as *1. Discussion: Astrolabe's functionality was successfully expanded to CYP2C9 and CYP2C19. Phenotype prediction based on Astrolabe was superior to that derived from a limited genotype panel. Continued improvement and expansion of the nomenclature definitions will allow us to resolve the miscalled haplotypes represented in the simulation set and improve Astrolabe calling across all diplotypes.
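Astrolabe is a probabilistic scoring system; the sketch below is deliberately simpler and is not Astrolabe's algorithm. It shows the basic structure of star-allele diplotype calling, assuming each allele is defined by a set of variants and each observed variant carries an alternate-allele count (1 = heterozygous, 2 = homozygous). Every diplotype is scored by how far its expected counts differ from the observed ones, and ties are returned together, which corresponds to an ambiguous ("multiple") call. Allele names and variant labels are placeholders, not real CYP2C9/CYP2C19 definitions.

```python
from itertools import combinations_with_replacement

def call_diplotypes(observed, allele_defs):
    """Return all best-scoring diplotypes; more than one result means an ambiguous call.

    observed:    dict variant -> alternate-allele count (1 het, 2 hom)
    allele_defs: dict star allele -> set of defining variants (reference allele: empty set)
    """
    best, calls = None, []
    for a, b in combinations_with_replacement(sorted(allele_defs), 2):
        expected = {}
        for allele in (a, b):
            for variant in allele_defs[allele]:
                expected[variant] = expected.get(variant, 0) + 1
        variants = set(expected) | set(observed)
        mismatch = sum(abs(observed.get(v, 0) - expected.get(v, 0)) for v in variants)
        if best is None or mismatch < best:
            best, calls = mismatch, [(a, b)]
        elif mismatch == best:
            calls.append((a, b))
    return calls

# Placeholder definitions: *C carries both variants that individually define *A and *B.
allele_defs = {"*1": set(), "*A": {"v1"}, "*B": {"v2"}, "*C": {"v1", "v2"}}
print(call_diplotypes({"v1": 1}, allele_defs))
print(call_diplotypes({"v1": 1, "v2": 1}, allele_defs))
```

The second call is ambiguous because, without phase information, two heterozygous variants fit both *1/*C and *A/*B equally well. Likewise, an allele whose key variant lies outside the target region would be indistinguishable from *1, which is the same kind of miss described in the simulation results.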
103
HUMAN KINASES DISPLAY MUTATIONAL HOTSPOTS AT COGNATE POSITIONS WITHIN CANCER
Jonathan Gallion, Angela D. Wilkins, Olivier Lichtarge
Baylor College of Medicine
Presenting author: Jonathan Gallion
The discovery of driver genes is a major pursuit of cancer genomics, usually based on observing the same mutation in different patients. But the heterogeneity of cancer pathways, together with the high background mutational frequency of tumor cells, often clouds the distinction between less frequent drivers and innocent passenger mutations. Here, to overcome these limitations, we grouped together mutations from close kinase paralogs under the hypothesis that cognate mutations may functionally favor cancer cells in similar ways. Indeed, we find that kinase paralogs often bear mutations to the same substituted amino acid at the same aligned positions, with a large predicted Evolutionary Action. Functionally, these high-Evolutionary Action, non-random mutations affect known kinase motifs, but, strikingly, they do so differently among different kinase types and cancers, consistent with differences in selective pressures. Taken together, these results suggest that cancer pathways may flexibly distribute a dependence on a given functional mutation among multiple close kinase paralogs. The recognition of this “mutational delocalization” of cancer drivers among groups of paralogs is a new phenomenon that may help better identify relevant mechanisms and therefore eventually guide personalized therapy.
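Grouping mutations across paralogs requires mapping each protein's residue numbers onto shared alignment columns. A minimal sketch, assuming a precomputed gapped multiple sequence alignment and mutations given as (gene, 1-based residue number, substituted amino acid); the gene names and sequences are hypothetical, and the Evolutionary Action scoring used in the study is not reproduced.

```python
def residue_to_column(aligned_seq):
    """Map 1-based residue numbers of the ungapped protein to 0-based alignment columns."""
    mapping, residue = {}, 0
    for column, char in enumerate(aligned_seq):
        if char != "-":
            residue += 1
            mapping[residue] = column
    return mapping

def cognate_hotspots(alignment, mutations, min_genes=2):
    """(column, new amino acid) pairs mutated in at least `min_genes` different paralogs.

    alignment: dict gene -> gapped sequence; mutations: list of (gene, residue, new_aa).
    """
    maps = {gene: residue_to_column(seq) for gene, seq in alignment.items()}
    genes_hit = {}
    for gene, residue, new_aa in mutations:
        key = (maps[gene][residue], new_aa)
        genes_hit.setdefault(key, set()).add(gene)
    return {key: sorted(genes) for key, genes in genes_hit.items() if len(genes) >= min_genes}

# Hypothetical two-paralog alignment; residue 4 (L) is aligned at column 4 in both.
alignment = {"KIN1": "MKV-LE", "KIN2": "MK-ALE"}
mutations = [("KIN1", 4, "P"), ("KIN2", 4, "P"), ("KIN2", 3, "T"), ("KIN1", 5, "K")]
print(cognate_hotspots(alignment, mutations))
```

Pooling by alignment column is what lets two individually rare mutations (one per paralog) register as a single recurrent hotspot.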
104
SCOTCH: A NOVEL METHOD TO DETECT INSERTIONS AND DELETIONS FROM NGS DATA
Rachel Goldfeder, Euan Ashley
Stanford University
Presenting author: Rachel Goldfeder
Clinical-grade genome sequencing and interpretation require accurate and complete genotype calls across the entire genome. While single nucleotide variant detection is highly accurate and consistent, these variants explain only a small fraction of disease risk. Other types of variation that disrupt the open reading frame, such as insertions and deletions (INDELs), are more likely to be harmful. However, current methods have low sensitivity for larger (≥5 bases) INDELs, primarily because of the challenges of aligning sequence reads that span INDELs. We present Scotch, a novel INDEL detection method that leverages signatures of poor read alignment, read depth information, and machine learning approaches to accurately identify INDELs from next-generation DNA sequencing data. Using biologically realistic simulated genomes and sequence reads with technologically representative error profiles (generated by ART), we evaluate Scotch and several currently available INDEL callers. We show that Scotch has higher sensitivity than current methods, particularly for larger INDELs. Finally, we validate INDELs that Scotch discovered in one individual, NA12878, and show that Scotch has high positive predictive value. This method will enable researchers and clinicians to more accurately identify INDELs associated with previously unexplained genetic conditions.
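The abstract names "signatures of poor read alignment" without listing them. One commonly used signature is soft clipping, which aligners report in the SAM CIGAR string when part of a read cannot be aligned, as often happens near a large INDEL. The sketch below assumes this feature purely for illustration (it is not Scotch's feature set): it parses CIGAR strings and averages the soft-clipped fraction over the reads in a window, where `window_signature` and the example reads are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_CONSUMING = "MIS=X"   # operations that consume read (query) bases, per the SAM spec

def cigar_features(cigar):
    """Soft-clipped fraction of the read and inserted/deleted base counts for one CIGAR string."""
    totals = {}
    for length, op in CIGAR_OP.findall(cigar):
        totals[op] = totals.get(op, 0) + int(length)
    read_length = sum(totals.get(op, 0) for op in QUERY_CONSUMING)
    return {
        "soft_clip_fraction": totals.get("S", 0) / read_length if read_length else 0.0,
        "inserted_bases": totals.get("I", 0),
        "deleted_bases": totals.get("D", 0),
    }

def window_signature(cigars):
    """Mean soft-clipped fraction over the reads overlapping a window (hypothetical feature)."""
    features = [cigar_features(c) for c in cigars]
    return sum(f["soft_clip_fraction"] for f in features) / len(features)

reads_in_window = ["100M", "60M40S", "30S70M", "100M"]
print(window_signature(reads_in_window))
```

Soft clips are informative precisely because an INDEL too large for the aligner to model shows up not as an `I` or `D` operation but as the unaligned tail of the read.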
105
MAYO OMICS REPOSITORY FOR TRANSLATIONAL MEDICINE
Iain Horton, Jeanette Eckel-Passow, Steven Hart, Shannon McDonnell, David Mead, Gay Reed, Greg Dougherty, Jason Ross, Julie Swank, Mark Myers, Mathieu Wiepert, Rama Volety, Tony Stai, Yaxiong Lin, Robert Freimuth
Mayo Clinic
Presenting author: Iain Horton
The Mayo Clinic Genomic Data Warehouse has established the infrastructure foundation, processes, and applications to meet the translational needs of the Mayo Clinic Center for Individualized Medicine (CIM). Through a streamlined and automated data pipeline, next-generation sequencing (NGS) results are loaded and integrated with clinical data, providing the foundation for the development of revolutionary solutions and discovery in clinical practice and genomic research. Initiated in 2012, with production data ingestion beginning in early 2014, Mayo Clinic's Translational Research Center (TRC) has provided the cornerstone platform for data-centric activities within CIM. Data generated from both the clinical pipeline and the research pipeline are automatically loaded into TRC, with each new bit adding value and power to the system. Two key solutions with significant potential to impact patient care and scientific discovery have been built on this genomic data warehouse. The first is the Molecular Decision Support system, a rule-based pharmacogenomics system that enables Mayo Clinic clinicians to integrate actionable information based on a patient's genotype at the point of care using NGS data. The second is the Mayo Variant Summary application, a cloud-native system that empowers Mayo Clinic researchers to identify rare and actionable genomic variants through dynamic filtering and grouping of subject phenotype and specimen metadata.
106
PHARMACOGENOMICS CLINICAL ANNOTATION TOOL (PHARMCAT)
T.E. Klein1, M. Whirl-Carrillo1, R.M. Whaley1, M. Woon1, K. Sangkuhl1, Lester G. Carter1, H.M. Dunnenberger2, P.E. Empey3, A.T. Frase4, R.R. Freimuth5, A. Gaedigk6, A. Gordon7, C. Haidar8, J.K. Hicks9, J.M. Hoffman8, M.T. Lee10, N. Miller11, S.D. Mooney12, T.N. Person13, J.F. Peterson14, M.V. Relling8, S.A. Scott15, G. Twist11, A. Verma13, M.S. Williams10, C. Wu16, W. Yang8, M.D. Ritchie4,13
1Dept Genetics, Stanford Univ, Stanford, CA; 2Center for Molecular Medicine, NorthShore University HealthSystem, Evanston IL; 3Department of Pharmacy and Therapeutics, School of Pharmacy, University of Pittsburgh; 4Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA; 5Department of Health Sciences Research, Mayo Clinic, Rochester MN; 6Division of Clinical Pharmacology, Toxicology & Therapeutic Innovation, Children’s Mercy-Kansas City, Kansas City, MO; 7Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA; 8St. Jude Children's Research Hospital, Memphis, TN; 9DeBartolo Family Personalized Medicine Institute, H. Lee Moffitt Cancer Center, Tampa, FL; 10Genomic Medicine Institute, Geisinger Health System, Danville, PA; 11Center for Pediatric Genomic Medicine, Children’s Mercy, Kansas City, MO; 12Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA; 13Biomedical and Translational Informatics, Geisinger Health System, Danville, PA; 14Vanderbilt University Medical Center, Nashville, TN; 15Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY; 16Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA
Teri Klein
Pharmacogenomics (PGx) decision support and return of results are an active area of genomic medicine implementation at many healthcare organizations and academic medical centers. The Clinical Pharmacogenetics Implementation Consortium (CPIC) has established guidelines for gene-drug pairs that can and should lead to prescribing modifications based on genetic variant(s). One of the challenges in implementing PGx is extracting genomic variants and assigning haplotypes (including star-alleles) from genetic data derived from sequencing and genotyping technologies in order to apply the prescribing recommendations of CPIC guidelines. In a collaboration between the PGRN Statistical Analysis Resource (P-STAR), The Pharmacogenomics Knowledgebase (PharmGKB), the Clinical Genome Resource (ClinGen), and CPIC, we are developing a software tool that extracts all variants in CPIC level-A genes (with the exception of G6PD and HLA) from a genetic dataset produced by sequencing or genotyping technologies (represented as a VCF file), interprets the variant alleles, infers diplotypes, and generates an interpretation report based on CPIC guidelines. The CPIC pipeline report can then be used to inform prescribing decisions. We assembled a focus group of thought leaders in PGx to brainstorm the issues and to design the software pipeline. We hosted a one-week Hackathon at PharmGKB at Stanford University to bring together computer programmers and scientific curators to implement the first version of this tool. Through this process, we have uncovered many of the challenges surrounding PGx implementation. For example, the inference of diplotypes is challenging for several CPIC level-A genes. This software pipeline will be made available under the Mozilla Public License (MPL 2.0) and disseminated on GitHub for the scientific and clinical community to test, explore, and improve. PharmCAT will give sites implementing PGx a way to interpret genomic results more consistently and to link those results to published clinical guidelines. Furthermore, we are assembling (and will be maintaining) the translation tables that underlie the tool, which will significantly reduce the effort required to implement PGx clinically and ensure more uniform interpretation of PGx knowledge. As precision medicine continues to move into clinical practice, implementation workflows for PGx, like PharmCAT, will enable standardized and consistent implementation of PGx genes.
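Why diplotype inference from unphased genotypes can be ambiguous is easy to see in a minimal sketch. The allele definitions, position names and the `infer_diplotypes` helper below are hypothetical and are not PharmCAT's translation tables or code; the sketch simply lists every pair of defined haplotypes whose combined alleles reproduce the observed genotype at each position.

```python
from itertools import combinations_with_replacement

# Hypothetical allele-definition table (not CPIC/PharmCAT data):
# star allele -> {position: allele}. Unlisted positions carry the
# reference allele "R"; "A" marks the alternate allele.
ALLELE_DEFS = {
    "*1": {},
    "*2": {"pos1": "A"},
    "*3": {"pos2": "A"},
    "*4": {"pos1": "A", "pos2": "A"},
}
POSITIONS = ["pos1", "pos2"]


def haplotype_alleles(star):
    """Allele carried at each position by a defined haplotype."""
    return [ALLELE_DEFS[star].get(pos, "R") for pos in POSITIONS]


def infer_diplotypes(genotype):
    """Return every haplotype pair consistent with an unphased genotype.

    `genotype` maps each position to an unordered pair of alleles,
    e.g. {"pos1": ("R", "A"), "pos2": ("R", "R")}.
    """
    matches = []
    for h1, h2 in combinations_with_replacement(sorted(ALLELE_DEFS), 2):
        per_position = zip(haplotype_alleles(h1), haplotype_alleles(h2))
        if all(sorted(pair) == sorted(genotype[pos])
               for pos, pair in zip(POSITIONS, per_position)):
            matches.append(f"{h1}/{h2}")
    return matches


print(infer_diplotypes({"pos1": ("R", "A"), "pos2": ("R", "A")}))
# prints ['*1/*4', '*2/*3']
```

With these toy definitions, a sample heterozygous at both positions is consistent with both *1/*4 and *2/*3; a real tool has to resolve such ties by other means (for example, phased data), which is one reason diplotype inference is hard for some genes.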
107
PCSK9 MODULATING VARIANTS IN FAMILIAL HYPERCHOLESTEROLEMIA
Sarathbabu Krishnamurthy1, Diane Smelser1, Manickam Kandamurugu1, Joseph Leader1, Noura S. Abul-Husn2, Alan R. Shuldiner2, David H. Ledbetter1, Frederick E. Dewey2, David J. Carey1, Michael F. Murray1, Raghu P.R. Metpally1
1Geisinger Health System; 2Regeneron Genetics Center
Sarathbabu Krishnamurthy
BACKGROUND: Highly penetrant autosomal dominant familial hypercholesterolemia (FH) is known to be caused by pathogenic loss-of-function (LOF) variants in LDLR and gain-of-function variants in the PCSK9 and APOB genes. Beyond the causative role of PCSK9 in FH, PCSK9 LOF variants are associated with lower serum low-density lipoprotein cholesterol (LDL-C) and total cholesterol. The aims of this study were to (1) identify rare novel PCSK9 gene variants that lead to complete or partial loss of protein function in the DiscovEHR cohort; (2) explore the prevalence of PCSK9 LOF variants in a subset of FH patients; and (3) examine whether FH patients carrying PCSK9 LOF variants show lower plasma LDL-C and cardiovascular risk. METHODS: We analyzed whole exome sequences from 51,289 individuals in the DiscovEHR cohort who consented to participate in the Geisinger Health System’s MyCode Community Health Initiative. Rare missense and predicted loss-of-function (pLOF) coding variants in PCSK9 were identified by integrating bioinformatics analyses with LDL-C and total cholesterol measures from the electronic health records (EHR). RESULTS: In the overall DiscovEHR cohort, we identified 20 missense and 13 pLOF (2 splice donor, 6 stop-gained, and 5 frameshift) rare variants in PCSK9, including 15 novel variants that were associated with lower LDL-C and total cholesterol levels. LDL-C in pLOF carriers was significantly lower than in missense carriers with presumed partial loss of function (p < 0.0012). Patients carrying rare PCSK9 missense variants with presumed partial LOF, or LOF variants, had a significantly lower incidence of coronary events than the control group (p < 0.0001). In FH patients, the LDL-lowering PCSK9 R46L variant, previously reported at 3% prevalence, was enriched at 9.6% and was associated with lower LDL-C compared with FH patients not carrying an R46L allele. A novel PCSK9 missense variant (G316S) was also present in FH patients, with a prevalence of 0.8%, and showed an LDL-lowering phenotypic effect in an imputed family pedigree. CONCLUSIONS: Overall, 11.8% of the FH patients in the DiscovEHR cohort were identified as also carrying a PCSK9 variant that modulates their LDL-C and serum cholesterol levels.
108
INTEGRATIVE NETWORK ANALYSIS OF PROSTATE TISSUE LINCRNA-MRNA EXPRESSION PROFILES REVEALS POTENTIAL REGULATORY MECHANISMS OF PROSTATE CANCER RISK LOCI
Nicholas B. Larson1, Shannon McDonnell1, Zach Fogarty1, Melissa Larson1, John Cheville2, Shaun Riska1, Saurabh Baheti1, Asha A. Nair1, Daniel O’Brien1, Jaime Davila1, Daniel Schaid1, Stephen N. Thibodeau2
1Department of Health Sciences Research, Mayo Clinic, Rochester, MN; 2Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN
Nicholas Larson
Large-scale genome-wide association studies have identified 146 loci associated with risk of developing prostate cancer (PRCA). However, most of these loci do not lie in close proximity to protein-coding genes and are presumed to be regulatory in nature. Downstream regulation of protein-coding genes related to PRCA development may be mediated by cis-acting regulation of nearby transcripts, also known as cis-mediated trans-eQTLs. This cis-mediator causal relationship consists of a regulatory variant, a nearby cis-regulated gene, and the downstream regulated trans target gene. Cis-mediators may include transcription factors, signaling proteins, and long intergenic non-coding RNAs (lincRNAs). LincRNAs have been implicated in a host of regulatory functions, such as chromatin remodeling and transcriptional co-activation, and have previously been identified as diagnostic and prognostic biomarkers for a number of cancers. However, their role in cancer development and progression is poorly understood. To explore the hypothesis that cis-mediated trans-eQTLs may play a role in PRCA risk, we leveraged an eQTL dataset of 471 samples of normal prostate tissue from prostate/bladder cancer patients with available RNA-Seq and imputed Illumina Infinium 2.5M genotype data. We first conducted an initial transcriptome-wide eQTL screen of all lincRNAs and mRNAs with 8,073 SNPs in high linkage disequilibrium (r² > 0.5) with previously identified PRCA risk-associated variants, identifying approximately 5,000 transcripts (FDR < 0.10) as putatively associated (cis or trans). We then constructed an undirected Gaussian graphical regulatory network from the expression profiles of this transcript subset, identifying 87,468 connections. To identify candidate cis-mediator node pairs in the expression network, we isolated a subset of cis-associated transcripts (lincRNA or mRNA) at a strict Bonferroni significance threshold. We then identified all mRNA nodes connected to these cis-nodes that were distal to the cis-variant (>1 Mb) and had evidence of a trans-association with the cis-variant (P < 1E-04), resulting in 9 candidate cis-mediator trios. Finally, we applied causal mediation analysis to test the proportion of the trans-association that is mediated by the cis-regulated transcript, resulting in 7/9 significant cis-mediator relationships. Transcription factor HNF1B was identified as a significant mediator of the trans-associations between rs11263762 and three mRNAs: SRC, MIA2, and SEMA6A. All three exhibited concomitant upregulation with HNF1B. Notably, HNF1A has been shown to stimulate SRC expression via an alternative promoter, while MIA2 is also a known HNF1A target. Dysregulation of SEMA6A has been observed in PRCA metastases, and SEMA6A plays a potential role in angiogenesis through interaction with VEGFR2. MSMB and NDRG1 both demonstrate androgen-stimulated expression in prostate tissue and showed a recessive pattern of expression dysregulation with rs10993994. Despite a small sample size, we replicated multiple trans-eQTLs from these cis-mediator trios in the GTEx prostate tissue eQTL dataset (P < 0.05). Together, our findings suggest that dysregulation of RNA expression may play a role in genetic predisposition to PRCA.
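The abstract does not say which mediation estimator was used. As an assumption, the sketch below uses the classic product-of-coefficients approach on simulated data: the cis effect of the variant on the mediator (a), the effect of the mediator on the trans target adjusted for genotype (b), and the total genotype effect on the target (c); the proportion mediated is a·b/c. Variable names and effect sizes are hypothetical, and a real analysis would add covariates and a significance test (for example, bootstrapping the indirect effect).

```python
import numpy as np


def ols(y, *predictors):
    """OLS coefficients of y on the predictors, intercept first."""
    X = np.column_stack([np.ones_like(y)] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def proportion_mediated(genotype, mediator, target):
    """Product-of-coefficients estimate of the share of the genotype -> target
    (trans) effect that passes through the cis-regulated mediator."""
    c_total = ols(target, genotype)[1]          # total effect G -> T
    a = ols(mediator, genotype)[1]              # G -> M (cis-eQTL effect)
    b = ols(target, genotype, mediator)[2]      # M -> T, adjusted for G
    return a * b / c_total


# Simulated trio with hypothetical effect sizes; n matches the 471 samples above.
rng = np.random.default_rng(0)
n = 471
g = rng.binomial(2, 0.3, size=n).astype(float)   # allele dosage 0/1/2
m = 0.8 * g + rng.normal(size=n)                 # cis-regulated transcript
t = 0.5 * m + 0.1 * g + rng.normal(size=n)       # distal trans target
# Population value for this simulation: (0.8 * 0.5) / (0.8 * 0.5 + 0.1) = 0.8
print(round(proportion_mediated(g, m, t), 2))
```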
109
INTEGRATED ANALYSIS OF GENOMICS, PROTEOMICS, AND PHOSPHOPROTEOMICS IN CELLS AND TUMOR SAMPLES
Jason E. McDermott1, Tao Liu1, Samuel Payne1, Vladislav Petyuk1, Richard Smith1, Philipp Mertins2, Steven Carr2, Karin Rodland1
1Pacific Northwest National Laboratory, 2Broad Institute
Jason McDermott
As part of the Clinical Proteomic Tumor Analysis Consortium (CPTAC), we have recently published the first large-scale proteomic and phosphoproteomic analysis of high-grade serous ovarian tumors. We observed that phosphorylation status was an excellent indicator of pathway activity and could discriminate between patient survival times. In the current work, we have combined these data with comparable data from breast cancer tumors and from cancer cell lines treated with kinase inhibitors to answer several fundamental questions about the role of phosphorylation in cellular processes and cancer. The total dataset comprised over 150 samples with very deep proteomic coverage (>20,000 phosphopeptides confidently identified). We first found that the correlation between kinase protein abundance and the abundance of phosphorylated target peptides was very low, indicating that kinase abundance is not a good overall proxy for phosphorylation status. However, highly correlated kinase-substrate pairs were significantly more likely to be true relationships (based on existing knowledge), demonstrating that this method could be used to predict novel kinase targets in some cases. We used this analysis to identify several novel kinase-substrate relationships that were differential between tumor subtypes and that correlated with pathways in which phosphorylation was affected by drug treatment. These relationships are currently under investigation as potential novel targets for therapeutic intervention. To better analyze cancer-relevant pathway activity, we developed a novel approach that characterizes correlation, differential abundance, and statistical interactions between components to analyze multiple omics types in the context of signaling and functional pathways. We used this approach, called Layered Enrichment Analysis of Pathways (LEAP), to identify active pathways in molecular subtypes of ovarian and breast cancer, as well as several novel subpopulations of patients displaying uniquely dysregulated pathways. Our results show that integration of multiple omics types has great potential for the development of novel therapeutic approaches for personalized medicine.
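The kinase-substrate correlation screen can be sketched as follows, assuming Pearson correlation of abundances across samples. The abstract does not state which correlation measure or cutoff was used, and the kinase and site names and the data below are hypothetical.

```python
import numpy as np


def zscore_rows(x):
    """Standardize each row to mean 0 and (population) standard deviation 1."""
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)


def rank_kinase_substrate_pairs(kinase_abund, phospho_abund, kinases, sites):
    """Rank every kinase/phosphosite pair by Pearson correlation across samples.

    kinase_abund:  (n_kinases, n_samples) kinase protein abundance
    phospho_abund: (n_sites, n_samples) phosphopeptide abundance
    """
    n_samples = kinase_abund.shape[1]
    # The mean product of z-scores equals the Pearson correlation coefficient.
    r = zscore_rows(kinase_abund) @ zscore_rows(phospho_abund).T / n_samples
    pairs = [(kinases[i], sites[j], float(r[i, j]))
             for i in range(len(kinases)) for j in range(len(sites))]
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Hypothetical data: 3 kinases, 4 phosphosites, 150 samples.
rng = np.random.default_rng(0)
kin = rng.normal(size=(3, 150))
phos = rng.normal(size=(4, 150))
phos[2] += 1.5 * kin[1]            # plant one genuine kinase-substrate link
top = rank_kinase_substrate_pairs(
    kin, phos, ["KIN_A", "KIN_B", "KIN_C"], ["S1", "S2", "S3", "S4"])[:3]
for kinase, site, r in top:
    print(kinase, site, round(r, 2))
```

Consistent with the abstract's observation, most pairs in such a screen show weak correlation; only the top of the ranked list would be carried forward as candidate relationships.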
110
NETDX: PATIENT CLASSIFICATION USING INTEGRATED PATIENT SIMILARITY NETWORKS
Shraddha Pai, Shirley Hui, Ruth Isserlin, Hussam Kaka, Gary D. Bader
The Donnelly Centre, University of Toronto
Shraddha Pai
Patient classification has widespread biomedical and clinical applications, including diagnosis, prognosis, disease subtyping and treatment response prediction. A general-purpose and clinically relevant prediction algorithm should be accurate and generalizable, be able to integrate diverse data types (e.g. clinical, genomic, metabolomic, imaging), handle sparse data, and be intuitive to interpret. We describe netDx, a supervised patient classification framework based on patient similarity networks, that meets the above criteria (Ref 1). netDx models input data as patient networks and uses network integration and machine learning for feature selection. We demonstrate the utility of netDx by integrating gene expression and copy number variants to classify breast cancer tumours as being of the Luminal A subtype (N = 348 tumours; Ref 2). Using gene expression data, netDx performed as well as or better than established state-of-the-art machine learning methods, achieving a mean accuracy of 89% (2% s.d.) in classifying Luminal A. In the second application, we predict case/control status in autism spectrum disorders based on the occurrence of rare copy number deletions in metabolic pathways (N = 3,291 patients; Ref 3); this predictor achieved better performance than previously published methods. netDx uses pathway features to aid biological interpretability, and results can be visualized as an integrated patient similarity network to aid clinical interpretation. Upon publication, netDx software will be made publicly available via GitHub; the software provides worked examples and easy-to-use functions for designing custom predictor workflows. More at http://netdx.org.
References:
1. netDx preprint: http://dx.doi.org/10.1101/084418
2. The Cancer Genome Atlas (2012). Nature 490:61.
3. Pinto et al. (2014). Am J Hum Genet 94(5):677.
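The abstract does not detail netDx's network integration and feature selection, so the sketch below shows only the general idea under stated assumptions: a data type becomes a patient-by-patient similarity network (Pearson correlation between patients, negative values set to zero), and known labels are propagated over the network by solving (I + λL)f = y on the symmetric normalized Laplacian L, a standard label-propagation formulation. The function names and the parameter `lam` are hypothetical and are not netDx's API.

```python
import numpy as np


def similarity_network(features):
    """Patient-by-patient Pearson correlation network (rows are patients).

    Negative correlations are set to zero and self-edges are removed.
    """
    w = np.corrcoef(features)
    np.fill_diagonal(w, 0.0)
    return np.clip(w, 0.0, None)


def propagate_labels(w, labels, lam=1.0):
    """Score patients by solving (I + lam * L) f = y on a normalized Laplacian.

    labels: +1 for known cases, -1 for known controls, 0 for unlabeled patients.
    """
    d = w.sum(axis=1)
    d[d == 0] = 1.0                                   # isolated nodes: avoid 1/0
    d_inv_sqrt = 1.0 / np.sqrt(d)
    lap = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    return np.linalg.solve(np.eye(len(w)) + lam * lap, labels.astype(float))
```

Networks built from different data types (for example expression and copy number) could be combined by averaging their weight matrices before propagation; unlabeled patients with positive scores sit closer, in network terms, to the known cases.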
111
PREVALENCE AND DETECTION OF LOW-ALLELE-FRACTION VARIANTS IN CLINICAL CANCER SAMPLES
Hyun-Tae Shin1,2, Jae Won Yun1,2, Nayoung K.D. Kim1, Yoon-La Choi2,3, Woong-Yang Park1,2,4, Peter J. Park5
1Samsung Genome Institute, Samsung Medical Center, Seoul, Korea; 2Samsung Advanced Institute of Health Science and Technology, Sungkyunkwan University, Seoul, Korea; 3Department of Pathology & Translational Genomics, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea; 4Department of Molecular Cell Biology, Sungkyunkwan University School of Medicine, Seoul, Korea; 5Department of Biomedical Informatics, Harvard Medical School, Boston, MA
Hyun-Tae Shin
Clinical application of sequencing-based assays requires high sensitivity and specificity for detecting genomic alterations. Our analysis of more than 5,000 cancer samples reveals that a significant fraction of clinically actionable somatic variants may have low variant allele fractions (VAF), indicating the importance of very high-coverage sequencing for these patients. As a case study, we describe refractory cancer patients with clinical responses to therapies that target low-VAF alterations.
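Why low-VAF variants call for very high coverage can be shown with a simple binomial model, which is an assumption and not taken from the abstract: if a variant has allele fraction f and the site is covered by n reads, the number of variant-supporting reads is Binomial(n, f), ignoring sequencing error and sampling bias. The probability of seeing at least k supporting reads, where k stands in for a hypothetical caller threshold, then grows with depth.

```python
from math import comb


def detection_probability(depth, vaf, min_alt_reads):
    """P(at least `min_alt_reads` variant-supporting reads), assuming the alt-read
    count is Binomial(depth, vaf); sequencing error and sampling bias are ignored."""
    p_fewer = sum(comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
                  for k in range(min_alt_reads))
    return 1.0 - p_fewer


# Hypothetical threshold of 5 supporting reads for a variant at 2% VAF.
for depth in (100, 250, 500, 1000):
    print(depth, round(detection_probability(depth, vaf=0.02, min_alt_reads=5), 3))
```

At 100x, such a variant is expected to contribute only about two supporting reads, so a five-read threshold misses it most of the time; at 1000x the expectation is about twenty reads.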
112
A METHYLATION-TO-EXPRESSION FEATURE MODEL FOR GENERATING ACCURATE PROGNOSTIC RISK SCORES AND IDENTIFYING DISEASE TARGETS
Jeffrey A. Thompson1, Carmen J. Marsit2
1Dartmouth College, 2Emory University
Jeffrey Thompson
Many researchers now have multiple high-dimensional molecular and clinical datasets available when studying a disease. As we enter this multi-omic era of data analysis, new approaches that combine different levels of data (e.g. at the genomic and epigenomic levels) are required to fully capitalize on this opportunity. In this work, we outline a new approach to multi-omic data integration, which creates a model of methylation dysregulation and its effect on gene expression and then combines this molecular information with clinical predictors as part of a single analysis to create a prognostic risk score for clear cell renal cell carcinoma. The approach integrates data in multiple ways, yet creates models that are relatively straightforward to interpret and that perform at a high level. Over 100 random splits of the data into training and testing sets, our model had the highest median C-index of any method we tried, at 0.792. Furthermore, we demonstrated that our molecular risk predictor is independent of clinical covariates and that the combined model achieves statistically significantly higher accuracy than either data type alone. Additionally, the proposed data integration process itself captures relationships in the data that represent highly disease-relevant functions. The gene signature we identify for clear cell renal cell carcinoma prognosis is enriched for genes that are central nodes in a protein-protein interaction network associated with the JAK-STAT signaling cascade, which is itself a known factor in kidney cancer progression. Our signature is also enriched for genes in pathways involved in immune response, which are increasingly targeted by novel cancer therapies. We call this model the methylation-to-expression feature model (M2EFM). Although one of the other approaches we considered also resulted in a highly accurate model, M2EFM performed better with a far more parsimonious model that sheds light on the potential relationship between abnormal gene regulation and cancer prognosis. Given our results, we think that further development of this approach is warranted.
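The C-index used to compare models is a concordance statistic for survival data. The sketch below is a plain O(n²) implementation of Harrell's C-index for right-censored data, assuming that higher risk scores predict earlier events; it is not the authors' evaluation code, and pairs with tied survival times are simply not counted.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an observed event before
    patient j's time. It is concordant when i also has the higher risk score;
    ties in risk score count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue                      # a censored patient cannot anchor a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Four hypothetical patients; the third is censored at time 7.
print(harrell_c_index([2, 5, 7, 9], [True, True, False, True], [0.9, 0.4, 0.6, 0.1]))
# prints 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported median of 0.792 in context.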
113
CYP2D6 DIPLOTYPE CALLING FROM WGS USING ASTROLABE: UPDATE
Andrea Gaedigk1, Greyson P. Twist2, Sarah Soden2, Emily G. Farrow2, Neil A. Miller2
1Division of Clinical Pharmacology & Therapeutic Innovation, Children's Mercy, Kansas City, School of Medicine, University of Missouri-Kansas City; 2Center for Pediatric Genomic Medicine, Children's Mercy, Kansas City
Greyson Twist
Background: To facilitate haplotype calling and translation into phenotype, we previously developed a probabilistic scoring system, Astrolabe (initially called Constellation; Twist et al. 2016, Gen Med 1:15007), enabling automated CYP2D6 diplotype calling from whole genome sequencing. We have implemented a series of improvements to increase call accuracy as well as ease of use. Methods: The study was approved by the Institutional Review Board of Children’s Mercy Kansas City and included a total of 85 subjects (7 HapMap; 78 patients/parents). WGS data were reanalyzed with the DRAGEN Bio-IT processor (Edico Genome) to improve the quality of variant calls. The Astrolabe CYP2D6 allele definition table was expanded to include (a) additional variants available through the P450 Nomenclature Database; (b) variants characterized by our laboratory but not available through the Nomenclature Database; and (c) resequencing of some alleles (e.g. *10, *17) for which only exons are annotated by the Nomenclature Database. Programming errors in the scoring algorithm were fixed and unit-tested, and support was added for a broad range of variant file input types (vcf, gvcf, tabix, .gz). Improvements also include versioning of the Astrolabe tool and of the nomenclature data from which calls are generated. To account for haplotype and diplotype combinations not observed in our sample set, simulations of all possible diplotype combinations were performed using the ART read simulator and the DRAGEN analysis pipeline. Astrolabe is available at https://www.childrensmercy.org/genomesoftwareportal/. Results: To maximize Astrolabe call accuracy, we removed CYP2D6*1E, *3B, *4A-L, *4N, *6D, *10C-D, and *45B from the call set because of incomplete allele definitions (based on exons only) or SNP(s) that are not unique to an allele. For example, 1749A>G is part of the CYP2D6*3B and *103 definitions, but also appears to be present on some *1 subvariants. Likewise, 3288A>G is not limited to CYP2D6*6D, as implied by the nomenclature database, which caused erroneous Astrolabe calls. Calls with our revised definitions were compared with those obtained by genotyping. Astrolabe also accurately identified subjects with copy number variations, including the CYP2D6*5 deletion (n=5) and gene duplications (n=2). Also, the increased variant calling accuracy of the DRAGEN pipeline improved the calling of several samples (n=). Astrolabe correctly called 7731/8128 simulated diplotypes (95% recall; 133 missed and 264 multiple calls). Of the missed calls, 124 were due to *38 being called as *1. Discussion: The series of improvements to Astrolabe increased call accuracy and minimized the number of no-calls. Phenotype prediction based on Astrolabe was superior to that derived from a limited genotype panel. Continued refinement of existing allele definitions and the inclusion of novel haplotype definitions will further improve the Astrolabe tool. We are currently applying Astrolabe to other NGS datasets, including exomes and targeted NGS panels.
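The simulation counts reported for Astrolabe are internally consistent: the diplotypes not called correctly are exactly the missed calls plus the multiple calls, and a single confusion accounts for most of the missed calls.

```latex
\[
\text{recall} = \frac{7731}{8128} \approx 0.951, \qquad
8128 - 7731 = 397 = \underbrace{133}_{\text{missed}} + \underbrace{264}_{\text{multiple calls}}, \qquad
\frac{124}{133} \approx 0.93 .
\]
```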
114
INTEGRATION, INTERPRETATION AND DISPLAY OF MULTI-OMIC DATA FOR PRECISION MEDICINE
David S. Wishart1, Ana Marcu1, An Chi Guo1, Ash Anwar2, Solveig Johannessen3, Craig Knox4, Michael Wilson4, Christoph H. Borchers5, Pieter Cullis6, Robert Fraser2
1University of Alberta, 2Molecular You Inc., 3Educe Design Inc., 4OMx Inc., 5University of Victoria, 6University of British Columbia
David Wishart
The goal of precision medicine is to use advanced multi-omic technologies to improve the accuracy of medical diagnoses and enhance the individualization of medical treatment. The fundamental challenge in precision medicine lies not in the measurement or collection of multi-omic data but in its delivery. In particular, the integration, interpretation and display of multi-omic data have proven especially problematic. Here we describe some of our experiences in tackling this problem and outline a number of important findings that we believe are worth sharing. Our most important finding was the need to use high-quality, quantitative ‘omics data. Measuring ‘omics data in absolute quantitative terms ensures greater reproducibility and permits direct comparisons to well-established clinical reference values. Several ‘omics laboratories offering quantitative services have been identified, and these are described here. Second, we discovered that custom databases containing biomarker-disease data are essential. Very few of these kinds of databases exist, but they are necessary for the comparison and full integration of multi-omic data. In particular, they provide the information needed to integrate multi-omic measures and to determine disease risk. A brief description of a few of these biomarker-disease databases is provided. Third, we discovered that color-coded graphs, hyperlinked to detailed textual explanations, are necessary for the easy interpretation of the multi-omic data by both patients and physicians. An example of a well-designed, web-enabled “dashboard” is shown to highlight these findings. Finally, we found that comprehensive databases of actionable responses must be prepared so that detailed, customizable medical, lifestyle, diet or pharmacological guidance can be provided to treat or prevent conditions detected by these multi-omic measurements. Examples of several omics-derived, actionable responses are provided to clarify this point. These findings, along with several associated software tools and databases, have recently been integrated into an automatic workflow that allows a wide range of multi-omic measurements to be integrated, interpreted and displayed for precision or personalized medicine applications.
115
BIOTHINGS APIS: LINKED HIGH-PERFORMANCE APIS FOR BIOLOGICAL ENTITIES
Jiwen Xin1, Cyrus Afrasiabi1, Sebastien Lelong1, Ginger Tsueng1, Sean D. Mooney2, Andrew I. Su1, Chunlei Wu1
1The Scripps Research Institute, 2The University of Washington
Chunlei Wu
The accumulation of biological knowledge and the advance of web and cloud technology are growing in parallel. Recently, many biological data providers have started to offer web-based APIs (Application Programming Interfaces) for accessing data in a simple and reliable manner, in addition to the traditional raw flat-file downloads. Web APIs provide many benefits over traditional file downloads. For instance, users can request specific data, such as a list of genes of interest, without having to download the entire dataset, thereby obtaining the latest data on demand and reducing computation and data transfer times. This means that programmers can spend less time wrangling data and more time on analysis and discovery. Building and deploying scalable, high-performance web APIs requires sophisticated software engineering techniques. We previously developed high-performance and scalable web APIs for gene and genetic variant annotations, accessible at MyGene.info and MyVariant.info. These two services are a tangible implementation of our expertise and collectively serve over 4 million requests every month from thousands of unique users. Crucially, the underlying design and implementation of these systems are in fact not specific to genes or variants, but can easily be adapted to other biomedical data types such as drugs, diseases, pathways, species, genomes, domains and interactions. We are currently expanding the scope of our platform to other biological entities. Collectively, we refer to them as “BioThings APIs” (http://biothings.io). We also applied JSON-LD (JSON for Linking Data) technology in the development of BioThings APIs. JSON-LD provides a standard way to add semantic context to the existing JSON data structure, for the purpose of enhancing the interoperability between APIs. We have demonstrated applications of JSON-LD with BioThings APIs, including data discrepancy checks as well as cross-linking between APIs.
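The idea of adding semantic context to an existing JSON record with JSON-LD can be sketched with the standard library alone. The record fields, the helper names and the vocabulary IRIs in the `@context` below are hypothetical placeholders, not the context documents actually published by BioThings APIs; the point is that `@context` maps plain JSON keys to globally unique identifiers, so two APIs that use different key names can be recognized as describing the same property.

```python
import json

# Hypothetical gene record in the style of a JSON gene-annotation API.
record = {"symbol": "CDK2", "entrezgene": 1017, "taxid": 9606}

# Hypothetical @context: plain keys mapped to IRIs (placeholder example.org URLs,
# not the vocabulary BioThings APIs actually use).
context = {
    "symbol": "http://example.org/vocab/geneSymbol",
    "entrezgene": "http://example.org/vocab/entrezGeneId",
    "taxid": "http://example.org/vocab/taxonomyId",
}


def add_jsonld_context(doc, ctx):
    """Return a copy of `doc` with a JSON-LD @context; the data are unchanged."""
    return {"@context": ctx, **doc}


def shared_properties(ctx_a, ctx_b):
    """Map keys of API A to keys of API B that resolve to the same IRI."""
    key_by_iri_b = {iri: key for key, iri in ctx_b.items()}
    return {key: key_by_iri_b[iri] for key, iri in ctx_a.items() if iri in key_by_iri_b}


print(json.dumps(add_jsonld_context(record, context), indent=2))

# A second (hypothetical) API that names the same property differently.
other_context = {"gene_symbol": "http://example.org/vocab/geneSymbol"}
print(shared_properties(context, other_context))   # {'symbol': 'gene_symbol'}
```

Because the original keys and values are untouched, clients that ignore `@context` keep working, while linked-data-aware clients can use the IRIs for cross-linking and consistency checks.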
116
SINGLE-CELL ANALYSIS AND MODELLING OF CELL POPULATION HETEROGENEITY
POSTER PRESENTATIONS
117
SINGLE CELL SIGNALING STATES REVEAL INDUCTION OF NON-GENETIC VARIATION IN RESISTANCE TO TRAIL-INDUCED APOPTOSIS
Reema Baskar, Harris Fienberg, Garry Nolan, Sean Bendall
Stanford University
Reema Baskar
TNF alpha-related apoptosis-inducing ligand (TRAIL) has been shown to specifically target cancer cells; however, rampant resistance has curtailed its efficacy as a drug. Cell-to-cell variation has previously been linked to resistance to TRAIL-induced apoptosis. We further investigate non-genetic phenotypic variation as a novel mode of drug resistance. Using mass cytometry, we captured high-dimensional, single-cell signaling states of different cancer types over the course of TRAIL treatment. For the first time, we provide a comprehensive single-cell overview of TRAIL signaling dynamics and provide population metrics to quantify heterogeneity within resistance phenotypes. We demonstrate that while all cells respond to TRAIL, a subset of them persist in transient resistant states and do not progress to apoptosis. Our methods show a correlation between heterogeneity of response to TRAIL and the persistence of non-apoptotic, viable cancer cells under drug treatment. We also show that combinatorial therapies designed to inhibit implicated pathways in conserved resistant states do not eradicate resistance and can in fact induce new states of resistance. This study presents experimental and computational tools to investigate non-genetic phenotypic variation as a novel mode of drug resistance in cancer and demonstrates their utility in understanding resistance to TRAIL-induced apoptosis.
118
A NOVEL K-NEAREST NEIGHBORS APPROACH TO COMPARE MULTIPLE BIOLOGICAL CONDITIONS IN SINGLE CELL DATA
Tyler J. Burns1, Garry P. Nolan2, Nikolay Samusik2
1Stanford University School of Medicine, Dept. of Cancer Biology; 2Stanford University School of Medicine, Baxter Laboratory for Stem Cell Biology
Tyler Burns
High-dimensional single-cell data are routinely visualized in two dimensions using dimension reduction algorithms like t-SNE, Principal Component Analysis (PCA), or force-directed graphs. When comparing levels of intracellular proteins in basal versus perturbed cells, clustering must be used to visualize changes in specific markers in a single graph. However, discretizing a dataset does not allow one to understand subtle, rare, and/or continuous biological changes across the original manifold. Here, we present an algorithm that represents each cell’s information content as its average across its k nearest neighbors. This allows comparisons between biological conditions to be made on a per-cell basis. We use this to produce detailed t-SNE maps depicting biological change, and correlation analysis to enumerate signaling responses to perturbation.
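The k-nearest-neighbor averaging step can be sketched with NumPy. This is a minimal brute-force version that assumes Euclidean distance and counts each cell among its own neighbors; the abstract does not specify the distance metric, the value of k, or how neighbors are matched across conditions. The `knn_condition_difference` helper and its `search_cols` parameter are hypothetical: neighbors are searched only on markers assumed to be unaffected by the perturbation, and the per-cell effect is the perturbed neighborhood mean minus the basal neighborhood mean.

```python
import numpy as np


def knn_indices(query, reference, k):
    """Indices of the k nearest reference cells (Euclidean) for each query cell."""
    d2 = ((query[:, None, :] - reference[None, :, :]) ** 2).sum(axis=2)
    return np.argsort(d2, axis=1)[:, :k]


def knn_average(cells, k):
    """Replace each cell's profile by the mean over its k nearest neighbors.

    When a dataset is queried against itself, each cell is its own nearest
    neighbor (distance 0), so it is included in its own average.
    """
    return cells[knn_indices(cells, cells, k)].mean(axis=1)


def knn_condition_difference(basal, perturbed, k, search_cols):
    """Per-cell perturbation effect for every basal cell (hypothetical helper).

    Neighbors are searched only on `search_cols` (assumed to be markers the
    perturbation does not change); all markers are then averaged, and the basal
    neighborhood mean is subtracted from the perturbed neighborhood mean.
    """
    query = basal[:, search_cols]
    basal_nn = knn_indices(query, basal[:, search_cols], k)
    perturbed_nn = knn_indices(query, perturbed[:, search_cols], k)
    return perturbed[perturbed_nn].mean(axis=1) - basal[basal_nn].mean(axis=1)
```

The per-cell differences returned by the last function could then be used to color a t-SNE map of the basal cells, one marker at a time. The brute-force distance matrix is quadratic in the number of cells, so large datasets would need a spatial index instead.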
119
SINGLE-CELL RNA SEQUENCING IN PRIMARY GLIOBLASTOMA: IMPROVING ANALYSIS OF HETEROGENEOUS SAMPLES BY INCORPORATING QUANTIFICATION OF UNCERTAINTY
Wendy Marie Ingram, Debdipto Misra, Nicholas F. Marko, Marylyn Ritchie
Geisinger Health System
Wendy Ingram
Background: Glioblastoma (GBM) is the most common and deadly brain cancer in adults. Its lethality may be attributable to the intrinsic heterogeneity of micro-invasive tumor cells, some of which are unavoidably left behind following tumor resection. This transcriptomic heterogeneity may contribute to the survival and subsequent proliferation of a small subset of cells that are resistant to radiation and chemotherapy. It has long been hypothesized that investigating these tumors at the single-cell level will allow for a better molecular understanding of treatment resistance and the development of novel therapeutic approaches. Recent advances in single-cell capture and sequencing technology now allow these studies to be conducted. However, there are many technical and computational challenges inherent to single-cell transcriptomics that are not addressed by traditional RNA-seq analysis tools. These challenges include uncertainty in technical and biological variance, and they must be carefully considered for biologically and therapeutically relevant conclusions to be reached. Methods: Tumor tissue from two GBM patients undergoing surgical resection as part of standard-of-care therapy was collected at the time of surgery. We used the Fluidigm C1 microfluidics platform to capture single cells, followed by RNA sequencing (RNA-seq) of these cells and of a bulk population of ~10,000 cells from each tumor. We compared two different transcriptomic alignment tools, Bowtie and kallisto, and analyzed the single-cell transcriptional heterogeneity of cells within and between tumors using the recently developed analysis tool sleuth. To the best of our knowledge, we are the first to use this single-cell capture method and to perform single-cell RNA-seq analysis with the newly developed kallisto and sleuth programs on primary GBM tissue samples. Results: We show that the Fluidigm C1 microfluidics single-cell capture method produces high-quality transcriptomic material for RNA-seq and may have benefits over alternative methods (e.g. fluorescence-activated cell sorting), such as shorter preparation time. The kallisto-sleuth analysis programs provide improved estimation of gene expression variability and more reliable clustering of single cells by leveraging the unique features of the equivalency groups and bootstrap estimates of kallisto. Cluster analysis demonstrates that certain cells from both tumors cluster together and share some common expression patterns, but the remaining cells cluster in tumor-specific groups or do not group with other cells. We observe marked intertumor and intratumor transcriptional variability and note that average expression from single cells does not reliably correlate with the bulk-cell RNA-seq abundance estimates. Taken together, we have shown that the combination of Fluidigm C1 and the kallisto-sleuth analysis programs provides useful and reliable methods to obtain and analyze high-quality single-cell RNA-seq data for the investigation of primary tumor tissues.
120
REGISTRATION OF FLOW CYTOMETRY DATA USING SWIFT CLUSTER TEMPLATES TO REMOVE CHANNEL-SPECIFIC OR CLUSTER-SPECIFIC VARIATION
Jonathan A. Rebhahn1, Sally A. Quataert1, Gaurav Sharma2, Tim R. Mosmann1
1Center for Vaccine Biology and Immunology, University of Rochester Medical Center; 2Department of Electrical and Computer Engineering, University of Rochester
Tim Mosmann
Standardization between flow cytometry experiments performed at different times is difficult because variations in cell parameters can be caused by many factors, including changes in antibody reagents, staining protocols, cell handling, different cytometers, and cytometer settings such as photomultiplier amplification voltages. These variations may overwhelm the genuine biological differences being investigated, such as genetic or disease-specific variations between subjects. Technical variations can be partly reduced by manually adjusting analysis gates, but this is subjective and time-consuming. Previous methods for semi-automated adjustment have relied on histogram peaks or manual gating to identify anchor populations. We have now developed fully automated methods for registering flow cytometry samples, i.e. normalizing the fluorescence intensity of each cell in all channels. We take advantage of the high-resolution cluster templates derived by clustering reference samples with the SWIFT algorithm. These templates are Gaussian model descriptions of the multidimensional data. If the samples to be registered are at least moderately similar to the target/reference sample, assigning the test sample to the template results in most cells being assigned to the appropriate cluster, but clusters that have shifted in the test sample then have altered median values in one or more channels. This high-resolution positional information is used for two types of registration. Rigid (per-channel) registration compares cluster locations between the target and the test sample to be registered; the best-fit registration adjustments are determined for each channel and applied incrementally, reassigning the cells at each step to improve the final fit. This objectively uses positional information from all clusters, regardless of cluster size variation, and successfully corrects global artifacts, such as staining or cytometer settings, that cause ‘batch’ differences between assay days. Fluid (per-cluster) registration calculates the registration adjustment required for each cluster in the test sample to overlap fully with its corresponding cluster in the reference sample. This registers clusters more completely and can remove individual variation (due to e.g. genetic or disease-specific effects). Fluid registration removes most positional information; this is desirable if the main experimental outcome is expected to be variation in the number of cells of different types. This method has been applied to datasets that include changes due to assay dates, flow cytometers, subjects, and sequential blood samples. Most variation occurred between cytometers and assay days, less between subjects, and the least between different bleeds from the same person. Registration substantially improved correlations between cluster medians. The number of cells per cluster also showed increased correlation, suggesting that unregistered samples assigned to the cluster templates sometimes had cells assigned to an inappropriate cluster. Thus SWIFT cluster-based registration can improve subsequent flow cytometry analysis. Registered samples can be analyzed by a variety of manual or automated procedures.
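The rigid (per-channel) registration loop can be sketched as follows. This is a simplified stand-in, not the SWIFT implementation: template clusters are represented only by their median vectors, cells are assigned to the nearest template median rather than by Gaussian mixture likelihood, and the per-channel correction at each iteration is a fraction of the median offset between matched cluster medians. The function names and the `n_iter` and `step` parameters are hypothetical.

```python
import numpy as np


def assign_to_template(cells, template_medians):
    """Assign each cell to the nearest template cluster median (Euclidean)."""
    d2 = ((cells[:, None, :] - template_medians[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def rigid_register(cells, template_medians, n_iter=20, step=0.5):
    """Shift each channel so test-sample cluster medians line up with the template.

    Each iteration reassigns cells to template clusters, computes the offset
    between every occupied template median and the matching test-cluster median,
    takes the median offset over clusters for each channel, and applies a
    fraction `step` of it (the incremental adjustment described above).
    """
    shifted = cells.astype(float)                 # copy; works for integer input too
    total_shift = np.zeros(cells.shape[1])
    for _ in range(n_iter):
        labels = assign_to_template(shifted, template_medians)
        offsets = [template_medians[c] - np.median(shifted[labels == c], axis=0)
                   for c in np.unique(labels)]
        channel_shift = step * np.median(offsets, axis=0)
        shifted += channel_shift
        total_shift += channel_shift
    return shifted, total_shift
```

A per-cluster (fluid) variant would instead apply each cluster's own offset to the cells assigned to it, which is why it also removes genuine between-subject positional differences.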
121
WORKSHOP: NO BOUNDARY THINKING IN BIOINFORMATICS
POSTER PRESENTATION
122
ENABLING RICHER DATA INTEGRATION FOR GENOMIC EPIDEMIOLOGY
E. Griffiths1, D. Dooley2, C. Bertelli1, J. Adam3, F. Bristow3, T. Matthews3, A. Petkau3, M. Courtot4, J. A. Carriço5, A. Keddy6, R. Beiko6, L. M. Schriml7, E. Taboada8, M. Graham3, G. Van Domselaar3, W. Hsiao2, F. Brinkman1
1SFU, Burnaby, BC, Canada; 2BC Centre for Disease Control, Vancouver, BC, Canada; 3PHAC, Winnipeg, MB, Canada; 4EBI, Hinxton, Cambridge, UK; 5Univ. of Lisbon, Lisbon, Portugal;
6Dalhousie Univ., Halifax, NS, Canada; 7Univ. of Maryland School of Medicine, Baltimore, MD, USA; 8PHAC, Lethbridge, AB, Canada
Fiona Brinkman
One barrier to effectively capitalizing on whole genome sequence data is efficient, robust annotation and integration of associated contextual data (metadata). Whether the genomic sequence is human, microbial, or from another organism, such contextual data are frequently too unorganized, often in free-text format, to enable effective integration for answering more sophisticated questions. Approaches to help overcome this barrier are illustrated here with the Integrated Rapid Infectious Diseases Analysis (IRIDA.ca) Project and the Genomic Epidemiology Ontology (GenEpiO.org) Consortium. Microbial pathogen whole genome sequencing provides the highest-resolution molecular "fingerprint" for infectious disease epidemiology and is transforming public health practice, enabling more rapid identification of disease outbreaks, their sources, and potential control measures. However, such microbial genomic data (like human 'omic data) must be combined with epidemiological, clinical, laboratory, and other healthcare data ("contextual data") to be meaningfully interpreted for clinical and public health questions and actions. Furthermore, information must be shared between different agencies to efficiently assess and manage risks to human health across jurisdictions. Currently, terminologies describing public health data cannot be easily mapped across functionally similar software systems without intricate intervention by specialists, resulting in data exchange systems that are static and fragile. To promote efficient data exchange and intelligence sharing, we propose an intuitive platform for searching, identifying, and verifying the fundamental healthcare entity elements (ontology terms) to map to institutional application data formats, starting with genomic and public health contextual data. Key innovations are the proposed Genomic Epidemiology Entity Mart (GE2M), which allows users to inspect term definitions, labels, and database cross-references in a user-friendly format, and a software system that allows different jurisdictions to use the terms suitable for them, essentially choosing from a "shopping cart" of options mapped between jurisdictions and organizations. A very preliminary prototype of this concept has been established as part of the IRIDA.ca project and the GenEpiO Consortium (a consortium of 70 researchers from 15 countries interested in contributing to this effort). We hypothesize that a common and accessible ontology entity mart can be developed if appropriate tools for interfacing domain experts with this mart are developed and the mart is first applied to practical microbial genomic epidemiology data sharing needs between select public health systems (with consultation involving a larger consortium). In addition, new genomic data visualization approaches are being developed for integration into the IRIDA software platform, to enable more interactive, flexible visualization of genomic data with different levels or views of contextual data (from finely detailed comparisons of genomic islands and other features between genomes, to examining genomic data in the context of geographical data). IRIDA is being used in Canada's public health agency, and this open-source software is also being installed in other countries interested in co-developing this resource and using a federated data sharing approach.
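The "shopping cart" idea, in which each jurisdiction keeps its own field labels but maps them onto shared ontology terms, can be illustrated with a small Python sketch. Everything here is hypothetical: the term IDs are placeholders rather than real GenEpiO identifiers, the agency names and labels are invented, and the dictionary-based mapping is only an assumption about how such a system might exchange records.

```python
# Hypothetical shared ontology terms (placeholder IDs, not real GenEpiO IDs).
SHARED_TERMS = {
    "TERM:0001": "host age at specimen collection",
    "TERM:0002": "geographic location of sample collection",
}

# Each hypothetical jurisdiction maps its own local field labels onto the
# shared term IDs, i.e. the options it picked from the "shopping cart".
JURISDICTION_LABELS = {
    "agency_a": {"patient_age": "TERM:0001", "collection_site": "TERM:0002"},
    "agency_b": {"age_years": "TERM:0001", "location": "TERM:0002"},
}

# Sanity check: every local label must point at a defined shared term.
for labels in JURISDICTION_LABELS.values():
    assert set(labels.values()).issubset(SHARED_TERMS)


def to_shared(record: dict, source: str) -> dict:
    """Re-key a local record by shared term IDs; unmapped fields are dropped."""
    mapping = JURISDICTION_LABELS[source]
    return {mapping[field]: value for field, value in record.items() if field in mapping}


def from_shared(record: dict, target: str) -> dict:
    """Re-key a term-ID record with the target jurisdiction's own labels."""
    reverse = {term: label for label, term in JURISDICTION_LABELS[target].items()}
    return {reverse[term]: value for term, value in record.items() if term in reverse}


def translate(record: dict, source: str, target: str) -> dict:
    """Exchange a record between two jurisdictions via the shared terms."""
    return from_shared(to_shared(record, source), target)
```

For example, `translate({"patient_age": 42, "collection_site": "Winnipeg"}, "agency_a", "agency_b")` returns `{"age_years": 42, "location": "Winnipeg"}`. Because both sides only agree on term IDs, adding a new jurisdiction means adding one label map rather than one mapping per partner.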
123
AUTHOR INDEX
A
Abrams, Zachary · 59
Abul-Husn, Noura S. · 107
Adam, J. · 122
Adams, Micah · 54
Aevermann, Brian · 37
Afrasiabi, Cyrus · 115
Agarwal, Vibhu · 17
Akbarian, Schahram · 72
Aldrich, Melinda C. · 20, 35
Alkan, Can · 77, 87
Alser, Mohammed · 77
Altman, Russ B. · 79, 90
Andreoletti, Gaia · 101
Andres-Terrè, Marta · 13
Ansel, Mark · 80
Anwar, Ash · 114
Armaselu, Bogdan · 18
Arunachalam, Harish Babu · 18
Ashley, Euan · 104
Aslam, Naureen · 68
Asmann, Yan W. · 85
Ayati, Marzieh · 67
B
Bader, Gary D. · 110
Baheti, Saurabh · 108
Bai, Yongsheng · 68
Bakken, Trygve · 37
Baskar, Reema · 117
Bauer, Christopher R. · 27
Beaulieu-Jones, Brett K. · 19
Bebek, Gurkan · 52
Beck, Andrew · 50
Beck, Mette · 28
Beiko, R. · 122
Bellovich, Keith · 70
Bendall, Sean · 117
Berens, Michael · 31
Berry, Gerald J. · 90
Bertelli, C. · 122
Best, Aaron A. · 2
Bhat, Zeenat · 70
Bichko, Dmitri · 76
Biernacka, Joanna M. · 85
Biggin, Mark D. · 64
Boespflug, Mathieu · 76
Boley, Nathan · 98
Bongen, Erika · 13
Borchers, Christoph H. · 114
Borecki, Ingrid · 34
Borrayo, Ernesto · 63
Bowden, Donald W. · 45
Bowerman, Nathan · 2
Breitenstein, Matthew K. · 96
Breitwieser, Gerda · 34
Brenner, Steven E. · 101
Brinkman, Benjamin H. · 97
Brinkman, F. · 122
Bristow, F. · 122
Bromberg, Yana · 69
Brosius, Frank C. · 70
Brown, Andrew J. Leigh · 83
Brubaker, Douglas · 52
Brunak, Soren · 28
Burns, Tyler J. · 118
Bustillo, Juan R. · 93
C
Cai, Guoshuai · 73
Calhoun, Vince D. · 9, 93
Cao, Mengfei · 3
Carey, David J. · 107
Carr, Steven · 109
Carriço, J. A. · 122
Carter, Lester G. · 106
Cederberg, Kevin · 18
Chan, Yu-Feng Yvonne · 23
Chance, Mark · 67
Chang, Rui · 11
Chasioti, Danai · 71
Chaudhary, Kumardeep · 74
Chen, Rong · 56
Chen, Yii-Der I. · 45
Cheung, Philip · 84
Cheville, John · 108
Chew, Guo-Liang · 64
Choi, Yoon-La · 111
Christiansen, Lena · 37
Clay, Alyssa I. · 96
Clemons, Paul A. · 31
Cline, Melissa · 15
Cohain, Ariella · 11
Cordero, Pablo · 38
Correa, Adolofo · 45
Costello, James C. · 60
Courtot, M. · 122
Cowen, Lenore J. · 3
Crawford, Dana C. · 20
Cullis, Pieter · 114
D
Daescu, Ovidiu · 18
Danaee, Padideh · 44
Darrow, Bruce · 22
Davila, Jaime · 108
Davis-Dusenbery, Brandi · 14
de Belle, J. Steven · 84
De, Subhajyoti · 88
Deisseroth, Cole A. · 13
DeJongh, Matthew · 2
Denny, Joshua · 35
de Vries, Edsko · 76
Dewey, Frederick E. · 34, 107
Dhruv, Harshil · 31
Diaz, Diana · 51
Diez-Fuertes, Francisco · 37
Dincer, Aslihan · 72
Disselkoen, Craig · 54
Divaraniya, Aparna A. · 11
Dominguez, Facundo · 76
Domselaar, G. Van · 122
Donato, Michele · 51
Dooley, D. · 122
Dougherty, Greg · 105
Draghici, Sorin · 51
Dudley, Joel T. · 11, 22, 72
Dunnenberger, H. M. · 106
Durmaz, Arda · 52
E
Eckel-Passow, Jeanette · 105
Egawa, Fumiko · 33
Empey, P. E. · 106
Ergin, Oguz · 77
Ertekin-Taner, Nilüfer · 85
Eskin, Eleazar · 80
F
Fantl, Wendy J. · 78
Farber-Eger, Eric · 20
Farrow, Emily G. · 102, 113
Fienberg, Harris · 117
Fink, Cris G. · 97
Fink, Tobias · 24, 99
Fogarty, Zach · 108
Foo, Chuan Sheng · 98
Fornage, Myriam · 45
Franks, Jennifer M. · 73
Frase, A. T. · 106
Fraser, Robert · 114
Fread, Kristin I. · 39
Freedman, Barry I. · 45
Freimuth, R. R. · 106
Freimuth, Robert · 105
G
Gadegbeku, Crystal · 70
Gaedigk, A. · 106
Gaedigk, Andrea · 81, 102, 113
Gallion, Jonathan · 29, 103
Gao, Chen · 41
Garmire, Lana · 74
Gavin, Davin · 72
Gelijns, Annetine · 22
Genes, Nicholas · 23
Ghaeini, Reza · 44
Ghose, Saugata · 87
Gipson, Debbie · 70
Giron, Emily · 84
Glicksberg, Benjamin · 56
Gliske, Stephen V. · 97
Goldfeder, Rachel · 104
Gordon, A. · 106
Gosh, Debashis · 88
Graham, M. · 122
Gray, Daniel H. · 78
Greenside, Peyton · 98
Griffiths, E. · 122
Groop, Leif · 28
Guney, Emre · 12
Guo, An Chi · 114
H
Haidar, C. · 106
Hart, Steven · 105
Hassan, Hasan · 77
Hawkins, Jennifer · 70
Haynes, Winston A. · 13
He, Dan · 30
He, Shuyao · 55
Hellwege, Jacklyn N. · 45
Henderson, Tim A. D. · 52
Hendrix, David · 44
Hershman, Steven G. · 23
Herzog, Julia · 70
Hicks, J. K. · 106
Hodge, Rebecca · 37
Hoff, Fieke W. · 57
Hoffman, J. M. · 106
Hollister, Brittany M. · 20
Hong, Na · 75
Horton, Iain · 105
Horton, Terzah M. · 57
Hoskins, Roger A. · 101
Hsiao, W. · 122
Hu, Chenyue W. · 57
Huang, Austin · 76
Huang, Kun · 7, 59
Hui, Shirley · 110
I
Iakoucheva, Lilia M. · 82
Imoto, Seiya · 91
Ingram, Wendy Marie · 119
Israeli, Johnny · 98
Isserlin, Ruth · 110
Ivkovic, Sinisa · 14
J
Jebakaran, Jebakumar · 22
Jiang, Guoqian · 75
Johannessen, Solveig · 114
Johnson, Kipp W. · 22
Johnson, Travis · 59
Ju, Wenjun · 70
K
Kabat, Halla · 53
Kaddurah-Daouk, Rima F. · 96
Kaka, Hussam · 110
Kamp, Thomas · 54
Kandamurugu, Manickam · 107
Kanigel Winner, Kimberly R. · 60
Karakurt, Gunnur · 48
Kasarskis, Andrew · 11, 22
Kashef-Haghighi, Dorna · 33
Kaushik, Gaurav · 14
Keaton, Jacob M. · 45
Kechris, Katerina · 86
Keddy, A. · 122
Khatri, Purvesh · 13, 46
Kiefer, Jeff · 31
Kim, Jeremie · 77, 87
Kim, Juho · 61
Kim, Junghi · 41
Kim, Nayoung K. D. · 111
Kim, Seungchan · 31
Klein, T. E. · 106
Knox, Craig · 114
Ko, Melissa E. · 78
Kornblau, Steven M. · 57
Kovatch, Patricia · 22
Koyutürk, Mehmet · 48, 67
Kretzler, Matthias · 70
Krishnamurthy, Sarathbabu · 34, 107
Krishnan, Michelle L. · 42
Kuan, Pei Fen · 55
Kuncheva, Zhana · 42
Kundaje, Anshul · 98
Kural, Deniz · 14
L
Lanchantin, Jack · 21
Larson, Melissa · 108
Larson, Nicholas B. · 108
Lasken, Roger S. · 37
Lau, Katy L. · 97
Lavage, Daniel R. · 27, 34
Leader, Joseph B. · 27, 34, 107
Leavey, Patrick · 18
Ledbetter, David H. · 107
Lee, Donghyuk · 77
Lee, Inhan · 53
Lee, M. T. · 106
Lein, Ed · 37
Lelong, Sebastien · 115
Li, Jingyi Jessica · 64
Li, Lang · 71
Li, Li · 22
Li, Matthew D. · 13
Li, Shuyu · 56
Lichtarge, Olivier · 25, 29, 103
Lin, Chih-Hsu · 25
Lin, Dongdong · 93
Lin, Yaxiong · 105
Lincoln, Stephen E. · 15
Liu, Charles · 13
Liu, Jingyu · 93
Liu, Keli · 50
Liu, Larry Y. · 48
Liu, Tao · 109
Lofgren, Shane · 13
Lopez, Alexander · 34
Lu, Liangqun · 74
Lua, Rhonald C. · 25
Lucas, Anastasia M. · 34
Luedtke, Alexander · 50
M
Ma, Meng · 56
Machida-Hirano, Ryoko · 63
Mahendra, Divya · 31
Mahlich, Yannick · 69
Mahoney, J. Matthew · 27
Mallory, Emily K. · 79
Mandric, Igor · 80
Mangul, Serghei · 80
Marcu, Ana · 114
Marko, Nicholas F. · 119
Marsit, Carmen J. · 32, 112
Martinez, Maria · 18
Massengill, Susan · 70
Matthews, T. · 122
Matveeva, Olga V. · 94
McCorrison, Jamison · 37
McDermott, Jason E. · 109
McDonnell, Shannon K. · 85, 105, 108
McEachin, Richard C. · 70
Mead, David · 105
Mehta, Sanket · 57
Mertins, Philipp · 109
Metpally, Raghu P. R. · 34, 107
Miller, Jeremy · 37
Miller, Neil · 81, 102, 106, 113
Miotto, Riccardo · 22
Mishra, Rashika · 18
Misra, Debdipto · 119
Miyano, Satoru · 91
Mohan, Rahul · 98
Montana, Giovanni · 42
Montoya, Dennis · 80
Mooney, Sean D. · 82, 106, 115
Moore, Jason H. · 19
Moskovitz, Alan · 22
Mosmann, Tim R. · 120
Moult, John · 101
Murray, Michael F. · 107
Mutlu, Onur · 77, 87
Myers, Mark · 105
N
Nair, Asha A. · 108
Nair, K. Sreekumaran · 96
Narla, Goutham · 67
Nazipova, Nafisa N. · 94
Ng, Maggie C. Y. · 45
Nguyen, Tin · 51
Nho, Kwangsik · 8
Ni'Suilleabhain, Molly · 18
Ning, Xia · 71
Nolan, Garry P. · 39, 78, 117, 118
Non, Amy · 20
Novotny, Mark · 37
O
O'Connell, Chloe · 33
O'Brien, Daniel · 108
Ogurtsov, Aleksey Y. · 94
Osafo, Nana · 89
Otolorin, Abiodun · 89
Overton, John · 34
P
Pai, Shraddha · 110
Palmer, Nicholette D. · 45
Pan, Wei · 41
Pandey, Gaurav · 47
Pankow, James S. · 45
Parida, Laxmi · 30
Park, Peter J. · 111
Park, Woong-Yang · 111
Paten, Benedict · 15
Payne, Samuel · 109
Pejaver, Vikas · 82
Pen, Jian · 65
Pendergrass, Sarah A. · 27, 34
Peng, Jian · 4, 61
Penn, John · 34
Pennathur, Subramaniam · 70
Perrone-Bizzozero, Nora · 93
Person, T. N. · 106
Perumal, Kalyani · 70
Peterson, Josh · 35, 106
Petkau, A. · 122
Petyuk, Vladislav · 109
Pinney, Sean · 22
Playter, Christopher S. · 78
Plevritis, Sylvia K. · 78
Poirion, Olivier · 74
Pond, Sergei · 83
Probert, Chris · 98
Prodduturi, Naresh · 75
Pyc, Mary A. · 84
Q
Qi, Yanjun · 21
Qu, Meng · 4, 65
Quataert, Sally A. · 120
Qutub, Amina A. · 57
R
Radcliffe, Richard · 86
Rademakers, Rosa · 85
Radivojac, Predrag · 82
Rakheja, Dinesh · 18
Rasmussen-Torvik, Laura J. · 45
Ré, Christopher · 79, 90
Rebhahn, Jonathan A. · 120
Reddy, Joseph S. · 85
Reed, Gay · 105
Reich, David L. · 22
Reid, Jeffrey · 34
Relling, M. V. · 106
Ren, Yingxue · 85
Restrepo, Nicole A. · 20
Rich, Stephen S. · 45
Ricks, Doran · 22
Risacher, Shannon L. · 8
Riska, Shaun · 108
Ritchie, Marylyn D. · 34, 106, 119
Roden, Dan · 35
Rodland, Karin · 109
Rogers, Linda · 23
Ross, Jason · 105
Ross, Owen A. · 85
Rossetti, Maura · 80
Rotman, Jeremy · 80
Rotter, Jerome I. · 45
Röttger, Richard · 5
Rubin, Daniel L. · 90
Rudra, Pratyaydipta · 86
Russell, Nate · 61
Russell, Pamela · 86
S
Saba, Laura · 86
Salman, Ali · 68
Samuels, David · 35
Samusik, Nikolay · 118
Sander, Thomas · 24, 99
Sangkuhl, K. · 106
Sarangi, Vivekananda · 85
Saykin, Andrew J. · 8
Scarpa, Joseph R. · 11
Schadt, Eric E. · 11, 23, 56, 72
Schaid, Daniel · 108
Scherbina, Anna · 98
Scheuermann, Richard H. · 37
Schlatzer, Daniela · 67
Schork, Nicholas · 37
Schreiber, Stuart L. · 31
Schriml, L. M. · 122
Schultz, André · 57
Scott, Erick R. · 23
Scott, Madeleine · 46
Scott, S. A. · 106
Sengupta, Anita · 18
Sengupta, Partho P. · 22
Senol, Damla · 77, 87
Shabalina, Svetlana A. · 94
Shah, Nigam H. · 17
Shameer, Khader · 22
Sharma, Gaurav · 120
Shen, Li · 8, 71
Shi, Wen · 86
Shifman, Sagiv · 80
Shin, Hyun-Tae · 111
Shrikumar, Avanti · 98
Shuldiner, Alan R. · 107
Simonovic, Janko · 14
Singh, Ritambhara · 21
Sinnwell, Jason P. · 85
Smelser, Diane · 107
Smith, Kyle · 88
Smith, Richard · 109
Snyder, John · 27
Snyder, Michael · 90
Soden, Sarah · 102, 113
Song, Junyan · 55
Southerland, William · 89
Speyer, Gil · 31
Spreafico, Roberto · 80
Stacey, William C. · 97
Stai, Tony · 105
Stanescu, Ana · 47
Statz, Benjamin · 80
Steemers, Frank · 37
Strauli, Nicolas · 80
Strickland, William D. · 39
Stuart, Joshua M. · 38
Su, Andrew I. · 115
Su, Hai · 7
Swank, Julie · 105
Sweeney, Timothy E. · 13
T
Taboada, E. · 122
Tam, Andrew · 13
Taroni, Jaclyn N. · 73
Tatonetti, Nicholas P. · 22
Taylor, Kent D. · 45
Teh, Charis · 78
Thibodeau, Stephen N. · 108
Thompson, Jeffrey A. · 32, 112
Tignor, Nicole · 23
Tijanic, Nebojsa · 14
Tintle, Nathan · 2, 50, 54
Tomczak, Aurelie · 13
Tran, Danny N. · 37
Tran, Hai J. · 31
Tsueng, Ginger · 115
Tully, Tim · 84
Tunkle, Leo · 53
Twist, Greyson P. · 81, 102, 106, 113
V
Vallania, Francesco · 13, 46
Van Der Wey, Will · 80
Van Houten, Jacob · 35
Venepally, Pratap · 37
Venkataraman, Guhan Ram · 33
Verma, A. · 106
Verma, Shefali S. · 34
Vestal, Brian · 86
Volety, Rama · 105
von Korff, Modest · 24, 99
W
Wagenknecht, Lynne E. · 45
Wall, Dennis Paul · 33
Wang, Beilun · 21
Wang, Changchang · 56
Wang, Chao · 7
Wang, Chen · 75
Wang, Liewei · 96
Wang, Pei · 23
Wang, Sheng · 4, 65
Wang, Yu-Ping · 9
Weaver, Steven · 83
Weinshilboum, Richard M. · 96
Wertheim, Joel · 83
Westergaard, David · 28
Whaley, R. M. · 106
Whirl-Carrillo, M. · 106
Whitfield, Michael L. · 73
Whiting, Kathleen · 48
Wiepert, Mathieu · 105
Wiggins, Roger · 70
Wiley, Laura · 35
Wilkins, Angela D. · 25, 29, 103
Williams, M. S. · 106
Wilson, James G. · 45
Wilson, Michael · 114
Wilson, Stephen J. · 25
Wiredja, Danica · 67
Wishart, David S. · 114
Wiwie, Christian · 5
Woon, M. · 106
Worrell, Greg A. · 97
Wu, Chunlei · 106, 115
X
Xin, Hongyi · 77
Xin, Jiwen · 115
Y
Yahi, Alexandre · 22
Yamaguchi, Rui · 91
Yan, Jingwen · 8
Yang, Harry Taegyun · 80
Yang, Lin · 7
Yang, Shan · 15
Yang, W. · 106
Yao, Xiaohui · 71
Yoo, Byunggil · 81
Younkin, Steve G. · 85
Yu, Kun-Hsing · 90
Yun, Jae Won · 111
Z
Zaitlen, Noah · 80
Zelikovsky, Alex · 80
Zhang, Bin · 72
Zhang, Can · 15
Zhang, Fan · 37
Zhang, Pengyue · 71
Zhang, Yan · 59
Zhang, Yao-zhong · 91
Zhu, Chengsheng · 69
Zhu, Jun · 11
Zhu, Kuixi · 11
Ziemek, Daniel · 76
Zille, Pascal · 9
Zunder, Eli R. · 39, 78
Zweig, Micol · 23