15
Article Cross-Study Comparison Reveals Common Genomic, Network, and Functional Signatures of Desiccation Resistance in Drosophila melanogaster Marina Telonis-Scott,* ,1 Carla M. Sgr o, 1 Ary A. Hoffmann, 2 and Philippa C. Griffin 2 1 School of Biological Sciences, Monash University, Clayton, Melbourne, VIC, Australia 2 School of BioSciences, Bio21 Institute, University of Melbourne, Parkville, Melbourne, VIC, Australia *Corresponding author: E-mail: [email protected]. Associate editor: Ilya Ruvinsky Abstract Repeated attempts to map the genomic basis of complex traits often yield different outcomes because of the influence of genetic background, gene-by-environment interactions, and/or statistical limitations. However, where repeatability is low at the level of individual genes, overlap often occurs in gene ontology categories, genetic pathways, and interaction networks. Here we report on the genomic overlap for natural desiccation resistance from a Pool-genome-wide association study experiment and a selection experiment in flies collected from the same region in southeastern Australia in different years. We identified over 600 single nucleotide polymorphisms associated with desiccation resistance in flies derived from almost 1,000 wild-caught genotypes, a similar number of loci to that observed in our previous genomic study of selected lines, demonstrating the genetic complexity of this ecologically important trait. By harnessing the power of cross-study comparison, we narrowed the candidates from almost 400 genes in each study to a core set of 45 genes, enriched for stimulus, stress, and defense responses. In addition to gene-level overlap, there was higher order congruence at the network and functional levels, suggesting genetic redundancy in key stress sensing, stress response, immunity, signaling, and gene expression pathways. We also identified variants linked to different molecular aspects of desiccation physiology previously verified from functional experiments. Our approach provides insight into the genomic basis of a complex and ecologically important trait and predicts candidate genetic pathways to explore in multiple genetic backgrounds and related species within a functional framework. Key words: desiccation, Drosophila, GWAS, gene overlap. Introduction In climate change research there is increasing interest to con- sider not only the obvious impact of changing temperatures on biodiversity, but also fluctuations in rainfall and humidity (Bonebrake and Mastrandrea 2010; Clusella-Trullas et al. 2011; Chown 2012). Changes in water availability pose specific chal- lenges to terrestrial ectotherms such as insects, impacting activity, range distributions, species richness, and disease vector populations (reviewed in Chown et al. 2011, and ref- erences therein). Efforts to understand the responses of in- sects to water availability are underway, given that physiological responses are an integral component of predict- ing species responses to climate change (Chown et al. 2011; Hoffmann and Sgr o 2011). Insect water balance physiology is relatively well elucidated (Hadley 1994; Chown and Nicolson 2004; Bradley 2009), and the field is undergoing a crucial shift toward understanding the molecular underpinnings, largely achieved through high-throughput and transgenic technolo- gies in Drosophila. Insects can lose water from the epicuticle, through respi- ration via the spiracles or across the gut epithelia (Hadley 1994). Drosophila balance water via three main mechanisms: Altering water content, slowing water loss rate, and less commonly tolerating water loss (Gibbs and Matzkin 2001; Gibbs et al. 2003). Water retention appears to be the primary mechanism for withstanding desiccation in highly resistant cactophilic Drosophila (Gibbs and Matzkin 2001), as well as in Drosophila melanogaster (Telonis-Scott et al. 2006). How water is preserved, however, is highly variable depending on the species and/or genetic background. Cactophilic species have repeatedly colonized arid habitats via reduced metabolic rates that stem respiratory and cuticular water loss and extend energy utilization (Gibbs and Matzkin 2001; Marron et al. 2003), while in the laboratory, water retention and se- questration arise from multiple evolutionary pathways less clearly related to metabolic rate (Hoffmann and Parsons 1989a), often acting in concert (Hoffmann and Parsons 1989a; Djawdan et al. 1998; Folk et al. 2001; Telonis-Scott et al. 2006, 2012). Less resolved is the role of respiratory rel- ative to cuticular water loss via discontinuous gas exchange, although this might be an important water budget strategy for insects in general (Chown et al. 2011). Recent candidate gene-based approaches have seen new developments in understanding specific molecular aspects of this variable ecological trait in Drosophila. Diuretic peptide

Cross-Study Comparison Reveals Common Genomic, Network, … · 2016-02-06 · resistance using a high-throughput sequencing Pool-GWAS (genome-wide association study) approach ( Bastide

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Cross-Study Comparison Reveals Common Genomic, Network, … · 2016-02-06 · resistance using a high-throughput sequencing Pool-GWAS (genome-wide association study) approach ( Bastide

Article

Cross-Study Comparison Reveals Common Genomic Networkand Functional Signatures of Desiccation Resistance inDrosophila melanogaster

Marina Telonis-Scott1 Carla M Sgro1 Ary A Hoffmann2 and Philippa C Griffin2

1School of Biological Sciences Monash University Clayton Melbourne VIC Australia2School of BioSciences Bio21 Institute University of Melbourne Parkville Melbourne VIC Australia

Corresponding author E-mail marinatelonisscottmonashedu

Associate editor Ilya Ruvinsky

Abstract

Repeated attempts to map the genomic basis of complex traits often yield different outcomes because of the influence ofgenetic background gene-by-environment interactions andor statistical limitations However where repeatability is lowat the level of individual genes overlap often occurs in gene ontology categories genetic pathways and interactionnetworks Here we report on the genomic overlap for natural desiccation resistance from a Pool-genome-wide associationstudy experiment and a selection experiment in flies collected from the same region in southeastern Australia in differentyears We identified over 600 single nucleotide polymorphisms associated with desiccation resistance in flies derived fromalmost 1000 wild-caught genotypes a similar number of loci to that observed in our previous genomic study of selectedlines demonstrating the genetic complexity of this ecologically important trait By harnessing the power of cross-studycomparison we narrowed the candidates from almost 400 genes in each study to a core set of 45 genes enriched forstimulus stress and defense responses In addition to gene-level overlap there was higher order congruence at thenetwork and functional levels suggesting genetic redundancy in key stress sensing stress response immunity signalingand gene expression pathways We also identified variants linked to different molecular aspects of desiccation physiologypreviously verified from functional experiments Our approach provides insight into the genomic basis of a complex andecologically important trait and predicts candidate genetic pathways to explore in multiple genetic backgrounds andrelated species within a functional framework

Key words desiccation Drosophila GWAS gene overlap

IntroductionIn climate change research there is increasing interest to con-sider not only the obvious impact of changing temperatureson biodiversity but also fluctuations in rainfall and humidity(Bonebrake and Mastrandrea 2010 Clusella-Trullas et al 2011Chown 2012) Changes in water availability pose specific chal-lenges to terrestrial ectotherms such as insects impactingactivity range distributions species richness and diseasevector populations (reviewed in Chown et al 2011 and ref-erences therein) Efforts to understand the responses of in-sects to water availability are underway given thatphysiological responses are an integral component of predict-ing species responses to climate change (Chown et al 2011Hoffmann and Sgro 2011) Insect water balance physiology isrelatively well elucidated (Hadley 1994 Chown and Nicolson2004 Bradley 2009) and the field is undergoing a crucial shifttoward understanding the molecular underpinnings largelyachieved through high-throughput and transgenic technolo-gies in Drosophila

Insects can lose water from the epicuticle through respi-ration via the spiracles or across the gut epithelia (Hadley1994) Drosophila balance water via three main mechanismsAltering water content slowing water loss rate and less

commonly tolerating water loss (Gibbs and Matzkin 2001Gibbs et al 2003) Water retention appears to be the primarymechanism for withstanding desiccation in highly resistantcactophilic Drosophila (Gibbs and Matzkin 2001) as well as inDrosophila melanogaster (Telonis-Scott et al 2006) Howwater is preserved however is highly variable depending onthe species andor genetic background Cactophilic specieshave repeatedly colonized arid habitats via reduced metabolicrates that stem respiratory and cuticular water loss andextend energy utilization (Gibbs and Matzkin 2001 Marronet al 2003) while in the laboratory water retention and se-questration arise from multiple evolutionary pathways lessclearly related to metabolic rate (Hoffmann and Parsons1989a) often acting in concert (Hoffmann and Parsons1989a Djawdan et al 1998 Folk et al 2001 Telonis-Scottet al 2006 2012) Less resolved is the role of respiratory rel-ative to cuticular water loss via discontinuous gas exchangealthough this might be an important water budget strategyfor insects in general (Chown et al 2011)

Recent candidate gene-based approaches have seen newdevelopments in understanding specific molecular aspects ofthis variable ecological trait in Drosophila Diuretic peptide

signaling in ion and water balance regulated by excretion viathe Malpighian tubules (MTs) and gut absorption has beenshown to impact desiccation resistance in D melanogaster(Kahsai et al 2010 Terhzaz et al 2012 2014 2015) Specificallyknockdown of the Capability (Capa) neuropeptide gene en-hanced desiccation resistance by reductions in respiratorycuticular and excretory water loss while also cross-conferringcold tolerance (Terhzaz et al 2015) Cuticular hydrocarbon(CHC) levels are also implicated a single gene encoding a fattyacid synthase mFAS was shown to impact both desiccationresistance and mate choice in Australian rainforest endemicsDrosophila serrata and Drosophila birchii (Chung et al 2014)In D melanogaster knockdown of CYP4G1 also reduced CHCproduction and impaired survival under low humidity (Qiuet al 2012) Metabolic signaling has also been shown toimpact desiccation resistance and includes components ofthe insulin signaling pathways (Seurooderberg et al 2011 Liu et al2015) Several other single gene studies highlight differentmechanisms such as desiccation avoidance in larvae via no-ciceptors encoded by members of the transient receptorpotential (TRP) aquaporin family (Johnson and Carder2012) cyclic adenosine monophosphate (cAMP)-dependentsignaling protein kinase desi (Kawano et al 2010) and poten-tial tissue protection via trehalose sugar accumulation(Thorat et al 2012)

Collectively these functional approaches highlight thecomplexity of insect water balance strategies in controlledbackgrounds but do not explain the variation observed innatural populations and species Resistance evolves rapidlyand is highly heritable (up to 60) in the cosmopolitan andunusually resistant species D melanogaster whereas heritabil-ity is much lower in less tolerant range-restricted species(Hoffmann and Parsons 1989b Kellermann et al 2009)Multiple genome-wide quantitative trait loci were identifiedfrom mapping lines constructed from a natural D melanoga-ster population suggesting a polygenetic architecture (Foleyand Telonis-Scott 2011) Transcriptomics revealed differentialregulation of thousands of genes in response to desiccation inDrosophila mojavensis (Matzkin and Markow 2009) whileartificial selection for desiccation resistance altered the basalexpression of over 200 genes in D melanogaster (Soslashrensenet al 2007)

Previously we used microarray-based genomic hybridiza-tion to survey allele frequency shifts in experimental evolutionlines of D melanogaster recently derived from the field(Telonis-Scott et al 2012) We documented shifts in over600 loci in response to selection for desiccation resistancefollowing a rapid phenotypic response after only 8 genera-tions of selection This variant identification approach waslimited to highlighting candidate genes and regions andnot actual nucleotide polymorphisms apart from those se-quenced post hoc We now expand on this to further explorethe unresolved natural genomic complexity of desiccationresistance using a high-throughput sequencing Pool-GWAS(genome-wide association study) approach (Bastide et al2013) In a GWAS framework for polygenic traits wheremany small effect genes contribute to a phenotype a largesample size is required for adequate power (Mackay et al

2009) Accordingly we sampled natural genetic variationfrom almost 1000 inseminated wild-caught females andcompared the upper 5 desiccation resistant tail fromalmost 10000 F1 progeny with a control sample chosen ran-domly from the same progeny set Harnessing the power ofPool-GWAS we sequenced a pool of 500 natural ldquoresistantrdquogenomes and compared allele frequencies with a pool of 500ldquorandom controlrdquo genomes

We employed a novel comparative approach with the re-sulting set of candidate loci and those detected in our previ-ous artificial selection study of flies collected several seasonsearlier from the same region of southeastern Australia(Telonis-Scott et al 2012) Cross-study repeatability of lociassociated with complex phenotypes is often low (Sarupet al 2011) and can be influenced by genetic backgroundselection intensity epistatic effects and gene-by-environmentinteractions (Mackay et al 2009 Civelek and Lusis 2014)Multiple genetic solutions appear to underlie the array ofdesiccation responses and we further explored this by inves-tigating overlap at the level of individual genes functionalgene ontology (GO) categories and proteinndashprotein interac-tion (PPI) networks Genetic variants contribute to final phe-notypes by way of ldquointermediate phenotypesrdquo (transcriptprotein and metabolite abundances) and correlationsshould theoretically occur across these biological scales(Civelek and Lusis 2014)

Here we report on a screen of natural nucleotide vari-ants associated with desiccation resistance and using a pow-erful analysis approach demonstrate common cross-studysignatures across different hierarchical levels from gene tonetwork and function We found evidence that some func-tional desiccation candidates may be important in wildpopulations and discovered variants linked to multiple phys-iological responses consistent with the traitrsquos complex under-pinnings at both the physiological and molecular levels

Results

Genome Wide Differentiation for Natural DesiccationResistance

We collected over 1000 D melanogaster isofemale linesfrom southern Australia and after one generation of lab-oratory culture screened over 9000 female progeny fordesiccation resistance The top ~5 desiccation-resistantflies were selected for Illumina sequencing (~500 flies)together with a random sampling of the same numberof flies from the families as the ldquocontrolrdquo pool Ourdesign incorporated a subpooling (technical replicate)strategy to both optimize Pool-seq allele frequencyestimates and control for technical bias on allele fre-quency estimates (see Materials and Methods) For thecontrol and desiccation-resistant samples respectively162Mndash259M and 188Mndash289M raw reads were obtainedper technical replicate and an average of 950 of readsmapped to the reference genome after trimming (mean ofall 10 replicates standard deviation [SD] 05)

Variance in allele frequency among the five technical rep-licates for each pool was low (supplementary fig S1

2

Supplementary Material online mean across control repli-cates 00028 median 00015 mean across resistant replicates00027 median 00016) corresponding to a mean SD of~45 around the mean allele frequency The mean pairwisedifference in allele frequency estimates was 55 and 54 forcontrol and desiccation-resistant replicates respectively con-sistent with previously published estimates (4ndash6 Kofleret al 2016)

For a test subset of 1803 randomly sampled single nucle-otide polymorphisms (SNPs) that showed no significant dif-ferentiation between control and desiccation-resistant poolsthe replicates accounted for the majority of total variance inallele frequency Replicates within pool category (control vsdesiccation resistant supplementary fig S1 SupplementaryMaterial online) explained between 15 and 100 (mean87 median 93) of what was a relatively low total variance(mean 00029 median 00019) The mean concordance cor-relation coefficient was 098 for each pool category higherthan a comparable Drosophila study that reported a correla-tion of 0898 between 2 replicates with 20 individuals(Zhu et al 2012) We therefore combined the technical rep-licates into a single control and resistant pool respectively forfurther analyses After processing the mean coverage depth

across the genome was 487 and 568 per sample respec-tively (equivalent to 097 and 11 per individual in thepools of 500 flies)

Allele Frequency Changes

The allele frequencies of 3528158 SNPs were comparedbetween the 2 pools using Fisherrsquos exact tests with apermutation-based false discovery rate (FDR) correctionchosen based on the top 005 of the simulated nullP values Six hundred forty-eight SNPs were differentiatedbetween desiccation resistant and control pools (fig 1 andsupplementary table S1 Supplementary Material online)The SNPs were nonrandomly distributed across chromo-somes (X2

4 frac14 4312 Plt 00001) highly abundant on theX chromosome (X2

1 frac14 3698 Plt 00001) while SNPs wereunderrepresented on both 2L and 3L (2L X2

1 frac14 641 Plt001 3L X2

1 frac14 397 Plt 005) In the desiccation pool theallelic distribution of the 648 SNPs ranged from low tofixed (4ndash100 fig 1B and supplementary table S1Supplementary Material online) The increase in frequencyof the favored ldquodesiccationrdquo alleles compared with the con-trols ranged from 3 to 41 (median increase 14 supple-mentary table S1 Supplementary Material online) where

Fig 1 (A) Manhattan plot of genome-wide P values for differentiation between desiccation-resistant and random control pools shown for the entiregenome where each point represents one SNP SNPs highlighted in red showed differentiation above the 005 threshold level determined from the nullP value distribution (see text for details) SNPs highlighted in blue were above the 005 threshold and also detected in Telonis-Scott et al (2012) (B)Standing allele frequency in the control pool plotted against the frequency change in the selected pool for the 648 differentiated SNPs (FDR 005)

3

the largest shifts tended to occur in common intermediatefrequencies (fig 1B)

Inversion Frequencies

To test whether inversions were associated with desiccationresistance allele frequencies were checked at SNPs diagnosticfor the following inversions In(3R)P (Anderson et al 2005Kapun et al 2014) In(2L)t (Andolfatto et al 1999 Kapun et al2014) In(2R)NS In(3L)P In(3R)C In(3R)K and In(3R)Mo(Kapun et al 2014) In(3L)P In(3R)K and In(3R)Mo werenot detected in this population and for all other inversionsinvestigated the control and desiccation-resistant pool fre-quencies did not differ more than 6

Genomic Distribution

We examined these candidates using a comprehensive ap-proach encompassing SNP gene functional ontology andnetwork levels First we examined SNP locations with respectto gene structure and putative effects of SNPs on genomefeatures The 648 SNPs mapped to 382 genes and were largelynoncoding (79 supplementary table S2 SupplementaryMaterial online) Coding variants comprised nearly 15 ofcandidate SNPs largely synonymous substitutions and theremainder nonsynonymous missense variants predicted toresult in amino acid substitution with length preservation(9 and 14 supplementary table S2 SupplementaryMaterial online) Based on proportions of features from allannotated SNPs compared with the candidates intron vari-ants were significantly underrepresented (X2

1 frac14706 Plt 001supplementary table S2 Supplementary Material online)while SNPs were enriched in 50UTRs 50UTR premature startcodon sites and splice regions (X2

1 frac14 820 Plt 001X2

1 frac14 3124 Plt 00001 X21 frac14 640 Plt 005 respectively

supplementary table S2 Supplementary Material online)

GO Functional Enrichment

For the GO analyses we found no overrepresented termsusing the stringent Gowinda gene mode but identified sev-eral categories using SNP mode with a 005 FDR thresholdGO terms were enriched for the broad categories of ldquochro-matinrdquo ldquochromosomerdquo and more specifically ldquodouble-stranded RNA bindingrdquo although significance of this categorywas due solely to nine differentiated SNPs in the DIP1 geneThe term ldquogamma-secretase complexrdquo was also enriched dueto three SNPs in the pen-2 gene

Desiccation Candidate Genes

Chown et al (2011) comprehensively summarized the pri-mary mechanisms underpinning desiccation resistancefrom the critical first phase of stress sensing and behavioralavoidance to the manifold physiological trajectories availableto counter water stress These include water content varia-tion water loss rates (desiccation resistance) and tolerance ofwater loss (dehydration tolerance) We expanded on thisreview to incorporate recent molecular research and summa-rized the primary mechanisms and known candidate genesrelevant to D melanogaster stress sensing and water balance

(table 1 and references therein) Informed by this body ofwork we applied a ldquotop-downrdquo mechanistic approach toscreening desiccation candidate genes in our data (egfrom initial hygrosensing to downstream key physiologicalpathways table 1) We did not observe differentiation inSNPs mapping to the TRP ion channels the key hygrosensingreceptor genes However we did map SNPs to genes involvedin stress sensing specific to the MT fluid secretion signalingpathways including two of the six cAMP and cyclic guanosinemonophosphate (cGMP) signaling phosphodiesterases dncand pde9 as well as klu indicated in NF-kB ortholog signalingin the renal tubules (table 1) A number of the stress-sensingSAPK pathway genes were differentiated (table 1) Key genesinvolved in the insulin signaling pathway implicated in met-abolic homeostasis and water balance were also differenti-ated including the insulin receptor InR and insulin receptorbinding Dok (table 1)

Network Analysis Modeling Genomic Differentiationfor Desiccation in a PPI Context

Given the genomic complexity underpinning desiccation re-sistance we also implemented a higher-order analysis to pro-vide an overview of potential architectures beyond the SNPand gene levels using PPI networks The SNPs mapped to 382genes which were annotated to 307 seed proteins occurring inthe full background D melanogaster PPI network The seedproteins produced a large first-order network with 3324nodes and 51299 edges

To determine if broader functional signatures could beascertained from the complex network we employed func-tional enrichment analyses on the 3324 nodes (table 2 andsupplementary table S3 Supplementary Material online)Strikingly the network was enriched for pathways involvedin gene expression from transcription through RNA process-ing to RNA transport metabolism and decay (table 2)The network analysis also revealed further complexity ofthe stress response with enrichment of proteins involved inregulating innate insect immunity (table 2) where multiplepathways were overrepresented including cytokine interleu-kin and toll-like receptor signaling cascades (table 2)Developmentalgene regulation signaling pathways were sig-nificant in the pathway analysis the KEGG database showedenrichment for Notch and Mitogen-activated protein kinase(MAPK) signaling while insulin and NGF pathways were high-lighted from the Reactome database (table 2)

Common Signatures of Desiccation Resistance acrossStudies

The key finding of this study is the degree of genomic overlapwith our previous mapping of alleles associated with artifi-cially selected desiccation phenotypes (Telonis-Scott et al2012) Although the flies were sampled from comparablesoutheastern Australian locations the study designs were dif-ferent Specifically the earlier study focused on resistancegenerated by artificial selection whereas this study focusedon natural variation in resistance Despite the differences instudy design we identified 45 genes common to both studies

4

Table 1 Summary of Primary Molecular Responses and Candidate Genes for Desiccation Response Mechanisms Identified in the Literature

Mechanism Molecular Function Genes Current Study(F1 wild-caught)

Telonis-Scott et al(2012) (F8 artificial

selection)

Hygrosensing Sensory moisture receptors (antennae legs gut mouthparts)

Temperature-activatedtransient receptor po-tential ionchannelsbcd

TRP ion channels trpl trp Trpg TrpA1 TrpmTrpml pain pyx wtrwnompC iav nan Amo

mdash trpl

Stress sensing Malpighian tubules (principal cells)

Tubule fluid secretion signaling pathways

cAMP signalingefg cAMP activation Dh44-R2 mdash mdash

cGMP signalingefg cGMP activation Capa CapaRcedilNplp1-4 mdash mdash

Receptors Gyc76c CG33958 CG34357 mdash CG34357

Kinases dg1-2 mdash mdash

cAMP + cGMPsignalingefgh

Hydrolyzing phosphodiesterases dnc pde1c pde6 pde9 pde11 dnc pde9 dnc pde1c pde9 pde11

Calcium signalingeg Calcium-sensitive nitric oxidesynthase

Nos mdash Nos

Desiccation specific Diuretic neuropeptide NFkBsignalingijk

Capa CapaR trpl Relish klu klu klu trpl

Anti-diureticneuropeptidel ITP sNPF Tk mdash ITP sNPF

Stress sensing Stress responsive pathway

Stress activatedprotein kinasepathwaymn

MAPKJun kinase

Aop Aplip1 Atf-2 bsk Btk29Acbt Cdc 42 Cka cno CYLDDok egr Gadd45 hep hppyJra kay medo Mkk4 msnMtl Pak puc pyd Rac 1-2Rho1 shark slpr Src42ASrc64B Tak1 Traf6

Aop Dokhepcedilkaymsn Mtlpuc Rac1Src64B

Mtl slpr

Metabolic homeostasis and water balance

Components of insu-lin signaling pathway

Receptorop InR InR mdash

Receptor binding chico IIp1-8 dock Dok poly Dok Poly

Resistance mechanisms Water loss barriers

Cuticularhydrocarbons

Fatty acid synthasesq desatura-sesr transporters oxidativedecarbonylaset

CG3523 CG3524 desat1-2 FatPCyp4g1

CG3523 mdash

Primary hemolymph sugartissue-protectant

Trehalose metabolism Trehalose-phosphatasebu treh-lase activitybu phosphoglyc-erate mutase

Tps1 Treh Pgm CG5171CG5177 CG6262 crc

Treh Pgm mdash

NOTEmdashCandidate genes differentiated in the current study and in Telonis-Scott et al (2012) are shown and are underlined where overlappingaFor brevity several comprehensive reviews are cited and readers are referred to references therein Note that water budgeting via discontinuous gas exchange was not includeddue to a lack of ldquobona fiderdquo molecular candidates and desiccation tolerance due to aquaporins was omitted as genes were not differentiated in either of our studiesbChown et al (2011)cJohnson and Carder (2012)dFowler and Montell (2013)eDay et al (2005)fDavies et al (2012)gDavies et al (2013)hDavies et al (2014)iTerhzaz et al (2012)jTerhzaz et al (2014)kTerhzaz et al (2015)lKahsai et al (2010)mHuang and Tunnacliffe (2004)nHuang and Tunnacliffe (2005)oSeurooderberg et al (2011)pLiu et al (2015)qChung et al (2014)rSinclair et al (2007)rFoley and Telonis-Scott (2011)tQiu et al (2012)uThorat et al (2012)

5

Conserved Genomic Signatures of Drosophila Desiccation Resistance doi101093molbevmsv349 MBE by guest on February 6 2016

httpmbeoxfordjournalsorg

Dow

nloaded from

out of the 382 and 416 genes for the current and 2012 studyrespectively (Fisherrsquos exact test for overlap P lt 2 1016odds ratio frac14 590 supplementary table S4 SupplementaryMaterial online) Bearing gene length bias in mind we exam-ined the average length of genes in both studies as well as thegene overlap to the average gene length in the Drosophilagenome (supplementary fig S3 Supplementary Materialonline) Although there was bias toward longer genes in theoverlap list this was also apparent in the full gene lists fromboth studies

The overlap genes mapped to all of the major chromo-somes and their differentiated variants were mostly noncod-ing SNPs located in introns (supplementary table S4Supplementary Material online) Two SNPs were synonymous(Strn-Mlick coding for a protein kinase and Cht7 chitinbinding supplementary table S4 Supplementary Materialonline) while a 30UTR variant was predicted to putativelyalter protein structure in a nondisruptive manner mostlikely impacting expression andor protein translational effi-ciency (jbug response to stimulus supplementary table S4Supplementary Material online) One nonsynonymous SNPwas predicted to alter protein function as a missense variant(fab1 signaling supplementary table S4 SupplementaryMaterial online) The increase in frequency of the favoreddesiccation alleles compared with the controls ranged from5 to 27 with a median increase of 164

Despite the strong overlap there were important differ-ences between the two studies For example our previouswork revealed evidence of a selective sweep in the 50 pro-moter region of the Dys gene in the artificially selected linesand we confirmed a higher frequency of the selected SNPs inan independent natural association study in resistant fliesfrom Coffs Harbour (Telonis-Scott et al 2012) In this studywe found no differentiation at the Dys locus between ourresistant and random controls However both resistant andcontrol pools had a high frequency of the selected allele de-tected in Telonis-Scott et al (2012) (supplementary table S5Supplementary Material online)

The 45 genes common to both studies were analyzed forbiological function while many genes are obviously pleiotro-pic (ie assigned multiple GO terms) almost half were in-volved in response to stimuli (cellular behavioral andregulatory) including signaling such as Rho protein signaltransduction and cAMP-mediated signaling (table 3) Othercategories of interest include spiracle morphogenesis sensoryorgan development including development of the compoundeye and stem cell and neuroblast differentiation (table 3)

With regard to specific stress responsedesiccation candi-dates (table 1) 4 of the 45 genes overlapped including thecAMPcGMP signaling phosphodiesterases pde9 and dncMAPK pathway genes Mtl and klu indicated in NF-kBorthologue signaling in the renal tubules (table 1)Furthermore there was overlap among mechanistic catego-ries where different genes in the same pathways weredetected For example additional key cAMPcGMP hydrolyz-ing phosphodiesterases pde1c and pde11 were significant inTelonis-Scott et al (2012) as well as slpr (MAPK pathway)and poly (insulin signaling) (table 1)

Table 2 Pathway Enrichment of the GWAS and Telonis-Scott et alFirst-Order PPI Networks Investigated Using FlyMine (FDR P lt 005)

Current Study GWAS Telonis-Scott et al(2012)

Database Pathway Pathway

KEGG Ribosome RibosomeNotch signaling pathway Notch signaling pathwayMAPK signaling pathwayDorso-ventral axis formationProgesterone-mediated oocyte

maturationWnt signaling pathwayEndocytosisProteasome

Reactome Immune system Immune systemAdaptive immune system Adaptive immune systemInnate immune system Innate immune systemSignaling by interleukins Signaling by interleukins(Toll-like receptor cascades) Toll-like receptor

cascadesTCR signaling(B cell receptor signaling)

Gene expression Gene expressionMetabolism of RNA Metabolism of RNANonsense-mediated decay Nonsense-mediated decayTranscriptionmRNA processing(RNA transport)Metabolism of proteins (Metabolism of

proteins)Translation (Translation)Cell cycle Cell cycle

Cell cycle checkpointsRegulation of mitotic cell

cycleMitotic metaphase and

anaphaseG0 and early G1 G0 and early G1

G1 phase(G1S transition)G2M transition(DNA replicationS

phase)DNA repairDouble-strand break

repairSignaling pathways Signaling pathwaysInsulin signaling pathwaySignaling by NGF Signaling by NGF

Signaling by FGFRSignaling by NOTCHSignaling by WntSignaling by EGFRDevelopmentAxon guidanceMembrane traffickingGap junction trafficking

and regulationProgramed cell deathApoptosisHemostasisHemostasis

NOTEmdashKEGG and Reactome pathway databases were queried All significant termsare reported with Reactome terms grouped together under parent terms in italictext (full list of terms available in supplementary table S3 Supplementary Materialonline) Parent terms without brackets are the names of pathways that were sig-nificantly enriched parent terms in parentheses were not listed as significantlyenriched themselves but are reported as a guide to the category contents

6

In addition to the significant gene-level overlap betweenthe two studies we observed significant overlap between thePPI networks constructed from each set of candidate genes(summarized in fig 2) We examined the overlap by compar-ing the first-order networks 1) at the level of node overlap and2) at the level of edge (interaction) overlap Three hundredseven GWAS seeds formed a network of 3324 nodes whileour previous set of candidates mapped to 337 seeds forming anetwork that comprised 3215 nodes One thousand eighthundred twenty-six or ~55 of the nodes overlapped be-tween networks which was significantly higher than the nulldistribution estimated from 1000 simulations (mean simu-lated overlap SD 1304 143 nodes observed overlap atthe 999th percentile fig 2A and C) The overlap at the inter-action level was not significantly higher than expected (meansimulated overlap SD 25704 4024 edges observed over-lap 30686 edges 89th percentile fig 2B and D) Hive plotswere used to visualize the separate networks (supplementaryfig S4A and B Supplementary Material online) as well as theoverlap between the networks (supplementary fig S4CSupplementary Material online)

Given the high degree of shared nodes from distinct setsof protein seeds we compared the networks for biologicalfunction (fig 2E) Both networks produced long lists of sig-nificantly enriched GO terms (supplementary table S3Supplementary Material online) and assessing overlap was

difficult due to the nonindependent nature of GO hierar-chies However a comparison of functional pathway enrich-ment revealed some differences Although both networksshowed enrichment in the immune system pathways theribosome and Notch and NGF signaling pathways as well assome involvement of gene expression and translation path-ways the network constructed from the Telonis-Scott et al(2012) gene list showed extensive involvement of cell cyclepathway proteins as well as significant enrichment for Wntsignaling EGRF signaling DNA double-strand break repairaxon guidance gap junction trafficking and apoptosis path-ways which were not enriched in the GWAS network TheGWAS network was specifically enriched for dorso-ventralaxis formation oocyte maturation and many more geneexpression pathways than the Telonis-Scott et al (2012)gene network (table 2 and supplementary table S3Supplementary Material online)

Discussion

Genomic Signature of Desiccation Resistance fromNatural Standing Genetic Variation

We report the first large-scale genome-wide screen for nat-ural alleles associated with survival of low humidity condi-tions in D melanogaster By combining a powerful naturalassociation study with high-throughput Pool-seq we

Table 3 Overrepresentation of GO Categories for the 45 Core Candidate Genes Overlapping between the Current and Our Previous(Telonis-Scott et al 2012) Genomic Study

Term ID TermSubterms FDR P valuea Genes

Cellularbehavioral response to stimulus defense and signaling

GO0050896 Response to stimulus (cellular response tostimulus regulation of response tostimulus)

0007 Abd-B dnc fz mam nkd slo ct fru kluRbf klg santa-maria jbug fab1CG33275 Mtl Trim9 cv-c Gprk2 Ect4CG43658

GO0048585 Negative regulation of response to stimu-lus (Rho protein signal transductionsignal transduction)

0042 dnc mam slo fru Trim9 cv-c

GO0023052 Signaling (cAMP-mediated signaling cellcommunication)

0031 Amph Pde9 dnc fz mam nkd ct kluRbf santa-maria fab1 CG33275 MtlTrim9 cv-c Gprk2 Ect4 CG43658

Development

GO0048863 Stem cell differentiation 0014 Abd-D mam nkd klu nub

GO0014016 Neuroblast differentiation 0048

GO0045165 Cell fate commitment 0010 Abd-D dnc fz mam nkd ct klu klg nub

GO0035277 Spiracle morphogenesis open trachealsystem

0010 Abd-D ct cv-c

GO0007423 Sensory organ development (eye wingdisc imaginal disc)

0008 fz mam ct klu klg sdk Amph Nckx30cMtl

GO0048736 Appendage development 0020 fz mam ct fru CG33275 Mtl Fili nubcv-c Gprk2 CG43658

GO0007552 Metamorphosis 0001 fz mam ct fru CG33275 Mtl Fili cv-cGprk2 CG43658

Cellular component organization

GO0030030 Cell projection organization 0020 dnc fz ct fru GluClalpha Nckx30c MtlTrim9 nub cv-c

NOTEmdashNote that most genes are assigned to multiple annotations and therefore are shown purely as a guide to putative functionaBonferronindashHochberg FDR and gene length corrected P values using the Flymine v400 database Gene Ontology Enrichment The terms were condensed and trimmed usingREVIGO

7

mapped SNPs associated with desiccation resistance in 500naturally derived genotypes Although Pool-seq has limitedpower to reveal very low frequency alleles to reconstructphase or adequately estimate allele frequencies in smallpools (Lynch et al 2014) the Pool-GWAS approach hasbeen validated by retrieval of known candidate genes in-volved in body pigmentation using similar numbers of ge-notypes as assessed here (Bastide et al 2013)

Although studies in controlled genetic backgrounds reportlarge effects of single genes on desiccation resistance (Table 1)our current and previous screens revealed similarly largenumbers of variants (over 600) mapping to over 300 genesA polygenic basis of desiccation resistance is consistent withquantitative genetic data for the trait (Foley and Telonis-Scott2011) and we also anticipate more complexity from naturalD melanogaster populations with greater standing geneticvariation than inbred laboratory strains However somefalse positive associations are likely in the GWAS due to thelimitations of quantitative trait mapping approaches Pooledevolve-and-resequence experiments routinely yield vast num-bers of candidate SNPs partly due to high levels of standinglinkage disequilibrium (LD) and hitchhiking neutral alleles

(Schleurootterer et al 2015) Large population sizes from specieswith low levels of LD such as D melanogaster somewhat cir-cumvent this issue and a notable feature of our experimentaldesign is our sampling Here we selected the most desicca-tion-resistant phenotypes directly from substantial standinggenetic variation where one generation of recombinationshould largely reflect natural haplotypes

We observed the largest candidate allele frequency differ-ences in loci that had intermediate allele frequencies in thecontrol pool This is in line with population genetics modelswhich predict that evolution from standing genetic variationwill initially be strongest for intermediate frequency alleleswhich can facilitate rapid change (Long et al 2015) Howeverwe also observed a number of low frequency alleles which ifbeneficial can inflate false positives due to long-range LD(Orozco-terWengel et al 2012 Nuzhdin and Turner 2013Tobler et al 2014) Nonetheless spurious LD associationscannot fully account for our candidate list as evidenced bythe desiccation functional candidates and cross-study over-lap Furthermore chromosome inversion dynamics can alsogenerate both short and long-range LD and contribute nu-merous false positive SNP phenotype associations (Hoffmann

Fig 2 Observed versus simulated network overlap between the GWAS first-order PPI network and the first-order PPI network from the Telonis-Scottet al (2012) gene list Overlap is presented in terms of (A) node number and (B) edge number Networks were simulated by resampling from the entireDrosophila melanogaster gene list (r553) as described in the text The histogram shows the distribution of overlap measures for 1000 simulations Theblack line represents the histogram as density and the blue line shows the corresponding normal distribution The red vertical line shows the observedoverlap from the real networks labeled as a percentile of the normal distribution Area-proportional Venn diagrams summarized the extent of overlapfor PPI networks constructed separately from the GWAS candidates and Telonis-Scott et al (2012) candidates (labeled ldquoprevious studyrdquo) (C) Overlapexpressed as the number of nodes Approximately 55 of the nodes overlapped between the two studies which was significantly higher than simulatedexpectations (999 percentile 2B) (D) Overlap expressed in the number of interactions This was not significantly higher than expected by simulation(89th percentile 2B) (E) Overlap at the level of GO biological process terms (directly compared GO terms by name this was not tested statisticallybecause of the complex hierarchical nature of GO terms and is presented for interest)

8

and Weeks 2007 Tobler et al 2014) but we found little as-sociation between the desiccation pool and inversion fre-quency changes consistent with the lack of latitudinalpatterns in desiccation resistanceand traitinversion fre-quency associations in east Australian D melanogaster(Hoffmann et al 2001 Weeks et al 2002)

Cross-Study Comparison Reveals a Core Set ofCandidates for a Complex Trait

Our data contain many candidate genes functions and path-ways for insect desiccation resistance but partitioning genu-ine adaptive loci from experimental noise remains asignificant challenge We therefore focus our efforts on ournovel comparative analysis where we identified 45 robustcandidates for further exploration Common genetic signa-tures are rarely documented in cross-study comparisons ofcomplex traits (Sarup et al 2011 Huang et al 2012 but cfVermeulen et al 2013) and never before for desiccation re-sistance Some degree of overlap between our studies likelyresults from a shared genetic background from similar sam-pling sites highlighting the replicability of these responsesOne limitation of this approach is the overpopulation ofthe ldquocorerdquo list by longer genes This result reflects the genelength bias inherent to GWAS approaches an issue that is notstraightforward to correct in a GWAS let alone in the com-parative framework applied here A focus on the commongenes functionally linked to the desiccation phenotype canprovide a meaningful framework for future research and alsosuggests that some large-effect candidate genes exhibit natu-ral variation contributing to the phenotype (table 1)

Other feasible candidates include genes expressed inthe MTs (Wang et al 2004) or essential for MT develop-ment (St Pierre et al 2014) providing further researchopportunity focused on the primary stress sensing andfluid secretion tissues Genes include one of the mosthighly expressed MT transcripts CG7084 as well as dncNckx30C slo cv-c and nkd Other aspects of stress resis-tance are also implicated CrebB for example is involvedin regulating insulin-regulated stress resistance and ther-mosensory behavior (Wang et al 2008) nkd also impactslarval cuticle development and shares a PDZ signalingdomain also present in the desiccation candidate desi(Kawano et al 2010 St Pierre et al 2014) Finally severalcore candidates were consistently reported in recent stud-ies suggesting generalized roles in stress responses andfitness including transcriptome studies on MT function(Wang et al 2004) inbreeding and cold resistance(Vermeulen et al 2013) oxidative stress (Weber et al2012) and genomic studies exploring age-specific SNPson fitness traits (Durham et al 2014) and genes likelyunder spatial selection in flies from climatically divergenthabitats (Fabian et al 2012)

Several of the enriched GO categories from the core listalso make biological sense including stimulus response(almost half the suite) defense and signaling particularlyin pathways involved in MT stress sensing (Davies et al 20122013 2014) Functional categories enriched for sensory

organ development were highlighted consistent with envi-ronmental sensing categories from desert Drosophila tran-scriptomics (Matzkin and Markow 2009 Rajpurohit et al2013) and stress detection via phototransduction pathwaysin D melanogaster under a range of stressors including des-iccation (Soslashrensen et al 2007)

Evidence for Conserved Signatures at Higher LevelOrganization

Beyond the gene level we observed a striking degree ofsimilarity between this study and that of Telonis-Scottet al (2012) at the network level where significantly over-lapping protein networks were constructed from largely dif-ferent seed protein sets This supports a scenario where theldquobiological information flow from DNA to phenotyperdquo(Civelek and Lusis 2014) contains inherent redundancywhere alternative genetic solutions underlie phenotypesand functions Resistance to perturbation by genetic or en-vironmental variation appears to increase with increasinghierarchical complexity that is phenotype ldquobufferingrdquo (Fuet al 2009) The same functional network in different pop-ulationsgenetic backgrounds can be impacted but theexact genes and SNPs responding to perturbation will notnecessarily overlap In Drosophila different sets of loci fromthe same founders mapped to the same PPI network in thecase of two complex traits chill coma recovery and startleresponse (Huang et al 2012) Furthermore interspecificshared networkspathways from different genes have beendocumented for numerous complex traits (Emilsson et al2008 Aytes et al 2014 Ichihashi et al 2014) providing op-portunities for investigating taxa where individual genefunction is poorly understood

Although the null distribution is not always obviouswhen comparing networks and their functions partly be-cause of reduced sensitivity from missing interactions(Leiserson et al 2013) here the shared network signaturesportray a plausible multifaceted stress response involvingstress sensing rapid gene accessibility and expression Thepredominance of immune system and stress responsepathways are of interest with reference immunitystresspathway ldquocross-talkrdquo particularly in the MTs whereabiotic stressors elicit strong immunity transcriptionalresponses (Davies et al 2012) The higher order architec-tures reflect crucial aspects of rapid gene expression con-trol in response to stress from chromatin organization(Alexander and Lomvardas 2014) to transcription RNAdecay and translation (reviwed in de Nadal et al 2011) InDrosophila variation in transcriptional regulation of indi-vidual genes can underpin divergent desiccation pheno-types (Chung et al 2014) or be functionally inferred frompatterns of many genes elicited during desiccation stress(Rajpurohit et al 2013) Variants with potential to mod-ulate expression at different stages of gene expressioncontrol were evident in our GWAS data alone for exam-ple enrichment for 50UTR and splice variants at the SNPlevel and for chromatin and chromosome structure termsat the functional level

9

Genetic Overlap Versus Genetic differencesmdashCausesand Implications

Despite the common signatures at the gene and higher orderlevels between our studies pervasive differences were ob-served These can arise from differences in standing geneticvariation in the natural populations where different alleliccompositions impact gene-by-environment interactions andepistasis Our network overlap analyses highlight the potentialfor different molecular trajectories to result in similar func-tions (Leiserson et al 2013) consistent with the multiplephysiological pathways potentially available to D melanoga-ster under water stress (Chown et al 2011) Experimentaldesign also presumably contributed to the study differencesStrong directional selection as applied in the 2012 study canincrease the frequency of rare beneficial alleles and likely re-duced overall diversity than did the single-generation GWASdesign Furthermore selection regimes often impose addi-tional selective pressures (eg food deprivation during desic-cation) resulting in correlated responses such as starvationresistance with desiccation resistance and these factors differfrom the laboratory to the field Finally evidence suggests innatural D melanogaster that standing genetic variation canvary extensively with sampling season (Itoh et al 2010Bergland et al 2014) Although we cannot compare levelsof standing variation between our two studies evidence sug-gests that this may have affected the Dys gene where ourGWAS approach revealed no differentiation between thecontrol and resistant pools at this locus but rather detectedthe alleles associated with the selective sweep in Telonis-Scottet al (2012) in both pools at high frequency Whether thisresulted from sampling the later collection after a longdrought (2004ndash2008 wwwbomgovauclimate last accessedJanuary 13 2016) is unclear but this observation highlightsthe relevance of temporal sampling to standing genetic var-iation when attempting to link complex phenotypes togenotypes

Materials and Methods

Natural Population Sampling

Drosophila melanogaster was collected from vineyard waste atKilchorn Winery Romsey Australia in April 2010 Over 1200isofemale lines were established in the laboratory from wild-caught females Species identification was conducted on maleF1 to remove Drosophila simulans contamination and over900 D melanogaster isofemale lines were retained All flieswere maintained in vials of dextrose cornmeal media at25 C and constant light

Desiccation Assay

From each isofemale line ten inseminated F1 females weresorted by aspiration without CO2 and held in vials until 4ndash5days old For the desiccation assay 10 females per line (over9000 flies in total) were screened by placing groups of femalesinto empty vials covered with gauze and randomly assigninglines into 5 large sealed glass tanks containing trays of silicadesiccant (relative humidity [RH] lt10) The tanks werescored for mortality hourly and the final 558 surviving

individuals were selected for the desiccation-resistant pool(558 flies lt6 of the total) The control pool constitutedthe same number of females sampled from over 200 ran-domly chosen isofemale lines that were frozen prior to thedesiccation assay We chose this random control approachinstead of comparing the phenotypic extremes (ie the latemortality tail vs the ldquoearly mortalityrdquo tail) because unfavor-able environmental conditions experienced by the wild-caught dam may inflate environmental variance due to car-ryover effects in the F1 offspring (Schiffer et al 2013)

DNA Extraction and Sequencing

We used pooled whole-genome sequencing (Pool-seq)(Futschik and Schleurootterer 2010) to estimate and compareallele frequency differences between the most desiccation-resistant and randomly sampled flies from a wild populationto associate naturally segregating candidate SNPs with theresponse to low humidity For each treatment (resistantand control) genomic DNA was extracted from 50 femaleheads using the Qiagen DNeasy Blood and Tissue Kit accord-ing to the manufacturerrsquos instructions (Qiagen HildenGermany) Two extractions were combined to create fivepools per treatment which were barcoded separately tocreate technical replicates (total 10 pools of 100 fliesn frac14 1000) The DNA was fragmented using a CovarisS2 ma-chine (Covaris Inc Woburn MA) and libraries were pre-pared from 1 mg genomic DNA (per pool of 100 flies) usingthe Illumina TruSeq Prep module (Illumina San Diego CA)To minimize the effect of variation across lanes the ten poolswere combined in equal concentration (fig 3) and run mul-tiplexed in each lane of a full flow cell of an Illumina HiSeq2000 sequencer Clusters were generated using the TruSeq PECluster Kit v5 on a cBot and sequenced using Illumina TruSeqSBS v3 chemistry (Illumina) Fragmentation library construc-tion and sequencing were performed at the Micromon Next-Gen Sequencing facility (Monash University ClaytonAustralia)

Fig 3 Experimental design See text for further explanation

10

Data Processing and Mapping

Processing was performed on a high-performance computingcluster using the Rubra pipeline system that makes use of theRuffus Python library (Goodstadt 2010) The final variant-calling pipeline was based on a pipeline developed by ClareSloggett (Sloggett et al 2014) and is publicly available athttpsgithubcomgriffinpGWAS_pipeline (last accessedJanuary 13 2016) Raw sequence reads were processed persample per lane with P1 and P2 read files processed sepa-rately initially Trimming was performed with trimmomatic v030 (Lohse et al 2012) Adapter sequences were also re-moved Leading and trailing bases with a quality scorebelow 30 were trimmed and a sliding window (width frac14 10quality threshold frac14 25) was used Reads shorter than 40 bpafter trimming were discarded Quality before and after trim-ming was examined using FastQC v 0101 (Andrews 2014)

Posttrimming reads that remained paired were used asinput into the alignment step Alignment to the D melano-gaster reference genome version r553 was performed withbwa v 062 (Li and Durbin 2009) using the bwa aln commandand the following options No seeding (-l 150) 1 rate ofmissing alignments given 2 uniform base error rate (-n001) a maximum of 2 gap opens per read (-o 2) a maximumof 12 gap extensions per read (-e 12) and long deletionswithin 12 bp of 30 end disallowed (-d 12) The default valueswere used for all other options Output files were then con-verted to sorted bam files using the bwa sampe commandand the ldquoSortSamrdquo option in Picard v 196

At this point the bam files were merged into one file persample using PicardMerge Duplicates were identified withPicard MarkDuplicates The tool RealignerTargetCreator inGATK v 26-5 (DePristo et al 2011) was applied to the tenbam files simultaneously producing a list of intervals contain-ing probable indels around which local realignment would beperformed This was used as input for the IndelRealigner toolwhich was also applied to all ten bam files simultaneously

To investigate the variation among technical replicatesallele frequency was estimated based on read counts foreach technical replicate separately for 100000 randomly cho-sen SNPs across the genome After excluding candidate des-iccation-resistance SNPs and SNPs that may have been falsepositives due to sequencing error (those with mean frequencylt004 across replicates) 1803 SNPs remained For each locuswe calculated the variance in allele frequency estimate amongthe five control replicates and the variance among the fivedesiccation-resistance replicates We compared this with thevariance due to pool category (control vs desiccation resis-tant) using analysis of variance (ANOVA) We also calculatedthe mean pairwise difference in allele frequency estimateamong control and among desiccation-resistance replicatesand the mean concordance correlation coefficient was calcu-lated over all pairwise comparisons within pool category(using the epiccc function in the epiR package Stevensonet al 2015)

The following approach was then taken to identify SNPsdiffering between the desiccation-resistant and controlgroups Based on the technical replicate analysis we merged

the five ldquodesiccation-resistantrdquo and the five random controlsamples into two bam files A pileup file was created withsamtools v 0119 (Li et al 2009) and converted to a ldquosyncrdquo fileusing the mpileup2sync script in Popoolation2 v 1201 (Kofleret al 2011) retaining reads with a mapping quality of at least20 and bases with a minimum quality score of 20 Reads withunmapped mates were also retained (option ndashA in samtoolsmpileup) For this step we used a repeat-masked version ofthe reference genome created with RepeatMasker 403 (Smitet al 2013-2015) and a repeat file containing all annotatedtransposons from D melanogaster including shared ancestralsequences (Flybase release r557) plus all repetitive elementsfrom RepBase release v1904 for D melanogaster Simple re-peats small RNA genes and bacterial insertions were notmasked (options ndashnolow ndashnorna ndashno_is) The rmblastsearch engine was used

Association Analysis

We then performed Fisherrsquos exact test with the Fisher testscript in Popoolation2 (Kofler et al 2011) We set the min-imum count (of the minor allele) frac14 10 minimum coverage(in both populations) frac14 20 and maximum coverage frac1410000 The test was performed for each SNP (slidingwindow off) Although there are numerous approaches tocalculating genome-wide significance thresholds in aPoolseq framework currently there is no consensus on anappropriate standardized statistical method Methods in-clude simulation approaches (Orozco-terWengel et al2012 Bastide et al 2013 Burke et al 2014) correctionusing a null distribution based on estimating genetic driftfrom replicated artificial selection line allele frequency vari-ance (Turner and Miller 2012) use of a simple thresholdbased on a minimum allele frequency difference (Turneret al 2010) and the FDR correction suggested by Storeyand Tibshirani (2003) (used by Fabian et al 2012)

We determined based on our design to calculate the FDRthreshold by comparison with a simulated distribution of nullP values following the approach of Bastide et al (2013) Foreach SNP locus a P value was calculated by simulating thedistribution of read counts between the major and minorallele according to a beta-binomial distribution with meana(a+b) and variance (ab)((a+b)2(a+b+1)) Thismodel allowed for two types of sampling variationStochastic variation in allele sampling across the phenotypesand the background variation in the representation of allelesin each pool caused by library preparation artifacts or sto-chastic variation driven by unequal coverage of the poolsThese sources of variation have also been recognized else-where as being important in interpreting pooled sequenceallele calls and identifying significant differentiation betweenpooled samples (Lynch et al 2014) The value of afrac14 37 waschosen by chi-square test as the best match to the observedP value distribution (afrac14 10 to afrac14 40 were tested) with acorresponding bfrac14 4315 to reflect the coverage depth differ-ence between the random control (487) and desiccation-resistant (568) pools These values were then used tosimulate a null data set 10 the size of the observed data

11

set Because the null P value distribution still did not fit theobserved P value distribution particularly well (supplemen-tary fig S2 Supplementary Material online) these values werenot used for a standard FDR correction Instead a P valuethreshold was calculated based on the lowest 005 of thenull P values similar in approach to Orozco-terWengel et al(2012)

Inversion Analysis

In D melanogaster several cosmopolitan inversion polymor-phisms show cross-continent latitudinal patterns that explaina substantial proportion of clinal variation for many traitsincluding thermal tolerance (reviewed in Hoffmann andWeeks 2007) Patterns of disequilibria generated betweenloci in the vicinity of the rearrangement can obscure signalsof selection (Hoffmann and Weeks 2007) and inversion fre-quencies are routinely considered in genomic studies ofnatural populations sampled from known latitudinal clines(Fabian et al 2012 Kapun et al 2014) We assessedassociations between inversion frequencies and our controland resistant pools at SNPs diagnostic for the following in-versions In3R(Payne) (Anderson et al 2005 Kapun et al2014) In(2L)t (Andolfatto et al 1999 Kapun et al 2014)In(2R)Ns In(3L)P In(3R)C In(3R)K and In(3R)Mo (Kapunet al 2014) Between one and five diagnostic SNPs were in-vestigated for each inversion (depending on availability) Weconsidered an inversion to be associated with desiccationresistance if the frequency of its diagnostic SNP differed sig-nificantly between the control and desiccation-resistant poolsby a Fisherrsquos exact test

Genome Feature Annotation

The list of candidate SNPs was converted to vcf format using acustom R script setting the alternate allele as the allele thatincreased in frequency from the control to the desiccation-resistant pool and setting the reference allele as the otherallele present in cases where both the major and minor allelesdiffered from the reference genome The genomic features inwhich the candidate SNPs occurred were annotated usingsnpEff v40 first creating a database from the D melanogasterv553 genome annotation with the default approach(Cingolani et al 2012) The default annotation settings wereused except for setting the parameter ldquo-ud 0rdquo to annotateSNPs to genes only if they were within the gene body Using2 tests on SNP counts we tested for over- or under-enrichment of feature types by calculating the proportionof lsquoallrsquo features from all SNPs detected in the study and theproportion of ldquocandidaterdquo SNPs identified as differentiated indesiccation-resistant flies (Fabian et al 2012)

GO Analysis

GO analyses was used to associate the candidates with theirbiological functions (Ashburner et al 2000) Previously devel-oped for biological pathway analysis of global gene expressionstudies (Wang et al 2010) typical GO analysis assumes that allgenes are sampled independently with equal probabilityGWAS violate these criteria where SNPs are more likely to

occur in longer genes than in shorter genes resulting in highertype I error rates in GO categories overpopulated with longgenes Gene length as well as overlapping gene biases (Wanget al 2011) were accounted for using Gowinda (Kofler andSchleurootterer 2012) which implements a permutation strategyto reduce false positives Gowinda does not reconstruct LDbetween SNPs but operates in two modes underpinned bytwo extreme assumptions 1) Gene mode (assumes all SNPs ina gene are in LD) and 2) SNP mode (assumes all SNPs in agene are completely independent)

The candidates were examined in both modes but giventhe stringency of gene mode high-resolution SNP modeoutput was utilized for the final analysis with the followingparameters 106 simulations minimum significance frac141 andminimum genes per GO categoryfrac14 3 (to exclude gene-poorcategories) P values were corrected using the Gowinda FDRalgorithm GO annotations from Flybase r557 were applied

Candidate Gene Analysis

In conjunction with the GO analysis we also examinedknown candidate genes associated with survival under lowRH to better characterize possible biological functions of ourGWAS candidates We took a conservative approach to can-didate gene assignment where the SNPs were mapped withinthe entire gene region without up- and downstream exten-sions Genes were curated from FlyBase and the literature(table 1) predominantly using detailed functional experimen-tal evidence as criteria for functional desiccation candidates

Network Analysis

We next obtained a higher level summary of the candidategene list using PPI network analysis The full candidate genelist was used to construct a first-order interaction networkusing all Drosophila PPIs listed in the Drosophila InteractionDatabase v 2014_10 (avoiding interologs) (Murali et al 2011)This ldquofullrdquo PPI network includes manually curated data frompublished literature and experimental data for 9633 proteinsand 93799 interactions For the network construction theigraph R package was used to build a subgraph from the fullPPI network containing the candidate proteins and their first-order interacting proteins (Csardi and Nepusz 2006) Self-connections were removed Functional enrichment analysiswas conducted on the list of network genes using FlyMinev400 (wwwflymineorg last accessed January 13 2016) withthe following parameters for GO enrichment and KEGGReactome pathway enrichment BenjaminindashHochberg FDRcorrection P lt 005 background frac14 all D melanogastergenes in our full PPI network and Ontology (for GO enrich-ment only) frac14 biological process or molecular function

Cross-Study Comparison Overlap with Our PreviousWork

Following on from the comparison between our gene list andthe candidate genes suggested from the literature we quan-titatively evaluated the degree of overlap between the candi-date genes identified in this study with alleles mapped in ourprevious study of flies collected from the same geographic

12

region (Telonis-Scott et al 2012) The ldquooverlaprdquo analysis wasperformed similarly for the GWAS candidate gene and net-work-level approaches First the gene list overlap was testedbetween the two studies using Fisherrsquos exact test where the382 genes identified from the candidate SNPs (this study)were compared with 416 genes identified from candidatesingle feature polymorphisms (Telonis-Scott et al 2012)The contingency table was constructed using the numberof annotated genes in the D melanogaster R553 genomebuilt as an approximation for the total number of genestested in each study and took the following form

where A frac14 gene list from Telonis-Scott et al (2012)n frac14 416 B frac14 gene list from this study n frac14 382 N frac14 totalnumber of genes in D melanogaster reference genome r553n frac14 17106

To annotate genome features in the overlap candidategene list we used the SNP data (this study) which betterresolves alleles than array-based genotyping Due to this lim-itation we did not compare exact genomic coordinates be-tween the studies but considered overlap to occur whendifferentiated alleles were detected in the same gene inboth studies We did however examine the allele frequenciesaround the Dys-RF promoter which was sequenced sepa-rately revealing a selective sweep in the experimental evolu-tion lines (Telonis-Scott et al 2012)

The overlapping candidates were tested for functional en-richment using a basic gene list approach As genes ratherthan SNPs were analyzed here Fly Mine v400 (wwwflymineorg) was applied with the following parameters Gene lengthnormalization BenjaminindashHochberg FDR correction Plt 005Ontology frac14 biological process and default background (allGO-annotated D melanogaster genes)

Finally we constructed a second PPI network from thecandidate genes mapped in Telonis-Scott et al (2012) to ex-amine system-level overlap between the two studies The 416genes mapped to 337 protein seeds and the network wasconstructed as described above The degree of network over-lap between the two studies was tested using simulation Ineach iteration a gene set of the same length as the list fromthis study and a gene set of the same length as the list fromTelonis-Scott et al (2012) were resampled randomly from theentire D melanogaster mapped gene annotation (r553) Eachresampled set was then used to build a first-order network aspreviously described The overlap network between the tworesampled sets was calculated using the graphintersectioncommand from the igraph package and zero-degree nodes(proteins with no interactions) were removed Overlap wasquantified by counting the number of nodes and the numberof edges in this overlap network The observed overlap wascompared with a simulated null distribution for both nodeand edge number where n frac14 1000 simulation iterations

Supplementary MaterialSupplementary figures S1ndashS4 and tables S1ndashS5 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

We thank Robert Good for the fly sampling and JenniferShirriffs and Lea and Alan Rako for outstanding assistancewith laboratory culture and substantial desiccation assaysWe are grateful to Melissa Davis for advice on network build-ing and comparison approaches and to five anonymous re-viewers whose comments improved the manuscript Thiswork was supported by a DECRA fellowship DE120102575and Monash University Technology voucher scheme toMTS an ARC Future Fellowship and Discovery Grants toCMS an ARC Laureate Fellowship to AAH and a Scienceand Industry Endowment Fund grant to CMS and AAH

ReferencesAlexander JM Lomvardas S 2014 Nuclear architecture as an epigenetic

regulator of neural development and function Neuroscience26439ndash50

Anderson AR Hoffmann AA McKechnie SW Umina PA Weeks AR2005 The latitudinal cline in the In(3R)Payne inversion polymor-phism has shifted in the last 20 years in Australian Drosophilamelanogaster populations Mol Ecol 14851ndash858

Andolfatto P Wall JD Kreitman M 1999 Unusual haplotype structureat the proximal breakpoint of In(2L)t in a natural population ofDrosophila melanogaster Genetics 1531297ndash1311

Andrews S 2014 FastQC Available from httpwwwbioinformaticsbbsrcacukprojectsfastqc

Ashburner M Ball CA Blake JA Botstein D Butler H Cherry JM DavisAP Dolinski K Dwight SS Eppig JT et al 2000 Gene ontology toolfor the unification of biology The Gene Ontology Consortium NatGenet 2525ndash29

Aytes A Mitrofanova A Lefebvre C Alvarez Mariano J Castillo-MartinM Zheng T Eastham James A Gopalan A Pienta Kenneth J ShenMM et al 2014 Cross-species regulatory network analysis identifiesa synergistic interaction between FOXM1 and CENPF that drivesprostate cancer malignancy Cancer Cell 25638ndash651

Bastide H Betancourt A Nolte V Tobler R Steuroobe P Futschik ASchleurootterer C 2013 A genome-wide fine-scale map of natural pig-mentation variation in Drosophila melanogaster PLoS Genet9e1003534

Bergland AO Behrman EL OrsquoBrien KR Schmidt PS Petrov DA 2014Genomic evidence of rapid and stable adaptive oscillations overseasonal time scales in Drosophila PLoS Genet 10e1004775

Bonebrake TC Mastrandrea MD 2010 Tolerance adaptation and pre-cipitation changes complicate latitudinal patterns of climate changeimpacts Proc Natl Acad Sci U S A 10712581ndash12586

Bradley TJ 2009 Animal osmoregulation Oxford Oxford UniversityPress

Burke MK Liti G Long AD 2014 Standing genetic variation drivesrepeatable experimental evolution in outcrossing populations ofSaccharomyces cerevisiae Mol Biol Evol 313228ndash3239

Chown SL 2012 Trait-based approaches to conservation physiologyforecasting environmental change risks from the bottom upPhilos Trans R Soc B 3671615ndash1627

Chown SL Nicolson SW 2004 Insect physiological ecology mechanismsand patterns Oxford Oxford University Press

Chown SL Soslashrensen JG Terblanche JS 2011 Water loss in insectsan environmental change perspective J Insect Physiol571070ndash1084

Not in GWAS In GWAS

Not in Telonis-Scott et al (2012) N ndash intersect-AndashB A-intersect

In Telonis-Scott et al (2012) B-intersect Intersect(AB)

13

Chung H Loehlin DW Dufour HD Vaccarro K Millar JG Carroll SB2014 A single gene affects both ecological divergence and matechoice in Drosophila Science 3431148ndash1151

Cingolani P Platts A Wang LL Coon M Nguyen T Wang L Land SJ LuX Ruden DM 2012 A program for annotating and predicting theeffects of single nucleotide polymorphisms SnpEff SNPs in thegenome of Drosophila melanogaster strain w1118 iso-2 iso-3 Fly(Austin) 680ndash92

Civelek M Lusis AJ 2014 Systems genetics approaches to understandcomplex traits Nat Rev Genet 1534ndash48

Clusella-Trullas S Blackburn TM Chown SL 2011 Climatic predictors oftemperature performance curve parameters in ectotherms implycomplex responses to climate change Am Nat 177738ndash751

Csardi G Nepusz T 2006 The igraph software package for complexnetwork research InterJournal Complex Systems 1695 Availablefrom httpigraphorg

Davies SA Cabrero P Overend G Aitchison L Sebastian S Terhzaz SDow JA 2014 Cell signalling mechanisms for insect stress toleranceJ Exp Biol 217119ndash128

Davies SA Cabrero P Povsic M Johnston NR Terhzaz S Dow JA 2013Signaling by Drosophila capa neuropeptides Gen Comp Endocrinol18860ndash66

Davies SA Overend G Sebastian S Cundall M Cabrero P Dow JATerhzaz S 2012 Immune and stress response lsquocross-talkrsquo in theDrosophila Malpighian tubule J Insect Physiol 58488ndash497

Day JP Dow JA Houslay MD Davies SA 2005 Cyclic nucleotide phos-phodiesterases in Drosophila melanogaster Biochem J 388333ndash342

de Nadal E Ammerer G Posas F 2011 Controlling gene expression inresponse to stress Nat Rev Genet 12833ndash845

DePristo MA Banks E Poplin R Garimella KV Maguire JR Hartl CPhilippakis AA del Angel G Rivas MA Hanna M et al 2011 Aframework for variation discovery and genotyping using next-gen-eration DNA sequencing data Nat Genet 43491ndash498

Djawdan M Chippindale AK Rose MR Bradley TJ 1998 Metabolic re-serves and evolved stress resistance in Drosophila melanogasterPhysiol Zool 71584ndash594

Durham MF Magwire MM Stone EA Leips J 2014 Genome-wide anal-ysis in Drosophila reveals age-specific effects of SNPs on fitness traitsNat Commun 54338

Emilsson V Thorleifsson G Zhang B Leonardson AS Zink F Zhu JCarlson S Helgason A Walters GB Gunnarsdottir S et al 2008Genetics of gene expression and its effect on disease Nature452423ndash428

Fabian DK Kapun M Nolte V Kofler R Schmidt PS Schleurootterer C FlattT 2012 Genome-wide patterns of latitudinal differentiation amongpopulations of Drosophila melanogaster from North America MolEcol 214748ndash4769

Foley BR Telonis-Scott M 2011 Quantitative genetic analysis suggestscausal association between cuticular hydrocarbon composition anddesiccation survival in Drosophila melanogaster Heredity 10668ndash77

Folk DG Han C Bradley TJ 2001 Water acquisition and partitioning inDrosophila melanogaster effects of selection for desiccation-resis-tance J Exp Biol 2043323ndash3331

Fowler MA Montell C 2013 Drosophila TRP channels and animal be-havior Life Sci 92394ndash403

Fu J Keurentjes JJB Bouwmeester H America T Verstappen FWA WardJL Beale MH de Vos RCH Dijkstra M Scheltema RA et al 2009System-wide molecular evidence for phenotypic buffering inArabidopsis Nat Genet 41166ndash167

Futschik A Schleurootterer C 2010 The next generation of molecular mar-kers from massively parallel sequencing of pooled DNA samplesGenetics 186207ndash218

Gibbs AG Fukuzato F Matzkin LM 2003 Evolution of water conserva-tion mechanisms in Drosophila J Exp Biol 2061183ndash1192

Gibbs AG Matzkin LM 2001 Evolution of water balance in the genusDrosophila J Exp Biol 2042331ndash2338

Goodstadt L 2010 Ruffus a lightweight Python library for computa-tional pipelines Bioinformatics 262778ndash2779

Hadley NF 1994 Water relations of terrestrial arthropods San Diego(CA) Academic Press

Hoffmann AA Hallas R Sinclair C Mitrovski P 2001 Levels of variationin stress resistance in Drosophila among strains local populationsand geographic regions patterns for desiccation starvation coldresistance and associated traits Evolution 551621ndash1630

Hoffmann AA Parsons PA 1989a An integrated approach to environ-mental stress tolerance and life-history variation desiccation toler-ance in Drosophila Biol J Linn Soc 37117ndash136

Hoffmann AA Parsons PA 1989b Selection for increased desiccationresistance in Drosophila melanogaster additive genetic control andcorrelated responses for other stresses Genetics 122837ndash845

Hoffmann AA Sgro CM 2011 Climate change and evolutionary adap-tation Nature 470479ndash485

Hoffmann AA Weeks AR 2007 Climatic selection on genes and traitsafter a 100 year-old invasion a critical look at the temperate-tropicalclines in Drosophila melanogaster from eastern Australia Genetica129133ndash147

Huang W Richards S Carbone MA Zhu D Anholt RR Ayroles JFDuncan L Jordan KW Lawrence F Magwire MM et al 2012Epistasis dominates the genetic architecture of Drosophila quanti-tative traits Proc Natl Acad Sci U S A 10915553ndash15559

Huang Z Tunnacliffe A 2004 Response of human cells to desiccationcomparison with hyperosmotic stress response J Physiol558181ndash191

Huang Z Tunnacliffe A 2005 Gene induction by desiccation stress inhuman cell cultures FEBS Lett 5794973ndash4977

Ichihashi Y Aguilar-Martınez JA Farhi M Chitwood DH Kumar RMillon LV Peng J Maloof JN Sinha NR 2014 Evolutionary develop-mental transcriptomics reveals a gene network module regulatinginterspecific diversity in plant leaf shape Proc Natl Acad Sci U S A111E2616ndashE2621

Itoh M Nanba N Hasegawa M Inomata N Kondo R Oshima MTakano-Shimizu T 2010 Seasonal changes in the long-distance link-age disequilibrium in Drosophila melanogaster J Hered 10126ndash32

Johnson WA Carder JW 2012 Drosophila nociceptors mediate larvalaversion to dry surface environments utilizing both the painless TRPchannel and the DEGENaC subunit PPK1 PLoS One 7e32878

Kahsai L Kapan N Dircksen H Winther AM Nassel DR 2010 Metabolicstress responses in Drosophila are modulated by brain neurosecre-tory cells that produce multiple neuropeptides PLoS One 5e11480

Kapun M van Schalkwyk H McAllister B Flatt T Schleurootterer C 2014Inference of chromosomal inversion dynamics from Pool-Seq datain natural and laboratory populations of Drosophila melanogasterMol Ecol 231813ndash1827

Kawano T Shimoda M Matsumoto H Ryuda M Tsuzuki S Hayakawa Y2010 Identification of a gene Desiccate contributing to desicca-tion resistance in Drosophila melanogaster J Biol Chem28538889ndash38897

Kellermann V van Heerwaarden B Sgro CM Hoffmann AA 2009Fundamental evolutionary limits in ecological traits driveDrosophila species distributions Science 3251244ndash1246

Kofler R Nolte V Schleurootterer C 2016 The impact of library preparationprotocols on the consistency of allele frequency estimates in Pool-Seq data Mol Ecol Resour 16118ndash122

Kofler R Pandey RV Schleurootterer C 2011 PoPoolation2 identifying dif-ferentiation between populations using sequencing of pooled DNAsamples (Pool-Seq) Bioinformatics 273435ndash3436

Kofler R Schleurootterer C 2012 Gowinda unbiased analysis of gene setenrichment for genome-wide association studies Bioinformatics282084ndash2085

Leiserson MDM Eldridge JV Ramachandran S Raphael BJ 2013Network analysis of GWAS data Curr Opin Genet Dev 23602ndash610

Li H Durbin R 2009 Fast and accurate short read alignment withBurrows-Wheeler transform Bioinformatics 251754ndash1760

Li H Handsaker B Wysoker A Fennell T Ruan J Homer N Marth GAbecasis G Durbin R 2009 The Sequence AlignmentMap formatand SAMtools Bioinformatics 252078ndash2079

14

Liu Y Luo J Carlsson MA Neuroassel DR 2015 Serotonin and insulin-likepeptides modulate leucokinin-producing neurons that affectfeeding and water homeostasis in Drosophila J Comp Neurol5231840ndash1863

Lohse M Bolger AM Nagel A Fernie AR Lunn JE Stitt M Usadel B 2012RobiNA a user-friendly integrated software solution for RNA-Seq-based transcriptomics Nucleic Acids Res 40W622ndashW627

Long A Liti G Luptak A Tenaillon O 2015 Elucidating the moleculararchitecture of adaptation via evolve and resequence experimentsNat Rev Genet 16567ndash582

Lynch M Bost D Wilson S Maruki T Harrison S 2014 Population-genetic inference from pooled-sequencing data Genome Biol Evol61210ndash1218

Mackay TFC Stone EA Ayroles JF 2009 The genetics of quantitativetraits challenges and prospects Nat Rev Genet 10565ndash577

Marron MT Markow TA Kain KJ Gibbs AG 2003 Effects of starvationand desiccation on energy metabolism in desert and mesicDrosophila J Insect Physiol 49261ndash270

Matzkin LM Markow TA 2009 Transcriptional regulation of metabo-lism associated with the increased desiccation resistance of thecactophilic Drosophila mojavensis Genetics 1821279ndash1288

Murali T Pacifico S Yu J Guest S Roberts GG Finley RL Jr 2011 DroID2011 a comprehensive integrated resource for protein transcrip-tion factor RNA and gene interactions for Drosophila Nucleic AcidsRes 39D736ndashD743

Nuzhdin SV Turner TL 2013 Promises and limitations of hitchhikingmapping Curr Opin Genet Dev 23694ndash699

Orozco-terWengel P Kapun M Nolte V Kofler R Flatt T Schleurootterer C2012 Adaptation of Drosophila to a novel laboratory environmentreveals temporally heterogeneous trajectories of selected alleles MolEcol 214931ndash4941

Picard Picard A set of Java command line tools for manipulating high-throughput seqeuncing (HTS) data and formats Available fromhttpbroadinstitutegithubiopicard

Qiu Y Tittiger C Wicker-Thomas C Le Goff G Young S Wajnberg EFricaux T Taquet N Blomquist GJ Feyereisen R 2012 An insect-specific P450 oxidative decarbonylase for cuticular hydrocarbon bio-synthesis Proc Natl Acad Sci U S A 10914858ndash14863

Rajpurohit S Oliveira CC Etges WJ Gibbs AG 2013 Functional genomicand phenotypic responses to desiccation in natural populations of adesert Drosophilid Mol Ecol 222698ndash2715

Sarup P Soslashrensen JG Kristensen TN Hoffmann AA Loeschcke V PaigeKN Soslashrensen P 2011 Candidate genes detected in transcriptomestudies are strongly dependent on genetic background PLoS One6e15644

Schiffer M Hangartner S Hoffmann AA 2013 Assessing the relativeimportance of environmental effects carry-over effects and speciesdifferences in thermal stress resistance a comparison ofDrosophilids across field and laboratory generations J Exp Biol2163790ndash3798

Schleurootterer C Kofler R Versace E Tobler R Franssen SU 2015Combining experimental evolution with next-generation sequen-cing A powerful tool to study adaptation from standing geneticvariation Heredity 114431ndash440

Sinclair BJ Gibbs AG Roberts SP 2007 Gene transcription during ex-posure to and recovery from cold and desiccation stress inDrosophila melanogaster Insect Mol Biol 16435ndash443

Sloggett C Wakefield M Philip G Pope B 2014 Rubramdashflexible distrib-uted pipelines figshare Available from httpdxdoiorg106084m9figshare895626

Smit AFA Hubley R Green P 2013-2015 RepeatMasker Open-403Available from httpwwwrepeatmaskerorg

Seurooderberg JA Birse RT Neuroassel DR 2011 Insulin production and signalingin renal tubules of Drosophila is under control of tachykinin-relatedpeptide and regulates stress resistance PLoS One 6e19866

Soslashrensen JG Nielsen MM Loeschcke V 2007 Gene expression profileanalysis of Drosophila melanogaster selected for resistance to envi-ronmental stressors J Evol Biol 201624ndash1636

St Pierre SE Ponting L Stefancsik R McQuilton P 2014 FlyBase102mdashadvanced approaches to interrogating FlyBase Nucleic AcidsRes 42D780ndashD788

Stevenson M Nunes T Heuser C Marshall J Sanchez J Thornton RReiczigel J Robison-Cox J Sebastiani P Solymos P et al 2015 epiRTools for the analysis of epidemiological data R package version09-62 Available from httpCRANR-projectorgpackagefrac14epiR

Storey JD Tibshirani R 2003 Statistical significance for genomewidestudies Proc Natl Acad Sci U S A 1009440ndash9445

Telonis-Scott M Gane M DeGaris S Sgro CM Hoffmann AA 2012 Highresolution mapping of candidate alleles for desiccation resistancein Drosophila melanogaster under selection Mol Biol Evol291335ndash1351

Telonis-Scott M Guthridge KM Hoffmann AA 2006 A new set oflaboratory-selected Drosophila melanogaster lines for the analysisof desiccation resistance response to selection physiology and cor-related responses J Exp Biol 2091837ndash1847

Terhzaz S Cabrero P Robben JH Radford JC Hudson BD Milligan GDow JA Davies SA 2012 Mechanism and function of Drosophilacapa GPCR a desiccation stress-responsive receptor with functionalhomology to human neuromedinU receptor PLoS One 7e29897

Terhzaz S Overend G Sebastian S Dow JA Davies SA 2014 The Dmelanogaster capa-1 neuropeptide activates renal NF-kB signalingPeptides 53218ndash224

Terhzaz S Teets NM Cabrero P Henderson L Ritchie MG Nachman RJDow JA Denlinger DL Davies SA 2015 Insect capa neuropeptidesimpact desiccation and cold tolerance Proc Natl Acad Sci U S A1122882ndash2887

Thorat LJ Gaikwad SM Nath BB 2012 Trehalose as an indicator ofdesiccation stress in Drosophila melanogaster larvae a potentialmarker of anhydrobiosis Biochem Biophys Res Commun419638ndash642

Tobler R Franssen SU Kofler R Orozco-terWengel P Nolte VHermisson J Schl eurootterer C 2014 Massive habitat-specific ge-nomic response in D melanogaster populations during exper-imental evolution in hot and cold environments Mol Biol Evol31364ndash375

Turner TL Bourne EC Von Wettberg EJ Hu TT Nuzhdin SV 2010Population resequencing reveals local adaptation of Arabidopsislyrata to serpentine soils Nat Genet 42260ndash263

Turner TL Miller PM 2012 Investigating natural variation in Drosophilacourtship song by the evolve and resequence approach Genetics191633ndash642

Vermeulen CJ Soslashrensen P Kirilova Gagalova K Loeschcke V 2013Transcriptomic analysis of inbreeding depression in cold-sensitiveDrosophila melanogaster shows upregulation of the immune re-sponse J Evol Biol 261890ndash1902

Wang B Goode J Best J Meltzer J Schilman PE Chen J Garza D ThomasJB Montminy M 2008 The insulin-regulated CREB coactivatorTORC promotes stress resistance in Drosophila Cell Metab7434ndash444

Wang J Kean L Yang J Allan AK Davies SA Herzyk P Dow JA 2004Function-informed transcriptome analysis of Drosophila renaltubule Genome Biol 5R69

Wang K Li M Hakonarson H 2010 Analysing biological pathways ingenome-wide association studies Nat Rev Genet 11843ndash854

Wang L Jia P Wolfinger RD Chen X Zhao Z 2011 Gene set analysis ofgenome-wide association studies methodological issues and per-spectives Genomics 981ndash8

Weber AL Khan GF Magwire MM Tabor CL Mackay TF Anholt RR2012 Genome-wide association analysis of oxidative stress resistancein Drosophila melanogaster PLoS One 7e34745

Weeks AR McKechnie SW Hoffmann AA 2002 Dissecting adaptiveclinal variation markers inversions and sizestress associations inDrosophila melanogaster from a central field population Ecol Lett5756ndash763

Zhu Y Bergland AO Gonzalez J Petrov DA 2012 Empirical validation ofpooled whole genome population re-sequencing in Drosophila mel-anogaster PLoS One 7e41901

15

Page 2: Cross-Study Comparison Reveals Common Genomic, Network, … · 2016-02-06 · resistance using a high-throughput sequencing Pool-GWAS (genome-wide association study) approach ( Bastide

signaling in ion and water balance regulated by excretion viathe Malpighian tubules (MTs) and gut absorption has beenshown to impact desiccation resistance in D melanogaster(Kahsai et al 2010 Terhzaz et al 2012 2014 2015) Specificallyknockdown of the Capability (Capa) neuropeptide gene en-hanced desiccation resistance by reductions in respiratorycuticular and excretory water loss while also cross-conferringcold tolerance (Terhzaz et al 2015) Cuticular hydrocarbon(CHC) levels are also implicated a single gene encoding a fattyacid synthase mFAS was shown to impact both desiccationresistance and mate choice in Australian rainforest endemicsDrosophila serrata and Drosophila birchii (Chung et al 2014)In D melanogaster knockdown of CYP4G1 also reduced CHCproduction and impaired survival under low humidity (Qiuet al 2012) Metabolic signaling has also been shown toimpact desiccation resistance and includes components ofthe insulin signaling pathways (Seurooderberg et al 2011 Liu et al2015) Several other single gene studies highlight differentmechanisms such as desiccation avoidance in larvae via no-ciceptors encoded by members of the transient receptorpotential (TRP) aquaporin family (Johnson and Carder2012) cyclic adenosine monophosphate (cAMP)-dependentsignaling protein kinase desi (Kawano et al 2010) and poten-tial tissue protection via trehalose sugar accumulation(Thorat et al 2012)

Collectively these functional approaches highlight thecomplexity of insect water balance strategies in controlledbackgrounds but do not explain the variation observed innatural populations and species Resistance evolves rapidlyand is highly heritable (up to 60) in the cosmopolitan andunusually resistant species D melanogaster whereas heritabil-ity is much lower in less tolerant range-restricted species(Hoffmann and Parsons 1989b Kellermann et al 2009)Multiple genome-wide quantitative trait loci were identifiedfrom mapping lines constructed from a natural D melanoga-ster population suggesting a polygenetic architecture (Foleyand Telonis-Scott 2011) Transcriptomics revealed differentialregulation of thousands of genes in response to desiccation inDrosophila mojavensis (Matzkin and Markow 2009) whileartificial selection for desiccation resistance altered the basalexpression of over 200 genes in D melanogaster (Soslashrensenet al 2007)

Previously we used microarray-based genomic hybridiza-tion to survey allele frequency shifts in experimental evolutionlines of D melanogaster recently derived from the field(Telonis-Scott et al 2012) We documented shifts in over600 loci in response to selection for desiccation resistancefollowing a rapid phenotypic response after only 8 genera-tions of selection This variant identification approach waslimited to highlighting candidate genes and regions andnot actual nucleotide polymorphisms apart from those se-quenced post hoc We now expand on this to further explorethe unresolved natural genomic complexity of desiccationresistance using a high-throughput sequencing Pool-GWAS(genome-wide association study) approach (Bastide et al2013) In a GWAS framework for polygenic traits wheremany small effect genes contribute to a phenotype a largesample size is required for adequate power (Mackay et al

2009) Accordingly we sampled natural genetic variationfrom almost 1000 inseminated wild-caught females andcompared the upper 5 desiccation resistant tail fromalmost 10000 F1 progeny with a control sample chosen ran-domly from the same progeny set Harnessing the power ofPool-GWAS we sequenced a pool of 500 natural ldquoresistantrdquogenomes and compared allele frequencies with a pool of 500ldquorandom controlrdquo genomes

We employed a novel comparative approach with the re-sulting set of candidate loci and those detected in our previ-ous artificial selection study of flies collected several seasonsearlier from the same region of southeastern Australia(Telonis-Scott et al 2012) Cross-study repeatability of lociassociated with complex phenotypes is often low (Sarupet al 2011) and can be influenced by genetic backgroundselection intensity epistatic effects and gene-by-environmentinteractions (Mackay et al 2009 Civelek and Lusis 2014)Multiple genetic solutions appear to underlie the array ofdesiccation responses and we further explored this by inves-tigating overlap at the level of individual genes functionalgene ontology (GO) categories and proteinndashprotein interac-tion (PPI) networks Genetic variants contribute to final phe-notypes by way of ldquointermediate phenotypesrdquo (transcriptprotein and metabolite abundances) and correlationsshould theoretically occur across these biological scales(Civelek and Lusis 2014)

Here we report on a screen of natural nucleotide vari-ants associated with desiccation resistance and using a pow-erful analysis approach demonstrate common cross-studysignatures across different hierarchical levels from gene tonetwork and function We found evidence that some func-tional desiccation candidates may be important in wildpopulations and discovered variants linked to multiple phys-iological responses consistent with the traitrsquos complex under-pinnings at both the physiological and molecular levels

Results

Genome Wide Differentiation for Natural DesiccationResistance

We collected over 1000 D melanogaster isofemale linesfrom southern Australia and after one generation of lab-oratory culture screened over 9000 female progeny fordesiccation resistance The top ~5 desiccation-resistantflies were selected for Illumina sequencing (~500 flies)together with a random sampling of the same numberof flies from the families as the ldquocontrolrdquo pool Ourdesign incorporated a subpooling (technical replicate)strategy to both optimize Pool-seq allele frequencyestimates and control for technical bias on allele fre-quency estimates (see Materials and Methods) For thecontrol and desiccation-resistant samples respectively162Mndash259M and 188Mndash289M raw reads were obtainedper technical replicate and an average of 950 of readsmapped to the reference genome after trimming (mean ofall 10 replicates standard deviation [SD] 05)

Variance in allele frequency among the five technical rep-licates for each pool was low (supplementary fig S1

2

Supplementary Material online mean across control repli-cates 00028 median 00015 mean across resistant replicates00027 median 00016) corresponding to a mean SD of~45 around the mean allele frequency The mean pairwisedifference in allele frequency estimates was 55 and 54 forcontrol and desiccation-resistant replicates respectively con-sistent with previously published estimates (4ndash6 Kofleret al 2016)

For a test subset of 1803 randomly sampled single nucle-otide polymorphisms (SNPs) that showed no significant dif-ferentiation between control and desiccation-resistant poolsthe replicates accounted for the majority of total variance inallele frequency Replicates within pool category (control vsdesiccation resistant supplementary fig S1 SupplementaryMaterial online) explained between 15 and 100 (mean87 median 93) of what was a relatively low total variance(mean 00029 median 00019) The mean concordance cor-relation coefficient was 098 for each pool category higherthan a comparable Drosophila study that reported a correla-tion of 0898 between 2 replicates with 20 individuals(Zhu et al 2012) We therefore combined the technical rep-licates into a single control and resistant pool respectively forfurther analyses After processing the mean coverage depth

across the genome was 487 and 568 per sample respec-tively (equivalent to 097 and 11 per individual in thepools of 500 flies)

Allele Frequency Changes

The allele frequencies of 3528158 SNPs were comparedbetween the 2 pools using Fisherrsquos exact tests with apermutation-based false discovery rate (FDR) correctionchosen based on the top 005 of the simulated nullP values Six hundred forty-eight SNPs were differentiatedbetween desiccation resistant and control pools (fig 1 andsupplementary table S1 Supplementary Material online)The SNPs were nonrandomly distributed across chromo-somes (X2

4 frac14 4312 Plt 00001) highly abundant on theX chromosome (X2

1 frac14 3698 Plt 00001) while SNPs wereunderrepresented on both 2L and 3L (2L X2

1 frac14 641 Plt001 3L X2

1 frac14 397 Plt 005) In the desiccation pool theallelic distribution of the 648 SNPs ranged from low tofixed (4ndash100 fig 1B and supplementary table S1Supplementary Material online) The increase in frequencyof the favored ldquodesiccationrdquo alleles compared with the con-trols ranged from 3 to 41 (median increase 14 supple-mentary table S1 Supplementary Material online) where

Fig 1 (A) Manhattan plot of genome-wide P values for differentiation between desiccation-resistant and random control pools shown for the entiregenome where each point represents one SNP SNPs highlighted in red showed differentiation above the 005 threshold level determined from the nullP value distribution (see text for details) SNPs highlighted in blue were above the 005 threshold and also detected in Telonis-Scott et al (2012) (B)Standing allele frequency in the control pool plotted against the frequency change in the selected pool for the 648 differentiated SNPs (FDR 005)

3

the largest shifts tended to occur in common intermediatefrequencies (fig 1B)

Inversion Frequencies

To test whether inversions were associated with desiccationresistance allele frequencies were checked at SNPs diagnosticfor the following inversions In(3R)P (Anderson et al 2005Kapun et al 2014) In(2L)t (Andolfatto et al 1999 Kapun et al2014) In(2R)NS In(3L)P In(3R)C In(3R)K and In(3R)Mo(Kapun et al 2014) In(3L)P In(3R)K and In(3R)Mo werenot detected in this population and for all other inversionsinvestigated the control and desiccation-resistant pool fre-quencies did not differ more than 6

Genomic Distribution

We examined these candidates using a comprehensive ap-proach encompassing SNP gene functional ontology andnetwork levels First we examined SNP locations with respectto gene structure and putative effects of SNPs on genomefeatures The 648 SNPs mapped to 382 genes and were largelynoncoding (79 supplementary table S2 SupplementaryMaterial online) Coding variants comprised nearly 15 ofcandidate SNPs largely synonymous substitutions and theremainder nonsynonymous missense variants predicted toresult in amino acid substitution with length preservation(9 and 14 supplementary table S2 SupplementaryMaterial online) Based on proportions of features from allannotated SNPs compared with the candidates intron vari-ants were significantly underrepresented (X2

1 frac14706 Plt 001supplementary table S2 Supplementary Material online)while SNPs were enriched in 50UTRs 50UTR premature startcodon sites and splice regions (X2

1 frac14 820 Plt 001X2

1 frac14 3124 Plt 00001 X21 frac14 640 Plt 005 respectively

supplementary table S2 Supplementary Material online)

GO Functional Enrichment

For the GO analyses we found no overrepresented termsusing the stringent Gowinda gene mode but identified sev-eral categories using SNP mode with a 005 FDR thresholdGO terms were enriched for the broad categories of ldquochro-matinrdquo ldquochromosomerdquo and more specifically ldquodouble-stranded RNA bindingrdquo although significance of this categorywas due solely to nine differentiated SNPs in the DIP1 geneThe term ldquogamma-secretase complexrdquo was also enriched dueto three SNPs in the pen-2 gene

Desiccation Candidate Genes

Chown et al (2011) comprehensively summarized the pri-mary mechanisms underpinning desiccation resistancefrom the critical first phase of stress sensing and behavioralavoidance to the manifold physiological trajectories availableto counter water stress These include water content varia-tion water loss rates (desiccation resistance) and tolerance ofwater loss (dehydration tolerance) We expanded on thisreview to incorporate recent molecular research and summa-rized the primary mechanisms and known candidate genesrelevant to D melanogaster stress sensing and water balance

(table 1 and references therein) Informed by this body ofwork we applied a ldquotop-downrdquo mechanistic approach toscreening desiccation candidate genes in our data (egfrom initial hygrosensing to downstream key physiologicalpathways table 1) We did not observe differentiation inSNPs mapping to the TRP ion channels the key hygrosensingreceptor genes However we did map SNPs to genes involvedin stress sensing specific to the MT fluid secretion signalingpathways including two of the six cAMP and cyclic guanosinemonophosphate (cGMP) signaling phosphodiesterases dncand pde9 as well as klu indicated in NF-kB ortholog signalingin the renal tubules (table 1) A number of the stress-sensingSAPK pathway genes were differentiated (table 1) Key genesinvolved in the insulin signaling pathway implicated in met-abolic homeostasis and water balance were also differenti-ated including the insulin receptor InR and insulin receptorbinding Dok (table 1)

Network Analysis Modeling Genomic Differentiationfor Desiccation in a PPI Context

Given the genomic complexity underpinning desiccation re-sistance we also implemented a higher-order analysis to pro-vide an overview of potential architectures beyond the SNPand gene levels using PPI networks The SNPs mapped to 382genes which were annotated to 307 seed proteins occurring inthe full background D melanogaster PPI network The seedproteins produced a large first-order network with 3324nodes and 51299 edges

To determine if broader functional signatures could beascertained from the complex network we employed func-tional enrichment analyses on the 3324 nodes (table 2 andsupplementary table S3 Supplementary Material online)Strikingly the network was enriched for pathways involvedin gene expression from transcription through RNA process-ing to RNA transport metabolism and decay (table 2)The network analysis also revealed further complexity ofthe stress response with enrichment of proteins involved inregulating innate insect immunity (table 2) where multiplepathways were overrepresented including cytokine interleu-kin and toll-like receptor signaling cascades (table 2)Developmentalgene regulation signaling pathways were sig-nificant in the pathway analysis the KEGG database showedenrichment for Notch and Mitogen-activated protein kinase(MAPK) signaling while insulin and NGF pathways were high-lighted from the Reactome database (table 2)

Common Signatures of Desiccation Resistance acrossStudies

The key finding of this study is the degree of genomic overlapwith our previous mapping of alleles associated with artifi-cially selected desiccation phenotypes (Telonis-Scott et al2012) Although the flies were sampled from comparablesoutheastern Australian locations the study designs were dif-ferent Specifically the earlier study focused on resistancegenerated by artificial selection whereas this study focusedon natural variation in resistance Despite the differences instudy design we identified 45 genes common to both studies

4

Table 1 Summary of Primary Molecular Responses and Candidate Genes for Desiccation Response Mechanisms Identified in the Literature

Mechanism Molecular Function Genes Current Study(F1 wild-caught)

Telonis-Scott et al(2012) (F8 artificial

selection)

Hygrosensing Sensory moisture receptors (antennae legs gut mouthparts)

Temperature-activatedtransient receptor po-tential ionchannelsbcd

TRP ion channels trpl trp Trpg TrpA1 TrpmTrpml pain pyx wtrwnompC iav nan Amo

mdash trpl

Stress sensing Malpighian tubules (principal cells)

Tubule fluid secretion signaling pathways

cAMP signalingefg cAMP activation Dh44-R2 mdash mdash

cGMP signalingefg cGMP activation Capa CapaRcedilNplp1-4 mdash mdash

Receptors Gyc76c CG33958 CG34357 mdash CG34357

Kinases dg1-2 mdash mdash

cAMP + cGMPsignalingefgh

Hydrolyzing phosphodiesterases dnc pde1c pde6 pde9 pde11 dnc pde9 dnc pde1c pde9 pde11

Calcium signalingeg Calcium-sensitive nitric oxidesynthase

Nos mdash Nos

Desiccation specific Diuretic neuropeptide NFkBsignalingijk

Capa CapaR trpl Relish klu klu klu trpl

Anti-diureticneuropeptidel ITP sNPF Tk mdash ITP sNPF

Stress sensing Stress responsive pathway

Stress activatedprotein kinasepathwaymn

MAPKJun kinase

Aop Aplip1 Atf-2 bsk Btk29Acbt Cdc 42 Cka cno CYLDDok egr Gadd45 hep hppyJra kay medo Mkk4 msnMtl Pak puc pyd Rac 1-2Rho1 shark slpr Src42ASrc64B Tak1 Traf6

Aop Dokhepcedilkaymsn Mtlpuc Rac1Src64B

Mtl slpr

Metabolic homeostasis and water balance

Components of insu-lin signaling pathway

Receptorop InR InR mdash

Receptor binding chico IIp1-8 dock Dok poly Dok Poly

Resistance mechanisms Water loss barriers

Cuticularhydrocarbons

Fatty acid synthasesq desatura-sesr transporters oxidativedecarbonylaset

CG3523 CG3524 desat1-2 FatPCyp4g1

CG3523 mdash

Primary hemolymph sugartissue-protectant

Trehalose metabolism Trehalose-phosphatasebu treh-lase activitybu phosphoglyc-erate mutase

Tps1 Treh Pgm CG5171CG5177 CG6262 crc

Treh Pgm mdash

NOTEmdashCandidate genes differentiated in the current study and in Telonis-Scott et al (2012) are shown and are underlined where overlappingaFor brevity several comprehensive reviews are cited and readers are referred to references therein Note that water budgeting via discontinuous gas exchange was not includeddue to a lack of ldquobona fiderdquo molecular candidates and desiccation tolerance due to aquaporins was omitted as genes were not differentiated in either of our studiesbChown et al (2011)cJohnson and Carder (2012)dFowler and Montell (2013)eDay et al (2005)fDavies et al (2012)gDavies et al (2013)hDavies et al (2014)iTerhzaz et al (2012)jTerhzaz et al (2014)kTerhzaz et al (2015)lKahsai et al (2010)mHuang and Tunnacliffe (2004)nHuang and Tunnacliffe (2005)oSeurooderberg et al (2011)pLiu et al (2015)qChung et al (2014)rSinclair et al (2007)rFoley and Telonis-Scott (2011)tQiu et al (2012)uThorat et al (2012)

5

Conserved Genomic Signatures of Drosophila Desiccation Resistance doi101093molbevmsv349 MBE by guest on February 6 2016

httpmbeoxfordjournalsorg

Dow

nloaded from

out of the 382 and 416 genes for the current and 2012 studyrespectively (Fisherrsquos exact test for overlap P lt 2 1016odds ratio frac14 590 supplementary table S4 SupplementaryMaterial online) Bearing gene length bias in mind we exam-ined the average length of genes in both studies as well as thegene overlap to the average gene length in the Drosophilagenome (supplementary fig S3 Supplementary Materialonline) Although there was bias toward longer genes in theoverlap list this was also apparent in the full gene lists fromboth studies

The overlap genes mapped to all of the major chromo-somes and their differentiated variants were mostly noncod-ing SNPs located in introns (supplementary table S4Supplementary Material online) Two SNPs were synonymous(Strn-Mlick coding for a protein kinase and Cht7 chitinbinding supplementary table S4 Supplementary Materialonline) while a 30UTR variant was predicted to putativelyalter protein structure in a nondisruptive manner mostlikely impacting expression andor protein translational effi-ciency (jbug response to stimulus supplementary table S4Supplementary Material online) One nonsynonymous SNPwas predicted to alter protein function as a missense variant(fab1 signaling supplementary table S4 SupplementaryMaterial online) The increase in frequency of the favoreddesiccation alleles compared with the controls ranged from5 to 27 with a median increase of 164

Despite the strong overlap there were important differ-ences between the two studies For example our previouswork revealed evidence of a selective sweep in the 50 pro-moter region of the Dys gene in the artificially selected linesand we confirmed a higher frequency of the selected SNPs inan independent natural association study in resistant fliesfrom Coffs Harbour (Telonis-Scott et al 2012) In this studywe found no differentiation at the Dys locus between ourresistant and random controls However both resistant andcontrol pools had a high frequency of the selected allele de-tected in Telonis-Scott et al (2012) (supplementary table S5Supplementary Material online)

The 45 genes common to both studies were analyzed forbiological function while many genes are obviously pleiotro-pic (ie assigned multiple GO terms) almost half were in-volved in response to stimuli (cellular behavioral andregulatory) including signaling such as Rho protein signaltransduction and cAMP-mediated signaling (table 3) Othercategories of interest include spiracle morphogenesis sensoryorgan development including development of the compoundeye and stem cell and neuroblast differentiation (table 3)

With regard to specific stress responsedesiccation candi-dates (table 1) 4 of the 45 genes overlapped including thecAMPcGMP signaling phosphodiesterases pde9 and dncMAPK pathway genes Mtl and klu indicated in NF-kBorthologue signaling in the renal tubules (table 1)Furthermore there was overlap among mechanistic catego-ries where different genes in the same pathways weredetected For example additional key cAMPcGMP hydrolyz-ing phosphodiesterases pde1c and pde11 were significant inTelonis-Scott et al (2012) as well as slpr (MAPK pathway)and poly (insulin signaling) (table 1)

Table 2 Pathway Enrichment of the GWAS and Telonis-Scott et alFirst-Order PPI Networks Investigated Using FlyMine (FDR P lt 005)

Current Study GWAS Telonis-Scott et al(2012)

Database Pathway Pathway

KEGG Ribosome RibosomeNotch signaling pathway Notch signaling pathwayMAPK signaling pathwayDorso-ventral axis formationProgesterone-mediated oocyte

maturationWnt signaling pathwayEndocytosisProteasome

Reactome Immune system Immune systemAdaptive immune system Adaptive immune systemInnate immune system Innate immune systemSignaling by interleukins Signaling by interleukins(Toll-like receptor cascades) Toll-like receptor

cascadesTCR signaling(B cell receptor signaling)

Gene expression Gene expressionMetabolism of RNA Metabolism of RNANonsense-mediated decay Nonsense-mediated decayTranscriptionmRNA processing(RNA transport)Metabolism of proteins (Metabolism of

proteins)Translation (Translation)Cell cycle Cell cycle

Cell cycle checkpointsRegulation of mitotic cell

cycleMitotic metaphase and

anaphaseG0 and early G1 G0 and early G1

G1 phase(G1S transition)G2M transition(DNA replicationS

phase)DNA repairDouble-strand break

repairSignaling pathways Signaling pathwaysInsulin signaling pathwaySignaling by NGF Signaling by NGF

Signaling by FGFRSignaling by NOTCHSignaling by WntSignaling by EGFRDevelopmentAxon guidanceMembrane traffickingGap junction trafficking

and regulationProgramed cell deathApoptosisHemostasisHemostasis

NOTEmdashKEGG and Reactome pathway databases were queried All significant termsare reported with Reactome terms grouped together under parent terms in italictext (full list of terms available in supplementary table S3 Supplementary Materialonline) Parent terms without brackets are the names of pathways that were sig-nificantly enriched parent terms in parentheses were not listed as significantlyenriched themselves but are reported as a guide to the category contents

6

In addition to the significant gene-level overlap betweenthe two studies we observed significant overlap between thePPI networks constructed from each set of candidate genes(summarized in fig 2) We examined the overlap by compar-ing the first-order networks 1) at the level of node overlap and2) at the level of edge (interaction) overlap Three hundredseven GWAS seeds formed a network of 3324 nodes whileour previous set of candidates mapped to 337 seeds forming anetwork that comprised 3215 nodes One thousand eighthundred twenty-six or ~55 of the nodes overlapped be-tween networks which was significantly higher than the nulldistribution estimated from 1000 simulations (mean simu-lated overlap SD 1304 143 nodes observed overlap atthe 999th percentile fig 2A and C) The overlap at the inter-action level was not significantly higher than expected (meansimulated overlap SD 25704 4024 edges observed over-lap 30686 edges 89th percentile fig 2B and D) Hive plotswere used to visualize the separate networks (supplementaryfig S4A and B Supplementary Material online) as well as theoverlap between the networks (supplementary fig S4CSupplementary Material online)

Given the high degree of shared nodes from distinct setsof protein seeds we compared the networks for biologicalfunction (fig 2E) Both networks produced long lists of sig-nificantly enriched GO terms (supplementary table S3Supplementary Material online) and assessing overlap was

difficult due to the nonindependent nature of GO hierar-chies However a comparison of functional pathway enrich-ment revealed some differences Although both networksshowed enrichment in the immune system pathways theribosome and Notch and NGF signaling pathways as well assome involvement of gene expression and translation path-ways the network constructed from the Telonis-Scott et al(2012) gene list showed extensive involvement of cell cyclepathway proteins as well as significant enrichment for Wntsignaling EGRF signaling DNA double-strand break repairaxon guidance gap junction trafficking and apoptosis path-ways which were not enriched in the GWAS network TheGWAS network was specifically enriched for dorso-ventralaxis formation oocyte maturation and many more geneexpression pathways than the Telonis-Scott et al (2012)gene network (table 2 and supplementary table S3Supplementary Material online)

Discussion

Genomic Signature of Desiccation Resistance fromNatural Standing Genetic Variation

We report the first large-scale genome-wide screen for nat-ural alleles associated with survival of low humidity condi-tions in D melanogaster By combining a powerful naturalassociation study with high-throughput Pool-seq we

Table 3 Overrepresentation of GO Categories for the 45 Core Candidate Genes Overlapping between the Current and Our Previous(Telonis-Scott et al 2012) Genomic Study

Term ID TermSubterms FDR P valuea Genes

Cellularbehavioral response to stimulus defense and signaling

GO0050896 Response to stimulus (cellular response tostimulus regulation of response tostimulus)

0007 Abd-B dnc fz mam nkd slo ct fru kluRbf klg santa-maria jbug fab1CG33275 Mtl Trim9 cv-c Gprk2 Ect4CG43658

GO0048585 Negative regulation of response to stimu-lus (Rho protein signal transductionsignal transduction)

0042 dnc mam slo fru Trim9 cv-c

GO0023052 Signaling (cAMP-mediated signaling cellcommunication)

0031 Amph Pde9 dnc fz mam nkd ct kluRbf santa-maria fab1 CG33275 MtlTrim9 cv-c Gprk2 Ect4 CG43658

Development

GO0048863 Stem cell differentiation 0014 Abd-D mam nkd klu nub

GO0014016 Neuroblast differentiation 0048

GO0045165 Cell fate commitment 0010 Abd-D dnc fz mam nkd ct klu klg nub

GO0035277 Spiracle morphogenesis open trachealsystem

0010 Abd-D ct cv-c

GO0007423 Sensory organ development (eye wingdisc imaginal disc)

0008 fz mam ct klu klg sdk Amph Nckx30cMtl

GO0048736 Appendage development 0020 fz mam ct fru CG33275 Mtl Fili nubcv-c Gprk2 CG43658

GO0007552 Metamorphosis 0001 fz mam ct fru CG33275 Mtl Fili cv-cGprk2 CG43658

Cellular component organization

GO0030030 Cell projection organization 0020 dnc fz ct fru GluClalpha Nckx30c MtlTrim9 nub cv-c

NOTEmdashNote that most genes are assigned to multiple annotations and therefore are shown purely as a guide to putative functionaBonferronindashHochberg FDR and gene length corrected P values using the Flymine v400 database Gene Ontology Enrichment The terms were condensed and trimmed usingREVIGO

7

mapped SNPs associated with desiccation resistance in 500naturally derived genotypes Although Pool-seq has limitedpower to reveal very low frequency alleles to reconstructphase or adequately estimate allele frequencies in smallpools (Lynch et al 2014) the Pool-GWAS approach hasbeen validated by retrieval of known candidate genes in-volved in body pigmentation using similar numbers of ge-notypes as assessed here (Bastide et al 2013)

Although studies in controlled genetic backgrounds reportlarge effects of single genes on desiccation resistance (Table 1)our current and previous screens revealed similarly largenumbers of variants (over 600) mapping to over 300 genesA polygenic basis of desiccation resistance is consistent withquantitative genetic data for the trait (Foley and Telonis-Scott2011) and we also anticipate more complexity from naturalD melanogaster populations with greater standing geneticvariation than inbred laboratory strains However somefalse positive associations are likely in the GWAS due to thelimitations of quantitative trait mapping approaches Pooledevolve-and-resequence experiments routinely yield vast num-bers of candidate SNPs partly due to high levels of standinglinkage disequilibrium (LD) and hitchhiking neutral alleles

(Schleurootterer et al 2015) Large population sizes from specieswith low levels of LD such as D melanogaster somewhat cir-cumvent this issue and a notable feature of our experimentaldesign is our sampling Here we selected the most desicca-tion-resistant phenotypes directly from substantial standinggenetic variation where one generation of recombinationshould largely reflect natural haplotypes

We observed the largest candidate allele frequency differ-ences in loci that had intermediate allele frequencies in thecontrol pool This is in line with population genetics modelswhich predict that evolution from standing genetic variationwill initially be strongest for intermediate frequency alleleswhich can facilitate rapid change (Long et al 2015) Howeverwe also observed a number of low frequency alleles which ifbeneficial can inflate false positives due to long-range LD(Orozco-terWengel et al 2012 Nuzhdin and Turner 2013Tobler et al 2014) Nonetheless spurious LD associationscannot fully account for our candidate list as evidenced bythe desiccation functional candidates and cross-study over-lap Furthermore chromosome inversion dynamics can alsogenerate both short and long-range LD and contribute nu-merous false positive SNP phenotype associations (Hoffmann

Fig 2 Observed versus simulated network overlap between the GWAS first-order PPI network and the first-order PPI network from the Telonis-Scottet al (2012) gene list Overlap is presented in terms of (A) node number and (B) edge number Networks were simulated by resampling from the entireDrosophila melanogaster gene list (r553) as described in the text The histogram shows the distribution of overlap measures for 1000 simulations Theblack line represents the histogram as density and the blue line shows the corresponding normal distribution The red vertical line shows the observedoverlap from the real networks labeled as a percentile of the normal distribution Area-proportional Venn diagrams summarized the extent of overlapfor PPI networks constructed separately from the GWAS candidates and Telonis-Scott et al (2012) candidates (labeled ldquoprevious studyrdquo) (C) Overlapexpressed as the number of nodes Approximately 55 of the nodes overlapped between the two studies which was significantly higher than simulatedexpectations (999 percentile 2B) (D) Overlap expressed in the number of interactions This was not significantly higher than expected by simulation(89th percentile 2B) (E) Overlap at the level of GO biological process terms (directly compared GO terms by name this was not tested statisticallybecause of the complex hierarchical nature of GO terms and is presented for interest)

8

and Weeks 2007 Tobler et al 2014) but we found little as-sociation between the desiccation pool and inversion fre-quency changes consistent with the lack of latitudinalpatterns in desiccation resistanceand traitinversion fre-quency associations in east Australian D melanogaster(Hoffmann et al 2001 Weeks et al 2002)

Cross-Study Comparison Reveals a Core Set ofCandidates for a Complex Trait

Our data contain many candidate genes functions and path-ways for insect desiccation resistance but partitioning genu-ine adaptive loci from experimental noise remains asignificant challenge We therefore focus our efforts on ournovel comparative analysis where we identified 45 robustcandidates for further exploration Common genetic signa-tures are rarely documented in cross-study comparisons ofcomplex traits (Sarup et al 2011 Huang et al 2012 but cfVermeulen et al 2013) and never before for desiccation re-sistance Some degree of overlap between our studies likelyresults from a shared genetic background from similar sam-pling sites highlighting the replicability of these responsesOne limitation of this approach is the overpopulation ofthe ldquocorerdquo list by longer genes This result reflects the genelength bias inherent to GWAS approaches an issue that is notstraightforward to correct in a GWAS let alone in the com-parative framework applied here A focus on the commongenes functionally linked to the desiccation phenotype canprovide a meaningful framework for future research and alsosuggests that some large-effect candidate genes exhibit natu-ral variation contributing to the phenotype (table 1)

Other feasible candidates include genes expressed inthe MTs (Wang et al 2004) or essential for MT develop-ment (St Pierre et al 2014) providing further researchopportunity focused on the primary stress sensing andfluid secretion tissues Genes include one of the mosthighly expressed MT transcripts CG7084 as well as dncNckx30C slo cv-c and nkd Other aspects of stress resis-tance are also implicated CrebB for example is involvedin regulating insulin-regulated stress resistance and ther-mosensory behavior (Wang et al 2008) nkd also impactslarval cuticle development and shares a PDZ signalingdomain also present in the desiccation candidate desi(Kawano et al 2010 St Pierre et al 2014) Finally severalcore candidates were consistently reported in recent stud-ies suggesting generalized roles in stress responses andfitness including transcriptome studies on MT function(Wang et al 2004) inbreeding and cold resistance(Vermeulen et al 2013) oxidative stress (Weber et al2012) and genomic studies exploring age-specific SNPson fitness traits (Durham et al 2014) and genes likelyunder spatial selection in flies from climatically divergenthabitats (Fabian et al 2012)

Several of the enriched GO categories from the core listalso make biological sense including stimulus response(almost half the suite) defense and signaling particularlyin pathways involved in MT stress sensing (Davies et al 20122013 2014) Functional categories enriched for sensory

organ development were highlighted consistent with envi-ronmental sensing categories from desert Drosophila tran-scriptomics (Matzkin and Markow 2009 Rajpurohit et al2013) and stress detection via phototransduction pathwaysin D melanogaster under a range of stressors including des-iccation (Soslashrensen et al 2007)

Evidence for Conserved Signatures at Higher LevelOrganization

Beyond the gene level we observed a striking degree ofsimilarity between this study and that of Telonis-Scottet al (2012) at the network level where significantly over-lapping protein networks were constructed from largely dif-ferent seed protein sets This supports a scenario where theldquobiological information flow from DNA to phenotyperdquo(Civelek and Lusis 2014) contains inherent redundancywhere alternative genetic solutions underlie phenotypesand functions Resistance to perturbation by genetic or en-vironmental variation appears to increase with increasinghierarchical complexity that is phenotype ldquobufferingrdquo (Fuet al 2009) The same functional network in different pop-ulationsgenetic backgrounds can be impacted but theexact genes and SNPs responding to perturbation will notnecessarily overlap In Drosophila different sets of loci fromthe same founders mapped to the same PPI network in thecase of two complex traits chill coma recovery and startleresponse (Huang et al 2012) Furthermore interspecificshared networkspathways from different genes have beendocumented for numerous complex traits (Emilsson et al2008 Aytes et al 2014 Ichihashi et al 2014) providing op-portunities for investigating taxa where individual genefunction is poorly understood

Although the null distribution is not always obviouswhen comparing networks and their functions partly be-cause of reduced sensitivity from missing interactions(Leiserson et al 2013) here the shared network signaturesportray a plausible multifaceted stress response involvingstress sensing rapid gene accessibility and expression Thepredominance of immune system and stress responsepathways are of interest with reference immunitystresspathway ldquocross-talkrdquo particularly in the MTs whereabiotic stressors elicit strong immunity transcriptionalresponses (Davies et al 2012) The higher order architec-tures reflect crucial aspects of rapid gene expression con-trol in response to stress from chromatin organization(Alexander and Lomvardas 2014) to transcription RNAdecay and translation (reviwed in de Nadal et al 2011) InDrosophila variation in transcriptional regulation of indi-vidual genes can underpin divergent desiccation pheno-types (Chung et al 2014) or be functionally inferred frompatterns of many genes elicited during desiccation stress(Rajpurohit et al 2013) Variants with potential to mod-ulate expression at different stages of gene expressioncontrol were evident in our GWAS data alone for exam-ple enrichment for 50UTR and splice variants at the SNPlevel and for chromatin and chromosome structure termsat the functional level

9

Genetic Overlap Versus Genetic differencesmdashCausesand Implications

Despite the common signatures at the gene and higher orderlevels between our studies pervasive differences were ob-served These can arise from differences in standing geneticvariation in the natural populations where different alleliccompositions impact gene-by-environment interactions andepistasis Our network overlap analyses highlight the potentialfor different molecular trajectories to result in similar func-tions (Leiserson et al 2013) consistent with the multiplephysiological pathways potentially available to D melanoga-ster under water stress (Chown et al 2011) Experimentaldesign also presumably contributed to the study differencesStrong directional selection as applied in the 2012 study canincrease the frequency of rare beneficial alleles and likely re-duced overall diversity than did the single-generation GWASdesign Furthermore selection regimes often impose addi-tional selective pressures (eg food deprivation during desic-cation) resulting in correlated responses such as starvationresistance with desiccation resistance and these factors differfrom the laboratory to the field Finally evidence suggests innatural D melanogaster that standing genetic variation canvary extensively with sampling season (Itoh et al 2010Bergland et al 2014) Although we cannot compare levelsof standing variation between our two studies evidence sug-gests that this may have affected the Dys gene where ourGWAS approach revealed no differentiation between thecontrol and resistant pools at this locus but rather detectedthe alleles associated with the selective sweep in Telonis-Scottet al (2012) in both pools at high frequency Whether thisresulted from sampling the later collection after a longdrought (2004ndash2008 wwwbomgovauclimate last accessedJanuary 13 2016) is unclear but this observation highlightsthe relevance of temporal sampling to standing genetic var-iation when attempting to link complex phenotypes togenotypes

Materials and Methods

Natural Population Sampling

Drosophila melanogaster was collected from vineyard waste atKilchorn Winery Romsey Australia in April 2010 Over 1200isofemale lines were established in the laboratory from wild-caught females Species identification was conducted on maleF1 to remove Drosophila simulans contamination and over900 D melanogaster isofemale lines were retained All flieswere maintained in vials of dextrose cornmeal media at25 C and constant light

Desiccation Assay

From each isofemale line ten inseminated F1 females weresorted by aspiration without CO2 and held in vials until 4ndash5days old For the desiccation assay 10 females per line (over9000 flies in total) were screened by placing groups of femalesinto empty vials covered with gauze and randomly assigninglines into 5 large sealed glass tanks containing trays of silicadesiccant (relative humidity [RH] lt10) The tanks werescored for mortality hourly and the final 558 surviving

individuals were selected for the desiccation-resistant pool(558 flies lt6 of the total) The control pool constitutedthe same number of females sampled from over 200 ran-domly chosen isofemale lines that were frozen prior to thedesiccation assay We chose this random control approachinstead of comparing the phenotypic extremes (ie the latemortality tail vs the ldquoearly mortalityrdquo tail) because unfavor-able environmental conditions experienced by the wild-caught dam may inflate environmental variance due to car-ryover effects in the F1 offspring (Schiffer et al 2013)

DNA Extraction and Sequencing

We used pooled whole-genome sequencing (Pool-seq)(Futschik and Schleurootterer 2010) to estimate and compareallele frequency differences between the most desiccation-resistant and randomly sampled flies from a wild populationto associate naturally segregating candidate SNPs with theresponse to low humidity For each treatment (resistantand control) genomic DNA was extracted from 50 femaleheads using the Qiagen DNeasy Blood and Tissue Kit accord-ing to the manufacturerrsquos instructions (Qiagen HildenGermany) Two extractions were combined to create fivepools per treatment which were barcoded separately tocreate technical replicates (total 10 pools of 100 fliesn frac14 1000) The DNA was fragmented using a CovarisS2 ma-chine (Covaris Inc Woburn MA) and libraries were pre-pared from 1 mg genomic DNA (per pool of 100 flies) usingthe Illumina TruSeq Prep module (Illumina San Diego CA)To minimize the effect of variation across lanes the ten poolswere combined in equal concentration (fig 3) and run mul-tiplexed in each lane of a full flow cell of an Illumina HiSeq2000 sequencer Clusters were generated using the TruSeq PECluster Kit v5 on a cBot and sequenced using Illumina TruSeqSBS v3 chemistry (Illumina) Fragmentation library construc-tion and sequencing were performed at the Micromon Next-Gen Sequencing facility (Monash University ClaytonAustralia)

Fig 3 Experimental design See text for further explanation

10

Data Processing and Mapping

Processing was performed on a high-performance computingcluster using the Rubra pipeline system that makes use of theRuffus Python library (Goodstadt 2010) The final variant-calling pipeline was based on a pipeline developed by ClareSloggett (Sloggett et al 2014) and is publicly available athttpsgithubcomgriffinpGWAS_pipeline (last accessedJanuary 13 2016) Raw sequence reads were processed persample per lane with P1 and P2 read files processed sepa-rately initially Trimming was performed with trimmomatic v030 (Lohse et al 2012) Adapter sequences were also re-moved Leading and trailing bases with a quality scorebelow 30 were trimmed and a sliding window (width frac14 10quality threshold frac14 25) was used Reads shorter than 40 bpafter trimming were discarded Quality before and after trim-ming was examined using FastQC v 0101 (Andrews 2014)

Posttrimming reads that remained paired were used asinput into the alignment step Alignment to the D melano-gaster reference genome version r553 was performed withbwa v 062 (Li and Durbin 2009) using the bwa aln commandand the following options No seeding (-l 150) 1 rate ofmissing alignments given 2 uniform base error rate (-n001) a maximum of 2 gap opens per read (-o 2) a maximumof 12 gap extensions per read (-e 12) and long deletionswithin 12 bp of 30 end disallowed (-d 12) The default valueswere used for all other options Output files were then con-verted to sorted bam files using the bwa sampe commandand the ldquoSortSamrdquo option in Picard v 196

At this point the bam files were merged into one file persample using PicardMerge Duplicates were identified withPicard MarkDuplicates The tool RealignerTargetCreator inGATK v 26-5 (DePristo et al 2011) was applied to the tenbam files simultaneously producing a list of intervals contain-ing probable indels around which local realignment would beperformed This was used as input for the IndelRealigner toolwhich was also applied to all ten bam files simultaneously

To investigate the variation among technical replicatesallele frequency was estimated based on read counts foreach technical replicate separately for 100000 randomly cho-sen SNPs across the genome After excluding candidate des-iccation-resistance SNPs and SNPs that may have been falsepositives due to sequencing error (those with mean frequencylt004 across replicates) 1803 SNPs remained For each locuswe calculated the variance in allele frequency estimate amongthe five control replicates and the variance among the fivedesiccation-resistance replicates We compared this with thevariance due to pool category (control vs desiccation resis-tant) using analysis of variance (ANOVA) We also calculatedthe mean pairwise difference in allele frequency estimateamong control and among desiccation-resistance replicatesand the mean concordance correlation coefficient was calcu-lated over all pairwise comparisons within pool category(using the epiccc function in the epiR package Stevensonet al 2015)

The following approach was then taken to identify SNPsdiffering between the desiccation-resistant and controlgroups Based on the technical replicate analysis we merged

the five ldquodesiccation-resistantrdquo and the five random controlsamples into two bam files A pileup file was created withsamtools v 0119 (Li et al 2009) and converted to a ldquosyncrdquo fileusing the mpileup2sync script in Popoolation2 v 1201 (Kofleret al 2011) retaining reads with a mapping quality of at least20 and bases with a minimum quality score of 20 Reads withunmapped mates were also retained (option ndashA in samtoolsmpileup) For this step we used a repeat-masked version ofthe reference genome created with RepeatMasker 403 (Smitet al 2013-2015) and a repeat file containing all annotatedtransposons from D melanogaster including shared ancestralsequences (Flybase release r557) plus all repetitive elementsfrom RepBase release v1904 for D melanogaster Simple re-peats small RNA genes and bacterial insertions were notmasked (options ndashnolow ndashnorna ndashno_is) The rmblastsearch engine was used

Association Analysis

We then performed Fisherrsquos exact test with the Fisher testscript in Popoolation2 (Kofler et al 2011) We set the min-imum count (of the minor allele) frac14 10 minimum coverage(in both populations) frac14 20 and maximum coverage frac1410000 The test was performed for each SNP (slidingwindow off) Although there are numerous approaches tocalculating genome-wide significance thresholds in aPoolseq framework currently there is no consensus on anappropriate standardized statistical method Methods in-clude simulation approaches (Orozco-terWengel et al2012 Bastide et al 2013 Burke et al 2014) correctionusing a null distribution based on estimating genetic driftfrom replicated artificial selection line allele frequency vari-ance (Turner and Miller 2012) use of a simple thresholdbased on a minimum allele frequency difference (Turneret al 2010) and the FDR correction suggested by Storeyand Tibshirani (2003) (used by Fabian et al 2012)

We determined based on our design to calculate the FDRthreshold by comparison with a simulated distribution of nullP values following the approach of Bastide et al (2013) Foreach SNP locus a P value was calculated by simulating thedistribution of read counts between the major and minorallele according to a beta-binomial distribution with meana(a+b) and variance (ab)((a+b)2(a+b+1)) Thismodel allowed for two types of sampling variationStochastic variation in allele sampling across the phenotypesand the background variation in the representation of allelesin each pool caused by library preparation artifacts or sto-chastic variation driven by unequal coverage of the poolsThese sources of variation have also been recognized else-where as being important in interpreting pooled sequenceallele calls and identifying significant differentiation betweenpooled samples (Lynch et al 2014) The value of afrac14 37 waschosen by chi-square test as the best match to the observedP value distribution (afrac14 10 to afrac14 40 were tested) with acorresponding bfrac14 4315 to reflect the coverage depth differ-ence between the random control (487) and desiccation-resistant (568) pools These values were then used tosimulate a null data set 10 the size of the observed data

11

set Because the null P value distribution still did not fit theobserved P value distribution particularly well (supplemen-tary fig S2 Supplementary Material online) these values werenot used for a standard FDR correction Instead a P valuethreshold was calculated based on the lowest 005 of thenull P values similar in approach to Orozco-terWengel et al(2012)

Inversion Analysis

In D melanogaster several cosmopolitan inversion polymor-phisms show cross-continent latitudinal patterns that explaina substantial proportion of clinal variation for many traitsincluding thermal tolerance (reviewed in Hoffmann andWeeks 2007) Patterns of disequilibria generated betweenloci in the vicinity of the rearrangement can obscure signalsof selection (Hoffmann and Weeks 2007) and inversion fre-quencies are routinely considered in genomic studies ofnatural populations sampled from known latitudinal clines(Fabian et al 2012 Kapun et al 2014) We assessedassociations between inversion frequencies and our controland resistant pools at SNPs diagnostic for the following in-versions In3R(Payne) (Anderson et al 2005 Kapun et al2014) In(2L)t (Andolfatto et al 1999 Kapun et al 2014)In(2R)Ns In(3L)P In(3R)C In(3R)K and In(3R)Mo (Kapunet al 2014) Between one and five diagnostic SNPs were in-vestigated for each inversion (depending on availability) Weconsidered an inversion to be associated with desiccationresistance if the frequency of its diagnostic SNP differed sig-nificantly between the control and desiccation-resistant poolsby a Fisherrsquos exact test

Genome Feature Annotation

The list of candidate SNPs was converted to vcf format using acustom R script setting the alternate allele as the allele thatincreased in frequency from the control to the desiccation-resistant pool and setting the reference allele as the otherallele present in cases where both the major and minor allelesdiffered from the reference genome The genomic features inwhich the candidate SNPs occurred were annotated usingsnpEff v40 first creating a database from the D melanogasterv553 genome annotation with the default approach(Cingolani et al 2012) The default annotation settings wereused except for setting the parameter ldquo-ud 0rdquo to annotateSNPs to genes only if they were within the gene body Using2 tests on SNP counts we tested for over- or under-enrichment of feature types by calculating the proportionof lsquoallrsquo features from all SNPs detected in the study and theproportion of ldquocandidaterdquo SNPs identified as differentiated indesiccation-resistant flies (Fabian et al 2012)

GO Analysis

GO analyses was used to associate the candidates with theirbiological functions (Ashburner et al 2000) Previously devel-oped for biological pathway analysis of global gene expressionstudies (Wang et al 2010) typical GO analysis assumes that allgenes are sampled independently with equal probabilityGWAS violate these criteria where SNPs are more likely to

occur in longer genes than in shorter genes resulting in highertype I error rates in GO categories overpopulated with longgenes Gene length as well as overlapping gene biases (Wanget al 2011) were accounted for using Gowinda (Kofler andSchleurootterer 2012) which implements a permutation strategyto reduce false positives Gowinda does not reconstruct LDbetween SNPs but operates in two modes underpinned bytwo extreme assumptions 1) Gene mode (assumes all SNPs ina gene are in LD) and 2) SNP mode (assumes all SNPs in agene are completely independent)

The candidates were examined in both modes but giventhe stringency of gene mode high-resolution SNP modeoutput was utilized for the final analysis with the followingparameters 106 simulations minimum significance frac141 andminimum genes per GO categoryfrac14 3 (to exclude gene-poorcategories) P values were corrected using the Gowinda FDRalgorithm GO annotations from Flybase r557 were applied

Candidate Gene Analysis

In conjunction with the GO analysis we also examinedknown candidate genes associated with survival under lowRH to better characterize possible biological functions of ourGWAS candidates We took a conservative approach to can-didate gene assignment where the SNPs were mapped withinthe entire gene region without up- and downstream exten-sions Genes were curated from FlyBase and the literature(table 1) predominantly using detailed functional experimen-tal evidence as criteria for functional desiccation candidates

Network Analysis

We next obtained a higher level summary of the candidategene list using PPI network analysis The full candidate genelist was used to construct a first-order interaction networkusing all Drosophila PPIs listed in the Drosophila InteractionDatabase v 2014_10 (avoiding interologs) (Murali et al 2011)This ldquofullrdquo PPI network includes manually curated data frompublished literature and experimental data for 9633 proteinsand 93799 interactions For the network construction theigraph R package was used to build a subgraph from the fullPPI network containing the candidate proteins and their first-order interacting proteins (Csardi and Nepusz 2006) Self-connections were removed Functional enrichment analysiswas conducted on the list of network genes using FlyMinev400 (wwwflymineorg last accessed January 13 2016) withthe following parameters for GO enrichment and KEGGReactome pathway enrichment BenjaminindashHochberg FDRcorrection P lt 005 background frac14 all D melanogastergenes in our full PPI network and Ontology (for GO enrich-ment only) frac14 biological process or molecular function

Cross-Study Comparison Overlap with Our PreviousWork

Following on from the comparison between our gene list andthe candidate genes suggested from the literature we quan-titatively evaluated the degree of overlap between the candi-date genes identified in this study with alleles mapped in ourprevious study of flies collected from the same geographic

12

region (Telonis-Scott et al 2012) The ldquooverlaprdquo analysis wasperformed similarly for the GWAS candidate gene and net-work-level approaches First the gene list overlap was testedbetween the two studies using Fisherrsquos exact test where the382 genes identified from the candidate SNPs (this study)were compared with 416 genes identified from candidatesingle feature polymorphisms (Telonis-Scott et al 2012)The contingency table was constructed using the numberof annotated genes in the D melanogaster R553 genomebuilt as an approximation for the total number of genestested in each study and took the following form

where A frac14 gene list from Telonis-Scott et al (2012)n frac14 416 B frac14 gene list from this study n frac14 382 N frac14 totalnumber of genes in D melanogaster reference genome r553n frac14 17106

To annotate genome features in the overlap candidategene list we used the SNP data (this study) which betterresolves alleles than array-based genotyping Due to this lim-itation we did not compare exact genomic coordinates be-tween the studies but considered overlap to occur whendifferentiated alleles were detected in the same gene inboth studies We did however examine the allele frequenciesaround the Dys-RF promoter which was sequenced sepa-rately revealing a selective sweep in the experimental evolu-tion lines (Telonis-Scott et al 2012)

The overlapping candidates were tested for functional en-richment using a basic gene list approach As genes ratherthan SNPs were analyzed here Fly Mine v400 (wwwflymineorg) was applied with the following parameters Gene lengthnormalization BenjaminindashHochberg FDR correction Plt 005Ontology frac14 biological process and default background (allGO-annotated D melanogaster genes)

Finally we constructed a second PPI network from thecandidate genes mapped in Telonis-Scott et al (2012) to ex-amine system-level overlap between the two studies The 416genes mapped to 337 protein seeds and the network wasconstructed as described above The degree of network over-lap between the two studies was tested using simulation Ineach iteration a gene set of the same length as the list fromthis study and a gene set of the same length as the list fromTelonis-Scott et al (2012) were resampled randomly from theentire D melanogaster mapped gene annotation (r553) Eachresampled set was then used to build a first-order network aspreviously described The overlap network between the tworesampled sets was calculated using the graphintersectioncommand from the igraph package and zero-degree nodes(proteins with no interactions) were removed Overlap wasquantified by counting the number of nodes and the numberof edges in this overlap network The observed overlap wascompared with a simulated null distribution for both nodeand edge number where n frac14 1000 simulation iterations

Supplementary MaterialSupplementary figures S1ndashS4 and tables S1ndashS5 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

We thank Robert Good for the fly sampling and JenniferShirriffs and Lea and Alan Rako for outstanding assistancewith laboratory culture and substantial desiccation assaysWe are grateful to Melissa Davis for advice on network build-ing and comparison approaches and to five anonymous re-viewers whose comments improved the manuscript Thiswork was supported by a DECRA fellowship DE120102575and Monash University Technology voucher scheme toMTS an ARC Future Fellowship and Discovery Grants toCMS an ARC Laureate Fellowship to AAH and a Scienceand Industry Endowment Fund grant to CMS and AAH

ReferencesAlexander JM Lomvardas S 2014 Nuclear architecture as an epigenetic

regulator of neural development and function Neuroscience26439ndash50

Anderson AR Hoffmann AA McKechnie SW Umina PA Weeks AR2005 The latitudinal cline in the In(3R)Payne inversion polymor-phism has shifted in the last 20 years in Australian Drosophilamelanogaster populations Mol Ecol 14851ndash858

Andolfatto P Wall JD Kreitman M 1999 Unusual haplotype structureat the proximal breakpoint of In(2L)t in a natural population ofDrosophila melanogaster Genetics 1531297ndash1311

Andrews S 2014 FastQC Available from httpwwwbioinformaticsbbsrcacukprojectsfastqc

Ashburner M Ball CA Blake JA Botstein D Butler H Cherry JM DavisAP Dolinski K Dwight SS Eppig JT et al 2000 Gene ontology toolfor the unification of biology The Gene Ontology Consortium NatGenet 2525ndash29

Aytes A Mitrofanova A Lefebvre C Alvarez Mariano J Castillo-MartinM Zheng T Eastham James A Gopalan A Pienta Kenneth J ShenMM et al 2014 Cross-species regulatory network analysis identifiesa synergistic interaction between FOXM1 and CENPF that drivesprostate cancer malignancy Cancer Cell 25638ndash651

Bastide H Betancourt A Nolte V Tobler R Steuroobe P Futschik ASchleurootterer C 2013 A genome-wide fine-scale map of natural pig-mentation variation in Drosophila melanogaster PLoS Genet9e1003534

Bergland AO Behrman EL OrsquoBrien KR Schmidt PS Petrov DA 2014Genomic evidence of rapid and stable adaptive oscillations overseasonal time scales in Drosophila PLoS Genet 10e1004775

Bonebrake TC Mastrandrea MD 2010 Tolerance adaptation and pre-cipitation changes complicate latitudinal patterns of climate changeimpacts Proc Natl Acad Sci U S A 10712581ndash12586

Bradley TJ 2009 Animal osmoregulation Oxford Oxford UniversityPress

Burke MK Liti G Long AD 2014 Standing genetic variation drivesrepeatable experimental evolution in outcrossing populations ofSaccharomyces cerevisiae Mol Biol Evol 313228ndash3239

Chown SL 2012 Trait-based approaches to conservation physiologyforecasting environmental change risks from the bottom upPhilos Trans R Soc B 3671615ndash1627

Chown SL Nicolson SW 2004 Insect physiological ecology mechanismsand patterns Oxford Oxford University Press

Chown SL Soslashrensen JG Terblanche JS 2011 Water loss in insectsan environmental change perspective J Insect Physiol571070ndash1084

Not in GWAS In GWAS

Not in Telonis-Scott et al (2012) N ndash intersect-AndashB A-intersect

In Telonis-Scott et al (2012) B-intersect Intersect(AB)

13

Chung H Loehlin DW Dufour HD Vaccarro K Millar JG Carroll SB2014 A single gene affects both ecological divergence and matechoice in Drosophila Science 3431148ndash1151

Cingolani P Platts A Wang LL Coon M Nguyen T Wang L Land SJ LuX Ruden DM 2012 A program for annotating and predicting theeffects of single nucleotide polymorphisms SnpEff SNPs in thegenome of Drosophila melanogaster strain w1118 iso-2 iso-3 Fly(Austin) 680ndash92

Civelek M Lusis AJ 2014 Systems genetics approaches to understandcomplex traits Nat Rev Genet 1534ndash48

Clusella-Trullas S Blackburn TM Chown SL 2011 Climatic predictors oftemperature performance curve parameters in ectotherms implycomplex responses to climate change Am Nat 177738ndash751

Csardi G Nepusz T 2006 The igraph software package for complexnetwork research InterJournal Complex Systems 1695 Availablefrom httpigraphorg

Davies SA Cabrero P Overend G Aitchison L Sebastian S Terhzaz SDow JA 2014 Cell signalling mechanisms for insect stress toleranceJ Exp Biol 217119ndash128

Davies SA Cabrero P Povsic M Johnston NR Terhzaz S Dow JA 2013Signaling by Drosophila capa neuropeptides Gen Comp Endocrinol18860ndash66

Davies SA Overend G Sebastian S Cundall M Cabrero P Dow JATerhzaz S 2012 Immune and stress response lsquocross-talkrsquo in theDrosophila Malpighian tubule J Insect Physiol 58488ndash497

Day JP Dow JA Houslay MD Davies SA 2005 Cyclic nucleotide phos-phodiesterases in Drosophila melanogaster Biochem J 388333ndash342

de Nadal E Ammerer G Posas F 2011 Controlling gene expression inresponse to stress Nat Rev Genet 12833ndash845

DePristo MA Banks E Poplin R Garimella KV Maguire JR Hartl CPhilippakis AA del Angel G Rivas MA Hanna M et al 2011 Aframework for variation discovery and genotyping using next-gen-eration DNA sequencing data Nat Genet 43491ndash498

Djawdan M Chippindale AK Rose MR Bradley TJ 1998 Metabolic re-serves and evolved stress resistance in Drosophila melanogasterPhysiol Zool 71584ndash594

Durham MF Magwire MM Stone EA Leips J 2014 Genome-wide anal-ysis in Drosophila reveals age-specific effects of SNPs on fitness traitsNat Commun 54338

Emilsson V Thorleifsson G Zhang B Leonardson AS Zink F Zhu JCarlson S Helgason A Walters GB Gunnarsdottir S et al 2008Genetics of gene expression and its effect on disease Nature452423ndash428

Fabian DK Kapun M Nolte V Kofler R Schmidt PS Schleurootterer C FlattT 2012 Genome-wide patterns of latitudinal differentiation amongpopulations of Drosophila melanogaster from North America MolEcol 214748ndash4769

Foley BR Telonis-Scott M 2011 Quantitative genetic analysis suggestscausal association between cuticular hydrocarbon composition anddesiccation survival in Drosophila melanogaster Heredity 10668ndash77

Folk DG Han C Bradley TJ 2001 Water acquisition and partitioning inDrosophila melanogaster effects of selection for desiccation-resis-tance J Exp Biol 2043323ndash3331

Fowler MA Montell C 2013 Drosophila TRP channels and animal be-havior Life Sci 92394ndash403

Fu J Keurentjes JJB Bouwmeester H America T Verstappen FWA WardJL Beale MH de Vos RCH Dijkstra M Scheltema RA et al 2009System-wide molecular evidence for phenotypic buffering inArabidopsis Nat Genet 41166ndash167

Futschik A Schleurootterer C 2010 The next generation of molecular mar-kers from massively parallel sequencing of pooled DNA samplesGenetics 186207ndash218

Gibbs AG Fukuzato F Matzkin LM 2003 Evolution of water conserva-tion mechanisms in Drosophila J Exp Biol 2061183ndash1192

Gibbs AG Matzkin LM 2001 Evolution of water balance in the genusDrosophila J Exp Biol 2042331ndash2338

Goodstadt L 2010 Ruffus a lightweight Python library for computa-tional pipelines Bioinformatics 262778ndash2779

Hadley NF 1994 Water relations of terrestrial arthropods San Diego(CA) Academic Press

Hoffmann AA Hallas R Sinclair C Mitrovski P 2001 Levels of variationin stress resistance in Drosophila among strains local populationsand geographic regions patterns for desiccation starvation coldresistance and associated traits Evolution 551621ndash1630

Hoffmann AA Parsons PA 1989a An integrated approach to environ-mental stress tolerance and life-history variation desiccation toler-ance in Drosophila Biol J Linn Soc 37117ndash136

Hoffmann AA Parsons PA 1989b Selection for increased desiccationresistance in Drosophila melanogaster additive genetic control andcorrelated responses for other stresses Genetics 122837ndash845

Hoffmann AA Sgro CM 2011 Climate change and evolutionary adap-tation Nature 470479ndash485

Hoffmann AA Weeks AR 2007 Climatic selection on genes and traitsafter a 100 year-old invasion a critical look at the temperate-tropicalclines in Drosophila melanogaster from eastern Australia Genetica129133ndash147

Huang W Richards S Carbone MA Zhu D Anholt RR Ayroles JFDuncan L Jordan KW Lawrence F Magwire MM et al 2012Epistasis dominates the genetic architecture of Drosophila quanti-tative traits Proc Natl Acad Sci U S A 10915553ndash15559

Huang Z Tunnacliffe A 2004 Response of human cells to desiccationcomparison with hyperosmotic stress response J Physiol558181ndash191

Huang Z Tunnacliffe A 2005 Gene induction by desiccation stress inhuman cell cultures FEBS Lett 5794973ndash4977

Ichihashi Y Aguilar-Martınez JA Farhi M Chitwood DH Kumar RMillon LV Peng J Maloof JN Sinha NR 2014 Evolutionary develop-mental transcriptomics reveals a gene network module regulatinginterspecific diversity in plant leaf shape Proc Natl Acad Sci U S A111E2616ndashE2621

Itoh M Nanba N Hasegawa M Inomata N Kondo R Oshima MTakano-Shimizu T 2010 Seasonal changes in the long-distance link-age disequilibrium in Drosophila melanogaster J Hered 10126ndash32

Johnson WA Carder JW 2012 Drosophila nociceptors mediate larvalaversion to dry surface environments utilizing both the painless TRPchannel and the DEGENaC subunit PPK1 PLoS One 7e32878

Kahsai L Kapan N Dircksen H Winther AM Nassel DR 2010 Metabolicstress responses in Drosophila are modulated by brain neurosecre-tory cells that produce multiple neuropeptides PLoS One 5e11480

Kapun M van Schalkwyk H McAllister B Flatt T Schleurootterer C 2014Inference of chromosomal inversion dynamics from Pool-Seq datain natural and laboratory populations of Drosophila melanogasterMol Ecol 231813ndash1827

Kawano T Shimoda M Matsumoto H Ryuda M Tsuzuki S Hayakawa Y2010 Identification of a gene Desiccate contributing to desicca-tion resistance in Drosophila melanogaster J Biol Chem28538889ndash38897

Kellermann V van Heerwaarden B Sgro CM Hoffmann AA 2009Fundamental evolutionary limits in ecological traits driveDrosophila species distributions Science 3251244ndash1246

Kofler R Nolte V Schleurootterer C 2016 The impact of library preparationprotocols on the consistency of allele frequency estimates in Pool-Seq data Mol Ecol Resour 16118ndash122

Kofler R Pandey RV Schleurootterer C 2011 PoPoolation2 identifying dif-ferentiation between populations using sequencing of pooled DNAsamples (Pool-Seq) Bioinformatics 273435ndash3436

Kofler R Schleurootterer C 2012 Gowinda unbiased analysis of gene setenrichment for genome-wide association studies Bioinformatics282084ndash2085

Leiserson MDM Eldridge JV Ramachandran S Raphael BJ 2013Network analysis of GWAS data Curr Opin Genet Dev 23602ndash610

Li H Durbin R 2009 Fast and accurate short read alignment withBurrows-Wheeler transform Bioinformatics 251754ndash1760

Li H Handsaker B Wysoker A Fennell T Ruan J Homer N Marth GAbecasis G Durbin R 2009 The Sequence AlignmentMap formatand SAMtools Bioinformatics 252078ndash2079

14

Liu Y Luo J Carlsson MA Neuroassel DR 2015 Serotonin and insulin-likepeptides modulate leucokinin-producing neurons that affectfeeding and water homeostasis in Drosophila J Comp Neurol5231840ndash1863

Lohse M Bolger AM Nagel A Fernie AR Lunn JE Stitt M Usadel B 2012RobiNA a user-friendly integrated software solution for RNA-Seq-based transcriptomics Nucleic Acids Res 40W622ndashW627

Long A Liti G Luptak A Tenaillon O 2015 Elucidating the moleculararchitecture of adaptation via evolve and resequence experimentsNat Rev Genet 16567ndash582

Lynch M Bost D Wilson S Maruki T Harrison S 2014 Population-genetic inference from pooled-sequencing data Genome Biol Evol61210ndash1218

Mackay TFC Stone EA Ayroles JF 2009 The genetics of quantitativetraits challenges and prospects Nat Rev Genet 10565ndash577

Marron MT Markow TA Kain KJ Gibbs AG 2003 Effects of starvationand desiccation on energy metabolism in desert and mesicDrosophila J Insect Physiol 49261ndash270

Matzkin LM Markow TA 2009 Transcriptional regulation of metabo-lism associated with the increased desiccation resistance of thecactophilic Drosophila mojavensis Genetics 1821279ndash1288

Murali T Pacifico S Yu J Guest S Roberts GG Finley RL Jr 2011 DroID2011 a comprehensive integrated resource for protein transcrip-tion factor RNA and gene interactions for Drosophila Nucleic AcidsRes 39D736ndashD743

Nuzhdin SV Turner TL 2013 Promises and limitations of hitchhikingmapping Curr Opin Genet Dev 23694ndash699

Orozco-terWengel P Kapun M Nolte V Kofler R Flatt T Schleurootterer C2012 Adaptation of Drosophila to a novel laboratory environmentreveals temporally heterogeneous trajectories of selected alleles MolEcol 214931ndash4941

Picard Picard A set of Java command line tools for manipulating high-throughput seqeuncing (HTS) data and formats Available fromhttpbroadinstitutegithubiopicard

Qiu Y Tittiger C Wicker-Thomas C Le Goff G Young S Wajnberg EFricaux T Taquet N Blomquist GJ Feyereisen R 2012 An insect-specific P450 oxidative decarbonylase for cuticular hydrocarbon bio-synthesis Proc Natl Acad Sci U S A 10914858ndash14863

Rajpurohit S Oliveira CC Etges WJ Gibbs AG 2013 Functional genomicand phenotypic responses to desiccation in natural populations of adesert Drosophilid Mol Ecol 222698ndash2715

Sarup P Soslashrensen JG Kristensen TN Hoffmann AA Loeschcke V PaigeKN Soslashrensen P 2011 Candidate genes detected in transcriptomestudies are strongly dependent on genetic background PLoS One6e15644

Schiffer M Hangartner S Hoffmann AA 2013 Assessing the relativeimportance of environmental effects carry-over effects and speciesdifferences in thermal stress resistance a comparison ofDrosophilids across field and laboratory generations J Exp Biol2163790ndash3798

Schleurootterer C Kofler R Versace E Tobler R Franssen SU 2015Combining experimental evolution with next-generation sequen-cing A powerful tool to study adaptation from standing geneticvariation Heredity 114431ndash440

Sinclair BJ Gibbs AG Roberts SP 2007 Gene transcription during ex-posure to and recovery from cold and desiccation stress inDrosophila melanogaster Insect Mol Biol 16435ndash443

Sloggett C Wakefield M Philip G Pope B 2014 Rubramdashflexible distrib-uted pipelines figshare Available from httpdxdoiorg106084m9figshare895626

Smit AFA Hubley R Green P 2013-2015 RepeatMasker Open-403Available from httpwwwrepeatmaskerorg

Seurooderberg JA Birse RT Neuroassel DR 2011 Insulin production and signalingin renal tubules of Drosophila is under control of tachykinin-relatedpeptide and regulates stress resistance PLoS One 6e19866

Soslashrensen JG Nielsen MM Loeschcke V 2007 Gene expression profileanalysis of Drosophila melanogaster selected for resistance to envi-ronmental stressors J Evol Biol 201624ndash1636

St Pierre SE Ponting L Stefancsik R McQuilton P 2014 FlyBase102mdashadvanced approaches to interrogating FlyBase Nucleic AcidsRes 42D780ndashD788

Stevenson M Nunes T Heuser C Marshall J Sanchez J Thornton RReiczigel J Robison-Cox J Sebastiani P Solymos P et al 2015 epiRTools for the analysis of epidemiological data R package version09-62 Available from httpCRANR-projectorgpackagefrac14epiR

Storey JD Tibshirani R 2003 Statistical significance for genomewidestudies Proc Natl Acad Sci U S A 1009440ndash9445

Telonis-Scott M Gane M DeGaris S Sgro CM Hoffmann AA 2012 Highresolution mapping of candidate alleles for desiccation resistancein Drosophila melanogaster under selection Mol Biol Evol291335ndash1351

Telonis-Scott M Guthridge KM Hoffmann AA 2006 A new set oflaboratory-selected Drosophila melanogaster lines for the analysisof desiccation resistance response to selection physiology and cor-related responses J Exp Biol 2091837ndash1847

Terhzaz S Cabrero P Robben JH Radford JC Hudson BD Milligan GDow JA Davies SA 2012 Mechanism and function of Drosophilacapa GPCR a desiccation stress-responsive receptor with functionalhomology to human neuromedinU receptor PLoS One 7e29897

Terhzaz S Overend G Sebastian S Dow JA Davies SA 2014 The Dmelanogaster capa-1 neuropeptide activates renal NF-kB signalingPeptides 53218ndash224

Terhzaz S Teets NM Cabrero P Henderson L Ritchie MG Nachman RJDow JA Denlinger DL Davies SA 2015 Insect capa neuropeptidesimpact desiccation and cold tolerance Proc Natl Acad Sci U S A1122882ndash2887

Thorat LJ Gaikwad SM Nath BB 2012 Trehalose as an indicator ofdesiccation stress in Drosophila melanogaster larvae a potentialmarker of anhydrobiosis Biochem Biophys Res Commun419638ndash642

Tobler R Franssen SU Kofler R Orozco-terWengel P Nolte VHermisson J Schl eurootterer C 2014 Massive habitat-specific ge-nomic response in D melanogaster populations during exper-imental evolution in hot and cold environments Mol Biol Evol31364ndash375

Turner TL Bourne EC Von Wettberg EJ Hu TT Nuzhdin SV 2010Population resequencing reveals local adaptation of Arabidopsislyrata to serpentine soils Nat Genet 42260ndash263

Turner TL Miller PM 2012 Investigating natural variation in Drosophilacourtship song by the evolve and resequence approach Genetics191633ndash642

Vermeulen CJ Soslashrensen P Kirilova Gagalova K Loeschcke V 2013Transcriptomic analysis of inbreeding depression in cold-sensitiveDrosophila melanogaster shows upregulation of the immune re-sponse J Evol Biol 261890ndash1902

Wang B Goode J Best J Meltzer J Schilman PE Chen J Garza D ThomasJB Montminy M 2008 The insulin-regulated CREB coactivatorTORC promotes stress resistance in Drosophila Cell Metab7434ndash444

Wang J Kean L Yang J Allan AK Davies SA Herzyk P Dow JA 2004Function-informed transcriptome analysis of Drosophila renaltubule Genome Biol 5R69

Wang K Li M Hakonarson H 2010 Analysing biological pathways ingenome-wide association studies Nat Rev Genet 11843ndash854

Wang L Jia P Wolfinger RD Chen X Zhao Z 2011 Gene set analysis ofgenome-wide association studies methodological issues and per-spectives Genomics 981ndash8

Weber AL Khan GF Magwire MM Tabor CL Mackay TF Anholt RR2012 Genome-wide association analysis of oxidative stress resistancein Drosophila melanogaster PLoS One 7e34745

Weeks AR McKechnie SW Hoffmann AA 2002 Dissecting adaptiveclinal variation markers inversions and sizestress associations inDrosophila melanogaster from a central field population Ecol Lett5756ndash763

Zhu Y Bergland AO Gonzalez J Petrov DA 2012 Empirical validation ofpooled whole genome population re-sequencing in Drosophila mel-anogaster PLoS One 7e41901

15

Page 3: Cross-Study Comparison Reveals Common Genomic, Network, … · 2016-02-06 · resistance using a high-throughput sequencing Pool-GWAS (genome-wide association study) approach ( Bastide

Supplementary Material online mean across control repli-cates 00028 median 00015 mean across resistant replicates00027 median 00016) corresponding to a mean SD of~45 around the mean allele frequency The mean pairwisedifference in allele frequency estimates was 55 and 54 forcontrol and desiccation-resistant replicates respectively con-sistent with previously published estimates (4ndash6 Kofleret al 2016)

For a test subset of 1803 randomly sampled single nucle-otide polymorphisms (SNPs) that showed no significant dif-ferentiation between control and desiccation-resistant poolsthe replicates accounted for the majority of total variance inallele frequency Replicates within pool category (control vsdesiccation resistant supplementary fig S1 SupplementaryMaterial online) explained between 15 and 100 (mean87 median 93) of what was a relatively low total variance(mean 00029 median 00019) The mean concordance cor-relation coefficient was 098 for each pool category higherthan a comparable Drosophila study that reported a correla-tion of 0898 between 2 replicates with 20 individuals(Zhu et al 2012) We therefore combined the technical rep-licates into a single control and resistant pool respectively forfurther analyses After processing the mean coverage depth

across the genome was 487 and 568 per sample respec-tively (equivalent to 097 and 11 per individual in thepools of 500 flies)

Allele Frequency Changes

The allele frequencies of 3528158 SNPs were comparedbetween the 2 pools using Fisherrsquos exact tests with apermutation-based false discovery rate (FDR) correctionchosen based on the top 005 of the simulated nullP values Six hundred forty-eight SNPs were differentiatedbetween desiccation resistant and control pools (fig 1 andsupplementary table S1 Supplementary Material online)The SNPs were nonrandomly distributed across chromo-somes (X2

4 frac14 4312 Plt 00001) highly abundant on theX chromosome (X2

1 frac14 3698 Plt 00001) while SNPs wereunderrepresented on both 2L and 3L (2L X2

1 frac14 641 Plt001 3L X2

1 frac14 397 Plt 005) In the desiccation pool theallelic distribution of the 648 SNPs ranged from low tofixed (4ndash100 fig 1B and supplementary table S1Supplementary Material online) The increase in frequencyof the favored ldquodesiccationrdquo alleles compared with the con-trols ranged from 3 to 41 (median increase 14 supple-mentary table S1 Supplementary Material online) where

Fig 1 (A) Manhattan plot of genome-wide P values for differentiation between desiccation-resistant and random control pools shown for the entiregenome where each point represents one SNP SNPs highlighted in red showed differentiation above the 005 threshold level determined from the nullP value distribution (see text for details) SNPs highlighted in blue were above the 005 threshold and also detected in Telonis-Scott et al (2012) (B)Standing allele frequency in the control pool plotted against the frequency change in the selected pool for the 648 differentiated SNPs (FDR 005)

3

the largest shifts tended to occur in common intermediatefrequencies (fig 1B)

Inversion Frequencies

To test whether inversions were associated with desiccationresistance allele frequencies were checked at SNPs diagnosticfor the following inversions In(3R)P (Anderson et al 2005Kapun et al 2014) In(2L)t (Andolfatto et al 1999 Kapun et al2014) In(2R)NS In(3L)P In(3R)C In(3R)K and In(3R)Mo(Kapun et al 2014) In(3L)P In(3R)K and In(3R)Mo werenot detected in this population and for all other inversionsinvestigated the control and desiccation-resistant pool fre-quencies did not differ more than 6

Genomic Distribution

We examined these candidates using a comprehensive ap-proach encompassing SNP gene functional ontology andnetwork levels First we examined SNP locations with respectto gene structure and putative effects of SNPs on genomefeatures The 648 SNPs mapped to 382 genes and were largelynoncoding (79 supplementary table S2 SupplementaryMaterial online) Coding variants comprised nearly 15 ofcandidate SNPs largely synonymous substitutions and theremainder nonsynonymous missense variants predicted toresult in amino acid substitution with length preservation(9 and 14 supplementary table S2 SupplementaryMaterial online) Based on proportions of features from allannotated SNPs compared with the candidates intron vari-ants were significantly underrepresented (X2

1 frac14706 Plt 001supplementary table S2 Supplementary Material online)while SNPs were enriched in 50UTRs 50UTR premature startcodon sites and splice regions (X2

1 frac14 820 Plt 001X2

1 frac14 3124 Plt 00001 X21 frac14 640 Plt 005 respectively

supplementary table S2 Supplementary Material online)

GO Functional Enrichment

For the GO analyses we found no overrepresented termsusing the stringent Gowinda gene mode but identified sev-eral categories using SNP mode with a 005 FDR thresholdGO terms were enriched for the broad categories of ldquochro-matinrdquo ldquochromosomerdquo and more specifically ldquodouble-stranded RNA bindingrdquo although significance of this categorywas due solely to nine differentiated SNPs in the DIP1 geneThe term ldquogamma-secretase complexrdquo was also enriched dueto three SNPs in the pen-2 gene

Desiccation Candidate Genes

Chown et al (2011) comprehensively summarized the pri-mary mechanisms underpinning desiccation resistancefrom the critical first phase of stress sensing and behavioralavoidance to the manifold physiological trajectories availableto counter water stress These include water content varia-tion water loss rates (desiccation resistance) and tolerance ofwater loss (dehydration tolerance) We expanded on thisreview to incorporate recent molecular research and summa-rized the primary mechanisms and known candidate genesrelevant to D melanogaster stress sensing and water balance

(table 1 and references therein) Informed by this body ofwork we applied a ldquotop-downrdquo mechanistic approach toscreening desiccation candidate genes in our data (egfrom initial hygrosensing to downstream key physiologicalpathways table 1) We did not observe differentiation inSNPs mapping to the TRP ion channels the key hygrosensingreceptor genes However we did map SNPs to genes involvedin stress sensing specific to the MT fluid secretion signalingpathways including two of the six cAMP and cyclic guanosinemonophosphate (cGMP) signaling phosphodiesterases dncand pde9 as well as klu indicated in NF-kB ortholog signalingin the renal tubules (table 1) A number of the stress-sensingSAPK pathway genes were differentiated (table 1) Key genesinvolved in the insulin signaling pathway implicated in met-abolic homeostasis and water balance were also differenti-ated including the insulin receptor InR and insulin receptorbinding Dok (table 1)

Network Analysis Modeling Genomic Differentiationfor Desiccation in a PPI Context

Given the genomic complexity underpinning desiccation re-sistance we also implemented a higher-order analysis to pro-vide an overview of potential architectures beyond the SNPand gene levels using PPI networks The SNPs mapped to 382genes which were annotated to 307 seed proteins occurring inthe full background D melanogaster PPI network The seedproteins produced a large first-order network with 3324nodes and 51299 edges

To determine if broader functional signatures could beascertained from the complex network we employed func-tional enrichment analyses on the 3324 nodes (table 2 andsupplementary table S3 Supplementary Material online)Strikingly the network was enriched for pathways involvedin gene expression from transcription through RNA process-ing to RNA transport metabolism and decay (table 2)The network analysis also revealed further complexity ofthe stress response with enrichment of proteins involved inregulating innate insect immunity (table 2) where multiplepathways were overrepresented including cytokine interleu-kin and toll-like receptor signaling cascades (table 2)Developmentalgene regulation signaling pathways were sig-nificant in the pathway analysis the KEGG database showedenrichment for Notch and Mitogen-activated protein kinase(MAPK) signaling while insulin and NGF pathways were high-lighted from the Reactome database (table 2)

Common Signatures of Desiccation Resistance acrossStudies

The key finding of this study is the degree of genomic overlapwith our previous mapping of alleles associated with artifi-cially selected desiccation phenotypes (Telonis-Scott et al2012) Although the flies were sampled from comparablesoutheastern Australian locations the study designs were dif-ferent Specifically the earlier study focused on resistancegenerated by artificial selection whereas this study focusedon natural variation in resistance Despite the differences instudy design we identified 45 genes common to both studies

4

Table 1 Summary of Primary Molecular Responses and Candidate Genes for Desiccation Response Mechanisms Identified in the Literature

Mechanism Molecular Function Genes Current Study(F1 wild-caught)

Telonis-Scott et al(2012) (F8 artificial

selection)

Hygrosensing Sensory moisture receptors (antennae legs gut mouthparts)

Temperature-activatedtransient receptor po-tential ionchannelsbcd

TRP ion channels trpl trp Trpg TrpA1 TrpmTrpml pain pyx wtrwnompC iav nan Amo

mdash trpl

Stress sensing Malpighian tubules (principal cells)

Tubule fluid secretion signaling pathways

cAMP signalingefg cAMP activation Dh44-R2 mdash mdash

cGMP signalingefg cGMP activation Capa CapaRcedilNplp1-4 mdash mdash

Receptors Gyc76c CG33958 CG34357 mdash CG34357

Kinases dg1-2 mdash mdash

cAMP + cGMPsignalingefgh

Hydrolyzing phosphodiesterases dnc pde1c pde6 pde9 pde11 dnc pde9 dnc pde1c pde9 pde11

Calcium signalingeg Calcium-sensitive nitric oxidesynthase

Nos mdash Nos

Desiccation specific Diuretic neuropeptide NFkBsignalingijk

Capa CapaR trpl Relish klu klu klu trpl

Anti-diureticneuropeptidel ITP sNPF Tk mdash ITP sNPF

Stress sensing Stress responsive pathway

Stress activatedprotein kinasepathwaymn

MAPKJun kinase

Aop Aplip1 Atf-2 bsk Btk29Acbt Cdc 42 Cka cno CYLDDok egr Gadd45 hep hppyJra kay medo Mkk4 msnMtl Pak puc pyd Rac 1-2Rho1 shark slpr Src42ASrc64B Tak1 Traf6

Aop Dokhepcedilkaymsn Mtlpuc Rac1Src64B

Mtl slpr

Metabolic homeostasis and water balance

Components of insu-lin signaling pathway

Receptorop InR InR mdash

Receptor binding chico IIp1-8 dock Dok poly Dok Poly

Resistance mechanisms Water loss barriers

Cuticularhydrocarbons

Fatty acid synthasesq desatura-sesr transporters oxidativedecarbonylaset

CG3523 CG3524 desat1-2 FatPCyp4g1

CG3523 mdash

Primary hemolymph sugartissue-protectant

Trehalose metabolism Trehalose-phosphatasebu treh-lase activitybu phosphoglyc-erate mutase

Tps1 Treh Pgm CG5171CG5177 CG6262 crc

Treh Pgm mdash

NOTEmdashCandidate genes differentiated in the current study and in Telonis-Scott et al (2012) are shown and are underlined where overlappingaFor brevity several comprehensive reviews are cited and readers are referred to references therein Note that water budgeting via discontinuous gas exchange was not includeddue to a lack of ldquobona fiderdquo molecular candidates and desiccation tolerance due to aquaporins was omitted as genes were not differentiated in either of our studiesbChown et al (2011)cJohnson and Carder (2012)dFowler and Montell (2013)eDay et al (2005)fDavies et al (2012)gDavies et al (2013)hDavies et al (2014)iTerhzaz et al (2012)jTerhzaz et al (2014)kTerhzaz et al (2015)lKahsai et al (2010)mHuang and Tunnacliffe (2004)nHuang and Tunnacliffe (2005)oSeurooderberg et al (2011)pLiu et al (2015)qChung et al (2014)rSinclair et al (2007)rFoley and Telonis-Scott (2011)tQiu et al (2012)uThorat et al (2012)

5

Conserved Genomic Signatures of Drosophila Desiccation Resistance doi101093molbevmsv349 MBE by guest on February 6 2016

httpmbeoxfordjournalsorg

Dow

nloaded from

out of the 382 and 416 genes for the current and 2012 studyrespectively (Fisherrsquos exact test for overlap P lt 2 1016odds ratio frac14 590 supplementary table S4 SupplementaryMaterial online) Bearing gene length bias in mind we exam-ined the average length of genes in both studies as well as thegene overlap to the average gene length in the Drosophilagenome (supplementary fig S3 Supplementary Materialonline) Although there was bias toward longer genes in theoverlap list this was also apparent in the full gene lists fromboth studies

The overlap genes mapped to all of the major chromo-somes and their differentiated variants were mostly noncod-ing SNPs located in introns (supplementary table S4Supplementary Material online) Two SNPs were synonymous(Strn-Mlick coding for a protein kinase and Cht7 chitinbinding supplementary table S4 Supplementary Materialonline) while a 30UTR variant was predicted to putativelyalter protein structure in a nondisruptive manner mostlikely impacting expression andor protein translational effi-ciency (jbug response to stimulus supplementary table S4Supplementary Material online) One nonsynonymous SNPwas predicted to alter protein function as a missense variant(fab1 signaling supplementary table S4 SupplementaryMaterial online) The increase in frequency of the favoreddesiccation alleles compared with the controls ranged from5 to 27 with a median increase of 164

Despite the strong overlap there were important differ-ences between the two studies For example our previouswork revealed evidence of a selective sweep in the 50 pro-moter region of the Dys gene in the artificially selected linesand we confirmed a higher frequency of the selected SNPs inan independent natural association study in resistant fliesfrom Coffs Harbour (Telonis-Scott et al 2012) In this studywe found no differentiation at the Dys locus between ourresistant and random controls However both resistant andcontrol pools had a high frequency of the selected allele de-tected in Telonis-Scott et al (2012) (supplementary table S5Supplementary Material online)

The 45 genes common to both studies were analyzed forbiological function while many genes are obviously pleiotro-pic (ie assigned multiple GO terms) almost half were in-volved in response to stimuli (cellular behavioral andregulatory) including signaling such as Rho protein signaltransduction and cAMP-mediated signaling (table 3) Othercategories of interest include spiracle morphogenesis sensoryorgan development including development of the compoundeye and stem cell and neuroblast differentiation (table 3)

With regard to specific stress responsedesiccation candi-dates (table 1) 4 of the 45 genes overlapped including thecAMPcGMP signaling phosphodiesterases pde9 and dncMAPK pathway genes Mtl and klu indicated in NF-kBorthologue signaling in the renal tubules (table 1)Furthermore there was overlap among mechanistic catego-ries where different genes in the same pathways weredetected For example additional key cAMPcGMP hydrolyz-ing phosphodiesterases pde1c and pde11 were significant inTelonis-Scott et al (2012) as well as slpr (MAPK pathway)and poly (insulin signaling) (table 1)

Table 2 Pathway Enrichment of the GWAS and Telonis-Scott et alFirst-Order PPI Networks Investigated Using FlyMine (FDR P lt 005)

Current Study GWAS Telonis-Scott et al(2012)

Database Pathway Pathway

KEGG Ribosome RibosomeNotch signaling pathway Notch signaling pathwayMAPK signaling pathwayDorso-ventral axis formationProgesterone-mediated oocyte

maturationWnt signaling pathwayEndocytosisProteasome

Reactome Immune system Immune systemAdaptive immune system Adaptive immune systemInnate immune system Innate immune systemSignaling by interleukins Signaling by interleukins(Toll-like receptor cascades) Toll-like receptor

cascadesTCR signaling(B cell receptor signaling)

Gene expression Gene expressionMetabolism of RNA Metabolism of RNANonsense-mediated decay Nonsense-mediated decayTranscriptionmRNA processing(RNA transport)Metabolism of proteins (Metabolism of

proteins)Translation (Translation)Cell cycle Cell cycle

Cell cycle checkpointsRegulation of mitotic cell

cycleMitotic metaphase and

anaphaseG0 and early G1 G0 and early G1

G1 phase(G1S transition)G2M transition(DNA replicationS

phase)DNA repairDouble-strand break

repairSignaling pathways Signaling pathwaysInsulin signaling pathwaySignaling by NGF Signaling by NGF

Signaling by FGFRSignaling by NOTCHSignaling by WntSignaling by EGFRDevelopmentAxon guidanceMembrane traffickingGap junction trafficking

and regulationProgramed cell deathApoptosisHemostasisHemostasis

NOTEmdashKEGG and Reactome pathway databases were queried All significant termsare reported with Reactome terms grouped together under parent terms in italictext (full list of terms available in supplementary table S3 Supplementary Materialonline) Parent terms without brackets are the names of pathways that were sig-nificantly enriched parent terms in parentheses were not listed as significantlyenriched themselves but are reported as a guide to the category contents

6

In addition to the significant gene-level overlap betweenthe two studies we observed significant overlap between thePPI networks constructed from each set of candidate genes(summarized in fig 2) We examined the overlap by compar-ing the first-order networks 1) at the level of node overlap and2) at the level of edge (interaction) overlap Three hundredseven GWAS seeds formed a network of 3324 nodes whileour previous set of candidates mapped to 337 seeds forming anetwork that comprised 3215 nodes One thousand eighthundred twenty-six or ~55 of the nodes overlapped be-tween networks which was significantly higher than the nulldistribution estimated from 1000 simulations (mean simu-lated overlap SD 1304 143 nodes observed overlap atthe 999th percentile fig 2A and C) The overlap at the inter-action level was not significantly higher than expected (meansimulated overlap SD 25704 4024 edges observed over-lap 30686 edges 89th percentile fig 2B and D) Hive plotswere used to visualize the separate networks (supplementaryfig S4A and B Supplementary Material online) as well as theoverlap between the networks (supplementary fig S4CSupplementary Material online)

Given the high degree of shared nodes from distinct setsof protein seeds we compared the networks for biologicalfunction (fig 2E) Both networks produced long lists of sig-nificantly enriched GO terms (supplementary table S3Supplementary Material online) and assessing overlap was

difficult due to the nonindependent nature of GO hierar-chies However a comparison of functional pathway enrich-ment revealed some differences Although both networksshowed enrichment in the immune system pathways theribosome and Notch and NGF signaling pathways as well assome involvement of gene expression and translation path-ways the network constructed from the Telonis-Scott et al(2012) gene list showed extensive involvement of cell cyclepathway proteins as well as significant enrichment for Wntsignaling EGRF signaling DNA double-strand break repairaxon guidance gap junction trafficking and apoptosis path-ways which were not enriched in the GWAS network TheGWAS network was specifically enriched for dorso-ventralaxis formation oocyte maturation and many more geneexpression pathways than the Telonis-Scott et al (2012)gene network (table 2 and supplementary table S3Supplementary Material online)

Discussion

Genomic Signature of Desiccation Resistance fromNatural Standing Genetic Variation

We report the first large-scale genome-wide screen for nat-ural alleles associated with survival of low humidity condi-tions in D melanogaster By combining a powerful naturalassociation study with high-throughput Pool-seq we

Table 3 Overrepresentation of GO Categories for the 45 Core Candidate Genes Overlapping between the Current and Our Previous(Telonis-Scott et al 2012) Genomic Study

Term ID TermSubterms FDR P valuea Genes

Cellularbehavioral response to stimulus defense and signaling

GO0050896 Response to stimulus (cellular response tostimulus regulation of response tostimulus)

0007 Abd-B dnc fz mam nkd slo ct fru kluRbf klg santa-maria jbug fab1CG33275 Mtl Trim9 cv-c Gprk2 Ect4CG43658

GO0048585 Negative regulation of response to stimu-lus (Rho protein signal transductionsignal transduction)

0042 dnc mam slo fru Trim9 cv-c

GO0023052 Signaling (cAMP-mediated signaling cellcommunication)

0031 Amph Pde9 dnc fz mam nkd ct kluRbf santa-maria fab1 CG33275 MtlTrim9 cv-c Gprk2 Ect4 CG43658

Development

GO0048863 Stem cell differentiation 0014 Abd-D mam nkd klu nub

GO0014016 Neuroblast differentiation 0048

GO0045165 Cell fate commitment 0010 Abd-D dnc fz mam nkd ct klu klg nub

GO0035277 Spiracle morphogenesis open trachealsystem

0010 Abd-D ct cv-c

GO0007423 Sensory organ development (eye wingdisc imaginal disc)

0008 fz mam ct klu klg sdk Amph Nckx30cMtl

GO0048736 Appendage development 0020 fz mam ct fru CG33275 Mtl Fili nubcv-c Gprk2 CG43658

GO0007552 Metamorphosis 0001 fz mam ct fru CG33275 Mtl Fili cv-cGprk2 CG43658

Cellular component organization

GO0030030 Cell projection organization 0020 dnc fz ct fru GluClalpha Nckx30c MtlTrim9 nub cv-c

NOTEmdashNote that most genes are assigned to multiple annotations and therefore are shown purely as a guide to putative functionaBonferronindashHochberg FDR and gene length corrected P values using the Flymine v400 database Gene Ontology Enrichment The terms were condensed and trimmed usingREVIGO

7

mapped SNPs associated with desiccation resistance in 500naturally derived genotypes Although Pool-seq has limitedpower to reveal very low frequency alleles to reconstructphase or adequately estimate allele frequencies in smallpools (Lynch et al 2014) the Pool-GWAS approach hasbeen validated by retrieval of known candidate genes in-volved in body pigmentation using similar numbers of ge-notypes as assessed here (Bastide et al 2013)

Although studies in controlled genetic backgrounds reportlarge effects of single genes on desiccation resistance (Table 1)our current and previous screens revealed similarly largenumbers of variants (over 600) mapping to over 300 genesA polygenic basis of desiccation resistance is consistent withquantitative genetic data for the trait (Foley and Telonis-Scott2011) and we also anticipate more complexity from naturalD melanogaster populations with greater standing geneticvariation than inbred laboratory strains However somefalse positive associations are likely in the GWAS due to thelimitations of quantitative trait mapping approaches Pooledevolve-and-resequence experiments routinely yield vast num-bers of candidate SNPs partly due to high levels of standinglinkage disequilibrium (LD) and hitchhiking neutral alleles

(Schleurootterer et al 2015) Large population sizes from specieswith low levels of LD such as D melanogaster somewhat cir-cumvent this issue and a notable feature of our experimentaldesign is our sampling Here we selected the most desicca-tion-resistant phenotypes directly from substantial standinggenetic variation where one generation of recombinationshould largely reflect natural haplotypes

We observed the largest candidate allele frequency differ-ences in loci that had intermediate allele frequencies in thecontrol pool This is in line with population genetics modelswhich predict that evolution from standing genetic variationwill initially be strongest for intermediate frequency alleleswhich can facilitate rapid change (Long et al 2015) Howeverwe also observed a number of low frequency alleles which ifbeneficial can inflate false positives due to long-range LD(Orozco-terWengel et al 2012 Nuzhdin and Turner 2013Tobler et al 2014) Nonetheless spurious LD associationscannot fully account for our candidate list as evidenced bythe desiccation functional candidates and cross-study over-lap Furthermore chromosome inversion dynamics can alsogenerate both short and long-range LD and contribute nu-merous false positive SNP phenotype associations (Hoffmann

Fig 2 Observed versus simulated network overlap between the GWAS first-order PPI network and the first-order PPI network from the Telonis-Scottet al (2012) gene list Overlap is presented in terms of (A) node number and (B) edge number Networks were simulated by resampling from the entireDrosophila melanogaster gene list (r553) as described in the text The histogram shows the distribution of overlap measures for 1000 simulations Theblack line represents the histogram as density and the blue line shows the corresponding normal distribution The red vertical line shows the observedoverlap from the real networks labeled as a percentile of the normal distribution Area-proportional Venn diagrams summarized the extent of overlapfor PPI networks constructed separately from the GWAS candidates and Telonis-Scott et al (2012) candidates (labeled ldquoprevious studyrdquo) (C) Overlapexpressed as the number of nodes Approximately 55 of the nodes overlapped between the two studies which was significantly higher than simulatedexpectations (999 percentile 2B) (D) Overlap expressed in the number of interactions This was not significantly higher than expected by simulation(89th percentile 2B) (E) Overlap at the level of GO biological process terms (directly compared GO terms by name this was not tested statisticallybecause of the complex hierarchical nature of GO terms and is presented for interest)

8

and Weeks 2007 Tobler et al 2014) but we found little as-sociation between the desiccation pool and inversion fre-quency changes consistent with the lack of latitudinalpatterns in desiccation resistanceand traitinversion fre-quency associations in east Australian D melanogaster(Hoffmann et al 2001 Weeks et al 2002)

Cross-Study Comparison Reveals a Core Set ofCandidates for a Complex Trait

Our data contain many candidate genes functions and path-ways for insect desiccation resistance but partitioning genu-ine adaptive loci from experimental noise remains asignificant challenge We therefore focus our efforts on ournovel comparative analysis where we identified 45 robustcandidates for further exploration Common genetic signa-tures are rarely documented in cross-study comparisons ofcomplex traits (Sarup et al 2011 Huang et al 2012 but cfVermeulen et al 2013) and never before for desiccation re-sistance Some degree of overlap between our studies likelyresults from a shared genetic background from similar sam-pling sites highlighting the replicability of these responsesOne limitation of this approach is the overpopulation ofthe ldquocorerdquo list by longer genes This result reflects the genelength bias inherent to GWAS approaches an issue that is notstraightforward to correct in a GWAS let alone in the com-parative framework applied here A focus on the commongenes functionally linked to the desiccation phenotype canprovide a meaningful framework for future research and alsosuggests that some large-effect candidate genes exhibit natu-ral variation contributing to the phenotype (table 1)

Other feasible candidates include genes expressed inthe MTs (Wang et al 2004) or essential for MT develop-ment (St Pierre et al 2014) providing further researchopportunity focused on the primary stress sensing andfluid secretion tissues Genes include one of the mosthighly expressed MT transcripts CG7084 as well as dncNckx30C slo cv-c and nkd Other aspects of stress resis-tance are also implicated CrebB for example is involvedin regulating insulin-regulated stress resistance and ther-mosensory behavior (Wang et al 2008) nkd also impactslarval cuticle development and shares a PDZ signalingdomain also present in the desiccation candidate desi(Kawano et al 2010 St Pierre et al 2014) Finally severalcore candidates were consistently reported in recent stud-ies suggesting generalized roles in stress responses andfitness including transcriptome studies on MT function(Wang et al 2004) inbreeding and cold resistance(Vermeulen et al 2013) oxidative stress (Weber et al2012) and genomic studies exploring age-specific SNPson fitness traits (Durham et al 2014) and genes likelyunder spatial selection in flies from climatically divergenthabitats (Fabian et al 2012)

Several of the enriched GO categories from the core listalso make biological sense including stimulus response(almost half the suite) defense and signaling particularlyin pathways involved in MT stress sensing (Davies et al 20122013 2014) Functional categories enriched for sensory

organ development were highlighted consistent with envi-ronmental sensing categories from desert Drosophila tran-scriptomics (Matzkin and Markow 2009 Rajpurohit et al2013) and stress detection via phototransduction pathwaysin D melanogaster under a range of stressors including des-iccation (Soslashrensen et al 2007)

Evidence for Conserved Signatures at Higher LevelOrganization

Beyond the gene level we observed a striking degree ofsimilarity between this study and that of Telonis-Scottet al (2012) at the network level where significantly over-lapping protein networks were constructed from largely dif-ferent seed protein sets This supports a scenario where theldquobiological information flow from DNA to phenotyperdquo(Civelek and Lusis 2014) contains inherent redundancywhere alternative genetic solutions underlie phenotypesand functions Resistance to perturbation by genetic or en-vironmental variation appears to increase with increasinghierarchical complexity that is phenotype ldquobufferingrdquo (Fuet al 2009) The same functional network in different pop-ulationsgenetic backgrounds can be impacted but theexact genes and SNPs responding to perturbation will notnecessarily overlap In Drosophila different sets of loci fromthe same founders mapped to the same PPI network in thecase of two complex traits chill coma recovery and startleresponse (Huang et al 2012) Furthermore interspecificshared networkspathways from different genes have beendocumented for numerous complex traits (Emilsson et al2008 Aytes et al 2014 Ichihashi et al 2014) providing op-portunities for investigating taxa where individual genefunction is poorly understood

Although the null distribution is not always obviouswhen comparing networks and their functions partly be-cause of reduced sensitivity from missing interactions(Leiserson et al 2013) here the shared network signaturesportray a plausible multifaceted stress response involvingstress sensing rapid gene accessibility and expression Thepredominance of immune system and stress responsepathways are of interest with reference immunitystresspathway ldquocross-talkrdquo particularly in the MTs whereabiotic stressors elicit strong immunity transcriptionalresponses (Davies et al 2012) The higher order architec-tures reflect crucial aspects of rapid gene expression con-trol in response to stress from chromatin organization(Alexander and Lomvardas 2014) to transcription RNAdecay and translation (reviwed in de Nadal et al 2011) InDrosophila variation in transcriptional regulation of indi-vidual genes can underpin divergent desiccation pheno-types (Chung et al 2014) or be functionally inferred frompatterns of many genes elicited during desiccation stress(Rajpurohit et al 2013) Variants with potential to mod-ulate expression at different stages of gene expressioncontrol were evident in our GWAS data alone for exam-ple enrichment for 50UTR and splice variants at the SNPlevel and for chromatin and chromosome structure termsat the functional level

9

Genetic Overlap Versus Genetic differencesmdashCausesand Implications

Despite the common signatures at the gene and higher orderlevels between our studies pervasive differences were ob-served These can arise from differences in standing geneticvariation in the natural populations where different alleliccompositions impact gene-by-environment interactions andepistasis Our network overlap analyses highlight the potentialfor different molecular trajectories to result in similar func-tions (Leiserson et al 2013) consistent with the multiplephysiological pathways potentially available to D melanoga-ster under water stress (Chown et al 2011) Experimentaldesign also presumably contributed to the study differencesStrong directional selection as applied in the 2012 study canincrease the frequency of rare beneficial alleles and likely re-duced overall diversity than did the single-generation GWASdesign Furthermore selection regimes often impose addi-tional selective pressures (eg food deprivation during desic-cation) resulting in correlated responses such as starvationresistance with desiccation resistance and these factors differfrom the laboratory to the field Finally evidence suggests innatural D melanogaster that standing genetic variation canvary extensively with sampling season (Itoh et al 2010Bergland et al 2014) Although we cannot compare levelsof standing variation between our two studies evidence sug-gests that this may have affected the Dys gene where ourGWAS approach revealed no differentiation between thecontrol and resistant pools at this locus but rather detectedthe alleles associated with the selective sweep in Telonis-Scottet al (2012) in both pools at high frequency Whether thisresulted from sampling the later collection after a longdrought (2004ndash2008 wwwbomgovauclimate last accessedJanuary 13 2016) is unclear but this observation highlightsthe relevance of temporal sampling to standing genetic var-iation when attempting to link complex phenotypes togenotypes

Materials and Methods

Natural Population Sampling

Drosophila melanogaster was collected from vineyard waste atKilchorn Winery Romsey Australia in April 2010 Over 1200isofemale lines were established in the laboratory from wild-caught females Species identification was conducted on maleF1 to remove Drosophila simulans contamination and over900 D melanogaster isofemale lines were retained All flieswere maintained in vials of dextrose cornmeal media at25 C and constant light

Desiccation Assay

From each isofemale line ten inseminated F1 females weresorted by aspiration without CO2 and held in vials until 4ndash5days old For the desiccation assay 10 females per line (over9000 flies in total) were screened by placing groups of femalesinto empty vials covered with gauze and randomly assigninglines into 5 large sealed glass tanks containing trays of silicadesiccant (relative humidity [RH] lt10) The tanks werescored for mortality hourly and the final 558 surviving

individuals were selected for the desiccation-resistant pool(558 flies lt6 of the total) The control pool constitutedthe same number of females sampled from over 200 ran-domly chosen isofemale lines that were frozen prior to thedesiccation assay We chose this random control approachinstead of comparing the phenotypic extremes (ie the latemortality tail vs the ldquoearly mortalityrdquo tail) because unfavor-able environmental conditions experienced by the wild-caught dam may inflate environmental variance due to car-ryover effects in the F1 offspring (Schiffer et al 2013)

DNA Extraction and Sequencing

We used pooled whole-genome sequencing (Pool-seq)(Futschik and Schleurootterer 2010) to estimate and compareallele frequency differences between the most desiccation-resistant and randomly sampled flies from a wild populationto associate naturally segregating candidate SNPs with theresponse to low humidity For each treatment (resistantand control) genomic DNA was extracted from 50 femaleheads using the Qiagen DNeasy Blood and Tissue Kit accord-ing to the manufacturerrsquos instructions (Qiagen HildenGermany) Two extractions were combined to create fivepools per treatment which were barcoded separately tocreate technical replicates (total 10 pools of 100 fliesn frac14 1000) The DNA was fragmented using a CovarisS2 ma-chine (Covaris Inc Woburn MA) and libraries were pre-pared from 1 mg genomic DNA (per pool of 100 flies) usingthe Illumina TruSeq Prep module (Illumina San Diego CA)To minimize the effect of variation across lanes the ten poolswere combined in equal concentration (fig 3) and run mul-tiplexed in each lane of a full flow cell of an Illumina HiSeq2000 sequencer Clusters were generated using the TruSeq PECluster Kit v5 on a cBot and sequenced using Illumina TruSeqSBS v3 chemistry (Illumina) Fragmentation library construc-tion and sequencing were performed at the Micromon Next-Gen Sequencing facility (Monash University ClaytonAustralia)

Fig 3 Experimental design See text for further explanation

10

Data Processing and Mapping

Processing was performed on a high-performance computingcluster using the Rubra pipeline system that makes use of theRuffus Python library (Goodstadt 2010) The final variant-calling pipeline was based on a pipeline developed by ClareSloggett (Sloggett et al 2014) and is publicly available athttpsgithubcomgriffinpGWAS_pipeline (last accessedJanuary 13 2016) Raw sequence reads were processed persample per lane with P1 and P2 read files processed sepa-rately initially Trimming was performed with trimmomatic v030 (Lohse et al 2012) Adapter sequences were also re-moved Leading and trailing bases with a quality scorebelow 30 were trimmed and a sliding window (width frac14 10quality threshold frac14 25) was used Reads shorter than 40 bpafter trimming were discarded Quality before and after trim-ming was examined using FastQC v 0101 (Andrews 2014)

Posttrimming reads that remained paired were used asinput into the alignment step Alignment to the D melano-gaster reference genome version r553 was performed withbwa v 062 (Li and Durbin 2009) using the bwa aln commandand the following options No seeding (-l 150) 1 rate ofmissing alignments given 2 uniform base error rate (-n001) a maximum of 2 gap opens per read (-o 2) a maximumof 12 gap extensions per read (-e 12) and long deletionswithin 12 bp of 30 end disallowed (-d 12) The default valueswere used for all other options Output files were then con-verted to sorted bam files using the bwa sampe commandand the ldquoSortSamrdquo option in Picard v 196

At this point the bam files were merged into one file persample using PicardMerge Duplicates were identified withPicard MarkDuplicates The tool RealignerTargetCreator inGATK v 26-5 (DePristo et al 2011) was applied to the tenbam files simultaneously producing a list of intervals contain-ing probable indels around which local realignment would beperformed This was used as input for the IndelRealigner toolwhich was also applied to all ten bam files simultaneously

To investigate the variation among technical replicatesallele frequency was estimated based on read counts foreach technical replicate separately for 100000 randomly cho-sen SNPs across the genome After excluding candidate des-iccation-resistance SNPs and SNPs that may have been falsepositives due to sequencing error (those with mean frequencylt004 across replicates) 1803 SNPs remained For each locuswe calculated the variance in allele frequency estimate amongthe five control replicates and the variance among the fivedesiccation-resistance replicates We compared this with thevariance due to pool category (control vs desiccation resis-tant) using analysis of variance (ANOVA) We also calculatedthe mean pairwise difference in allele frequency estimateamong control and among desiccation-resistance replicatesand the mean concordance correlation coefficient was calcu-lated over all pairwise comparisons within pool category(using the epiccc function in the epiR package Stevensonet al 2015)

The following approach was then taken to identify SNPsdiffering between the desiccation-resistant and controlgroups Based on the technical replicate analysis we merged

the five ldquodesiccation-resistantrdquo and the five random controlsamples into two bam files A pileup file was created withsamtools v 0119 (Li et al 2009) and converted to a ldquosyncrdquo fileusing the mpileup2sync script in Popoolation2 v 1201 (Kofleret al 2011) retaining reads with a mapping quality of at least20 and bases with a minimum quality score of 20 Reads withunmapped mates were also retained (option ndashA in samtoolsmpileup) For this step we used a repeat-masked version ofthe reference genome created with RepeatMasker 403 (Smitet al 2013-2015) and a repeat file containing all annotatedtransposons from D melanogaster including shared ancestralsequences (Flybase release r557) plus all repetitive elementsfrom RepBase release v1904 for D melanogaster Simple re-peats small RNA genes and bacterial insertions were notmasked (options ndashnolow ndashnorna ndashno_is) The rmblastsearch engine was used

Association Analysis

We then performed Fisherrsquos exact test with the Fisher testscript in Popoolation2 (Kofler et al 2011) We set the min-imum count (of the minor allele) frac14 10 minimum coverage(in both populations) frac14 20 and maximum coverage frac1410000 The test was performed for each SNP (slidingwindow off) Although there are numerous approaches tocalculating genome-wide significance thresholds in aPoolseq framework currently there is no consensus on anappropriate standardized statistical method Methods in-clude simulation approaches (Orozco-terWengel et al2012 Bastide et al 2013 Burke et al 2014) correctionusing a null distribution based on estimating genetic driftfrom replicated artificial selection line allele frequency vari-ance (Turner and Miller 2012) use of a simple thresholdbased on a minimum allele frequency difference (Turneret al 2010) and the FDR correction suggested by Storeyand Tibshirani (2003) (used by Fabian et al 2012)

We determined based on our design to calculate the FDRthreshold by comparison with a simulated distribution of nullP values following the approach of Bastide et al (2013) Foreach SNP locus a P value was calculated by simulating thedistribution of read counts between the major and minorallele according to a beta-binomial distribution with meana(a+b) and variance (ab)((a+b)2(a+b+1)) Thismodel allowed for two types of sampling variationStochastic variation in allele sampling across the phenotypesand the background variation in the representation of allelesin each pool caused by library preparation artifacts or sto-chastic variation driven by unequal coverage of the poolsThese sources of variation have also been recognized else-where as being important in interpreting pooled sequenceallele calls and identifying significant differentiation betweenpooled samples (Lynch et al 2014) The value of afrac14 37 waschosen by chi-square test as the best match to the observedP value distribution (afrac14 10 to afrac14 40 were tested) with acorresponding bfrac14 4315 to reflect the coverage depth differ-ence between the random control (487) and desiccation-resistant (568) pools These values were then used tosimulate a null data set 10 the size of the observed data

11

set Because the null P value distribution still did not fit theobserved P value distribution particularly well (supplemen-tary fig S2 Supplementary Material online) these values werenot used for a standard FDR correction Instead a P valuethreshold was calculated based on the lowest 005 of thenull P values similar in approach to Orozco-terWengel et al(2012)

Inversion Analysis

In D melanogaster several cosmopolitan inversion polymor-phisms show cross-continent latitudinal patterns that explaina substantial proportion of clinal variation for many traitsincluding thermal tolerance (reviewed in Hoffmann andWeeks 2007) Patterns of disequilibria generated betweenloci in the vicinity of the rearrangement can obscure signalsof selection (Hoffmann and Weeks 2007) and inversion fre-quencies are routinely considered in genomic studies ofnatural populations sampled from known latitudinal clines(Fabian et al 2012 Kapun et al 2014) We assessedassociations between inversion frequencies and our controland resistant pools at SNPs diagnostic for the following in-versions In3R(Payne) (Anderson et al 2005 Kapun et al2014) In(2L)t (Andolfatto et al 1999 Kapun et al 2014)In(2R)Ns In(3L)P In(3R)C In(3R)K and In(3R)Mo (Kapunet al 2014) Between one and five diagnostic SNPs were in-vestigated for each inversion (depending on availability) Weconsidered an inversion to be associated with desiccationresistance if the frequency of its diagnostic SNP differed sig-nificantly between the control and desiccation-resistant poolsby a Fisherrsquos exact test

Genome Feature Annotation

The list of candidate SNPs was converted to vcf format using acustom R script setting the alternate allele as the allele thatincreased in frequency from the control to the desiccation-resistant pool and setting the reference allele as the otherallele present in cases where both the major and minor allelesdiffered from the reference genome The genomic features inwhich the candidate SNPs occurred were annotated usingsnpEff v40 first creating a database from the D melanogasterv553 genome annotation with the default approach(Cingolani et al 2012) The default annotation settings wereused except for setting the parameter ldquo-ud 0rdquo to annotateSNPs to genes only if they were within the gene body Using2 tests on SNP counts we tested for over- or under-enrichment of feature types by calculating the proportionof lsquoallrsquo features from all SNPs detected in the study and theproportion of ldquocandidaterdquo SNPs identified as differentiated indesiccation-resistant flies (Fabian et al 2012)

GO Analysis

GO analyses was used to associate the candidates with theirbiological functions (Ashburner et al 2000) Previously devel-oped for biological pathway analysis of global gene expressionstudies (Wang et al 2010) typical GO analysis assumes that allgenes are sampled independently with equal probabilityGWAS violate these criteria where SNPs are more likely to

occur in longer genes than in shorter genes resulting in highertype I error rates in GO categories overpopulated with longgenes Gene length as well as overlapping gene biases (Wanget al 2011) were accounted for using Gowinda (Kofler andSchleurootterer 2012) which implements a permutation strategyto reduce false positives Gowinda does not reconstruct LDbetween SNPs but operates in two modes underpinned bytwo extreme assumptions 1) Gene mode (assumes all SNPs ina gene are in LD) and 2) SNP mode (assumes all SNPs in agene are completely independent)

The candidates were examined in both modes but giventhe stringency of gene mode high-resolution SNP modeoutput was utilized for the final analysis with the followingparameters 106 simulations minimum significance frac141 andminimum genes per GO categoryfrac14 3 (to exclude gene-poorcategories) P values were corrected using the Gowinda FDRalgorithm GO annotations from Flybase r557 were applied

Candidate Gene Analysis

In conjunction with the GO analysis we also examinedknown candidate genes associated with survival under lowRH to better characterize possible biological functions of ourGWAS candidates We took a conservative approach to can-didate gene assignment where the SNPs were mapped withinthe entire gene region without up- and downstream exten-sions Genes were curated from FlyBase and the literature(table 1) predominantly using detailed functional experimen-tal evidence as criteria for functional desiccation candidates

Network Analysis

We next obtained a higher level summary of the candidategene list using PPI network analysis The full candidate genelist was used to construct a first-order interaction networkusing all Drosophila PPIs listed in the Drosophila InteractionDatabase v 2014_10 (avoiding interologs) (Murali et al 2011)This ldquofullrdquo PPI network includes manually curated data frompublished literature and experimental data for 9633 proteinsand 93799 interactions For the network construction theigraph R package was used to build a subgraph from the fullPPI network containing the candidate proteins and their first-order interacting proteins (Csardi and Nepusz 2006) Self-connections were removed Functional enrichment analysiswas conducted on the list of network genes using FlyMinev400 (wwwflymineorg last accessed January 13 2016) withthe following parameters for GO enrichment and KEGGReactome pathway enrichment BenjaminindashHochberg FDRcorrection P lt 005 background frac14 all D melanogastergenes in our full PPI network and Ontology (for GO enrich-ment only) frac14 biological process or molecular function

Cross-Study Comparison Overlap with Our PreviousWork

Following on from the comparison between our gene list andthe candidate genes suggested from the literature we quan-titatively evaluated the degree of overlap between the candi-date genes identified in this study with alleles mapped in ourprevious study of flies collected from the same geographic

12

region (Telonis-Scott et al 2012) The ldquooverlaprdquo analysis wasperformed similarly for the GWAS candidate gene and net-work-level approaches First the gene list overlap was testedbetween the two studies using Fisherrsquos exact test where the382 genes identified from the candidate SNPs (this study)were compared with 416 genes identified from candidatesingle feature polymorphisms (Telonis-Scott et al 2012)The contingency table was constructed using the numberof annotated genes in the D melanogaster R553 genomebuilt as an approximation for the total number of genestested in each study and took the following form

where A frac14 gene list from Telonis-Scott et al (2012)n frac14 416 B frac14 gene list from this study n frac14 382 N frac14 totalnumber of genes in D melanogaster reference genome r553n frac14 17106

To annotate genome features in the overlap candidategene list we used the SNP data (this study) which betterresolves alleles than array-based genotyping Due to this lim-itation we did not compare exact genomic coordinates be-tween the studies but considered overlap to occur whendifferentiated alleles were detected in the same gene inboth studies We did however examine the allele frequenciesaround the Dys-RF promoter which was sequenced sepa-rately revealing a selective sweep in the experimental evolu-tion lines (Telonis-Scott et al 2012)

The overlapping candidates were tested for functional en-richment using a basic gene list approach As genes ratherthan SNPs were analyzed here Fly Mine v400 (wwwflymineorg) was applied with the following parameters Gene lengthnormalization BenjaminindashHochberg FDR correction Plt 005Ontology frac14 biological process and default background (allGO-annotated D melanogaster genes)

Finally we constructed a second PPI network from thecandidate genes mapped in Telonis-Scott et al (2012) to ex-amine system-level overlap between the two studies The 416genes mapped to 337 protein seeds and the network wasconstructed as described above The degree of network over-lap between the two studies was tested using simulation Ineach iteration a gene set of the same length as the list fromthis study and a gene set of the same length as the list fromTelonis-Scott et al (2012) were resampled randomly from theentire D melanogaster mapped gene annotation (r553) Eachresampled set was then used to build a first-order network aspreviously described The overlap network between the tworesampled sets was calculated using the graphintersectioncommand from the igraph package and zero-degree nodes(proteins with no interactions) were removed Overlap wasquantified by counting the number of nodes and the numberof edges in this overlap network The observed overlap wascompared with a simulated null distribution for both nodeand edge number where n frac14 1000 simulation iterations

Supplementary MaterialSupplementary figures S1ndashS4 and tables S1ndashS5 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

We thank Robert Good for the fly sampling and JenniferShirriffs and Lea and Alan Rako for outstanding assistancewith laboratory culture and substantial desiccation assaysWe are grateful to Melissa Davis for advice on network build-ing and comparison approaches and to five anonymous re-viewers whose comments improved the manuscript Thiswork was supported by a DECRA fellowship DE120102575and Monash University Technology voucher scheme toMTS an ARC Future Fellowship and Discovery Grants toCMS an ARC Laureate Fellowship to AAH and a Scienceand Industry Endowment Fund grant to CMS and AAH

ReferencesAlexander JM Lomvardas S 2014 Nuclear architecture as an epigenetic

regulator of neural development and function Neuroscience26439ndash50

Anderson AR Hoffmann AA McKechnie SW Umina PA Weeks AR2005 The latitudinal cline in the In(3R)Payne inversion polymor-phism has shifted in the last 20 years in Australian Drosophilamelanogaster populations Mol Ecol 14851ndash858

Andolfatto P Wall JD Kreitman M 1999 Unusual haplotype structureat the proximal breakpoint of In(2L)t in a natural population ofDrosophila melanogaster Genetics 1531297ndash1311

Andrews S 2014 FastQC Available from httpwwwbioinformaticsbbsrcacukprojectsfastqc

Ashburner M Ball CA Blake JA Botstein D Butler H Cherry JM DavisAP Dolinski K Dwight SS Eppig JT et al 2000 Gene ontology toolfor the unification of biology The Gene Ontology Consortium NatGenet 2525ndash29

Aytes A Mitrofanova A Lefebvre C Alvarez Mariano J Castillo-MartinM Zheng T Eastham James A Gopalan A Pienta Kenneth J ShenMM et al 2014 Cross-species regulatory network analysis identifiesa synergistic interaction between FOXM1 and CENPF that drivesprostate cancer malignancy Cancer Cell 25638ndash651

Bastide H Betancourt A Nolte V Tobler R Steuroobe P Futschik ASchleurootterer C 2013 A genome-wide fine-scale map of natural pig-mentation variation in Drosophila melanogaster PLoS Genet9e1003534

Bergland AO Behrman EL OrsquoBrien KR Schmidt PS Petrov DA 2014Genomic evidence of rapid and stable adaptive oscillations overseasonal time scales in Drosophila PLoS Genet 10e1004775

Bonebrake TC Mastrandrea MD 2010 Tolerance adaptation and pre-cipitation changes complicate latitudinal patterns of climate changeimpacts Proc Natl Acad Sci U S A 10712581ndash12586

Bradley TJ 2009 Animal osmoregulation Oxford Oxford UniversityPress

Burke MK Liti G Long AD 2014 Standing genetic variation drivesrepeatable experimental evolution in outcrossing populations ofSaccharomyces cerevisiae Mol Biol Evol 313228ndash3239

Chown SL 2012 Trait-based approaches to conservation physiologyforecasting environmental change risks from the bottom upPhilos Trans R Soc B 3671615ndash1627

Chown SL Nicolson SW 2004 Insect physiological ecology mechanismsand patterns Oxford Oxford University Press

Chown SL Soslashrensen JG Terblanche JS 2011 Water loss in insectsan environmental change perspective J Insect Physiol571070ndash1084

Not in GWAS In GWAS

Not in Telonis-Scott et al (2012) N ndash intersect-AndashB A-intersect

In Telonis-Scott et al (2012) B-intersect Intersect(AB)

13

Chung H Loehlin DW Dufour HD Vaccarro K Millar JG Carroll SB2014 A single gene affects both ecological divergence and matechoice in Drosophila Science 3431148ndash1151

Cingolani P Platts A Wang LL Coon M Nguyen T Wang L Land SJ LuX Ruden DM 2012 A program for annotating and predicting theeffects of single nucleotide polymorphisms SnpEff SNPs in thegenome of Drosophila melanogaster strain w1118 iso-2 iso-3 Fly(Austin) 680ndash92

Civelek M Lusis AJ 2014 Systems genetics approaches to understandcomplex traits Nat Rev Genet 1534ndash48

Clusella-Trullas S Blackburn TM Chown SL 2011 Climatic predictors oftemperature performance curve parameters in ectotherms implycomplex responses to climate change Am Nat 177738ndash751

Csardi G Nepusz T 2006 The igraph software package for complexnetwork research InterJournal Complex Systems 1695 Availablefrom httpigraphorg

Davies SA Cabrero P Overend G Aitchison L Sebastian S Terhzaz SDow JA 2014 Cell signalling mechanisms for insect stress toleranceJ Exp Biol 217119ndash128

Davies SA Cabrero P Povsic M Johnston NR Terhzaz S Dow JA 2013Signaling by Drosophila capa neuropeptides Gen Comp Endocrinol18860ndash66

Davies SA Overend G Sebastian S Cundall M Cabrero P Dow JATerhzaz S 2012 Immune and stress response lsquocross-talkrsquo in theDrosophila Malpighian tubule J Insect Physiol 58488ndash497

Day JP Dow JA Houslay MD Davies SA 2005 Cyclic nucleotide phos-phodiesterases in Drosophila melanogaster Biochem J 388333ndash342

de Nadal E Ammerer G Posas F 2011 Controlling gene expression inresponse to stress Nat Rev Genet 12833ndash845

DePristo MA Banks E Poplin R Garimella KV Maguire JR Hartl CPhilippakis AA del Angel G Rivas MA Hanna M et al 2011 Aframework for variation discovery and genotyping using next-gen-eration DNA sequencing data Nat Genet 43491ndash498

Djawdan M Chippindale AK Rose MR Bradley TJ 1998 Metabolic re-serves and evolved stress resistance in Drosophila melanogasterPhysiol Zool 71584ndash594

Durham MF Magwire MM Stone EA Leips J 2014 Genome-wide anal-ysis in Drosophila reveals age-specific effects of SNPs on fitness traitsNat Commun 54338

Emilsson V Thorleifsson G Zhang B Leonardson AS Zink F Zhu JCarlson S Helgason A Walters GB Gunnarsdottir S et al 2008Genetics of gene expression and its effect on disease Nature452423ndash428

Fabian DK Kapun M Nolte V Kofler R Schmidt PS Schleurootterer C FlattT 2012 Genome-wide patterns of latitudinal differentiation amongpopulations of Drosophila melanogaster from North America MolEcol 214748ndash4769

Foley BR Telonis-Scott M 2011 Quantitative genetic analysis suggestscausal association between cuticular hydrocarbon composition anddesiccation survival in Drosophila melanogaster Heredity 10668ndash77

Folk DG Han C Bradley TJ 2001 Water acquisition and partitioning inDrosophila melanogaster effects of selection for desiccation-resis-tance J Exp Biol 2043323ndash3331

Fowler MA Montell C 2013 Drosophila TRP channels and animal be-havior Life Sci 92394ndash403

Fu J Keurentjes JJB Bouwmeester H America T Verstappen FWA WardJL Beale MH de Vos RCH Dijkstra M Scheltema RA et al 2009System-wide molecular evidence for phenotypic buffering inArabidopsis Nat Genet 41166ndash167

Futschik A Schleurootterer C 2010 The next generation of molecular mar-kers from massively parallel sequencing of pooled DNA samplesGenetics 186207ndash218

Gibbs AG Fukuzato F Matzkin LM 2003 Evolution of water conserva-tion mechanisms in Drosophila J Exp Biol 2061183ndash1192

Gibbs AG Matzkin LM 2001 Evolution of water balance in the genusDrosophila J Exp Biol 2042331ndash2338

Goodstadt L 2010 Ruffus a lightweight Python library for computa-tional pipelines Bioinformatics 262778ndash2779

Hadley NF 1994 Water relations of terrestrial arthropods San Diego(CA) Academic Press

Hoffmann AA Hallas R Sinclair C Mitrovski P 2001 Levels of variationin stress resistance in Drosophila among strains local populationsand geographic regions patterns for desiccation starvation coldresistance and associated traits Evolution 551621ndash1630

Hoffmann AA Parsons PA 1989a An integrated approach to environ-mental stress tolerance and life-history variation desiccation toler-ance in Drosophila Biol J Linn Soc 37117ndash136

Hoffmann AA Parsons PA 1989b Selection for increased desiccationresistance in Drosophila melanogaster additive genetic control andcorrelated responses for other stresses Genetics 122837ndash845

Hoffmann AA Sgro CM 2011 Climate change and evolutionary adap-tation Nature 470479ndash485

Hoffmann AA Weeks AR 2007 Climatic selection on genes and traitsafter a 100 year-old invasion a critical look at the temperate-tropicalclines in Drosophila melanogaster from eastern Australia Genetica129133ndash147

Huang W Richards S Carbone MA Zhu D Anholt RR Ayroles JFDuncan L Jordan KW Lawrence F Magwire MM et al 2012Epistasis dominates the genetic architecture of Drosophila quanti-tative traits Proc Natl Acad Sci U S A 10915553ndash15559

Huang Z Tunnacliffe A 2004 Response of human cells to desiccationcomparison with hyperosmotic stress response J Physiol558181ndash191

Huang Z Tunnacliffe A 2005 Gene induction by desiccation stress inhuman cell cultures FEBS Lett 5794973ndash4977

Ichihashi Y Aguilar-Martınez JA Farhi M Chitwood DH Kumar RMillon LV Peng J Maloof JN Sinha NR 2014 Evolutionary develop-mental transcriptomics reveals a gene network module regulatinginterspecific diversity in plant leaf shape Proc Natl Acad Sci U S A111E2616ndashE2621

Itoh M Nanba N Hasegawa M Inomata N Kondo R Oshima MTakano-Shimizu T 2010 Seasonal changes in the long-distance link-age disequilibrium in Drosophila melanogaster J Hered 10126ndash32

Johnson WA Carder JW 2012 Drosophila nociceptors mediate larvalaversion to dry surface environments utilizing both the painless TRPchannel and the DEGENaC subunit PPK1 PLoS One 7e32878

Kahsai L Kapan N Dircksen H Winther AM Nassel DR 2010 Metabolicstress responses in Drosophila are modulated by brain neurosecre-tory cells that produce multiple neuropeptides PLoS One 5e11480

Kapun M van Schalkwyk H McAllister B Flatt T Schleurootterer C 2014Inference of chromosomal inversion dynamics from Pool-Seq datain natural and laboratory populations of Drosophila melanogasterMol Ecol 231813ndash1827

Kawano T Shimoda M Matsumoto H Ryuda M Tsuzuki S Hayakawa Y2010 Identification of a gene Desiccate contributing to desicca-tion resistance in Drosophila melanogaster J Biol Chem28538889ndash38897

Kellermann V van Heerwaarden B Sgro CM Hoffmann AA 2009Fundamental evolutionary limits in ecological traits driveDrosophila species distributions Science 3251244ndash1246

Kofler R Nolte V Schleurootterer C 2016 The impact of library preparationprotocols on the consistency of allele frequency estimates in Pool-Seq data Mol Ecol Resour 16118ndash122

Kofler R Pandey RV Schleurootterer C 2011 PoPoolation2 identifying dif-ferentiation between populations using sequencing of pooled DNAsamples (Pool-Seq) Bioinformatics 273435ndash3436

Kofler R Schleurootterer C 2012 Gowinda unbiased analysis of gene setenrichment for genome-wide association studies Bioinformatics282084ndash2085

Leiserson MDM Eldridge JV Ramachandran S Raphael BJ 2013Network analysis of GWAS data Curr Opin Genet Dev 23602ndash610

Li H Durbin R 2009 Fast and accurate short read alignment withBurrows-Wheeler transform Bioinformatics 251754ndash1760

Li H Handsaker B Wysoker A Fennell T Ruan J Homer N Marth GAbecasis G Durbin R 2009 The Sequence AlignmentMap formatand SAMtools Bioinformatics 252078ndash2079

14

Liu Y Luo J Carlsson MA Neuroassel DR 2015 Serotonin and insulin-likepeptides modulate leucokinin-producing neurons that affectfeeding and water homeostasis in Drosophila J Comp Neurol5231840ndash1863

Lohse M Bolger AM Nagel A Fernie AR Lunn JE Stitt M Usadel B 2012RobiNA a user-friendly integrated software solution for RNA-Seq-based transcriptomics Nucleic Acids Res 40W622ndashW627

Long A Liti G Luptak A Tenaillon O 2015 Elucidating the moleculararchitecture of adaptation via evolve and resequence experimentsNat Rev Genet 16567ndash582

Lynch M Bost D Wilson S Maruki T Harrison S 2014 Population-genetic inference from pooled-sequencing data Genome Biol Evol61210ndash1218

Mackay TFC Stone EA Ayroles JF 2009 The genetics of quantitativetraits challenges and prospects Nat Rev Genet 10565ndash577

Marron MT Markow TA Kain KJ Gibbs AG 2003 Effects of starvationand desiccation on energy metabolism in desert and mesicDrosophila J Insect Physiol 49261ndash270

Matzkin LM Markow TA 2009 Transcriptional regulation of metabo-lism associated with the increased desiccation resistance of thecactophilic Drosophila mojavensis Genetics 1821279ndash1288

Murali T Pacifico S Yu J Guest S Roberts GG Finley RL Jr 2011 DroID2011 a comprehensive integrated resource for protein transcrip-tion factor RNA and gene interactions for Drosophila Nucleic AcidsRes 39D736ndashD743

Nuzhdin SV Turner TL 2013 Promises and limitations of hitchhikingmapping Curr Opin Genet Dev 23694ndash699

Orozco-terWengel P Kapun M Nolte V Kofler R Flatt T Schleurootterer C2012 Adaptation of Drosophila to a novel laboratory environmentreveals temporally heterogeneous trajectories of selected alleles MolEcol 214931ndash4941

Picard Picard A set of Java command line tools for manipulating high-throughput seqeuncing (HTS) data and formats Available fromhttpbroadinstitutegithubiopicard

Qiu Y Tittiger C Wicker-Thomas C Le Goff G Young S Wajnberg EFricaux T Taquet N Blomquist GJ Feyereisen R 2012 An insect-specific P450 oxidative decarbonylase for cuticular hydrocarbon bio-synthesis Proc Natl Acad Sci U S A 10914858ndash14863

Rajpurohit S Oliveira CC Etges WJ Gibbs AG 2013 Functional genomicand phenotypic responses to desiccation in natural populations of adesert Drosophilid Mol Ecol 222698ndash2715

Sarup P Soslashrensen JG Kristensen TN Hoffmann AA Loeschcke V PaigeKN Soslashrensen P 2011 Candidate genes detected in transcriptomestudies are strongly dependent on genetic background PLoS One6e15644

Schiffer M Hangartner S Hoffmann AA 2013 Assessing the relativeimportance of environmental effects carry-over effects and speciesdifferences in thermal stress resistance a comparison ofDrosophilids across field and laboratory generations J Exp Biol2163790ndash3798

Schleurootterer C Kofler R Versace E Tobler R Franssen SU 2015Combining experimental evolution with next-generation sequen-cing A powerful tool to study adaptation from standing geneticvariation Heredity 114431ndash440

Sinclair BJ Gibbs AG Roberts SP 2007 Gene transcription during ex-posure to and recovery from cold and desiccation stress inDrosophila melanogaster Insect Mol Biol 16435ndash443

Sloggett C Wakefield M Philip G Pope B 2014 Rubramdashflexible distrib-uted pipelines figshare Available from httpdxdoiorg106084m9figshare895626

Smit AFA Hubley R Green P 2013-2015 RepeatMasker Open-403Available from httpwwwrepeatmaskerorg

Seurooderberg JA Birse RT Neuroassel DR 2011 Insulin production and signalingin renal tubules of Drosophila is under control of tachykinin-relatedpeptide and regulates stress resistance PLoS One 6e19866

Soslashrensen JG Nielsen MM Loeschcke V 2007 Gene expression profileanalysis of Drosophila melanogaster selected for resistance to envi-ronmental stressors J Evol Biol 201624ndash1636

St Pierre SE Ponting L Stefancsik R McQuilton P 2014 FlyBase102mdashadvanced approaches to interrogating FlyBase Nucleic AcidsRes 42D780ndashD788

Stevenson M Nunes T Heuser C Marshall J Sanchez J Thornton RReiczigel J Robison-Cox J Sebastiani P Solymos P et al 2015 epiRTools for the analysis of epidemiological data R package version09-62 Available from httpCRANR-projectorgpackagefrac14epiR

Storey JD Tibshirani R 2003 Statistical significance for genomewidestudies Proc Natl Acad Sci U S A 1009440ndash9445

Telonis-Scott M Gane M DeGaris S Sgro CM Hoffmann AA 2012 Highresolution mapping of candidate alleles for desiccation resistancein Drosophila melanogaster under selection Mol Biol Evol291335ndash1351

Telonis-Scott M Guthridge KM Hoffmann AA 2006 A new set oflaboratory-selected Drosophila melanogaster lines for the analysisof desiccation resistance response to selection physiology and cor-related responses J Exp Biol 2091837ndash1847

Terhzaz S Cabrero P Robben JH Radford JC Hudson BD Milligan GDow JA Davies SA 2012 Mechanism and function of Drosophilacapa GPCR a desiccation stress-responsive receptor with functionalhomology to human neuromedinU receptor PLoS One 7e29897

Terhzaz S Overend G Sebastian S Dow JA Davies SA 2014 The Dmelanogaster capa-1 neuropeptide activates renal NF-kB signalingPeptides 53218ndash224

Terhzaz S Teets NM Cabrero P Henderson L Ritchie MG Nachman RJDow JA Denlinger DL Davies SA 2015 Insect capa neuropeptidesimpact desiccation and cold tolerance Proc Natl Acad Sci U S A1122882ndash2887

Thorat LJ Gaikwad SM Nath BB 2012 Trehalose as an indicator ofdesiccation stress in Drosophila melanogaster larvae a potentialmarker of anhydrobiosis Biochem Biophys Res Commun419638ndash642

Tobler R Franssen SU Kofler R Orozco-terWengel P Nolte VHermisson J Schl eurootterer C 2014 Massive habitat-specific ge-nomic response in D melanogaster populations during exper-imental evolution in hot and cold environments Mol Biol Evol31364ndash375

Turner TL Bourne EC Von Wettberg EJ Hu TT Nuzhdin SV 2010Population resequencing reveals local adaptation of Arabidopsislyrata to serpentine soils Nat Genet 42260ndash263

Turner TL Miller PM 2012 Investigating natural variation in Drosophilacourtship song by the evolve and resequence approach Genetics191633ndash642

Vermeulen CJ Soslashrensen P Kirilova Gagalova K Loeschcke V 2013Transcriptomic analysis of inbreeding depression in cold-sensitiveDrosophila melanogaster shows upregulation of the immune re-sponse J Evol Biol 261890ndash1902

Wang B Goode J Best J Meltzer J Schilman PE Chen J Garza D ThomasJB Montminy M 2008 The insulin-regulated CREB coactivatorTORC promotes stress resistance in Drosophila Cell Metab7434ndash444

Wang J Kean L Yang J Allan AK Davies SA Herzyk P Dow JA 2004Function-informed transcriptome analysis of Drosophila renaltubule Genome Biol 5R69

Wang K Li M Hakonarson H 2010 Analysing biological pathways ingenome-wide association studies Nat Rev Genet 11843ndash854

Wang L Jia P Wolfinger RD Chen X Zhao Z 2011 Gene set analysis ofgenome-wide association studies methodological issues and per-spectives Genomics 981ndash8

Weber AL Khan GF Magwire MM Tabor CL Mackay TF Anholt RR2012 Genome-wide association analysis of oxidative stress resistancein Drosophila melanogaster PLoS One 7e34745

Weeks AR McKechnie SW Hoffmann AA 2002 Dissecting adaptiveclinal variation markers inversions and sizestress associations inDrosophila melanogaster from a central field population Ecol Lett5756ndash763

Zhu Y Bergland AO Gonzalez J Petrov DA 2012 Empirical validation ofpooled whole genome population re-sequencing in Drosophila mel-anogaster PLoS One 7e41901

15

Page 4: Cross-Study Comparison Reveals Common Genomic, Network, … · 2016-02-06 · resistance using a high-throughput sequencing Pool-GWAS (genome-wide association study) approach ( Bastide

the largest shifts tended to occur in common intermediatefrequencies (fig 1B)

Inversion Frequencies

To test whether inversions were associated with desiccationresistance allele frequencies were checked at SNPs diagnosticfor the following inversions In(3R)P (Anderson et al 2005Kapun et al 2014) In(2L)t (Andolfatto et al 1999 Kapun et al2014) In(2R)NS In(3L)P In(3R)C In(3R)K and In(3R)Mo(Kapun et al 2014) In(3L)P In(3R)K and In(3R)Mo werenot detected in this population and for all other inversionsinvestigated the control and desiccation-resistant pool fre-quencies did not differ more than 6

Genomic Distribution

We examined these candidates using a comprehensive ap-proach encompassing SNP gene functional ontology andnetwork levels First we examined SNP locations with respectto gene structure and putative effects of SNPs on genomefeatures The 648 SNPs mapped to 382 genes and were largelynoncoding (79 supplementary table S2 SupplementaryMaterial online) Coding variants comprised nearly 15 ofcandidate SNPs largely synonymous substitutions and theremainder nonsynonymous missense variants predicted toresult in amino acid substitution with length preservation(9 and 14 supplementary table S2 SupplementaryMaterial online) Based on proportions of features from allannotated SNPs compared with the candidates intron vari-ants were significantly underrepresented (X2

1 frac14706 Plt 001supplementary table S2 Supplementary Material online)while SNPs were enriched in 50UTRs 50UTR premature startcodon sites and splice regions (X2

1 frac14 820 Plt 001X2

1 frac14 3124 Plt 00001 X21 frac14 640 Plt 005 respectively

supplementary table S2 Supplementary Material online)

GO Functional Enrichment

For the GO analyses we found no overrepresented termsusing the stringent Gowinda gene mode but identified sev-eral categories using SNP mode with a 005 FDR thresholdGO terms were enriched for the broad categories of ldquochro-matinrdquo ldquochromosomerdquo and more specifically ldquodouble-stranded RNA bindingrdquo although significance of this categorywas due solely to nine differentiated SNPs in the DIP1 geneThe term ldquogamma-secretase complexrdquo was also enriched dueto three SNPs in the pen-2 gene

Desiccation Candidate Genes

Chown et al (2011) comprehensively summarized the pri-mary mechanisms underpinning desiccation resistancefrom the critical first phase of stress sensing and behavioralavoidance to the manifold physiological trajectories availableto counter water stress These include water content varia-tion water loss rates (desiccation resistance) and tolerance ofwater loss (dehydration tolerance) We expanded on thisreview to incorporate recent molecular research and summa-rized the primary mechanisms and known candidate genesrelevant to D melanogaster stress sensing and water balance

(table 1 and references therein) Informed by this body ofwork we applied a ldquotop-downrdquo mechanistic approach toscreening desiccation candidate genes in our data (egfrom initial hygrosensing to downstream key physiologicalpathways table 1) We did not observe differentiation inSNPs mapping to the TRP ion channels the key hygrosensingreceptor genes However we did map SNPs to genes involvedin stress sensing specific to the MT fluid secretion signalingpathways including two of the six cAMP and cyclic guanosinemonophosphate (cGMP) signaling phosphodiesterases dncand pde9 as well as klu indicated in NF-kB ortholog signalingin the renal tubules (table 1) A number of the stress-sensingSAPK pathway genes were differentiated (table 1) Key genesinvolved in the insulin signaling pathway implicated in met-abolic homeostasis and water balance were also differenti-ated including the insulin receptor InR and insulin receptorbinding Dok (table 1)

Network Analysis Modeling Genomic Differentiationfor Desiccation in a PPI Context

Given the genomic complexity underpinning desiccation re-sistance we also implemented a higher-order analysis to pro-vide an overview of potential architectures beyond the SNPand gene levels using PPI networks The SNPs mapped to 382genes which were annotated to 307 seed proteins occurring inthe full background D melanogaster PPI network The seedproteins produced a large first-order network with 3324nodes and 51299 edges

To determine if broader functional signatures could beascertained from the complex network we employed func-tional enrichment analyses on the 3324 nodes (table 2 andsupplementary table S3 Supplementary Material online)Strikingly the network was enriched for pathways involvedin gene expression from transcription through RNA process-ing to RNA transport metabolism and decay (table 2)The network analysis also revealed further complexity ofthe stress response with enrichment of proteins involved inregulating innate insect immunity (table 2) where multiplepathways were overrepresented including cytokine interleu-kin and toll-like receptor signaling cascades (table 2)Developmentalgene regulation signaling pathways were sig-nificant in the pathway analysis the KEGG database showedenrichment for Notch and Mitogen-activated protein kinase(MAPK) signaling while insulin and NGF pathways were high-lighted from the Reactome database (table 2)

Common Signatures of Desiccation Resistance acrossStudies

The key finding of this study is the degree of genomic overlapwith our previous mapping of alleles associated with artifi-cially selected desiccation phenotypes (Telonis-Scott et al2012) Although the flies were sampled from comparablesoutheastern Australian locations the study designs were dif-ferent Specifically the earlier study focused on resistancegenerated by artificial selection whereas this study focusedon natural variation in resistance Despite the differences instudy design we identified 45 genes common to both studies

4

Table 1 Summary of Primary Molecular Responses and Candidate Genes for Desiccation Response Mechanisms Identified in the Literature

Mechanism Molecular Function Genes Current Study(F1 wild-caught)

Telonis-Scott et al(2012) (F8 artificial

selection)

Hygrosensing Sensory moisture receptors (antennae legs gut mouthparts)

Temperature-activatedtransient receptor po-tential ionchannelsbcd

TRP ion channels trpl trp Trpg TrpA1 TrpmTrpml pain pyx wtrwnompC iav nan Amo

mdash trpl

Stress sensing Malpighian tubules (principal cells)

Tubule fluid secretion signaling pathways

cAMP signalingefg cAMP activation Dh44-R2 mdash mdash

cGMP signalingefg cGMP activation Capa CapaRcedilNplp1-4 mdash mdash

Receptors Gyc76c CG33958 CG34357 mdash CG34357

Kinases dg1-2 mdash mdash

cAMP + cGMPsignalingefgh

Hydrolyzing phosphodiesterases dnc pde1c pde6 pde9 pde11 dnc pde9 dnc pde1c pde9 pde11

Calcium signalingeg Calcium-sensitive nitric oxidesynthase

Nos mdash Nos

Desiccation specific Diuretic neuropeptide NFkBsignalingijk

Capa CapaR trpl Relish klu klu klu trpl

Anti-diureticneuropeptidel ITP sNPF Tk mdash ITP sNPF

Stress sensing Stress responsive pathway

Stress activatedprotein kinasepathwaymn

MAPKJun kinase

Aop Aplip1 Atf-2 bsk Btk29Acbt Cdc 42 Cka cno CYLDDok egr Gadd45 hep hppyJra kay medo Mkk4 msnMtl Pak puc pyd Rac 1-2Rho1 shark slpr Src42ASrc64B Tak1 Traf6

Aop Dokhepcedilkaymsn Mtlpuc Rac1Src64B

Mtl slpr

Metabolic homeostasis and water balance

Components of insu-lin signaling pathway

Receptorop InR InR mdash

Receptor binding chico IIp1-8 dock Dok poly Dok Poly

Resistance mechanisms Water loss barriers

Cuticularhydrocarbons

Fatty acid synthasesq desatura-sesr transporters oxidativedecarbonylaset

CG3523 CG3524 desat1-2 FatPCyp4g1

CG3523 mdash

Primary hemolymph sugartissue-protectant

Trehalose metabolism Trehalose-phosphatasebu treh-lase activitybu phosphoglyc-erate mutase

Tps1 Treh Pgm CG5171CG5177 CG6262 crc

Treh Pgm mdash

NOTEmdashCandidate genes differentiated in the current study and in Telonis-Scott et al (2012) are shown and are underlined where overlappingaFor brevity several comprehensive reviews are cited and readers are referred to references therein Note that water budgeting via discontinuous gas exchange was not includeddue to a lack of ldquobona fiderdquo molecular candidates and desiccation tolerance due to aquaporins was omitted as genes were not differentiated in either of our studiesbChown et al (2011)cJohnson and Carder (2012)dFowler and Montell (2013)eDay et al (2005)fDavies et al (2012)gDavies et al (2013)hDavies et al (2014)iTerhzaz et al (2012)jTerhzaz et al (2014)kTerhzaz et al (2015)lKahsai et al (2010)mHuang and Tunnacliffe (2004)nHuang and Tunnacliffe (2005)oSeurooderberg et al (2011)pLiu et al (2015)qChung et al (2014)rSinclair et al (2007)rFoley and Telonis-Scott (2011)tQiu et al (2012)uThorat et al (2012)

5

Conserved Genomic Signatures of Drosophila Desiccation Resistance doi101093molbevmsv349 MBE by guest on February 6 2016

httpmbeoxfordjournalsorg

Dow

nloaded from

out of the 382 and 416 genes for the current and 2012 studyrespectively (Fisherrsquos exact test for overlap P lt 2 1016odds ratio frac14 590 supplementary table S4 SupplementaryMaterial online) Bearing gene length bias in mind we exam-ined the average length of genes in both studies as well as thegene overlap to the average gene length in the Drosophilagenome (supplementary fig S3 Supplementary Materialonline) Although there was bias toward longer genes in theoverlap list this was also apparent in the full gene lists fromboth studies

The overlap genes mapped to all of the major chromo-somes and their differentiated variants were mostly noncod-ing SNPs located in introns (supplementary table S4Supplementary Material online) Two SNPs were synonymous(Strn-Mlick coding for a protein kinase and Cht7 chitinbinding supplementary table S4 Supplementary Materialonline) while a 30UTR variant was predicted to putativelyalter protein structure in a nondisruptive manner mostlikely impacting expression andor protein translational effi-ciency (jbug response to stimulus supplementary table S4Supplementary Material online) One nonsynonymous SNPwas predicted to alter protein function as a missense variant(fab1 signaling supplementary table S4 SupplementaryMaterial online) The increase in frequency of the favoreddesiccation alleles compared with the controls ranged from5 to 27 with a median increase of 164

Despite the strong overlap there were important differ-ences between the two studies For example our previouswork revealed evidence of a selective sweep in the 50 pro-moter region of the Dys gene in the artificially selected linesand we confirmed a higher frequency of the selected SNPs inan independent natural association study in resistant fliesfrom Coffs Harbour (Telonis-Scott et al 2012) In this studywe found no differentiation at the Dys locus between ourresistant and random controls However both resistant andcontrol pools had a high frequency of the selected allele de-tected in Telonis-Scott et al (2012) (supplementary table S5Supplementary Material online)

The 45 genes common to both studies were analyzed forbiological function while many genes are obviously pleiotro-pic (ie assigned multiple GO terms) almost half were in-volved in response to stimuli (cellular behavioral andregulatory) including signaling such as Rho protein signaltransduction and cAMP-mediated signaling (table 3) Othercategories of interest include spiracle morphogenesis sensoryorgan development including development of the compoundeye and stem cell and neuroblast differentiation (table 3)

With regard to specific stress responsedesiccation candi-dates (table 1) 4 of the 45 genes overlapped including thecAMPcGMP signaling phosphodiesterases pde9 and dncMAPK pathway genes Mtl and klu indicated in NF-kBorthologue signaling in the renal tubules (table 1)Furthermore there was overlap among mechanistic catego-ries where different genes in the same pathways weredetected For example additional key cAMPcGMP hydrolyz-ing phosphodiesterases pde1c and pde11 were significant inTelonis-Scott et al (2012) as well as slpr (MAPK pathway)and poly (insulin signaling) (table 1)

Table 2 Pathway Enrichment of the GWAS and Telonis-Scott et alFirst-Order PPI Networks Investigated Using FlyMine (FDR P lt 005)

Current Study GWAS Telonis-Scott et al(2012)

Database Pathway Pathway

KEGG Ribosome RibosomeNotch signaling pathway Notch signaling pathwayMAPK signaling pathwayDorso-ventral axis formationProgesterone-mediated oocyte

maturationWnt signaling pathwayEndocytosisProteasome

Reactome Immune system Immune systemAdaptive immune system Adaptive immune systemInnate immune system Innate immune systemSignaling by interleukins Signaling by interleukins(Toll-like receptor cascades) Toll-like receptor

cascadesTCR signaling(B cell receptor signaling)

Gene expression Gene expressionMetabolism of RNA Metabolism of RNANonsense-mediated decay Nonsense-mediated decayTranscriptionmRNA processing(RNA transport)Metabolism of proteins (Metabolism of

proteins)Translation (Translation)Cell cycle Cell cycle

Cell cycle checkpointsRegulation of mitotic cell

cycleMitotic metaphase and

anaphaseG0 and early G1 G0 and early G1

G1 phase(G1S transition)G2M transition(DNA replicationS

phase)DNA repairDouble-strand break

repairSignaling pathways Signaling pathwaysInsulin signaling pathwaySignaling by NGF Signaling by NGF

Signaling by FGFRSignaling by NOTCHSignaling by WntSignaling by EGFRDevelopmentAxon guidanceMembrane traffickingGap junction trafficking

and regulationProgramed cell deathApoptosisHemostasisHemostasis

NOTEmdashKEGG and Reactome pathway databases were queried All significant termsare reported with Reactome terms grouped together under parent terms in italictext (full list of terms available in supplementary table S3 Supplementary Materialonline) Parent terms without brackets are the names of pathways that were sig-nificantly enriched parent terms in parentheses were not listed as significantlyenriched themselves but are reported as a guide to the category contents

6

In addition to the significant gene-level overlap betweenthe two studies we observed significant overlap between thePPI networks constructed from each set of candidate genes(summarized in fig 2) We examined the overlap by compar-ing the first-order networks 1) at the level of node overlap and2) at the level of edge (interaction) overlap Three hundredseven GWAS seeds formed a network of 3324 nodes whileour previous set of candidates mapped to 337 seeds forming anetwork that comprised 3215 nodes One thousand eighthundred twenty-six or ~55 of the nodes overlapped be-tween networks which was significantly higher than the nulldistribution estimated from 1000 simulations (mean simu-lated overlap SD 1304 143 nodes observed overlap atthe 999th percentile fig 2A and C) The overlap at the inter-action level was not significantly higher than expected (meansimulated overlap SD 25704 4024 edges observed over-lap 30686 edges 89th percentile fig 2B and D) Hive plotswere used to visualize the separate networks (supplementaryfig S4A and B Supplementary Material online) as well as theoverlap between the networks (supplementary fig S4CSupplementary Material online)

Given the high degree of shared nodes from distinct setsof protein seeds we compared the networks for biologicalfunction (fig 2E) Both networks produced long lists of sig-nificantly enriched GO terms (supplementary table S3Supplementary Material online) and assessing overlap was

difficult due to the nonindependent nature of GO hierar-chies However a comparison of functional pathway enrich-ment revealed some differences Although both networksshowed enrichment in the immune system pathways theribosome and Notch and NGF signaling pathways as well assome involvement of gene expression and translation path-ways the network constructed from the Telonis-Scott et al(2012) gene list showed extensive involvement of cell cyclepathway proteins as well as significant enrichment for Wntsignaling EGRF signaling DNA double-strand break repairaxon guidance gap junction trafficking and apoptosis path-ways which were not enriched in the GWAS network TheGWAS network was specifically enriched for dorso-ventralaxis formation oocyte maturation and many more geneexpression pathways than the Telonis-Scott et al (2012)gene network (table 2 and supplementary table S3Supplementary Material online)

Discussion

Genomic Signature of Desiccation Resistance fromNatural Standing Genetic Variation

We report the first large-scale genome-wide screen for nat-ural alleles associated with survival of low humidity condi-tions in D melanogaster By combining a powerful naturalassociation study with high-throughput Pool-seq we

Table 3 Overrepresentation of GO Categories for the 45 Core Candidate Genes Overlapping between the Current and Our Previous(Telonis-Scott et al 2012) Genomic Study

Term ID TermSubterms FDR P valuea Genes

Cellularbehavioral response to stimulus defense and signaling

GO0050896 Response to stimulus (cellular response tostimulus regulation of response tostimulus)

0007 Abd-B dnc fz mam nkd slo ct fru kluRbf klg santa-maria jbug fab1CG33275 Mtl Trim9 cv-c Gprk2 Ect4CG43658

GO0048585 Negative regulation of response to stimu-lus (Rho protein signal transductionsignal transduction)

0042 dnc mam slo fru Trim9 cv-c

GO0023052 Signaling (cAMP-mediated signaling cellcommunication)

0031 Amph Pde9 dnc fz mam nkd ct kluRbf santa-maria fab1 CG33275 MtlTrim9 cv-c Gprk2 Ect4 CG43658

Development

GO0048863 Stem cell differentiation 0014 Abd-D mam nkd klu nub

GO0014016 Neuroblast differentiation 0048

GO0045165 Cell fate commitment 0010 Abd-D dnc fz mam nkd ct klu klg nub

GO0035277 Spiracle morphogenesis open trachealsystem

0010 Abd-D ct cv-c

GO0007423 Sensory organ development (eye wingdisc imaginal disc)

0008 fz mam ct klu klg sdk Amph Nckx30cMtl

GO0048736 Appendage development 0020 fz mam ct fru CG33275 Mtl Fili nubcv-c Gprk2 CG43658

GO0007552 Metamorphosis 0001 fz mam ct fru CG33275 Mtl Fili cv-cGprk2 CG43658

Cellular component organization

GO0030030 Cell projection organization 0020 dnc fz ct fru GluClalpha Nckx30c MtlTrim9 nub cv-c

NOTEmdashNote that most genes are assigned to multiple annotations and therefore are shown purely as a guide to putative functionaBonferronindashHochberg FDR and gene length corrected P values using the Flymine v400 database Gene Ontology Enrichment The terms were condensed and trimmed usingREVIGO

7

mapped SNPs associated with desiccation resistance in 500naturally derived genotypes Although Pool-seq has limitedpower to reveal very low frequency alleles to reconstructphase or adequately estimate allele frequencies in smallpools (Lynch et al 2014) the Pool-GWAS approach hasbeen validated by retrieval of known candidate genes in-volved in body pigmentation using similar numbers of ge-notypes as assessed here (Bastide et al 2013)

Although studies in controlled genetic backgrounds reportlarge effects of single genes on desiccation resistance (Table 1)our current and previous screens revealed similarly largenumbers of variants (over 600) mapping to over 300 genesA polygenic basis of desiccation resistance is consistent withquantitative genetic data for the trait (Foley and Telonis-Scott2011) and we also anticipate more complexity from naturalD melanogaster populations with greater standing geneticvariation than inbred laboratory strains However somefalse positive associations are likely in the GWAS due to thelimitations of quantitative trait mapping approaches Pooledevolve-and-resequence experiments routinely yield vast num-bers of candidate SNPs partly due to high levels of standinglinkage disequilibrium (LD) and hitchhiking neutral alleles

(Schleurootterer et al 2015) Large population sizes from specieswith low levels of LD such as D melanogaster somewhat cir-cumvent this issue and a notable feature of our experimentaldesign is our sampling Here we selected the most desicca-tion-resistant phenotypes directly from substantial standinggenetic variation where one generation of recombinationshould largely reflect natural haplotypes

We observed the largest candidate allele frequency differ-ences in loci that had intermediate allele frequencies in thecontrol pool This is in line with population genetics modelswhich predict that evolution from standing genetic variationwill initially be strongest for intermediate frequency alleleswhich can facilitate rapid change (Long et al 2015) Howeverwe also observed a number of low frequency alleles which ifbeneficial can inflate false positives due to long-range LD(Orozco-terWengel et al 2012 Nuzhdin and Turner 2013Tobler et al 2014) Nonetheless spurious LD associationscannot fully account for our candidate list as evidenced bythe desiccation functional candidates and cross-study over-lap Furthermore chromosome inversion dynamics can alsogenerate both short and long-range LD and contribute nu-merous false positive SNP phenotype associations (Hoffmann

Fig 2 Observed versus simulated network overlap between the GWAS first-order PPI network and the first-order PPI network from the Telonis-Scottet al (2012) gene list Overlap is presented in terms of (A) node number and (B) edge number Networks were simulated by resampling from the entireDrosophila melanogaster gene list (r553) as described in the text The histogram shows the distribution of overlap measures for 1000 simulations Theblack line represents the histogram as density and the blue line shows the corresponding normal distribution The red vertical line shows the observedoverlap from the real networks labeled as a percentile of the normal distribution Area-proportional Venn diagrams summarized the extent of overlapfor PPI networks constructed separately from the GWAS candidates and Telonis-Scott et al (2012) candidates (labeled ldquoprevious studyrdquo) (C) Overlapexpressed as the number of nodes Approximately 55 of the nodes overlapped between the two studies which was significantly higher than simulatedexpectations (999 percentile 2B) (D) Overlap expressed in the number of interactions This was not significantly higher than expected by simulation(89th percentile 2B) (E) Overlap at the level of GO biological process terms (directly compared GO terms by name this was not tested statisticallybecause of the complex hierarchical nature of GO terms and is presented for interest)

8

and Weeks 2007 Tobler et al 2014) but we found little as-sociation between the desiccation pool and inversion fre-quency changes consistent with the lack of latitudinalpatterns in desiccation resistanceand traitinversion fre-quency associations in east Australian D melanogaster(Hoffmann et al 2001 Weeks et al 2002)

Cross-Study Comparison Reveals a Core Set ofCandidates for a Complex Trait

Our data contain many candidate genes functions and path-ways for insect desiccation resistance but partitioning genu-ine adaptive loci from experimental noise remains asignificant challenge We therefore focus our efforts on ournovel comparative analysis where we identified 45 robustcandidates for further exploration Common genetic signa-tures are rarely documented in cross-study comparisons ofcomplex traits (Sarup et al 2011 Huang et al 2012 but cfVermeulen et al 2013) and never before for desiccation re-sistance Some degree of overlap between our studies likelyresults from a shared genetic background from similar sam-pling sites highlighting the replicability of these responsesOne limitation of this approach is the overpopulation ofthe ldquocorerdquo list by longer genes This result reflects the genelength bias inherent to GWAS approaches an issue that is notstraightforward to correct in a GWAS let alone in the com-parative framework applied here A focus on the commongenes functionally linked to the desiccation phenotype canprovide a meaningful framework for future research and alsosuggests that some large-effect candidate genes exhibit natu-ral variation contributing to the phenotype (table 1)

Other feasible candidates include genes expressed inthe MTs (Wang et al 2004) or essential for MT develop-ment (St Pierre et al 2014) providing further researchopportunity focused on the primary stress sensing andfluid secretion tissues Genes include one of the mosthighly expressed MT transcripts CG7084 as well as dncNckx30C slo cv-c and nkd Other aspects of stress resis-tance are also implicated CrebB for example is involvedin regulating insulin-regulated stress resistance and ther-mosensory behavior (Wang et al 2008) nkd also impactslarval cuticle development and shares a PDZ signalingdomain also present in the desiccation candidate desi(Kawano et al 2010 St Pierre et al 2014) Finally severalcore candidates were consistently reported in recent stud-ies suggesting generalized roles in stress responses andfitness including transcriptome studies on MT function(Wang et al 2004) inbreeding and cold resistance(Vermeulen et al 2013) oxidative stress (Weber et al2012) and genomic studies exploring age-specific SNPson fitness traits (Durham et al 2014) and genes likelyunder spatial selection in flies from climatically divergenthabitats (Fabian et al 2012)

Several of the enriched GO categories from the core listalso make biological sense including stimulus response(almost half the suite) defense and signaling particularlyin pathways involved in MT stress sensing (Davies et al 20122013 2014) Functional categories enriched for sensory

organ development were highlighted consistent with envi-ronmental sensing categories from desert Drosophila tran-scriptomics (Matzkin and Markow 2009 Rajpurohit et al2013) and stress detection via phototransduction pathwaysin D melanogaster under a range of stressors including des-iccation (Soslashrensen et al 2007)

Evidence for Conserved Signatures at Higher LevelOrganization

Beyond the gene level we observed a striking degree ofsimilarity between this study and that of Telonis-Scottet al (2012) at the network level where significantly over-lapping protein networks were constructed from largely dif-ferent seed protein sets This supports a scenario where theldquobiological information flow from DNA to phenotyperdquo(Civelek and Lusis 2014) contains inherent redundancywhere alternative genetic solutions underlie phenotypesand functions Resistance to perturbation by genetic or en-vironmental variation appears to increase with increasinghierarchical complexity that is phenotype ldquobufferingrdquo (Fuet al 2009) The same functional network in different pop-ulationsgenetic backgrounds can be impacted but theexact genes and SNPs responding to perturbation will notnecessarily overlap In Drosophila different sets of loci fromthe same founders mapped to the same PPI network in thecase of two complex traits chill coma recovery and startleresponse (Huang et al 2012) Furthermore interspecificshared networkspathways from different genes have beendocumented for numerous complex traits (Emilsson et al2008 Aytes et al 2014 Ichihashi et al 2014) providing op-portunities for investigating taxa where individual genefunction is poorly understood

Although the null distribution is not always obviouswhen comparing networks and their functions partly be-cause of reduced sensitivity from missing interactions(Leiserson et al 2013) here the shared network signaturesportray a plausible multifaceted stress response involvingstress sensing rapid gene accessibility and expression Thepredominance of immune system and stress responsepathways are of interest with reference immunitystresspathway ldquocross-talkrdquo particularly in the MTs whereabiotic stressors elicit strong immunity transcriptionalresponses (Davies et al 2012) The higher order architec-tures reflect crucial aspects of rapid gene expression con-trol in response to stress from chromatin organization(Alexander and Lomvardas 2014) to transcription RNAdecay and translation (reviwed in de Nadal et al 2011) InDrosophila variation in transcriptional regulation of indi-vidual genes can underpin divergent desiccation pheno-types (Chung et al 2014) or be functionally inferred frompatterns of many genes elicited during desiccation stress(Rajpurohit et al 2013) Variants with potential to mod-ulate expression at different stages of gene expressioncontrol were evident in our GWAS data alone for exam-ple enrichment for 50UTR and splice variants at the SNPlevel and for chromatin and chromosome structure termsat the functional level

9

Genetic Overlap Versus Genetic differencesmdashCausesand Implications

Despite the common signatures at the gene and higher orderlevels between our studies pervasive differences were ob-served These can arise from differences in standing geneticvariation in the natural populations where different alleliccompositions impact gene-by-environment interactions andepistasis Our network overlap analyses highlight the potentialfor different molecular trajectories to result in similar func-tions (Leiserson et al 2013) consistent with the multiplephysiological pathways potentially available to D melanoga-ster under water stress (Chown et al 2011) Experimentaldesign also presumably contributed to the study differencesStrong directional selection as applied in the 2012 study canincrease the frequency of rare beneficial alleles and likely re-duced overall diversity than did the single-generation GWASdesign Furthermore selection regimes often impose addi-tional selective pressures (eg food deprivation during desic-cation) resulting in correlated responses such as starvationresistance with desiccation resistance and these factors differfrom the laboratory to the field Finally evidence suggests innatural D melanogaster that standing genetic variation canvary extensively with sampling season (Itoh et al 2010Bergland et al 2014) Although we cannot compare levelsof standing variation between our two studies evidence sug-gests that this may have affected the Dys gene where ourGWAS approach revealed no differentiation between thecontrol and resistant pools at this locus but rather detectedthe alleles associated with the selective sweep in Telonis-Scottet al (2012) in both pools at high frequency Whether thisresulted from sampling the later collection after a longdrought (2004ndash2008 wwwbomgovauclimate last accessedJanuary 13 2016) is unclear but this observation highlightsthe relevance of temporal sampling to standing genetic var-iation when attempting to link complex phenotypes togenotypes

Materials and Methods

Natural Population Sampling

Drosophila melanogaster was collected from vineyard waste atKilchorn Winery Romsey Australia in April 2010 Over 1200isofemale lines were established in the laboratory from wild-caught females Species identification was conducted on maleF1 to remove Drosophila simulans contamination and over900 D melanogaster isofemale lines were retained All flieswere maintained in vials of dextrose cornmeal media at25 C and constant light

Desiccation Assay

From each isofemale line ten inseminated F1 females weresorted by aspiration without CO2 and held in vials until 4ndash5days old For the desiccation assay 10 females per line (over9000 flies in total) were screened by placing groups of femalesinto empty vials covered with gauze and randomly assigninglines into 5 large sealed glass tanks containing trays of silicadesiccant (relative humidity [RH] lt10) The tanks werescored for mortality hourly and the final 558 surviving

individuals were selected for the desiccation-resistant pool(558 flies lt6 of the total) The control pool constitutedthe same number of females sampled from over 200 ran-domly chosen isofemale lines that were frozen prior to thedesiccation assay We chose this random control approachinstead of comparing the phenotypic extremes (ie the latemortality tail vs the ldquoearly mortalityrdquo tail) because unfavor-able environmental conditions experienced by the wild-caught dam may inflate environmental variance due to car-ryover effects in the F1 offspring (Schiffer et al 2013)

DNA Extraction and Sequencing

We used pooled whole-genome sequencing (Pool-seq)(Futschik and Schleurootterer 2010) to estimate and compareallele frequency differences between the most desiccation-resistant and randomly sampled flies from a wild populationto associate naturally segregating candidate SNPs with theresponse to low humidity For each treatment (resistantand control) genomic DNA was extracted from 50 femaleheads using the Qiagen DNeasy Blood and Tissue Kit accord-ing to the manufacturerrsquos instructions (Qiagen HildenGermany) Two extractions were combined to create fivepools per treatment which were barcoded separately tocreate technical replicates (total 10 pools of 100 fliesn frac14 1000) The DNA was fragmented using a CovarisS2 ma-chine (Covaris Inc Woburn MA) and libraries were pre-pared from 1 mg genomic DNA (per pool of 100 flies) usingthe Illumina TruSeq Prep module (Illumina San Diego CA)To minimize the effect of variation across lanes the ten poolswere combined in equal concentration (fig 3) and run mul-tiplexed in each lane of a full flow cell of an Illumina HiSeq2000 sequencer Clusters were generated using the TruSeq PECluster Kit v5 on a cBot and sequenced using Illumina TruSeqSBS v3 chemistry (Illumina) Fragmentation library construc-tion and sequencing were performed at the Micromon Next-Gen Sequencing facility (Monash University ClaytonAustralia)

Fig 3 Experimental design See text for further explanation

10

Data Processing and Mapping

Processing was performed on a high-performance computingcluster using the Rubra pipeline system that makes use of theRuffus Python library (Goodstadt 2010) The final variant-calling pipeline was based on a pipeline developed by ClareSloggett (Sloggett et al 2014) and is publicly available athttpsgithubcomgriffinpGWAS_pipeline (last accessedJanuary 13 2016) Raw sequence reads were processed persample per lane with P1 and P2 read files processed sepa-rately initially Trimming was performed with trimmomatic v030 (Lohse et al 2012) Adapter sequences were also re-moved Leading and trailing bases with a quality scorebelow 30 were trimmed and a sliding window (width frac14 10quality threshold frac14 25) was used Reads shorter than 40 bpafter trimming were discarded Quality before and after trim-ming was examined using FastQC v 0101 (Andrews 2014)

Posttrimming reads that remained paired were used asinput into the alignment step Alignment to the D melano-gaster reference genome version r553 was performed withbwa v 062 (Li and Durbin 2009) using the bwa aln commandand the following options No seeding (-l 150) 1 rate ofmissing alignments given 2 uniform base error rate (-n001) a maximum of 2 gap opens per read (-o 2) a maximumof 12 gap extensions per read (-e 12) and long deletionswithin 12 bp of 30 end disallowed (-d 12) The default valueswere used for all other options Output files were then con-verted to sorted bam files using the bwa sampe commandand the ldquoSortSamrdquo option in Picard v 196

At this point the bam files were merged into one file persample using PicardMerge Duplicates were identified withPicard MarkDuplicates The tool RealignerTargetCreator inGATK v 26-5 (DePristo et al 2011) was applied to the tenbam files simultaneously producing a list of intervals contain-ing probable indels around which local realignment would beperformed This was used as input for the IndelRealigner toolwhich was also applied to all ten bam files simultaneously

To investigate the variation among technical replicatesallele frequency was estimated based on read counts foreach technical replicate separately for 100000 randomly cho-sen SNPs across the genome After excluding candidate des-iccation-resistance SNPs and SNPs that may have been falsepositives due to sequencing error (those with mean frequencylt004 across replicates) 1803 SNPs remained For each locuswe calculated the variance in allele frequency estimate amongthe five control replicates and the variance among the fivedesiccation-resistance replicates We compared this with thevariance due to pool category (control vs desiccation resis-tant) using analysis of variance (ANOVA) We also calculatedthe mean pairwise difference in allele frequency estimateamong control and among desiccation-resistance replicatesand the mean concordance correlation coefficient was calcu-lated over all pairwise comparisons within pool category(using the epiccc function in the epiR package Stevensonet al 2015)

The following approach was then taken to identify SNPsdiffering between the desiccation-resistant and controlgroups Based on the technical replicate analysis we merged

the five ldquodesiccation-resistantrdquo and the five random controlsamples into two bam files A pileup file was created withsamtools v 0119 (Li et al 2009) and converted to a ldquosyncrdquo fileusing the mpileup2sync script in Popoolation2 v 1201 (Kofleret al 2011) retaining reads with a mapping quality of at least20 and bases with a minimum quality score of 20 Reads withunmapped mates were also retained (option ndashA in samtoolsmpileup) For this step we used a repeat-masked version ofthe reference genome created with RepeatMasker 403 (Smitet al 2013-2015) and a repeat file containing all annotatedtransposons from D melanogaster including shared ancestralsequences (Flybase release r557) plus all repetitive elementsfrom RepBase release v1904 for D melanogaster Simple re-peats small RNA genes and bacterial insertions were notmasked (options ndashnolow ndashnorna ndashno_is) The rmblastsearch engine was used

Association Analysis

We then performed Fisherrsquos exact test with the Fisher testscript in Popoolation2 (Kofler et al 2011) We set the min-imum count (of the minor allele) frac14 10 minimum coverage(in both populations) frac14 20 and maximum coverage frac1410000 The test was performed for each SNP (slidingwindow off) Although there are numerous approaches tocalculating genome-wide significance thresholds in aPoolseq framework currently there is no consensus on anappropriate standardized statistical method Methods in-clude simulation approaches (Orozco-terWengel et al2012 Bastide et al 2013 Burke et al 2014) correctionusing a null distribution based on estimating genetic driftfrom replicated artificial selection line allele frequency vari-ance (Turner and Miller 2012) use of a simple thresholdbased on a minimum allele frequency difference (Turneret al 2010) and the FDR correction suggested by Storeyand Tibshirani (2003) (used by Fabian et al 2012)

We determined based on our design to calculate the FDRthreshold by comparison with a simulated distribution of nullP values following the approach of Bastide et al (2013) Foreach SNP locus a P value was calculated by simulating thedistribution of read counts between the major and minorallele according to a beta-binomial distribution with meana(a+b) and variance (ab)((a+b)2(a+b+1)) Thismodel allowed for two types of sampling variationStochastic variation in allele sampling across the phenotypesand the background variation in the representation of allelesin each pool caused by library preparation artifacts or sto-chastic variation driven by unequal coverage of the poolsThese sources of variation have also been recognized else-where as being important in interpreting pooled sequenceallele calls and identifying significant differentiation betweenpooled samples (Lynch et al 2014) The value of afrac14 37 waschosen by chi-square test as the best match to the observedP value distribution (afrac14 10 to afrac14 40 were tested) with acorresponding bfrac14 4315 to reflect the coverage depth differ-ence between the random control (487) and desiccation-resistant (568) pools These values were then used tosimulate a null data set 10 the size of the observed data

11

set Because the null P value distribution still did not fit theobserved P value distribution particularly well (supplemen-tary fig S2 Supplementary Material online) these values werenot used for a standard FDR correction Instead a P valuethreshold was calculated based on the lowest 005 of thenull P values similar in approach to Orozco-terWengel et al(2012)

Inversion Analysis

In D melanogaster several cosmopolitan inversion polymor-phisms show cross-continent latitudinal patterns that explaina substantial proportion of clinal variation for many traitsincluding thermal tolerance (reviewed in Hoffmann andWeeks 2007) Patterns of disequilibria generated betweenloci in the vicinity of the rearrangement can obscure signalsof selection (Hoffmann and Weeks 2007) and inversion fre-quencies are routinely considered in genomic studies ofnatural populations sampled from known latitudinal clines(Fabian et al 2012 Kapun et al 2014) We assessedassociations between inversion frequencies and our controland resistant pools at SNPs diagnostic for the following in-versions In3R(Payne) (Anderson et al 2005 Kapun et al2014) In(2L)t (Andolfatto et al 1999 Kapun et al 2014)In(2R)Ns In(3L)P In(3R)C In(3R)K and In(3R)Mo (Kapunet al 2014) Between one and five diagnostic SNPs were in-vestigated for each inversion (depending on availability) Weconsidered an inversion to be associated with desiccationresistance if the frequency of its diagnostic SNP differed sig-nificantly between the control and desiccation-resistant poolsby a Fisherrsquos exact test

Genome Feature Annotation

The list of candidate SNPs was converted to vcf format using acustom R script setting the alternate allele as the allele thatincreased in frequency from the control to the desiccation-resistant pool and setting the reference allele as the otherallele present in cases where both the major and minor allelesdiffered from the reference genome The genomic features inwhich the candidate SNPs occurred were annotated usingsnpEff v40 first creating a database from the D melanogasterv553 genome annotation with the default approach(Cingolani et al 2012) The default annotation settings wereused except for setting the parameter ldquo-ud 0rdquo to annotateSNPs to genes only if they were within the gene body Using2 tests on SNP counts we tested for over- or under-enrichment of feature types by calculating the proportionof lsquoallrsquo features from all SNPs detected in the study and theproportion of ldquocandidaterdquo SNPs identified as differentiated indesiccation-resistant flies (Fabian et al 2012)

GO Analysis

GO analyses was used to associate the candidates with theirbiological functions (Ashburner et al 2000) Previously devel-oped for biological pathway analysis of global gene expressionstudies (Wang et al 2010) typical GO analysis assumes that allgenes are sampled independently with equal probabilityGWAS violate these criteria where SNPs are more likely to

occur in longer genes than in shorter genes resulting in highertype I error rates in GO categories overpopulated with longgenes Gene length as well as overlapping gene biases (Wanget al 2011) were accounted for using Gowinda (Kofler andSchleurootterer 2012) which implements a permutation strategyto reduce false positives Gowinda does not reconstruct LDbetween SNPs but operates in two modes underpinned bytwo extreme assumptions 1) Gene mode (assumes all SNPs ina gene are in LD) and 2) SNP mode (assumes all SNPs in agene are completely independent)

The candidates were examined in both modes but giventhe stringency of gene mode high-resolution SNP modeoutput was utilized for the final analysis with the followingparameters 106 simulations minimum significance frac141 andminimum genes per GO categoryfrac14 3 (to exclude gene-poorcategories) P values were corrected using the Gowinda FDRalgorithm GO annotations from Flybase r557 were applied

Candidate Gene Analysis

In conjunction with the GO analysis we also examinedknown candidate genes associated with survival under lowRH to better characterize possible biological functions of ourGWAS candidates We took a conservative approach to can-didate gene assignment where the SNPs were mapped withinthe entire gene region without up- and downstream exten-sions Genes were curated from FlyBase and the literature(table 1) predominantly using detailed functional experimen-tal evidence as criteria for functional desiccation candidates

Network Analysis

We next obtained a higher level summary of the candidategene list using PPI network analysis The full candidate genelist was used to construct a first-order interaction networkusing all Drosophila PPIs listed in the Drosophila InteractionDatabase v 2014_10 (avoiding interologs) (Murali et al 2011)This ldquofullrdquo PPI network includes manually curated data frompublished literature and experimental data for 9633 proteinsand 93799 interactions For the network construction theigraph R package was used to build a subgraph from the fullPPI network containing the candidate proteins and their first-order interacting proteins (Csardi and Nepusz 2006) Self-connections were removed Functional enrichment analysiswas conducted on the list of network genes using FlyMinev400 (wwwflymineorg last accessed January 13 2016) withthe following parameters for GO enrichment and KEGGReactome pathway enrichment BenjaminindashHochberg FDRcorrection P lt 005 background frac14 all D melanogastergenes in our full PPI network and Ontology (for GO enrich-ment only) frac14 biological process or molecular function

Cross-Study Comparison Overlap with Our PreviousWork

Following on from the comparison between our gene list andthe candidate genes suggested from the literature we quan-titatively evaluated the degree of overlap between the candi-date genes identified in this study with alleles mapped in ourprevious study of flies collected from the same geographic

12

region (Telonis-Scott et al 2012) The ldquooverlaprdquo analysis wasperformed similarly for the GWAS candidate gene and net-work-level approaches First the gene list overlap was testedbetween the two studies using Fisherrsquos exact test where the382 genes identified from the candidate SNPs (this study)were compared with 416 genes identified from candidatesingle feature polymorphisms (Telonis-Scott et al 2012)The contingency table was constructed using the numberof annotated genes in the D melanogaster R553 genomebuilt as an approximation for the total number of genestested in each study and took the following form

where A frac14 gene list from Telonis-Scott et al (2012)n frac14 416 B frac14 gene list from this study n frac14 382 N frac14 totalnumber of genes in D melanogaster reference genome r553n frac14 17106

To annotate genome features in the overlap candidategene list we used the SNP data (this study) which betterresolves alleles than array-based genotyping Due to this lim-itation we did not compare exact genomic coordinates be-tween the studies but considered overlap to occur whendifferentiated alleles were detected in the same gene inboth studies We did however examine the allele frequenciesaround the Dys-RF promoter which was sequenced sepa-rately revealing a selective sweep in the experimental evolu-tion lines (Telonis-Scott et al 2012)

The overlapping candidates were tested for functional en-richment using a basic gene list approach As genes ratherthan SNPs were analyzed here Fly Mine v400 (wwwflymineorg) was applied with the following parameters Gene lengthnormalization BenjaminindashHochberg FDR correction Plt 005Ontology frac14 biological process and default background (allGO-annotated D melanogaster genes)

Finally we constructed a second PPI network from thecandidate genes mapped in Telonis-Scott et al (2012) to ex-amine system-level overlap between the two studies The 416genes mapped to 337 protein seeds and the network wasconstructed as described above The degree of network over-lap between the two studies was tested using simulation Ineach iteration a gene set of the same length as the list fromthis study and a gene set of the same length as the list fromTelonis-Scott et al (2012) were resampled randomly from theentire D melanogaster mapped gene annotation (r553) Eachresampled set was then used to build a first-order network aspreviously described The overlap network between the tworesampled sets was calculated using the graphintersectioncommand from the igraph package and zero-degree nodes(proteins with no interactions) were removed Overlap wasquantified by counting the number of nodes and the numberof edges in this overlap network The observed overlap wascompared with a simulated null distribution for both nodeand edge number where n frac14 1000 simulation iterations

Supplementary MaterialSupplementary figures S1ndashS4 and tables S1ndashS5 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

We thank Robert Good for the fly sampling and JenniferShirriffs and Lea and Alan Rako for outstanding assistancewith laboratory culture and substantial desiccation assaysWe are grateful to Melissa Davis for advice on network build-ing and comparison approaches and to five anonymous re-viewers whose comments improved the manuscript Thiswork was supported by a DECRA fellowship DE120102575and Monash University Technology voucher scheme toMTS an ARC Future Fellowship and Discovery Grants toCMS an ARC Laureate Fellowship to AAH and a Scienceand Industry Endowment Fund grant to CMS and AAH

ReferencesAlexander JM Lomvardas S 2014 Nuclear architecture as an epigenetic

regulator of neural development and function Neuroscience26439ndash50

Anderson AR Hoffmann AA McKechnie SW Umina PA Weeks AR2005 The latitudinal cline in the In(3R)Payne inversion polymor-phism has shifted in the last 20 years in Australian Drosophilamelanogaster populations Mol Ecol 14851ndash858

Andolfatto P Wall JD Kreitman M 1999 Unusual haplotype structureat the proximal breakpoint of In(2L)t in a natural population ofDrosophila melanogaster Genetics 1531297ndash1311

Andrews S 2014 FastQC Available from httpwwwbioinformaticsbbsrcacukprojectsfastqc

Ashburner M Ball CA Blake JA Botstein D Butler H Cherry JM DavisAP Dolinski K Dwight SS Eppig JT et al 2000 Gene ontology toolfor the unification of biology The Gene Ontology Consortium NatGenet 2525ndash29

Aytes A Mitrofanova A Lefebvre C Alvarez Mariano J Castillo-MartinM Zheng T Eastham James A Gopalan A Pienta Kenneth J ShenMM et al 2014 Cross-species regulatory network analysis identifiesa synergistic interaction between FOXM1 and CENPF that drivesprostate cancer malignancy Cancer Cell 25638ndash651

Bastide H Betancourt A Nolte V Tobler R Steuroobe P Futschik ASchleurootterer C 2013 A genome-wide fine-scale map of natural pig-mentation variation in Drosophila melanogaster PLoS Genet9e1003534

Bergland AO Behrman EL OrsquoBrien KR Schmidt PS Petrov DA 2014Genomic evidence of rapid and stable adaptive oscillations overseasonal time scales in Drosophila PLoS Genet 10e1004775

Bonebrake TC Mastrandrea MD 2010 Tolerance adaptation and pre-cipitation changes complicate latitudinal patterns of climate changeimpacts Proc Natl Acad Sci U S A 10712581ndash12586

Bradley TJ 2009 Animal osmoregulation Oxford Oxford UniversityPress

Burke MK Liti G Long AD 2014 Standing genetic variation drivesrepeatable experimental evolution in outcrossing populations ofSaccharomyces cerevisiae Mol Biol Evol 313228ndash3239

Chown SL 2012 Trait-based approaches to conservation physiologyforecasting environmental change risks from the bottom upPhilos Trans R Soc B 3671615ndash1627

Chown SL Nicolson SW 2004 Insect physiological ecology mechanismsand patterns Oxford Oxford University Press

Chown SL Soslashrensen JG Terblanche JS 2011 Water loss in insectsan environmental change perspective J Insect Physiol571070ndash1084

Not in GWAS In GWAS

Not in Telonis-Scott et al (2012) N ndash intersect-AndashB A-intersect

In Telonis-Scott et al (2012) B-intersect Intersect(AB)

13

Chung H Loehlin DW Dufour HD Vaccarro K Millar JG Carroll SB2014 A single gene affects both ecological divergence and matechoice in Drosophila Science 3431148ndash1151

Cingolani P Platts A Wang LL Coon M Nguyen T Wang L Land SJ LuX Ruden DM 2012 A program for annotating and predicting theeffects of single nucleotide polymorphisms SnpEff SNPs in thegenome of Drosophila melanogaster strain w1118 iso-2 iso-3 Fly(Austin) 680ndash92

Civelek M Lusis AJ 2014 Systems genetics approaches to understandcomplex traits Nat Rev Genet 1534ndash48

Clusella-Trullas S Blackburn TM Chown SL 2011 Climatic predictors oftemperature performance curve parameters in ectotherms implycomplex responses to climate change Am Nat 177738ndash751

Csardi G Nepusz T 2006 The igraph software package for complexnetwork research InterJournal Complex Systems 1695 Availablefrom httpigraphorg

Davies SA Cabrero P Overend G Aitchison L Sebastian S Terhzaz SDow JA 2014 Cell signalling mechanisms for insect stress toleranceJ Exp Biol 217119ndash128

Davies SA Cabrero P Povsic M Johnston NR Terhzaz S Dow JA 2013Signaling by Drosophila capa neuropeptides Gen Comp Endocrinol18860ndash66

Davies SA Overend G Sebastian S Cundall M Cabrero P Dow JATerhzaz S 2012 Immune and stress response lsquocross-talkrsquo in theDrosophila Malpighian tubule J Insect Physiol 58488ndash497

Day JP Dow JA Houslay MD Davies SA 2005 Cyclic nucleotide phos-phodiesterases in Drosophila melanogaster Biochem J 388333ndash342

de Nadal E Ammerer G Posas F 2011 Controlling gene expression inresponse to stress Nat Rev Genet 12833ndash845

DePristo MA Banks E Poplin R Garimella KV Maguire JR Hartl CPhilippakis AA del Angel G Rivas MA Hanna M et al 2011 Aframework for variation discovery and genotyping using next-gen-eration DNA sequencing data Nat Genet 43491ndash498

Djawdan M Chippindale AK Rose MR Bradley TJ 1998 Metabolic re-serves and evolved stress resistance in Drosophila melanogasterPhysiol Zool 71584ndash594

Durham MF Magwire MM Stone EA Leips J 2014 Genome-wide anal-ysis in Drosophila reveals age-specific effects of SNPs on fitness traitsNat Commun 54338

Emilsson V Thorleifsson G Zhang B Leonardson AS Zink F Zhu JCarlson S Helgason A Walters GB Gunnarsdottir S et al 2008Genetics of gene expression and its effect on disease Nature452423ndash428

Fabian DK Kapun M Nolte V Kofler R Schmidt PS Schleurootterer C FlattT 2012 Genome-wide patterns of latitudinal differentiation amongpopulations of Drosophila melanogaster from North America MolEcol 214748ndash4769

Foley BR Telonis-Scott M 2011 Quantitative genetic analysis suggestscausal association between cuticular hydrocarbon composition anddesiccation survival in Drosophila melanogaster Heredity 10668ndash77

Folk DG Han C Bradley TJ 2001 Water acquisition and partitioning inDrosophila melanogaster effects of selection for desiccation-resis-tance J Exp Biol 2043323ndash3331

Fowler MA Montell C 2013 Drosophila TRP channels and animal be-havior Life Sci 92394ndash403

Fu J Keurentjes JJB Bouwmeester H America T Verstappen FWA WardJL Beale MH de Vos RCH Dijkstra M Scheltema RA et al 2009System-wide molecular evidence for phenotypic buffering inArabidopsis Nat Genet 41166ndash167

Futschik A Schleurootterer C 2010 The next generation of molecular mar-kers from massively parallel sequencing of pooled DNA samplesGenetics 186207ndash218

Gibbs AG Fukuzato F Matzkin LM 2003 Evolution of water conserva-tion mechanisms in Drosophila J Exp Biol 2061183ndash1192

Gibbs AG Matzkin LM 2001 Evolution of water balance in the genusDrosophila J Exp Biol 2042331ndash2338

Goodstadt L 2010 Ruffus a lightweight Python library for computa-tional pipelines Bioinformatics 262778ndash2779

Hadley NF 1994 Water relations of terrestrial arthropods San Diego(CA) Academic Press

Hoffmann AA Hallas R Sinclair C Mitrovski P 2001 Levels of variationin stress resistance in Drosophila among strains local populationsand geographic regions patterns for desiccation starvation coldresistance and associated traits Evolution 551621ndash1630

Hoffmann AA Parsons PA 1989a An integrated approach to environ-mental stress tolerance and life-history variation desiccation toler-ance in Drosophila Biol J Linn Soc 37117ndash136

Hoffmann AA Parsons PA 1989b Selection for increased desiccationresistance in Drosophila melanogaster additive genetic control andcorrelated responses for other stresses Genetics 122837ndash845

Hoffmann AA Sgro CM 2011 Climate change and evolutionary adap-tation Nature 470479ndash485

Hoffmann AA Weeks AR 2007 Climatic selection on genes and traitsafter a 100 year-old invasion a critical look at the temperate-tropicalclines in Drosophila melanogaster from eastern Australia Genetica129133ndash147

Huang W Richards S Carbone MA Zhu D Anholt RR Ayroles JFDuncan L Jordan KW Lawrence F Magwire MM et al 2012Epistasis dominates the genetic architecture of Drosophila quanti-tative traits Proc Natl Acad Sci U S A 10915553ndash15559

Huang Z Tunnacliffe A 2004 Response of human cells to desiccationcomparison with hyperosmotic stress response J Physiol558181ndash191

Huang Z Tunnacliffe A 2005 Gene induction by desiccation stress inhuman cell cultures FEBS Lett 5794973ndash4977

Ichihashi Y Aguilar-Martınez JA Farhi M Chitwood DH Kumar RMillon LV Peng J Maloof JN Sinha NR 2014 Evolutionary develop-mental transcriptomics reveals a gene network module regulatinginterspecific diversity in plant leaf shape Proc Natl Acad Sci U S A111E2616ndashE2621

Itoh M Nanba N Hasegawa M Inomata N Kondo R Oshima MTakano-Shimizu T 2010 Seasonal changes in the long-distance link-age disequilibrium in Drosophila melanogaster J Hered 10126ndash32

Johnson WA Carder JW 2012 Drosophila nociceptors mediate larvalaversion to dry surface environments utilizing both the painless TRPchannel and the DEGENaC subunit PPK1 PLoS One 7e32878

Kahsai L Kapan N Dircksen H Winther AM Nassel DR 2010 Metabolicstress responses in Drosophila are modulated by brain neurosecre-tory cells that produce multiple neuropeptides PLoS One 5e11480

Kapun M van Schalkwyk H McAllister B Flatt T Schleurootterer C 2014Inference of chromosomal inversion dynamics from Pool-Seq datain natural and laboratory populations of Drosophila melanogasterMol Ecol 231813ndash1827

Kawano T Shimoda M Matsumoto H Ryuda M Tsuzuki S Hayakawa Y2010 Identification of a gene Desiccate contributing to desicca-tion resistance in Drosophila melanogaster J Biol Chem28538889ndash38897

Kellermann V van Heerwaarden B Sgro CM Hoffmann AA 2009Fundamental evolutionary limits in ecological traits driveDrosophila species distributions Science 3251244ndash1246

Kofler R Nolte V Schleurootterer C 2016 The impact of library preparationprotocols on the consistency of allele frequency estimates in Pool-Seq data Mol Ecol Resour 16118ndash122

Kofler R Pandey RV Schleurootterer C 2011 PoPoolation2 identifying dif-ferentiation between populations using sequencing of pooled DNAsamples (Pool-Seq) Bioinformatics 273435ndash3436

Kofler R Schleurootterer C 2012 Gowinda unbiased analysis of gene setenrichment for genome-wide association studies Bioinformatics282084ndash2085

Leiserson MDM Eldridge JV Ramachandran S Raphael BJ 2013Network analysis of GWAS data Curr Opin Genet Dev 23602ndash610

Li H Durbin R 2009 Fast and accurate short read alignment withBurrows-Wheeler transform Bioinformatics 251754ndash1760

Li H Handsaker B Wysoker A Fennell T Ruan J Homer N Marth GAbecasis G Durbin R 2009 The Sequence AlignmentMap formatand SAMtools Bioinformatics 252078ndash2079

14

Liu Y Luo J Carlsson MA Neuroassel DR 2015 Serotonin and insulin-likepeptides modulate leucokinin-producing neurons that affectfeeding and water homeostasis in Drosophila J Comp Neurol5231840ndash1863

Lohse M Bolger AM Nagel A Fernie AR Lunn JE Stitt M Usadel B 2012RobiNA a user-friendly integrated software solution for RNA-Seq-based transcriptomics Nucleic Acids Res 40W622ndashW627

Long A Liti G Luptak A Tenaillon O 2015 Elucidating the moleculararchitecture of adaptation via evolve and resequence experimentsNat Rev Genet 16567ndash582

Lynch M Bost D Wilson S Maruki T Harrison S 2014 Population-genetic inference from pooled-sequencing data Genome Biol Evol61210ndash1218

Mackay TFC Stone EA Ayroles JF 2009 The genetics of quantitativetraits challenges and prospects Nat Rev Genet 10565ndash577

Marron MT Markow TA Kain KJ Gibbs AG 2003 Effects of starvationand desiccation on energy metabolism in desert and mesicDrosophila J Insect Physiol 49261ndash270

Matzkin LM Markow TA 2009 Transcriptional regulation of metabo-lism associated with the increased desiccation resistance of thecactophilic Drosophila mojavensis Genetics 1821279ndash1288

Murali T Pacifico S Yu J Guest S Roberts GG Finley RL Jr 2011 DroID2011 a comprehensive integrated resource for protein transcrip-tion factor RNA and gene interactions for Drosophila Nucleic AcidsRes 39D736ndashD743

Nuzhdin SV Turner TL 2013 Promises and limitations of hitchhikingmapping Curr Opin Genet Dev 23694ndash699

Orozco-terWengel P Kapun M Nolte V Kofler R Flatt T Schleurootterer C2012 Adaptation of Drosophila to a novel laboratory environmentreveals temporally heterogeneous trajectories of selected alleles MolEcol 214931ndash4941

Picard Picard A set of Java command line tools for manipulating high-throughput seqeuncing (HTS) data and formats Available fromhttpbroadinstitutegithubiopicard

Qiu Y Tittiger C Wicker-Thomas C Le Goff G Young S Wajnberg EFricaux T Taquet N Blomquist GJ Feyereisen R 2012 An insect-specific P450 oxidative decarbonylase for cuticular hydrocarbon bio-synthesis Proc Natl Acad Sci U S A 10914858ndash14863

Rajpurohit S Oliveira CC Etges WJ Gibbs AG 2013 Functional genomicand phenotypic responses to desiccation in natural populations of adesert Drosophilid Mol Ecol 222698ndash2715

Sarup P Soslashrensen JG Kristensen TN Hoffmann AA Loeschcke V PaigeKN Soslashrensen P 2011 Candidate genes detected in transcriptomestudies are strongly dependent on genetic background PLoS One6e15644

Schiffer M Hangartner S Hoffmann AA 2013 Assessing the relativeimportance of environmental effects carry-over effects and speciesdifferences in thermal stress resistance a comparison ofDrosophilids across field and laboratory generations J Exp Biol2163790ndash3798

Schleurootterer C Kofler R Versace E Tobler R Franssen SU 2015Combining experimental evolution with next-generation sequen-cing A powerful tool to study adaptation from standing geneticvariation Heredity 114431ndash440

Sinclair BJ Gibbs AG Roberts SP 2007 Gene transcription during ex-posure to and recovery from cold and desiccation stress inDrosophila melanogaster Insect Mol Biol 16435ndash443

Sloggett C Wakefield M Philip G Pope B 2014 Rubramdashflexible distrib-uted pipelines figshare Available from httpdxdoiorg106084m9figshare895626

Smit AFA Hubley R Green P 2013-2015 RepeatMasker Open-403Available from httpwwwrepeatmaskerorg

Seurooderberg JA Birse RT Neuroassel DR 2011 Insulin production and signalingin renal tubules of Drosophila is under control of tachykinin-relatedpeptide and regulates stress resistance PLoS One 6e19866

Soslashrensen JG Nielsen MM Loeschcke V 2007 Gene expression profileanalysis of Drosophila melanogaster selected for resistance to envi-ronmental stressors J Evol Biol 201624ndash1636

St Pierre SE Ponting L Stefancsik R McQuilton P 2014 FlyBase102mdashadvanced approaches to interrogating FlyBase Nucleic AcidsRes 42D780ndashD788

Stevenson M Nunes T Heuser C Marshall J Sanchez J Thornton RReiczigel J Robison-Cox J Sebastiani P Solymos P et al 2015 epiRTools for the analysis of epidemiological data R package version09-62 Available from httpCRANR-projectorgpackagefrac14epiR

Storey JD Tibshirani R 2003 Statistical significance for genomewidestudies Proc Natl Acad Sci U S A 1009440ndash9445

Telonis-Scott M Gane M DeGaris S Sgro CM Hoffmann AA 2012 Highresolution mapping of candidate alleles for desiccation resistancein Drosophila melanogaster under selection Mol Biol Evol291335ndash1351

Telonis-Scott M Guthridge KM Hoffmann AA 2006 A new set oflaboratory-selected Drosophila melanogaster lines for the analysisof desiccation resistance response to selection physiology and cor-related responses J Exp Biol 2091837ndash1847

Terhzaz S Cabrero P Robben JH Radford JC Hudson BD Milligan GDow JA Davies SA 2012 Mechanism and function of Drosophilacapa GPCR a desiccation stress-responsive receptor with functionalhomology to human neuromedinU receptor PLoS One 7e29897

Terhzaz S Overend G Sebastian S Dow JA Davies SA 2014 The Dmelanogaster capa-1 neuropeptide activates renal NF-kB signalingPeptides 53218ndash224

Terhzaz S Teets NM Cabrero P Henderson L Ritchie MG Nachman RJDow JA Denlinger DL Davies SA 2015 Insect capa neuropeptidesimpact desiccation and cold tolerance Proc Natl Acad Sci U S A1122882ndash2887

Thorat LJ Gaikwad SM Nath BB 2012 Trehalose as an indicator ofdesiccation stress in Drosophila melanogaster larvae a potentialmarker of anhydrobiosis Biochem Biophys Res Commun419638ndash642

Tobler R Franssen SU Kofler R Orozco-terWengel P Nolte VHermisson J Schl eurootterer C 2014 Massive habitat-specific ge-nomic response in D melanogaster populations during exper-imental evolution in hot and cold environments Mol Biol Evol31364ndash375

Turner TL Bourne EC Von Wettberg EJ Hu TT Nuzhdin SV 2010Population resequencing reveals local adaptation of Arabidopsislyrata to serpentine soils Nat Genet 42260ndash263

Turner TL Miller PM 2012 Investigating natural variation in Drosophilacourtship song by the evolve and resequence approach Genetics191633ndash642

Vermeulen CJ Soslashrensen P Kirilova Gagalova K Loeschcke V 2013Transcriptomic analysis of inbreeding depression in cold-sensitiveDrosophila melanogaster shows upregulation of the immune re-sponse J Evol Biol 261890ndash1902

Wang B Goode J Best J Meltzer J Schilman PE Chen J Garza D ThomasJB Montminy M 2008 The insulin-regulated CREB coactivatorTORC promotes stress resistance in Drosophila Cell Metab7434ndash444

Wang J Kean L Yang J Allan AK Davies SA Herzyk P Dow JA 2004Function-informed transcriptome analysis of Drosophila renaltubule Genome Biol 5R69

Wang K Li M Hakonarson H 2010 Analysing biological pathways ingenome-wide association studies Nat Rev Genet 11843ndash854

Wang L Jia P Wolfinger RD Chen X Zhao Z 2011 Gene set analysis ofgenome-wide association studies methodological issues and per-spectives Genomics 981ndash8

Weber AL Khan GF Magwire MM Tabor CL Mackay TF Anholt RR2012 Genome-wide association analysis of oxidative stress resistancein Drosophila melanogaster PLoS One 7e34745

Weeks AR McKechnie SW Hoffmann AA 2002 Dissecting adaptiveclinal variation markers inversions and sizestress associations inDrosophila melanogaster from a central field population Ecol Lett5756ndash763

Zhu Y Bergland AO Gonzalez J Petrov DA 2012 Empirical validation ofpooled whole genome population re-sequencing in Drosophila mel-anogaster PLoS One 7e41901

15

Page 5: Cross-Study Comparison Reveals Common Genomic, Network, … · 2016-02-06 · resistance using a high-throughput sequencing Pool-GWAS (genome-wide association study) approach ( Bastide

Table 1 Summary of Primary Molecular Responses and Candidate Genes for Desiccation Response Mechanisms Identified in the Literature

Mechanism Molecular Function Genes Current Study(F1 wild-caught)

Telonis-Scott et al(2012) (F8 artificial

selection)

Hygrosensing Sensory moisture receptors (antennae legs gut mouthparts)

Temperature-activatedtransient receptor po-tential ionchannelsbcd

TRP ion channels trpl trp Trpg TrpA1 TrpmTrpml pain pyx wtrwnompC iav nan Amo

mdash trpl

Stress sensing Malpighian tubules (principal cells)

Tubule fluid secretion signaling pathways

cAMP signalingefg cAMP activation Dh44-R2 mdash mdash

cGMP signalingefg cGMP activation Capa CapaRcedilNplp1-4 mdash mdash

Receptors Gyc76c CG33958 CG34357 mdash CG34357

Kinases dg1-2 mdash mdash

cAMP + cGMPsignalingefgh

Hydrolyzing phosphodiesterases dnc pde1c pde6 pde9 pde11 dnc pde9 dnc pde1c pde9 pde11

Calcium signalingeg Calcium-sensitive nitric oxidesynthase

Nos mdash Nos

Desiccation specific Diuretic neuropeptide NFkBsignalingijk

Capa CapaR trpl Relish klu klu klu trpl

Anti-diureticneuropeptidel ITP sNPF Tk mdash ITP sNPF

Stress sensing Stress responsive pathway

Stress activatedprotein kinasepathwaymn

MAPKJun kinase

Aop Aplip1 Atf-2 bsk Btk29Acbt Cdc 42 Cka cno CYLDDok egr Gadd45 hep hppyJra kay medo Mkk4 msnMtl Pak puc pyd Rac 1-2Rho1 shark slpr Src42ASrc64B Tak1 Traf6

Aop Dokhepcedilkaymsn Mtlpuc Rac1Src64B

Mtl slpr

Metabolic homeostasis and water balance

Components of insu-lin signaling pathway

Receptorop InR InR mdash

Receptor binding chico IIp1-8 dock Dok poly Dok Poly

Resistance mechanisms Water loss barriers

Cuticularhydrocarbons

Fatty acid synthasesq desatura-sesr transporters oxidativedecarbonylaset

CG3523 CG3524 desat1-2 FatPCyp4g1

CG3523 mdash

Primary hemolymph sugartissue-protectant

Trehalose metabolism Trehalose-phosphatasebu treh-lase activitybu phosphoglyc-erate mutase

Tps1 Treh Pgm CG5171CG5177 CG6262 crc

Treh Pgm mdash

NOTEmdashCandidate genes differentiated in the current study and in Telonis-Scott et al (2012) are shown and are underlined where overlappingaFor brevity several comprehensive reviews are cited and readers are referred to references therein Note that water budgeting via discontinuous gas exchange was not includeddue to a lack of ldquobona fiderdquo molecular candidates and desiccation tolerance due to aquaporins was omitted as genes were not differentiated in either of our studiesbChown et al (2011)cJohnson and Carder (2012)dFowler and Montell (2013)eDay et al (2005)fDavies et al (2012)gDavies et al (2013)hDavies et al (2014)iTerhzaz et al (2012)jTerhzaz et al (2014)kTerhzaz et al (2015)lKahsai et al (2010)mHuang and Tunnacliffe (2004)nHuang and Tunnacliffe (2005)oSeurooderberg et al (2011)pLiu et al (2015)qChung et al (2014)rSinclair et al (2007)rFoley and Telonis-Scott (2011)tQiu et al (2012)uThorat et al (2012)

5

Conserved Genomic Signatures of Drosophila Desiccation Resistance doi101093molbevmsv349 MBE by guest on February 6 2016

httpmbeoxfordjournalsorg

Dow

nloaded from

out of the 382 and 416 genes for the current and 2012 studyrespectively (Fisherrsquos exact test for overlap P lt 2 1016odds ratio frac14 590 supplementary table S4 SupplementaryMaterial online) Bearing gene length bias in mind we exam-ined the average length of genes in both studies as well as thegene overlap to the average gene length in the Drosophilagenome (supplementary fig S3 Supplementary Materialonline) Although there was bias toward longer genes in theoverlap list this was also apparent in the full gene lists fromboth studies

The overlap genes mapped to all of the major chromo-somes and their differentiated variants were mostly noncod-ing SNPs located in introns (supplementary table S4Supplementary Material online) Two SNPs were synonymous(Strn-Mlick coding for a protein kinase and Cht7 chitinbinding supplementary table S4 Supplementary Materialonline) while a 30UTR variant was predicted to putativelyalter protein structure in a nondisruptive manner mostlikely impacting expression andor protein translational effi-ciency (jbug response to stimulus supplementary table S4Supplementary Material online) One nonsynonymous SNPwas predicted to alter protein function as a missense variant(fab1 signaling supplementary table S4 SupplementaryMaterial online) The increase in frequency of the favoreddesiccation alleles compared with the controls ranged from5 to 27 with a median increase of 164

Despite the strong overlap there were important differ-ences between the two studies For example our previouswork revealed evidence of a selective sweep in the 50 pro-moter region of the Dys gene in the artificially selected linesand we confirmed a higher frequency of the selected SNPs inan independent natural association study in resistant fliesfrom Coffs Harbour (Telonis-Scott et al 2012) In this studywe found no differentiation at the Dys locus between ourresistant and random controls However both resistant andcontrol pools had a high frequency of the selected allele de-tected in Telonis-Scott et al (2012) (supplementary table S5Supplementary Material online)

The 45 genes common to both studies were analyzed forbiological function while many genes are obviously pleiotro-pic (ie assigned multiple GO terms) almost half were in-volved in response to stimuli (cellular behavioral andregulatory) including signaling such as Rho protein signaltransduction and cAMP-mediated signaling (table 3) Othercategories of interest include spiracle morphogenesis sensoryorgan development including development of the compoundeye and stem cell and neuroblast differentiation (table 3)

With regard to specific stress responsedesiccation candi-dates (table 1) 4 of the 45 genes overlapped including thecAMPcGMP signaling phosphodiesterases pde9 and dncMAPK pathway genes Mtl and klu indicated in NF-kBorthologue signaling in the renal tubules (table 1)Furthermore there was overlap among mechanistic catego-ries where different genes in the same pathways weredetected For example additional key cAMPcGMP hydrolyz-ing phosphodiesterases pde1c and pde11 were significant inTelonis-Scott et al (2012) as well as slpr (MAPK pathway)and poly (insulin signaling) (table 1)

Table 2 Pathway Enrichment of the GWAS and Telonis-Scott et alFirst-Order PPI Networks Investigated Using FlyMine (FDR P lt 005)

Current Study GWAS Telonis-Scott et al(2012)

Database Pathway Pathway

KEGG Ribosome RibosomeNotch signaling pathway Notch signaling pathwayMAPK signaling pathwayDorso-ventral axis formationProgesterone-mediated oocyte

maturationWnt signaling pathwayEndocytosisProteasome

Reactome Immune system Immune systemAdaptive immune system Adaptive immune systemInnate immune system Innate immune systemSignaling by interleukins Signaling by interleukins(Toll-like receptor cascades) Toll-like receptor

cascadesTCR signaling(B cell receptor signaling)

Gene expression Gene expressionMetabolism of RNA Metabolism of RNANonsense-mediated decay Nonsense-mediated decayTranscriptionmRNA processing(RNA transport)Metabolism of proteins (Metabolism of

proteins)Translation (Translation)Cell cycle Cell cycle

Cell cycle checkpointsRegulation of mitotic cell

cycleMitotic metaphase and

anaphaseG0 and early G1 G0 and early G1

G1 phase(G1S transition)G2M transition(DNA replicationS

phase)DNA repairDouble-strand break

repairSignaling pathways Signaling pathwaysInsulin signaling pathwaySignaling by NGF Signaling by NGF

Signaling by FGFRSignaling by NOTCHSignaling by WntSignaling by EGFRDevelopmentAxon guidanceMembrane traffickingGap junction trafficking

and regulationProgramed cell deathApoptosisHemostasisHemostasis

NOTEmdashKEGG and Reactome pathway databases were queried All significant termsare reported with Reactome terms grouped together under parent terms in italictext (full list of terms available in supplementary table S3 Supplementary Materialonline) Parent terms without brackets are the names of pathways that were sig-nificantly enriched parent terms in parentheses were not listed as significantlyenriched themselves but are reported as a guide to the category contents

6

In addition to the significant gene-level overlap betweenthe two studies we observed significant overlap between thePPI networks constructed from each set of candidate genes(summarized in fig 2) We examined the overlap by compar-ing the first-order networks 1) at the level of node overlap and2) at the level of edge (interaction) overlap Three hundredseven GWAS seeds formed a network of 3324 nodes whileour previous set of candidates mapped to 337 seeds forming anetwork that comprised 3215 nodes One thousand eighthundred twenty-six or ~55 of the nodes overlapped be-tween networks which was significantly higher than the nulldistribution estimated from 1000 simulations (mean simu-lated overlap SD 1304 143 nodes observed overlap atthe 999th percentile fig 2A and C) The overlap at the inter-action level was not significantly higher than expected (meansimulated overlap SD 25704 4024 edges observed over-lap 30686 edges 89th percentile fig 2B and D) Hive plotswere used to visualize the separate networks (supplementaryfig S4A and B Supplementary Material online) as well as theoverlap between the networks (supplementary fig S4CSupplementary Material online)

Given the high degree of shared nodes from distinct setsof protein seeds we compared the networks for biologicalfunction (fig 2E) Both networks produced long lists of sig-nificantly enriched GO terms (supplementary table S3Supplementary Material online) and assessing overlap was

difficult due to the nonindependent nature of GO hierar-chies However a comparison of functional pathway enrich-ment revealed some differences Although both networksshowed enrichment in the immune system pathways theribosome and Notch and NGF signaling pathways as well assome involvement of gene expression and translation path-ways the network constructed from the Telonis-Scott et al(2012) gene list showed extensive involvement of cell cyclepathway proteins as well as significant enrichment for Wntsignaling EGRF signaling DNA double-strand break repairaxon guidance gap junction trafficking and apoptosis path-ways which were not enriched in the GWAS network TheGWAS network was specifically enriched for dorso-ventralaxis formation oocyte maturation and many more geneexpression pathways than the Telonis-Scott et al (2012)gene network (table 2 and supplementary table S3Supplementary Material online)

Discussion

Genomic Signature of Desiccation Resistance fromNatural Standing Genetic Variation

We report the first large-scale genome-wide screen for nat-ural alleles associated with survival of low humidity condi-tions in D melanogaster By combining a powerful naturalassociation study with high-throughput Pool-seq we

Table 3 Overrepresentation of GO Categories for the 45 Core Candidate Genes Overlapping between the Current and Our Previous(Telonis-Scott et al 2012) Genomic Study

Term ID TermSubterms FDR P valuea Genes

Cellularbehavioral response to stimulus defense and signaling

GO0050896 Response to stimulus (cellular response tostimulus regulation of response tostimulus)

0007 Abd-B dnc fz mam nkd slo ct fru kluRbf klg santa-maria jbug fab1CG33275 Mtl Trim9 cv-c Gprk2 Ect4CG43658

GO0048585 Negative regulation of response to stimu-lus (Rho protein signal transductionsignal transduction)

0042 dnc mam slo fru Trim9 cv-c

GO0023052 Signaling (cAMP-mediated signaling cellcommunication)

0031 Amph Pde9 dnc fz mam nkd ct kluRbf santa-maria fab1 CG33275 MtlTrim9 cv-c Gprk2 Ect4 CG43658

Development

GO0048863 Stem cell differentiation 0014 Abd-D mam nkd klu nub

GO0014016 Neuroblast differentiation 0048

GO0045165 Cell fate commitment 0010 Abd-D dnc fz mam nkd ct klu klg nub

GO0035277 Spiracle morphogenesis open trachealsystem

0010 Abd-D ct cv-c

GO0007423 Sensory organ development (eye wingdisc imaginal disc)

0008 fz mam ct klu klg sdk Amph Nckx30cMtl

GO0048736 Appendage development 0020 fz mam ct fru CG33275 Mtl Fili nubcv-c Gprk2 CG43658

GO0007552 Metamorphosis 0001 fz mam ct fru CG33275 Mtl Fili cv-cGprk2 CG43658

Cellular component organization

GO0030030 Cell projection organization 0020 dnc fz ct fru GluClalpha Nckx30c MtlTrim9 nub cv-c

NOTEmdashNote that most genes are assigned to multiple annotations and therefore are shown purely as a guide to putative functionaBonferronindashHochberg FDR and gene length corrected P values using the Flymine v400 database Gene Ontology Enrichment The terms were condensed and trimmed usingREVIGO

7

mapped SNPs associated with desiccation resistance in 500naturally derived genotypes Although Pool-seq has limitedpower to reveal very low frequency alleles to reconstructphase or adequately estimate allele frequencies in smallpools (Lynch et al 2014) the Pool-GWAS approach hasbeen validated by retrieval of known candidate genes in-volved in body pigmentation using similar numbers of ge-notypes as assessed here (Bastide et al 2013)

Although studies in controlled genetic backgrounds reportlarge effects of single genes on desiccation resistance (Table 1)our current and previous screens revealed similarly largenumbers of variants (over 600) mapping to over 300 genesA polygenic basis of desiccation resistance is consistent withquantitative genetic data for the trait (Foley and Telonis-Scott2011) and we also anticipate more complexity from naturalD melanogaster populations with greater standing geneticvariation than inbred laboratory strains However somefalse positive associations are likely in the GWAS due to thelimitations of quantitative trait mapping approaches Pooledevolve-and-resequence experiments routinely yield vast num-bers of candidate SNPs partly due to high levels of standinglinkage disequilibrium (LD) and hitchhiking neutral alleles

(Schleurootterer et al 2015) Large population sizes from specieswith low levels of LD such as D melanogaster somewhat cir-cumvent this issue and a notable feature of our experimentaldesign is our sampling Here we selected the most desicca-tion-resistant phenotypes directly from substantial standinggenetic variation where one generation of recombinationshould largely reflect natural haplotypes

We observed the largest candidate allele frequency differ-ences in loci that had intermediate allele frequencies in thecontrol pool This is in line with population genetics modelswhich predict that evolution from standing genetic variationwill initially be strongest for intermediate frequency alleleswhich can facilitate rapid change (Long et al 2015) Howeverwe also observed a number of low frequency alleles which ifbeneficial can inflate false positives due to long-range LD(Orozco-terWengel et al 2012 Nuzhdin and Turner 2013Tobler et al 2014) Nonetheless spurious LD associationscannot fully account for our candidate list as evidenced bythe desiccation functional candidates and cross-study over-lap Furthermore chromosome inversion dynamics can alsogenerate both short and long-range LD and contribute nu-merous false positive SNP phenotype associations (Hoffmann

Fig 2 Observed versus simulated network overlap between the GWAS first-order PPI network and the first-order PPI network from the Telonis-Scottet al (2012) gene list Overlap is presented in terms of (A) node number and (B) edge number Networks were simulated by resampling from the entireDrosophila melanogaster gene list (r553) as described in the text The histogram shows the distribution of overlap measures for 1000 simulations Theblack line represents the histogram as density and the blue line shows the corresponding normal distribution The red vertical line shows the observedoverlap from the real networks labeled as a percentile of the normal distribution Area-proportional Venn diagrams summarized the extent of overlapfor PPI networks constructed separately from the GWAS candidates and Telonis-Scott et al (2012) candidates (labeled ldquoprevious studyrdquo) (C) Overlapexpressed as the number of nodes Approximately 55 of the nodes overlapped between the two studies which was significantly higher than simulatedexpectations (999 percentile 2B) (D) Overlap expressed in the number of interactions This was not significantly higher than expected by simulation(89th percentile 2B) (E) Overlap at the level of GO biological process terms (directly compared GO terms by name this was not tested statisticallybecause of the complex hierarchical nature of GO terms and is presented for interest)

8

and Weeks 2007 Tobler et al 2014) but we found little as-sociation between the desiccation pool and inversion fre-quency changes consistent with the lack of latitudinalpatterns in desiccation resistanceand traitinversion fre-quency associations in east Australian D melanogaster(Hoffmann et al 2001 Weeks et al 2002)

Cross-Study Comparison Reveals a Core Set ofCandidates for a Complex Trait

Our data contain many candidate genes functions and path-ways for insect desiccation resistance but partitioning genu-ine adaptive loci from experimental noise remains asignificant challenge We therefore focus our efforts on ournovel comparative analysis where we identified 45 robustcandidates for further exploration Common genetic signa-tures are rarely documented in cross-study comparisons ofcomplex traits (Sarup et al 2011 Huang et al 2012 but cfVermeulen et al 2013) and never before for desiccation re-sistance Some degree of overlap between our studies likelyresults from a shared genetic background from similar sam-pling sites highlighting the replicability of these responsesOne limitation of this approach is the overpopulation ofthe ldquocorerdquo list by longer genes This result reflects the genelength bias inherent to GWAS approaches an issue that is notstraightforward to correct in a GWAS let alone in the com-parative framework applied here A focus on the commongenes functionally linked to the desiccation phenotype canprovide a meaningful framework for future research and alsosuggests that some large-effect candidate genes exhibit natu-ral variation contributing to the phenotype (table 1)

Other feasible candidates include genes expressed inthe MTs (Wang et al 2004) or essential for MT develop-ment (St Pierre et al 2014) providing further researchopportunity focused on the primary stress sensing andfluid secretion tissues Genes include one of the mosthighly expressed MT transcripts CG7084 as well as dncNckx30C slo cv-c and nkd Other aspects of stress resis-tance are also implicated CrebB for example is involvedin regulating insulin-regulated stress resistance and ther-mosensory behavior (Wang et al 2008) nkd also impactslarval cuticle development and shares a PDZ signalingdomain also present in the desiccation candidate desi(Kawano et al 2010 St Pierre et al 2014) Finally severalcore candidates were consistently reported in recent stud-ies suggesting generalized roles in stress responses andfitness including transcriptome studies on MT function(Wang et al 2004) inbreeding and cold resistance(Vermeulen et al 2013) oxidative stress (Weber et al2012) and genomic studies exploring age-specific SNPson fitness traits (Durham et al 2014) and genes likelyunder spatial selection in flies from climatically divergenthabitats (Fabian et al 2012)

Several of the enriched GO categories from the core listalso make biological sense including stimulus response(almost half the suite) defense and signaling particularlyin pathways involved in MT stress sensing (Davies et al 20122013 2014) Functional categories enriched for sensory

organ development were highlighted consistent with envi-ronmental sensing categories from desert Drosophila tran-scriptomics (Matzkin and Markow 2009 Rajpurohit et al2013) and stress detection via phototransduction pathwaysin D melanogaster under a range of stressors including des-iccation (Soslashrensen et al 2007)

Evidence for Conserved Signatures at Higher LevelOrganization

Beyond the gene level we observed a striking degree ofsimilarity between this study and that of Telonis-Scottet al (2012) at the network level where significantly over-lapping protein networks were constructed from largely dif-ferent seed protein sets This supports a scenario where theldquobiological information flow from DNA to phenotyperdquo(Civelek and Lusis 2014) contains inherent redundancywhere alternative genetic solutions underlie phenotypesand functions Resistance to perturbation by genetic or en-vironmental variation appears to increase with increasinghierarchical complexity that is phenotype ldquobufferingrdquo (Fuet al 2009) The same functional network in different pop-ulationsgenetic backgrounds can be impacted but theexact genes and SNPs responding to perturbation will notnecessarily overlap In Drosophila different sets of loci fromthe same founders mapped to the same PPI network in thecase of two complex traits chill coma recovery and startleresponse (Huang et al 2012) Furthermore interspecificshared networkspathways from different genes have beendocumented for numerous complex traits (Emilsson et al2008 Aytes et al 2014 Ichihashi et al 2014) providing op-portunities for investigating taxa where individual genefunction is poorly understood

Although the null distribution is not always obviouswhen comparing networks and their functions partly be-cause of reduced sensitivity from missing interactions(Leiserson et al 2013) here the shared network signaturesportray a plausible multifaceted stress response involvingstress sensing rapid gene accessibility and expression Thepredominance of immune system and stress responsepathways are of interest with reference immunitystresspathway ldquocross-talkrdquo particularly in the MTs whereabiotic stressors elicit strong immunity transcriptionalresponses (Davies et al 2012) The higher order architec-tures reflect crucial aspects of rapid gene expression con-trol in response to stress from chromatin organization(Alexander and Lomvardas 2014) to transcription RNAdecay and translation (reviwed in de Nadal et al 2011) InDrosophila variation in transcriptional regulation of indi-vidual genes can underpin divergent desiccation pheno-types (Chung et al 2014) or be functionally inferred frompatterns of many genes elicited during desiccation stress(Rajpurohit et al 2013) Variants with potential to mod-ulate expression at different stages of gene expressioncontrol were evident in our GWAS data alone for exam-ple enrichment for 50UTR and splice variants at the SNPlevel and for chromatin and chromosome structure termsat the functional level

9

Genetic Overlap Versus Genetic differencesmdashCausesand Implications

Despite the common signatures at the gene and higher orderlevels between our studies pervasive differences were ob-served These can arise from differences in standing geneticvariation in the natural populations where different alleliccompositions impact gene-by-environment interactions andepistasis Our network overlap analyses highlight the potentialfor different molecular trajectories to result in similar func-tions (Leiserson et al 2013) consistent with the multiplephysiological pathways potentially available to D melanoga-ster under water stress (Chown et al 2011) Experimentaldesign also presumably contributed to the study differencesStrong directional selection as applied in the 2012 study canincrease the frequency of rare beneficial alleles and likely re-duced overall diversity than did the single-generation GWASdesign Furthermore selection regimes often impose addi-tional selective pressures (eg food deprivation during desic-cation) resulting in correlated responses such as starvationresistance with desiccation resistance and these factors differfrom the laboratory to the field Finally evidence suggests innatural D melanogaster that standing genetic variation canvary extensively with sampling season (Itoh et al 2010Bergland et al 2014) Although we cannot compare levelsof standing variation between our two studies evidence sug-gests that this may have affected the Dys gene where ourGWAS approach revealed no differentiation between thecontrol and resistant pools at this locus but rather detectedthe alleles associated with the selective sweep in Telonis-Scottet al (2012) in both pools at high frequency Whether thisresulted from sampling the later collection after a longdrought (2004ndash2008 wwwbomgovauclimate last accessedJanuary 13 2016) is unclear but this observation highlightsthe relevance of temporal sampling to standing genetic var-iation when attempting to link complex phenotypes togenotypes

Materials and Methods

Natural Population Sampling

Drosophila melanogaster was collected from vineyard waste atKilchorn Winery Romsey Australia in April 2010 Over 1200isofemale lines were established in the laboratory from wild-caught females Species identification was conducted on maleF1 to remove Drosophila simulans contamination and over900 D melanogaster isofemale lines were retained All flieswere maintained in vials of dextrose cornmeal media at25 C and constant light

Desiccation Assay

From each isofemale line ten inseminated F1 females weresorted by aspiration without CO2 and held in vials until 4ndash5days old For the desiccation assay 10 females per line (over9000 flies in total) were screened by placing groups of femalesinto empty vials covered with gauze and randomly assigninglines into 5 large sealed glass tanks containing trays of silicadesiccant (relative humidity [RH] lt10) The tanks werescored for mortality hourly and the final 558 surviving

individuals were selected for the desiccation-resistant pool(558 flies lt6 of the total) The control pool constitutedthe same number of females sampled from over 200 ran-domly chosen isofemale lines that were frozen prior to thedesiccation assay We chose this random control approachinstead of comparing the phenotypic extremes (ie the latemortality tail vs the ldquoearly mortalityrdquo tail) because unfavor-able environmental conditions experienced by the wild-caught dam may inflate environmental variance due to car-ryover effects in the F1 offspring (Schiffer et al 2013)

DNA Extraction and Sequencing

We used pooled whole-genome sequencing (Pool-seq)(Futschik and Schleurootterer 2010) to estimate and compareallele frequency differences between the most desiccation-resistant and randomly sampled flies from a wild populationto associate naturally segregating candidate SNPs with theresponse to low humidity For each treatment (resistantand control) genomic DNA was extracted from 50 femaleheads using the Qiagen DNeasy Blood and Tissue Kit accord-ing to the manufacturerrsquos instructions (Qiagen HildenGermany) Two extractions were combined to create fivepools per treatment which were barcoded separately tocreate technical replicates (total 10 pools of 100 fliesn frac14 1000) The DNA was fragmented using a CovarisS2 ma-chine (Covaris Inc Woburn MA) and libraries were pre-pared from 1 mg genomic DNA (per pool of 100 flies) usingthe Illumina TruSeq Prep module (Illumina San Diego CA)To minimize the effect of variation across lanes the ten poolswere combined in equal concentration (fig 3) and run mul-tiplexed in each lane of a full flow cell of an Illumina HiSeq2000 sequencer Clusters were generated using the TruSeq PECluster Kit v5 on a cBot and sequenced using Illumina TruSeqSBS v3 chemistry (Illumina) Fragmentation library construc-tion and sequencing were performed at the Micromon Next-Gen Sequencing facility (Monash University ClaytonAustralia)

Fig 3 Experimental design See text for further explanation

10

Data Processing and Mapping

Processing was performed on a high-performance computingcluster using the Rubra pipeline system that makes use of theRuffus Python library (Goodstadt 2010) The final variant-calling pipeline was based on a pipeline developed by ClareSloggett (Sloggett et al 2014) and is publicly available athttpsgithubcomgriffinpGWAS_pipeline (last accessedJanuary 13 2016) Raw sequence reads were processed persample per lane with P1 and P2 read files processed sepa-rately initially Trimming was performed with trimmomatic v030 (Lohse et al 2012) Adapter sequences were also re-moved Leading and trailing bases with a quality scorebelow 30 were trimmed and a sliding window (width frac14 10quality threshold frac14 25) was used Reads shorter than 40 bpafter trimming were discarded Quality before and after trim-ming was examined using FastQC v 0101 (Andrews 2014)

Posttrimming reads that remained paired were used asinput into the alignment step Alignment to the D melano-gaster reference genome version r553 was performed withbwa v 062 (Li and Durbin 2009) using the bwa aln commandand the following options No seeding (-l 150) 1 rate ofmissing alignments given 2 uniform base error rate (-n001) a maximum of 2 gap opens per read (-o 2) a maximumof 12 gap extensions per read (-e 12) and long deletionswithin 12 bp of 30 end disallowed (-d 12) The default valueswere used for all other options Output files were then con-verted to sorted bam files using the bwa sampe commandand the ldquoSortSamrdquo option in Picard v 196

At this point the bam files were merged into one file persample using PicardMerge Duplicates were identified withPicard MarkDuplicates The tool RealignerTargetCreator inGATK v 26-5 (DePristo et al 2011) was applied to the tenbam files simultaneously producing a list of intervals contain-ing probable indels around which local realignment would beperformed This was used as input for the IndelRealigner toolwhich was also applied to all ten bam files simultaneously

To investigate the variation among technical replicatesallele frequency was estimated based on read counts foreach technical replicate separately for 100000 randomly cho-sen SNPs across the genome After excluding candidate des-iccation-resistance SNPs and SNPs that may have been falsepositives due to sequencing error (those with mean frequencylt004 across replicates) 1803 SNPs remained For each locuswe calculated the variance in allele frequency estimate amongthe five control replicates and the variance among the fivedesiccation-resistance replicates We compared this with thevariance due to pool category (control vs desiccation resis-tant) using analysis of variance (ANOVA) We also calculatedthe mean pairwise difference in allele frequency estimateamong control and among desiccation-resistance replicatesand the mean concordance correlation coefficient was calcu-lated over all pairwise comparisons within pool category(using the epiccc function in the epiR package Stevensonet al 2015)

The following approach was then taken to identify SNPsdiffering between the desiccation-resistant and controlgroups Based on the technical replicate analysis we merged

the five ldquodesiccation-resistantrdquo and the five random controlsamples into two bam files A pileup file was created withsamtools v 0119 (Li et al 2009) and converted to a ldquosyncrdquo fileusing the mpileup2sync script in Popoolation2 v 1201 (Kofleret al 2011) retaining reads with a mapping quality of at least20 and bases with a minimum quality score of 20 Reads withunmapped mates were also retained (option ndashA in samtoolsmpileup) For this step we used a repeat-masked version ofthe reference genome created with RepeatMasker 403 (Smitet al 2013-2015) and a repeat file containing all annotatedtransposons from D melanogaster including shared ancestralsequences (Flybase release r557) plus all repetitive elementsfrom RepBase release v1904 for D melanogaster Simple re-peats small RNA genes and bacterial insertions were notmasked (options ndashnolow ndashnorna ndashno_is) The rmblastsearch engine was used

Association Analysis

We then performed Fisherrsquos exact test with the Fisher testscript in Popoolation2 (Kofler et al 2011) We set the min-imum count (of the minor allele) frac14 10 minimum coverage(in both populations) frac14 20 and maximum coverage frac1410000 The test was performed for each SNP (slidingwindow off) Although there are numerous approaches tocalculating genome-wide significance thresholds in aPoolseq framework currently there is no consensus on anappropriate standardized statistical method Methods in-clude simulation approaches (Orozco-terWengel et al2012 Bastide et al 2013 Burke et al 2014) correctionusing a null distribution based on estimating genetic driftfrom replicated artificial selection line allele frequency vari-ance (Turner and Miller 2012) use of a simple thresholdbased on a minimum allele frequency difference (Turneret al 2010) and the FDR correction suggested by Storeyand Tibshirani (2003) (used by Fabian et al 2012)

We determined based on our design to calculate the FDRthreshold by comparison with a simulated distribution of nullP values following the approach of Bastide et al (2013) Foreach SNP locus a P value was calculated by simulating thedistribution of read counts between the major and minorallele according to a beta-binomial distribution with meana(a+b) and variance (ab)((a+b)2(a+b+1)) Thismodel allowed for two types of sampling variationStochastic variation in allele sampling across the phenotypesand the background variation in the representation of allelesin each pool caused by library preparation artifacts or sto-chastic variation driven by unequal coverage of the poolsThese sources of variation have also been recognized else-where as being important in interpreting pooled sequenceallele calls and identifying significant differentiation betweenpooled samples (Lynch et al 2014) The value of afrac14 37 waschosen by chi-square test as the best match to the observedP value distribution (afrac14 10 to afrac14 40 were tested) with acorresponding bfrac14 4315 to reflect the coverage depth differ-ence between the random control (487) and desiccation-resistant (568) pools These values were then used tosimulate a null data set 10 the size of the observed data

11

set Because the null P value distribution still did not fit theobserved P value distribution particularly well (supplemen-tary fig S2 Supplementary Material online) these values werenot used for a standard FDR correction Instead a P valuethreshold was calculated based on the lowest 005 of thenull P values similar in approach to Orozco-terWengel et al(2012)

Inversion Analysis

In D melanogaster several cosmopolitan inversion polymor-phisms show cross-continent latitudinal patterns that explaina substantial proportion of clinal variation for many traitsincluding thermal tolerance (reviewed in Hoffmann andWeeks 2007) Patterns of disequilibria generated betweenloci in the vicinity of the rearrangement can obscure signalsof selection (Hoffmann and Weeks 2007) and inversion fre-quencies are routinely considered in genomic studies ofnatural populations sampled from known latitudinal clines(Fabian et al 2012 Kapun et al 2014) We assessedassociations between inversion frequencies and our controland resistant pools at SNPs diagnostic for the following in-versions In3R(Payne) (Anderson et al 2005 Kapun et al2014) In(2L)t (Andolfatto et al 1999 Kapun et al 2014)In(2R)Ns In(3L)P In(3R)C In(3R)K and In(3R)Mo (Kapunet al 2014) Between one and five diagnostic SNPs were in-vestigated for each inversion (depending on availability) Weconsidered an inversion to be associated with desiccationresistance if the frequency of its diagnostic SNP differed sig-nificantly between the control and desiccation-resistant poolsby a Fisherrsquos exact test

Genome Feature Annotation

The list of candidate SNPs was converted to vcf format using acustom R script setting the alternate allele as the allele thatincreased in frequency from the control to the desiccation-resistant pool and setting the reference allele as the otherallele present in cases where both the major and minor allelesdiffered from the reference genome The genomic features inwhich the candidate SNPs occurred were annotated usingsnpEff v40 first creating a database from the D melanogasterv553 genome annotation with the default approach(Cingolani et al 2012) The default annotation settings wereused except for setting the parameter ldquo-ud 0rdquo to annotateSNPs to genes only if they were within the gene body Using2 tests on SNP counts we tested for over- or under-enrichment of feature types by calculating the proportionof lsquoallrsquo features from all SNPs detected in the study and theproportion of ldquocandidaterdquo SNPs identified as differentiated indesiccation-resistant flies (Fabian et al 2012)

GO Analysis

GO analyses was used to associate the candidates with theirbiological functions (Ashburner et al 2000) Previously devel-oped for biological pathway analysis of global gene expressionstudies (Wang et al 2010) typical GO analysis assumes that allgenes are sampled independently with equal probabilityGWAS violate these criteria where SNPs are more likely to

occur in longer genes than in shorter genes resulting in highertype I error rates in GO categories overpopulated with longgenes Gene length as well as overlapping gene biases (Wanget al 2011) were accounted for using Gowinda (Kofler andSchleurootterer 2012) which implements a permutation strategyto reduce false positives Gowinda does not reconstruct LDbetween SNPs but operates in two modes underpinned bytwo extreme assumptions 1) Gene mode (assumes all SNPs ina gene are in LD) and 2) SNP mode (assumes all SNPs in agene are completely independent)

The candidates were examined in both modes but giventhe stringency of gene mode high-resolution SNP modeoutput was utilized for the final analysis with the followingparameters 106 simulations minimum significance frac141 andminimum genes per GO categoryfrac14 3 (to exclude gene-poorcategories) P values were corrected using the Gowinda FDRalgorithm GO annotations from Flybase r557 were applied

Candidate Gene Analysis

In conjunction with the GO analysis we also examinedknown candidate genes associated with survival under lowRH to better characterize possible biological functions of ourGWAS candidates We took a conservative approach to can-didate gene assignment where the SNPs were mapped withinthe entire gene region without up- and downstream exten-sions Genes were curated from FlyBase and the literature(table 1) predominantly using detailed functional experimen-tal evidence as criteria for functional desiccation candidates

Network Analysis

We next obtained a higher level summary of the candidategene list using PPI network analysis The full candidate genelist was used to construct a first-order interaction networkusing all Drosophila PPIs listed in the Drosophila InteractionDatabase v 2014_10 (avoiding interologs) (Murali et al 2011)This ldquofullrdquo PPI network includes manually curated data frompublished literature and experimental data for 9633 proteinsand 93799 interactions For the network construction theigraph R package was used to build a subgraph from the fullPPI network containing the candidate proteins and their first-order interacting proteins (Csardi and Nepusz 2006) Self-connections were removed Functional enrichment analysiswas conducted on the list of network genes using FlyMinev400 (wwwflymineorg last accessed January 13 2016) withthe following parameters for GO enrichment and KEGGReactome pathway enrichment BenjaminindashHochberg FDRcorrection P lt 005 background frac14 all D melanogastergenes in our full PPI network and Ontology (for GO enrich-ment only) frac14 biological process or molecular function

Cross-Study Comparison Overlap with Our PreviousWork

Following on from the comparison between our gene list andthe candidate genes suggested from the literature we quan-titatively evaluated the degree of overlap between the candi-date genes identified in this study with alleles mapped in ourprevious study of flies collected from the same geographic

12

region (Telonis-Scott et al 2012) The ldquooverlaprdquo analysis wasperformed similarly for the GWAS candidate gene and net-work-level approaches First the gene list overlap was testedbetween the two studies using Fisherrsquos exact test where the382 genes identified from the candidate SNPs (this study)were compared with 416 genes identified from candidatesingle feature polymorphisms (Telonis-Scott et al 2012)The contingency table was constructed using the numberof annotated genes in the D melanogaster R553 genomebuilt as an approximation for the total number of genestested in each study and took the following form

where A frac14 gene list from Telonis-Scott et al (2012)n frac14 416 B frac14 gene list from this study n frac14 382 N frac14 totalnumber of genes in D melanogaster reference genome r553n frac14 17106

To annotate genome features in the overlap candidategene list we used the SNP data (this study) which betterresolves alleles than array-based genotyping Due to this lim-itation we did not compare exact genomic coordinates be-tween the studies but considered overlap to occur whendifferentiated alleles were detected in the same gene inboth studies We did however examine the allele frequenciesaround the Dys-RF promoter which was sequenced sepa-rately revealing a selective sweep in the experimental evolu-tion lines (Telonis-Scott et al 2012)

The overlapping candidates were tested for functional en-richment using a basic gene list approach As genes ratherthan SNPs were analyzed here Fly Mine v400 (wwwflymineorg) was applied with the following parameters Gene lengthnormalization BenjaminindashHochberg FDR correction Plt 005Ontology frac14 biological process and default background (allGO-annotated D melanogaster genes)

Finally we constructed a second PPI network from thecandidate genes mapped in Telonis-Scott et al (2012) to ex-amine system-level overlap between the two studies The 416genes mapped to 337 protein seeds and the network wasconstructed as described above The degree of network over-lap between the two studies was tested using simulation Ineach iteration a gene set of the same length as the list fromthis study and a gene set of the same length as the list fromTelonis-Scott et al (2012) were resampled randomly from theentire D melanogaster mapped gene annotation (r553) Eachresampled set was then used to build a first-order network aspreviously described The overlap network between the tworesampled sets was calculated using the graphintersectioncommand from the igraph package and zero-degree nodes(proteins with no interactions) were removed Overlap wasquantified by counting the number of nodes and the numberof edges in this overlap network The observed overlap wascompared with a simulated null distribution for both nodeand edge number where n frac14 1000 simulation iterations

Supplementary MaterialSupplementary figures S1ndashS4 and tables S1ndashS5 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

We thank Robert Good for the fly sampling and JenniferShirriffs and Lea and Alan Rako for outstanding assistancewith laboratory culture and substantial desiccation assaysWe are grateful to Melissa Davis for advice on network build-ing and comparison approaches and to five anonymous re-viewers whose comments improved the manuscript Thiswork was supported by a DECRA fellowship DE120102575and Monash University Technology voucher scheme toMTS an ARC Future Fellowship and Discovery Grants toCMS an ARC Laureate Fellowship to AAH and a Scienceand Industry Endowment Fund grant to CMS and AAH

ReferencesAlexander JM Lomvardas S 2014 Nuclear architecture as an epigenetic

regulator of neural development and function Neuroscience26439ndash50

Anderson AR Hoffmann AA McKechnie SW Umina PA Weeks AR2005 The latitudinal cline in the In(3R)Payne inversion polymor-phism has shifted in the last 20 years in Australian Drosophilamelanogaster populations Mol Ecol 14851ndash858

Andolfatto P Wall JD Kreitman M 1999 Unusual haplotype structureat the proximal breakpoint of In(2L)t in a natural population ofDrosophila melanogaster Genetics 1531297ndash1311

Andrews S 2014 FastQC Available from httpwwwbioinformaticsbbsrcacukprojectsfastqc

Ashburner M Ball CA Blake JA Botstein D Butler H Cherry JM DavisAP Dolinski K Dwight SS Eppig JT et al 2000 Gene ontology toolfor the unification of biology The Gene Ontology Consortium NatGenet 2525ndash29

Aytes A Mitrofanova A Lefebvre C Alvarez Mariano J Castillo-MartinM Zheng T Eastham James A Gopalan A Pienta Kenneth J ShenMM et al 2014 Cross-species regulatory network analysis identifiesa synergistic interaction between FOXM1 and CENPF that drivesprostate cancer malignancy Cancer Cell 25638ndash651

Bastide H Betancourt A Nolte V Tobler R Steuroobe P Futschik ASchleurootterer C 2013 A genome-wide fine-scale map of natural pig-mentation variation in Drosophila melanogaster PLoS Genet9e1003534

Bergland AO Behrman EL OrsquoBrien KR Schmidt PS Petrov DA 2014Genomic evidence of rapid and stable adaptive oscillations overseasonal time scales in Drosophila PLoS Genet 10e1004775

Bonebrake TC Mastrandrea MD 2010 Tolerance adaptation and pre-cipitation changes complicate latitudinal patterns of climate changeimpacts Proc Natl Acad Sci U S A 10712581ndash12586

Bradley TJ 2009 Animal osmoregulation Oxford Oxford UniversityPress

Burke MK Liti G Long AD 2014 Standing genetic variation drivesrepeatable experimental evolution in outcrossing populations ofSaccharomyces cerevisiae Mol Biol Evol 313228ndash3239

Chown SL 2012 Trait-based approaches to conservation physiologyforecasting environmental change risks from the bottom upPhilos Trans R Soc B 3671615ndash1627

Chown SL Nicolson SW 2004 Insect physiological ecology mechanismsand patterns Oxford Oxford University Press

Chown SL Soslashrensen JG Terblanche JS 2011 Water loss in insectsan environmental change perspective J Insect Physiol571070ndash1084

Not in GWAS In GWAS

Not in Telonis-Scott et al (2012) N ndash intersect-AndashB A-intersect

In Telonis-Scott et al (2012) B-intersect Intersect(AB)

13

Chung H Loehlin DW Dufour HD Vaccarro K Millar JG Carroll SB2014 A single gene affects both ecological divergence and matechoice in Drosophila Science 3431148ndash1151

Cingolani P Platts A Wang LL Coon M Nguyen T Wang L Land SJ LuX Ruden DM 2012 A program for annotating and predicting theeffects of single nucleotide polymorphisms SnpEff SNPs in thegenome of Drosophila melanogaster strain w1118 iso-2 iso-3 Fly(Austin) 680ndash92

Civelek M Lusis AJ 2014 Systems genetics approaches to understandcomplex traits Nat Rev Genet 1534ndash48

Clusella-Trullas S Blackburn TM Chown SL 2011 Climatic predictors oftemperature performance curve parameters in ectotherms implycomplex responses to climate change Am Nat 177738ndash751

Csardi G Nepusz T 2006 The igraph software package for complexnetwork research InterJournal Complex Systems 1695 Availablefrom httpigraphorg

Davies SA Cabrero P Overend G Aitchison L Sebastian S Terhzaz SDow JA 2014 Cell signalling mechanisms for insect stress toleranceJ Exp Biol 217119ndash128

Davies SA Cabrero P Povsic M Johnston NR Terhzaz S Dow JA 2013Signaling by Drosophila capa neuropeptides Gen Comp Endocrinol18860ndash66

Davies SA Overend G Sebastian S Cundall M Cabrero P Dow JATerhzaz S 2012 Immune and stress response lsquocross-talkrsquo in theDrosophila Malpighian tubule J Insect Physiol 58488ndash497

Day JP Dow JA Houslay MD Davies SA 2005 Cyclic nucleotide phos-phodiesterases in Drosophila melanogaster Biochem J 388333ndash342

de Nadal E Ammerer G Posas F 2011 Controlling gene expression inresponse to stress Nat Rev Genet 12833ndash845

DePristo MA Banks E Poplin R Garimella KV Maguire JR Hartl CPhilippakis AA del Angel G Rivas MA Hanna M et al 2011 Aframework for variation discovery and genotyping using next-gen-eration DNA sequencing data Nat Genet 43491ndash498

Djawdan M Chippindale AK Rose MR Bradley TJ 1998 Metabolic re-serves and evolved stress resistance in Drosophila melanogasterPhysiol Zool 71584ndash594

Durham MF Magwire MM Stone EA Leips J 2014 Genome-wide anal-ysis in Drosophila reveals age-specific effects of SNPs on fitness traitsNat Commun 54338

Emilsson V Thorleifsson G Zhang B Leonardson AS Zink F Zhu JCarlson S Helgason A Walters GB Gunnarsdottir S et al 2008Genetics of gene expression and its effect on disease Nature452423ndash428

Fabian DK Kapun M Nolte V Kofler R Schmidt PS Schleurootterer C FlattT 2012 Genome-wide patterns of latitudinal differentiation amongpopulations of Drosophila melanogaster from North America MolEcol 214748ndash4769

Foley BR Telonis-Scott M 2011 Quantitative genetic analysis suggestscausal association between cuticular hydrocarbon composition anddesiccation survival in Drosophila melanogaster Heredity 10668ndash77

Folk DG Han C Bradley TJ 2001 Water acquisition and partitioning inDrosophila melanogaster effects of selection for desiccation-resis-tance J Exp Biol 2043323ndash3331

Fowler MA Montell C 2013 Drosophila TRP channels and animal be-havior Life Sci 92394ndash403

Fu J Keurentjes JJB Bouwmeester H America T Verstappen FWA WardJL Beale MH de Vos RCH Dijkstra M Scheltema RA et al 2009System-wide molecular evidence for phenotypic buffering inArabidopsis Nat Genet 41166ndash167

Futschik A Schleurootterer C 2010 The next generation of molecular mar-kers from massively parallel sequencing of pooled DNA samplesGenetics 186207ndash218

Gibbs AG Fukuzato F Matzkin LM 2003 Evolution of water conserva-tion mechanisms in Drosophila J Exp Biol 2061183ndash1192

Gibbs AG Matzkin LM 2001 Evolution of water balance in the genusDrosophila J Exp Biol 2042331ndash2338

Goodstadt L 2010 Ruffus a lightweight Python library for computa-tional pipelines Bioinformatics 262778ndash2779

Hadley NF 1994 Water relations of terrestrial arthropods San Diego(CA) Academic Press

Hoffmann AA Hallas R Sinclair C Mitrovski P 2001 Levels of variationin stress resistance in Drosophila among strains local populationsand geographic regions patterns for desiccation starvation coldresistance and associated traits Evolution 551621ndash1630

Hoffmann AA Parsons PA 1989a An integrated approach to environ-mental stress tolerance and life-history variation desiccation toler-ance in Drosophila Biol J Linn Soc 37117ndash136

Hoffmann AA Parsons PA 1989b Selection for increased desiccationresistance in Drosophila melanogaster additive genetic control andcorrelated responses for other stresses Genetics 122837ndash845

Hoffmann AA Sgro CM 2011 Climate change and evolutionary adap-tation Nature 470479ndash485

Hoffmann AA Weeks AR 2007 Climatic selection on genes and traitsafter a 100 year-old invasion a critical look at the temperate-tropicalclines in Drosophila melanogaster from eastern Australia Genetica129133ndash147

Huang W Richards S Carbone MA Zhu D Anholt RR Ayroles JFDuncan L Jordan KW Lawrence F Magwire MM et al 2012Epistasis dominates the genetic architecture of Drosophila quanti-tative traits Proc Natl Acad Sci U S A 10915553ndash15559

Huang Z Tunnacliffe A 2004 Response of human cells to desiccationcomparison with hyperosmotic stress response J Physiol558181ndash191

Huang Z Tunnacliffe A 2005 Gene induction by desiccation stress inhuman cell cultures FEBS Lett 5794973ndash4977

Ichihashi Y Aguilar-Martınez JA Farhi M Chitwood DH Kumar RMillon LV Peng J Maloof JN Sinha NR 2014 Evolutionary develop-mental transcriptomics reveals a gene network module regulatinginterspecific diversity in plant leaf shape Proc Natl Acad Sci U S A111E2616ndashE2621

Itoh M Nanba N Hasegawa M Inomata N Kondo R Oshima MTakano-Shimizu T 2010 Seasonal changes in the long-distance link-age disequilibrium in Drosophila melanogaster J Hered 10126ndash32

Johnson WA Carder JW 2012 Drosophila nociceptors mediate larvalaversion to dry surface environments utilizing both the painless TRPchannel and the DEGENaC subunit PPK1 PLoS One 7e32878

Kahsai L Kapan N Dircksen H Winther AM Nassel DR 2010 Metabolicstress responses in Drosophila are modulated by brain neurosecre-tory cells that produce multiple neuropeptides PLoS One 5e11480

Kapun M van Schalkwyk H McAllister B Flatt T Schleurootterer C 2014Inference of chromosomal inversion dynamics from Pool-Seq datain natural and laboratory populations of Drosophila melanogasterMol Ecol 231813ndash1827

Kawano T Shimoda M Matsumoto H Ryuda M Tsuzuki S Hayakawa Y2010 Identification of a gene Desiccate contributing to desicca-tion resistance in Drosophila melanogaster J Biol Chem28538889ndash38897

Kellermann V van Heerwaarden B Sgro CM Hoffmann AA 2009Fundamental evolutionary limits in ecological traits driveDrosophila species distributions Science 3251244ndash1246

Kofler R Nolte V Schleurootterer C 2016 The impact of library preparationprotocols on the consistency of allele frequency estimates in Pool-Seq data Mol Ecol Resour 16118ndash122

Kofler R Pandey RV Schleurootterer C 2011 PoPoolation2 identifying dif-ferentiation between populations using sequencing of pooled DNAsamples (Pool-Seq) Bioinformatics 273435ndash3436

Kofler R Schleurootterer C 2012 Gowinda unbiased analysis of gene setenrichment for genome-wide association studies Bioinformatics282084ndash2085

Leiserson MDM Eldridge JV Ramachandran S Raphael BJ 2013Network analysis of GWAS data Curr Opin Genet Dev 23602ndash610

Li H Durbin R 2009 Fast and accurate short read alignment withBurrows-Wheeler transform Bioinformatics 251754ndash1760

Li H Handsaker B Wysoker A Fennell T Ruan J Homer N Marth GAbecasis G Durbin R 2009 The Sequence AlignmentMap formatand SAMtools Bioinformatics 252078ndash2079

14

Liu Y Luo J Carlsson MA Neuroassel DR 2015 Serotonin and insulin-likepeptides modulate leucokinin-producing neurons that affectfeeding and water homeostasis in Drosophila J Comp Neurol5231840ndash1863

Lohse M Bolger AM Nagel A Fernie AR Lunn JE Stitt M Usadel B 2012RobiNA a user-friendly integrated software solution for RNA-Seq-based transcriptomics Nucleic Acids Res 40W622ndashW627

Long A Liti G Luptak A Tenaillon O 2015 Elucidating the moleculararchitecture of adaptation via evolve and resequence experimentsNat Rev Genet 16567ndash582

Lynch M Bost D Wilson S Maruki T Harrison S 2014 Population-genetic inference from pooled-sequencing data Genome Biol Evol61210ndash1218

Mackay TFC Stone EA Ayroles JF 2009 The genetics of quantitativetraits challenges and prospects Nat Rev Genet 10565ndash577

Marron MT Markow TA Kain KJ Gibbs AG 2003 Effects of starvationand desiccation on energy metabolism in desert and mesicDrosophila J Insect Physiol 49261ndash270

Matzkin LM Markow TA 2009 Transcriptional regulation of metabo-lism associated with the increased desiccation resistance of thecactophilic Drosophila mojavensis Genetics 1821279ndash1288

Murali T Pacifico S Yu J Guest S Roberts GG Finley RL Jr 2011 DroID2011 a comprehensive integrated resource for protein transcrip-tion factor RNA and gene interactions for Drosophila Nucleic AcidsRes 39D736ndashD743

Nuzhdin SV Turner TL 2013 Promises and limitations of hitchhikingmapping Curr Opin Genet Dev 23694ndash699

Orozco-terWengel P Kapun M Nolte V Kofler R Flatt T Schleurootterer C2012 Adaptation of Drosophila to a novel laboratory environmentreveals temporally heterogeneous trajectories of selected alleles MolEcol 214931ndash4941

Picard Picard A set of Java command line tools for manipulating high-throughput seqeuncing (HTS) data and formats Available fromhttpbroadinstitutegithubiopicard

Qiu Y Tittiger C Wicker-Thomas C Le Goff G Young S Wajnberg EFricaux T Taquet N Blomquist GJ Feyereisen R 2012 An insect-specific P450 oxidative decarbonylase for cuticular hydrocarbon bio-synthesis Proc Natl Acad Sci U S A 10914858ndash14863

Rajpurohit S Oliveira CC Etges WJ Gibbs AG 2013 Functional genomicand phenotypic responses to desiccation in natural populations of adesert Drosophilid Mol Ecol 222698ndash2715

Sarup P Soslashrensen JG Kristensen TN Hoffmann AA Loeschcke V PaigeKN Soslashrensen P 2011 Candidate genes detected in transcriptomestudies are strongly dependent on genetic background PLoS One6e15644

Schiffer M Hangartner S Hoffmann AA 2013 Assessing the relativeimportance of environmental effects carry-over effects and speciesdifferences in thermal stress resistance a comparison ofDrosophilids across field and laboratory generations J Exp Biol2163790ndash3798

Schleurootterer C Kofler R Versace E Tobler R Franssen SU 2015Combining experimental evolution with next-generation sequen-cing A powerful tool to study adaptation from standing geneticvariation Heredity 114431ndash440

Sinclair BJ Gibbs AG Roberts SP 2007 Gene transcription during ex-posure to and recovery from cold and desiccation stress inDrosophila melanogaster Insect Mol Biol 16435ndash443

Sloggett C Wakefield M Philip G Pope B 2014 Rubramdashflexible distrib-uted pipelines figshare Available from httpdxdoiorg106084m9figshare895626

Smit AFA Hubley R Green P 2013-2015 RepeatMasker Open-403Available from httpwwwrepeatmaskerorg

Seurooderberg JA Birse RT Neuroassel DR 2011 Insulin production and signalingin renal tubules of Drosophila is under control of tachykinin-relatedpeptide and regulates stress resistance PLoS One 6e19866

Soslashrensen JG Nielsen MM Loeschcke V 2007 Gene expression profileanalysis of Drosophila melanogaster selected for resistance to envi-ronmental stressors J Evol Biol 201624ndash1636

St Pierre SE Ponting L Stefancsik R McQuilton P 2014 FlyBase102mdashadvanced approaches to interrogating FlyBase Nucleic AcidsRes 42D780ndashD788

Stevenson M Nunes T Heuser C Marshall J Sanchez J Thornton RReiczigel J Robison-Cox J Sebastiani P Solymos P et al 2015 epiRTools for the analysis of epidemiological data R package version09-62 Available from httpCRANR-projectorgpackagefrac14epiR

Storey JD Tibshirani R 2003 Statistical significance for genomewidestudies Proc Natl Acad Sci U S A 1009440ndash9445

Telonis-Scott M Gane M DeGaris S Sgro CM Hoffmann AA 2012 Highresolution mapping of candidate alleles for desiccation resistancein Drosophila melanogaster under selection Mol Biol Evol291335ndash1351

Telonis-Scott M Guthridge KM Hoffmann AA 2006 A new set oflaboratory-selected Drosophila melanogaster lines for the analysisof desiccation resistance response to selection physiology and cor-related responses J Exp Biol 2091837ndash1847

Terhzaz S Cabrero P Robben JH Radford JC Hudson BD Milligan GDow JA Davies SA 2012 Mechanism and function of Drosophilacapa GPCR a desiccation stress-responsive receptor with functionalhomology to human neuromedinU receptor PLoS One 7e29897

Terhzaz S Overend G Sebastian S Dow JA Davies SA 2014 The Dmelanogaster capa-1 neuropeptide activates renal NF-kB signalingPeptides 53218ndash224

Terhzaz S Teets NM Cabrero P Henderson L Ritchie MG Nachman RJDow JA Denlinger DL Davies SA 2015 Insect capa neuropeptidesimpact desiccation and cold tolerance Proc Natl Acad Sci U S A1122882ndash2887

Thorat LJ Gaikwad SM Nath BB 2012 Trehalose as an indicator ofdesiccation stress in Drosophila melanogaster larvae a potentialmarker of anhydrobiosis Biochem Biophys Res Commun419638ndash642

Tobler R Franssen SU Kofler R Orozco-terWengel P Nolte VHermisson J Schl eurootterer C 2014 Massive habitat-specific ge-nomic response in D melanogaster populations during exper-imental evolution in hot and cold environments Mol Biol Evol31364ndash375

Turner TL Bourne EC Von Wettberg EJ Hu TT Nuzhdin SV 2010Population resequencing reveals local adaptation of Arabidopsislyrata to serpentine soils Nat Genet 42260ndash263

Turner TL Miller PM 2012 Investigating natural variation in Drosophilacourtship song by the evolve and resequence approach Genetics191633ndash642

Vermeulen CJ Soslashrensen P Kirilova Gagalova K Loeschcke V 2013Transcriptomic analysis of inbreeding depression in cold-sensitiveDrosophila melanogaster shows upregulation of the immune re-sponse J Evol Biol 261890ndash1902

Wang B Goode J Best J Meltzer J Schilman PE Chen J Garza D ThomasJB Montminy M 2008 The insulin-regulated CREB coactivatorTORC promotes stress resistance in Drosophila Cell Metab7434ndash444

Wang J Kean L Yang J Allan AK Davies SA Herzyk P Dow JA 2004Function-informed transcriptome analysis of Drosophila renaltubule Genome Biol 5R69

Wang K Li M Hakonarson H 2010 Analysing biological pathways ingenome-wide association studies Nat Rev Genet 11843ndash854

Wang L Jia P Wolfinger RD Chen X Zhao Z 2011 Gene set analysis ofgenome-wide association studies methodological issues and per-spectives Genomics 981ndash8

Weber AL Khan GF Magwire MM Tabor CL Mackay TF Anholt RR2012 Genome-wide association analysis of oxidative stress resistancein Drosophila melanogaster PLoS One 7e34745

Weeks AR McKechnie SW Hoffmann AA 2002 Dissecting adaptiveclinal variation markers inversions and sizestress associations inDrosophila melanogaster from a central field population Ecol Lett5756ndash763

Zhu Y Bergland AO Gonzalez J Petrov DA 2012 Empirical validation ofpooled whole genome population re-sequencing in Drosophila mel-anogaster PLoS One 7e41901

15

Page 6: Cross-Study Comparison Reveals Common Genomic, Network, … · 2016-02-06 · resistance using a high-throughput sequencing Pool-GWAS (genome-wide association study) approach ( Bastide

out of the 382 and 416 genes for the current and 2012 studyrespectively (Fisherrsquos exact test for overlap P lt 2 1016odds ratio frac14 590 supplementary table S4 SupplementaryMaterial online) Bearing gene length bias in mind we exam-ined the average length of genes in both studies as well as thegene overlap to the average gene length in the Drosophilagenome (supplementary fig S3 Supplementary Materialonline) Although there was bias toward longer genes in theoverlap list this was also apparent in the full gene lists fromboth studies

The overlap genes mapped to all of the major chromo-somes and their differentiated variants were mostly noncod-ing SNPs located in introns (supplementary table S4Supplementary Material online) Two SNPs were synonymous(Strn-Mlick coding for a protein kinase and Cht7 chitinbinding supplementary table S4 Supplementary Materialonline) while a 30UTR variant was predicted to putativelyalter protein structure in a nondisruptive manner mostlikely impacting expression andor protein translational effi-ciency (jbug response to stimulus supplementary table S4Supplementary Material online) One nonsynonymous SNPwas predicted to alter protein function as a missense variant(fab1 signaling supplementary table S4 SupplementaryMaterial online) The increase in frequency of the favoreddesiccation alleles compared with the controls ranged from5 to 27 with a median increase of 164

Despite the strong overlap there were important differ-ences between the two studies For example our previouswork revealed evidence of a selective sweep in the 50 pro-moter region of the Dys gene in the artificially selected linesand we confirmed a higher frequency of the selected SNPs inan independent natural association study in resistant fliesfrom Coffs Harbour (Telonis-Scott et al 2012) In this studywe found no differentiation at the Dys locus between ourresistant and random controls However both resistant andcontrol pools had a high frequency of the selected allele de-tected in Telonis-Scott et al (2012) (supplementary table S5Supplementary Material online)

The 45 genes common to both studies were analyzed forbiological function while many genes are obviously pleiotro-pic (ie assigned multiple GO terms) almost half were in-volved in response to stimuli (cellular behavioral andregulatory) including signaling such as Rho protein signaltransduction and cAMP-mediated signaling (table 3) Othercategories of interest include spiracle morphogenesis sensoryorgan development including development of the compoundeye and stem cell and neuroblast differentiation (table 3)

With regard to specific stress responsedesiccation candi-dates (table 1) 4 of the 45 genes overlapped including thecAMPcGMP signaling phosphodiesterases pde9 and dncMAPK pathway genes Mtl and klu indicated in NF-kBorthologue signaling in the renal tubules (table 1)Furthermore there was overlap among mechanistic catego-ries where different genes in the same pathways weredetected For example additional key cAMPcGMP hydrolyz-ing phosphodiesterases pde1c and pde11 were significant inTelonis-Scott et al (2012) as well as slpr (MAPK pathway)and poly (insulin signaling) (table 1)

Table 2 Pathway Enrichment of the GWAS and Telonis-Scott et alFirst-Order PPI Networks Investigated Using FlyMine (FDR P lt 005)

Current Study GWAS Telonis-Scott et al(2012)

Database Pathway Pathway

KEGG Ribosome RibosomeNotch signaling pathway Notch signaling pathwayMAPK signaling pathwayDorso-ventral axis formationProgesterone-mediated oocyte

maturationWnt signaling pathwayEndocytosisProteasome

Reactome Immune system Immune systemAdaptive immune system Adaptive immune systemInnate immune system Innate immune systemSignaling by interleukins Signaling by interleukins(Toll-like receptor cascades) Toll-like receptor

cascadesTCR signaling(B cell receptor signaling)

Gene expression Gene expressionMetabolism of RNA Metabolism of RNANonsense-mediated decay Nonsense-mediated decayTranscriptionmRNA processing(RNA transport)Metabolism of proteins (Metabolism of

proteins)Translation (Translation)Cell cycle Cell cycle

Cell cycle checkpointsRegulation of mitotic cell

cycleMitotic metaphase and

anaphaseG0 and early G1 G0 and early G1

G1 phase(G1S transition)G2M transition(DNA replicationS

phase)DNA repairDouble-strand break

repairSignaling pathways Signaling pathwaysInsulin signaling pathwaySignaling by NGF Signaling by NGF

Signaling by FGFRSignaling by NOTCHSignaling by WntSignaling by EGFRDevelopmentAxon guidanceMembrane traffickingGap junction trafficking

and regulationProgramed cell deathApoptosisHemostasisHemostasis

NOTEmdashKEGG and Reactome pathway databases were queried All significant termsare reported with Reactome terms grouped together under parent terms in italictext (full list of terms available in supplementary table S3 Supplementary Materialonline) Parent terms without brackets are the names of pathways that were sig-nificantly enriched parent terms in parentheses were not listed as significantlyenriched themselves but are reported as a guide to the category contents

6

In addition to the significant gene-level overlap betweenthe two studies we observed significant overlap between thePPI networks constructed from each set of candidate genes(summarized in fig 2) We examined the overlap by compar-ing the first-order networks 1) at the level of node overlap and2) at the level of edge (interaction) overlap Three hundredseven GWAS seeds formed a network of 3324 nodes whileour previous set of candidates mapped to 337 seeds forming anetwork that comprised 3215 nodes One thousand eighthundred twenty-six or ~55 of the nodes overlapped be-tween networks which was significantly higher than the nulldistribution estimated from 1000 simulations (mean simu-lated overlap SD 1304 143 nodes observed overlap atthe 999th percentile fig 2A and C) The overlap at the inter-action level was not significantly higher than expected (meansimulated overlap SD 25704 4024 edges observed over-lap 30686 edges 89th percentile fig 2B and D) Hive plotswere used to visualize the separate networks (supplementaryfig S4A and B Supplementary Material online) as well as theoverlap between the networks (supplementary fig S4CSupplementary Material online)

Given the high degree of shared nodes from distinct setsof protein seeds we compared the networks for biologicalfunction (fig 2E) Both networks produced long lists of sig-nificantly enriched GO terms (supplementary table S3Supplementary Material online) and assessing overlap was

difficult due to the nonindependent nature of GO hierar-chies However a comparison of functional pathway enrich-ment revealed some differences Although both networksshowed enrichment in the immune system pathways theribosome and Notch and NGF signaling pathways as well assome involvement of gene expression and translation path-ways the network constructed from the Telonis-Scott et al(2012) gene list showed extensive involvement of cell cyclepathway proteins as well as significant enrichment for Wntsignaling EGRF signaling DNA double-strand break repairaxon guidance gap junction trafficking and apoptosis path-ways which were not enriched in the GWAS network TheGWAS network was specifically enriched for dorso-ventralaxis formation oocyte maturation and many more geneexpression pathways than the Telonis-Scott et al (2012)gene network (table 2 and supplementary table S3Supplementary Material online)

Discussion

Genomic Signature of Desiccation Resistance fromNatural Standing Genetic Variation

We report the first large-scale genome-wide screen for nat-ural alleles associated with survival of low humidity condi-tions in D melanogaster By combining a powerful naturalassociation study with high-throughput Pool-seq we

Table 3 Overrepresentation of GO Categories for the 45 Core Candidate Genes Overlapping between the Current and Our Previous(Telonis-Scott et al 2012) Genomic Study

Term ID TermSubterms FDR P valuea Genes

Cellularbehavioral response to stimulus defense and signaling

GO0050896 Response to stimulus (cellular response tostimulus regulation of response tostimulus)

0007 Abd-B dnc fz mam nkd slo ct fru kluRbf klg santa-maria jbug fab1CG33275 Mtl Trim9 cv-c Gprk2 Ect4CG43658

GO0048585 Negative regulation of response to stimu-lus (Rho protein signal transductionsignal transduction)

0042 dnc mam slo fru Trim9 cv-c

GO0023052 Signaling (cAMP-mediated signaling cellcommunication)

0031 Amph Pde9 dnc fz mam nkd ct kluRbf santa-maria fab1 CG33275 MtlTrim9 cv-c Gprk2 Ect4 CG43658

Development

GO0048863 Stem cell differentiation 0014 Abd-D mam nkd klu nub

GO0014016 Neuroblast differentiation 0048

GO0045165 Cell fate commitment 0010 Abd-D dnc fz mam nkd ct klu klg nub

GO0035277 Spiracle morphogenesis open trachealsystem

0010 Abd-D ct cv-c

GO0007423 Sensory organ development (eye wingdisc imaginal disc)

0008 fz mam ct klu klg sdk Amph Nckx30cMtl

GO0048736 Appendage development 0020 fz mam ct fru CG33275 Mtl Fili nubcv-c Gprk2 CG43658

GO0007552 Metamorphosis 0001 fz mam ct fru CG33275 Mtl Fili cv-cGprk2 CG43658

Cellular component organization

GO0030030 Cell projection organization 0020 dnc fz ct fru GluClalpha Nckx30c MtlTrim9 nub cv-c

NOTEmdashNote that most genes are assigned to multiple annotations and therefore are shown purely as a guide to putative functionaBonferronindashHochberg FDR and gene length corrected P values using the Flymine v400 database Gene Ontology Enrichment The terms were condensed and trimmed usingREVIGO

7

mapped SNPs associated with desiccation resistance in 500naturally derived genotypes Although Pool-seq has limitedpower to reveal very low frequency alleles to reconstructphase or adequately estimate allele frequencies in smallpools (Lynch et al 2014) the Pool-GWAS approach hasbeen validated by retrieval of known candidate genes in-volved in body pigmentation using similar numbers of ge-notypes as assessed here (Bastide et al 2013)

Although studies in controlled genetic backgrounds reportlarge effects of single genes on desiccation resistance (Table 1)our current and previous screens revealed similarly largenumbers of variants (over 600) mapping to over 300 genesA polygenic basis of desiccation resistance is consistent withquantitative genetic data for the trait (Foley and Telonis-Scott2011) and we also anticipate more complexity from naturalD melanogaster populations with greater standing geneticvariation than inbred laboratory strains However somefalse positive associations are likely in the GWAS due to thelimitations of quantitative trait mapping approaches Pooledevolve-and-resequence experiments routinely yield vast num-bers of candidate SNPs partly due to high levels of standinglinkage disequilibrium (LD) and hitchhiking neutral alleles

(Schleurootterer et al 2015) Large population sizes from specieswith low levels of LD such as D melanogaster somewhat cir-cumvent this issue and a notable feature of our experimentaldesign is our sampling Here we selected the most desicca-tion-resistant phenotypes directly from substantial standinggenetic variation where one generation of recombinationshould largely reflect natural haplotypes

We observed the largest candidate allele frequency differ-ences in loci that had intermediate allele frequencies in thecontrol pool This is in line with population genetics modelswhich predict that evolution from standing genetic variationwill initially be strongest for intermediate frequency alleleswhich can facilitate rapid change (Long et al 2015) Howeverwe also observed a number of low frequency alleles which ifbeneficial can inflate false positives due to long-range LD(Orozco-terWengel et al 2012 Nuzhdin and Turner 2013Tobler et al 2014) Nonetheless spurious LD associationscannot fully account for our candidate list as evidenced bythe desiccation functional candidates and cross-study over-lap Furthermore chromosome inversion dynamics can alsogenerate both short and long-range LD and contribute nu-merous false positive SNP phenotype associations (Hoffmann

Fig 2 Observed versus simulated network overlap between the GWAS first-order PPI network and the first-order PPI network from the Telonis-Scottet al (2012) gene list Overlap is presented in terms of (A) node number and (B) edge number Networks were simulated by resampling from the entireDrosophila melanogaster gene list (r553) as described in the text The histogram shows the distribution of overlap measures for 1000 simulations Theblack line represents the histogram as density and the blue line shows the corresponding normal distribution The red vertical line shows the observedoverlap from the real networks labeled as a percentile of the normal distribution Area-proportional Venn diagrams summarized the extent of overlapfor PPI networks constructed separately from the GWAS candidates and Telonis-Scott et al (2012) candidates (labeled ldquoprevious studyrdquo) (C) Overlapexpressed as the number of nodes Approximately 55 of the nodes overlapped between the two studies which was significantly higher than simulatedexpectations (999 percentile 2B) (D) Overlap expressed in the number of interactions This was not significantly higher than expected by simulation(89th percentile 2B) (E) Overlap at the level of GO biological process terms (directly compared GO terms by name this was not tested statisticallybecause of the complex hierarchical nature of GO terms and is presented for interest)

8

and Weeks 2007 Tobler et al 2014) but we found little as-sociation between the desiccation pool and inversion fre-quency changes consistent with the lack of latitudinalpatterns in desiccation resistanceand traitinversion fre-quency associations in east Australian D melanogaster(Hoffmann et al 2001 Weeks et al 2002)

Cross-Study Comparison Reveals a Core Set ofCandidates for a Complex Trait

Our data contain many candidate genes functions and path-ways for insect desiccation resistance but partitioning genu-ine adaptive loci from experimental noise remains asignificant challenge We therefore focus our efforts on ournovel comparative analysis where we identified 45 robustcandidates for further exploration Common genetic signa-tures are rarely documented in cross-study comparisons ofcomplex traits (Sarup et al 2011 Huang et al 2012 but cfVermeulen et al 2013) and never before for desiccation re-sistance Some degree of overlap between our studies likelyresults from a shared genetic background from similar sam-pling sites highlighting the replicability of these responsesOne limitation of this approach is the overpopulation ofthe ldquocorerdquo list by longer genes This result reflects the genelength bias inherent to GWAS approaches an issue that is notstraightforward to correct in a GWAS let alone in the com-parative framework applied here A focus on the commongenes functionally linked to the desiccation phenotype canprovide a meaningful framework for future research and alsosuggests that some large-effect candidate genes exhibit natu-ral variation contributing to the phenotype (table 1)

Other feasible candidates include genes expressed inthe MTs (Wang et al 2004) or essential for MT develop-ment (St Pierre et al 2014) providing further researchopportunity focused on the primary stress sensing andfluid secretion tissues Genes include one of the mosthighly expressed MT transcripts CG7084 as well as dncNckx30C slo cv-c and nkd Other aspects of stress resis-tance are also implicated CrebB for example is involvedin regulating insulin-regulated stress resistance and ther-mosensory behavior (Wang et al 2008) nkd also impactslarval cuticle development and shares a PDZ signalingdomain also present in the desiccation candidate desi(Kawano et al 2010 St Pierre et al 2014) Finally severalcore candidates were consistently reported in recent stud-ies suggesting generalized roles in stress responses andfitness including transcriptome studies on MT function(Wang et al 2004) inbreeding and cold resistance(Vermeulen et al 2013) oxidative stress (Weber et al2012) and genomic studies exploring age-specific SNPson fitness traits (Durham et al 2014) and genes likelyunder spatial selection in flies from climatically divergenthabitats (Fabian et al 2012)

Several of the enriched GO categories from the core listalso make biological sense including stimulus response(almost half the suite) defense and signaling particularlyin pathways involved in MT stress sensing (Davies et al 20122013 2014) Functional categories enriched for sensory

organ development were highlighted consistent with envi-ronmental sensing categories from desert Drosophila tran-scriptomics (Matzkin and Markow 2009 Rajpurohit et al2013) and stress detection via phototransduction pathwaysin D melanogaster under a range of stressors including des-iccation (Soslashrensen et al 2007)

Evidence for Conserved Signatures at Higher LevelOrganization

Beyond the gene level we observed a striking degree ofsimilarity between this study and that of Telonis-Scottet al (2012) at the network level where significantly over-lapping protein networks were constructed from largely dif-ferent seed protein sets This supports a scenario where theldquobiological information flow from DNA to phenotyperdquo(Civelek and Lusis 2014) contains inherent redundancywhere alternative genetic solutions underlie phenotypesand functions Resistance to perturbation by genetic or en-vironmental variation appears to increase with increasinghierarchical complexity that is phenotype ldquobufferingrdquo (Fuet al 2009) The same functional network in different pop-ulationsgenetic backgrounds can be impacted but theexact genes and SNPs responding to perturbation will notnecessarily overlap In Drosophila different sets of loci fromthe same founders mapped to the same PPI network in thecase of two complex traits chill coma recovery and startleresponse (Huang et al 2012) Furthermore interspecificshared networkspathways from different genes have beendocumented for numerous complex traits (Emilsson et al2008 Aytes et al 2014 Ichihashi et al 2014) providing op-portunities for investigating taxa where individual genefunction is poorly understood

Although the null distribution is not always obviouswhen comparing networks and their functions partly be-cause of reduced sensitivity from missing interactions(Leiserson et al 2013) here the shared network signaturesportray a plausible multifaceted stress response involvingstress sensing rapid gene accessibility and expression Thepredominance of immune system and stress responsepathways are of interest with reference immunitystresspathway ldquocross-talkrdquo particularly in the MTs whereabiotic stressors elicit strong immunity transcriptionalresponses (Davies et al 2012) The higher order architec-tures reflect crucial aspects of rapid gene expression con-trol in response to stress from chromatin organization(Alexander and Lomvardas 2014) to transcription RNAdecay and translation (reviwed in de Nadal et al 2011) InDrosophila variation in transcriptional regulation of indi-vidual genes can underpin divergent desiccation pheno-types (Chung et al 2014) or be functionally inferred frompatterns of many genes elicited during desiccation stress(Rajpurohit et al 2013) Variants with potential to mod-ulate expression at different stages of gene expressioncontrol were evident in our GWAS data alone for exam-ple enrichment for 50UTR and splice variants at the SNPlevel and for chromatin and chromosome structure termsat the functional level

9

Genetic Overlap Versus Genetic differencesmdashCausesand Implications

Despite the common signatures at the gene and higher orderlevels between our studies pervasive differences were ob-served These can arise from differences in standing geneticvariation in the natural populations where different alleliccompositions impact gene-by-environment interactions andepistasis Our network overlap analyses highlight the potentialfor different molecular trajectories to result in similar func-tions (Leiserson et al 2013) consistent with the multiplephysiological pathways potentially available to D melanoga-ster under water stress (Chown et al 2011) Experimentaldesign also presumably contributed to the study differencesStrong directional selection as applied in the 2012 study canincrease the frequency of rare beneficial alleles and likely re-duced overall diversity than did the single-generation GWASdesign Furthermore selection regimes often impose addi-tional selective pressures (eg food deprivation during desic-cation) resulting in correlated responses such as starvationresistance with desiccation resistance and these factors differfrom the laboratory to the field Finally evidence suggests innatural D melanogaster that standing genetic variation canvary extensively with sampling season (Itoh et al 2010Bergland et al 2014) Although we cannot compare levelsof standing variation between our two studies evidence sug-gests that this may have affected the Dys gene where ourGWAS approach revealed no differentiation between thecontrol and resistant pools at this locus but rather detectedthe alleles associated with the selective sweep in Telonis-Scottet al (2012) in both pools at high frequency Whether thisresulted from sampling the later collection after a longdrought (2004ndash2008 wwwbomgovauclimate last accessedJanuary 13 2016) is unclear but this observation highlightsthe relevance of temporal sampling to standing genetic var-iation when attempting to link complex phenotypes togenotypes

Materials and Methods

Natural Population Sampling

Drosophila melanogaster was collected from vineyard waste atKilchorn Winery Romsey Australia in April 2010 Over 1200isofemale lines were established in the laboratory from wild-caught females Species identification was conducted on maleF1 to remove Drosophila simulans contamination and over900 D melanogaster isofemale lines were retained All flieswere maintained in vials of dextrose cornmeal media at25 C and constant light

Desiccation Assay

From each isofemale line ten inseminated F1 females weresorted by aspiration without CO2 and held in vials until 4ndash5days old For the desiccation assay 10 females per line (over9000 flies in total) were screened by placing groups of femalesinto empty vials covered with gauze and randomly assigninglines into 5 large sealed glass tanks containing trays of silicadesiccant (relative humidity [RH] lt10) The tanks werescored for mortality hourly and the final 558 surviving

individuals were selected for the desiccation-resistant pool(558 flies lt6 of the total) The control pool constitutedthe same number of females sampled from over 200 ran-domly chosen isofemale lines that were frozen prior to thedesiccation assay We chose this random control approachinstead of comparing the phenotypic extremes (ie the latemortality tail vs the ldquoearly mortalityrdquo tail) because unfavor-able environmental conditions experienced by the wild-caught dam may inflate environmental variance due to car-ryover effects in the F1 offspring (Schiffer et al 2013)

DNA Extraction and Sequencing

We used pooled whole-genome sequencing (Pool-seq)(Futschik and Schleurootterer 2010) to estimate and compareallele frequency differences between the most desiccation-resistant and randomly sampled flies from a wild populationto associate naturally segregating candidate SNPs with theresponse to low humidity For each treatment (resistantand control) genomic DNA was extracted from 50 femaleheads using the Qiagen DNeasy Blood and Tissue Kit accord-ing to the manufacturerrsquos instructions (Qiagen HildenGermany) Two extractions were combined to create fivepools per treatment which were barcoded separately tocreate technical replicates (total 10 pools of 100 fliesn frac14 1000) The DNA was fragmented using a CovarisS2 ma-chine (Covaris Inc Woburn MA) and libraries were pre-pared from 1 mg genomic DNA (per pool of 100 flies) usingthe Illumina TruSeq Prep module (Illumina San Diego CA)To minimize the effect of variation across lanes the ten poolswere combined in equal concentration (fig 3) and run mul-tiplexed in each lane of a full flow cell of an Illumina HiSeq2000 sequencer Clusters were generated using the TruSeq PECluster Kit v5 on a cBot and sequenced using Illumina TruSeqSBS v3 chemistry (Illumina) Fragmentation library construc-tion and sequencing were performed at the Micromon Next-Gen Sequencing facility (Monash University ClaytonAustralia)

Fig 3 Experimental design See text for further explanation

10

Data Processing and Mapping

Processing was performed on a high-performance computingcluster using the Rubra pipeline system that makes use of theRuffus Python library (Goodstadt 2010) The final variant-calling pipeline was based on a pipeline developed by ClareSloggett (Sloggett et al 2014) and is publicly available athttpsgithubcomgriffinpGWAS_pipeline (last accessedJanuary 13 2016) Raw sequence reads were processed persample per lane with P1 and P2 read files processed sepa-rately initially Trimming was performed with trimmomatic v030 (Lohse et al 2012) Adapter sequences were also re-moved Leading and trailing bases with a quality scorebelow 30 were trimmed and a sliding window (width frac14 10quality threshold frac14 25) was used Reads shorter than 40 bpafter trimming were discarded Quality before and after trim-ming was examined using FastQC v 0101 (Andrews 2014)

Posttrimming reads that remained paired were used asinput into the alignment step Alignment to the D melano-gaster reference genome version r553 was performed withbwa v 062 (Li and Durbin 2009) using the bwa aln commandand the following options No seeding (-l 150) 1 rate ofmissing alignments given 2 uniform base error rate (-n001) a maximum of 2 gap opens per read (-o 2) a maximumof 12 gap extensions per read (-e 12) and long deletionswithin 12 bp of 30 end disallowed (-d 12) The default valueswere used for all other options Output files were then con-verted to sorted bam files using the bwa sampe commandand the ldquoSortSamrdquo option in Picard v 196

At this point the bam files were merged into one file persample using PicardMerge Duplicates were identified withPicard MarkDuplicates The tool RealignerTargetCreator inGATK v 26-5 (DePristo et al 2011) was applied to the tenbam files simultaneously producing a list of intervals contain-ing probable indels around which local realignment would beperformed This was used as input for the IndelRealigner toolwhich was also applied to all ten bam files simultaneously

To investigate the variation among technical replicatesallele frequency was estimated based on read counts foreach technical replicate separately for 100000 randomly cho-sen SNPs across the genome After excluding candidate des-iccation-resistance SNPs and SNPs that may have been falsepositives due to sequencing error (those with mean frequencylt004 across replicates) 1803 SNPs remained For each locuswe calculated the variance in allele frequency estimate amongthe five control replicates and the variance among the fivedesiccation-resistance replicates We compared this with thevariance due to pool category (control vs desiccation resis-tant) using analysis of variance (ANOVA) We also calculatedthe mean pairwise difference in allele frequency estimateamong control and among desiccation-resistance replicatesand the mean concordance correlation coefficient was calcu-lated over all pairwise comparisons within pool category(using the epiccc function in the epiR package Stevensonet al 2015)

The following approach was then taken to identify SNPsdiffering between the desiccation-resistant and controlgroups Based on the technical replicate analysis we merged

the five ldquodesiccation-resistantrdquo and the five random controlsamples into two bam files A pileup file was created withsamtools v 0119 (Li et al 2009) and converted to a ldquosyncrdquo fileusing the mpileup2sync script in Popoolation2 v 1201 (Kofleret al 2011) retaining reads with a mapping quality of at least20 and bases with a minimum quality score of 20 Reads withunmapped mates were also retained (option ndashA in samtoolsmpileup) For this step we used a repeat-masked version ofthe reference genome created with RepeatMasker 403 (Smitet al 2013-2015) and a repeat file containing all annotatedtransposons from D melanogaster including shared ancestralsequences (Flybase release r557) plus all repetitive elementsfrom RepBase release v1904 for D melanogaster Simple re-peats small RNA genes and bacterial insertions were notmasked (options ndashnolow ndashnorna ndashno_is) The rmblastsearch engine was used

Association Analysis

We then performed Fisherrsquos exact test with the Fisher testscript in Popoolation2 (Kofler et al 2011) We set the min-imum count (of the minor allele) frac14 10 minimum coverage(in both populations) frac14 20 and maximum coverage frac1410000 The test was performed for each SNP (slidingwindow off) Although there are numerous approaches tocalculating genome-wide significance thresholds in aPoolseq framework currently there is no consensus on anappropriate standardized statistical method Methods in-clude simulation approaches (Orozco-terWengel et al2012 Bastide et al 2013 Burke et al 2014) correctionusing a null distribution based on estimating genetic driftfrom replicated artificial selection line allele frequency vari-ance (Turner and Miller 2012) use of a simple thresholdbased on a minimum allele frequency difference (Turneret al 2010) and the FDR correction suggested by Storeyand Tibshirani (2003) (used by Fabian et al 2012)

We determined based on our design to calculate the FDRthreshold by comparison with a simulated distribution of nullP values following the approach of Bastide et al (2013) Foreach SNP locus a P value was calculated by simulating thedistribution of read counts between the major and minorallele according to a beta-binomial distribution with meana(a+b) and variance (ab)((a+b)2(a+b+1)) Thismodel allowed for two types of sampling variationStochastic variation in allele sampling across the phenotypesand the background variation in the representation of allelesin each pool caused by library preparation artifacts or sto-chastic variation driven by unequal coverage of the poolsThese sources of variation have also been recognized else-where as being important in interpreting pooled sequenceallele calls and identifying significant differentiation betweenpooled samples (Lynch et al 2014) The value of afrac14 37 waschosen by chi-square test as the best match to the observedP value distribution (afrac14 10 to afrac14 40 were tested) with acorresponding bfrac14 4315 to reflect the coverage depth differ-ence between the random control (487) and desiccation-resistant (568) pools These values were then used tosimulate a null data set 10 the size of the observed data

11

set Because the null P value distribution still did not fit theobserved P value distribution particularly well (supplemen-tary fig S2 Supplementary Material online) these values werenot used for a standard FDR correction Instead a P valuethreshold was calculated based on the lowest 005 of thenull P values similar in approach to Orozco-terWengel et al(2012)

Inversion Analysis

In D melanogaster several cosmopolitan inversion polymor-phisms show cross-continent latitudinal patterns that explaina substantial proportion of clinal variation for many traitsincluding thermal tolerance (reviewed in Hoffmann andWeeks 2007) Patterns of disequilibria generated betweenloci in the vicinity of the rearrangement can obscure signalsof selection (Hoffmann and Weeks 2007) and inversion fre-quencies are routinely considered in genomic studies ofnatural populations sampled from known latitudinal clines(Fabian et al 2012 Kapun et al 2014) We assessedassociations between inversion frequencies and our controland resistant pools at SNPs diagnostic for the following in-versions In3R(Payne) (Anderson et al 2005 Kapun et al2014) In(2L)t (Andolfatto et al 1999 Kapun et al 2014)In(2R)Ns In(3L)P In(3R)C In(3R)K and In(3R)Mo (Kapunet al 2014) Between one and five diagnostic SNPs were in-vestigated for each inversion (depending on availability) Weconsidered an inversion to be associated with desiccationresistance if the frequency of its diagnostic SNP differed sig-nificantly between the control and desiccation-resistant poolsby a Fisherrsquos exact test

Genome Feature Annotation

The list of candidate SNPs was converted to vcf format using acustom R script setting the alternate allele as the allele thatincreased in frequency from the control to the desiccation-resistant pool and setting the reference allele as the otherallele present in cases where both the major and minor allelesdiffered from the reference genome The genomic features inwhich the candidate SNPs occurred were annotated usingsnpEff v40 first creating a database from the D melanogasterv553 genome annotation with the default approach(Cingolani et al 2012) The default annotation settings wereused except for setting the parameter ldquo-ud 0rdquo to annotateSNPs to genes only if they were within the gene body Using2 tests on SNP counts we tested for over- or under-enrichment of feature types by calculating the proportionof lsquoallrsquo features from all SNPs detected in the study and theproportion of ldquocandidaterdquo SNPs identified as differentiated indesiccation-resistant flies (Fabian et al 2012)

GO Analysis

GO analyses was used to associate the candidates with theirbiological functions (Ashburner et al 2000) Previously devel-oped for biological pathway analysis of global gene expressionstudies (Wang et al 2010) typical GO analysis assumes that allgenes are sampled independently with equal probabilityGWAS violate these criteria where SNPs are more likely to

occur in longer genes than in shorter genes resulting in highertype I error rates in GO categories overpopulated with longgenes Gene length as well as overlapping gene biases (Wanget al 2011) were accounted for using Gowinda (Kofler andSchleurootterer 2012) which implements a permutation strategyto reduce false positives Gowinda does not reconstruct LDbetween SNPs but operates in two modes underpinned bytwo extreme assumptions 1) Gene mode (assumes all SNPs ina gene are in LD) and 2) SNP mode (assumes all SNPs in agene are completely independent)

The candidates were examined in both modes but giventhe stringency of gene mode high-resolution SNP modeoutput was utilized for the final analysis with the followingparameters 106 simulations minimum significance frac141 andminimum genes per GO categoryfrac14 3 (to exclude gene-poorcategories) P values were corrected using the Gowinda FDRalgorithm GO annotations from Flybase r557 were applied

Candidate Gene Analysis

In conjunction with the GO analysis we also examinedknown candidate genes associated with survival under lowRH to better characterize possible biological functions of ourGWAS candidates We took a conservative approach to can-didate gene assignment where the SNPs were mapped withinthe entire gene region without up- and downstream exten-sions Genes were curated from FlyBase and the literature(table 1) predominantly using detailed functional experimen-tal evidence as criteria for functional desiccation candidates

Network Analysis

We next obtained a higher level summary of the candidategene list using PPI network analysis The full candidate genelist was used to construct a first-order interaction networkusing all Drosophila PPIs listed in the Drosophila InteractionDatabase v 2014_10 (avoiding interologs) (Murali et al 2011)This ldquofullrdquo PPI network includes manually curated data frompublished literature and experimental data for 9633 proteinsand 93799 interactions For the network construction theigraph R package was used to build a subgraph from the fullPPI network containing the candidate proteins and their first-order interacting proteins (Csardi and Nepusz 2006) Self-connections were removed Functional enrichment analysiswas conducted on the list of network genes using FlyMinev400 (wwwflymineorg last accessed January 13 2016) withthe following parameters for GO enrichment and KEGGReactome pathway enrichment BenjaminindashHochberg FDRcorrection P lt 005 background frac14 all D melanogastergenes in our full PPI network and Ontology (for GO enrich-ment only) frac14 biological process or molecular function

Cross-Study Comparison Overlap with Our PreviousWork

Following on from the comparison between our gene list andthe candidate genes suggested from the literature we quan-titatively evaluated the degree of overlap between the candi-date genes identified in this study with alleles mapped in ourprevious study of flies collected from the same geographic

12

region (Telonis-Scott et al 2012) The ldquooverlaprdquo analysis wasperformed similarly for the GWAS candidate gene and net-work-level approaches First the gene list overlap was testedbetween the two studies using Fisherrsquos exact test where the382 genes identified from the candidate SNPs (this study)were compared with 416 genes identified from candidatesingle feature polymorphisms (Telonis-Scott et al 2012)The contingency table was constructed using the numberof annotated genes in the D melanogaster R553 genomebuilt as an approximation for the total number of genestested in each study and took the following form

where A frac14 gene list from Telonis-Scott et al (2012)n frac14 416 B frac14 gene list from this study n frac14 382 N frac14 totalnumber of genes in D melanogaster reference genome r553n frac14 17106

To annotate genome features in the overlap candidategene list we used the SNP data (this study) which betterresolves alleles than array-based genotyping Due to this lim-itation we did not compare exact genomic coordinates be-tween the studies but considered overlap to occur whendifferentiated alleles were detected in the same gene inboth studies We did however examine the allele frequenciesaround the Dys-RF promoter which was sequenced sepa-rately revealing a selective sweep in the experimental evolu-tion lines (Telonis-Scott et al 2012)

The overlapping candidates were tested for functional en-richment using a basic gene list approach As genes ratherthan SNPs were analyzed here Fly Mine v400 (wwwflymineorg) was applied with the following parameters Gene lengthnormalization BenjaminindashHochberg FDR correction Plt 005Ontology frac14 biological process and default background (allGO-annotated D melanogaster genes)

Finally we constructed a second PPI network from thecandidate genes mapped in Telonis-Scott et al (2012) to ex-amine system-level overlap between the two studies The 416genes mapped to 337 protein seeds and the network wasconstructed as described above The degree of network over-lap between the two studies was tested using simulation Ineach iteration a gene set of the same length as the list fromthis study and a gene set of the same length as the list fromTelonis-Scott et al (2012) were resampled randomly from theentire D melanogaster mapped gene annotation (r553) Eachresampled set was then used to build a first-order network aspreviously described The overlap network between the tworesampled sets was calculated using the graphintersectioncommand from the igraph package and zero-degree nodes(proteins with no interactions) were removed Overlap wasquantified by counting the number of nodes and the numberof edges in this overlap network The observed overlap wascompared with a simulated null distribution for both nodeand edge number where n frac14 1000 simulation iterations

Supplementary MaterialSupplementary figures S1ndashS4 and tables S1ndashS5 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

We thank Robert Good for the fly sampling and JenniferShirriffs and Lea and Alan Rako for outstanding assistancewith laboratory culture and substantial desiccation assaysWe are grateful to Melissa Davis for advice on network build-ing and comparison approaches and to five anonymous re-viewers whose comments improved the manuscript Thiswork was supported by a DECRA fellowship DE120102575and Monash University Technology voucher scheme toMTS an ARC Future Fellowship and Discovery Grants toCMS an ARC Laureate Fellowship to AAH and a Scienceand Industry Endowment Fund grant to CMS and AAH

ReferencesAlexander JM Lomvardas S 2014 Nuclear architecture as an epigenetic

regulator of neural development and function Neuroscience26439ndash50

Anderson AR Hoffmann AA McKechnie SW Umina PA Weeks AR2005 The latitudinal cline in the In(3R)Payne inversion polymor-phism has shifted in the last 20 years in Australian Drosophilamelanogaster populations Mol Ecol 14851ndash858

Andolfatto P Wall JD Kreitman M 1999 Unusual haplotype structureat the proximal breakpoint of In(2L)t in a natural population ofDrosophila melanogaster Genetics 1531297ndash1311

Andrews S 2014 FastQC Available from httpwwwbioinformaticsbbsrcacukprojectsfastqc

Ashburner M Ball CA Blake JA Botstein D Butler H Cherry JM DavisAP Dolinski K Dwight SS Eppig JT et al 2000 Gene ontology toolfor the unification of biology The Gene Ontology Consortium NatGenet 2525ndash29

Aytes A Mitrofanova A Lefebvre C Alvarez Mariano J Castillo-MartinM Zheng T Eastham James A Gopalan A Pienta Kenneth J ShenMM et al 2014 Cross-species regulatory network analysis identifiesa synergistic interaction between FOXM1 and CENPF that drivesprostate cancer malignancy Cancer Cell 25638ndash651

Bastide H Betancourt A Nolte V Tobler R Steuroobe P Futschik ASchleurootterer C 2013 A genome-wide fine-scale map of natural pig-mentation variation in Drosophila melanogaster PLoS Genet9e1003534

Bergland AO Behrman EL OrsquoBrien KR Schmidt PS Petrov DA 2014Genomic evidence of rapid and stable adaptive oscillations overseasonal time scales in Drosophila PLoS Genet 10e1004775

Bonebrake TC Mastrandrea MD 2010 Tolerance adaptation and pre-cipitation changes complicate latitudinal patterns of climate changeimpacts Proc Natl Acad Sci U S A 10712581ndash12586

Bradley TJ 2009 Animal osmoregulation Oxford Oxford UniversityPress

Burke MK Liti G Long AD 2014 Standing genetic variation drivesrepeatable experimental evolution in outcrossing populations ofSaccharomyces cerevisiae Mol Biol Evol 313228ndash3239

Chown SL 2012 Trait-based approaches to conservation physiologyforecasting environmental change risks from the bottom upPhilos Trans R Soc B 3671615ndash1627

Chown SL Nicolson SW 2004 Insect physiological ecology mechanismsand patterns Oxford Oxford University Press

Chown SL Soslashrensen JG Terblanche JS 2011 Water loss in insectsan environmental change perspective J Insect Physiol571070ndash1084

Not in GWAS In GWAS

Not in Telonis-Scott et al (2012) N ndash intersect-AndashB A-intersect

In Telonis-Scott et al (2012) B-intersect Intersect(AB)

13

Chung H Loehlin DW Dufour HD Vaccarro K Millar JG Carroll SB2014 A single gene affects both ecological divergence and matechoice in Drosophila Science 3431148ndash1151

Cingolani P Platts A Wang LL Coon M Nguyen T Wang L Land SJ LuX Ruden DM 2012 A program for annotating and predicting theeffects of single nucleotide polymorphisms SnpEff SNPs in thegenome of Drosophila melanogaster strain w1118 iso-2 iso-3 Fly(Austin) 680ndash92

Civelek M Lusis AJ 2014 Systems genetics approaches to understandcomplex traits Nat Rev Genet 1534ndash48

Clusella-Trullas S Blackburn TM Chown SL 2011 Climatic predictors oftemperature performance curve parameters in ectotherms implycomplex responses to climate change Am Nat 177738ndash751

Csardi G Nepusz T 2006 The igraph software package for complexnetwork research InterJournal Complex Systems 1695 Availablefrom httpigraphorg

Davies SA Cabrero P Overend G Aitchison L Sebastian S Terhzaz SDow JA 2014 Cell signalling mechanisms for insect stress toleranceJ Exp Biol 217119ndash128

Davies SA Cabrero P Povsic M Johnston NR Terhzaz S Dow JA 2013Signaling by Drosophila capa neuropeptides Gen Comp Endocrinol18860ndash66

Davies SA Overend G Sebastian S Cundall M Cabrero P Dow JATerhzaz S 2012 Immune and stress response lsquocross-talkrsquo in theDrosophila Malpighian tubule J Insect Physiol 58488ndash497

Day JP Dow JA Houslay MD Davies SA 2005 Cyclic nucleotide phos-phodiesterases in Drosophila melanogaster Biochem J 388333ndash342

de Nadal E Ammerer G Posas F 2011 Controlling gene expression inresponse to stress Nat Rev Genet 12833ndash845

DePristo MA Banks E Poplin R Garimella KV Maguire JR Hartl CPhilippakis AA del Angel G Rivas MA Hanna M et al 2011 Aframework for variation discovery and genotyping using next-gen-eration DNA sequencing data Nat Genet 43491ndash498

Djawdan M Chippindale AK Rose MR Bradley TJ 1998 Metabolic re-serves and evolved stress resistance in Drosophila melanogasterPhysiol Zool 71584ndash594

Durham MF Magwire MM Stone EA Leips J 2014 Genome-wide anal-ysis in Drosophila reveals age-specific effects of SNPs on fitness traitsNat Commun 54338

Emilsson V Thorleifsson G Zhang B Leonardson AS Zink F Zhu JCarlson S Helgason A Walters GB Gunnarsdottir S et al 2008Genetics of gene expression and its effect on disease Nature452423ndash428

Fabian DK Kapun M Nolte V Kofler R Schmidt PS Schleurootterer C FlattT 2012 Genome-wide patterns of latitudinal differentiation amongpopulations of Drosophila melanogaster from North America MolEcol 214748ndash4769

Foley BR Telonis-Scott M 2011 Quantitative genetic analysis suggestscausal association between cuticular hydrocarbon composition anddesiccation survival in Drosophila melanogaster Heredity 10668ndash77

Folk DG Han C Bradley TJ 2001 Water acquisition and partitioning inDrosophila melanogaster effects of selection for desiccation-resis-tance J Exp Biol 2043323ndash3331

Fowler MA Montell C 2013 Drosophila TRP channels and animal be-havior Life Sci 92394ndash403

Fu J Keurentjes JJB Bouwmeester H America T Verstappen FWA WardJL Beale MH de Vos RCH Dijkstra M Scheltema RA et al 2009System-wide molecular evidence for phenotypic buffering inArabidopsis Nat Genet 41166ndash167

Futschik A Schleurootterer C 2010 The next generation of molecular mar-kers from massively parallel sequencing of pooled DNA samplesGenetics 186207ndash218

Gibbs AG Fukuzato F Matzkin LM 2003 Evolution of water conserva-tion mechanisms in Drosophila J Exp Biol 2061183ndash1192

Gibbs AG Matzkin LM 2001 Evolution of water balance in the genusDrosophila J Exp Biol 2042331ndash2338

Goodstadt L 2010 Ruffus a lightweight Python library for computa-tional pipelines Bioinformatics 262778ndash2779

Hadley NF 1994 Water relations of terrestrial arthropods San Diego(CA) Academic Press

Hoffmann AA Hallas R Sinclair C Mitrovski P 2001 Levels of variationin stress resistance in Drosophila among strains local populationsand geographic regions patterns for desiccation starvation coldresistance and associated traits Evolution 551621ndash1630

Hoffmann AA Parsons PA 1989a An integrated approach to environ-mental stress tolerance and life-history variation desiccation toler-ance in Drosophila Biol J Linn Soc 37117ndash136

Hoffmann AA Parsons PA 1989b Selection for increased desiccationresistance in Drosophila melanogaster additive genetic control andcorrelated responses for other stresses Genetics 122837ndash845

Hoffmann AA Sgro CM 2011 Climate change and evolutionary adap-tation Nature 470479ndash485

Hoffmann AA Weeks AR 2007 Climatic selection on genes and traitsafter a 100 year-old invasion a critical look at the temperate-tropicalclines in Drosophila melanogaster from eastern Australia Genetica129133ndash147

Huang W Richards S Carbone MA Zhu D Anholt RR Ayroles JFDuncan L Jordan KW Lawrence F Magwire MM et al 2012Epistasis dominates the genetic architecture of Drosophila quanti-tative traits Proc Natl Acad Sci U S A 10915553ndash15559

Huang Z Tunnacliffe A 2004 Response of human cells to desiccationcomparison with hyperosmotic stress response J Physiol558181ndash191

Huang Z Tunnacliffe A 2005 Gene induction by desiccation stress inhuman cell cultures FEBS Lett 5794973ndash4977

Ichihashi Y Aguilar-Martınez JA Farhi M Chitwood DH Kumar RMillon LV Peng J Maloof JN Sinha NR 2014 Evolutionary develop-mental transcriptomics reveals a gene network module regulatinginterspecific diversity in plant leaf shape Proc Natl Acad Sci U S A111E2616ndashE2621

Itoh M Nanba N Hasegawa M Inomata N Kondo R Oshima MTakano-Shimizu T 2010 Seasonal changes in the long-distance link-age disequilibrium in Drosophila melanogaster J Hered 10126ndash32

Johnson WA Carder JW 2012 Drosophila nociceptors mediate larvalaversion to dry surface environments utilizing both the painless TRPchannel and the DEGENaC subunit PPK1 PLoS One 7e32878

Kahsai L Kapan N Dircksen H Winther AM Nassel DR 2010 Metabolicstress responses in Drosophila are modulated by brain neurosecre-tory cells that produce multiple neuropeptides PLoS One 5e11480

Kapun M van Schalkwyk H McAllister B Flatt T Schleurootterer C 2014Inference of chromosomal inversion dynamics from Pool-Seq datain natural and laboratory populations of Drosophila melanogasterMol Ecol 231813ndash1827

Kawano T Shimoda M Matsumoto H Ryuda M Tsuzuki S Hayakawa Y2010 Identification of a gene Desiccate contributing to desicca-tion resistance in Drosophila melanogaster J Biol Chem28538889ndash38897

Kellermann V van Heerwaarden B Sgro CM Hoffmann AA 2009Fundamental evolutionary limits in ecological traits driveDrosophila species distributions Science 3251244ndash1246

Kofler R Nolte V Schleurootterer C 2016 The impact of library preparationprotocols on the consistency of allele frequency estimates in Pool-Seq data Mol Ecol Resour 16118ndash122

Kofler R Pandey RV Schleurootterer C 2011 PoPoolation2 identifying dif-ferentiation between populations using sequencing of pooled DNAsamples (Pool-Seq) Bioinformatics 273435ndash3436

Kofler R Schleurootterer C 2012 Gowinda unbiased analysis of gene setenrichment for genome-wide association studies Bioinformatics282084ndash2085

Leiserson MDM Eldridge JV Ramachandran S Raphael BJ 2013Network analysis of GWAS data Curr Opin Genet Dev 23602ndash610

Li H Durbin R 2009 Fast and accurate short read alignment withBurrows-Wheeler transform Bioinformatics 251754ndash1760

Li H Handsaker B Wysoker A Fennell T Ruan J Homer N Marth GAbecasis G Durbin R 2009 The Sequence AlignmentMap formatand SAMtools Bioinformatics 252078ndash2079

14

Liu Y Luo J Carlsson MA Neuroassel DR 2015 Serotonin and insulin-likepeptides modulate leucokinin-producing neurons that affectfeeding and water homeostasis in Drosophila J Comp Neurol5231840ndash1863

Lohse M Bolger AM Nagel A Fernie AR Lunn JE Stitt M Usadel B 2012RobiNA a user-friendly integrated software solution for RNA-Seq-based transcriptomics Nucleic Acids Res 40W622ndashW627

Long A Liti G Luptak A Tenaillon O 2015 Elucidating the moleculararchitecture of adaptation via evolve and resequence experimentsNat Rev Genet 16567ndash582

Lynch M Bost D Wilson S Maruki T Harrison S 2014 Population-genetic inference from pooled-sequencing data Genome Biol Evol61210ndash1218

Mackay TFC Stone EA Ayroles JF 2009 The genetics of quantitativetraits challenges and prospects Nat Rev Genet 10565ndash577

Marron MT Markow TA Kain KJ Gibbs AG 2003 Effects of starvationand desiccation on energy metabolism in desert and mesicDrosophila J Insect Physiol 49261ndash270

Matzkin LM Markow TA 2009 Transcriptional regulation of metabo-lism associated with the increased desiccation resistance of thecactophilic Drosophila mojavensis Genetics 1821279ndash1288

Murali T Pacifico S Yu J Guest S Roberts GG Finley RL Jr 2011 DroID2011 a comprehensive integrated resource for protein transcrip-tion factor RNA and gene interactions for Drosophila Nucleic AcidsRes 39D736ndashD743

Nuzhdin SV Turner TL 2013 Promises and limitations of hitchhikingmapping Curr Opin Genet Dev 23694ndash699

Orozco-terWengel P Kapun M Nolte V Kofler R Flatt T Schleurootterer C2012 Adaptation of Drosophila to a novel laboratory environmentreveals temporally heterogeneous trajectories of selected alleles MolEcol 214931ndash4941

Picard Picard A set of Java command line tools for manipulating high-throughput seqeuncing (HTS) data and formats Available fromhttpbroadinstitutegithubiopicard

Qiu Y Tittiger C Wicker-Thomas C Le Goff G Young S Wajnberg EFricaux T Taquet N Blomquist GJ Feyereisen R 2012 An insect-specific P450 oxidative decarbonylase for cuticular hydrocarbon bio-synthesis Proc Natl Acad Sci U S A 10914858ndash14863

Rajpurohit S Oliveira CC Etges WJ Gibbs AG 2013 Functional genomicand phenotypic responses to desiccation in natural populations of adesert Drosophilid Mol Ecol 222698ndash2715

Sarup P Soslashrensen JG Kristensen TN Hoffmann AA Loeschcke V PaigeKN Soslashrensen P 2011 Candidate genes detected in transcriptomestudies are strongly dependent on genetic background PLoS One6e15644

Schiffer M Hangartner S Hoffmann AA 2013 Assessing the relativeimportance of environmental effects carry-over effects and speciesdifferences in thermal stress resistance a comparison ofDrosophilids across field and laboratory generations J Exp Biol2163790ndash3798

Schleurootterer C Kofler R Versace E Tobler R Franssen SU 2015Combining experimental evolution with next-generation sequen-cing A powerful tool to study adaptation from standing geneticvariation Heredity 114431ndash440

Sinclair BJ Gibbs AG Roberts SP 2007 Gene transcription during ex-posure to and recovery from cold and desiccation stress inDrosophila melanogaster Insect Mol Biol 16435ndash443

Sloggett C Wakefield M Philip G Pope B 2014 Rubramdashflexible distrib-uted pipelines figshare Available from httpdxdoiorg106084m9figshare895626

Smit AFA Hubley R Green P 2013-2015 RepeatMasker Open-403Available from httpwwwrepeatmaskerorg

Seurooderberg JA Birse RT Neuroassel DR 2011 Insulin production and signalingin renal tubules of Drosophila is under control of tachykinin-relatedpeptide and regulates stress resistance PLoS One 6e19866

Soslashrensen JG Nielsen MM Loeschcke V 2007 Gene expression profileanalysis of Drosophila melanogaster selected for resistance to envi-ronmental stressors J Evol Biol 201624ndash1636

St Pierre SE Ponting L Stefancsik R McQuilton P 2014 FlyBase102mdashadvanced approaches to interrogating FlyBase Nucleic AcidsRes 42D780ndashD788

Stevenson M Nunes T Heuser C Marshall J Sanchez J Thornton RReiczigel J Robison-Cox J Sebastiani P Solymos P et al 2015 epiRTools for the analysis of epidemiological data R package version09-62 Available from httpCRANR-projectorgpackagefrac14epiR

Storey JD Tibshirani R 2003 Statistical significance for genomewidestudies Proc Natl Acad Sci U S A 1009440ndash9445

Telonis-Scott M Gane M DeGaris S Sgro CM Hoffmann AA 2012 Highresolution mapping of candidate alleles for desiccation resistancein Drosophila melanogaster under selection Mol Biol Evol291335ndash1351

Telonis-Scott M Guthridge KM Hoffmann AA 2006 A new set oflaboratory-selected Drosophila melanogaster lines for the analysisof desiccation resistance response to selection physiology and cor-related responses J Exp Biol 2091837ndash1847

Terhzaz S Cabrero P Robben JH Radford JC Hudson BD Milligan GDow JA Davies SA 2012 Mechanism and function of Drosophilacapa GPCR a desiccation stress-responsive receptor with functionalhomology to human neuromedinU receptor PLoS One 7e29897

Terhzaz S Overend G Sebastian S Dow JA Davies SA 2014 The Dmelanogaster capa-1 neuropeptide activates renal NF-kB signalingPeptides 53218ndash224

Terhzaz S Teets NM Cabrero P Henderson L Ritchie MG Nachman RJDow JA Denlinger DL Davies SA 2015 Insect capa neuropeptidesimpact desiccation and cold tolerance Proc Natl Acad Sci U S A1122882ndash2887

Thorat LJ Gaikwad SM Nath BB 2012 Trehalose as an indicator ofdesiccation stress in Drosophila melanogaster larvae a potentialmarker of anhydrobiosis Biochem Biophys Res Commun419638ndash642

Tobler R Franssen SU Kofler R Orozco-terWengel P Nolte VHermisson J Schl eurootterer C 2014 Massive habitat-specific ge-nomic response in D melanogaster populations during exper-imental evolution in hot and cold environments Mol Biol Evol31364ndash375

Turner TL Bourne EC Von Wettberg EJ Hu TT Nuzhdin SV 2010Population resequencing reveals local adaptation of Arabidopsislyrata to serpentine soils Nat Genet 42260ndash263

Turner TL Miller PM 2012 Investigating natural variation in Drosophilacourtship song by the evolve and resequence approach Genetics191633ndash642

Vermeulen CJ Soslashrensen P Kirilova Gagalova K Loeschcke V 2013Transcriptomic analysis of inbreeding depression in cold-sensitiveDrosophila melanogaster shows upregulation of the immune re-sponse J Evol Biol 261890ndash1902

Wang B Goode J Best J Meltzer J Schilman PE Chen J Garza D ThomasJB Montminy M 2008 The insulin-regulated CREB coactivatorTORC promotes stress resistance in Drosophila Cell Metab7434ndash444

Wang J Kean L Yang J Allan AK Davies SA Herzyk P Dow JA 2004Function-informed transcriptome analysis of Drosophila renaltubule Genome Biol 5R69

Wang K Li M Hakonarson H 2010 Analysing biological pathways ingenome-wide association studies Nat Rev Genet 11843ndash854

Wang L Jia P Wolfinger RD Chen X Zhao Z 2011 Gene set analysis ofgenome-wide association studies methodological issues and per-spectives Genomics 981ndash8

Weber AL Khan GF Magwire MM Tabor CL Mackay TF Anholt RR2012 Genome-wide association analysis of oxidative stress resistancein Drosophila melanogaster PLoS One 7e34745

Weeks AR McKechnie SW Hoffmann AA 2002 Dissecting adaptiveclinal variation markers inversions and sizestress associations inDrosophila melanogaster from a central field population Ecol Lett5756ndash763

Zhu Y Bergland AO Gonzalez J Petrov DA 2012 Empirical validation ofpooled whole genome population re-sequencing in Drosophila mel-anogaster PLoS One 7e41901

15

Page 7: Cross-Study Comparison Reveals Common Genomic, Network, … · 2016-02-06 · resistance using a high-throughput sequencing Pool-GWAS (genome-wide association study) approach ( Bastide

In addition to the significant gene-level overlap betweenthe two studies we observed significant overlap between thePPI networks constructed from each set of candidate genes(summarized in fig 2) We examined the overlap by compar-ing the first-order networks 1) at the level of node overlap and2) at the level of edge (interaction) overlap Three hundredseven GWAS seeds formed a network of 3324 nodes whileour previous set of candidates mapped to 337 seeds forming anetwork that comprised 3215 nodes One thousand eighthundred twenty-six or ~55 of the nodes overlapped be-tween networks which was significantly higher than the nulldistribution estimated from 1000 simulations (mean simu-lated overlap SD 1304 143 nodes observed overlap atthe 999th percentile fig 2A and C) The overlap at the inter-action level was not significantly higher than expected (meansimulated overlap SD 25704 4024 edges observed over-lap 30686 edges 89th percentile fig 2B and D) Hive plotswere used to visualize the separate networks (supplementaryfig S4A and B Supplementary Material online) as well as theoverlap between the networks (supplementary fig S4CSupplementary Material online)

Given the high degree of shared nodes from distinct setsof protein seeds we compared the networks for biologicalfunction (fig 2E) Both networks produced long lists of sig-nificantly enriched GO terms (supplementary table S3Supplementary Material online) and assessing overlap was

difficult due to the nonindependent nature of GO hierar-chies However a comparison of functional pathway enrich-ment revealed some differences Although both networksshowed enrichment in the immune system pathways theribosome and Notch and NGF signaling pathways as well assome involvement of gene expression and translation path-ways the network constructed from the Telonis-Scott et al(2012) gene list showed extensive involvement of cell cyclepathway proteins as well as significant enrichment for Wntsignaling EGRF signaling DNA double-strand break repairaxon guidance gap junction trafficking and apoptosis path-ways which were not enriched in the GWAS network TheGWAS network was specifically enriched for dorso-ventralaxis formation oocyte maturation and many more geneexpression pathways than the Telonis-Scott et al (2012)gene network (table 2 and supplementary table S3Supplementary Material online)

Discussion

Genomic Signature of Desiccation Resistance fromNatural Standing Genetic Variation

We report the first large-scale genome-wide screen for nat-ural alleles associated with survival of low humidity condi-tions in D melanogaster By combining a powerful naturalassociation study with high-throughput Pool-seq we

Table 3 Overrepresentation of GO Categories for the 45 Core Candidate Genes Overlapping between the Current and Our Previous(Telonis-Scott et al 2012) Genomic Study

Term ID TermSubterms FDR P valuea Genes

Cellularbehavioral response to stimulus defense and signaling

GO0050896 Response to stimulus (cellular response tostimulus regulation of response tostimulus)

0007 Abd-B dnc fz mam nkd slo ct fru kluRbf klg santa-maria jbug fab1CG33275 Mtl Trim9 cv-c Gprk2 Ect4CG43658

GO0048585 Negative regulation of response to stimu-lus (Rho protein signal transductionsignal transduction)

0042 dnc mam slo fru Trim9 cv-c

GO0023052 Signaling (cAMP-mediated signaling cellcommunication)

0031 Amph Pde9 dnc fz mam nkd ct kluRbf santa-maria fab1 CG33275 MtlTrim9 cv-c Gprk2 Ect4 CG43658

Development

GO0048863 Stem cell differentiation 0014 Abd-D mam nkd klu nub

GO0014016 Neuroblast differentiation 0048

GO0045165 Cell fate commitment 0010 Abd-D dnc fz mam nkd ct klu klg nub

GO0035277 Spiracle morphogenesis open trachealsystem

0010 Abd-D ct cv-c

GO0007423 Sensory organ development (eye wingdisc imaginal disc)

0008 fz mam ct klu klg sdk Amph Nckx30cMtl

GO0048736 Appendage development 0020 fz mam ct fru CG33275 Mtl Fili nubcv-c Gprk2 CG43658

GO0007552 Metamorphosis 0001 fz mam ct fru CG33275 Mtl Fili cv-cGprk2 CG43658

Cellular component organization

GO0030030 Cell projection organization 0020 dnc fz ct fru GluClalpha Nckx30c MtlTrim9 nub cv-c

NOTEmdashNote that most genes are assigned to multiple annotations and therefore are shown purely as a guide to putative functionaBonferronindashHochberg FDR and gene length corrected P values using the Flymine v400 database Gene Ontology Enrichment The terms were condensed and trimmed usingREVIGO

7

mapped SNPs associated with desiccation resistance in 500naturally derived genotypes Although Pool-seq has limitedpower to reveal very low frequency alleles to reconstructphase or adequately estimate allele frequencies in smallpools (Lynch et al 2014) the Pool-GWAS approach hasbeen validated by retrieval of known candidate genes in-volved in body pigmentation using similar numbers of ge-notypes as assessed here (Bastide et al 2013)

Although studies in controlled genetic backgrounds reportlarge effects of single genes on desiccation resistance (Table 1)our current and previous screens revealed similarly largenumbers of variants (over 600) mapping to over 300 genesA polygenic basis of desiccation resistance is consistent withquantitative genetic data for the trait (Foley and Telonis-Scott2011) and we also anticipate more complexity from naturalD melanogaster populations with greater standing geneticvariation than inbred laboratory strains However somefalse positive associations are likely in the GWAS due to thelimitations of quantitative trait mapping approaches Pooledevolve-and-resequence experiments routinely yield vast num-bers of candidate SNPs partly due to high levels of standinglinkage disequilibrium (LD) and hitchhiking neutral alleles

(Schleurootterer et al 2015) Large population sizes from specieswith low levels of LD such as D melanogaster somewhat cir-cumvent this issue and a notable feature of our experimentaldesign is our sampling Here we selected the most desicca-tion-resistant phenotypes directly from substantial standinggenetic variation where one generation of recombinationshould largely reflect natural haplotypes

We observed the largest candidate allele frequency differ-ences in loci that had intermediate allele frequencies in thecontrol pool This is in line with population genetics modelswhich predict that evolution from standing genetic variationwill initially be strongest for intermediate frequency alleleswhich can facilitate rapid change (Long et al 2015) Howeverwe also observed a number of low frequency alleles which ifbeneficial can inflate false positives due to long-range LD(Orozco-terWengel et al 2012 Nuzhdin and Turner 2013Tobler et al 2014) Nonetheless spurious LD associationscannot fully account for our candidate list as evidenced bythe desiccation functional candidates and cross-study over-lap Furthermore chromosome inversion dynamics can alsogenerate both short and long-range LD and contribute nu-merous false positive SNP phenotype associations (Hoffmann

Fig 2 Observed versus simulated network overlap between the GWAS first-order PPI network and the first-order PPI network from the Telonis-Scottet al (2012) gene list Overlap is presented in terms of (A) node number and (B) edge number Networks were simulated by resampling from the entireDrosophila melanogaster gene list (r553) as described in the text The histogram shows the distribution of overlap measures for 1000 simulations Theblack line represents the histogram as density and the blue line shows the corresponding normal distribution The red vertical line shows the observedoverlap from the real networks labeled as a percentile of the normal distribution Area-proportional Venn diagrams summarized the extent of overlapfor PPI networks constructed separately from the GWAS candidates and Telonis-Scott et al (2012) candidates (labeled ldquoprevious studyrdquo) (C) Overlapexpressed as the number of nodes Approximately 55 of the nodes overlapped between the two studies which was significantly higher than simulatedexpectations (999 percentile 2B) (D) Overlap expressed in the number of interactions This was not significantly higher than expected by simulation(89th percentile 2B) (E) Overlap at the level of GO biological process terms (directly compared GO terms by name this was not tested statisticallybecause of the complex hierarchical nature of GO terms and is presented for interest)

8

and Weeks 2007 Tobler et al 2014) but we found little as-sociation between the desiccation pool and inversion fre-quency changes consistent with the lack of latitudinalpatterns in desiccation resistanceand traitinversion fre-quency associations in east Australian D melanogaster(Hoffmann et al 2001 Weeks et al 2002)

Cross-Study Comparison Reveals a Core Set ofCandidates for a Complex Trait

Our data contain many candidate genes functions and path-ways for insect desiccation resistance but partitioning genu-ine adaptive loci from experimental noise remains asignificant challenge We therefore focus our efforts on ournovel comparative analysis where we identified 45 robustcandidates for further exploration Common genetic signa-tures are rarely documented in cross-study comparisons ofcomplex traits (Sarup et al 2011 Huang et al 2012 but cfVermeulen et al 2013) and never before for desiccation re-sistance Some degree of overlap between our studies likelyresults from a shared genetic background from similar sam-pling sites highlighting the replicability of these responsesOne limitation of this approach is the overpopulation ofthe ldquocorerdquo list by longer genes This result reflects the genelength bias inherent to GWAS approaches an issue that is notstraightforward to correct in a GWAS let alone in the com-parative framework applied here A focus on the commongenes functionally linked to the desiccation phenotype canprovide a meaningful framework for future research and alsosuggests that some large-effect candidate genes exhibit natu-ral variation contributing to the phenotype (table 1)

Other feasible candidates include genes expressed inthe MTs (Wang et al 2004) or essential for MT develop-ment (St Pierre et al 2014) providing further researchopportunity focused on the primary stress sensing andfluid secretion tissues Genes include one of the mosthighly expressed MT transcripts CG7084 as well as dncNckx30C slo cv-c and nkd Other aspects of stress resis-tance are also implicated CrebB for example is involvedin regulating insulin-regulated stress resistance and ther-mosensory behavior (Wang et al 2008) nkd also impactslarval cuticle development and shares a PDZ signalingdomain also present in the desiccation candidate desi(Kawano et al 2010 St Pierre et al 2014) Finally severalcore candidates were consistently reported in recent stud-ies suggesting generalized roles in stress responses andfitness including transcriptome studies on MT function(Wang et al 2004) inbreeding and cold resistance(Vermeulen et al 2013) oxidative stress (Weber et al2012) and genomic studies exploring age-specific SNPson fitness traits (Durham et al 2014) and genes likelyunder spatial selection in flies from climatically divergenthabitats (Fabian et al 2012)

Several of the enriched GO categories from the core listalso make biological sense including stimulus response(almost half the suite) defense and signaling particularlyin pathways involved in MT stress sensing (Davies et al 20122013 2014) Functional categories enriched for sensory

organ development were highlighted consistent with envi-ronmental sensing categories from desert Drosophila tran-scriptomics (Matzkin and Markow 2009 Rajpurohit et al2013) and stress detection via phototransduction pathwaysin D melanogaster under a range of stressors including des-iccation (Soslashrensen et al 2007)

Evidence for Conserved Signatures at Higher LevelOrganization

Beyond the gene level we observed a striking degree ofsimilarity between this study and that of Telonis-Scottet al (2012) at the network level where significantly over-lapping protein networks were constructed from largely dif-ferent seed protein sets This supports a scenario where theldquobiological information flow from DNA to phenotyperdquo(Civelek and Lusis 2014) contains inherent redundancywhere alternative genetic solutions underlie phenotypesand functions Resistance to perturbation by genetic or en-vironmental variation appears to increase with increasinghierarchical complexity that is phenotype ldquobufferingrdquo (Fuet al 2009) The same functional network in different pop-ulationsgenetic backgrounds can be impacted but theexact genes and SNPs responding to perturbation will notnecessarily overlap In Drosophila different sets of loci fromthe same founders mapped to the same PPI network in thecase of two complex traits chill coma recovery and startleresponse (Huang et al 2012) Furthermore interspecificshared networkspathways from different genes have beendocumented for numerous complex traits (Emilsson et al2008 Aytes et al 2014 Ichihashi et al 2014) providing op-portunities for investigating taxa where individual genefunction is poorly understood

Although the null distribution is not always obviouswhen comparing networks and their functions partly be-cause of reduced sensitivity from missing interactions(Leiserson et al 2013) here the shared network signaturesportray a plausible multifaceted stress response involvingstress sensing rapid gene accessibility and expression Thepredominance of immune system and stress responsepathways are of interest with reference immunitystresspathway ldquocross-talkrdquo particularly in the MTs whereabiotic stressors elicit strong immunity transcriptionalresponses (Davies et al 2012) The higher order architec-tures reflect crucial aspects of rapid gene expression con-trol in response to stress from chromatin organization(Alexander and Lomvardas 2014) to transcription RNAdecay and translation (reviwed in de Nadal et al 2011) InDrosophila variation in transcriptional regulation of indi-vidual genes can underpin divergent desiccation pheno-types (Chung et al 2014) or be functionally inferred frompatterns of many genes elicited during desiccation stress(Rajpurohit et al 2013) Variants with potential to mod-ulate expression at different stages of gene expressioncontrol were evident in our GWAS data alone for exam-ple enrichment for 50UTR and splice variants at the SNPlevel and for chromatin and chromosome structure termsat the functional level

9

Genetic Overlap Versus Genetic differencesmdashCausesand Implications

Despite the common signatures at the gene and higher orderlevels between our studies pervasive differences were ob-served These can arise from differences in standing geneticvariation in the natural populations where different alleliccompositions impact gene-by-environment interactions andepistasis Our network overlap analyses highlight the potentialfor different molecular trajectories to result in similar func-tions (Leiserson et al 2013) consistent with the multiplephysiological pathways potentially available to D melanoga-ster under water stress (Chown et al 2011) Experimentaldesign also presumably contributed to the study differencesStrong directional selection as applied in the 2012 study canincrease the frequency of rare beneficial alleles and likely re-duced overall diversity than did the single-generation GWASdesign Furthermore selection regimes often impose addi-tional selective pressures (eg food deprivation during desic-cation) resulting in correlated responses such as starvationresistance with desiccation resistance and these factors differfrom the laboratory to the field Finally evidence suggests innatural D melanogaster that standing genetic variation canvary extensively with sampling season (Itoh et al 2010Bergland et al 2014) Although we cannot compare levelsof standing variation between our two studies evidence sug-gests that this may have affected the Dys gene where ourGWAS approach revealed no differentiation between thecontrol and resistant pools at this locus but rather detectedthe alleles associated with the selective sweep in Telonis-Scottet al (2012) in both pools at high frequency Whether thisresulted from sampling the later collection after a longdrought (2004ndash2008 wwwbomgovauclimate last accessedJanuary 13 2016) is unclear but this observation highlightsthe relevance of temporal sampling to standing genetic var-iation when attempting to link complex phenotypes togenotypes

Materials and Methods

Natural Population Sampling

Drosophila melanogaster was collected from vineyard waste atKilchorn Winery Romsey Australia in April 2010 Over 1200isofemale lines were established in the laboratory from wild-caught females Species identification was conducted on maleF1 to remove Drosophila simulans contamination and over900 D melanogaster isofemale lines were retained All flieswere maintained in vials of dextrose cornmeal media at25 C and constant light

Desiccation Assay

From each isofemale line ten inseminated F1 females weresorted by aspiration without CO2 and held in vials until 4ndash5days old For the desiccation assay 10 females per line (over9000 flies in total) were screened by placing groups of femalesinto empty vials covered with gauze and randomly assigninglines into 5 large sealed glass tanks containing trays of silicadesiccant (relative humidity [RH] lt10) The tanks werescored for mortality hourly and the final 558 surviving

individuals were selected for the desiccation-resistant pool(558 flies lt6 of the total) The control pool constitutedthe same number of females sampled from over 200 ran-domly chosen isofemale lines that were frozen prior to thedesiccation assay We chose this random control approachinstead of comparing the phenotypic extremes (ie the latemortality tail vs the ldquoearly mortalityrdquo tail) because unfavor-able environmental conditions experienced by the wild-caught dam may inflate environmental variance due to car-ryover effects in the F1 offspring (Schiffer et al 2013)

DNA Extraction and Sequencing

We used pooled whole-genome sequencing (Pool-seq)(Futschik and Schleurootterer 2010) to estimate and compareallele frequency differences between the most desiccation-resistant and randomly sampled flies from a wild populationto associate naturally segregating candidate SNPs with theresponse to low humidity For each treatment (resistantand control) genomic DNA was extracted from 50 femaleheads using the Qiagen DNeasy Blood and Tissue Kit accord-ing to the manufacturerrsquos instructions (Qiagen HildenGermany) Two extractions were combined to create fivepools per treatment which were barcoded separately tocreate technical replicates (total 10 pools of 100 fliesn frac14 1000) The DNA was fragmented using a CovarisS2 ma-chine (Covaris Inc Woburn MA) and libraries were pre-pared from 1 mg genomic DNA (per pool of 100 flies) usingthe Illumina TruSeq Prep module (Illumina San Diego CA)To minimize the effect of variation across lanes the ten poolswere combined in equal concentration (fig 3) and run mul-tiplexed in each lane of a full flow cell of an Illumina HiSeq2000 sequencer Clusters were generated using the TruSeq PECluster Kit v5 on a cBot and sequenced using Illumina TruSeqSBS v3 chemistry (Illumina) Fragmentation library construc-tion and sequencing were performed at the Micromon Next-Gen Sequencing facility (Monash University ClaytonAustralia)

Fig 3 Experimental design See text for further explanation

10

Data Processing and Mapping

Processing was performed on a high-performance computingcluster using the Rubra pipeline system that makes use of theRuffus Python library (Goodstadt 2010) The final variant-calling pipeline was based on a pipeline developed by ClareSloggett (Sloggett et al 2014) and is publicly available athttpsgithubcomgriffinpGWAS_pipeline (last accessedJanuary 13 2016) Raw sequence reads were processed persample per lane with P1 and P2 read files processed sepa-rately initially Trimming was performed with trimmomatic v030 (Lohse et al 2012) Adapter sequences were also re-moved Leading and trailing bases with a quality scorebelow 30 were trimmed and a sliding window (width frac14 10quality threshold frac14 25) was used Reads shorter than 40 bpafter trimming were discarded Quality before and after trim-ming was examined using FastQC v 0101 (Andrews 2014)

Posttrimming reads that remained paired were used asinput into the alignment step Alignment to the D melano-gaster reference genome version r553 was performed withbwa v 062 (Li and Durbin 2009) using the bwa aln commandand the following options No seeding (-l 150) 1 rate ofmissing alignments given 2 uniform base error rate (-n001) a maximum of 2 gap opens per read (-o 2) a maximumof 12 gap extensions per read (-e 12) and long deletionswithin 12 bp of 30 end disallowed (-d 12) The default valueswere used for all other options Output files were then con-verted to sorted bam files using the bwa sampe commandand the ldquoSortSamrdquo option in Picard v 196

At this point the bam files were merged into one file persample using PicardMerge Duplicates were identified withPicard MarkDuplicates The tool RealignerTargetCreator inGATK v 26-5 (DePristo et al 2011) was applied to the tenbam files simultaneously producing a list of intervals contain-ing probable indels around which local realignment would beperformed This was used as input for the IndelRealigner toolwhich was also applied to all ten bam files simultaneously

To investigate the variation among technical replicatesallele frequency was estimated based on read counts foreach technical replicate separately for 100000 randomly cho-sen SNPs across the genome After excluding candidate des-iccation-resistance SNPs and SNPs that may have been falsepositives due to sequencing error (those with mean frequencylt004 across replicates) 1803 SNPs remained For each locuswe calculated the variance in allele frequency estimate amongthe five control replicates and the variance among the fivedesiccation-resistance replicates We compared this with thevariance due to pool category (control vs desiccation resis-tant) using analysis of variance (ANOVA) We also calculatedthe mean pairwise difference in allele frequency estimateamong control and among desiccation-resistance replicatesand the mean concordance correlation coefficient was calcu-lated over all pairwise comparisons within pool category(using the epiccc function in the epiR package Stevensonet al 2015)

The following approach was then taken to identify SNPsdiffering between the desiccation-resistant and controlgroups Based on the technical replicate analysis we merged

the five ldquodesiccation-resistantrdquo and the five random controlsamples into two bam files A pileup file was created withsamtools v 0119 (Li et al 2009) and converted to a ldquosyncrdquo fileusing the mpileup2sync script in Popoolation2 v 1201 (Kofleret al 2011) retaining reads with a mapping quality of at least20 and bases with a minimum quality score of 20 Reads withunmapped mates were also retained (option ndashA in samtoolsmpileup) For this step we used a repeat-masked version ofthe reference genome created with RepeatMasker 403 (Smitet al 2013-2015) and a repeat file containing all annotatedtransposons from D melanogaster including shared ancestralsequences (Flybase release r557) plus all repetitive elementsfrom RepBase release v1904 for D melanogaster Simple re-peats small RNA genes and bacterial insertions were notmasked (options ndashnolow ndashnorna ndashno_is) The rmblastsearch engine was used

Association Analysis

We then performed Fisherrsquos exact test with the Fisher testscript in Popoolation2 (Kofler et al 2011) We set the min-imum count (of the minor allele) frac14 10 minimum coverage(in both populations) frac14 20 and maximum coverage frac1410000 The test was performed for each SNP (slidingwindow off) Although there are numerous approaches tocalculating genome-wide significance thresholds in aPoolseq framework currently there is no consensus on anappropriate standardized statistical method Methods in-clude simulation approaches (Orozco-terWengel et al2012 Bastide et al 2013 Burke et al 2014) correctionusing a null distribution based on estimating genetic driftfrom replicated artificial selection line allele frequency vari-ance (Turner and Miller 2012) use of a simple thresholdbased on a minimum allele frequency difference (Turneret al 2010) and the FDR correction suggested by Storeyand Tibshirani (2003) (used by Fabian et al 2012)

We determined based on our design to calculate the FDRthreshold by comparison with a simulated distribution of nullP values following the approach of Bastide et al (2013) Foreach SNP locus a P value was calculated by simulating thedistribution of read counts between the major and minorallele according to a beta-binomial distribution with meana(a+b) and variance (ab)((a+b)2(a+b+1)) Thismodel allowed for two types of sampling variationStochastic variation in allele sampling across the phenotypesand the background variation in the representation of allelesin each pool caused by library preparation artifacts or sto-chastic variation driven by unequal coverage of the poolsThese sources of variation have also been recognized else-where as being important in interpreting pooled sequenceallele calls and identifying significant differentiation betweenpooled samples (Lynch et al 2014) The value of afrac14 37 waschosen by chi-square test as the best match to the observedP value distribution (afrac14 10 to afrac14 40 were tested) with acorresponding bfrac14 4315 to reflect the coverage depth differ-ence between the random control (487) and desiccation-resistant (568) pools These values were then used tosimulate a null data set 10 the size of the observed data

11

set Because the null P value distribution still did not fit theobserved P value distribution particularly well (supplemen-tary fig S2 Supplementary Material online) these values werenot used for a standard FDR correction Instead a P valuethreshold was calculated based on the lowest 005 of thenull P values similar in approach to Orozco-terWengel et al(2012)

Inversion Analysis

In D melanogaster several cosmopolitan inversion polymor-phisms show cross-continent latitudinal patterns that explaina substantial proportion of clinal variation for many traitsincluding thermal tolerance (reviewed in Hoffmann andWeeks 2007) Patterns of disequilibria generated betweenloci in the vicinity of the rearrangement can obscure signalsof selection (Hoffmann and Weeks 2007) and inversion fre-quencies are routinely considered in genomic studies ofnatural populations sampled from known latitudinal clines(Fabian et al 2012 Kapun et al 2014) We assessedassociations between inversion frequencies and our controland resistant pools at SNPs diagnostic for the following in-versions In3R(Payne) (Anderson et al 2005 Kapun et al2014) In(2L)t (Andolfatto et al 1999 Kapun et al 2014)In(2R)Ns In(3L)P In(3R)C In(3R)K and In(3R)Mo (Kapunet al 2014) Between one and five diagnostic SNPs were in-vestigated for each inversion (depending on availability) Weconsidered an inversion to be associated with desiccationresistance if the frequency of its diagnostic SNP differed sig-nificantly between the control and desiccation-resistant poolsby a Fisherrsquos exact test

Genome Feature Annotation

The list of candidate SNPs was converted to vcf format using acustom R script setting the alternate allele as the allele thatincreased in frequency from the control to the desiccation-resistant pool and setting the reference allele as the otherallele present in cases where both the major and minor allelesdiffered from the reference genome The genomic features inwhich the candidate SNPs occurred were annotated usingsnpEff v40 first creating a database from the D melanogasterv553 genome annotation with the default approach(Cingolani et al 2012) The default annotation settings wereused except for setting the parameter ldquo-ud 0rdquo to annotateSNPs to genes only if they were within the gene body Using2 tests on SNP counts we tested for over- or under-enrichment of feature types by calculating the proportionof lsquoallrsquo features from all SNPs detected in the study and theproportion of ldquocandidaterdquo SNPs identified as differentiated indesiccation-resistant flies (Fabian et al 2012)

GO Analysis

GO analyses was used to associate the candidates with theirbiological functions (Ashburner et al 2000) Previously devel-oped for biological pathway analysis of global gene expressionstudies (Wang et al 2010) typical GO analysis assumes that allgenes are sampled independently with equal probabilityGWAS violate these criteria where SNPs are more likely to

occur in longer genes than in shorter genes resulting in highertype I error rates in GO categories overpopulated with longgenes Gene length as well as overlapping gene biases (Wanget al 2011) were accounted for using Gowinda (Kofler andSchleurootterer 2012) which implements a permutation strategyto reduce false positives Gowinda does not reconstruct LDbetween SNPs but operates in two modes underpinned bytwo extreme assumptions 1) Gene mode (assumes all SNPs ina gene are in LD) and 2) SNP mode (assumes all SNPs in agene are completely independent)

The candidates were examined in both modes but giventhe stringency of gene mode high-resolution SNP modeoutput was utilized for the final analysis with the followingparameters 106 simulations minimum significance frac141 andminimum genes per GO categoryfrac14 3 (to exclude gene-poorcategories) P values were corrected using the Gowinda FDRalgorithm GO annotations from Flybase r557 were applied

Candidate Gene Analysis

In conjunction with the GO analysis we also examinedknown candidate genes associated with survival under lowRH to better characterize possible biological functions of ourGWAS candidates We took a conservative approach to can-didate gene assignment where the SNPs were mapped withinthe entire gene region without up- and downstream exten-sions Genes were curated from FlyBase and the literature(table 1) predominantly using detailed functional experimen-tal evidence as criteria for functional desiccation candidates

Network Analysis

We next obtained a higher level summary of the candidategene list using PPI network analysis The full candidate genelist was used to construct a first-order interaction networkusing all Drosophila PPIs listed in the Drosophila InteractionDatabase v 2014_10 (avoiding interologs) (Murali et al 2011)This ldquofullrdquo PPI network includes manually curated data frompublished literature and experimental data for 9633 proteinsand 93799 interactions For the network construction theigraph R package was used to build a subgraph from the fullPPI network containing the candidate proteins and their first-order interacting proteins (Csardi and Nepusz 2006) Self-connections were removed Functional enrichment analysiswas conducted on the list of network genes using FlyMinev400 (wwwflymineorg last accessed January 13 2016) withthe following parameters for GO enrichment and KEGGReactome pathway enrichment BenjaminindashHochberg FDRcorrection P lt 005 background frac14 all D melanogastergenes in our full PPI network and Ontology (for GO enrich-ment only) frac14 biological process or molecular function

Cross-Study Comparison Overlap with Our PreviousWork

Following on from the comparison between our gene list andthe candidate genes suggested from the literature we quan-titatively evaluated the degree of overlap between the candi-date genes identified in this study with alleles mapped in ourprevious study of flies collected from the same geographic

12

region (Telonis-Scott et al 2012) The ldquooverlaprdquo analysis wasperformed similarly for the GWAS candidate gene and net-work-level approaches First the gene list overlap was testedbetween the two studies using Fisherrsquos exact test where the382 genes identified from the candidate SNPs (this study)were compared with 416 genes identified from candidatesingle feature polymorphisms (Telonis-Scott et al 2012)The contingency table was constructed using the numberof annotated genes in the D melanogaster R553 genomebuilt as an approximation for the total number of genestested in each study and took the following form

where A frac14 gene list from Telonis-Scott et al (2012)n frac14 416 B frac14 gene list from this study n frac14 382 N frac14 totalnumber of genes in D melanogaster reference genome r553n frac14 17106

To annotate genome features in the overlap candidategene list we used the SNP data (this study) which betterresolves alleles than array-based genotyping Due to this lim-itation we did not compare exact genomic coordinates be-tween the studies but considered overlap to occur whendifferentiated alleles were detected in the same gene inboth studies We did however examine the allele frequenciesaround the Dys-RF promoter which was sequenced sepa-rately revealing a selective sweep in the experimental evolu-tion lines (Telonis-Scott et al 2012)

The overlapping candidates were tested for functional en-richment using a basic gene list approach As genes ratherthan SNPs were analyzed here Fly Mine v400 (wwwflymineorg) was applied with the following parameters Gene lengthnormalization BenjaminindashHochberg FDR correction Plt 005Ontology frac14 biological process and default background (allGO-annotated D melanogaster genes)

Finally we constructed a second PPI network from thecandidate genes mapped in Telonis-Scott et al (2012) to ex-amine system-level overlap between the two studies The 416genes mapped to 337 protein seeds and the network wasconstructed as described above The degree of network over-lap between the two studies was tested using simulation Ineach iteration a gene set of the same length as the list fromthis study and a gene set of the same length as the list fromTelonis-Scott et al (2012) were resampled randomly from theentire D melanogaster mapped gene annotation (r553) Eachresampled set was then used to build a first-order network aspreviously described The overlap network between the tworesampled sets was calculated using the graphintersectioncommand from the igraph package and zero-degree nodes(proteins with no interactions) were removed Overlap wasquantified by counting the number of nodes and the numberof edges in this overlap network The observed overlap wascompared with a simulated null distribution for both nodeand edge number where n frac14 1000 simulation iterations

Supplementary MaterialSupplementary figures S1ndashS4 and tables S1ndashS5 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

We thank Robert Good for the fly sampling and JenniferShirriffs and Lea and Alan Rako for outstanding assistancewith laboratory culture and substantial desiccation assaysWe are grateful to Melissa Davis for advice on network build-ing and comparison approaches and to five anonymous re-viewers whose comments improved the manuscript Thiswork was supported by a DECRA fellowship DE120102575and Monash University Technology voucher scheme toMTS an ARC Future Fellowship and Discovery Grants toCMS an ARC Laureate Fellowship to AAH and a Scienceand Industry Endowment Fund grant to CMS and AAH

ReferencesAlexander JM Lomvardas S 2014 Nuclear architecture as an epigenetic

regulator of neural development and function Neuroscience26439ndash50

Anderson AR Hoffmann AA McKechnie SW Umina PA Weeks AR2005 The latitudinal cline in the In(3R)Payne inversion polymor-phism has shifted in the last 20 years in Australian Drosophilamelanogaster populations Mol Ecol 14851ndash858

Andolfatto P Wall JD Kreitman M 1999 Unusual haplotype structureat the proximal breakpoint of In(2L)t in a natural population ofDrosophila melanogaster Genetics 1531297ndash1311

Andrews S 2014 FastQC Available from httpwwwbioinformaticsbbsrcacukprojectsfastqc

Ashburner M Ball CA Blake JA Botstein D Butler H Cherry JM DavisAP Dolinski K Dwight SS Eppig JT et al 2000 Gene ontology toolfor the unification of biology The Gene Ontology Consortium NatGenet 2525ndash29

Aytes A Mitrofanova A Lefebvre C Alvarez Mariano J Castillo-MartinM Zheng T Eastham James A Gopalan A Pienta Kenneth J ShenMM et al 2014 Cross-species regulatory network analysis identifiesa synergistic interaction between FOXM1 and CENPF that drivesprostate cancer malignancy Cancer Cell 25638ndash651

Bastide H Betancourt A Nolte V Tobler R Steuroobe P Futschik ASchleurootterer C 2013 A genome-wide fine-scale map of natural pig-mentation variation in Drosophila melanogaster PLoS Genet9e1003534

Bergland AO Behrman EL OrsquoBrien KR Schmidt PS Petrov DA 2014Genomic evidence of rapid and stable adaptive oscillations overseasonal time scales in Drosophila PLoS Genet 10e1004775

Bonebrake TC Mastrandrea MD 2010 Tolerance adaptation and pre-cipitation changes complicate latitudinal patterns of climate changeimpacts Proc Natl Acad Sci U S A 10712581ndash12586

Bradley TJ 2009 Animal osmoregulation Oxford Oxford UniversityPress

Burke MK Liti G Long AD 2014 Standing genetic variation drivesrepeatable experimental evolution in outcrossing populations ofSaccharomyces cerevisiae Mol Biol Evol 313228ndash3239

Chown SL 2012 Trait-based approaches to conservation physiologyforecasting environmental change risks from the bottom upPhilos Trans R Soc B 3671615ndash1627

Chown SL Nicolson SW 2004 Insect physiological ecology mechanismsand patterns Oxford Oxford University Press

Chown SL Soslashrensen JG Terblanche JS 2011 Water loss in insectsan environmental change perspective J Insect Physiol571070ndash1084

Not in GWAS In GWAS

Not in Telonis-Scott et al (2012) N ndash intersect-AndashB A-intersect

In Telonis-Scott et al (2012) B-intersect Intersect(AB)

13

Chung H Loehlin DW Dufour HD Vaccarro K Millar JG Carroll SB2014 A single gene affects both ecological divergence and matechoice in Drosophila Science 3431148ndash1151

Cingolani P Platts A Wang LL Coon M Nguyen T Wang L Land SJ LuX Ruden DM 2012 A program for annotating and predicting theeffects of single nucleotide polymorphisms SnpEff SNPs in thegenome of Drosophila melanogaster strain w1118 iso-2 iso-3 Fly(Austin) 680ndash92

Civelek M Lusis AJ 2014 Systems genetics approaches to understandcomplex traits Nat Rev Genet 1534ndash48

Clusella-Trullas S Blackburn TM Chown SL 2011 Climatic predictors oftemperature performance curve parameters in ectotherms implycomplex responses to climate change Am Nat 177738ndash751

Csardi G Nepusz T 2006 The igraph software package for complexnetwork research InterJournal Complex Systems 1695 Availablefrom httpigraphorg

Davies SA Cabrero P Overend G Aitchison L Sebastian S Terhzaz SDow JA 2014 Cell signalling mechanisms for insect stress toleranceJ Exp Biol 217119ndash128

Davies SA Cabrero P Povsic M Johnston NR Terhzaz S Dow JA 2013Signaling by Drosophila capa neuropeptides Gen Comp Endocrinol18860ndash66

Davies SA Overend G Sebastian S Cundall M Cabrero P Dow JATerhzaz S 2012 Immune and stress response lsquocross-talkrsquo in theDrosophila Malpighian tubule J Insect Physiol 58488ndash497

Day JP Dow JA Houslay MD Davies SA 2005 Cyclic nucleotide phos-phodiesterases in Drosophila melanogaster Biochem J 388333ndash342

de Nadal E Ammerer G Posas F 2011 Controlling gene expression inresponse to stress Nat Rev Genet 12833ndash845

DePristo MA Banks E Poplin R Garimella KV Maguire JR Hartl CPhilippakis AA del Angel G Rivas MA Hanna M et al 2011 Aframework for variation discovery and genotyping using next-gen-eration DNA sequencing data Nat Genet 43491ndash498

Djawdan M Chippindale AK Rose MR Bradley TJ 1998 Metabolic re-serves and evolved stress resistance in Drosophila melanogasterPhysiol Zool 71584ndash594

Durham MF Magwire MM Stone EA Leips J 2014 Genome-wide anal-ysis in Drosophila reveals age-specific effects of SNPs on fitness traitsNat Commun 54338

Emilsson V Thorleifsson G Zhang B Leonardson AS Zink F Zhu JCarlson S Helgason A Walters GB Gunnarsdottir S et al 2008Genetics of gene expression and its effect on disease Nature452423ndash428

Fabian DK Kapun M Nolte V Kofler R Schmidt PS Schleurootterer C FlattT 2012 Genome-wide patterns of latitudinal differentiation amongpopulations of Drosophila melanogaster from North America MolEcol 214748ndash4769

Foley BR Telonis-Scott M 2011 Quantitative genetic analysis suggestscausal association between cuticular hydrocarbon composition anddesiccation survival in Drosophila melanogaster Heredity 10668ndash77

Folk DG Han C Bradley TJ 2001 Water acquisition and partitioning inDrosophila melanogaster effects of selection for desiccation-resis-tance J Exp Biol 2043323ndash3331

Fowler MA Montell C 2013 Drosophila TRP channels and animal be-havior Life Sci 92394ndash403

Fu J Keurentjes JJB Bouwmeester H America T Verstappen FWA WardJL Beale MH de Vos RCH Dijkstra M Scheltema RA et al 2009System-wide molecular evidence for phenotypic buffering inArabidopsis Nat Genet 41166ndash167

Futschik A Schleurootterer C 2010 The next generation of molecular mar-kers from massively parallel sequencing of pooled DNA samplesGenetics 186207ndash218

Gibbs AG Fukuzato F Matzkin LM 2003 Evolution of water conserva-tion mechanisms in Drosophila J Exp Biol 2061183ndash1192

Gibbs AG Matzkin LM 2001 Evolution of water balance in the genusDrosophila J Exp Biol 2042331ndash2338

Goodstadt L 2010 Ruffus a lightweight Python library for computa-tional pipelines Bioinformatics 262778ndash2779

Hadley NF 1994 Water relations of terrestrial arthropods San Diego(CA) Academic Press

Hoffmann AA Hallas R Sinclair C Mitrovski P 2001 Levels of variationin stress resistance in Drosophila among strains local populationsand geographic regions patterns for desiccation starvation coldresistance and associated traits Evolution 551621ndash1630

Hoffmann AA Parsons PA 1989a An integrated approach to environ-mental stress tolerance and life-history variation desiccation toler-ance in Drosophila Biol J Linn Soc 37117ndash136

Hoffmann AA Parsons PA 1989b Selection for increased desiccationresistance in Drosophila melanogaster additive genetic control andcorrelated responses for other stresses Genetics 122837ndash845

Hoffmann AA Sgro CM 2011 Climate change and evolutionary adap-tation Nature 470479ndash485

Hoffmann AA Weeks AR 2007 Climatic selection on genes and traitsafter a 100 year-old invasion a critical look at the temperate-tropicalclines in Drosophila melanogaster from eastern Australia Genetica129133ndash147

Huang W Richards S Carbone MA Zhu D Anholt RR Ayroles JFDuncan L Jordan KW Lawrence F Magwire MM et al 2012Epistasis dominates the genetic architecture of Drosophila quanti-tative traits Proc Natl Acad Sci U S A 10915553ndash15559

Huang Z Tunnacliffe A 2004 Response of human cells to desiccationcomparison with hyperosmotic stress response J Physiol558181ndash191

Huang Z Tunnacliffe A 2005 Gene induction by desiccation stress inhuman cell cultures FEBS Lett 5794973ndash4977

Ichihashi Y Aguilar-Martınez JA Farhi M Chitwood DH Kumar RMillon LV Peng J Maloof JN Sinha NR 2014 Evolutionary develop-mental transcriptomics reveals a gene network module regulatinginterspecific diversity in plant leaf shape Proc Natl Acad Sci U S A111E2616ndashE2621

Itoh M Nanba N Hasegawa M Inomata N Kondo R Oshima MTakano-Shimizu T 2010 Seasonal changes in the long-distance link-age disequilibrium in Drosophila melanogaster J Hered 10126ndash32

Johnson WA Carder JW 2012 Drosophila nociceptors mediate larvalaversion to dry surface environments utilizing both the painless TRPchannel and the DEGENaC subunit PPK1 PLoS One 7e32878

Kahsai L Kapan N Dircksen H Winther AM Nassel DR 2010 Metabolicstress responses in Drosophila are modulated by brain neurosecre-tory cells that produce multiple neuropeptides PLoS One 5e11480

Kapun M van Schalkwyk H McAllister B Flatt T Schleurootterer C 2014Inference of chromosomal inversion dynamics from Pool-Seq datain natural and laboratory populations of Drosophila melanogasterMol Ecol 231813ndash1827

Kawano T Shimoda M Matsumoto H Ryuda M Tsuzuki S Hayakawa Y2010 Identification of a gene Desiccate contributing to desicca-tion resistance in Drosophila melanogaster J Biol Chem28538889ndash38897

Kellermann V van Heerwaarden B Sgro CM Hoffmann AA 2009Fundamental evolutionary limits in ecological traits driveDrosophila species distributions Science 3251244ndash1246

Kofler R Nolte V Schleurootterer C 2016 The impact of library preparationprotocols on the consistency of allele frequency estimates in Pool-Seq data Mol Ecol Resour 16118ndash122

Kofler R Pandey RV Schleurootterer C 2011 PoPoolation2 identifying dif-ferentiation between populations using sequencing of pooled DNAsamples (Pool-Seq) Bioinformatics 273435ndash3436

Kofler R Schleurootterer C 2012 Gowinda unbiased analysis of gene setenrichment for genome-wide association studies Bioinformatics282084ndash2085

Leiserson MDM Eldridge JV Ramachandran S Raphael BJ 2013Network analysis of GWAS data Curr Opin Genet Dev 23602ndash610

Li H Durbin R 2009 Fast and accurate short read alignment withBurrows-Wheeler transform Bioinformatics 251754ndash1760

Li H Handsaker B Wysoker A Fennell T Ruan J Homer N Marth GAbecasis G Durbin R 2009 The Sequence AlignmentMap formatand SAMtools Bioinformatics 252078ndash2079

14

Liu Y Luo J Carlsson MA Neuroassel DR 2015 Serotonin and insulin-likepeptides modulate leucokinin-producing neurons that affectfeeding and water homeostasis in Drosophila J Comp Neurol5231840ndash1863

Lohse M Bolger AM Nagel A Fernie AR Lunn JE Stitt M Usadel B 2012RobiNA a user-friendly integrated software solution for RNA-Seq-based transcriptomics Nucleic Acids Res 40W622ndashW627

Long A Liti G Luptak A Tenaillon O 2015 Elucidating the moleculararchitecture of adaptation via evolve and resequence experimentsNat Rev Genet 16567ndash582

Lynch M Bost D Wilson S Maruki T Harrison S 2014 Population-genetic inference from pooled-sequencing data Genome Biol Evol61210ndash1218

Mackay TFC Stone EA Ayroles JF 2009 The genetics of quantitativetraits challenges and prospects Nat Rev Genet 10565ndash577

Marron MT Markow TA Kain KJ Gibbs AG 2003 Effects of starvationand desiccation on energy metabolism in desert and mesicDrosophila J Insect Physiol 49261ndash270

Matzkin LM Markow TA 2009 Transcriptional regulation of metabo-lism associated with the increased desiccation resistance of thecactophilic Drosophila mojavensis Genetics 1821279ndash1288

Murali T Pacifico S Yu J Guest S Roberts GG Finley RL Jr 2011 DroID2011 a comprehensive integrated resource for protein transcrip-tion factor RNA and gene interactions for Drosophila Nucleic AcidsRes 39D736ndashD743

Nuzhdin SV Turner TL 2013 Promises and limitations of hitchhikingmapping Curr Opin Genet Dev 23694ndash699

Orozco-terWengel P Kapun M Nolte V Kofler R Flatt T Schleurootterer C2012 Adaptation of Drosophila to a novel laboratory environmentreveals temporally heterogeneous trajectories of selected alleles MolEcol 214931ndash4941

Picard Picard A set of Java command line tools for manipulating high-throughput seqeuncing (HTS) data and formats Available fromhttpbroadinstitutegithubiopicard

Qiu Y Tittiger C Wicker-Thomas C Le Goff G Young S Wajnberg EFricaux T Taquet N Blomquist GJ Feyereisen R 2012 An insect-specific P450 oxidative decarbonylase for cuticular hydrocarbon bio-synthesis Proc Natl Acad Sci U S A 10914858ndash14863

Rajpurohit S Oliveira CC Etges WJ Gibbs AG 2013 Functional genomicand phenotypic responses to desiccation in natural populations of adesert Drosophilid Mol Ecol 222698ndash2715

Sarup P Soslashrensen JG Kristensen TN Hoffmann AA Loeschcke V PaigeKN Soslashrensen P 2011 Candidate genes detected in transcriptomestudies are strongly dependent on genetic background PLoS One6e15644

Schiffer M Hangartner S Hoffmann AA 2013 Assessing the relativeimportance of environmental effects carry-over effects and speciesdifferences in thermal stress resistance a comparison ofDrosophilids across field and laboratory generations J Exp Biol2163790ndash3798

Schleurootterer C Kofler R Versace E Tobler R Franssen SU 2015Combining experimental evolution with next-generation sequen-cing A powerful tool to study adaptation from standing geneticvariation Heredity 114431ndash440

Sinclair BJ Gibbs AG Roberts SP 2007 Gene transcription during ex-posure to and recovery from cold and desiccation stress inDrosophila melanogaster Insect Mol Biol 16435ndash443

Sloggett C Wakefield M Philip G Pope B 2014 Rubramdashflexible distrib-uted pipelines figshare Available from httpdxdoiorg106084m9figshare895626

Smit AFA Hubley R Green P 2013-2015 RepeatMasker Open-403Available from httpwwwrepeatmaskerorg

Seurooderberg JA Birse RT Neuroassel DR 2011 Insulin production and signalingin renal tubules of Drosophila is under control of tachykinin-relatedpeptide and regulates stress resistance PLoS One 6e19866

Soslashrensen JG Nielsen MM Loeschcke V 2007 Gene expression profileanalysis of Drosophila melanogaster selected for resistance to envi-ronmental stressors J Evol Biol 201624ndash1636

St Pierre SE Ponting L Stefancsik R McQuilton P 2014 FlyBase102mdashadvanced approaches to interrogating FlyBase Nucleic AcidsRes 42D780ndashD788

Stevenson M Nunes T Heuser C Marshall J Sanchez J Thornton RReiczigel J Robison-Cox J Sebastiani P Solymos P et al 2015 epiRTools for the analysis of epidemiological data R package version09-62 Available from httpCRANR-projectorgpackagefrac14epiR

Storey JD Tibshirani R 2003 Statistical significance for genomewidestudies Proc Natl Acad Sci U S A 1009440ndash9445

Telonis-Scott M Gane M DeGaris S Sgro CM Hoffmann AA 2012 Highresolution mapping of candidate alleles for desiccation resistancein Drosophila melanogaster under selection Mol Biol Evol291335ndash1351

Telonis-Scott M Guthridge KM Hoffmann AA 2006 A new set oflaboratory-selected Drosophila melanogaster lines for the analysisof desiccation resistance response to selection physiology and cor-related responses J Exp Biol 2091837ndash1847

Terhzaz S Cabrero P Robben JH Radford JC Hudson BD Milligan GDow JA Davies SA 2012 Mechanism and function of Drosophilacapa GPCR a desiccation stress-responsive receptor with functionalhomology to human neuromedinU receptor PLoS One 7e29897

Terhzaz S Overend G Sebastian S Dow JA Davies SA 2014 The Dmelanogaster capa-1 neuropeptide activates renal NF-kB signalingPeptides 53218ndash224

Terhzaz S Teets NM Cabrero P Henderson L Ritchie MG Nachman RJDow JA Denlinger DL Davies SA 2015 Insect capa neuropeptidesimpact desiccation and cold tolerance Proc Natl Acad Sci U S A1122882ndash2887

Thorat LJ Gaikwad SM Nath BB 2012 Trehalose as an indicator ofdesiccation stress in Drosophila melanogaster larvae a potentialmarker of anhydrobiosis Biochem Biophys Res Commun419638ndash642

Tobler R Franssen SU Kofler R Orozco-terWengel P Nolte VHermisson J Schl eurootterer C 2014 Massive habitat-specific ge-nomic response in D melanogaster populations during exper-imental evolution in hot and cold environments Mol Biol Evol31364ndash375

Turner TL Bourne EC Von Wettberg EJ Hu TT Nuzhdin SV 2010Population resequencing reveals local adaptation of Arabidopsislyrata to serpentine soils Nat Genet 42260ndash263

Turner TL Miller PM 2012 Investigating natural variation in Drosophilacourtship song by the evolve and resequence approach Genetics191633ndash642

Vermeulen CJ Soslashrensen P Kirilova Gagalova K Loeschcke V 2013Transcriptomic analysis of inbreeding depression in cold-sensitiveDrosophila melanogaster shows upregulation of the immune re-sponse J Evol Biol 261890ndash1902

Wang B Goode J Best J Meltzer J Schilman PE Chen J Garza D ThomasJB Montminy M 2008 The insulin-regulated CREB coactivatorTORC promotes stress resistance in Drosophila Cell Metab7434ndash444

Wang J Kean L Yang J Allan AK Davies SA Herzyk P Dow JA 2004Function-informed transcriptome analysis of Drosophila renaltubule Genome Biol 5R69

Wang K Li M Hakonarson H 2010 Analysing biological pathways ingenome-wide association studies Nat Rev Genet 11843ndash854

Wang L Jia P Wolfinger RD Chen X Zhao Z 2011 Gene set analysis ofgenome-wide association studies methodological issues and per-spectives Genomics 981ndash8

Weber AL Khan GF Magwire MM Tabor CL Mackay TF Anholt RR2012 Genome-wide association analysis of oxidative stress resistancein Drosophila melanogaster PLoS One 7e34745

Weeks AR McKechnie SW Hoffmann AA 2002 Dissecting adaptiveclinal variation markers inversions and sizestress associations inDrosophila melanogaster from a central field population Ecol Lett5756ndash763

Zhu Y Bergland AO Gonzalez J Petrov DA 2012 Empirical validation ofpooled whole genome population re-sequencing in Drosophila mel-anogaster PLoS One 7e41901

15

Page 8: Cross-Study Comparison Reveals Common Genomic, Network, … · 2016-02-06 · resistance using a high-throughput sequencing Pool-GWAS (genome-wide association study) approach ( Bastide

mapped SNPs associated with desiccation resistance in 500naturally derived genotypes Although Pool-seq has limitedpower to reveal very low frequency alleles to reconstructphase or adequately estimate allele frequencies in smallpools (Lynch et al 2014) the Pool-GWAS approach hasbeen validated by retrieval of known candidate genes in-volved in body pigmentation using similar numbers of ge-notypes as assessed here (Bastide et al 2013)

Although studies in controlled genetic backgrounds reportlarge effects of single genes on desiccation resistance (Table 1)our current and previous screens revealed similarly largenumbers of variants (over 600) mapping to over 300 genesA polygenic basis of desiccation resistance is consistent withquantitative genetic data for the trait (Foley and Telonis-Scott2011) and we also anticipate more complexity from naturalD melanogaster populations with greater standing geneticvariation than inbred laboratory strains However somefalse positive associations are likely in the GWAS due to thelimitations of quantitative trait mapping approaches Pooledevolve-and-resequence experiments routinely yield vast num-bers of candidate SNPs partly due to high levels of standinglinkage disequilibrium (LD) and hitchhiking neutral alleles

(Schleurootterer et al 2015) Large population sizes from specieswith low levels of LD such as D melanogaster somewhat cir-cumvent this issue and a notable feature of our experimentaldesign is our sampling Here we selected the most desicca-tion-resistant phenotypes directly from substantial standinggenetic variation where one generation of recombinationshould largely reflect natural haplotypes

We observed the largest candidate allele frequency differ-ences in loci that had intermediate allele frequencies in thecontrol pool This is in line with population genetics modelswhich predict that evolution from standing genetic variationwill initially be strongest for intermediate frequency alleleswhich can facilitate rapid change (Long et al 2015) Howeverwe also observed a number of low frequency alleles which ifbeneficial can inflate false positives due to long-range LD(Orozco-terWengel et al 2012 Nuzhdin and Turner 2013Tobler et al 2014) Nonetheless spurious LD associationscannot fully account for our candidate list as evidenced bythe desiccation functional candidates and cross-study over-lap Furthermore chromosome inversion dynamics can alsogenerate both short and long-range LD and contribute nu-merous false positive SNP phenotype associations (Hoffmann

Fig 2 Observed versus simulated network overlap between the GWAS first-order PPI network and the first-order PPI network from the Telonis-Scottet al (2012) gene list Overlap is presented in terms of (A) node number and (B) edge number Networks were simulated by resampling from the entireDrosophila melanogaster gene list (r553) as described in the text The histogram shows the distribution of overlap measures for 1000 simulations Theblack line represents the histogram as density and the blue line shows the corresponding normal distribution The red vertical line shows the observedoverlap from the real networks labeled as a percentile of the normal distribution Area-proportional Venn diagrams summarized the extent of overlapfor PPI networks constructed separately from the GWAS candidates and Telonis-Scott et al (2012) candidates (labeled ldquoprevious studyrdquo) (C) Overlapexpressed as the number of nodes Approximately 55 of the nodes overlapped between the two studies which was significantly higher than simulatedexpectations (999 percentile 2B) (D) Overlap expressed in the number of interactions This was not significantly higher than expected by simulation(89th percentile 2B) (E) Overlap at the level of GO biological process terms (directly compared GO terms by name this was not tested statisticallybecause of the complex hierarchical nature of GO terms and is presented for interest)

8

and Weeks 2007 Tobler et al 2014) but we found little as-sociation between the desiccation pool and inversion fre-quency changes consistent with the lack of latitudinalpatterns in desiccation resistanceand traitinversion fre-quency associations in east Australian D melanogaster(Hoffmann et al 2001 Weeks et al 2002)

Cross-Study Comparison Reveals a Core Set ofCandidates for a Complex Trait

Our data contain many candidate genes functions and path-ways for insect desiccation resistance but partitioning genu-ine adaptive loci from experimental noise remains asignificant challenge We therefore focus our efforts on ournovel comparative analysis where we identified 45 robustcandidates for further exploration Common genetic signa-tures are rarely documented in cross-study comparisons ofcomplex traits (Sarup et al 2011 Huang et al 2012 but cfVermeulen et al 2013) and never before for desiccation re-sistance Some degree of overlap between our studies likelyresults from a shared genetic background from similar sam-pling sites highlighting the replicability of these responsesOne limitation of this approach is the overpopulation ofthe ldquocorerdquo list by longer genes This result reflects the genelength bias inherent to GWAS approaches an issue that is notstraightforward to correct in a GWAS let alone in the com-parative framework applied here A focus on the commongenes functionally linked to the desiccation phenotype canprovide a meaningful framework for future research and alsosuggests that some large-effect candidate genes exhibit natu-ral variation contributing to the phenotype (table 1)

Other feasible candidates include genes expressed inthe MTs (Wang et al 2004) or essential for MT develop-ment (St Pierre et al 2014) providing further researchopportunity focused on the primary stress sensing andfluid secretion tissues Genes include one of the mosthighly expressed MT transcripts CG7084 as well as dncNckx30C slo cv-c and nkd Other aspects of stress resis-tance are also implicated CrebB for example is involvedin regulating insulin-regulated stress resistance and ther-mosensory behavior (Wang et al 2008) nkd also impactslarval cuticle development and shares a PDZ signalingdomain also present in the desiccation candidate desi(Kawano et al 2010 St Pierre et al 2014) Finally severalcore candidates were consistently reported in recent stud-ies suggesting generalized roles in stress responses andfitness including transcriptome studies on MT function(Wang et al 2004) inbreeding and cold resistance(Vermeulen et al 2013) oxidative stress (Weber et al2012) and genomic studies exploring age-specific SNPson fitness traits (Durham et al 2014) and genes likelyunder spatial selection in flies from climatically divergenthabitats (Fabian et al 2012)

Several of the enriched GO categories from the core listalso make biological sense including stimulus response(almost half the suite) defense and signaling particularlyin pathways involved in MT stress sensing (Davies et al 20122013 2014) Functional categories enriched for sensory

organ development were highlighted consistent with envi-ronmental sensing categories from desert Drosophila tran-scriptomics (Matzkin and Markow 2009 Rajpurohit et al2013) and stress detection via phototransduction pathwaysin D melanogaster under a range of stressors including des-iccation (Soslashrensen et al 2007)

Evidence for Conserved Signatures at Higher LevelOrganization

Beyond the gene level we observed a striking degree ofsimilarity between this study and that of Telonis-Scottet al (2012) at the network level where significantly over-lapping protein networks were constructed from largely dif-ferent seed protein sets This supports a scenario where theldquobiological information flow from DNA to phenotyperdquo(Civelek and Lusis 2014) contains inherent redundancywhere alternative genetic solutions underlie phenotypesand functions Resistance to perturbation by genetic or en-vironmental variation appears to increase with increasinghierarchical complexity that is phenotype ldquobufferingrdquo (Fuet al 2009) The same functional network in different pop-ulationsgenetic backgrounds can be impacted but theexact genes and SNPs responding to perturbation will notnecessarily overlap In Drosophila different sets of loci fromthe same founders mapped to the same PPI network in thecase of two complex traits chill coma recovery and startleresponse (Huang et al 2012) Furthermore interspecificshared networkspathways from different genes have beendocumented for numerous complex traits (Emilsson et al2008 Aytes et al 2014 Ichihashi et al 2014) providing op-portunities for investigating taxa where individual genefunction is poorly understood

Although the null distribution is not always obviouswhen comparing networks and their functions partly be-cause of reduced sensitivity from missing interactions(Leiserson et al 2013) here the shared network signaturesportray a plausible multifaceted stress response involvingstress sensing rapid gene accessibility and expression Thepredominance of immune system and stress responsepathways are of interest with reference immunitystresspathway ldquocross-talkrdquo particularly in the MTs whereabiotic stressors elicit strong immunity transcriptionalresponses (Davies et al 2012) The higher order architec-tures reflect crucial aspects of rapid gene expression con-trol in response to stress from chromatin organization(Alexander and Lomvardas 2014) to transcription RNAdecay and translation (reviwed in de Nadal et al 2011) InDrosophila variation in transcriptional regulation of indi-vidual genes can underpin divergent desiccation pheno-types (Chung et al 2014) or be functionally inferred frompatterns of many genes elicited during desiccation stress(Rajpurohit et al 2013) Variants with potential to mod-ulate expression at different stages of gene expressioncontrol were evident in our GWAS data alone for exam-ple enrichment for 50UTR and splice variants at the SNPlevel and for chromatin and chromosome structure termsat the functional level

9

Genetic Overlap Versus Genetic differencesmdashCausesand Implications

Despite the common signatures at the gene and higher orderlevels between our studies pervasive differences were ob-served These can arise from differences in standing geneticvariation in the natural populations where different alleliccompositions impact gene-by-environment interactions andepistasis Our network overlap analyses highlight the potentialfor different molecular trajectories to result in similar func-tions (Leiserson et al 2013) consistent with the multiplephysiological pathways potentially available to D melanoga-ster under water stress (Chown et al 2011) Experimentaldesign also presumably contributed to the study differencesStrong directional selection as applied in the 2012 study canincrease the frequency of rare beneficial alleles and likely re-duced overall diversity than did the single-generation GWASdesign Furthermore selection regimes often impose addi-tional selective pressures (eg food deprivation during desic-cation) resulting in correlated responses such as starvationresistance with desiccation resistance and these factors differfrom the laboratory to the field Finally evidence suggests innatural D melanogaster that standing genetic variation canvary extensively with sampling season (Itoh et al 2010Bergland et al 2014) Although we cannot compare levelsof standing variation between our two studies evidence sug-gests that this may have affected the Dys gene where ourGWAS approach revealed no differentiation between thecontrol and resistant pools at this locus but rather detectedthe alleles associated with the selective sweep in Telonis-Scottet al (2012) in both pools at high frequency Whether thisresulted from sampling the later collection after a longdrought (2004ndash2008 wwwbomgovauclimate last accessedJanuary 13 2016) is unclear but this observation highlightsthe relevance of temporal sampling to standing genetic var-iation when attempting to link complex phenotypes togenotypes

Materials and Methods

Natural Population Sampling

Drosophila melanogaster was collected from vineyard waste atKilchorn Winery Romsey Australia in April 2010 Over 1200isofemale lines were established in the laboratory from wild-caught females Species identification was conducted on maleF1 to remove Drosophila simulans contamination and over900 D melanogaster isofemale lines were retained All flieswere maintained in vials of dextrose cornmeal media at25 C and constant light

Desiccation Assay

From each isofemale line ten inseminated F1 females weresorted by aspiration without CO2 and held in vials until 4ndash5days old For the desiccation assay 10 females per line (over9000 flies in total) were screened by placing groups of femalesinto empty vials covered with gauze and randomly assigninglines into 5 large sealed glass tanks containing trays of silicadesiccant (relative humidity [RH] lt10) The tanks werescored for mortality hourly and the final 558 surviving

individuals were selected for the desiccation-resistant pool(558 flies lt6 of the total) The control pool constitutedthe same number of females sampled from over 200 ran-domly chosen isofemale lines that were frozen prior to thedesiccation assay We chose this random control approachinstead of comparing the phenotypic extremes (ie the latemortality tail vs the ldquoearly mortalityrdquo tail) because unfavor-able environmental conditions experienced by the wild-caught dam may inflate environmental variance due to car-ryover effects in the F1 offspring (Schiffer et al 2013)

DNA Extraction and Sequencing

We used pooled whole-genome sequencing (Pool-seq)(Futschik and Schleurootterer 2010) to estimate and compareallele frequency differences between the most desiccation-resistant and randomly sampled flies from a wild populationto associate naturally segregating candidate SNPs with theresponse to low humidity For each treatment (resistantand control) genomic DNA was extracted from 50 femaleheads using the Qiagen DNeasy Blood and Tissue Kit accord-ing to the manufacturerrsquos instructions (Qiagen HildenGermany) Two extractions were combined to create fivepools per treatment which were barcoded separately tocreate technical replicates (total 10 pools of 100 fliesn frac14 1000) The DNA was fragmented using a CovarisS2 ma-chine (Covaris Inc Woburn MA) and libraries were pre-pared from 1 mg genomic DNA (per pool of 100 flies) usingthe Illumina TruSeq Prep module (Illumina San Diego CA)To minimize the effect of variation across lanes the ten poolswere combined in equal concentration (fig 3) and run mul-tiplexed in each lane of a full flow cell of an Illumina HiSeq2000 sequencer Clusters were generated using the TruSeq PECluster Kit v5 on a cBot and sequenced using Illumina TruSeqSBS v3 chemistry (Illumina) Fragmentation library construc-tion and sequencing were performed at the Micromon Next-Gen Sequencing facility (Monash University ClaytonAustralia)

Fig 3 Experimental design See text for further explanation

10

Data Processing and Mapping

Processing was performed on a high-performance computingcluster using the Rubra pipeline system that makes use of theRuffus Python library (Goodstadt 2010) The final variant-calling pipeline was based on a pipeline developed by ClareSloggett (Sloggett et al 2014) and is publicly available athttpsgithubcomgriffinpGWAS_pipeline (last accessedJanuary 13 2016) Raw sequence reads were processed persample per lane with P1 and P2 read files processed sepa-rately initially Trimming was performed with trimmomatic v030 (Lohse et al 2012) Adapter sequences were also re-moved Leading and trailing bases with a quality scorebelow 30 were trimmed and a sliding window (width frac14 10quality threshold frac14 25) was used Reads shorter than 40 bpafter trimming were discarded Quality before and after trim-ming was examined using FastQC v 0101 (Andrews 2014)

Posttrimming reads that remained paired were used asinput into the alignment step Alignment to the D melano-gaster reference genome version r553 was performed withbwa v 062 (Li and Durbin 2009) using the bwa aln commandand the following options No seeding (-l 150) 1 rate ofmissing alignments given 2 uniform base error rate (-n001) a maximum of 2 gap opens per read (-o 2) a maximumof 12 gap extensions per read (-e 12) and long deletionswithin 12 bp of 30 end disallowed (-d 12) The default valueswere used for all other options Output files were then con-verted to sorted bam files using the bwa sampe commandand the ldquoSortSamrdquo option in Picard v 196

At this point the bam files were merged into one file persample using PicardMerge Duplicates were identified withPicard MarkDuplicates The tool RealignerTargetCreator inGATK v 26-5 (DePristo et al 2011) was applied to the tenbam files simultaneously producing a list of intervals contain-ing probable indels around which local realignment would beperformed This was used as input for the IndelRealigner toolwhich was also applied to all ten bam files simultaneously

To investigate the variation among technical replicatesallele frequency was estimated based on read counts foreach technical replicate separately for 100000 randomly cho-sen SNPs across the genome After excluding candidate des-iccation-resistance SNPs and SNPs that may have been falsepositives due to sequencing error (those with mean frequencylt004 across replicates) 1803 SNPs remained For each locuswe calculated the variance in allele frequency estimate amongthe five control replicates and the variance among the fivedesiccation-resistance replicates We compared this with thevariance due to pool category (control vs desiccation resis-tant) using analysis of variance (ANOVA) We also calculatedthe mean pairwise difference in allele frequency estimateamong control and among desiccation-resistance replicatesand the mean concordance correlation coefficient was calcu-lated over all pairwise comparisons within pool category(using the epiccc function in the epiR package Stevensonet al 2015)

The following approach was then taken to identify SNPsdiffering between the desiccation-resistant and controlgroups Based on the technical replicate analysis we merged

the five ldquodesiccation-resistantrdquo and the five random controlsamples into two bam files A pileup file was created withsamtools v 0119 (Li et al 2009) and converted to a ldquosyncrdquo fileusing the mpileup2sync script in Popoolation2 v 1201 (Kofleret al 2011) retaining reads with a mapping quality of at least20 and bases with a minimum quality score of 20 Reads withunmapped mates were also retained (option ndashA in samtoolsmpileup) For this step we used a repeat-masked version ofthe reference genome created with RepeatMasker 403 (Smitet al 2013-2015) and a repeat file containing all annotatedtransposons from D melanogaster including shared ancestralsequences (Flybase release r557) plus all repetitive elementsfrom RepBase release v1904 for D melanogaster Simple re-peats small RNA genes and bacterial insertions were notmasked (options ndashnolow ndashnorna ndashno_is) The rmblastsearch engine was used

Association Analysis

We then performed Fisherrsquos exact test with the Fisher testscript in Popoolation2 (Kofler et al 2011) We set the min-imum count (of the minor allele) frac14 10 minimum coverage(in both populations) frac14 20 and maximum coverage frac1410000 The test was performed for each SNP (slidingwindow off) Although there are numerous approaches tocalculating genome-wide significance thresholds in aPoolseq framework currently there is no consensus on anappropriate standardized statistical method Methods in-clude simulation approaches (Orozco-terWengel et al2012 Bastide et al 2013 Burke et al 2014) correctionusing a null distribution based on estimating genetic driftfrom replicated artificial selection line allele frequency vari-ance (Turner and Miller 2012) use of a simple thresholdbased on a minimum allele frequency difference (Turneret al 2010) and the FDR correction suggested by Storeyand Tibshirani (2003) (used by Fabian et al 2012)

We determined based on our design to calculate the FDRthreshold by comparison with a simulated distribution of nullP values following the approach of Bastide et al (2013) Foreach SNP locus a P value was calculated by simulating thedistribution of read counts between the major and minorallele according to a beta-binomial distribution with meana(a+b) and variance (ab)((a+b)2(a+b+1)) Thismodel allowed for two types of sampling variationStochastic variation in allele sampling across the phenotypesand the background variation in the representation of allelesin each pool caused by library preparation artifacts or sto-chastic variation driven by unequal coverage of the poolsThese sources of variation have also been recognized else-where as being important in interpreting pooled sequenceallele calls and identifying significant differentiation betweenpooled samples (Lynch et al 2014) The value of afrac14 37 waschosen by chi-square test as the best match to the observedP value distribution (afrac14 10 to afrac14 40 were tested) with acorresponding bfrac14 4315 to reflect the coverage depth differ-ence between the random control (487) and desiccation-resistant (568) pools These values were then used tosimulate a null data set 10 the size of the observed data

11

set Because the null P value distribution still did not fit theobserved P value distribution particularly well (supplemen-tary fig S2 Supplementary Material online) these values werenot used for a standard FDR correction Instead a P valuethreshold was calculated based on the lowest 005 of thenull P values similar in approach to Orozco-terWengel et al(2012)

Inversion Analysis

In D melanogaster several cosmopolitan inversion polymor-phisms show cross-continent latitudinal patterns that explaina substantial proportion of clinal variation for many traitsincluding thermal tolerance (reviewed in Hoffmann andWeeks 2007) Patterns of disequilibria generated betweenloci in the vicinity of the rearrangement can obscure signalsof selection (Hoffmann and Weeks 2007) and inversion fre-quencies are routinely considered in genomic studies ofnatural populations sampled from known latitudinal clines(Fabian et al 2012 Kapun et al 2014) We assessedassociations between inversion frequencies and our controland resistant pools at SNPs diagnostic for the following in-versions In3R(Payne) (Anderson et al 2005 Kapun et al2014) In(2L)t (Andolfatto et al 1999 Kapun et al 2014)In(2R)Ns In(3L)P In(3R)C In(3R)K and In(3R)Mo (Kapunet al 2014) Between one and five diagnostic SNPs were in-vestigated for each inversion (depending on availability) Weconsidered an inversion to be associated with desiccationresistance if the frequency of its diagnostic SNP differed sig-nificantly between the control and desiccation-resistant poolsby a Fisherrsquos exact test

Genome Feature Annotation

The list of candidate SNPs was converted to vcf format using acustom R script setting the alternate allele as the allele thatincreased in frequency from the control to the desiccation-resistant pool and setting the reference allele as the otherallele present in cases where both the major and minor allelesdiffered from the reference genome The genomic features inwhich the candidate SNPs occurred were annotated usingsnpEff v40 first creating a database from the D melanogasterv553 genome annotation with the default approach(Cingolani et al 2012) The default annotation settings wereused except for setting the parameter ldquo-ud 0rdquo to annotateSNPs to genes only if they were within the gene body Using2 tests on SNP counts we tested for over- or under-enrichment of feature types by calculating the proportionof lsquoallrsquo features from all SNPs detected in the study and theproportion of ldquocandidaterdquo SNPs identified as differentiated indesiccation-resistant flies (Fabian et al 2012)

GO Analysis

GO analyses was used to associate the candidates with theirbiological functions (Ashburner et al 2000) Previously devel-oped for biological pathway analysis of global gene expressionstudies (Wang et al 2010) typical GO analysis assumes that allgenes are sampled independently with equal probabilityGWAS violate these criteria where SNPs are more likely to

occur in longer genes than in shorter genes resulting in highertype I error rates in GO categories overpopulated with longgenes Gene length as well as overlapping gene biases (Wanget al 2011) were accounted for using Gowinda (Kofler andSchleurootterer 2012) which implements a permutation strategyto reduce false positives Gowinda does not reconstruct LDbetween SNPs but operates in two modes underpinned bytwo extreme assumptions 1) Gene mode (assumes all SNPs ina gene are in LD) and 2) SNP mode (assumes all SNPs in agene are completely independent)

The candidates were examined in both modes but giventhe stringency of gene mode high-resolution SNP modeoutput was utilized for the final analysis with the followingparameters 106 simulations minimum significance frac141 andminimum genes per GO categoryfrac14 3 (to exclude gene-poorcategories) P values were corrected using the Gowinda FDRalgorithm GO annotations from Flybase r557 were applied

Candidate Gene Analysis

In conjunction with the GO analysis we also examinedknown candidate genes associated with survival under lowRH to better characterize possible biological functions of ourGWAS candidates We took a conservative approach to can-didate gene assignment where the SNPs were mapped withinthe entire gene region without up- and downstream exten-sions Genes were curated from FlyBase and the literature(table 1) predominantly using detailed functional experimen-tal evidence as criteria for functional desiccation candidates

Network Analysis

We next obtained a higher level summary of the candidategene list using PPI network analysis The full candidate genelist was used to construct a first-order interaction networkusing all Drosophila PPIs listed in the Drosophila InteractionDatabase v 2014_10 (avoiding interologs) (Murali et al 2011)This ldquofullrdquo PPI network includes manually curated data frompublished literature and experimental data for 9633 proteinsand 93799 interactions For the network construction theigraph R package was used to build a subgraph from the fullPPI network containing the candidate proteins and their first-order interacting proteins (Csardi and Nepusz 2006) Self-connections were removed Functional enrichment analysiswas conducted on the list of network genes using FlyMinev400 (wwwflymineorg last accessed January 13 2016) withthe following parameters for GO enrichment and KEGGReactome pathway enrichment BenjaminindashHochberg FDRcorrection P lt 005 background frac14 all D melanogastergenes in our full PPI network and Ontology (for GO enrich-ment only) frac14 biological process or molecular function

Cross-Study Comparison Overlap with Our PreviousWork

Following on from the comparison between our gene list andthe candidate genes suggested from the literature we quan-titatively evaluated the degree of overlap between the candi-date genes identified in this study with alleles mapped in ourprevious study of flies collected from the same geographic

12

region (Telonis-Scott et al 2012) The ldquooverlaprdquo analysis wasperformed similarly for the GWAS candidate gene and net-work-level approaches First the gene list overlap was testedbetween the two studies using Fisherrsquos exact test where the382 genes identified from the candidate SNPs (this study)were compared with 416 genes identified from candidatesingle feature polymorphisms (Telonis-Scott et al 2012)The contingency table was constructed using the numberof annotated genes in the D melanogaster R553 genomebuilt as an approximation for the total number of genestested in each study and took the following form

where A frac14 gene list from Telonis-Scott et al (2012)n frac14 416 B frac14 gene list from this study n frac14 382 N frac14 totalnumber of genes in D melanogaster reference genome r553n frac14 17106

To annotate genome features in the overlap candidategene list we used the SNP data (this study) which betterresolves alleles than array-based genotyping Due to this lim-itation we did not compare exact genomic coordinates be-tween the studies but considered overlap to occur whendifferentiated alleles were detected in the same gene inboth studies We did however examine the allele frequenciesaround the Dys-RF promoter which was sequenced sepa-rately revealing a selective sweep in the experimental evolu-tion lines (Telonis-Scott et al 2012)

The overlapping candidates were tested for functional en-richment using a basic gene list approach As genes ratherthan SNPs were analyzed here Fly Mine v400 (wwwflymineorg) was applied with the following parameters Gene lengthnormalization BenjaminindashHochberg FDR correction Plt 005Ontology frac14 biological process and default background (allGO-annotated D melanogaster genes)

Finally we constructed a second PPI network from thecandidate genes mapped in Telonis-Scott et al (2012) to ex-amine system-level overlap between the two studies The 416genes mapped to 337 protein seeds and the network wasconstructed as described above The degree of network over-lap between the two studies was tested using simulation Ineach iteration a gene set of the same length as the list fromthis study and a gene set of the same length as the list fromTelonis-Scott et al (2012) were resampled randomly from theentire D melanogaster mapped gene annotation (r553) Eachresampled set was then used to build a first-order network aspreviously described The overlap network between the tworesampled sets was calculated using the graphintersectioncommand from the igraph package and zero-degree nodes(proteins with no interactions) were removed Overlap wasquantified by counting the number of nodes and the numberof edges in this overlap network The observed overlap wascompared with a simulated null distribution for both nodeand edge number where n frac14 1000 simulation iterations

Supplementary MaterialSupplementary figures S1ndashS4 and tables S1ndashS5 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

We thank Robert Good for the fly sampling and JenniferShirriffs and Lea and Alan Rako for outstanding assistancewith laboratory culture and substantial desiccation assaysWe are grateful to Melissa Davis for advice on network build-ing and comparison approaches and to five anonymous re-viewers whose comments improved the manuscript Thiswork was supported by a DECRA fellowship DE120102575and Monash University Technology voucher scheme toMTS an ARC Future Fellowship and Discovery Grants toCMS an ARC Laureate Fellowship to AAH and a Scienceand Industry Endowment Fund grant to CMS and AAH

ReferencesAlexander JM Lomvardas S 2014 Nuclear architecture as an epigenetic

regulator of neural development and function Neuroscience26439ndash50

Anderson AR Hoffmann AA McKechnie SW Umina PA Weeks AR2005 The latitudinal cline in the In(3R)Payne inversion polymor-phism has shifted in the last 20 years in Australian Drosophilamelanogaster populations Mol Ecol 14851ndash858

Andolfatto P Wall JD Kreitman M 1999 Unusual haplotype structureat the proximal breakpoint of In(2L)t in a natural population ofDrosophila melanogaster Genetics 1531297ndash1311

Andrews S 2014 FastQC Available from httpwwwbioinformaticsbbsrcacukprojectsfastqc

Ashburner M Ball CA Blake JA Botstein D Butler H Cherry JM DavisAP Dolinski K Dwight SS Eppig JT et al 2000 Gene ontology toolfor the unification of biology The Gene Ontology Consortium NatGenet 2525ndash29

Aytes A Mitrofanova A Lefebvre C Alvarez Mariano J Castillo-MartinM Zheng T Eastham James A Gopalan A Pienta Kenneth J ShenMM et al 2014 Cross-species regulatory network analysis identifiesa synergistic interaction between FOXM1 and CENPF that drivesprostate cancer malignancy Cancer Cell 25638ndash651

Bastide H Betancourt A Nolte V Tobler R Steuroobe P Futschik ASchleurootterer C 2013 A genome-wide fine-scale map of natural pig-mentation variation in Drosophila melanogaster PLoS Genet9e1003534

Bergland AO Behrman EL OrsquoBrien KR Schmidt PS Petrov DA 2014Genomic evidence of rapid and stable adaptive oscillations overseasonal time scales in Drosophila PLoS Genet 10e1004775

Bonebrake TC Mastrandrea MD 2010 Tolerance adaptation and pre-cipitation changes complicate latitudinal patterns of climate changeimpacts Proc Natl Acad Sci U S A 10712581ndash12586

Bradley TJ 2009 Animal osmoregulation Oxford Oxford UniversityPress

Burke MK Liti G Long AD 2014 Standing genetic variation drivesrepeatable experimental evolution in outcrossing populations ofSaccharomyces cerevisiae Mol Biol Evol 313228ndash3239

Chown SL 2012 Trait-based approaches to conservation physiologyforecasting environmental change risks from the bottom upPhilos Trans R Soc B 3671615ndash1627

Chown SL Nicolson SW 2004 Insect physiological ecology mechanismsand patterns Oxford Oxford University Press

Chown SL Soslashrensen JG Terblanche JS 2011 Water loss in insectsan environmental change perspective J Insect Physiol571070ndash1084

Not in GWAS In GWAS

Not in Telonis-Scott et al (2012) N ndash intersect-AndashB A-intersect

In Telonis-Scott et al (2012) B-intersect Intersect(AB)

13

Chung H Loehlin DW Dufour HD Vaccarro K Millar JG Carroll SB2014 A single gene affects both ecological divergence and matechoice in Drosophila Science 3431148ndash1151

Cingolani P Platts A Wang LL Coon M Nguyen T Wang L Land SJ LuX Ruden DM 2012 A program for annotating and predicting theeffects of single nucleotide polymorphisms SnpEff SNPs in thegenome of Drosophila melanogaster strain w1118 iso-2 iso-3 Fly(Austin) 680ndash92

Civelek M Lusis AJ 2014 Systems genetics approaches to understandcomplex traits Nat Rev Genet 1534ndash48

Clusella-Trullas S Blackburn TM Chown SL 2011 Climatic predictors oftemperature performance curve parameters in ectotherms implycomplex responses to climate change Am Nat 177738ndash751

Csardi G Nepusz T 2006 The igraph software package for complexnetwork research InterJournal Complex Systems 1695 Availablefrom httpigraphorg

Davies SA Cabrero P Overend G Aitchison L Sebastian S Terhzaz SDow JA 2014 Cell signalling mechanisms for insect stress toleranceJ Exp Biol 217119ndash128

Davies SA Cabrero P Povsic M Johnston NR Terhzaz S Dow JA 2013Signaling by Drosophila capa neuropeptides Gen Comp Endocrinol18860ndash66

Davies SA Overend G Sebastian S Cundall M Cabrero P Dow JATerhzaz S 2012 Immune and stress response lsquocross-talkrsquo in theDrosophila Malpighian tubule J Insect Physiol 58488ndash497

Day JP Dow JA Houslay MD Davies SA 2005 Cyclic nucleotide phos-phodiesterases in Drosophila melanogaster Biochem J 388333ndash342

de Nadal E Ammerer G Posas F 2011 Controlling gene expression inresponse to stress Nat Rev Genet 12833ndash845

DePristo MA Banks E Poplin R Garimella KV Maguire JR Hartl CPhilippakis AA del Angel G Rivas MA Hanna M et al 2011 Aframework for variation discovery and genotyping using next-gen-eration DNA sequencing data Nat Genet 43491ndash498

Djawdan M Chippindale AK Rose MR Bradley TJ 1998 Metabolic re-serves and evolved stress resistance in Drosophila melanogasterPhysiol Zool 71584ndash594

Durham MF Magwire MM Stone EA Leips J 2014 Genome-wide anal-ysis in Drosophila reveals age-specific effects of SNPs on fitness traitsNat Commun 54338

Emilsson V Thorleifsson G Zhang B Leonardson AS Zink F Zhu JCarlson S Helgason A Walters GB Gunnarsdottir S et al 2008Genetics of gene expression and its effect on disease Nature452423ndash428

Fabian DK Kapun M Nolte V Kofler R Schmidt PS Schleurootterer C FlattT 2012 Genome-wide patterns of latitudinal differentiation amongpopulations of Drosophila melanogaster from North America MolEcol 214748ndash4769

Foley BR Telonis-Scott M 2011 Quantitative genetic analysis suggestscausal association between cuticular hydrocarbon composition anddesiccation survival in Drosophila melanogaster Heredity 10668ndash77

Folk DG Han C Bradley TJ 2001 Water acquisition and partitioning inDrosophila melanogaster effects of selection for desiccation-resis-tance J Exp Biol 2043323ndash3331

Fowler MA Montell C 2013 Drosophila TRP channels and animal be-havior Life Sci 92394ndash403

Fu J Keurentjes JJB Bouwmeester H America T Verstappen FWA WardJL Beale MH de Vos RCH Dijkstra M Scheltema RA et al 2009System-wide molecular evidence for phenotypic buffering inArabidopsis Nat Genet 41166ndash167

Futschik A Schleurootterer C 2010 The next generation of molecular mar-kers from massively parallel sequencing of pooled DNA samplesGenetics 186207ndash218

Gibbs AG Fukuzato F Matzkin LM 2003 Evolution of water conserva-tion mechanisms in Drosophila J Exp Biol 2061183ndash1192

Gibbs AG Matzkin LM 2001 Evolution of water balance in the genusDrosophila J Exp Biol 2042331ndash2338

Goodstadt L 2010 Ruffus a lightweight Python library for computa-tional pipelines Bioinformatics 262778ndash2779

Hadley NF 1994 Water relations of terrestrial arthropods San Diego(CA) Academic Press

Hoffmann AA Hallas R Sinclair C Mitrovski P 2001 Levels of variationin stress resistance in Drosophila among strains local populationsand geographic regions patterns for desiccation starvation coldresistance and associated traits Evolution 551621ndash1630

Hoffmann AA Parsons PA 1989a An integrated approach to environ-mental stress tolerance and life-history variation desiccation toler-ance in Drosophila Biol J Linn Soc 37117ndash136

Hoffmann AA Parsons PA 1989b Selection for increased desiccationresistance in Drosophila melanogaster additive genetic control andcorrelated responses for other stresses Genetics 122837ndash845

Hoffmann AA Sgro CM 2011 Climate change and evolutionary adap-tation Nature 470479ndash485

Hoffmann AA Weeks AR 2007 Climatic selection on genes and traitsafter a 100 year-old invasion a critical look at the temperate-tropicalclines in Drosophila melanogaster from eastern Australia Genetica129133ndash147

Huang W Richards S Carbone MA Zhu D Anholt RR Ayroles JFDuncan L Jordan KW Lawrence F Magwire MM et al 2012Epistasis dominates the genetic architecture of Drosophila quanti-tative traits Proc Natl Acad Sci U S A 10915553ndash15559

Huang Z Tunnacliffe A 2004 Response of human cells to desiccationcomparison with hyperosmotic stress response J Physiol558181ndash191

Huang Z Tunnacliffe A 2005 Gene induction by desiccation stress inhuman cell cultures FEBS Lett 5794973ndash4977

Ichihashi Y Aguilar-Martınez JA Farhi M Chitwood DH Kumar RMillon LV Peng J Maloof JN Sinha NR 2014 Evolutionary develop-mental transcriptomics reveals a gene network module regulatinginterspecific diversity in plant leaf shape Proc Natl Acad Sci U S A111E2616ndashE2621

Itoh M Nanba N Hasegawa M Inomata N Kondo R Oshima MTakano-Shimizu T 2010 Seasonal changes in the long-distance link-age disequilibrium in Drosophila melanogaster J Hered 10126ndash32

Johnson WA Carder JW 2012 Drosophila nociceptors mediate larvalaversion to dry surface environments utilizing both the painless TRPchannel and the DEGENaC subunit PPK1 PLoS One 7e32878

Kahsai L Kapan N Dircksen H Winther AM Nassel DR 2010 Metabolicstress responses in Drosophila are modulated by brain neurosecre-tory cells that produce multiple neuropeptides PLoS One 5e11480

Kapun M van Schalkwyk H McAllister B Flatt T Schleurootterer C 2014Inference of chromosomal inversion dynamics from Pool-Seq datain natural and laboratory populations of Drosophila melanogasterMol Ecol 231813ndash1827

Kawano T Shimoda M Matsumoto H Ryuda M Tsuzuki S Hayakawa Y2010 Identification of a gene Desiccate contributing to desicca-tion resistance in Drosophila melanogaster J Biol Chem28538889ndash38897

Kellermann V van Heerwaarden B Sgro CM Hoffmann AA 2009Fundamental evolutionary limits in ecological traits driveDrosophila species distributions Science 3251244ndash1246

Kofler R Nolte V Schleurootterer C 2016 The impact of library preparationprotocols on the consistency of allele frequency estimates in Pool-Seq data Mol Ecol Resour 16118ndash122

Kofler R Pandey RV Schleurootterer C 2011 PoPoolation2 identifying dif-ferentiation between populations using sequencing of pooled DNAsamples (Pool-Seq) Bioinformatics 273435ndash3436

Kofler R Schleurootterer C 2012 Gowinda unbiased analysis of gene setenrichment for genome-wide association studies Bioinformatics282084ndash2085

Leiserson MDM Eldridge JV Ramachandran S Raphael BJ 2013Network analysis of GWAS data Curr Opin Genet Dev 23602ndash610

Li H Durbin R 2009 Fast and accurate short read alignment withBurrows-Wheeler transform Bioinformatics 251754ndash1760

Li H Handsaker B Wysoker A Fennell T Ruan J Homer N Marth GAbecasis G Durbin R 2009 The Sequence AlignmentMap formatand SAMtools Bioinformatics 252078ndash2079

14

Liu Y Luo J Carlsson MA Neuroassel DR 2015 Serotonin and insulin-likepeptides modulate leucokinin-producing neurons that affectfeeding and water homeostasis in Drosophila J Comp Neurol5231840ndash1863

Lohse M Bolger AM Nagel A Fernie AR Lunn JE Stitt M Usadel B 2012RobiNA a user-friendly integrated software solution for RNA-Seq-based transcriptomics Nucleic Acids Res 40W622ndashW627

Long A Liti G Luptak A Tenaillon O 2015 Elucidating the moleculararchitecture of adaptation via evolve and resequence experimentsNat Rev Genet 16567ndash582

Lynch M Bost D Wilson S Maruki T Harrison S 2014 Population-genetic inference from pooled-sequencing data Genome Biol Evol61210ndash1218

Mackay TFC Stone EA Ayroles JF 2009 The genetics of quantitativetraits challenges and prospects Nat Rev Genet 10565ndash577

Marron MT Markow TA Kain KJ Gibbs AG 2003 Effects of starvationand desiccation on energy metabolism in desert and mesicDrosophila J Insect Physiol 49261ndash270

Matzkin LM Markow TA 2009 Transcriptional regulation of metabo-lism associated with the increased desiccation resistance of thecactophilic Drosophila mojavensis Genetics 1821279ndash1288

Murali T Pacifico S Yu J Guest S Roberts GG Finley RL Jr 2011 DroID2011 a comprehensive integrated resource for protein transcrip-tion factor RNA and gene interactions for Drosophila Nucleic AcidsRes 39D736ndashD743

Nuzhdin SV Turner TL 2013 Promises and limitations of hitchhikingmapping Curr Opin Genet Dev 23694ndash699

Orozco-terWengel P Kapun M Nolte V Kofler R Flatt T Schleurootterer C2012 Adaptation of Drosophila to a novel laboratory environmentreveals temporally heterogeneous trajectories of selected alleles MolEcol 214931ndash4941

Picard Picard A set of Java command line tools for manipulating high-throughput seqeuncing (HTS) data and formats Available fromhttpbroadinstitutegithubiopicard

Qiu Y Tittiger C Wicker-Thomas C Le Goff G Young S Wajnberg EFricaux T Taquet N Blomquist GJ Feyereisen R 2012 An insect-specific P450 oxidative decarbonylase for cuticular hydrocarbon bio-synthesis Proc Natl Acad Sci U S A 10914858ndash14863

Rajpurohit S Oliveira CC Etges WJ Gibbs AG 2013 Functional genomicand phenotypic responses to desiccation in natural populations of adesert Drosophilid Mol Ecol 222698ndash2715

Sarup P Soslashrensen JG Kristensen TN Hoffmann AA Loeschcke V PaigeKN Soslashrensen P 2011 Candidate genes detected in transcriptomestudies are strongly dependent on genetic background PLoS One6e15644

Schiffer M Hangartner S Hoffmann AA 2013 Assessing the relativeimportance of environmental effects carry-over effects and speciesdifferences in thermal stress resistance a comparison ofDrosophilids across field and laboratory generations J Exp Biol2163790ndash3798

Schleurootterer C Kofler R Versace E Tobler R Franssen SU 2015Combining experimental evolution with next-generation sequen-cing A powerful tool to study adaptation from standing geneticvariation Heredity 114431ndash440

Sinclair BJ Gibbs AG Roberts SP 2007 Gene transcription during ex-posure to and recovery from cold and desiccation stress inDrosophila melanogaster Insect Mol Biol 16435ndash443

Sloggett C Wakefield M Philip G Pope B 2014 Rubramdashflexible distrib-uted pipelines figshare Available from httpdxdoiorg106084m9figshare895626

Smit AFA Hubley R Green P 2013-2015 RepeatMasker Open-403Available from httpwwwrepeatmaskerorg

Seurooderberg JA Birse RT Neuroassel DR 2011 Insulin production and signalingin renal tubules of Drosophila is under control of tachykinin-relatedpeptide and regulates stress resistance PLoS One 6e19866

Soslashrensen JG Nielsen MM Loeschcke V 2007 Gene expression profileanalysis of Drosophila melanogaster selected for resistance to envi-ronmental stressors J Evol Biol 201624ndash1636

St Pierre SE Ponting L Stefancsik R McQuilton P 2014 FlyBase102mdashadvanced approaches to interrogating FlyBase Nucleic AcidsRes 42D780ndashD788

Stevenson M Nunes T Heuser C Marshall J Sanchez J Thornton RReiczigel J Robison-Cox J Sebastiani P Solymos P et al 2015 epiRTools for the analysis of epidemiological data R package version09-62 Available from httpCRANR-projectorgpackagefrac14epiR

Storey JD Tibshirani R 2003 Statistical significance for genomewidestudies Proc Natl Acad Sci U S A 1009440ndash9445

Telonis-Scott M Gane M DeGaris S Sgro CM Hoffmann AA 2012 Highresolution mapping of candidate alleles for desiccation resistancein Drosophila melanogaster under selection Mol Biol Evol291335ndash1351

Telonis-Scott M Guthridge KM Hoffmann AA 2006 A new set oflaboratory-selected Drosophila melanogaster lines for the analysisof desiccation resistance response to selection physiology and cor-related responses J Exp Biol 2091837ndash1847

Terhzaz S Cabrero P Robben JH Radford JC Hudson BD Milligan GDow JA Davies SA 2012 Mechanism and function of Drosophilacapa GPCR a desiccation stress-responsive receptor with functionalhomology to human neuromedinU receptor PLoS One 7e29897

Terhzaz S Overend G Sebastian S Dow JA Davies SA 2014 The Dmelanogaster capa-1 neuropeptide activates renal NF-kB signalingPeptides 53218ndash224

Terhzaz S Teets NM Cabrero P Henderson L Ritchie MG Nachman RJDow JA Denlinger DL Davies SA 2015 Insect capa neuropeptidesimpact desiccation and cold tolerance Proc Natl Acad Sci U S A1122882ndash2887

Thorat LJ Gaikwad SM Nath BB 2012 Trehalose as an indicator ofdesiccation stress in Drosophila melanogaster larvae a potentialmarker of anhydrobiosis Biochem Biophys Res Commun419638ndash642

Tobler R Franssen SU Kofler R Orozco-terWengel P Nolte VHermisson J Schl eurootterer C 2014 Massive habitat-specific ge-nomic response in D melanogaster populations during exper-imental evolution in hot and cold environments Mol Biol Evol31364ndash375

Turner TL Bourne EC Von Wettberg EJ Hu TT Nuzhdin SV 2010Population resequencing reveals local adaptation of Arabidopsislyrata to serpentine soils Nat Genet 42260ndash263

Turner TL Miller PM 2012 Investigating natural variation in Drosophilacourtship song by the evolve and resequence approach Genetics191633ndash642

Vermeulen CJ Soslashrensen P Kirilova Gagalova K Loeschcke V 2013Transcriptomic analysis of inbreeding depression in cold-sensitiveDrosophila melanogaster shows upregulation of the immune re-sponse J Evol Biol 261890ndash1902

Wang B Goode J Best J Meltzer J Schilman PE Chen J Garza D ThomasJB Montminy M 2008 The insulin-regulated CREB coactivatorTORC promotes stress resistance in Drosophila Cell Metab7434ndash444

Wang J Kean L Yang J Allan AK Davies SA Herzyk P Dow JA 2004Function-informed transcriptome analysis of Drosophila renaltubule Genome Biol 5R69

Wang K Li M Hakonarson H 2010 Analysing biological pathways ingenome-wide association studies Nat Rev Genet 11843ndash854

Wang L Jia P Wolfinger RD Chen X Zhao Z 2011 Gene set analysis ofgenome-wide association studies methodological issues and per-spectives Genomics 981ndash8

Weber AL Khan GF Magwire MM Tabor CL Mackay TF Anholt RR2012 Genome-wide association analysis of oxidative stress resistancein Drosophila melanogaster PLoS One 7e34745

Weeks AR McKechnie SW Hoffmann AA 2002 Dissecting adaptiveclinal variation markers inversions and sizestress associations inDrosophila melanogaster from a central field population Ecol Lett5756ndash763

Zhu Y Bergland AO Gonzalez J Petrov DA 2012 Empirical validation ofpooled whole genome population re-sequencing in Drosophila mel-anogaster PLoS One 7e41901

15

Page 9: Cross-Study Comparison Reveals Common Genomic, Network, … · 2016-02-06 · resistance using a high-throughput sequencing Pool-GWAS (genome-wide association study) approach ( Bastide

and Weeks 2007 Tobler et al 2014) but we found little as-sociation between the desiccation pool and inversion fre-quency changes consistent with the lack of latitudinalpatterns in desiccation resistanceand traitinversion fre-quency associations in east Australian D melanogaster(Hoffmann et al 2001 Weeks et al 2002)

Cross-Study Comparison Reveals a Core Set ofCandidates for a Complex Trait

Our data contain many candidate genes functions and path-ways for insect desiccation resistance but partitioning genu-ine adaptive loci from experimental noise remains asignificant challenge We therefore focus our efforts on ournovel comparative analysis where we identified 45 robustcandidates for further exploration Common genetic signa-tures are rarely documented in cross-study comparisons ofcomplex traits (Sarup et al 2011 Huang et al 2012 but cfVermeulen et al 2013) and never before for desiccation re-sistance Some degree of overlap between our studies likelyresults from a shared genetic background from similar sam-pling sites highlighting the replicability of these responsesOne limitation of this approach is the overpopulation ofthe ldquocorerdquo list by longer genes This result reflects the genelength bias inherent to GWAS approaches an issue that is notstraightforward to correct in a GWAS let alone in the com-parative framework applied here A focus on the commongenes functionally linked to the desiccation phenotype canprovide a meaningful framework for future research and alsosuggests that some large-effect candidate genes exhibit natu-ral variation contributing to the phenotype (table 1)

Other feasible candidates include genes expressed inthe MTs (Wang et al 2004) or essential for MT develop-ment (St Pierre et al 2014) providing further researchopportunity focused on the primary stress sensing andfluid secretion tissues Genes include one of the mosthighly expressed MT transcripts CG7084 as well as dncNckx30C slo cv-c and nkd Other aspects of stress resis-tance are also implicated CrebB for example is involvedin regulating insulin-regulated stress resistance and ther-mosensory behavior (Wang et al 2008) nkd also impactslarval cuticle development and shares a PDZ signalingdomain also present in the desiccation candidate desi(Kawano et al 2010 St Pierre et al 2014) Finally severalcore candidates were consistently reported in recent stud-ies suggesting generalized roles in stress responses andfitness including transcriptome studies on MT function(Wang et al 2004) inbreeding and cold resistance(Vermeulen et al 2013) oxidative stress (Weber et al2012) and genomic studies exploring age-specific SNPson fitness traits (Durham et al 2014) and genes likelyunder spatial selection in flies from climatically divergenthabitats (Fabian et al 2012)

Several of the enriched GO categories from the core listalso make biological sense including stimulus response(almost half the suite) defense and signaling particularlyin pathways involved in MT stress sensing (Davies et al 20122013 2014) Functional categories enriched for sensory

organ development were highlighted consistent with envi-ronmental sensing categories from desert Drosophila tran-scriptomics (Matzkin and Markow 2009 Rajpurohit et al2013) and stress detection via phototransduction pathwaysin D melanogaster under a range of stressors including des-iccation (Soslashrensen et al 2007)

Evidence for Conserved Signatures at Higher LevelOrganization

Beyond the gene level we observed a striking degree ofsimilarity between this study and that of Telonis-Scottet al (2012) at the network level where significantly over-lapping protein networks were constructed from largely dif-ferent seed protein sets This supports a scenario where theldquobiological information flow from DNA to phenotyperdquo(Civelek and Lusis 2014) contains inherent redundancywhere alternative genetic solutions underlie phenotypesand functions Resistance to perturbation by genetic or en-vironmental variation appears to increase with increasinghierarchical complexity that is phenotype ldquobufferingrdquo (Fuet al 2009) The same functional network in different pop-ulationsgenetic backgrounds can be impacted but theexact genes and SNPs responding to perturbation will notnecessarily overlap In Drosophila different sets of loci fromthe same founders mapped to the same PPI network in thecase of two complex traits chill coma recovery and startleresponse (Huang et al 2012) Furthermore interspecificshared networkspathways from different genes have beendocumented for numerous complex traits (Emilsson et al2008 Aytes et al 2014 Ichihashi et al 2014) providing op-portunities for investigating taxa where individual genefunction is poorly understood

Although the null distribution is not always obviouswhen comparing networks and their functions partly be-cause of reduced sensitivity from missing interactions(Leiserson et al 2013) here the shared network signaturesportray a plausible multifaceted stress response involvingstress sensing rapid gene accessibility and expression Thepredominance of immune system and stress responsepathways are of interest with reference immunitystresspathway ldquocross-talkrdquo particularly in the MTs whereabiotic stressors elicit strong immunity transcriptionalresponses (Davies et al 2012) The higher order architec-tures reflect crucial aspects of rapid gene expression con-trol in response to stress from chromatin organization(Alexander and Lomvardas 2014) to transcription RNAdecay and translation (reviwed in de Nadal et al 2011) InDrosophila variation in transcriptional regulation of indi-vidual genes can underpin divergent desiccation pheno-types (Chung et al 2014) or be functionally inferred frompatterns of many genes elicited during desiccation stress(Rajpurohit et al 2013) Variants with potential to mod-ulate expression at different stages of gene expressioncontrol were evident in our GWAS data alone for exam-ple enrichment for 50UTR and splice variants at the SNPlevel and for chromatin and chromosome structure termsat the functional level

9

Genetic Overlap Versus Genetic differencesmdashCausesand Implications

Despite the common signatures at the gene and higher orderlevels between our studies pervasive differences were ob-served These can arise from differences in standing geneticvariation in the natural populations where different alleliccompositions impact gene-by-environment interactions andepistasis Our network overlap analyses highlight the potentialfor different molecular trajectories to result in similar func-tions (Leiserson et al 2013) consistent with the multiplephysiological pathways potentially available to D melanoga-ster under water stress (Chown et al 2011) Experimentaldesign also presumably contributed to the study differencesStrong directional selection as applied in the 2012 study canincrease the frequency of rare beneficial alleles and likely re-duced overall diversity than did the single-generation GWASdesign Furthermore selection regimes often impose addi-tional selective pressures (eg food deprivation during desic-cation) resulting in correlated responses such as starvationresistance with desiccation resistance and these factors differfrom the laboratory to the field Finally evidence suggests innatural D melanogaster that standing genetic variation canvary extensively with sampling season (Itoh et al 2010Bergland et al 2014) Although we cannot compare levelsof standing variation between our two studies evidence sug-gests that this may have affected the Dys gene where ourGWAS approach revealed no differentiation between thecontrol and resistant pools at this locus but rather detectedthe alleles associated with the selective sweep in Telonis-Scottet al (2012) in both pools at high frequency Whether thisresulted from sampling the later collection after a longdrought (2004ndash2008 wwwbomgovauclimate last accessedJanuary 13 2016) is unclear but this observation highlightsthe relevance of temporal sampling to standing genetic var-iation when attempting to link complex phenotypes togenotypes

Materials and Methods

Natural Population Sampling

Drosophila melanogaster was collected from vineyard waste atKilchorn Winery Romsey Australia in April 2010 Over 1200isofemale lines were established in the laboratory from wild-caught females Species identification was conducted on maleF1 to remove Drosophila simulans contamination and over900 D melanogaster isofemale lines were retained All flieswere maintained in vials of dextrose cornmeal media at25 C and constant light

Desiccation Assay

From each isofemale line ten inseminated F1 females weresorted by aspiration without CO2 and held in vials until 4ndash5days old For the desiccation assay 10 females per line (over9000 flies in total) were screened by placing groups of femalesinto empty vials covered with gauze and randomly assigninglines into 5 large sealed glass tanks containing trays of silicadesiccant (relative humidity [RH] lt10) The tanks werescored for mortality hourly and the final 558 surviving

individuals were selected for the desiccation-resistant pool(558 flies lt6 of the total) The control pool constitutedthe same number of females sampled from over 200 ran-domly chosen isofemale lines that were frozen prior to thedesiccation assay We chose this random control approachinstead of comparing the phenotypic extremes (ie the latemortality tail vs the ldquoearly mortalityrdquo tail) because unfavor-able environmental conditions experienced by the wild-caught dam may inflate environmental variance due to car-ryover effects in the F1 offspring (Schiffer et al 2013)

DNA Extraction and Sequencing

We used pooled whole-genome sequencing (Pool-seq)(Futschik and Schleurootterer 2010) to estimate and compareallele frequency differences between the most desiccation-resistant and randomly sampled flies from a wild populationto associate naturally segregating candidate SNPs with theresponse to low humidity For each treatment (resistantand control) genomic DNA was extracted from 50 femaleheads using the Qiagen DNeasy Blood and Tissue Kit accord-ing to the manufacturerrsquos instructions (Qiagen HildenGermany) Two extractions were combined to create fivepools per treatment which were barcoded separately tocreate technical replicates (total 10 pools of 100 fliesn frac14 1000) The DNA was fragmented using a CovarisS2 ma-chine (Covaris Inc Woburn MA) and libraries were pre-pared from 1 mg genomic DNA (per pool of 100 flies) usingthe Illumina TruSeq Prep module (Illumina San Diego CA)To minimize the effect of variation across lanes the ten poolswere combined in equal concentration (fig 3) and run mul-tiplexed in each lane of a full flow cell of an Illumina HiSeq2000 sequencer Clusters were generated using the TruSeq PECluster Kit v5 on a cBot and sequenced using Illumina TruSeqSBS v3 chemistry (Illumina) Fragmentation library construc-tion and sequencing were performed at the Micromon Next-Gen Sequencing facility (Monash University ClaytonAustralia)

Fig 3 Experimental design See text for further explanation

10

Data Processing and Mapping

Processing was performed on a high-performance computingcluster using the Rubra pipeline system that makes use of theRuffus Python library (Goodstadt 2010) The final variant-calling pipeline was based on a pipeline developed by ClareSloggett (Sloggett et al 2014) and is publicly available athttpsgithubcomgriffinpGWAS_pipeline (last accessedJanuary 13 2016) Raw sequence reads were processed persample per lane with P1 and P2 read files processed sepa-rately initially Trimming was performed with trimmomatic v030 (Lohse et al 2012) Adapter sequences were also re-moved Leading and trailing bases with a quality scorebelow 30 were trimmed and a sliding window (width frac14 10quality threshold frac14 25) was used Reads shorter than 40 bpafter trimming were discarded Quality before and after trim-ming was examined using FastQC v 0101 (Andrews 2014)

Posttrimming reads that remained paired were used asinput into the alignment step Alignment to the D melano-gaster reference genome version r553 was performed withbwa v 062 (Li and Durbin 2009) using the bwa aln commandand the following options No seeding (-l 150) 1 rate ofmissing alignments given 2 uniform base error rate (-n001) a maximum of 2 gap opens per read (-o 2) a maximumof 12 gap extensions per read (-e 12) and long deletionswithin 12 bp of 30 end disallowed (-d 12) The default valueswere used for all other options Output files were then con-verted to sorted bam files using the bwa sampe commandand the ldquoSortSamrdquo option in Picard v 196

At this point the bam files were merged into one file persample using PicardMerge Duplicates were identified withPicard MarkDuplicates The tool RealignerTargetCreator inGATK v 26-5 (DePristo et al 2011) was applied to the tenbam files simultaneously producing a list of intervals contain-ing probable indels around which local realignment would beperformed This was used as input for the IndelRealigner toolwhich was also applied to all ten bam files simultaneously

To investigate the variation among technical replicatesallele frequency was estimated based on read counts foreach technical replicate separately for 100000 randomly cho-sen SNPs across the genome After excluding candidate des-iccation-resistance SNPs and SNPs that may have been falsepositives due to sequencing error (those with mean frequencylt004 across replicates) 1803 SNPs remained For each locuswe calculated the variance in allele frequency estimate amongthe five control replicates and the variance among the fivedesiccation-resistance replicates We compared this with thevariance due to pool category (control vs desiccation resis-tant) using analysis of variance (ANOVA) We also calculatedthe mean pairwise difference in allele frequency estimateamong control and among desiccation-resistance replicatesand the mean concordance correlation coefficient was calcu-lated over all pairwise comparisons within pool category(using the epiccc function in the epiR package Stevensonet al 2015)

The following approach was then taken to identify SNPsdiffering between the desiccation-resistant and controlgroups Based on the technical replicate analysis we merged

the five ldquodesiccation-resistantrdquo and the five random controlsamples into two bam files A pileup file was created withsamtools v 0119 (Li et al 2009) and converted to a ldquosyncrdquo fileusing the mpileup2sync script in Popoolation2 v 1201 (Kofleret al 2011) retaining reads with a mapping quality of at least20 and bases with a minimum quality score of 20 Reads withunmapped mates were also retained (option ndashA in samtoolsmpileup) For this step we used a repeat-masked version ofthe reference genome created with RepeatMasker 403 (Smitet al 2013-2015) and a repeat file containing all annotatedtransposons from D melanogaster including shared ancestralsequences (Flybase release r557) plus all repetitive elementsfrom RepBase release v1904 for D melanogaster Simple re-peats small RNA genes and bacterial insertions were notmasked (options ndashnolow ndashnorna ndashno_is) The rmblastsearch engine was used

Association Analysis

We then performed Fisherrsquos exact test with the Fisher testscript in Popoolation2 (Kofler et al 2011) We set the min-imum count (of the minor allele) frac14 10 minimum coverage(in both populations) frac14 20 and maximum coverage frac1410000 The test was performed for each SNP (slidingwindow off) Although there are numerous approaches tocalculating genome-wide significance thresholds in aPoolseq framework currently there is no consensus on anappropriate standardized statistical method Methods in-clude simulation approaches (Orozco-terWengel et al2012 Bastide et al 2013 Burke et al 2014) correctionusing a null distribution based on estimating genetic driftfrom replicated artificial selection line allele frequency vari-ance (Turner and Miller 2012) use of a simple thresholdbased on a minimum allele frequency difference (Turneret al 2010) and the FDR correction suggested by Storeyand Tibshirani (2003) (used by Fabian et al 2012)

We determined based on our design to calculate the FDRthreshold by comparison with a simulated distribution of nullP values following the approach of Bastide et al (2013) Foreach SNP locus a P value was calculated by simulating thedistribution of read counts between the major and minorallele according to a beta-binomial distribution with meana(a+b) and variance (ab)((a+b)2(a+b+1)) Thismodel allowed for two types of sampling variationStochastic variation in allele sampling across the phenotypesand the background variation in the representation of allelesin each pool caused by library preparation artifacts or sto-chastic variation driven by unequal coverage of the poolsThese sources of variation have also been recognized else-where as being important in interpreting pooled sequenceallele calls and identifying significant differentiation betweenpooled samples (Lynch et al 2014) The value of afrac14 37 waschosen by chi-square test as the best match to the observedP value distribution (afrac14 10 to afrac14 40 were tested) with acorresponding bfrac14 4315 to reflect the coverage depth differ-ence between the random control (487) and desiccation-resistant (568) pools These values were then used tosimulate a null data set 10 the size of the observed data

11

set Because the null P value distribution still did not fit theobserved P value distribution particularly well (supplemen-tary fig S2 Supplementary Material online) these values werenot used for a standard FDR correction Instead a P valuethreshold was calculated based on the lowest 005 of thenull P values similar in approach to Orozco-terWengel et al(2012)

Inversion Analysis

In D melanogaster several cosmopolitan inversion polymor-phisms show cross-continent latitudinal patterns that explaina substantial proportion of clinal variation for many traitsincluding thermal tolerance (reviewed in Hoffmann andWeeks 2007) Patterns of disequilibria generated betweenloci in the vicinity of the rearrangement can obscure signalsof selection (Hoffmann and Weeks 2007) and inversion fre-quencies are routinely considered in genomic studies ofnatural populations sampled from known latitudinal clines(Fabian et al 2012 Kapun et al 2014) We assessedassociations between inversion frequencies and our controland resistant pools at SNPs diagnostic for the following in-versions In3R(Payne) (Anderson et al 2005 Kapun et al2014) In(2L)t (Andolfatto et al 1999 Kapun et al 2014)In(2R)Ns In(3L)P In(3R)C In(3R)K and In(3R)Mo (Kapunet al 2014) Between one and five diagnostic SNPs were in-vestigated for each inversion (depending on availability) Weconsidered an inversion to be associated with desiccationresistance if the frequency of its diagnostic SNP differed sig-nificantly between the control and desiccation-resistant poolsby a Fisherrsquos exact test

Genome Feature Annotation

The list of candidate SNPs was converted to vcf format using acustom R script setting the alternate allele as the allele thatincreased in frequency from the control to the desiccation-resistant pool and setting the reference allele as the otherallele present in cases where both the major and minor allelesdiffered from the reference genome The genomic features inwhich the candidate SNPs occurred were annotated usingsnpEff v40 first creating a database from the D melanogasterv553 genome annotation with the default approach(Cingolani et al 2012) The default annotation settings wereused except for setting the parameter ldquo-ud 0rdquo to annotateSNPs to genes only if they were within the gene body Using2 tests on SNP counts we tested for over- or under-enrichment of feature types by calculating the proportionof lsquoallrsquo features from all SNPs detected in the study and theproportion of ldquocandidaterdquo SNPs identified as differentiated indesiccation-resistant flies (Fabian et al 2012)

GO Analysis

GO analyses was used to associate the candidates with theirbiological functions (Ashburner et al 2000) Previously devel-oped for biological pathway analysis of global gene expressionstudies (Wang et al 2010) typical GO analysis assumes that allgenes are sampled independently with equal probabilityGWAS violate these criteria where SNPs are more likely to

occur in longer genes than in shorter genes resulting in highertype I error rates in GO categories overpopulated with longgenes Gene length as well as overlapping gene biases (Wanget al 2011) were accounted for using Gowinda (Kofler andSchleurootterer 2012) which implements a permutation strategyto reduce false positives Gowinda does not reconstruct LDbetween SNPs but operates in two modes underpinned bytwo extreme assumptions 1) Gene mode (assumes all SNPs ina gene are in LD) and 2) SNP mode (assumes all SNPs in agene are completely independent)

The candidates were examined in both modes but giventhe stringency of gene mode high-resolution SNP modeoutput was utilized for the final analysis with the followingparameters 106 simulations minimum significance frac141 andminimum genes per GO categoryfrac14 3 (to exclude gene-poorcategories) P values were corrected using the Gowinda FDRalgorithm GO annotations from Flybase r557 were applied

Candidate Gene Analysis

In conjunction with the GO analysis we also examinedknown candidate genes associated with survival under lowRH to better characterize possible biological functions of ourGWAS candidates We took a conservative approach to can-didate gene assignment where the SNPs were mapped withinthe entire gene region without up- and downstream exten-sions Genes were curated from FlyBase and the literature(table 1) predominantly using detailed functional experimen-tal evidence as criteria for functional desiccation candidates

Network Analysis

We next obtained a higher level summary of the candidategene list using PPI network analysis The full candidate genelist was used to construct a first-order interaction networkusing all Drosophila PPIs listed in the Drosophila InteractionDatabase v 2014_10 (avoiding interologs) (Murali et al 2011)This ldquofullrdquo PPI network includes manually curated data frompublished literature and experimental data for 9633 proteinsand 93799 interactions For the network construction theigraph R package was used to build a subgraph from the fullPPI network containing the candidate proteins and their first-order interacting proteins (Csardi and Nepusz 2006) Self-connections were removed Functional enrichment analysiswas conducted on the list of network genes using FlyMinev400 (wwwflymineorg last accessed January 13 2016) withthe following parameters for GO enrichment and KEGGReactome pathway enrichment BenjaminindashHochberg FDRcorrection P lt 005 background frac14 all D melanogastergenes in our full PPI network and Ontology (for GO enrich-ment only) frac14 biological process or molecular function

Cross-Study Comparison Overlap with Our PreviousWork

Following on from the comparison between our gene list andthe candidate genes suggested from the literature we quan-titatively evaluated the degree of overlap between the candi-date genes identified in this study with alleles mapped in ourprevious study of flies collected from the same geographic

12

region (Telonis-Scott et al 2012) The ldquooverlaprdquo analysis wasperformed similarly for the GWAS candidate gene and net-work-level approaches First the gene list overlap was testedbetween the two studies using Fisherrsquos exact test where the382 genes identified from the candidate SNPs (this study)were compared with 416 genes identified from candidatesingle feature polymorphisms (Telonis-Scott et al 2012)The contingency table was constructed using the numberof annotated genes in the D melanogaster R553 genomebuilt as an approximation for the total number of genestested in each study and took the following form

where A frac14 gene list from Telonis-Scott et al (2012)n frac14 416 B frac14 gene list from this study n frac14 382 N frac14 totalnumber of genes in D melanogaster reference genome r553n frac14 17106

To annotate genome features in the overlap candidategene list we used the SNP data (this study) which betterresolves alleles than array-based genotyping Due to this lim-itation we did not compare exact genomic coordinates be-tween the studies but considered overlap to occur whendifferentiated alleles were detected in the same gene inboth studies We did however examine the allele frequenciesaround the Dys-RF promoter which was sequenced sepa-rately revealing a selective sweep in the experimental evolu-tion lines (Telonis-Scott et al 2012)

The overlapping candidates were tested for functional en-richment using a basic gene list approach As genes ratherthan SNPs were analyzed here Fly Mine v400 (wwwflymineorg) was applied with the following parameters Gene lengthnormalization BenjaminindashHochberg FDR correction Plt 005Ontology frac14 biological process and default background (allGO-annotated D melanogaster genes)

Finally we constructed a second PPI network from thecandidate genes mapped in Telonis-Scott et al (2012) to ex-amine system-level overlap between the two studies The 416genes mapped to 337 protein seeds and the network wasconstructed as described above The degree of network over-lap between the two studies was tested using simulation Ineach iteration a gene set of the same length as the list fromthis study and a gene set of the same length as the list fromTelonis-Scott et al (2012) were resampled randomly from theentire D melanogaster mapped gene annotation (r553) Eachresampled set was then used to build a first-order network aspreviously described The overlap network between the tworesampled sets was calculated using the graphintersectioncommand from the igraph package and zero-degree nodes(proteins with no interactions) were removed Overlap wasquantified by counting the number of nodes and the numberof edges in this overlap network The observed overlap wascompared with a simulated null distribution for both nodeand edge number where n frac14 1000 simulation iterations

Supplementary MaterialSupplementary figures S1ndashS4 and tables S1ndashS5 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

We thank Robert Good for the fly sampling and JenniferShirriffs and Lea and Alan Rako for outstanding assistancewith laboratory culture and substantial desiccation assaysWe are grateful to Melissa Davis for advice on network build-ing and comparison approaches and to five anonymous re-viewers whose comments improved the manuscript Thiswork was supported by a DECRA fellowship DE120102575and Monash University Technology voucher scheme toMTS an ARC Future Fellowship and Discovery Grants toCMS an ARC Laureate Fellowship to AAH and a Scienceand Industry Endowment Fund grant to CMS and AAH

ReferencesAlexander JM Lomvardas S 2014 Nuclear architecture as an epigenetic

regulator of neural development and function Neuroscience26439ndash50

Anderson AR Hoffmann AA McKechnie SW Umina PA Weeks AR2005 The latitudinal cline in the In(3R)Payne inversion polymor-phism has shifted in the last 20 years in Australian Drosophilamelanogaster populations Mol Ecol 14851ndash858

Andolfatto P Wall JD Kreitman M 1999 Unusual haplotype structureat the proximal breakpoint of In(2L)t in a natural population ofDrosophila melanogaster Genetics 1531297ndash1311

Andrews S 2014 FastQC Available from httpwwwbioinformaticsbbsrcacukprojectsfastqc

Ashburner M Ball CA Blake JA Botstein D Butler H Cherry JM DavisAP Dolinski K Dwight SS Eppig JT et al 2000 Gene ontology toolfor the unification of biology The Gene Ontology Consortium NatGenet 2525ndash29

Aytes A Mitrofanova A Lefebvre C Alvarez Mariano J Castillo-MartinM Zheng T Eastham James A Gopalan A Pienta Kenneth J ShenMM et al 2014 Cross-species regulatory network analysis identifiesa synergistic interaction between FOXM1 and CENPF that drivesprostate cancer malignancy Cancer Cell 25638ndash651

Bastide H Betancourt A Nolte V Tobler R Steuroobe P Futschik ASchleurootterer C 2013 A genome-wide fine-scale map of natural pig-mentation variation in Drosophila melanogaster PLoS Genet9e1003534

Bergland AO Behrman EL OrsquoBrien KR Schmidt PS Petrov DA 2014Genomic evidence of rapid and stable adaptive oscillations overseasonal time scales in Drosophila PLoS Genet 10e1004775

Bonebrake TC Mastrandrea MD 2010 Tolerance adaptation and pre-cipitation changes complicate latitudinal patterns of climate changeimpacts Proc Natl Acad Sci U S A 10712581ndash12586

Bradley TJ 2009 Animal osmoregulation Oxford Oxford UniversityPress

Burke MK Liti G Long AD 2014 Standing genetic variation drivesrepeatable experimental evolution in outcrossing populations ofSaccharomyces cerevisiae Mol Biol Evol 313228ndash3239

Chown SL 2012 Trait-based approaches to conservation physiologyforecasting environmental change risks from the bottom upPhilos Trans R Soc B 3671615ndash1627

Chown SL Nicolson SW 2004 Insect physiological ecology mechanismsand patterns Oxford Oxford University Press

Chown SL Soslashrensen JG Terblanche JS 2011 Water loss in insectsan environmental change perspective J Insect Physiol571070ndash1084

Not in GWAS In GWAS

Not in Telonis-Scott et al (2012) N ndash intersect-AndashB A-intersect

In Telonis-Scott et al (2012) B-intersect Intersect(AB)

13

Chung H Loehlin DW Dufour HD Vaccarro K Millar JG Carroll SB2014 A single gene affects both ecological divergence and matechoice in Drosophila Science 3431148ndash1151

Cingolani P Platts A Wang LL Coon M Nguyen T Wang L Land SJ LuX Ruden DM 2012 A program for annotating and predicting theeffects of single nucleotide polymorphisms SnpEff SNPs in thegenome of Drosophila melanogaster strain w1118 iso-2 iso-3 Fly(Austin) 680ndash92

Civelek M Lusis AJ 2014 Systems genetics approaches to understandcomplex traits Nat Rev Genet 1534ndash48

Clusella-Trullas S Blackburn TM Chown SL 2011 Climatic predictors oftemperature performance curve parameters in ectotherms implycomplex responses to climate change Am Nat 177738ndash751

Csardi G Nepusz T 2006 The igraph software package for complexnetwork research InterJournal Complex Systems 1695 Availablefrom httpigraphorg

Davies SA Cabrero P Overend G Aitchison L Sebastian S Terhzaz SDow JA 2014 Cell signalling mechanisms for insect stress toleranceJ Exp Biol 217119ndash128

Davies SA Cabrero P Povsic M Johnston NR Terhzaz S Dow JA 2013Signaling by Drosophila capa neuropeptides Gen Comp Endocrinol18860ndash66

Davies SA Overend G Sebastian S Cundall M Cabrero P Dow JATerhzaz S 2012 Immune and stress response lsquocross-talkrsquo in theDrosophila Malpighian tubule J Insect Physiol 58488ndash497

Day JP Dow JA Houslay MD Davies SA 2005 Cyclic nucleotide phos-phodiesterases in Drosophila melanogaster Biochem J 388333ndash342

de Nadal E Ammerer G Posas F 2011 Controlling gene expression inresponse to stress Nat Rev Genet 12833ndash845

DePristo MA Banks E Poplin R Garimella KV Maguire JR Hartl CPhilippakis AA del Angel G Rivas MA Hanna M et al 2011 Aframework for variation discovery and genotyping using next-gen-eration DNA sequencing data Nat Genet 43491ndash498

Djawdan M Chippindale AK Rose MR Bradley TJ 1998 Metabolic re-serves and evolved stress resistance in Drosophila melanogasterPhysiol Zool 71584ndash594

Durham MF Magwire MM Stone EA Leips J 2014 Genome-wide anal-ysis in Drosophila reveals age-specific effects of SNPs on fitness traitsNat Commun 54338

Emilsson V Thorleifsson G Zhang B Leonardson AS Zink F Zhu JCarlson S Helgason A Walters GB Gunnarsdottir S et al 2008Genetics of gene expression and its effect on disease Nature452423ndash428

Fabian DK Kapun M Nolte V Kofler R Schmidt PS Schleurootterer C FlattT 2012 Genome-wide patterns of latitudinal differentiation amongpopulations of Drosophila melanogaster from North America MolEcol 214748ndash4769

Foley BR Telonis-Scott M 2011 Quantitative genetic analysis suggestscausal association between cuticular hydrocarbon composition anddesiccation survival in Drosophila melanogaster Heredity 10668ndash77

Folk DG Han C Bradley TJ 2001 Water acquisition and partitioning inDrosophila melanogaster effects of selection for desiccation-resis-tance J Exp Biol 2043323ndash3331

Fowler MA Montell C 2013 Drosophila TRP channels and animal be-havior Life Sci 92394ndash403

Fu J Keurentjes JJB Bouwmeester H America T Verstappen FWA WardJL Beale MH de Vos RCH Dijkstra M Scheltema RA et al 2009System-wide molecular evidence for phenotypic buffering inArabidopsis Nat Genet 41166ndash167

Futschik A Schleurootterer C 2010 The next generation of molecular mar-kers from massively parallel sequencing of pooled DNA samplesGenetics 186207ndash218

Gibbs AG Fukuzato F Matzkin LM 2003 Evolution of water conserva-tion mechanisms in Drosophila J Exp Biol 2061183ndash1192

Gibbs AG Matzkin LM 2001 Evolution of water balance in the genusDrosophila J Exp Biol 2042331ndash2338

Goodstadt L 2010 Ruffus a lightweight Python library for computa-tional pipelines Bioinformatics 262778ndash2779

Hadley NF 1994 Water relations of terrestrial arthropods San Diego(CA) Academic Press

Hoffmann AA Hallas R Sinclair C Mitrovski P 2001 Levels of variationin stress resistance in Drosophila among strains local populationsand geographic regions patterns for desiccation starvation coldresistance and associated traits Evolution 551621ndash1630

Hoffmann AA Parsons PA 1989a An integrated approach to environ-mental stress tolerance and life-history variation desiccation toler-ance in Drosophila Biol J Linn Soc 37117ndash136

Hoffmann AA Parsons PA 1989b Selection for increased desiccationresistance in Drosophila melanogaster additive genetic control andcorrelated responses for other stresses Genetics 122837ndash845

Hoffmann AA Sgro CM 2011 Climate change and evolutionary adap-tation Nature 470479ndash485

Hoffmann AA Weeks AR 2007 Climatic selection on genes and traitsafter a 100 year-old invasion a critical look at the temperate-tropicalclines in Drosophila melanogaster from eastern Australia Genetica129133ndash147

Huang W Richards S Carbone MA Zhu D Anholt RR Ayroles JFDuncan L Jordan KW Lawrence F Magwire MM et al 2012Epistasis dominates the genetic architecture of Drosophila quanti-tative traits Proc Natl Acad Sci U S A 10915553ndash15559

Huang Z Tunnacliffe A 2004 Response of human cells to desiccationcomparison with hyperosmotic stress response J Physiol558181ndash191

Huang Z Tunnacliffe A 2005 Gene induction by desiccation stress inhuman cell cultures FEBS Lett 5794973ndash4977

Ichihashi Y Aguilar-Martınez JA Farhi M Chitwood DH Kumar RMillon LV Peng J Maloof JN Sinha NR 2014 Evolutionary develop-mental transcriptomics reveals a gene network module regulatinginterspecific diversity in plant leaf shape Proc Natl Acad Sci U S A111E2616ndashE2621

Itoh M Nanba N Hasegawa M Inomata N Kondo R Oshima MTakano-Shimizu T 2010 Seasonal changes in the long-distance link-age disequilibrium in Drosophila melanogaster J Hered 10126ndash32

Johnson WA Carder JW 2012 Drosophila nociceptors mediate larvalaversion to dry surface environments utilizing both the painless TRPchannel and the DEGENaC subunit PPK1 PLoS One 7e32878

Kahsai L Kapan N Dircksen H Winther AM Nassel DR 2010 Metabolicstress responses in Drosophila are modulated by brain neurosecre-tory cells that produce multiple neuropeptides PLoS One 5e11480

Kapun M van Schalkwyk H McAllister B Flatt T Schleurootterer C 2014Inference of chromosomal inversion dynamics from Pool-Seq datain natural and laboratory populations of Drosophila melanogasterMol Ecol 231813ndash1827

Kawano T Shimoda M Matsumoto H Ryuda M Tsuzuki S Hayakawa Y2010 Identification of a gene Desiccate contributing to desicca-tion resistance in Drosophila melanogaster J Biol Chem28538889ndash38897

Kellermann V van Heerwaarden B Sgro CM Hoffmann AA 2009Fundamental evolutionary limits in ecological traits driveDrosophila species distributions Science 3251244ndash1246

Kofler R Nolte V Schleurootterer C 2016 The impact of library preparationprotocols on the consistency of allele frequency estimates in Pool-Seq data Mol Ecol Resour 16118ndash122

Kofler R Pandey RV Schleurootterer C 2011 PoPoolation2 identifying dif-ferentiation between populations using sequencing of pooled DNAsamples (Pool-Seq) Bioinformatics 273435ndash3436

Kofler R Schleurootterer C 2012 Gowinda unbiased analysis of gene setenrichment for genome-wide association studies Bioinformatics282084ndash2085

Leiserson MDM Eldridge JV Ramachandran S Raphael BJ 2013Network analysis of GWAS data Curr Opin Genet Dev 23602ndash610

Li H Durbin R 2009 Fast and accurate short read alignment withBurrows-Wheeler transform Bioinformatics 251754ndash1760

Li H Handsaker B Wysoker A Fennell T Ruan J Homer N Marth GAbecasis G Durbin R 2009 The Sequence AlignmentMap formatand SAMtools Bioinformatics 252078ndash2079

14

Liu Y Luo J Carlsson MA Neuroassel DR 2015 Serotonin and insulin-likepeptides modulate leucokinin-producing neurons that affectfeeding and water homeostasis in Drosophila J Comp Neurol5231840ndash1863

Lohse M Bolger AM Nagel A Fernie AR Lunn JE Stitt M Usadel B 2012RobiNA a user-friendly integrated software solution for RNA-Seq-based transcriptomics Nucleic Acids Res 40W622ndashW627

Long A Liti G Luptak A Tenaillon O 2015 Elucidating the moleculararchitecture of adaptation via evolve and resequence experimentsNat Rev Genet 16567ndash582

Lynch M Bost D Wilson S Maruki T Harrison S 2014 Population-genetic inference from pooled-sequencing data Genome Biol Evol61210ndash1218

Mackay TFC Stone EA Ayroles JF 2009 The genetics of quantitativetraits challenges and prospects Nat Rev Genet 10565ndash577

Marron MT Markow TA Kain KJ Gibbs AG 2003 Effects of starvationand desiccation on energy metabolism in desert and mesicDrosophila J Insect Physiol 49261ndash270

Matzkin LM Markow TA 2009 Transcriptional regulation of metabo-lism associated with the increased desiccation resistance of thecactophilic Drosophila mojavensis Genetics 1821279ndash1288

Murali T Pacifico S Yu J Guest S Roberts GG Finley RL Jr 2011 DroID2011 a comprehensive integrated resource for protein transcrip-tion factor RNA and gene interactions for Drosophila Nucleic AcidsRes 39D736ndashD743

Nuzhdin SV Turner TL 2013 Promises and limitations of hitchhikingmapping Curr Opin Genet Dev 23694ndash699

Orozco-terWengel P Kapun M Nolte V Kofler R Flatt T Schleurootterer C2012 Adaptation of Drosophila to a novel laboratory environmentreveals temporally heterogeneous trajectories of selected alleles MolEcol 214931ndash4941

Picard Picard A set of Java command line tools for manipulating high-throughput seqeuncing (HTS) data and formats Available fromhttpbroadinstitutegithubiopicard

Qiu Y Tittiger C Wicker-Thomas C Le Goff G Young S Wajnberg EFricaux T Taquet N Blomquist GJ Feyereisen R 2012 An insect-specific P450 oxidative decarbonylase for cuticular hydrocarbon bio-synthesis Proc Natl Acad Sci U S A 10914858ndash14863

Rajpurohit S Oliveira CC Etges WJ Gibbs AG 2013 Functional genomicand phenotypic responses to desiccation in natural populations of adesert Drosophilid Mol Ecol 222698ndash2715

Sarup P Soslashrensen JG Kristensen TN Hoffmann AA Loeschcke V PaigeKN Soslashrensen P 2011 Candidate genes detected in transcriptomestudies are strongly dependent on genetic background PLoS One6e15644

Schiffer M Hangartner S Hoffmann AA 2013 Assessing the relativeimportance of environmental effects carry-over effects and speciesdifferences in thermal stress resistance a comparison ofDrosophilids across field and laboratory generations J Exp Biol2163790ndash3798

Schleurootterer C Kofler R Versace E Tobler R Franssen SU 2015Combining experimental evolution with next-generation sequen-cing A powerful tool to study adaptation from standing geneticvariation Heredity 114431ndash440

Sinclair BJ Gibbs AG Roberts SP 2007 Gene transcription during ex-posure to and recovery from cold and desiccation stress inDrosophila melanogaster Insect Mol Biol 16435ndash443

Sloggett C Wakefield M Philip G Pope B 2014 Rubramdashflexible distrib-uted pipelines figshare Available from httpdxdoiorg106084m9figshare895626

Smit AFA Hubley R Green P 2013-2015 RepeatMasker Open-403Available from httpwwwrepeatmaskerorg

Seurooderberg JA Birse RT Neuroassel DR 2011 Insulin production and signalingin renal tubules of Drosophila is under control of tachykinin-relatedpeptide and regulates stress resistance PLoS One 6e19866

Soslashrensen JG Nielsen MM Loeschcke V 2007 Gene expression profileanalysis of Drosophila melanogaster selected for resistance to envi-ronmental stressors J Evol Biol 201624ndash1636

St Pierre SE Ponting L Stefancsik R McQuilton P 2014 FlyBase102mdashadvanced approaches to interrogating FlyBase Nucleic AcidsRes 42D780ndashD788

Stevenson M Nunes T Heuser C Marshall J Sanchez J Thornton RReiczigel J Robison-Cox J Sebastiani P Solymos P et al 2015 epiRTools for the analysis of epidemiological data R package version09-62 Available from httpCRANR-projectorgpackagefrac14epiR

Storey JD Tibshirani R 2003 Statistical significance for genomewidestudies Proc Natl Acad Sci U S A 1009440ndash9445

Telonis-Scott M Gane M DeGaris S Sgro CM Hoffmann AA 2012 Highresolution mapping of candidate alleles for desiccation resistancein Drosophila melanogaster under selection Mol Biol Evol291335ndash1351

Telonis-Scott M Guthridge KM Hoffmann AA 2006 A new set oflaboratory-selected Drosophila melanogaster lines for the analysisof desiccation resistance response to selection physiology and cor-related responses J Exp Biol 2091837ndash1847

Terhzaz S Cabrero P Robben JH Radford JC Hudson BD Milligan GDow JA Davies SA 2012 Mechanism and function of Drosophilacapa GPCR a desiccation stress-responsive receptor with functionalhomology to human neuromedinU receptor PLoS One 7e29897

Terhzaz S Overend G Sebastian S Dow JA Davies SA 2014 The Dmelanogaster capa-1 neuropeptide activates renal NF-kB signalingPeptides 53218ndash224

Terhzaz S Teets NM Cabrero P Henderson L Ritchie MG Nachman RJDow JA Denlinger DL Davies SA 2015 Insect capa neuropeptidesimpact desiccation and cold tolerance Proc Natl Acad Sci U S A1122882ndash2887

Thorat LJ Gaikwad SM Nath BB 2012 Trehalose as an indicator ofdesiccation stress in Drosophila melanogaster larvae a potentialmarker of anhydrobiosis Biochem Biophys Res Commun419638ndash642

Tobler R Franssen SU Kofler R Orozco-terWengel P Nolte VHermisson J Schl eurootterer C 2014 Massive habitat-specific ge-nomic response in D melanogaster populations during exper-imental evolution in hot and cold environments Mol Biol Evol31364ndash375

Turner TL Bourne EC Von Wettberg EJ Hu TT Nuzhdin SV 2010Population resequencing reveals local adaptation of Arabidopsislyrata to serpentine soils Nat Genet 42260ndash263

Turner TL Miller PM 2012 Investigating natural variation in Drosophilacourtship song by the evolve and resequence approach Genetics191633ndash642

Vermeulen CJ Soslashrensen P Kirilova Gagalova K Loeschcke V 2013Transcriptomic analysis of inbreeding depression in cold-sensitiveDrosophila melanogaster shows upregulation of the immune re-sponse J Evol Biol 261890ndash1902

Wang B Goode J Best J Meltzer J Schilman PE Chen J Garza D ThomasJB Montminy M 2008 The insulin-regulated CREB coactivatorTORC promotes stress resistance in Drosophila Cell Metab7434ndash444

Wang J Kean L Yang J Allan AK Davies SA Herzyk P Dow JA 2004Function-informed transcriptome analysis of Drosophila renaltubule Genome Biol 5R69

Wang K Li M Hakonarson H 2010 Analysing biological pathways ingenome-wide association studies Nat Rev Genet 11843ndash854

Wang L Jia P Wolfinger RD Chen X Zhao Z 2011 Gene set analysis ofgenome-wide association studies methodological issues and per-spectives Genomics 981ndash8

Weber AL Khan GF Magwire MM Tabor CL Mackay TF Anholt RR2012 Genome-wide association analysis of oxidative stress resistancein Drosophila melanogaster PLoS One 7e34745

Weeks AR McKechnie SW Hoffmann AA 2002 Dissecting adaptiveclinal variation markers inversions and sizestress associations inDrosophila melanogaster from a central field population Ecol Lett5756ndash763

Zhu Y Bergland AO Gonzalez J Petrov DA 2012 Empirical validation ofpooled whole genome population re-sequencing in Drosophila mel-anogaster PLoS One 7e41901

15

Page 10: Cross-Study Comparison Reveals Common Genomic, Network, … · 2016-02-06 · resistance using a high-throughput sequencing Pool-GWAS (genome-wide association study) approach ( Bastide

Genetic Overlap Versus Genetic differencesmdashCausesand Implications

Despite the common signatures at the gene and higher orderlevels between our studies pervasive differences were ob-served These can arise from differences in standing geneticvariation in the natural populations where different alleliccompositions impact gene-by-environment interactions andepistasis Our network overlap analyses highlight the potentialfor different molecular trajectories to result in similar func-tions (Leiserson et al 2013) consistent with the multiplephysiological pathways potentially available to D melanoga-ster under water stress (Chown et al 2011) Experimentaldesign also presumably contributed to the study differencesStrong directional selection as applied in the 2012 study canincrease the frequency of rare beneficial alleles and likely re-duced overall diversity than did the single-generation GWASdesign Furthermore selection regimes often impose addi-tional selective pressures (eg food deprivation during desic-cation) resulting in correlated responses such as starvationresistance with desiccation resistance and these factors differfrom the laboratory to the field Finally evidence suggests innatural D melanogaster that standing genetic variation canvary extensively with sampling season (Itoh et al 2010Bergland et al 2014) Although we cannot compare levelsof standing variation between our two studies evidence sug-gests that this may have affected the Dys gene where ourGWAS approach revealed no differentiation between thecontrol and resistant pools at this locus but rather detectedthe alleles associated with the selective sweep in Telonis-Scottet al (2012) in both pools at high frequency Whether thisresulted from sampling the later collection after a longdrought (2004ndash2008 wwwbomgovauclimate last accessedJanuary 13 2016) is unclear but this observation highlightsthe relevance of temporal sampling to standing genetic var-iation when attempting to link complex phenotypes togenotypes

Materials and Methods

Natural Population Sampling

Drosophila melanogaster was collected from vineyard waste atKilchorn Winery Romsey Australia in April 2010 Over 1200isofemale lines were established in the laboratory from wild-caught females Species identification was conducted on maleF1 to remove Drosophila simulans contamination and over900 D melanogaster isofemale lines were retained All flieswere maintained in vials of dextrose cornmeal media at25 C and constant light

Desiccation Assay

From each isofemale line ten inseminated F1 females weresorted by aspiration without CO2 and held in vials until 4ndash5days old For the desiccation assay 10 females per line (over9000 flies in total) were screened by placing groups of femalesinto empty vials covered with gauze and randomly assigninglines into 5 large sealed glass tanks containing trays of silicadesiccant (relative humidity [RH] lt10) The tanks werescored for mortality hourly and the final 558 surviving

individuals were selected for the desiccation-resistant pool(558 flies lt6 of the total) The control pool constitutedthe same number of females sampled from over 200 ran-domly chosen isofemale lines that were frozen prior to thedesiccation assay We chose this random control approachinstead of comparing the phenotypic extremes (ie the latemortality tail vs the ldquoearly mortalityrdquo tail) because unfavor-able environmental conditions experienced by the wild-caught dam may inflate environmental variance due to car-ryover effects in the F1 offspring (Schiffer et al 2013)

DNA Extraction and Sequencing

We used pooled whole-genome sequencing (Pool-seq)(Futschik and Schleurootterer 2010) to estimate and compareallele frequency differences between the most desiccation-resistant and randomly sampled flies from a wild populationto associate naturally segregating candidate SNPs with theresponse to low humidity For each treatment (resistantand control) genomic DNA was extracted from 50 femaleheads using the Qiagen DNeasy Blood and Tissue Kit accord-ing to the manufacturerrsquos instructions (Qiagen HildenGermany) Two extractions were combined to create fivepools per treatment which were barcoded separately tocreate technical replicates (total 10 pools of 100 fliesn frac14 1000) The DNA was fragmented using a CovarisS2 ma-chine (Covaris Inc Woburn MA) and libraries were pre-pared from 1 mg genomic DNA (per pool of 100 flies) usingthe Illumina TruSeq Prep module (Illumina San Diego CA)To minimize the effect of variation across lanes the ten poolswere combined in equal concentration (fig 3) and run mul-tiplexed in each lane of a full flow cell of an Illumina HiSeq2000 sequencer Clusters were generated using the TruSeq PECluster Kit v5 on a cBot and sequenced using Illumina TruSeqSBS v3 chemistry (Illumina) Fragmentation library construc-tion and sequencing were performed at the Micromon Next-Gen Sequencing facility (Monash University ClaytonAustralia)

Fig 3 Experimental design See text for further explanation

10

Data Processing and Mapping

Processing was performed on a high-performance computingcluster using the Rubra pipeline system that makes use of theRuffus Python library (Goodstadt 2010) The final variant-calling pipeline was based on a pipeline developed by ClareSloggett (Sloggett et al 2014) and is publicly available athttpsgithubcomgriffinpGWAS_pipeline (last accessedJanuary 13 2016) Raw sequence reads were processed persample per lane with P1 and P2 read files processed sepa-rately initially Trimming was performed with trimmomatic v030 (Lohse et al 2012) Adapter sequences were also re-moved Leading and trailing bases with a quality scorebelow 30 were trimmed and a sliding window (width frac14 10quality threshold frac14 25) was used Reads shorter than 40 bpafter trimming were discarded Quality before and after trim-ming was examined using FastQC v 0101 (Andrews 2014)

Posttrimming reads that remained paired were used asinput into the alignment step Alignment to the D melano-gaster reference genome version r553 was performed withbwa v 062 (Li and Durbin 2009) using the bwa aln commandand the following options No seeding (-l 150) 1 rate ofmissing alignments given 2 uniform base error rate (-n001) a maximum of 2 gap opens per read (-o 2) a maximumof 12 gap extensions per read (-e 12) and long deletionswithin 12 bp of 30 end disallowed (-d 12) The default valueswere used for all other options Output files were then con-verted to sorted bam files using the bwa sampe commandand the ldquoSortSamrdquo option in Picard v 196

At this point the bam files were merged into one file persample using PicardMerge Duplicates were identified withPicard MarkDuplicates The tool RealignerTargetCreator inGATK v 26-5 (DePristo et al 2011) was applied to the tenbam files simultaneously producing a list of intervals contain-ing probable indels around which local realignment would beperformed This was used as input for the IndelRealigner toolwhich was also applied to all ten bam files simultaneously

To investigate the variation among technical replicatesallele frequency was estimated based on read counts foreach technical replicate separately for 100000 randomly cho-sen SNPs across the genome After excluding candidate des-iccation-resistance SNPs and SNPs that may have been falsepositives due to sequencing error (those with mean frequencylt004 across replicates) 1803 SNPs remained For each locuswe calculated the variance in allele frequency estimate amongthe five control replicates and the variance among the fivedesiccation-resistance replicates We compared this with thevariance due to pool category (control vs desiccation resis-tant) using analysis of variance (ANOVA) We also calculatedthe mean pairwise difference in allele frequency estimateamong control and among desiccation-resistance replicatesand the mean concordance correlation coefficient was calcu-lated over all pairwise comparisons within pool category(using the epiccc function in the epiR package Stevensonet al 2015)

The following approach was then taken to identify SNPsdiffering between the desiccation-resistant and controlgroups Based on the technical replicate analysis we merged

the five ldquodesiccation-resistantrdquo and the five random controlsamples into two bam files A pileup file was created withsamtools v 0119 (Li et al 2009) and converted to a ldquosyncrdquo fileusing the mpileup2sync script in Popoolation2 v 1201 (Kofleret al 2011) retaining reads with a mapping quality of at least20 and bases with a minimum quality score of 20 Reads withunmapped mates were also retained (option ndashA in samtoolsmpileup) For this step we used a repeat-masked version ofthe reference genome created with RepeatMasker 403 (Smitet al 2013-2015) and a repeat file containing all annotatedtransposons from D melanogaster including shared ancestralsequences (Flybase release r557) plus all repetitive elementsfrom RepBase release v1904 for D melanogaster Simple re-peats small RNA genes and bacterial insertions were notmasked (options ndashnolow ndashnorna ndashno_is) The rmblastsearch engine was used

Association Analysis

We then performed Fisherrsquos exact test with the Fisher testscript in Popoolation2 (Kofler et al 2011) We set the min-imum count (of the minor allele) frac14 10 minimum coverage(in both populations) frac14 20 and maximum coverage frac1410000 The test was performed for each SNP (slidingwindow off) Although there are numerous approaches tocalculating genome-wide significance thresholds in aPoolseq framework currently there is no consensus on anappropriate standardized statistical method Methods in-clude simulation approaches (Orozco-terWengel et al2012 Bastide et al 2013 Burke et al 2014) correctionusing a null distribution based on estimating genetic driftfrom replicated artificial selection line allele frequency vari-ance (Turner and Miller 2012) use of a simple thresholdbased on a minimum allele frequency difference (Turneret al 2010) and the FDR correction suggested by Storeyand Tibshirani (2003) (used by Fabian et al 2012)

We determined based on our design to calculate the FDRthreshold by comparison with a simulated distribution of nullP values following the approach of Bastide et al (2013) Foreach SNP locus a P value was calculated by simulating thedistribution of read counts between the major and minorallele according to a beta-binomial distribution with meana(a+b) and variance (ab)((a+b)2(a+b+1)) Thismodel allowed for two types of sampling variationStochastic variation in allele sampling across the phenotypesand the background variation in the representation of allelesin each pool caused by library preparation artifacts or sto-chastic variation driven by unequal coverage of the poolsThese sources of variation have also been recognized else-where as being important in interpreting pooled sequenceallele calls and identifying significant differentiation betweenpooled samples (Lynch et al 2014) The value of afrac14 37 waschosen by chi-square test as the best match to the observedP value distribution (afrac14 10 to afrac14 40 were tested) with acorresponding bfrac14 4315 to reflect the coverage depth differ-ence between the random control (487) and desiccation-resistant (568) pools These values were then used tosimulate a null data set 10 the size of the observed data

11

set Because the null P value distribution still did not fit theobserved P value distribution particularly well (supplemen-tary fig S2 Supplementary Material online) these values werenot used for a standard FDR correction Instead a P valuethreshold was calculated based on the lowest 005 of thenull P values similar in approach to Orozco-terWengel et al(2012)

Inversion Analysis

In D melanogaster several cosmopolitan inversion polymor-phisms show cross-continent latitudinal patterns that explaina substantial proportion of clinal variation for many traitsincluding thermal tolerance (reviewed in Hoffmann andWeeks 2007) Patterns of disequilibria generated betweenloci in the vicinity of the rearrangement can obscure signalsof selection (Hoffmann and Weeks 2007) and inversion fre-quencies are routinely considered in genomic studies ofnatural populations sampled from known latitudinal clines(Fabian et al 2012 Kapun et al 2014) We assessedassociations between inversion frequencies and our controland resistant pools at SNPs diagnostic for the following in-versions In3R(Payne) (Anderson et al 2005 Kapun et al2014) In(2L)t (Andolfatto et al 1999 Kapun et al 2014)In(2R)Ns In(3L)P In(3R)C In(3R)K and In(3R)Mo (Kapunet al 2014) Between one and five diagnostic SNPs were in-vestigated for each inversion (depending on availability) Weconsidered an inversion to be associated with desiccationresistance if the frequency of its diagnostic SNP differed sig-nificantly between the control and desiccation-resistant poolsby a Fisherrsquos exact test

Genome Feature Annotation

The list of candidate SNPs was converted to vcf format using acustom R script setting the alternate allele as the allele thatincreased in frequency from the control to the desiccation-resistant pool and setting the reference allele as the otherallele present in cases where both the major and minor allelesdiffered from the reference genome The genomic features inwhich the candidate SNPs occurred were annotated usingsnpEff v40 first creating a database from the D melanogasterv553 genome annotation with the default approach(Cingolani et al 2012) The default annotation settings wereused except for setting the parameter ldquo-ud 0rdquo to annotateSNPs to genes only if they were within the gene body Using2 tests on SNP counts we tested for over- or under-enrichment of feature types by calculating the proportionof lsquoallrsquo features from all SNPs detected in the study and theproportion of ldquocandidaterdquo SNPs identified as differentiated indesiccation-resistant flies (Fabian et al 2012)

GO Analysis

GO analyses was used to associate the candidates with theirbiological functions (Ashburner et al 2000) Previously devel-oped for biological pathway analysis of global gene expressionstudies (Wang et al 2010) typical GO analysis assumes that allgenes are sampled independently with equal probabilityGWAS violate these criteria where SNPs are more likely to

occur in longer genes than in shorter genes resulting in highertype I error rates in GO categories overpopulated with longgenes Gene length as well as overlapping gene biases (Wanget al 2011) were accounted for using Gowinda (Kofler andSchleurootterer 2012) which implements a permutation strategyto reduce false positives Gowinda does not reconstruct LDbetween SNPs but operates in two modes underpinned bytwo extreme assumptions 1) Gene mode (assumes all SNPs ina gene are in LD) and 2) SNP mode (assumes all SNPs in agene are completely independent)

The candidates were examined in both modes but giventhe stringency of gene mode high-resolution SNP modeoutput was utilized for the final analysis with the followingparameters 106 simulations minimum significance frac141 andminimum genes per GO categoryfrac14 3 (to exclude gene-poorcategories) P values were corrected using the Gowinda FDRalgorithm GO annotations from Flybase r557 were applied

Candidate Gene Analysis

In conjunction with the GO analysis we also examinedknown candidate genes associated with survival under lowRH to better characterize possible biological functions of ourGWAS candidates We took a conservative approach to can-didate gene assignment where the SNPs were mapped withinthe entire gene region without up- and downstream exten-sions Genes were curated from FlyBase and the literature(table 1) predominantly using detailed functional experimen-tal evidence as criteria for functional desiccation candidates

Network Analysis

We next obtained a higher level summary of the candidategene list using PPI network analysis The full candidate genelist was used to construct a first-order interaction networkusing all Drosophila PPIs listed in the Drosophila InteractionDatabase v 2014_10 (avoiding interologs) (Murali et al 2011)This ldquofullrdquo PPI network includes manually curated data frompublished literature and experimental data for 9633 proteinsand 93799 interactions For the network construction theigraph R package was used to build a subgraph from the fullPPI network containing the candidate proteins and their first-order interacting proteins (Csardi and Nepusz 2006) Self-connections were removed Functional enrichment analysiswas conducted on the list of network genes using FlyMinev400 (wwwflymineorg last accessed January 13 2016) withthe following parameters for GO enrichment and KEGGReactome pathway enrichment BenjaminindashHochberg FDRcorrection P lt 005 background frac14 all D melanogastergenes in our full PPI network and Ontology (for GO enrich-ment only) frac14 biological process or molecular function

Cross-Study Comparison Overlap with Our PreviousWork

Following on from the comparison between our gene list andthe candidate genes suggested from the literature we quan-titatively evaluated the degree of overlap between the candi-date genes identified in this study with alleles mapped in ourprevious study of flies collected from the same geographic

12

region (Telonis-Scott et al 2012) The ldquooverlaprdquo analysis wasperformed similarly for the GWAS candidate gene and net-work-level approaches First the gene list overlap was testedbetween the two studies using Fisherrsquos exact test where the382 genes identified from the candidate SNPs (this study)were compared with 416 genes identified from candidatesingle feature polymorphisms (Telonis-Scott et al 2012)The contingency table was constructed using the numberof annotated genes in the D melanogaster R553 genomebuilt as an approximation for the total number of genestested in each study and took the following form

where A frac14 gene list from Telonis-Scott et al (2012)n frac14 416 B frac14 gene list from this study n frac14 382 N frac14 totalnumber of genes in D melanogaster reference genome r553n frac14 17106

To annotate genome features in the overlap candidategene list we used the SNP data (this study) which betterresolves alleles than array-based genotyping Due to this lim-itation we did not compare exact genomic coordinates be-tween the studies but considered overlap to occur whendifferentiated alleles were detected in the same gene inboth studies We did however examine the allele frequenciesaround the Dys-RF promoter which was sequenced sepa-rately revealing a selective sweep in the experimental evolu-tion lines (Telonis-Scott et al 2012)

The overlapping candidates were tested for functional en-richment using a basic gene list approach As genes ratherthan SNPs were analyzed here Fly Mine v400 (wwwflymineorg) was applied with the following parameters Gene lengthnormalization BenjaminindashHochberg FDR correction Plt 005Ontology frac14 biological process and default background (allGO-annotated D melanogaster genes)

Finally we constructed a second PPI network from thecandidate genes mapped in Telonis-Scott et al (2012) to ex-amine system-level overlap between the two studies The 416genes mapped to 337 protein seeds and the network wasconstructed as described above The degree of network over-lap between the two studies was tested using simulation Ineach iteration a gene set of the same length as the list fromthis study and a gene set of the same length as the list fromTelonis-Scott et al (2012) were resampled randomly from theentire D melanogaster mapped gene annotation (r553) Eachresampled set was then used to build a first-order network aspreviously described The overlap network between the tworesampled sets was calculated using the graphintersectioncommand from the igraph package and zero-degree nodes(proteins with no interactions) were removed Overlap wasquantified by counting the number of nodes and the numberof edges in this overlap network The observed overlap wascompared with a simulated null distribution for both nodeand edge number where n frac14 1000 simulation iterations

Supplementary MaterialSupplementary figures S1ndashS4 and tables S1ndashS5 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

We thank Robert Good for the fly sampling and JenniferShirriffs and Lea and Alan Rako for outstanding assistancewith laboratory culture and substantial desiccation assaysWe are grateful to Melissa Davis for advice on network build-ing and comparison approaches and to five anonymous re-viewers whose comments improved the manuscript Thiswork was supported by a DECRA fellowship DE120102575and Monash University Technology voucher scheme toMTS an ARC Future Fellowship and Discovery Grants toCMS an ARC Laureate Fellowship to AAH and a Scienceand Industry Endowment Fund grant to CMS and AAH

ReferencesAlexander JM Lomvardas S 2014 Nuclear architecture as an epigenetic

regulator of neural development and function Neuroscience26439ndash50

Anderson AR Hoffmann AA McKechnie SW Umina PA Weeks AR2005 The latitudinal cline in the In(3R)Payne inversion polymor-phism has shifted in the last 20 years in Australian Drosophilamelanogaster populations Mol Ecol 14851ndash858

Andolfatto P Wall JD Kreitman M 1999 Unusual haplotype structureat the proximal breakpoint of In(2L)t in a natural population ofDrosophila melanogaster Genetics 1531297ndash1311

Andrews S 2014 FastQC Available from httpwwwbioinformaticsbbsrcacukprojectsfastqc

Ashburner M Ball CA Blake JA Botstein D Butler H Cherry JM DavisAP Dolinski K Dwight SS Eppig JT et al 2000 Gene ontology toolfor the unification of biology The Gene Ontology Consortium NatGenet 2525ndash29

Aytes A Mitrofanova A Lefebvre C Alvarez Mariano J Castillo-MartinM Zheng T Eastham James A Gopalan A Pienta Kenneth J ShenMM et al 2014 Cross-species regulatory network analysis identifiesa synergistic interaction between FOXM1 and CENPF that drivesprostate cancer malignancy Cancer Cell 25638ndash651

Bastide H Betancourt A Nolte V Tobler R Steuroobe P Futschik ASchleurootterer C 2013 A genome-wide fine-scale map of natural pig-mentation variation in Drosophila melanogaster PLoS Genet9e1003534

Bergland AO Behrman EL OrsquoBrien KR Schmidt PS Petrov DA 2014Genomic evidence of rapid and stable adaptive oscillations overseasonal time scales in Drosophila PLoS Genet 10e1004775

Bonebrake TC Mastrandrea MD 2010 Tolerance adaptation and pre-cipitation changes complicate latitudinal patterns of climate changeimpacts Proc Natl Acad Sci U S A 10712581ndash12586

Bradley TJ 2009 Animal osmoregulation Oxford Oxford UniversityPress

Burke MK Liti G Long AD 2014 Standing genetic variation drivesrepeatable experimental evolution in outcrossing populations ofSaccharomyces cerevisiae Mol Biol Evol 313228ndash3239

Chown SL 2012 Trait-based approaches to conservation physiologyforecasting environmental change risks from the bottom upPhilos Trans R Soc B 3671615ndash1627

Chown SL Nicolson SW 2004 Insect physiological ecology mechanismsand patterns Oxford Oxford University Press

Chown SL Soslashrensen JG Terblanche JS 2011 Water loss in insectsan environmental change perspective J Insect Physiol571070ndash1084

Not in GWAS In GWAS

Not in Telonis-Scott et al (2012) N ndash intersect-AndashB A-intersect

In Telonis-Scott et al (2012) B-intersect Intersect(AB)

13

Chung H Loehlin DW Dufour HD Vaccarro K Millar JG Carroll SB2014 A single gene affects both ecological divergence and matechoice in Drosophila Science 3431148ndash1151

Cingolani P Platts A Wang LL Coon M Nguyen T Wang L Land SJ LuX Ruden DM 2012 A program for annotating and predicting theeffects of single nucleotide polymorphisms SnpEff SNPs in thegenome of Drosophila melanogaster strain w1118 iso-2 iso-3 Fly(Austin) 680ndash92

Civelek M Lusis AJ 2014 Systems genetics approaches to understandcomplex traits Nat Rev Genet 1534ndash48

Clusella-Trullas S Blackburn TM Chown SL 2011 Climatic predictors oftemperature performance curve parameters in ectotherms implycomplex responses to climate change Am Nat 177738ndash751

Csardi G Nepusz T 2006 The igraph software package for complexnetwork research InterJournal Complex Systems 1695 Availablefrom httpigraphorg

Davies SA Cabrero P Overend G Aitchison L Sebastian S Terhzaz SDow JA 2014 Cell signalling mechanisms for insect stress toleranceJ Exp Biol 217119ndash128

Davies SA Cabrero P Povsic M Johnston NR Terhzaz S Dow JA 2013Signaling by Drosophila capa neuropeptides Gen Comp Endocrinol18860ndash66

Davies SA Overend G Sebastian S Cundall M Cabrero P Dow JATerhzaz S 2012 Immune and stress response lsquocross-talkrsquo in theDrosophila Malpighian tubule J Insect Physiol 58488ndash497

Day JP Dow JA Houslay MD Davies SA 2005 Cyclic nucleotide phos-phodiesterases in Drosophila melanogaster Biochem J 388333ndash342

de Nadal E Ammerer G Posas F 2011 Controlling gene expression inresponse to stress Nat Rev Genet 12833ndash845

DePristo MA Banks E Poplin R Garimella KV Maguire JR Hartl CPhilippakis AA del Angel G Rivas MA Hanna M et al 2011 Aframework for variation discovery and genotyping using next-gen-eration DNA sequencing data Nat Genet 43491ndash498

Djawdan M Chippindale AK Rose MR Bradley TJ 1998 Metabolic re-serves and evolved stress resistance in Drosophila melanogasterPhysiol Zool 71584ndash594

Durham MF Magwire MM Stone EA Leips J 2014 Genome-wide anal-ysis in Drosophila reveals age-specific effects of SNPs on fitness traitsNat Commun 54338

Emilsson V Thorleifsson G Zhang B Leonardson AS Zink F Zhu JCarlson S Helgason A Walters GB Gunnarsdottir S et al 2008Genetics of gene expression and its effect on disease Nature452423ndash428

Fabian DK Kapun M Nolte V Kofler R Schmidt PS Schleurootterer C FlattT 2012 Genome-wide patterns of latitudinal differentiation amongpopulations of Drosophila melanogaster from North America MolEcol 214748ndash4769

Foley BR Telonis-Scott M 2011 Quantitative genetic analysis suggestscausal association between cuticular hydrocarbon composition anddesiccation survival in Drosophila melanogaster Heredity 10668ndash77

Folk DG Han C Bradley TJ 2001 Water acquisition and partitioning inDrosophila melanogaster effects of selection for desiccation-resis-tance J Exp Biol 2043323ndash3331

Fowler MA Montell C 2013 Drosophila TRP channels and animal be-havior Life Sci 92394ndash403

Fu J Keurentjes JJB Bouwmeester H America T Verstappen FWA WardJL Beale MH de Vos RCH Dijkstra M Scheltema RA et al 2009System-wide molecular evidence for phenotypic buffering inArabidopsis Nat Genet 41166ndash167

Futschik A Schleurootterer C 2010 The next generation of molecular mar-kers from massively parallel sequencing of pooled DNA samplesGenetics 186207ndash218

Gibbs AG Fukuzato F Matzkin LM 2003 Evolution of water conserva-tion mechanisms in Drosophila J Exp Biol 2061183ndash1192

Gibbs AG Matzkin LM 2001 Evolution of water balance in the genusDrosophila J Exp Biol 2042331ndash2338

Goodstadt L 2010 Ruffus a lightweight Python library for computa-tional pipelines Bioinformatics 262778ndash2779

Hadley NF 1994 Water relations of terrestrial arthropods San Diego(CA) Academic Press

Hoffmann AA Hallas R Sinclair C Mitrovski P 2001 Levels of variationin stress resistance in Drosophila among strains local populationsand geographic regions patterns for desiccation starvation coldresistance and associated traits Evolution 551621ndash1630

Hoffmann AA Parsons PA 1989a An integrated approach to environ-mental stress tolerance and life-history variation desiccation toler-ance in Drosophila Biol J Linn Soc 37117ndash136

Hoffmann AA Parsons PA 1989b Selection for increased desiccationresistance in Drosophila melanogaster additive genetic control andcorrelated responses for other stresses Genetics 122837ndash845

Hoffmann AA Sgro CM 2011 Climate change and evolutionary adap-tation Nature 470479ndash485

Hoffmann AA Weeks AR 2007 Climatic selection on genes and traitsafter a 100 year-old invasion a critical look at the temperate-tropicalclines in Drosophila melanogaster from eastern Australia Genetica129133ndash147

Huang W Richards S Carbone MA Zhu D Anholt RR Ayroles JFDuncan L Jordan KW Lawrence F Magwire MM et al 2012Epistasis dominates the genetic architecture of Drosophila quanti-tative traits Proc Natl Acad Sci U S A 10915553ndash15559

Huang Z Tunnacliffe A 2004 Response of human cells to desiccationcomparison with hyperosmotic stress response J Physiol558181ndash191

Huang Z Tunnacliffe A 2005 Gene induction by desiccation stress inhuman cell cultures FEBS Lett 5794973ndash4977

Ichihashi Y Aguilar-Martınez JA Farhi M Chitwood DH Kumar RMillon LV Peng J Maloof JN Sinha NR 2014 Evolutionary develop-mental transcriptomics reveals a gene network module regulatinginterspecific diversity in plant leaf shape Proc Natl Acad Sci U S A111E2616ndashE2621

Itoh M Nanba N Hasegawa M Inomata N Kondo R Oshima MTakano-Shimizu T 2010 Seasonal changes in the long-distance link-age disequilibrium in Drosophila melanogaster J Hered 10126ndash32

Johnson WA Carder JW 2012 Drosophila nociceptors mediate larvalaversion to dry surface environments utilizing both the painless TRPchannel and the DEGENaC subunit PPK1 PLoS One 7e32878

Kahsai L Kapan N Dircksen H Winther AM Nassel DR 2010 Metabolicstress responses in Drosophila are modulated by brain neurosecre-tory cells that produce multiple neuropeptides PLoS One 5e11480

Kapun M van Schalkwyk H McAllister B Flatt T Schleurootterer C 2014Inference of chromosomal inversion dynamics from Pool-Seq datain natural and laboratory populations of Drosophila melanogasterMol Ecol 231813ndash1827

Kawano T Shimoda M Matsumoto H Ryuda M Tsuzuki S Hayakawa Y2010 Identification of a gene Desiccate contributing to desicca-tion resistance in Drosophila melanogaster J Biol Chem28538889ndash38897

Kellermann V van Heerwaarden B Sgro CM Hoffmann AA 2009Fundamental evolutionary limits in ecological traits driveDrosophila species distributions Science 3251244ndash1246

Kofler R Nolte V Schleurootterer C 2016 The impact of library preparationprotocols on the consistency of allele frequency estimates in Pool-Seq data Mol Ecol Resour 16118ndash122

Kofler R Pandey RV Schleurootterer C 2011 PoPoolation2 identifying dif-ferentiation between populations using sequencing of pooled DNAsamples (Pool-Seq) Bioinformatics 273435ndash3436

Kofler R Schleurootterer C 2012 Gowinda unbiased analysis of gene setenrichment for genome-wide association studies Bioinformatics282084ndash2085

Leiserson MDM Eldridge JV Ramachandran S Raphael BJ 2013Network analysis of GWAS data Curr Opin Genet Dev 23602ndash610

Li H Durbin R 2009 Fast and accurate short read alignment withBurrows-Wheeler transform Bioinformatics 251754ndash1760

Li H Handsaker B Wysoker A Fennell T Ruan J Homer N Marth GAbecasis G Durbin R 2009 The Sequence AlignmentMap formatand SAMtools Bioinformatics 252078ndash2079

14

Liu Y Luo J Carlsson MA Neuroassel DR 2015 Serotonin and insulin-likepeptides modulate leucokinin-producing neurons that affectfeeding and water homeostasis in Drosophila J Comp Neurol5231840ndash1863

Lohse M Bolger AM Nagel A Fernie AR Lunn JE Stitt M Usadel B 2012RobiNA a user-friendly integrated software solution for RNA-Seq-based transcriptomics Nucleic Acids Res 40W622ndashW627

Long A Liti G Luptak A Tenaillon O 2015 Elucidating the moleculararchitecture of adaptation via evolve and resequence experimentsNat Rev Genet 16567ndash582

Lynch M Bost D Wilson S Maruki T Harrison S 2014 Population-genetic inference from pooled-sequencing data Genome Biol Evol61210ndash1218

Mackay TFC Stone EA Ayroles JF 2009 The genetics of quantitativetraits challenges and prospects Nat Rev Genet 10565ndash577

Marron MT Markow TA Kain KJ Gibbs AG 2003 Effects of starvationand desiccation on energy metabolism in desert and mesicDrosophila J Insect Physiol 49261ndash270

Matzkin LM Markow TA 2009 Transcriptional regulation of metabo-lism associated with the increased desiccation resistance of thecactophilic Drosophila mojavensis Genetics 1821279ndash1288

Murali T Pacifico S Yu J Guest S Roberts GG Finley RL Jr 2011 DroID2011 a comprehensive integrated resource for protein transcrip-tion factor RNA and gene interactions for Drosophila Nucleic AcidsRes 39D736ndashD743

Nuzhdin SV Turner TL 2013 Promises and limitations of hitchhikingmapping Curr Opin Genet Dev 23694ndash699

Orozco-terWengel P Kapun M Nolte V Kofler R Flatt T Schleurootterer C2012 Adaptation of Drosophila to a novel laboratory environmentreveals temporally heterogeneous trajectories of selected alleles MolEcol 214931ndash4941

Picard Picard A set of Java command line tools for manipulating high-throughput seqeuncing (HTS) data and formats Available fromhttpbroadinstitutegithubiopicard

Qiu Y Tittiger C Wicker-Thomas C Le Goff G Young S Wajnberg EFricaux T Taquet N Blomquist GJ Feyereisen R 2012 An insect-specific P450 oxidative decarbonylase for cuticular hydrocarbon bio-synthesis Proc Natl Acad Sci U S A 10914858ndash14863

Rajpurohit S Oliveira CC Etges WJ Gibbs AG 2013 Functional genomicand phenotypic responses to desiccation in natural populations of adesert Drosophilid Mol Ecol 222698ndash2715

Sarup P Soslashrensen JG Kristensen TN Hoffmann AA Loeschcke V PaigeKN Soslashrensen P 2011 Candidate genes detected in transcriptomestudies are strongly dependent on genetic background PLoS One6e15644

Schiffer M Hangartner S Hoffmann AA 2013 Assessing the relativeimportance of environmental effects carry-over effects and speciesdifferences in thermal stress resistance a comparison ofDrosophilids across field and laboratory generations J Exp Biol2163790ndash3798

Schleurootterer C Kofler R Versace E Tobler R Franssen SU 2015Combining experimental evolution with next-generation sequen-cing A powerful tool to study adaptation from standing geneticvariation Heredity 114431ndash440

Sinclair BJ Gibbs AG Roberts SP 2007 Gene transcription during ex-posure to and recovery from cold and desiccation stress inDrosophila melanogaster Insect Mol Biol 16435ndash443

Sloggett C Wakefield M Philip G Pope B 2014 Rubramdashflexible distrib-uted pipelines figshare Available from httpdxdoiorg106084m9figshare895626

Smit AFA Hubley R Green P 2013-2015 RepeatMasker Open-403Available from httpwwwrepeatmaskerorg

Seurooderberg JA Birse RT Neuroassel DR 2011 Insulin production and signalingin renal tubules of Drosophila is under control of tachykinin-relatedpeptide and regulates stress resistance PLoS One 6e19866

Soslashrensen JG Nielsen MM Loeschcke V 2007 Gene expression profileanalysis of Drosophila melanogaster selected for resistance to envi-ronmental stressors J Evol Biol 201624ndash1636

St Pierre SE Ponting L Stefancsik R McQuilton P 2014 FlyBase102mdashadvanced approaches to interrogating FlyBase Nucleic AcidsRes 42D780ndashD788

Stevenson M Nunes T Heuser C Marshall J Sanchez J Thornton RReiczigel J Robison-Cox J Sebastiani P Solymos P et al 2015 epiRTools for the analysis of epidemiological data R package version09-62 Available from httpCRANR-projectorgpackagefrac14epiR

Storey JD Tibshirani R 2003 Statistical significance for genomewidestudies Proc Natl Acad Sci U S A 1009440ndash9445

Telonis-Scott M Gane M DeGaris S Sgro CM Hoffmann AA 2012 Highresolution mapping of candidate alleles for desiccation resistancein Drosophila melanogaster under selection Mol Biol Evol291335ndash1351

Telonis-Scott M Guthridge KM Hoffmann AA 2006 A new set oflaboratory-selected Drosophila melanogaster lines for the analysisof desiccation resistance response to selection physiology and cor-related responses J Exp Biol 2091837ndash1847

Terhzaz S Cabrero P Robben JH Radford JC Hudson BD Milligan GDow JA Davies SA 2012 Mechanism and function of Drosophilacapa GPCR a desiccation stress-responsive receptor with functionalhomology to human neuromedinU receptor PLoS One 7e29897

Terhzaz S Overend G Sebastian S Dow JA Davies SA 2014 The Dmelanogaster capa-1 neuropeptide activates renal NF-kB signalingPeptides 53218ndash224

Terhzaz S Teets NM Cabrero P Henderson L Ritchie MG Nachman RJDow JA Denlinger DL Davies SA 2015 Insect capa neuropeptidesimpact desiccation and cold tolerance Proc Natl Acad Sci U S A1122882ndash2887

Thorat LJ Gaikwad SM Nath BB 2012 Trehalose as an indicator ofdesiccation stress in Drosophila melanogaster larvae a potentialmarker of anhydrobiosis Biochem Biophys Res Commun419638ndash642

Tobler R Franssen SU Kofler R Orozco-terWengel P Nolte VHermisson J Schl eurootterer C 2014 Massive habitat-specific ge-nomic response in D melanogaster populations during exper-imental evolution in hot and cold environments Mol Biol Evol31364ndash375

Turner TL Bourne EC Von Wettberg EJ Hu TT Nuzhdin SV 2010Population resequencing reveals local adaptation of Arabidopsislyrata to serpentine soils Nat Genet 42260ndash263

Turner TL Miller PM 2012 Investigating natural variation in Drosophilacourtship song by the evolve and resequence approach Genetics191633ndash642

Vermeulen CJ Soslashrensen P Kirilova Gagalova K Loeschcke V 2013Transcriptomic analysis of inbreeding depression in cold-sensitiveDrosophila melanogaster shows upregulation of the immune re-sponse J Evol Biol 261890ndash1902

Wang B Goode J Best J Meltzer J Schilman PE Chen J Garza D ThomasJB Montminy M 2008 The insulin-regulated CREB coactivatorTORC promotes stress resistance in Drosophila Cell Metab7434ndash444

Wang J Kean L Yang J Allan AK Davies SA Herzyk P Dow JA 2004Function-informed transcriptome analysis of Drosophila renaltubule Genome Biol 5R69

Wang K Li M Hakonarson H 2010 Analysing biological pathways ingenome-wide association studies Nat Rev Genet 11843ndash854

Wang L Jia P Wolfinger RD Chen X Zhao Z 2011 Gene set analysis ofgenome-wide association studies methodological issues and per-spectives Genomics 981ndash8

Weber AL Khan GF Magwire MM Tabor CL Mackay TF Anholt RR2012 Genome-wide association analysis of oxidative stress resistancein Drosophila melanogaster PLoS One 7e34745

Weeks AR McKechnie SW Hoffmann AA 2002 Dissecting adaptiveclinal variation markers inversions and sizestress associations inDrosophila melanogaster from a central field population Ecol Lett5756ndash763

Zhu Y Bergland AO Gonzalez J Petrov DA 2012 Empirical validation ofpooled whole genome population re-sequencing in Drosophila mel-anogaster PLoS One 7e41901

15

Page 11: Cross-Study Comparison Reveals Common Genomic, Network, … · 2016-02-06 · resistance using a high-throughput sequencing Pool-GWAS (genome-wide association study) approach ( Bastide

Data Processing and Mapping

Processing was performed on a high-performance computingcluster using the Rubra pipeline system that makes use of theRuffus Python library (Goodstadt 2010) The final variant-calling pipeline was based on a pipeline developed by ClareSloggett (Sloggett et al 2014) and is publicly available athttpsgithubcomgriffinpGWAS_pipeline (last accessedJanuary 13 2016) Raw sequence reads were processed persample per lane with P1 and P2 read files processed sepa-rately initially Trimming was performed with trimmomatic v030 (Lohse et al 2012) Adapter sequences were also re-moved Leading and trailing bases with a quality scorebelow 30 were trimmed and a sliding window (width frac14 10quality threshold frac14 25) was used Reads shorter than 40 bpafter trimming were discarded Quality before and after trim-ming was examined using FastQC v 0101 (Andrews 2014)

Posttrimming reads that remained paired were used asinput into the alignment step Alignment to the D melano-gaster reference genome version r553 was performed withbwa v 062 (Li and Durbin 2009) using the bwa aln commandand the following options No seeding (-l 150) 1 rate ofmissing alignments given 2 uniform base error rate (-n001) a maximum of 2 gap opens per read (-o 2) a maximumof 12 gap extensions per read (-e 12) and long deletionswithin 12 bp of 30 end disallowed (-d 12) The default valueswere used for all other options Output files were then con-verted to sorted bam files using the bwa sampe commandand the ldquoSortSamrdquo option in Picard v 196

At this point the bam files were merged into one file persample using PicardMerge Duplicates were identified withPicard MarkDuplicates The tool RealignerTargetCreator inGATK v 26-5 (DePristo et al 2011) was applied to the tenbam files simultaneously producing a list of intervals contain-ing probable indels around which local realignment would beperformed This was used as input for the IndelRealigner toolwhich was also applied to all ten bam files simultaneously

To investigate the variation among technical replicatesallele frequency was estimated based on read counts foreach technical replicate separately for 100000 randomly cho-sen SNPs across the genome After excluding candidate des-iccation-resistance SNPs and SNPs that may have been falsepositives due to sequencing error (those with mean frequencylt004 across replicates) 1803 SNPs remained For each locuswe calculated the variance in allele frequency estimate amongthe five control replicates and the variance among the fivedesiccation-resistance replicates We compared this with thevariance due to pool category (control vs desiccation resis-tant) using analysis of variance (ANOVA) We also calculatedthe mean pairwise difference in allele frequency estimateamong control and among desiccation-resistance replicatesand the mean concordance correlation coefficient was calcu-lated over all pairwise comparisons within pool category(using the epiccc function in the epiR package Stevensonet al 2015)

The following approach was then taken to identify SNPsdiffering between the desiccation-resistant and controlgroups Based on the technical replicate analysis we merged

the five ldquodesiccation-resistantrdquo and the five random controlsamples into two bam files A pileup file was created withsamtools v 0119 (Li et al 2009) and converted to a ldquosyncrdquo fileusing the mpileup2sync script in Popoolation2 v 1201 (Kofleret al 2011) retaining reads with a mapping quality of at least20 and bases with a minimum quality score of 20 Reads withunmapped mates were also retained (option ndashA in samtoolsmpileup) For this step we used a repeat-masked version ofthe reference genome created with RepeatMasker 403 (Smitet al 2013-2015) and a repeat file containing all annotatedtransposons from D melanogaster including shared ancestralsequences (Flybase release r557) plus all repetitive elementsfrom RepBase release v1904 for D melanogaster Simple re-peats small RNA genes and bacterial insertions were notmasked (options ndashnolow ndashnorna ndashno_is) The rmblastsearch engine was used

Association Analysis

We then performed Fisherrsquos exact test with the Fisher testscript in Popoolation2 (Kofler et al 2011) We set the min-imum count (of the minor allele) frac14 10 minimum coverage(in both populations) frac14 20 and maximum coverage frac1410000 The test was performed for each SNP (slidingwindow off) Although there are numerous approaches tocalculating genome-wide significance thresholds in aPoolseq framework currently there is no consensus on anappropriate standardized statistical method Methods in-clude simulation approaches (Orozco-terWengel et al2012 Bastide et al 2013 Burke et al 2014) correctionusing a null distribution based on estimating genetic driftfrom replicated artificial selection line allele frequency vari-ance (Turner and Miller 2012) use of a simple thresholdbased on a minimum allele frequency difference (Turneret al 2010) and the FDR correction suggested by Storeyand Tibshirani (2003) (used by Fabian et al 2012)

We determined based on our design to calculate the FDRthreshold by comparison with a simulated distribution of nullP values following the approach of Bastide et al (2013) Foreach SNP locus a P value was calculated by simulating thedistribution of read counts between the major and minorallele according to a beta-binomial distribution with meana(a+b) and variance (ab)((a+b)2(a+b+1)) Thismodel allowed for two types of sampling variationStochastic variation in allele sampling across the phenotypesand the background variation in the representation of allelesin each pool caused by library preparation artifacts or sto-chastic variation driven by unequal coverage of the poolsThese sources of variation have also been recognized else-where as being important in interpreting pooled sequenceallele calls and identifying significant differentiation betweenpooled samples (Lynch et al 2014) The value of afrac14 37 waschosen by chi-square test as the best match to the observedP value distribution (afrac14 10 to afrac14 40 were tested) with acorresponding bfrac14 4315 to reflect the coverage depth differ-ence between the random control (487) and desiccation-resistant (568) pools These values were then used tosimulate a null data set 10 the size of the observed data

11

set Because the null P value distribution still did not fit theobserved P value distribution particularly well (supplemen-tary fig S2 Supplementary Material online) these values werenot used for a standard FDR correction Instead a P valuethreshold was calculated based on the lowest 005 of thenull P values similar in approach to Orozco-terWengel et al(2012)

Inversion Analysis

In D melanogaster several cosmopolitan inversion polymor-phisms show cross-continent latitudinal patterns that explaina substantial proportion of clinal variation for many traitsincluding thermal tolerance (reviewed in Hoffmann andWeeks 2007) Patterns of disequilibria generated betweenloci in the vicinity of the rearrangement can obscure signalsof selection (Hoffmann and Weeks 2007) and inversion fre-quencies are routinely considered in genomic studies ofnatural populations sampled from known latitudinal clines(Fabian et al 2012 Kapun et al 2014) We assessedassociations between inversion frequencies and our controland resistant pools at SNPs diagnostic for the following in-versions In3R(Payne) (Anderson et al 2005 Kapun et al2014) In(2L)t (Andolfatto et al 1999 Kapun et al 2014)In(2R)Ns In(3L)P In(3R)C In(3R)K and In(3R)Mo (Kapunet al 2014) Between one and five diagnostic SNPs were in-vestigated for each inversion (depending on availability) Weconsidered an inversion to be associated with desiccationresistance if the frequency of its diagnostic SNP differed sig-nificantly between the control and desiccation-resistant poolsby a Fisherrsquos exact test

Genome Feature Annotation

The list of candidate SNPs was converted to vcf format using acustom R script setting the alternate allele as the allele thatincreased in frequency from the control to the desiccation-resistant pool and setting the reference allele as the otherallele present in cases where both the major and minor allelesdiffered from the reference genome The genomic features inwhich the candidate SNPs occurred were annotated usingsnpEff v40 first creating a database from the D melanogasterv553 genome annotation with the default approach(Cingolani et al 2012) The default annotation settings wereused except for setting the parameter ldquo-ud 0rdquo to annotateSNPs to genes only if they were within the gene body Using2 tests on SNP counts we tested for over- or under-enrichment of feature types by calculating the proportionof lsquoallrsquo features from all SNPs detected in the study and theproportion of ldquocandidaterdquo SNPs identified as differentiated indesiccation-resistant flies (Fabian et al 2012)

GO Analysis

GO analyses was used to associate the candidates with theirbiological functions (Ashburner et al 2000) Previously devel-oped for biological pathway analysis of global gene expressionstudies (Wang et al 2010) typical GO analysis assumes that allgenes are sampled independently with equal probabilityGWAS violate these criteria where SNPs are more likely to

occur in longer genes than in shorter genes resulting in highertype I error rates in GO categories overpopulated with longgenes Gene length as well as overlapping gene biases (Wanget al 2011) were accounted for using Gowinda (Kofler andSchleurootterer 2012) which implements a permutation strategyto reduce false positives Gowinda does not reconstruct LDbetween SNPs but operates in two modes underpinned bytwo extreme assumptions 1) Gene mode (assumes all SNPs ina gene are in LD) and 2) SNP mode (assumes all SNPs in agene are completely independent)

The candidates were examined in both modes but giventhe stringency of gene mode high-resolution SNP modeoutput was utilized for the final analysis with the followingparameters 106 simulations minimum significance frac141 andminimum genes per GO categoryfrac14 3 (to exclude gene-poorcategories) P values were corrected using the Gowinda FDRalgorithm GO annotations from Flybase r557 were applied

Candidate Gene Analysis

In conjunction with the GO analysis we also examinedknown candidate genes associated with survival under lowRH to better characterize possible biological functions of ourGWAS candidates We took a conservative approach to can-didate gene assignment where the SNPs were mapped withinthe entire gene region without up- and downstream exten-sions Genes were curated from FlyBase and the literature(table 1) predominantly using detailed functional experimen-tal evidence as criteria for functional desiccation candidates

Network Analysis

We next obtained a higher level summary of the candidategene list using PPI network analysis The full candidate genelist was used to construct a first-order interaction networkusing all Drosophila PPIs listed in the Drosophila InteractionDatabase v 2014_10 (avoiding interologs) (Murali et al 2011)This ldquofullrdquo PPI network includes manually curated data frompublished literature and experimental data for 9633 proteinsand 93799 interactions For the network construction theigraph R package was used to build a subgraph from the fullPPI network containing the candidate proteins and their first-order interacting proteins (Csardi and Nepusz 2006) Self-connections were removed Functional enrichment analysiswas conducted on the list of network genes using FlyMinev400 (wwwflymineorg last accessed January 13 2016) withthe following parameters for GO enrichment and KEGGReactome pathway enrichment BenjaminindashHochberg FDRcorrection P lt 005 background frac14 all D melanogastergenes in our full PPI network and Ontology (for GO enrich-ment only) frac14 biological process or molecular function

Cross-Study Comparison Overlap with Our PreviousWork

Following on from the comparison between our gene list andthe candidate genes suggested from the literature we quan-titatively evaluated the degree of overlap between the candi-date genes identified in this study with alleles mapped in ourprevious study of flies collected from the same geographic

12

region (Telonis-Scott et al 2012) The ldquooverlaprdquo analysis wasperformed similarly for the GWAS candidate gene and net-work-level approaches First the gene list overlap was testedbetween the two studies using Fisherrsquos exact test where the382 genes identified from the candidate SNPs (this study)were compared with 416 genes identified from candidatesingle feature polymorphisms (Telonis-Scott et al 2012)The contingency table was constructed using the numberof annotated genes in the D melanogaster R553 genomebuilt as an approximation for the total number of genestested in each study and took the following form

where A frac14 gene list from Telonis-Scott et al (2012)n frac14 416 B frac14 gene list from this study n frac14 382 N frac14 totalnumber of genes in D melanogaster reference genome r553n frac14 17106

To annotate genome features in the overlap candidategene list we used the SNP data (this study) which betterresolves alleles than array-based genotyping Due to this lim-itation we did not compare exact genomic coordinates be-tween the studies but considered overlap to occur whendifferentiated alleles were detected in the same gene inboth studies We did however examine the allele frequenciesaround the Dys-RF promoter which was sequenced sepa-rately revealing a selective sweep in the experimental evolu-tion lines (Telonis-Scott et al 2012)

The overlapping candidates were tested for functional en-richment using a basic gene list approach As genes ratherthan SNPs were analyzed here Fly Mine v400 (wwwflymineorg) was applied with the following parameters Gene lengthnormalization BenjaminindashHochberg FDR correction Plt 005Ontology frac14 biological process and default background (allGO-annotated D melanogaster genes)

Finally we constructed a second PPI network from thecandidate genes mapped in Telonis-Scott et al (2012) to ex-amine system-level overlap between the two studies The 416genes mapped to 337 protein seeds and the network wasconstructed as described above The degree of network over-lap between the two studies was tested using simulation Ineach iteration a gene set of the same length as the list fromthis study and a gene set of the same length as the list fromTelonis-Scott et al (2012) were resampled randomly from theentire D melanogaster mapped gene annotation (r553) Eachresampled set was then used to build a first-order network aspreviously described The overlap network between the tworesampled sets was calculated using the graphintersectioncommand from the igraph package and zero-degree nodes(proteins with no interactions) were removed Overlap wasquantified by counting the number of nodes and the numberof edges in this overlap network The observed overlap wascompared with a simulated null distribution for both nodeand edge number where n frac14 1000 simulation iterations

Supplementary MaterialSupplementary figures S1ndashS4 and tables S1ndashS5 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

We thank Robert Good for the fly sampling and JenniferShirriffs and Lea and Alan Rako for outstanding assistancewith laboratory culture and substantial desiccation assaysWe are grateful to Melissa Davis for advice on network build-ing and comparison approaches and to five anonymous re-viewers whose comments improved the manuscript Thiswork was supported by a DECRA fellowship DE120102575and Monash University Technology voucher scheme toMTS an ARC Future Fellowship and Discovery Grants toCMS an ARC Laureate Fellowship to AAH and a Scienceand Industry Endowment Fund grant to CMS and AAH

ReferencesAlexander JM Lomvardas S 2014 Nuclear architecture as an epigenetic

regulator of neural development and function Neuroscience26439ndash50

Anderson AR Hoffmann AA McKechnie SW Umina PA Weeks AR2005 The latitudinal cline in the In(3R)Payne inversion polymor-phism has shifted in the last 20 years in Australian Drosophilamelanogaster populations Mol Ecol 14851ndash858

Andolfatto P Wall JD Kreitman M 1999 Unusual haplotype structureat the proximal breakpoint of In(2L)t in a natural population ofDrosophila melanogaster Genetics 1531297ndash1311

Andrews S 2014 FastQC Available from httpwwwbioinformaticsbbsrcacukprojectsfastqc

Ashburner M Ball CA Blake JA Botstein D Butler H Cherry JM DavisAP Dolinski K Dwight SS Eppig JT et al 2000 Gene ontology toolfor the unification of biology The Gene Ontology Consortium NatGenet 2525ndash29

Aytes A Mitrofanova A Lefebvre C Alvarez Mariano J Castillo-MartinM Zheng T Eastham James A Gopalan A Pienta Kenneth J ShenMM et al 2014 Cross-species regulatory network analysis identifiesa synergistic interaction between FOXM1 and CENPF that drivesprostate cancer malignancy Cancer Cell 25638ndash651

Bastide H Betancourt A Nolte V Tobler R Steuroobe P Futschik ASchleurootterer C 2013 A genome-wide fine-scale map of natural pig-mentation variation in Drosophila melanogaster PLoS Genet9e1003534

Bergland AO Behrman EL OrsquoBrien KR Schmidt PS Petrov DA 2014Genomic evidence of rapid and stable adaptive oscillations overseasonal time scales in Drosophila PLoS Genet 10e1004775

Bonebrake TC Mastrandrea MD 2010 Tolerance adaptation and pre-cipitation changes complicate latitudinal patterns of climate changeimpacts Proc Natl Acad Sci U S A 10712581ndash12586

Bradley TJ 2009 Animal osmoregulation Oxford Oxford UniversityPress

Burke MK Liti G Long AD 2014 Standing genetic variation drivesrepeatable experimental evolution in outcrossing populations ofSaccharomyces cerevisiae Mol Biol Evol 313228ndash3239

Chown SL 2012 Trait-based approaches to conservation physiologyforecasting environmental change risks from the bottom upPhilos Trans R Soc B 3671615ndash1627

Chown SL Nicolson SW 2004 Insect physiological ecology mechanismsand patterns Oxford Oxford University Press

Chown SL Soslashrensen JG Terblanche JS 2011 Water loss in insectsan environmental change perspective J Insect Physiol571070ndash1084

Not in GWAS In GWAS

Not in Telonis-Scott et al (2012) N ndash intersect-AndashB A-intersect

In Telonis-Scott et al (2012) B-intersect Intersect(AB)

13

Chung H Loehlin DW Dufour HD Vaccarro K Millar JG Carroll SB2014 A single gene affects both ecological divergence and matechoice in Drosophila Science 3431148ndash1151

Cingolani P Platts A Wang LL Coon M Nguyen T Wang L Land SJ LuX Ruden DM 2012 A program for annotating and predicting theeffects of single nucleotide polymorphisms SnpEff SNPs in thegenome of Drosophila melanogaster strain w1118 iso-2 iso-3 Fly(Austin) 680ndash92

Civelek M Lusis AJ 2014 Systems genetics approaches to understandcomplex traits Nat Rev Genet 1534ndash48

Clusella-Trullas S Blackburn TM Chown SL 2011 Climatic predictors oftemperature performance curve parameters in ectotherms implycomplex responses to climate change Am Nat 177738ndash751

Csardi G Nepusz T 2006 The igraph software package for complexnetwork research InterJournal Complex Systems 1695 Availablefrom httpigraphorg

Davies SA Cabrero P Overend G Aitchison L Sebastian S Terhzaz SDow JA 2014 Cell signalling mechanisms for insect stress toleranceJ Exp Biol 217119ndash128

Davies SA Cabrero P Povsic M Johnston NR Terhzaz S Dow JA 2013Signaling by Drosophila capa neuropeptides Gen Comp Endocrinol18860ndash66

Davies SA Overend G Sebastian S Cundall M Cabrero P Dow JATerhzaz S 2012 Immune and stress response lsquocross-talkrsquo in theDrosophila Malpighian tubule J Insect Physiol 58488ndash497

Day JP Dow JA Houslay MD Davies SA 2005 Cyclic nucleotide phos-phodiesterases in Drosophila melanogaster Biochem J 388333ndash342

de Nadal E Ammerer G Posas F 2011 Controlling gene expression inresponse to stress Nat Rev Genet 12833ndash845

DePristo MA Banks E Poplin R Garimella KV Maguire JR Hartl CPhilippakis AA del Angel G Rivas MA Hanna M et al 2011 Aframework for variation discovery and genotyping using next-gen-eration DNA sequencing data Nat Genet 43491ndash498

Djawdan M Chippindale AK Rose MR Bradley TJ 1998 Metabolic re-serves and evolved stress resistance in Drosophila melanogasterPhysiol Zool 71584ndash594

Durham MF Magwire MM Stone EA Leips J 2014 Genome-wide anal-ysis in Drosophila reveals age-specific effects of SNPs on fitness traitsNat Commun 54338

Emilsson V Thorleifsson G Zhang B Leonardson AS Zink F Zhu JCarlson S Helgason A Walters GB Gunnarsdottir S et al 2008Genetics of gene expression and its effect on disease Nature452423ndash428

Fabian DK Kapun M Nolte V Kofler R Schmidt PS Schleurootterer C FlattT 2012 Genome-wide patterns of latitudinal differentiation amongpopulations of Drosophila melanogaster from North America MolEcol 214748ndash4769

Foley BR Telonis-Scott M 2011 Quantitative genetic analysis suggestscausal association between cuticular hydrocarbon composition anddesiccation survival in Drosophila melanogaster Heredity 10668ndash77

Folk DG Han C Bradley TJ 2001 Water acquisition and partitioning inDrosophila melanogaster effects of selection for desiccation-resis-tance J Exp Biol 2043323ndash3331

Fowler MA Montell C 2013 Drosophila TRP channels and animal be-havior Life Sci 92394ndash403

Fu J Keurentjes JJB Bouwmeester H America T Verstappen FWA WardJL Beale MH de Vos RCH Dijkstra M Scheltema RA et al 2009System-wide molecular evidence for phenotypic buffering inArabidopsis Nat Genet 41166ndash167

Futschik A Schleurootterer C 2010 The next generation of molecular mar-kers from massively parallel sequencing of pooled DNA samplesGenetics 186207ndash218

Gibbs AG Fukuzato F Matzkin LM 2003 Evolution of water conserva-tion mechanisms in Drosophila J Exp Biol 2061183ndash1192

Gibbs AG Matzkin LM 2001 Evolution of water balance in the genusDrosophila J Exp Biol 2042331ndash2338

Goodstadt L 2010 Ruffus a lightweight Python library for computa-tional pipelines Bioinformatics 262778ndash2779

Hadley NF 1994 Water relations of terrestrial arthropods San Diego(CA) Academic Press

Hoffmann AA Hallas R Sinclair C Mitrovski P 2001 Levels of variationin stress resistance in Drosophila among strains local populationsand geographic regions patterns for desiccation starvation coldresistance and associated traits Evolution 551621ndash1630

Hoffmann AA Parsons PA 1989a An integrated approach to environ-mental stress tolerance and life-history variation desiccation toler-ance in Drosophila Biol J Linn Soc 37117ndash136

Hoffmann AA Parsons PA 1989b Selection for increased desiccationresistance in Drosophila melanogaster additive genetic control andcorrelated responses for other stresses Genetics 122837ndash845

Hoffmann AA Sgro CM 2011 Climate change and evolutionary adap-tation Nature 470479ndash485

Hoffmann AA Weeks AR 2007 Climatic selection on genes and traitsafter a 100 year-old invasion a critical look at the temperate-tropicalclines in Drosophila melanogaster from eastern Australia Genetica129133ndash147

Huang W Richards S Carbone MA Zhu D Anholt RR Ayroles JFDuncan L Jordan KW Lawrence F Magwire MM et al 2012Epistasis dominates the genetic architecture of Drosophila quanti-tative traits Proc Natl Acad Sci U S A 10915553ndash15559

Huang Z Tunnacliffe A 2004 Response of human cells to desiccationcomparison with hyperosmotic stress response J Physiol558181ndash191

Huang Z Tunnacliffe A 2005 Gene induction by desiccation stress inhuman cell cultures FEBS Lett 5794973ndash4977

Ichihashi Y Aguilar-Martınez JA Farhi M Chitwood DH Kumar RMillon LV Peng J Maloof JN Sinha NR 2014 Evolutionary develop-mental transcriptomics reveals a gene network module regulatinginterspecific diversity in plant leaf shape Proc Natl Acad Sci U S A111E2616ndashE2621

Itoh M Nanba N Hasegawa M Inomata N Kondo R Oshima MTakano-Shimizu T 2010 Seasonal changes in the long-distance link-age disequilibrium in Drosophila melanogaster J Hered 10126ndash32

Johnson WA Carder JW 2012 Drosophila nociceptors mediate larvalaversion to dry surface environments utilizing both the painless TRPchannel and the DEGENaC subunit PPK1 PLoS One 7e32878

Kahsai L Kapan N Dircksen H Winther AM Nassel DR 2010 Metabolicstress responses in Drosophila are modulated by brain neurosecre-tory cells that produce multiple neuropeptides PLoS One 5e11480

Kapun M van Schalkwyk H McAllister B Flatt T Schleurootterer C 2014Inference of chromosomal inversion dynamics from Pool-Seq datain natural and laboratory populations of Drosophila melanogasterMol Ecol 231813ndash1827

Kawano T Shimoda M Matsumoto H Ryuda M Tsuzuki S Hayakawa Y2010 Identification of a gene Desiccate contributing to desicca-tion resistance in Drosophila melanogaster J Biol Chem28538889ndash38897

Kellermann V van Heerwaarden B Sgro CM Hoffmann AA 2009Fundamental evolutionary limits in ecological traits driveDrosophila species distributions Science 3251244ndash1246

Kofler R Nolte V Schleurootterer C 2016 The impact of library preparationprotocols on the consistency of allele frequency estimates in Pool-Seq data Mol Ecol Resour 16118ndash122

Kofler R Pandey RV Schleurootterer C 2011 PoPoolation2 identifying dif-ferentiation between populations using sequencing of pooled DNAsamples (Pool-Seq) Bioinformatics 273435ndash3436

Kofler R Schleurootterer C 2012 Gowinda unbiased analysis of gene setenrichment for genome-wide association studies Bioinformatics282084ndash2085

Leiserson MDM Eldridge JV Ramachandran S Raphael BJ 2013Network analysis of GWAS data Curr Opin Genet Dev 23602ndash610

Li H Durbin R 2009 Fast and accurate short read alignment withBurrows-Wheeler transform Bioinformatics 251754ndash1760

Li H Handsaker B Wysoker A Fennell T Ruan J Homer N Marth GAbecasis G Durbin R 2009 The Sequence AlignmentMap formatand SAMtools Bioinformatics 252078ndash2079

14

Liu Y Luo J Carlsson MA Neuroassel DR 2015 Serotonin and insulin-likepeptides modulate leucokinin-producing neurons that affectfeeding and water homeostasis in Drosophila J Comp Neurol5231840ndash1863

Lohse M Bolger AM Nagel A Fernie AR Lunn JE Stitt M Usadel B 2012RobiNA a user-friendly integrated software solution for RNA-Seq-based transcriptomics Nucleic Acids Res 40W622ndashW627

Long A Liti G Luptak A Tenaillon O 2015 Elucidating the moleculararchitecture of adaptation via evolve and resequence experimentsNat Rev Genet 16567ndash582

Lynch M Bost D Wilson S Maruki T Harrison S 2014 Population-genetic inference from pooled-sequencing data Genome Biol Evol61210ndash1218

Mackay TFC Stone EA Ayroles JF 2009 The genetics of quantitativetraits challenges and prospects Nat Rev Genet 10565ndash577

Marron MT Markow TA Kain KJ Gibbs AG 2003 Effects of starvationand desiccation on energy metabolism in desert and mesicDrosophila J Insect Physiol 49261ndash270

Matzkin LM Markow TA 2009 Transcriptional regulation of metabo-lism associated with the increased desiccation resistance of thecactophilic Drosophila mojavensis Genetics 1821279ndash1288

Murali T Pacifico S Yu J Guest S Roberts GG Finley RL Jr 2011 DroID2011 a comprehensive integrated resource for protein transcrip-tion factor RNA and gene interactions for Drosophila Nucleic AcidsRes 39D736ndashD743

Nuzhdin SV Turner TL 2013 Promises and limitations of hitchhikingmapping Curr Opin Genet Dev 23694ndash699

Orozco-terWengel P Kapun M Nolte V Kofler R Flatt T Schleurootterer C2012 Adaptation of Drosophila to a novel laboratory environmentreveals temporally heterogeneous trajectories of selected alleles MolEcol 214931ndash4941

Picard Picard A set of Java command line tools for manipulating high-throughput seqeuncing (HTS) data and formats Available fromhttpbroadinstitutegithubiopicard

Qiu Y Tittiger C Wicker-Thomas C Le Goff G Young S Wajnberg EFricaux T Taquet N Blomquist GJ Feyereisen R 2012 An insect-specific P450 oxidative decarbonylase for cuticular hydrocarbon bio-synthesis Proc Natl Acad Sci U S A 10914858ndash14863

Rajpurohit S Oliveira CC Etges WJ Gibbs AG 2013 Functional genomicand phenotypic responses to desiccation in natural populations of adesert Drosophilid Mol Ecol 222698ndash2715

Sarup P Soslashrensen JG Kristensen TN Hoffmann AA Loeschcke V PaigeKN Soslashrensen P 2011 Candidate genes detected in transcriptomestudies are strongly dependent on genetic background PLoS One6e15644

Schiffer M Hangartner S Hoffmann AA 2013 Assessing the relativeimportance of environmental effects carry-over effects and speciesdifferences in thermal stress resistance a comparison ofDrosophilids across field and laboratory generations J Exp Biol2163790ndash3798

Schleurootterer C Kofler R Versace E Tobler R Franssen SU 2015Combining experimental evolution with next-generation sequen-cing A powerful tool to study adaptation from standing geneticvariation Heredity 114431ndash440

Sinclair BJ Gibbs AG Roberts SP 2007 Gene transcription during ex-posure to and recovery from cold and desiccation stress inDrosophila melanogaster Insect Mol Biol 16435ndash443

Sloggett C Wakefield M Philip G Pope B 2014 Rubramdashflexible distrib-uted pipelines figshare Available from httpdxdoiorg106084m9figshare895626

Smit AFA Hubley R Green P 2013-2015 RepeatMasker Open-403Available from httpwwwrepeatmaskerorg

Seurooderberg JA Birse RT Neuroassel DR 2011 Insulin production and signalingin renal tubules of Drosophila is under control of tachykinin-relatedpeptide and regulates stress resistance PLoS One 6e19866

Soslashrensen JG Nielsen MM Loeschcke V 2007 Gene expression profileanalysis of Drosophila melanogaster selected for resistance to envi-ronmental stressors J Evol Biol 201624ndash1636

St Pierre SE Ponting L Stefancsik R McQuilton P 2014 FlyBase102mdashadvanced approaches to interrogating FlyBase Nucleic AcidsRes 42D780ndashD788

Stevenson M Nunes T Heuser C Marshall J Sanchez J Thornton RReiczigel J Robison-Cox J Sebastiani P Solymos P et al 2015 epiRTools for the analysis of epidemiological data R package version09-62 Available from httpCRANR-projectorgpackagefrac14epiR

Storey JD Tibshirani R 2003 Statistical significance for genomewidestudies Proc Natl Acad Sci U S A 1009440ndash9445

Telonis-Scott M Gane M DeGaris S Sgro CM Hoffmann AA 2012 Highresolution mapping of candidate alleles for desiccation resistancein Drosophila melanogaster under selection Mol Biol Evol291335ndash1351

Telonis-Scott M Guthridge KM Hoffmann AA 2006 A new set oflaboratory-selected Drosophila melanogaster lines for the analysisof desiccation resistance response to selection physiology and cor-related responses J Exp Biol 2091837ndash1847

Terhzaz S Cabrero P Robben JH Radford JC Hudson BD Milligan GDow JA Davies SA 2012 Mechanism and function of Drosophilacapa GPCR a desiccation stress-responsive receptor with functionalhomology to human neuromedinU receptor PLoS One 7e29897

Terhzaz S Overend G Sebastian S Dow JA Davies SA 2014 The Dmelanogaster capa-1 neuropeptide activates renal NF-kB signalingPeptides 53218ndash224

Terhzaz S Teets NM Cabrero P Henderson L Ritchie MG Nachman RJDow JA Denlinger DL Davies SA 2015 Insect capa neuropeptidesimpact desiccation and cold tolerance Proc Natl Acad Sci U S A1122882ndash2887

Thorat LJ Gaikwad SM Nath BB 2012 Trehalose as an indicator ofdesiccation stress in Drosophila melanogaster larvae a potentialmarker of anhydrobiosis Biochem Biophys Res Commun419638ndash642

Tobler R Franssen SU Kofler R Orozco-terWengel P Nolte VHermisson J Schl eurootterer C 2014 Massive habitat-specific ge-nomic response in D melanogaster populations during exper-imental evolution in hot and cold environments Mol Biol Evol31364ndash375

Turner TL Bourne EC Von Wettberg EJ Hu TT Nuzhdin SV 2010Population resequencing reveals local adaptation of Arabidopsislyrata to serpentine soils Nat Genet 42260ndash263

Turner TL Miller PM 2012 Investigating natural variation in Drosophilacourtship song by the evolve and resequence approach Genetics191633ndash642

Vermeulen CJ Soslashrensen P Kirilova Gagalova K Loeschcke V 2013Transcriptomic analysis of inbreeding depression in cold-sensitiveDrosophila melanogaster shows upregulation of the immune re-sponse J Evol Biol 261890ndash1902

Wang B Goode J Best J Meltzer J Schilman PE Chen J Garza D ThomasJB Montminy M 2008 The insulin-regulated CREB coactivatorTORC promotes stress resistance in Drosophila Cell Metab7434ndash444

Wang J Kean L Yang J Allan AK Davies SA Herzyk P Dow JA 2004Function-informed transcriptome analysis of Drosophila renaltubule Genome Biol 5R69

Wang K Li M Hakonarson H 2010 Analysing biological pathways ingenome-wide association studies Nat Rev Genet 11843ndash854

Wang L Jia P Wolfinger RD Chen X Zhao Z 2011 Gene set analysis ofgenome-wide association studies methodological issues and per-spectives Genomics 981ndash8

Weber AL Khan GF Magwire MM Tabor CL Mackay TF Anholt RR2012 Genome-wide association analysis of oxidative stress resistancein Drosophila melanogaster PLoS One 7e34745

Weeks AR McKechnie SW Hoffmann AA 2002 Dissecting adaptiveclinal variation markers inversions and sizestress associations inDrosophila melanogaster from a central field population Ecol Lett5756ndash763

Zhu Y Bergland AO Gonzalez J Petrov DA 2012 Empirical validation ofpooled whole genome population re-sequencing in Drosophila mel-anogaster PLoS One 7e41901

15

Page 12: Cross-Study Comparison Reveals Common Genomic, Network, … · 2016-02-06 · resistance using a high-throughput sequencing Pool-GWAS (genome-wide association study) approach ( Bastide

set Because the null P value distribution still did not fit theobserved P value distribution particularly well (supplemen-tary fig S2 Supplementary Material online) these values werenot used for a standard FDR correction Instead a P valuethreshold was calculated based on the lowest 005 of thenull P values similar in approach to Orozco-terWengel et al(2012)

Inversion Analysis

In D melanogaster several cosmopolitan inversion polymor-phisms show cross-continent latitudinal patterns that explaina substantial proportion of clinal variation for many traitsincluding thermal tolerance (reviewed in Hoffmann andWeeks 2007) Patterns of disequilibria generated betweenloci in the vicinity of the rearrangement can obscure signalsof selection (Hoffmann and Weeks 2007) and inversion fre-quencies are routinely considered in genomic studies ofnatural populations sampled from known latitudinal clines(Fabian et al 2012 Kapun et al 2014) We assessedassociations between inversion frequencies and our controland resistant pools at SNPs diagnostic for the following in-versions In3R(Payne) (Anderson et al 2005 Kapun et al2014) In(2L)t (Andolfatto et al 1999 Kapun et al 2014)In(2R)Ns In(3L)P In(3R)C In(3R)K and In(3R)Mo (Kapunet al 2014) Between one and five diagnostic SNPs were in-vestigated for each inversion (depending on availability) Weconsidered an inversion to be associated with desiccationresistance if the frequency of its diagnostic SNP differed sig-nificantly between the control and desiccation-resistant poolsby a Fisherrsquos exact test

Genome Feature Annotation

The list of candidate SNPs was converted to vcf format using acustom R script setting the alternate allele as the allele thatincreased in frequency from the control to the desiccation-resistant pool and setting the reference allele as the otherallele present in cases where both the major and minor allelesdiffered from the reference genome The genomic features inwhich the candidate SNPs occurred were annotated usingsnpEff v40 first creating a database from the D melanogasterv553 genome annotation with the default approach(Cingolani et al 2012) The default annotation settings wereused except for setting the parameter ldquo-ud 0rdquo to annotateSNPs to genes only if they were within the gene body Using2 tests on SNP counts we tested for over- or under-enrichment of feature types by calculating the proportionof lsquoallrsquo features from all SNPs detected in the study and theproportion of ldquocandidaterdquo SNPs identified as differentiated indesiccation-resistant flies (Fabian et al 2012)

GO Analysis

GO analyses was used to associate the candidates with theirbiological functions (Ashburner et al 2000) Previously devel-oped for biological pathway analysis of global gene expressionstudies (Wang et al 2010) typical GO analysis assumes that allgenes are sampled independently with equal probabilityGWAS violate these criteria where SNPs are more likely to

occur in longer genes than in shorter genes resulting in highertype I error rates in GO categories overpopulated with longgenes Gene length as well as overlapping gene biases (Wanget al 2011) were accounted for using Gowinda (Kofler andSchleurootterer 2012) which implements a permutation strategyto reduce false positives Gowinda does not reconstruct LDbetween SNPs but operates in two modes underpinned bytwo extreme assumptions 1) Gene mode (assumes all SNPs ina gene are in LD) and 2) SNP mode (assumes all SNPs in agene are completely independent)

The candidates were examined in both modes but giventhe stringency of gene mode high-resolution SNP modeoutput was utilized for the final analysis with the followingparameters 106 simulations minimum significance frac141 andminimum genes per GO categoryfrac14 3 (to exclude gene-poorcategories) P values were corrected using the Gowinda FDRalgorithm GO annotations from Flybase r557 were applied

Candidate Gene Analysis

In conjunction with the GO analysis we also examinedknown candidate genes associated with survival under lowRH to better characterize possible biological functions of ourGWAS candidates We took a conservative approach to can-didate gene assignment where the SNPs were mapped withinthe entire gene region without up- and downstream exten-sions Genes were curated from FlyBase and the literature(table 1) predominantly using detailed functional experimen-tal evidence as criteria for functional desiccation candidates

Network Analysis

We next obtained a higher level summary of the candidategene list using PPI network analysis The full candidate genelist was used to construct a first-order interaction networkusing all Drosophila PPIs listed in the Drosophila InteractionDatabase v 2014_10 (avoiding interologs) (Murali et al 2011)This ldquofullrdquo PPI network includes manually curated data frompublished literature and experimental data for 9633 proteinsand 93799 interactions For the network construction theigraph R package was used to build a subgraph from the fullPPI network containing the candidate proteins and their first-order interacting proteins (Csardi and Nepusz 2006) Self-connections were removed Functional enrichment analysiswas conducted on the list of network genes using FlyMinev400 (wwwflymineorg last accessed January 13 2016) withthe following parameters for GO enrichment and KEGGReactome pathway enrichment BenjaminindashHochberg FDRcorrection P lt 005 background frac14 all D melanogastergenes in our full PPI network and Ontology (for GO enrich-ment only) frac14 biological process or molecular function

Cross-Study Comparison Overlap with Our PreviousWork

Following on from the comparison between our gene list andthe candidate genes suggested from the literature we quan-titatively evaluated the degree of overlap between the candi-date genes identified in this study with alleles mapped in ourprevious study of flies collected from the same geographic

12

region (Telonis-Scott et al 2012) The ldquooverlaprdquo analysis wasperformed similarly for the GWAS candidate gene and net-work-level approaches First the gene list overlap was testedbetween the two studies using Fisherrsquos exact test where the382 genes identified from the candidate SNPs (this study)were compared with 416 genes identified from candidatesingle feature polymorphisms (Telonis-Scott et al 2012)The contingency table was constructed using the numberof annotated genes in the D melanogaster R553 genomebuilt as an approximation for the total number of genestested in each study and took the following form

where A frac14 gene list from Telonis-Scott et al (2012)n frac14 416 B frac14 gene list from this study n frac14 382 N frac14 totalnumber of genes in D melanogaster reference genome r553n frac14 17106

To annotate genome features in the overlap candidategene list we used the SNP data (this study) which betterresolves alleles than array-based genotyping Due to this lim-itation we did not compare exact genomic coordinates be-tween the studies but considered overlap to occur whendifferentiated alleles were detected in the same gene inboth studies We did however examine the allele frequenciesaround the Dys-RF promoter which was sequenced sepa-rately revealing a selective sweep in the experimental evolu-tion lines (Telonis-Scott et al 2012)

The overlapping candidates were tested for functional en-richment using a basic gene list approach As genes ratherthan SNPs were analyzed here Fly Mine v400 (wwwflymineorg) was applied with the following parameters Gene lengthnormalization BenjaminindashHochberg FDR correction Plt 005Ontology frac14 biological process and default background (allGO-annotated D melanogaster genes)

Finally we constructed a second PPI network from thecandidate genes mapped in Telonis-Scott et al (2012) to ex-amine system-level overlap between the two studies The 416genes mapped to 337 protein seeds and the network wasconstructed as described above The degree of network over-lap between the two studies was tested using simulation Ineach iteration a gene set of the same length as the list fromthis study and a gene set of the same length as the list fromTelonis-Scott et al (2012) were resampled randomly from theentire D melanogaster mapped gene annotation (r553) Eachresampled set was then used to build a first-order network aspreviously described The overlap network between the tworesampled sets was calculated using the graphintersectioncommand from the igraph package and zero-degree nodes(proteins with no interactions) were removed Overlap wasquantified by counting the number of nodes and the numberof edges in this overlap network The observed overlap wascompared with a simulated null distribution for both nodeand edge number where n frac14 1000 simulation iterations

Supplementary MaterialSupplementary figures S1ndashS4 and tables S1ndashS5 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

We thank Robert Good for the fly sampling and JenniferShirriffs and Lea and Alan Rako for outstanding assistancewith laboratory culture and substantial desiccation assaysWe are grateful to Melissa Davis for advice on network build-ing and comparison approaches and to five anonymous re-viewers whose comments improved the manuscript Thiswork was supported by a DECRA fellowship DE120102575and Monash University Technology voucher scheme toMTS an ARC Future Fellowship and Discovery Grants toCMS an ARC Laureate Fellowship to AAH and a Scienceand Industry Endowment Fund grant to CMS and AAH

ReferencesAlexander JM Lomvardas S 2014 Nuclear architecture as an epigenetic

regulator of neural development and function Neuroscience26439ndash50

Anderson AR Hoffmann AA McKechnie SW Umina PA Weeks AR2005 The latitudinal cline in the In(3R)Payne inversion polymor-phism has shifted in the last 20 years in Australian Drosophilamelanogaster populations Mol Ecol 14851ndash858

Andolfatto P Wall JD Kreitman M 1999 Unusual haplotype structureat the proximal breakpoint of In(2L)t in a natural population ofDrosophila melanogaster Genetics 1531297ndash1311

Andrews S 2014 FastQC Available from httpwwwbioinformaticsbbsrcacukprojectsfastqc

Ashburner M Ball CA Blake JA Botstein D Butler H Cherry JM DavisAP Dolinski K Dwight SS Eppig JT et al 2000 Gene ontology toolfor the unification of biology The Gene Ontology Consortium NatGenet 2525ndash29

Aytes A Mitrofanova A Lefebvre C Alvarez Mariano J Castillo-MartinM Zheng T Eastham James A Gopalan A Pienta Kenneth J ShenMM et al 2014 Cross-species regulatory network analysis identifiesa synergistic interaction between FOXM1 and CENPF that drivesprostate cancer malignancy Cancer Cell 25638ndash651

Bastide H Betancourt A Nolte V Tobler R Steuroobe P Futschik ASchleurootterer C 2013 A genome-wide fine-scale map of natural pig-mentation variation in Drosophila melanogaster PLoS Genet9e1003534

Bergland AO Behrman EL OrsquoBrien KR Schmidt PS Petrov DA 2014Genomic evidence of rapid and stable adaptive oscillations overseasonal time scales in Drosophila PLoS Genet 10e1004775

Bonebrake TC Mastrandrea MD 2010 Tolerance adaptation and pre-cipitation changes complicate latitudinal patterns of climate changeimpacts Proc Natl Acad Sci U S A 10712581ndash12586

Bradley TJ 2009 Animal osmoregulation Oxford Oxford UniversityPress

Burke MK Liti G Long AD 2014 Standing genetic variation drivesrepeatable experimental evolution in outcrossing populations ofSaccharomyces cerevisiae Mol Biol Evol 313228ndash3239

Chown SL 2012 Trait-based approaches to conservation physiologyforecasting environmental change risks from the bottom upPhilos Trans R Soc B 3671615ndash1627

Chown SL Nicolson SW 2004 Insect physiological ecology mechanismsand patterns Oxford Oxford University Press

Chown SL Soslashrensen JG Terblanche JS 2011 Water loss in insectsan environmental change perspective J Insect Physiol571070ndash1084

Not in GWAS In GWAS

Not in Telonis-Scott et al (2012) N ndash intersect-AndashB A-intersect

In Telonis-Scott et al (2012) B-intersect Intersect(AB)

13

Chung H Loehlin DW Dufour HD Vaccarro K Millar JG Carroll SB2014 A single gene affects both ecological divergence and matechoice in Drosophila Science 3431148ndash1151

Cingolani P Platts A Wang LL Coon M Nguyen T Wang L Land SJ LuX Ruden DM 2012 A program for annotating and predicting theeffects of single nucleotide polymorphisms SnpEff SNPs in thegenome of Drosophila melanogaster strain w1118 iso-2 iso-3 Fly(Austin) 680ndash92

Civelek M Lusis AJ 2014 Systems genetics approaches to understandcomplex traits Nat Rev Genet 1534ndash48

Clusella-Trullas S Blackburn TM Chown SL 2011 Climatic predictors oftemperature performance curve parameters in ectotherms implycomplex responses to climate change Am Nat 177738ndash751

Csardi G Nepusz T 2006 The igraph software package for complexnetwork research InterJournal Complex Systems 1695 Availablefrom httpigraphorg

Davies SA Cabrero P Overend G Aitchison L Sebastian S Terhzaz SDow JA 2014 Cell signalling mechanisms for insect stress toleranceJ Exp Biol 217119ndash128

Davies SA Cabrero P Povsic M Johnston NR Terhzaz S Dow JA 2013Signaling by Drosophila capa neuropeptides Gen Comp Endocrinol18860ndash66

Davies SA Overend G Sebastian S Cundall M Cabrero P Dow JATerhzaz S 2012 Immune and stress response lsquocross-talkrsquo in theDrosophila Malpighian tubule J Insect Physiol 58488ndash497

Day JP Dow JA Houslay MD Davies SA 2005 Cyclic nucleotide phos-phodiesterases in Drosophila melanogaster Biochem J 388333ndash342

de Nadal E Ammerer G Posas F 2011 Controlling gene expression inresponse to stress Nat Rev Genet 12833ndash845

DePristo MA Banks E Poplin R Garimella KV Maguire JR Hartl CPhilippakis AA del Angel G Rivas MA Hanna M et al 2011 Aframework for variation discovery and genotyping using next-gen-eration DNA sequencing data Nat Genet 43491ndash498

Djawdan M Chippindale AK Rose MR Bradley TJ 1998 Metabolic re-serves and evolved stress resistance in Drosophila melanogasterPhysiol Zool 71584ndash594

Durham MF Magwire MM Stone EA Leips J 2014 Genome-wide anal-ysis in Drosophila reveals age-specific effects of SNPs on fitness traitsNat Commun 54338

Emilsson V Thorleifsson G Zhang B Leonardson AS Zink F Zhu JCarlson S Helgason A Walters GB Gunnarsdottir S et al 2008Genetics of gene expression and its effect on disease Nature452423ndash428

Fabian DK Kapun M Nolte V Kofler R Schmidt PS Schleurootterer C FlattT 2012 Genome-wide patterns of latitudinal differentiation amongpopulations of Drosophila melanogaster from North America MolEcol 214748ndash4769

Foley BR Telonis-Scott M 2011 Quantitative genetic analysis suggestscausal association between cuticular hydrocarbon composition anddesiccation survival in Drosophila melanogaster Heredity 10668ndash77

Folk DG Han C Bradley TJ 2001 Water acquisition and partitioning inDrosophila melanogaster effects of selection for desiccation-resis-tance J Exp Biol 2043323ndash3331

Fowler MA Montell C 2013 Drosophila TRP channels and animal be-havior Life Sci 92394ndash403

Fu J Keurentjes JJB Bouwmeester H America T Verstappen FWA WardJL Beale MH de Vos RCH Dijkstra M Scheltema RA et al 2009System-wide molecular evidence for phenotypic buffering inArabidopsis Nat Genet 41166ndash167

Futschik A Schleurootterer C 2010 The next generation of molecular mar-kers from massively parallel sequencing of pooled DNA samplesGenetics 186207ndash218

Gibbs AG Fukuzato F Matzkin LM 2003 Evolution of water conserva-tion mechanisms in Drosophila J Exp Biol 2061183ndash1192

Gibbs AG Matzkin LM 2001 Evolution of water balance in the genusDrosophila J Exp Biol 2042331ndash2338

Goodstadt L 2010 Ruffus a lightweight Python library for computa-tional pipelines Bioinformatics 262778ndash2779

Hadley NF 1994 Water relations of terrestrial arthropods San Diego(CA) Academic Press

Hoffmann AA Hallas R Sinclair C Mitrovski P 2001 Levels of variationin stress resistance in Drosophila among strains local populationsand geographic regions patterns for desiccation starvation coldresistance and associated traits Evolution 551621ndash1630

Hoffmann AA Parsons PA 1989a An integrated approach to environ-mental stress tolerance and life-history variation desiccation toler-ance in Drosophila Biol J Linn Soc 37117ndash136

Hoffmann AA Parsons PA 1989b Selection for increased desiccationresistance in Drosophila melanogaster additive genetic control andcorrelated responses for other stresses Genetics 122837ndash845

Hoffmann AA Sgro CM 2011 Climate change and evolutionary adap-tation Nature 470479ndash485

Hoffmann AA Weeks AR 2007 Climatic selection on genes and traitsafter a 100 year-old invasion a critical look at the temperate-tropicalclines in Drosophila melanogaster from eastern Australia Genetica129133ndash147

Huang W Richards S Carbone MA Zhu D Anholt RR Ayroles JFDuncan L Jordan KW Lawrence F Magwire MM et al 2012Epistasis dominates the genetic architecture of Drosophila quanti-tative traits Proc Natl Acad Sci U S A 10915553ndash15559

Huang Z Tunnacliffe A 2004 Response of human cells to desiccationcomparison with hyperosmotic stress response J Physiol558181ndash191

Huang Z Tunnacliffe A 2005 Gene induction by desiccation stress inhuman cell cultures FEBS Lett 5794973ndash4977

Ichihashi Y Aguilar-Martınez JA Farhi M Chitwood DH Kumar RMillon LV Peng J Maloof JN Sinha NR 2014 Evolutionary develop-mental transcriptomics reveals a gene network module regulatinginterspecific diversity in plant leaf shape Proc Natl Acad Sci U S A111E2616ndashE2621

Itoh M Nanba N Hasegawa M Inomata N Kondo R Oshima MTakano-Shimizu T 2010 Seasonal changes in the long-distance link-age disequilibrium in Drosophila melanogaster J Hered 10126ndash32

Johnson WA Carder JW 2012 Drosophila nociceptors mediate larvalaversion to dry surface environments utilizing both the painless TRPchannel and the DEGENaC subunit PPK1 PLoS One 7e32878

Kahsai L Kapan N Dircksen H Winther AM Nassel DR 2010 Metabolicstress responses in Drosophila are modulated by brain neurosecre-tory cells that produce multiple neuropeptides PLoS One 5e11480

Kapun M van Schalkwyk H McAllister B Flatt T Schleurootterer C 2014Inference of chromosomal inversion dynamics from Pool-Seq datain natural and laboratory populations of Drosophila melanogasterMol Ecol 231813ndash1827

Kawano T Shimoda M Matsumoto H Ryuda M Tsuzuki S Hayakawa Y2010 Identification of a gene Desiccate contributing to desicca-tion resistance in Drosophila melanogaster J Biol Chem28538889ndash38897

Kellermann V van Heerwaarden B Sgro CM Hoffmann AA 2009Fundamental evolutionary limits in ecological traits driveDrosophila species distributions Science 3251244ndash1246

Kofler R Nolte V Schleurootterer C 2016 The impact of library preparationprotocols on the consistency of allele frequency estimates in Pool-Seq data Mol Ecol Resour 16118ndash122

Kofler R Pandey RV Schleurootterer C 2011 PoPoolation2 identifying dif-ferentiation between populations using sequencing of pooled DNAsamples (Pool-Seq) Bioinformatics 273435ndash3436

Kofler R Schleurootterer C 2012 Gowinda unbiased analysis of gene setenrichment for genome-wide association studies Bioinformatics282084ndash2085

Leiserson MDM Eldridge JV Ramachandran S Raphael BJ 2013Network analysis of GWAS data Curr Opin Genet Dev 23602ndash610

Li H Durbin R 2009 Fast and accurate short read alignment withBurrows-Wheeler transform Bioinformatics 251754ndash1760

Li H Handsaker B Wysoker A Fennell T Ruan J Homer N Marth GAbecasis G Durbin R 2009 The Sequence AlignmentMap formatand SAMtools Bioinformatics 252078ndash2079

14

Liu Y Luo J Carlsson MA Neuroassel DR 2015 Serotonin and insulin-likepeptides modulate leucokinin-producing neurons that affectfeeding and water homeostasis in Drosophila J Comp Neurol5231840ndash1863

Lohse M Bolger AM Nagel A Fernie AR Lunn JE Stitt M Usadel B 2012RobiNA a user-friendly integrated software solution for RNA-Seq-based transcriptomics Nucleic Acids Res 40W622ndashW627

Long A Liti G Luptak A Tenaillon O 2015 Elucidating the moleculararchitecture of adaptation via evolve and resequence experimentsNat Rev Genet 16567ndash582

Lynch M Bost D Wilson S Maruki T Harrison S 2014 Population-genetic inference from pooled-sequencing data Genome Biol Evol61210ndash1218

Mackay TFC Stone EA Ayroles JF 2009 The genetics of quantitativetraits challenges and prospects Nat Rev Genet 10565ndash577

Marron MT Markow TA Kain KJ Gibbs AG 2003 Effects of starvationand desiccation on energy metabolism in desert and mesicDrosophila J Insect Physiol 49261ndash270

Matzkin LM Markow TA 2009 Transcriptional regulation of metabo-lism associated with the increased desiccation resistance of thecactophilic Drosophila mojavensis Genetics 1821279ndash1288

Murali T Pacifico S Yu J Guest S Roberts GG Finley RL Jr 2011 DroID2011 a comprehensive integrated resource for protein transcrip-tion factor RNA and gene interactions for Drosophila Nucleic AcidsRes 39D736ndashD743

Nuzhdin SV Turner TL 2013 Promises and limitations of hitchhikingmapping Curr Opin Genet Dev 23694ndash699

Orozco-terWengel P Kapun M Nolte V Kofler R Flatt T Schleurootterer C2012 Adaptation of Drosophila to a novel laboratory environmentreveals temporally heterogeneous trajectories of selected alleles MolEcol 214931ndash4941

Picard Picard A set of Java command line tools for manipulating high-throughput seqeuncing (HTS) data and formats Available fromhttpbroadinstitutegithubiopicard

Qiu Y Tittiger C Wicker-Thomas C Le Goff G Young S Wajnberg EFricaux T Taquet N Blomquist GJ Feyereisen R 2012 An insect-specific P450 oxidative decarbonylase for cuticular hydrocarbon bio-synthesis Proc Natl Acad Sci U S A 10914858ndash14863

Rajpurohit S Oliveira CC Etges WJ Gibbs AG 2013 Functional genomicand phenotypic responses to desiccation in natural populations of adesert Drosophilid Mol Ecol 222698ndash2715

Sarup P Soslashrensen JG Kristensen TN Hoffmann AA Loeschcke V PaigeKN Soslashrensen P 2011 Candidate genes detected in transcriptomestudies are strongly dependent on genetic background PLoS One6e15644

Schiffer M Hangartner S Hoffmann AA 2013 Assessing the relativeimportance of environmental effects carry-over effects and speciesdifferences in thermal stress resistance a comparison ofDrosophilids across field and laboratory generations J Exp Biol2163790ndash3798

Schleurootterer C Kofler R Versace E Tobler R Franssen SU 2015Combining experimental evolution with next-generation sequen-cing A powerful tool to study adaptation from standing geneticvariation Heredity 114431ndash440

Sinclair BJ Gibbs AG Roberts SP 2007 Gene transcription during ex-posure to and recovery from cold and desiccation stress inDrosophila melanogaster Insect Mol Biol 16435ndash443

Sloggett C Wakefield M Philip G Pope B 2014 Rubramdashflexible distrib-uted pipelines figshare Available from httpdxdoiorg106084m9figshare895626

Smit AFA Hubley R Green P 2013-2015 RepeatMasker Open-403Available from httpwwwrepeatmaskerorg

Seurooderberg JA Birse RT Neuroassel DR 2011 Insulin production and signalingin renal tubules of Drosophila is under control of tachykinin-relatedpeptide and regulates stress resistance PLoS One 6e19866

Soslashrensen JG Nielsen MM Loeschcke V 2007 Gene expression profileanalysis of Drosophila melanogaster selected for resistance to envi-ronmental stressors J Evol Biol 201624ndash1636

St Pierre SE Ponting L Stefancsik R McQuilton P 2014 FlyBase102mdashadvanced approaches to interrogating FlyBase Nucleic AcidsRes 42D780ndashD788

Stevenson M Nunes T Heuser C Marshall J Sanchez J Thornton RReiczigel J Robison-Cox J Sebastiani P Solymos P et al 2015 epiRTools for the analysis of epidemiological data R package version09-62 Available from httpCRANR-projectorgpackagefrac14epiR

Storey JD Tibshirani R 2003 Statistical significance for genomewidestudies Proc Natl Acad Sci U S A 1009440ndash9445

Telonis-Scott M Gane M DeGaris S Sgro CM Hoffmann AA 2012 Highresolution mapping of candidate alleles for desiccation resistancein Drosophila melanogaster under selection Mol Biol Evol291335ndash1351

Telonis-Scott M Guthridge KM Hoffmann AA 2006 A new set oflaboratory-selected Drosophila melanogaster lines for the analysisof desiccation resistance response to selection physiology and cor-related responses J Exp Biol 2091837ndash1847

Terhzaz S Cabrero P Robben JH Radford JC Hudson BD Milligan GDow JA Davies SA 2012 Mechanism and function of Drosophilacapa GPCR a desiccation stress-responsive receptor with functionalhomology to human neuromedinU receptor PLoS One 7e29897

Terhzaz S Overend G Sebastian S Dow JA Davies SA 2014 The Dmelanogaster capa-1 neuropeptide activates renal NF-kB signalingPeptides 53218ndash224

Terhzaz S Teets NM Cabrero P Henderson L Ritchie MG Nachman RJDow JA Denlinger DL Davies SA 2015 Insect capa neuropeptidesimpact desiccation and cold tolerance Proc Natl Acad Sci U S A1122882ndash2887

Thorat LJ Gaikwad SM Nath BB 2012 Trehalose as an indicator ofdesiccation stress in Drosophila melanogaster larvae a potentialmarker of anhydrobiosis Biochem Biophys Res Commun419638ndash642

Tobler R Franssen SU Kofler R Orozco-terWengel P Nolte VHermisson J Schl eurootterer C 2014 Massive habitat-specific ge-nomic response in D melanogaster populations during exper-imental evolution in hot and cold environments Mol Biol Evol31364ndash375

Turner TL Bourne EC Von Wettberg EJ Hu TT Nuzhdin SV 2010Population resequencing reveals local adaptation of Arabidopsislyrata to serpentine soils Nat Genet 42260ndash263

Turner TL Miller PM 2012 Investigating natural variation in Drosophilacourtship song by the evolve and resequence approach Genetics191633ndash642

Vermeulen CJ Soslashrensen P Kirilova Gagalova K Loeschcke V 2013Transcriptomic analysis of inbreeding depression in cold-sensitiveDrosophila melanogaster shows upregulation of the immune re-sponse J Evol Biol 261890ndash1902

Wang B Goode J Best J Meltzer J Schilman PE Chen J Garza D ThomasJB Montminy M 2008 The insulin-regulated CREB coactivatorTORC promotes stress resistance in Drosophila Cell Metab7434ndash444

Wang J Kean L Yang J Allan AK Davies SA Herzyk P Dow JA 2004Function-informed transcriptome analysis of Drosophila renaltubule Genome Biol 5R69

Wang K Li M Hakonarson H 2010 Analysing biological pathways ingenome-wide association studies Nat Rev Genet 11843ndash854

Wang L Jia P Wolfinger RD Chen X Zhao Z 2011 Gene set analysis ofgenome-wide association studies methodological issues and per-spectives Genomics 981ndash8

Weber AL Khan GF Magwire MM Tabor CL Mackay TF Anholt RR2012 Genome-wide association analysis of oxidative stress resistancein Drosophila melanogaster PLoS One 7e34745

Weeks AR McKechnie SW Hoffmann AA 2002 Dissecting adaptiveclinal variation markers inversions and sizestress associations inDrosophila melanogaster from a central field population Ecol Lett5756ndash763

Zhu Y Bergland AO Gonzalez J Petrov DA 2012 Empirical validation ofpooled whole genome population re-sequencing in Drosophila mel-anogaster PLoS One 7e41901

15

Page 13: Cross-Study Comparison Reveals Common Genomic, Network, … · 2016-02-06 · resistance using a high-throughput sequencing Pool-GWAS (genome-wide association study) approach ( Bastide

region (Telonis-Scott et al 2012) The ldquooverlaprdquo analysis wasperformed similarly for the GWAS candidate gene and net-work-level approaches First the gene list overlap was testedbetween the two studies using Fisherrsquos exact test where the382 genes identified from the candidate SNPs (this study)were compared with 416 genes identified from candidatesingle feature polymorphisms (Telonis-Scott et al 2012)The contingency table was constructed using the numberof annotated genes in the D melanogaster R553 genomebuilt as an approximation for the total number of genestested in each study and took the following form

where A frac14 gene list from Telonis-Scott et al (2012)n frac14 416 B frac14 gene list from this study n frac14 382 N frac14 totalnumber of genes in D melanogaster reference genome r553n frac14 17106

To annotate genome features in the overlap candidategene list we used the SNP data (this study) which betterresolves alleles than array-based genotyping Due to this lim-itation we did not compare exact genomic coordinates be-tween the studies but considered overlap to occur whendifferentiated alleles were detected in the same gene inboth studies We did however examine the allele frequenciesaround the Dys-RF promoter which was sequenced sepa-rately revealing a selective sweep in the experimental evolu-tion lines (Telonis-Scott et al 2012)

The overlapping candidates were tested for functional en-richment using a basic gene list approach As genes ratherthan SNPs were analyzed here Fly Mine v400 (wwwflymineorg) was applied with the following parameters Gene lengthnormalization BenjaminindashHochberg FDR correction Plt 005Ontology frac14 biological process and default background (allGO-annotated D melanogaster genes)

Finally we constructed a second PPI network from thecandidate genes mapped in Telonis-Scott et al (2012) to ex-amine system-level overlap between the two studies The 416genes mapped to 337 protein seeds and the network wasconstructed as described above The degree of network over-lap between the two studies was tested using simulation Ineach iteration a gene set of the same length as the list fromthis study and a gene set of the same length as the list fromTelonis-Scott et al (2012) were resampled randomly from theentire D melanogaster mapped gene annotation (r553) Eachresampled set was then used to build a first-order network aspreviously described The overlap network between the tworesampled sets was calculated using the graphintersectioncommand from the igraph package and zero-degree nodes(proteins with no interactions) were removed Overlap wasquantified by counting the number of nodes and the numberof edges in this overlap network The observed overlap wascompared with a simulated null distribution for both nodeand edge number where n frac14 1000 simulation iterations

Supplementary MaterialSupplementary figures S1ndashS4 and tables S1ndashS5 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

We thank Robert Good for the fly sampling and JenniferShirriffs and Lea and Alan Rako for outstanding assistancewith laboratory culture and substantial desiccation assaysWe are grateful to Melissa Davis for advice on network build-ing and comparison approaches and to five anonymous re-viewers whose comments improved the manuscript Thiswork was supported by a DECRA fellowship DE120102575and Monash University Technology voucher scheme toMTS an ARC Future Fellowship and Discovery Grants toCMS an ARC Laureate Fellowship to AAH and a Scienceand Industry Endowment Fund grant to CMS and AAH

ReferencesAlexander JM Lomvardas S 2014 Nuclear architecture as an epigenetic

regulator of neural development and function Neuroscience26439ndash50

Anderson AR Hoffmann AA McKechnie SW Umina PA Weeks AR2005 The latitudinal cline in the In(3R)Payne inversion polymor-phism has shifted in the last 20 years in Australian Drosophilamelanogaster populations Mol Ecol 14851ndash858

Andolfatto P Wall JD Kreitman M 1999 Unusual haplotype structureat the proximal breakpoint of In(2L)t in a natural population ofDrosophila melanogaster Genetics 1531297ndash1311

Andrews S 2014 FastQC Available from httpwwwbioinformaticsbbsrcacukprojectsfastqc

Ashburner M Ball CA Blake JA Botstein D Butler H Cherry JM DavisAP Dolinski K Dwight SS Eppig JT et al 2000 Gene ontology toolfor the unification of biology The Gene Ontology Consortium NatGenet 2525ndash29

Aytes A Mitrofanova A Lefebvre C Alvarez Mariano J Castillo-MartinM Zheng T Eastham James A Gopalan A Pienta Kenneth J ShenMM et al 2014 Cross-species regulatory network analysis identifiesa synergistic interaction between FOXM1 and CENPF that drivesprostate cancer malignancy Cancer Cell 25638ndash651

Bastide H Betancourt A Nolte V Tobler R Steuroobe P Futschik ASchleurootterer C 2013 A genome-wide fine-scale map of natural pig-mentation variation in Drosophila melanogaster PLoS Genet9e1003534

Bergland AO Behrman EL OrsquoBrien KR Schmidt PS Petrov DA 2014Genomic evidence of rapid and stable adaptive oscillations overseasonal time scales in Drosophila PLoS Genet 10e1004775

Bonebrake TC Mastrandrea MD 2010 Tolerance adaptation and pre-cipitation changes complicate latitudinal patterns of climate changeimpacts Proc Natl Acad Sci U S A 10712581ndash12586

Bradley TJ 2009 Animal osmoregulation Oxford Oxford UniversityPress

Burke MK Liti G Long AD 2014 Standing genetic variation drivesrepeatable experimental evolution in outcrossing populations ofSaccharomyces cerevisiae Mol Biol Evol 313228ndash3239

Chown SL 2012 Trait-based approaches to conservation physiologyforecasting environmental change risks from the bottom upPhilos Trans R Soc B 3671615ndash1627

Chown SL Nicolson SW 2004 Insect physiological ecology mechanismsand patterns Oxford Oxford University Press

Chown SL Soslashrensen JG Terblanche JS 2011 Water loss in insectsan environmental change perspective J Insect Physiol571070ndash1084

Not in GWAS In GWAS

Not in Telonis-Scott et al (2012) N ndash intersect-AndashB A-intersect

In Telonis-Scott et al (2012) B-intersect Intersect(AB)

13

Chung H Loehlin DW Dufour HD Vaccarro K Millar JG Carroll SB2014 A single gene affects both ecological divergence and matechoice in Drosophila Science 3431148ndash1151

Cingolani P Platts A Wang LL Coon M Nguyen T Wang L Land SJ LuX Ruden DM 2012 A program for annotating and predicting theeffects of single nucleotide polymorphisms SnpEff SNPs in thegenome of Drosophila melanogaster strain w1118 iso-2 iso-3 Fly(Austin) 680ndash92

Civelek M Lusis AJ 2014 Systems genetics approaches to understandcomplex traits Nat Rev Genet 1534ndash48

Clusella-Trullas S Blackburn TM Chown SL 2011 Climatic predictors oftemperature performance curve parameters in ectotherms implycomplex responses to climate change Am Nat 177738ndash751

Csardi G Nepusz T 2006 The igraph software package for complexnetwork research InterJournal Complex Systems 1695 Availablefrom httpigraphorg

Davies SA Cabrero P Overend G Aitchison L Sebastian S Terhzaz SDow JA 2014 Cell signalling mechanisms for insect stress toleranceJ Exp Biol 217119ndash128

Davies SA Cabrero P Povsic M Johnston NR Terhzaz S Dow JA 2013Signaling by Drosophila capa neuropeptides Gen Comp Endocrinol18860ndash66

Davies SA Overend G Sebastian S Cundall M Cabrero P Dow JATerhzaz S 2012 Immune and stress response lsquocross-talkrsquo in theDrosophila Malpighian tubule J Insect Physiol 58488ndash497

Day JP Dow JA Houslay MD Davies SA 2005 Cyclic nucleotide phos-phodiesterases in Drosophila melanogaster Biochem J 388333ndash342

de Nadal E Ammerer G Posas F 2011 Controlling gene expression inresponse to stress Nat Rev Genet 12833ndash845

DePristo MA Banks E Poplin R Garimella KV Maguire JR Hartl CPhilippakis AA del Angel G Rivas MA Hanna M et al 2011 Aframework for variation discovery and genotyping using next-gen-eration DNA sequencing data Nat Genet 43491ndash498

Djawdan M Chippindale AK Rose MR Bradley TJ 1998 Metabolic re-serves and evolved stress resistance in Drosophila melanogasterPhysiol Zool 71584ndash594

Durham MF Magwire MM Stone EA Leips J 2014 Genome-wide anal-ysis in Drosophila reveals age-specific effects of SNPs on fitness traitsNat Commun 54338

Emilsson V Thorleifsson G Zhang B Leonardson AS Zink F Zhu JCarlson S Helgason A Walters GB Gunnarsdottir S et al 2008Genetics of gene expression and its effect on disease Nature452423ndash428

Fabian DK Kapun M Nolte V Kofler R Schmidt PS Schleurootterer C FlattT 2012 Genome-wide patterns of latitudinal differentiation amongpopulations of Drosophila melanogaster from North America MolEcol 214748ndash4769

Foley BR Telonis-Scott M 2011 Quantitative genetic analysis suggestscausal association between cuticular hydrocarbon composition anddesiccation survival in Drosophila melanogaster Heredity 10668ndash77

Folk DG Han C Bradley TJ 2001 Water acquisition and partitioning inDrosophila melanogaster effects of selection for desiccation-resis-tance J Exp Biol 2043323ndash3331

Fowler MA Montell C 2013 Drosophila TRP channels and animal be-havior Life Sci 92394ndash403

Fu J Keurentjes JJB Bouwmeester H America T Verstappen FWA WardJL Beale MH de Vos RCH Dijkstra M Scheltema RA et al 2009System-wide molecular evidence for phenotypic buffering inArabidopsis Nat Genet 41166ndash167

Futschik A Schleurootterer C 2010 The next generation of molecular mar-kers from massively parallel sequencing of pooled DNA samplesGenetics 186207ndash218

Gibbs AG Fukuzato F Matzkin LM 2003 Evolution of water conserva-tion mechanisms in Drosophila J Exp Biol 2061183ndash1192

Gibbs AG Matzkin LM 2001 Evolution of water balance in the genusDrosophila J Exp Biol 2042331ndash2338

Goodstadt L 2010 Ruffus a lightweight Python library for computa-tional pipelines Bioinformatics 262778ndash2779

Hadley NF 1994 Water relations of terrestrial arthropods San Diego(CA) Academic Press

Hoffmann AA Hallas R Sinclair C Mitrovski P 2001 Levels of variationin stress resistance in Drosophila among strains local populationsand geographic regions patterns for desiccation starvation coldresistance and associated traits Evolution 551621ndash1630

Hoffmann AA Parsons PA 1989a An integrated approach to environ-mental stress tolerance and life-history variation desiccation toler-ance in Drosophila Biol J Linn Soc 37117ndash136

Hoffmann AA Parsons PA 1989b Selection for increased desiccationresistance in Drosophila melanogaster additive genetic control andcorrelated responses for other stresses Genetics 122837ndash845

Hoffmann AA Sgro CM 2011 Climate change and evolutionary adap-tation Nature 470479ndash485

Hoffmann AA Weeks AR 2007 Climatic selection on genes and traitsafter a 100 year-old invasion a critical look at the temperate-tropicalclines in Drosophila melanogaster from eastern Australia Genetica129133ndash147

Huang W Richards S Carbone MA Zhu D Anholt RR Ayroles JFDuncan L Jordan KW Lawrence F Magwire MM et al 2012Epistasis dominates the genetic architecture of Drosophila quanti-tative traits Proc Natl Acad Sci U S A 10915553ndash15559

Huang Z Tunnacliffe A 2004 Response of human cells to desiccationcomparison with hyperosmotic stress response J Physiol558181ndash191

Huang Z Tunnacliffe A 2005 Gene induction by desiccation stress inhuman cell cultures FEBS Lett 5794973ndash4977

Ichihashi Y Aguilar-Martınez JA Farhi M Chitwood DH Kumar RMillon LV Peng J Maloof JN Sinha NR 2014 Evolutionary develop-mental transcriptomics reveals a gene network module regulatinginterspecific diversity in plant leaf shape Proc Natl Acad Sci U S A111E2616ndashE2621

Itoh M Nanba N Hasegawa M Inomata N Kondo R Oshima MTakano-Shimizu T 2010 Seasonal changes in the long-distance link-age disequilibrium in Drosophila melanogaster J Hered 10126ndash32

Johnson WA Carder JW 2012 Drosophila nociceptors mediate larvalaversion to dry surface environments utilizing both the painless TRPchannel and the DEGENaC subunit PPK1 PLoS One 7e32878

Kahsai L Kapan N Dircksen H Winther AM Nassel DR 2010 Metabolicstress responses in Drosophila are modulated by brain neurosecre-tory cells that produce multiple neuropeptides PLoS One 5e11480

Kapun M van Schalkwyk H McAllister B Flatt T Schleurootterer C 2014Inference of chromosomal inversion dynamics from Pool-Seq datain natural and laboratory populations of Drosophila melanogasterMol Ecol 231813ndash1827

Kawano T Shimoda M Matsumoto H Ryuda M Tsuzuki S Hayakawa Y2010 Identification of a gene Desiccate contributing to desicca-tion resistance in Drosophila melanogaster J Biol Chem28538889ndash38897

Kellermann V van Heerwaarden B Sgro CM Hoffmann AA 2009Fundamental evolutionary limits in ecological traits driveDrosophila species distributions Science 3251244ndash1246

Kofler R Nolte V Schleurootterer C 2016 The impact of library preparationprotocols on the consistency of allele frequency estimates in Pool-Seq data Mol Ecol Resour 16118ndash122

Kofler R Pandey RV Schleurootterer C 2011 PoPoolation2 identifying dif-ferentiation between populations using sequencing of pooled DNAsamples (Pool-Seq) Bioinformatics 273435ndash3436

Kofler R Schleurootterer C 2012 Gowinda unbiased analysis of gene setenrichment for genome-wide association studies Bioinformatics282084ndash2085

Leiserson MDM Eldridge JV Ramachandran S Raphael BJ 2013Network analysis of GWAS data Curr Opin Genet Dev 23602ndash610

Li H Durbin R 2009 Fast and accurate short read alignment withBurrows-Wheeler transform Bioinformatics 251754ndash1760

Li H Handsaker B Wysoker A Fennell T Ruan J Homer N Marth GAbecasis G Durbin R 2009 The Sequence AlignmentMap formatand SAMtools Bioinformatics 252078ndash2079

14

Liu Y Luo J Carlsson MA Neuroassel DR 2015 Serotonin and insulin-likepeptides modulate leucokinin-producing neurons that affectfeeding and water homeostasis in Drosophila J Comp Neurol5231840ndash1863

Lohse M Bolger AM Nagel A Fernie AR Lunn JE Stitt M Usadel B 2012RobiNA a user-friendly integrated software solution for RNA-Seq-based transcriptomics Nucleic Acids Res 40W622ndashW627

Long A Liti G Luptak A Tenaillon O 2015 Elucidating the moleculararchitecture of adaptation via evolve and resequence experimentsNat Rev Genet 16567ndash582

Lynch M Bost D Wilson S Maruki T Harrison S 2014 Population-genetic inference from pooled-sequencing data Genome Biol Evol61210ndash1218

Mackay TFC Stone EA Ayroles JF 2009 The genetics of quantitativetraits challenges and prospects Nat Rev Genet 10565ndash577

Marron MT Markow TA Kain KJ Gibbs AG 2003 Effects of starvationand desiccation on energy metabolism in desert and mesicDrosophila J Insect Physiol 49261ndash270

Matzkin LM Markow TA 2009 Transcriptional regulation of metabo-lism associated with the increased desiccation resistance of thecactophilic Drosophila mojavensis Genetics 1821279ndash1288

Murali T Pacifico S Yu J Guest S Roberts GG Finley RL Jr 2011 DroID2011 a comprehensive integrated resource for protein transcrip-tion factor RNA and gene interactions for Drosophila Nucleic AcidsRes 39D736ndashD743

Nuzhdin SV Turner TL 2013 Promises and limitations of hitchhikingmapping Curr Opin Genet Dev 23694ndash699

Orozco-terWengel P Kapun M Nolte V Kofler R Flatt T Schleurootterer C2012 Adaptation of Drosophila to a novel laboratory environmentreveals temporally heterogeneous trajectories of selected alleles MolEcol 214931ndash4941

Picard Picard A set of Java command line tools for manipulating high-throughput seqeuncing (HTS) data and formats Available fromhttpbroadinstitutegithubiopicard

Qiu Y Tittiger C Wicker-Thomas C Le Goff G Young S Wajnberg EFricaux T Taquet N Blomquist GJ Feyereisen R 2012 An insect-specific P450 oxidative decarbonylase for cuticular hydrocarbon bio-synthesis Proc Natl Acad Sci U S A 10914858ndash14863

Rajpurohit S Oliveira CC Etges WJ Gibbs AG 2013 Functional genomicand phenotypic responses to desiccation in natural populations of adesert Drosophilid Mol Ecol 222698ndash2715

Sarup P Soslashrensen JG Kristensen TN Hoffmann AA Loeschcke V PaigeKN Soslashrensen P 2011 Candidate genes detected in transcriptomestudies are strongly dependent on genetic background PLoS One6e15644

Schiffer M Hangartner S Hoffmann AA 2013 Assessing the relativeimportance of environmental effects carry-over effects and speciesdifferences in thermal stress resistance a comparison ofDrosophilids across field and laboratory generations J Exp Biol2163790ndash3798

Schleurootterer C Kofler R Versace E Tobler R Franssen SU 2015Combining experimental evolution with next-generation sequen-cing A powerful tool to study adaptation from standing geneticvariation Heredity 114431ndash440

Sinclair BJ Gibbs AG Roberts SP 2007 Gene transcription during ex-posure to and recovery from cold and desiccation stress inDrosophila melanogaster Insect Mol Biol 16435ndash443

Sloggett C Wakefield M Philip G Pope B 2014 Rubramdashflexible distrib-uted pipelines figshare Available from httpdxdoiorg106084m9figshare895626

Smit AFA Hubley R Green P 2013-2015 RepeatMasker Open-403Available from httpwwwrepeatmaskerorg

Seurooderberg JA Birse RT Neuroassel DR 2011 Insulin production and signalingin renal tubules of Drosophila is under control of tachykinin-relatedpeptide and regulates stress resistance PLoS One 6e19866

Soslashrensen JG Nielsen MM Loeschcke V 2007 Gene expression profileanalysis of Drosophila melanogaster selected for resistance to envi-ronmental stressors J Evol Biol 201624ndash1636

St Pierre SE Ponting L Stefancsik R McQuilton P 2014 FlyBase102mdashadvanced approaches to interrogating FlyBase Nucleic AcidsRes 42D780ndashD788

Stevenson M Nunes T Heuser C Marshall J Sanchez J Thornton RReiczigel J Robison-Cox J Sebastiani P Solymos P et al 2015 epiRTools for the analysis of epidemiological data R package version09-62 Available from httpCRANR-projectorgpackagefrac14epiR

Storey JD Tibshirani R 2003 Statistical significance for genomewidestudies Proc Natl Acad Sci U S A 1009440ndash9445

Telonis-Scott M Gane M DeGaris S Sgro CM Hoffmann AA 2012 Highresolution mapping of candidate alleles for desiccation resistancein Drosophila melanogaster under selection Mol Biol Evol291335ndash1351

Telonis-Scott M Guthridge KM Hoffmann AA 2006 A new set oflaboratory-selected Drosophila melanogaster lines for the analysisof desiccation resistance response to selection physiology and cor-related responses J Exp Biol 2091837ndash1847

Terhzaz S Cabrero P Robben JH Radford JC Hudson BD Milligan GDow JA Davies SA 2012 Mechanism and function of Drosophilacapa GPCR a desiccation stress-responsive receptor with functionalhomology to human neuromedinU receptor PLoS One 7e29897

Terhzaz S Overend G Sebastian S Dow JA Davies SA 2014 The Dmelanogaster capa-1 neuropeptide activates renal NF-kB signalingPeptides 53218ndash224

Terhzaz S Teets NM Cabrero P Henderson L Ritchie MG Nachman RJDow JA Denlinger DL Davies SA 2015 Insect capa neuropeptidesimpact desiccation and cold tolerance Proc Natl Acad Sci U S A1122882ndash2887

Thorat LJ Gaikwad SM Nath BB 2012 Trehalose as an indicator ofdesiccation stress in Drosophila melanogaster larvae a potentialmarker of anhydrobiosis Biochem Biophys Res Commun419638ndash642

Tobler R Franssen SU Kofler R Orozco-terWengel P Nolte VHermisson J Schl eurootterer C 2014 Massive habitat-specific ge-nomic response in D melanogaster populations during exper-imental evolution in hot and cold environments Mol Biol Evol31364ndash375

Turner TL Bourne EC Von Wettberg EJ Hu TT Nuzhdin SV 2010Population resequencing reveals local adaptation of Arabidopsislyrata to serpentine soils Nat Genet 42260ndash263

Turner TL Miller PM 2012 Investigating natural variation in Drosophilacourtship song by the evolve and resequence approach Genetics191633ndash642

Vermeulen CJ Soslashrensen P Kirilova Gagalova K Loeschcke V 2013Transcriptomic analysis of inbreeding depression in cold-sensitiveDrosophila melanogaster shows upregulation of the immune re-sponse J Evol Biol 261890ndash1902

Wang B Goode J Best J Meltzer J Schilman PE Chen J Garza D ThomasJB Montminy M 2008 The insulin-regulated CREB coactivatorTORC promotes stress resistance in Drosophila Cell Metab7434ndash444

Wang J Kean L Yang J Allan AK Davies SA Herzyk P Dow JA 2004Function-informed transcriptome analysis of Drosophila renaltubule Genome Biol 5R69

Wang K Li M Hakonarson H 2010 Analysing biological pathways ingenome-wide association studies Nat Rev Genet 11843ndash854

Wang L Jia P Wolfinger RD Chen X Zhao Z 2011 Gene set analysis ofgenome-wide association studies methodological issues and per-spectives Genomics 981ndash8

Weber AL Khan GF Magwire MM Tabor CL Mackay TF Anholt RR2012 Genome-wide association analysis of oxidative stress resistancein Drosophila melanogaster PLoS One 7e34745

Weeks AR McKechnie SW Hoffmann AA 2002 Dissecting adaptiveclinal variation markers inversions and sizestress associations inDrosophila melanogaster from a central field population Ecol Lett5756ndash763

Zhu Y Bergland AO Gonzalez J Petrov DA 2012 Empirical validation ofpooled whole genome population re-sequencing in Drosophila mel-anogaster PLoS One 7e41901

15

Page 14: Cross-Study Comparison Reveals Common Genomic, Network, … · 2016-02-06 · resistance using a high-throughput sequencing Pool-GWAS (genome-wide association study) approach ( Bastide

Chung H Loehlin DW Dufour HD Vaccarro K Millar JG Carroll SB2014 A single gene affects both ecological divergence and matechoice in Drosophila Science 3431148ndash1151

Cingolani P Platts A Wang LL Coon M Nguyen T Wang L Land SJ LuX Ruden DM 2012 A program for annotating and predicting theeffects of single nucleotide polymorphisms SnpEff SNPs in thegenome of Drosophila melanogaster strain w1118 iso-2 iso-3 Fly(Austin) 680ndash92

Civelek M Lusis AJ 2014 Systems genetics approaches to understandcomplex traits Nat Rev Genet 1534ndash48

Clusella-Trullas S Blackburn TM Chown SL 2011 Climatic predictors oftemperature performance curve parameters in ectotherms implycomplex responses to climate change Am Nat 177738ndash751

Csardi G Nepusz T 2006 The igraph software package for complexnetwork research InterJournal Complex Systems 1695 Availablefrom httpigraphorg

Davies SA Cabrero P Overend G Aitchison L Sebastian S Terhzaz SDow JA 2014 Cell signalling mechanisms for insect stress toleranceJ Exp Biol 217119ndash128

Davies SA Cabrero P Povsic M Johnston NR Terhzaz S Dow JA 2013Signaling by Drosophila capa neuropeptides Gen Comp Endocrinol18860ndash66

Davies SA Overend G Sebastian S Cundall M Cabrero P Dow JATerhzaz S 2012 Immune and stress response lsquocross-talkrsquo in theDrosophila Malpighian tubule J Insect Physiol 58488ndash497

Day JP Dow JA Houslay MD Davies SA 2005 Cyclic nucleotide phos-phodiesterases in Drosophila melanogaster Biochem J 388333ndash342

de Nadal E Ammerer G Posas F 2011 Controlling gene expression inresponse to stress Nat Rev Genet 12833ndash845

DePristo MA Banks E Poplin R Garimella KV Maguire JR Hartl CPhilippakis AA del Angel G Rivas MA Hanna M et al 2011 Aframework for variation discovery and genotyping using next-gen-eration DNA sequencing data Nat Genet 43491ndash498

Djawdan M Chippindale AK Rose MR Bradley TJ 1998 Metabolic re-serves and evolved stress resistance in Drosophila melanogasterPhysiol Zool 71584ndash594

Durham MF Magwire MM Stone EA Leips J 2014 Genome-wide anal-ysis in Drosophila reveals age-specific effects of SNPs on fitness traitsNat Commun 54338

Emilsson V Thorleifsson G Zhang B Leonardson AS Zink F Zhu JCarlson S Helgason A Walters GB Gunnarsdottir S et al 2008Genetics of gene expression and its effect on disease Nature452423ndash428

Fabian DK Kapun M Nolte V Kofler R Schmidt PS Schleurootterer C FlattT 2012 Genome-wide patterns of latitudinal differentiation amongpopulations of Drosophila melanogaster from North America MolEcol 214748ndash4769

Foley BR Telonis-Scott M 2011 Quantitative genetic analysis suggestscausal association between cuticular hydrocarbon composition anddesiccation survival in Drosophila melanogaster Heredity 10668ndash77

Folk DG Han C Bradley TJ 2001 Water acquisition and partitioning inDrosophila melanogaster effects of selection for desiccation-resis-tance J Exp Biol 2043323ndash3331

Fowler MA Montell C 2013 Drosophila TRP channels and animal be-havior Life Sci 92394ndash403

Fu J Keurentjes JJB Bouwmeester H America T Verstappen FWA WardJL Beale MH de Vos RCH Dijkstra M Scheltema RA et al 2009System-wide molecular evidence for phenotypic buffering inArabidopsis Nat Genet 41166ndash167

Futschik A Schleurootterer C 2010 The next generation of molecular mar-kers from massively parallel sequencing of pooled DNA samplesGenetics 186207ndash218

Gibbs AG Fukuzato F Matzkin LM 2003 Evolution of water conserva-tion mechanisms in Drosophila J Exp Biol 2061183ndash1192

Gibbs AG Matzkin LM 2001 Evolution of water balance in the genusDrosophila J Exp Biol 2042331ndash2338

Goodstadt L 2010 Ruffus a lightweight Python library for computa-tional pipelines Bioinformatics 262778ndash2779

Hadley NF 1994 Water relations of terrestrial arthropods San Diego(CA) Academic Press

Hoffmann AA Hallas R Sinclair C Mitrovski P 2001 Levels of variationin stress resistance in Drosophila among strains local populationsand geographic regions patterns for desiccation starvation coldresistance and associated traits Evolution 551621ndash1630

Hoffmann AA Parsons PA 1989a An integrated approach to environ-mental stress tolerance and life-history variation desiccation toler-ance in Drosophila Biol J Linn Soc 37117ndash136

Hoffmann AA Parsons PA 1989b Selection for increased desiccationresistance in Drosophila melanogaster additive genetic control andcorrelated responses for other stresses Genetics 122837ndash845

Hoffmann AA Sgro CM 2011 Climate change and evolutionary adap-tation Nature 470479ndash485

Hoffmann AA Weeks AR 2007 Climatic selection on genes and traitsafter a 100 year-old invasion a critical look at the temperate-tropicalclines in Drosophila melanogaster from eastern Australia Genetica129133ndash147

Huang W Richards S Carbone MA Zhu D Anholt RR Ayroles JFDuncan L Jordan KW Lawrence F Magwire MM et al 2012Epistasis dominates the genetic architecture of Drosophila quanti-tative traits Proc Natl Acad Sci U S A 10915553ndash15559

Huang Z Tunnacliffe A 2004 Response of human cells to desiccationcomparison with hyperosmotic stress response J Physiol558181ndash191

Huang Z Tunnacliffe A 2005 Gene induction by desiccation stress inhuman cell cultures FEBS Lett 5794973ndash4977

Ichihashi Y Aguilar-Martınez JA Farhi M Chitwood DH Kumar RMillon LV Peng J Maloof JN Sinha NR 2014 Evolutionary develop-mental transcriptomics reveals a gene network module regulatinginterspecific diversity in plant leaf shape Proc Natl Acad Sci U S A111E2616ndashE2621

Itoh M Nanba N Hasegawa M Inomata N Kondo R Oshima MTakano-Shimizu T 2010 Seasonal changes in the long-distance link-age disequilibrium in Drosophila melanogaster J Hered 10126ndash32

Johnson WA Carder JW 2012 Drosophila nociceptors mediate larvalaversion to dry surface environments utilizing both the painless TRPchannel and the DEGENaC subunit PPK1 PLoS One 7e32878

Kahsai L Kapan N Dircksen H Winther AM Nassel DR 2010 Metabolicstress responses in Drosophila are modulated by brain neurosecre-tory cells that produce multiple neuropeptides PLoS One 5e11480

Kapun M van Schalkwyk H McAllister B Flatt T Schleurootterer C 2014Inference of chromosomal inversion dynamics from Pool-Seq datain natural and laboratory populations of Drosophila melanogasterMol Ecol 231813ndash1827

Kawano T Shimoda M Matsumoto H Ryuda M Tsuzuki S Hayakawa Y2010 Identification of a gene Desiccate contributing to desicca-tion resistance in Drosophila melanogaster J Biol Chem28538889ndash38897

Kellermann V van Heerwaarden B Sgro CM Hoffmann AA 2009Fundamental evolutionary limits in ecological traits driveDrosophila species distributions Science 3251244ndash1246

Kofler R Nolte V Schleurootterer C 2016 The impact of library preparationprotocols on the consistency of allele frequency estimates in Pool-Seq data Mol Ecol Resour 16118ndash122

Kofler R Pandey RV Schleurootterer C 2011 PoPoolation2 identifying dif-ferentiation between populations using sequencing of pooled DNAsamples (Pool-Seq) Bioinformatics 273435ndash3436

Kofler R Schleurootterer C 2012 Gowinda unbiased analysis of gene setenrichment for genome-wide association studies Bioinformatics282084ndash2085

Leiserson MDM Eldridge JV Ramachandran S Raphael BJ 2013Network analysis of GWAS data Curr Opin Genet Dev 23602ndash610

Li H Durbin R 2009 Fast and accurate short read alignment withBurrows-Wheeler transform Bioinformatics 251754ndash1760

Li H Handsaker B Wysoker A Fennell T Ruan J Homer N Marth GAbecasis G Durbin R 2009 The Sequence AlignmentMap formatand SAMtools Bioinformatics 252078ndash2079

14

Liu Y Luo J Carlsson MA Neuroassel DR 2015 Serotonin and insulin-likepeptides modulate leucokinin-producing neurons that affectfeeding and water homeostasis in Drosophila J Comp Neurol5231840ndash1863

Lohse M Bolger AM Nagel A Fernie AR Lunn JE Stitt M Usadel B 2012RobiNA a user-friendly integrated software solution for RNA-Seq-based transcriptomics Nucleic Acids Res 40W622ndashW627

Long A Liti G Luptak A Tenaillon O 2015 Elucidating the moleculararchitecture of adaptation via evolve and resequence experimentsNat Rev Genet 16567ndash582

Lynch M Bost D Wilson S Maruki T Harrison S 2014 Population-genetic inference from pooled-sequencing data Genome Biol Evol61210ndash1218

Mackay TFC Stone EA Ayroles JF 2009 The genetics of quantitativetraits challenges and prospects Nat Rev Genet 10565ndash577

Marron MT Markow TA Kain KJ Gibbs AG 2003 Effects of starvationand desiccation on energy metabolism in desert and mesicDrosophila J Insect Physiol 49261ndash270

Matzkin LM Markow TA 2009 Transcriptional regulation of metabo-lism associated with the increased desiccation resistance of thecactophilic Drosophila mojavensis Genetics 1821279ndash1288

Murali T Pacifico S Yu J Guest S Roberts GG Finley RL Jr 2011 DroID2011 a comprehensive integrated resource for protein transcrip-tion factor RNA and gene interactions for Drosophila Nucleic AcidsRes 39D736ndashD743

Nuzhdin SV Turner TL 2013 Promises and limitations of hitchhikingmapping Curr Opin Genet Dev 23694ndash699

Orozco-terWengel P Kapun M Nolte V Kofler R Flatt T Schleurootterer C2012 Adaptation of Drosophila to a novel laboratory environmentreveals temporally heterogeneous trajectories of selected alleles MolEcol 214931ndash4941

Picard Picard A set of Java command line tools for manipulating high-throughput seqeuncing (HTS) data and formats Available fromhttpbroadinstitutegithubiopicard

Qiu Y Tittiger C Wicker-Thomas C Le Goff G Young S Wajnberg EFricaux T Taquet N Blomquist GJ Feyereisen R 2012 An insect-specific P450 oxidative decarbonylase for cuticular hydrocarbon bio-synthesis Proc Natl Acad Sci U S A 10914858ndash14863

Rajpurohit S Oliveira CC Etges WJ Gibbs AG 2013 Functional genomicand phenotypic responses to desiccation in natural populations of adesert Drosophilid Mol Ecol 222698ndash2715

Sarup P Soslashrensen JG Kristensen TN Hoffmann AA Loeschcke V PaigeKN Soslashrensen P 2011 Candidate genes detected in transcriptomestudies are strongly dependent on genetic background PLoS One6e15644

Schiffer M Hangartner S Hoffmann AA 2013 Assessing the relativeimportance of environmental effects carry-over effects and speciesdifferences in thermal stress resistance a comparison ofDrosophilids across field and laboratory generations J Exp Biol2163790ndash3798

Schleurootterer C Kofler R Versace E Tobler R Franssen SU 2015Combining experimental evolution with next-generation sequen-cing A powerful tool to study adaptation from standing geneticvariation Heredity 114431ndash440

Sinclair BJ Gibbs AG Roberts SP 2007 Gene transcription during ex-posure to and recovery from cold and desiccation stress inDrosophila melanogaster Insect Mol Biol 16435ndash443

Sloggett C Wakefield M Philip G Pope B 2014 Rubramdashflexible distrib-uted pipelines figshare Available from httpdxdoiorg106084m9figshare895626

Smit AFA Hubley R Green P 2013-2015 RepeatMasker Open-403Available from httpwwwrepeatmaskerorg

Seurooderberg JA Birse RT Neuroassel DR 2011 Insulin production and signalingin renal tubules of Drosophila is under control of tachykinin-relatedpeptide and regulates stress resistance PLoS One 6e19866

Soslashrensen JG Nielsen MM Loeschcke V 2007 Gene expression profileanalysis of Drosophila melanogaster selected for resistance to envi-ronmental stressors J Evol Biol 201624ndash1636

St Pierre SE Ponting L Stefancsik R McQuilton P 2014 FlyBase102mdashadvanced approaches to interrogating FlyBase Nucleic AcidsRes 42D780ndashD788

Stevenson M Nunes T Heuser C Marshall J Sanchez J Thornton RReiczigel J Robison-Cox J Sebastiani P Solymos P et al 2015 epiRTools for the analysis of epidemiological data R package version09-62 Available from httpCRANR-projectorgpackagefrac14epiR

Storey JD Tibshirani R 2003 Statistical significance for genomewidestudies Proc Natl Acad Sci U S A 1009440ndash9445

Telonis-Scott M Gane M DeGaris S Sgro CM Hoffmann AA 2012 Highresolution mapping of candidate alleles for desiccation resistancein Drosophila melanogaster under selection Mol Biol Evol291335ndash1351

Telonis-Scott M Guthridge KM Hoffmann AA 2006 A new set oflaboratory-selected Drosophila melanogaster lines for the analysisof desiccation resistance response to selection physiology and cor-related responses J Exp Biol 2091837ndash1847

Terhzaz S Cabrero P Robben JH Radford JC Hudson BD Milligan GDow JA Davies SA 2012 Mechanism and function of Drosophilacapa GPCR a desiccation stress-responsive receptor with functionalhomology to human neuromedinU receptor PLoS One 7e29897

Terhzaz S Overend G Sebastian S Dow JA Davies SA 2014 The Dmelanogaster capa-1 neuropeptide activates renal NF-kB signalingPeptides 53218ndash224

Terhzaz S Teets NM Cabrero P Henderson L Ritchie MG Nachman RJDow JA Denlinger DL Davies SA 2015 Insect capa neuropeptidesimpact desiccation and cold tolerance Proc Natl Acad Sci U S A1122882ndash2887

Thorat LJ Gaikwad SM Nath BB 2012 Trehalose as an indicator ofdesiccation stress in Drosophila melanogaster larvae a potentialmarker of anhydrobiosis Biochem Biophys Res Commun419638ndash642

Tobler R Franssen SU Kofler R Orozco-terWengel P Nolte VHermisson J Schl eurootterer C 2014 Massive habitat-specific ge-nomic response in D melanogaster populations during exper-imental evolution in hot and cold environments Mol Biol Evol31364ndash375

Turner TL Bourne EC Von Wettberg EJ Hu TT Nuzhdin SV 2010Population resequencing reveals local adaptation of Arabidopsislyrata to serpentine soils Nat Genet 42260ndash263

Turner TL Miller PM 2012 Investigating natural variation in Drosophilacourtship song by the evolve and resequence approach Genetics191633ndash642

Vermeulen CJ Soslashrensen P Kirilova Gagalova K Loeschcke V 2013Transcriptomic analysis of inbreeding depression in cold-sensitiveDrosophila melanogaster shows upregulation of the immune re-sponse J Evol Biol 261890ndash1902

Wang B Goode J Best J Meltzer J Schilman PE Chen J Garza D ThomasJB Montminy M 2008 The insulin-regulated CREB coactivatorTORC promotes stress resistance in Drosophila Cell Metab7434ndash444

Wang J Kean L Yang J Allan AK Davies SA Herzyk P Dow JA 2004Function-informed transcriptome analysis of Drosophila renaltubule Genome Biol 5R69

Wang K Li M Hakonarson H 2010 Analysing biological pathways ingenome-wide association studies Nat Rev Genet 11843ndash854

Wang L Jia P Wolfinger RD Chen X Zhao Z 2011 Gene set analysis ofgenome-wide association studies methodological issues and per-spectives Genomics 981ndash8

Weber AL Khan GF Magwire MM Tabor CL Mackay TF Anholt RR2012 Genome-wide association analysis of oxidative stress resistancein Drosophila melanogaster PLoS One 7e34745

Weeks AR McKechnie SW Hoffmann AA 2002 Dissecting adaptiveclinal variation markers inversions and sizestress associations inDrosophila melanogaster from a central field population Ecol Lett5756ndash763

Zhu Y Bergland AO Gonzalez J Petrov DA 2012 Empirical validation ofpooled whole genome population re-sequencing in Drosophila mel-anogaster PLoS One 7e41901

15

Page 15: Cross-Study Comparison Reveals Common Genomic, Network, … · 2016-02-06 · resistance using a high-throughput sequencing Pool-GWAS (genome-wide association study) approach ( Bastide

Liu Y Luo J Carlsson MA Neuroassel DR 2015 Serotonin and insulin-likepeptides modulate leucokinin-producing neurons that affectfeeding and water homeostasis in Drosophila J Comp Neurol5231840ndash1863

Lohse M Bolger AM Nagel A Fernie AR Lunn JE Stitt M Usadel B 2012RobiNA a user-friendly integrated software solution for RNA-Seq-based transcriptomics Nucleic Acids Res 40W622ndashW627

Long A Liti G Luptak A Tenaillon O 2015 Elucidating the moleculararchitecture of adaptation via evolve and resequence experimentsNat Rev Genet 16567ndash582

Lynch M Bost D Wilson S Maruki T Harrison S 2014 Population-genetic inference from pooled-sequencing data Genome Biol Evol61210ndash1218

Mackay TFC Stone EA Ayroles JF 2009 The genetics of quantitativetraits challenges and prospects Nat Rev Genet 10565ndash577

Marron MT Markow TA Kain KJ Gibbs AG 2003 Effects of starvationand desiccation on energy metabolism in desert and mesicDrosophila J Insect Physiol 49261ndash270

Matzkin LM Markow TA 2009 Transcriptional regulation of metabo-lism associated with the increased desiccation resistance of thecactophilic Drosophila mojavensis Genetics 1821279ndash1288

Murali T Pacifico S Yu J Guest S Roberts GG Finley RL Jr 2011 DroID2011 a comprehensive integrated resource for protein transcrip-tion factor RNA and gene interactions for Drosophila Nucleic AcidsRes 39D736ndashD743

Nuzhdin SV Turner TL 2013 Promises and limitations of hitchhikingmapping Curr Opin Genet Dev 23694ndash699

Orozco-terWengel P Kapun M Nolte V Kofler R Flatt T Schleurootterer C2012 Adaptation of Drosophila to a novel laboratory environmentreveals temporally heterogeneous trajectories of selected alleles MolEcol 214931ndash4941

Picard Picard A set of Java command line tools for manipulating high-throughput seqeuncing (HTS) data and formats Available fromhttpbroadinstitutegithubiopicard

Qiu Y Tittiger C Wicker-Thomas C Le Goff G Young S Wajnberg EFricaux T Taquet N Blomquist GJ Feyereisen R 2012 An insect-specific P450 oxidative decarbonylase for cuticular hydrocarbon bio-synthesis Proc Natl Acad Sci U S A 10914858ndash14863

Rajpurohit S Oliveira CC Etges WJ Gibbs AG 2013 Functional genomicand phenotypic responses to desiccation in natural populations of adesert Drosophilid Mol Ecol 222698ndash2715

Sarup P Soslashrensen JG Kristensen TN Hoffmann AA Loeschcke V PaigeKN Soslashrensen P 2011 Candidate genes detected in transcriptomestudies are strongly dependent on genetic background PLoS One6e15644

Schiffer M Hangartner S Hoffmann AA 2013 Assessing the relativeimportance of environmental effects carry-over effects and speciesdifferences in thermal stress resistance a comparison ofDrosophilids across field and laboratory generations J Exp Biol2163790ndash3798

Schleurootterer C Kofler R Versace E Tobler R Franssen SU 2015Combining experimental evolution with next-generation sequen-cing A powerful tool to study adaptation from standing geneticvariation Heredity 114431ndash440

Sinclair BJ Gibbs AG Roberts SP 2007 Gene transcription during ex-posure to and recovery from cold and desiccation stress inDrosophila melanogaster Insect Mol Biol 16435ndash443

Sloggett C Wakefield M Philip G Pope B 2014 Rubramdashflexible distrib-uted pipelines figshare Available from httpdxdoiorg106084m9figshare895626

Smit AFA Hubley R Green P 2013-2015 RepeatMasker Open-403Available from httpwwwrepeatmaskerorg

Seurooderberg JA Birse RT Neuroassel DR 2011 Insulin production and signalingin renal tubules of Drosophila is under control of tachykinin-relatedpeptide and regulates stress resistance PLoS One 6e19866

Soslashrensen JG Nielsen MM Loeschcke V 2007 Gene expression profileanalysis of Drosophila melanogaster selected for resistance to envi-ronmental stressors J Evol Biol 201624ndash1636

St Pierre SE Ponting L Stefancsik R McQuilton P 2014 FlyBase102mdashadvanced approaches to interrogating FlyBase Nucleic AcidsRes 42D780ndashD788

Stevenson M Nunes T Heuser C Marshall J Sanchez J Thornton RReiczigel J Robison-Cox J Sebastiani P Solymos P et al 2015 epiRTools for the analysis of epidemiological data R package version09-62 Available from httpCRANR-projectorgpackagefrac14epiR

Storey JD Tibshirani R 2003 Statistical significance for genomewidestudies Proc Natl Acad Sci U S A 1009440ndash9445

Telonis-Scott M Gane M DeGaris S Sgro CM Hoffmann AA 2012 Highresolution mapping of candidate alleles for desiccation resistancein Drosophila melanogaster under selection Mol Biol Evol291335ndash1351

Telonis-Scott M Guthridge KM Hoffmann AA 2006 A new set oflaboratory-selected Drosophila melanogaster lines for the analysisof desiccation resistance response to selection physiology and cor-related responses J Exp Biol 2091837ndash1847

Terhzaz S Cabrero P Robben JH Radford JC Hudson BD Milligan GDow JA Davies SA 2012 Mechanism and function of Drosophilacapa GPCR a desiccation stress-responsive receptor with functionalhomology to human neuromedinU receptor PLoS One 7e29897

Terhzaz S Overend G Sebastian S Dow JA Davies SA 2014 The Dmelanogaster capa-1 neuropeptide activates renal NF-kB signalingPeptides 53218ndash224

Terhzaz S Teets NM Cabrero P Henderson L Ritchie MG Nachman RJDow JA Denlinger DL Davies SA 2015 Insect capa neuropeptidesimpact desiccation and cold tolerance Proc Natl Acad Sci U S A1122882ndash2887

Thorat LJ Gaikwad SM Nath BB 2012 Trehalose as an indicator ofdesiccation stress in Drosophila melanogaster larvae a potentialmarker of anhydrobiosis Biochem Biophys Res Commun419638ndash642

Tobler R Franssen SU Kofler R Orozco-terWengel P Nolte VHermisson J Schl eurootterer C 2014 Massive habitat-specific ge-nomic response in D melanogaster populations during exper-imental evolution in hot and cold environments Mol Biol Evol31364ndash375

Turner TL Bourne EC Von Wettberg EJ Hu TT Nuzhdin SV 2010Population resequencing reveals local adaptation of Arabidopsislyrata to serpentine soils Nat Genet 42260ndash263

Turner TL Miller PM 2012 Investigating natural variation in Drosophilacourtship song by the evolve and resequence approach Genetics191633ndash642

Vermeulen CJ Soslashrensen P Kirilova Gagalova K Loeschcke V 2013Transcriptomic analysis of inbreeding depression in cold-sensitiveDrosophila melanogaster shows upregulation of the immune re-sponse J Evol Biol 261890ndash1902

Wang B Goode J Best J Meltzer J Schilman PE Chen J Garza D ThomasJB Montminy M 2008 The insulin-regulated CREB coactivatorTORC promotes stress resistance in Drosophila Cell Metab7434ndash444

Wang J Kean L Yang J Allan AK Davies SA Herzyk P Dow JA 2004Function-informed transcriptome analysis of Drosophila renaltubule Genome Biol 5R69

Wang K Li M Hakonarson H 2010 Analysing biological pathways ingenome-wide association studies Nat Rev Genet 11843ndash854

Wang L Jia P Wolfinger RD Chen X Zhao Z 2011 Gene set analysis ofgenome-wide association studies methodological issues and per-spectives Genomics 981ndash8

Weber AL Khan GF Magwire MM Tabor CL Mackay TF Anholt RR2012 Genome-wide association analysis of oxidative stress resistancein Drosophila melanogaster PLoS One 7e34745

Weeks AR McKechnie SW Hoffmann AA 2002 Dissecting adaptiveclinal variation markers inversions and sizestress associations inDrosophila melanogaster from a central field population Ecol Lett5756ndash763

Zhu Y Bergland AO Gonzalez J Petrov DA 2012 Empirical validation ofpooled whole genome population re-sequencing in Drosophila mel-anogaster PLoS One 7e41901

15