www.sciencemag.org/content/345/6194/1250091/suppl/DC1 Supplementary Material for Genome interplay in the grain transcriptome of hexaploid bread wheat Matthias Pfeifer, Karl G. Kugler, Simen R. Sandve, Bujie Zhan, Heidi Rudi, Torgeir R. Hvidsten, International Wheat Genome Sequencing Consortium, Klaus F. X. Mayer, Odd-Arne Olsen* *Corresponding author. E-mail: [email protected] Published 18 July 2014, Science 345, 1250091 (2014) DOI: 10.1126/science.1250091 This PDF file includes: Materials and Methods Supplementary Text Figs. S1 to S25 Tables S5 to S31 Full Reference List IWGSC Author List Other Supplementary Material for this manuscript includes the following: (available at www.sciencemag.org/content/345/6194/1250091/suppl/DC1) Tables S1 to S4 as a separate Excel file

Contents: Genome interplay in the grain transcriptome of hexaploid bread wheat
Supplementary Online Materials

Supplementary Text
1: RNA sequencing
2: RNA-seq mapping, gene annotation, and expression profiling
3: Identification of spatiotemporal co-expression clusters
4: Analysis of homeologous gene expression
5: Distribution of gene expression along Triticeae prototype (Tp) chromosomes
6: Analysis of wheat grain quality genes
Supplementary Figures
Supplementary Tables

1: RNA sequencing

Plant growth and endosperm tissue sampling - Seeds from the same variant of bread wheat cv Chinese Spring that was used for generating the reference genome sequence (6) were provided by Bikram Gill, Kansas State University. Plants were grown in phytotron chambers under a 16-hour photoperiod with day and night temperatures of 20°C and 15°C, respectively. Each room contained 75 pots with three plants each. Plants were tagged at anthesis, and the middle parts of the ears were harvested at 10, 20 and 30 days post anthesis (DPA) and stored at -80°C. Grains were covered with RNAlater-ICE (Life Technologies, NY, USA) and dissected under dissecting microscopes using dry ice. Embryos were removed and grains were cut in slices for isolation of the aleurone layer, transfer cells and starchy endosperm. Dissected tissues were frozen in liquid nitrogen and stored at -80°C for subsequent RNA isolation. Tissues from 15 pots per room were pooled before RNA isolation. Two replicate samples were made per room, giving a total of four replicates per tissue and time point.

RNA isolation - Total RNA was extracted from frozen plant material using the RNeasy Lipid Tissue Mini kit (QIAGEN, Hilden, Germany; http://www.qiagen.com/applications/plant). Frozen tissue was ground with a mortar and approximately 120 mg of powdered tissue was sampled. For starchy endosperm and transfer cells, a pre-extraction step to remove starch was carried out as follows: 0.5 ml extraction buffer (50 mM Tris pH 9.0, 200 mM NaCl, 1% Sarcosyl, 20 mM EDTA, and 5 mM DTT) was added to the ground tissue and vortexed before adding 0.5 ml phenol/chloroform/isoamyl alcohol (49:49:2). The solution was vortexed again and spun for 5 minutes at 14,000 RPM in a Sorvall centrifuge at 4°C. The upper 0.5 ml of the aqueous phase was sampled. For all tissue types we then continued with lysis in 1 ml QIAzol lysis reagent (QIAGEN, Hilden, Germany). Two hundred microliters of chloroform were added and the sample was centrifuged. The upper aqueous phase was isolated, mixed with 1.5 volumes of 100% ethanol, vortexed, and applied to RNeasy mini spin columns, where DNaseI treatment was performed to remove genomic DNA. The column was washed with buffer RPE, dried, and eluted in 45 μl of RNase-free water. RNA concentration was measured using a Nanodrop 8000 spectrophotometer (ND8000, Thermo Scientific, Wilmington, USA). RNA integrity was assessed on an Agilent 2100 Bioanalyzer (DE54704553, Agilent Technologies, Inc., CA, USA) using an RNA 6000 LabChip kit. RNA samples were stored at -80°C until sent for sequencing.

Transcriptome sequencing - Paired-end (PE) sequencing libraries with an average insert size of 200 bp were prepared with the TruSeq RNA Sample Preparation Kit v2 (Illumina, San Diego, USA) and sequenced on a HiSeq2000 (Illumina, San Diego, USA) according to the manufacturer’s standard protocols. Raw RNA reads were de-multiplexed and filtered for contamination and low-quality reads with the fastx toolkit (http://hannonlab.cshl.edu/fastx_toolkit/). Table S5 provides an overview of the sequencing data set.

2: RNA-seq mapping, gene annotation, and expression profiling

Mapping of RNA-seq short reads to the IWGSC genome reference - Illumina PE reads (read length 101 bp) were aligned against the repeat-masked IWGSC sequence survey genome assembly (accessible at http://plants.ensembl.org/Triticum_aestivum, (6)) including all wheat chromosome arms using Bowtie2 (version 2.1.0, (35)) and TopHat (version 2.0.8, (36)), allowing a maximum of 2 mismatches per read alignment (parameters --read-mismatches 2 --segment-mismatches 1 --max-multihits 20 --r 0) (Figure S1).

Filtering of TopHat alignments - All RNA-seq reads were subject to stringent filtering in order to avoid biased expression estimates due to spurious assignment of transcriptome sequences to the incorrect wheat genome. Informative RNA-seq alignments were identified by screening the “accepted_hits.bam” files returned by TopHat using a custom perl script. Based on the mapping information of both reads, the read pairs were categorized into nine alignment scenarios and filtered according to the following rules (Figure S2):

1 & 2. Singletons (only one read of a pair mapped) are accepted in case of unambiguous alignments (1); multiple aligned reads are discarded (2).

3 & 4. Read pair alignment is accepted if both reads are mapped unambiguously to the same contig of the wheat genome assembly or to different contigs of the same chromosome arm (i.e. within the same genome).

5. Read pair alignment is accepted if the individual reads are mapped to contigs of different chromosome arms (i.e. genomes).

6. All alignments of a read pair are discarded if both reads are mapped ambiguously.

7. If one end of a read pair is mapped uniquely to contig X and the other read end is mapped ambiguously to contig X, accept the read pair alignment on contig X. All other alignment combinations are discarded.

8. If one end of a read pair is mapped uniquely to contig Y and the other read is mapped ambiguously but only once to a contig Z on the same chromosome arm, accept the alignment to contig Y and Z. All other alignments are discarded.

9. If one read is mapped uniquely and the other ambiguously, but never to a contig on the same chromosome arm, discard all alignments.
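The nine scenarios above can be sketched as a small decision function. This is only an illustration, not the custom perl script used in the study: the function name, the `(contig, chromosome_arm)` tuple representation, and the handling of cases the rules do not spell out (for example, several ambiguous hits on the same contig) are assumptions.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: one alignment = (contig, chromosome_arm).
Hit = Tuple[str, str]


def filter_pair(hits1: List[Hit], hits2: List[Hit]) -> Optional[List[Hit]]:
    """Return the accepted alignments of a read pair, or None if it is discarded."""
    # Scenarios 1 & 2: singleton, i.e. only one mate has any alignment.
    if not hits1 or not hits2:
        hits = hits1 or hits2
        return list(hits) if len(hits) == 1 else None

    unique1, unique2 = len(hits1) == 1, len(hits2) == 1

    # Scenarios 3-5: both mates map uniquely; all three are accepted as written.
    if unique1 and unique2:
        return [hits1[0], hits2[0]]

    # Scenario 6: both mates map ambiguously.
    if not unique1 and not unique2:
        return None

    # Scenarios 7-9: one unique mate (the anchor) and one ambiguous mate.
    anchor = hits1[0] if unique1 else hits2[0]
    others = hits2 if unique1 else hits1

    # Scenario 7: keep the ambiguous mate's hit on the anchor's contig.
    # Assumption: exactly one such hit is required.
    same_contig = [h for h in others if h[0] == anchor[0]]
    if len(same_contig) == 1:
        return [anchor, same_contig[0]]

    # Scenario 8: exactly one hit on another contig of the same chromosome arm.
    same_arm = [h for h in others if h[1] == anchor[1] and h[0] != anchor[0]]
    if not same_contig and len(same_arm) == 1:
        return [anchor, same_arm[0]]

    # Scenario 9 and all remaining combinations: discard.
    return None
```

For example, a uniquely mapped mate on contig `c1` (arm `1AL`) paired with a mate that hits both `c2` on `1AL` and `c9` on `1BL` would be kept as the `c1`/`c2` pair under scenario 8.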

Refinement of the IWGSC gene annotation by incorporation of the endosperm transcriptome sequencing data - The IWGSC reference gene annotation (6) provides the backbone for our endosperm transcriptome analysis. However, the present RNA-seq data set facilitates a more detailed screening for genes and alternative splicing variants active in endosperm development. Filtered alignments of RNA-seq reads of the four individual biological replicates (2 rooms x 2 plants) were merged for each sample and cufflinks (version 2.0.2, (37)) was applied to assemble transcript structures. The IWGSC reference gene annotation was supplied as additional assembly information (parameter ‘-g’). Individual predicted transcript models were combined with cuffcompare (v2.0.2, (37)), which takes the cufflinks output structures and creates a non-redundant set of transcript structures by clustering redundant transcript structures that were predicted for two or more samples and share an identical intron structure. The nucleotide sequences of novel assembled transcripts were extracted from the genome assembly based on the exon coordinates. For these transcripts, putative open reading frames (ORFs) were predicted using the OrfPredictor software (55). Sequence homology information was supplied to OrfPredictor for all transcript structures located at previously unknown loci (BLASTX against a combined plant protein data set including Brachypodium, rice, sorghum, maize and Arabidopsis protein sequences (40, 47-49, 56); e-value cutoff 1e-5) and the strand direction was passed to the assembled transcript structures. For all novel transcripts located at known loci, the strand direction was inferred from the IWGSC reference gene annotation before protein sequence prediction. All novel predicted gene loci were subject to a stringent confidence class assignment using the same multi-level peptide-homology analysis that was applied for the IWGSC gene annotation (6). Transcripts of each gene locus were compared against known plant protein sequence data sets, and predicted genes were assigned to four high-confidence levels (HC1 to HC4) based on sequence similarity and template gene coverage (Table S6). The individual levels represent decreasing protein-coding reliability: HC1 genes cover reference proteins with at least 70% coverage, while HC4 genes cover less than 30% (6). For all further analyses we considered HC1-3 wheat genes only, as HC4 genes are more likely to be (deteriorated) gene fragments or pseudogenes and have been shown to be less expressed than HC1-3 genes across multiple organs (6). This was also confirmed by our RNA-seq data from endosperm development (Figure S3).

Gene ontology analysis - Gene ontology (GO (10)) terms were taken from the IWGSC bread wheat annotation (6). To test for over- and under-representation of GO terms we used hypergeometric tests as implemented in the R/Bioconductor package GOstats (44).
Terms with a p-value below 0.05, computed by taking the structure of the GO graph into account (conditional=TRUE), were considered significantly enriched.

Identification of homeologous triplets - The orthologous gene pairs between two genomes were defined using the best-hits approach (57). Bread wheat genes of the A, B and D genomes were compared against each other using BLASTP (e-value cutoff 1e-5), considering only alignments with a minimum of 90% sequence similarity. We identified a total of 6,576 triplets of homeologous genes. The distribution of homeologous triplets for different pairs of chromosomes and the structural arrangement in the wheat genome is shown in Figure S4. To evaluate the functional representativeness of the selected triplet genes, we compared their functional distribution against that of the entire wheat gene space. To this end, we determined the distribution of GOslim molecular function categories, which were derived using the R/Bioconductor GSEAbase package (version 1.24.0) and the goslim_plant.obo file provided by the same package. We then compared this distribution against 1,000 distributions derived by randomly sampling 19,728 genes from the bread wheat gene space (Figure S5).

Reproducibility of expression measures - Gene expression levels were estimated for each sample as well as for each biological replicate using cufflinks (version 2.0.2; parameters -G wheat_annotation.gtf -b wheat_reference.fa, (37)). We then quantified the correlation between pairs of samples using Pearson’s correlation coefficient on log2(FPKM+1)-transformed data. On average we observed a correlation of 0.93 between biological replicates from the same condition and 0.88 across rooms (Table S7). Between technical replicates of the sample 20 DPA AL replicate 1/room 1, we observed a mean correlation of 0.95. Principal component analysis indicated an overall good agreement between sample and room replicates (Figure S6). However, for samples from 20 DPA W (room 2) and 30 DPA ALSE (room 2) we observed an unexpected sample clustering, which was also reflected by the corresponding correlation coefficients. As we could not exclude a potential swap of sample labels, we decided not to include these dubious samples in the analysis.

Calculation of expression levels and identification of differentially expressed genes - We applied cuffdiff (version 2.0.2; parameters: -N -b wheat-reference.fa, (37)) to calculate FPKM (Fragments Per Kilobase of transcript per Million mapped reads) gene and transcript expression levels of high-confidence bread wheat genes and to identify significantly differentially expressed (DE) genes. Genes with a reported FDR-adjusted p-value below 0.05 were considered to show a statistically significant difference between conditions. For the estimation of expression levels and variation, biological replicates and rooms were combined for each condition. Because counting every gene with FPKM greater than zero as expressed would include genes with expression levels very close to zero, this approach is expected to overestimate the number of expressed genes. To remove false positives, we defined a lower limit of 0.02 FPKM based on the mean 10th percentile of the calculated gene expression levels for HC1-3 genes across all tested conditions (Table S8).

Validation of tissue-specific gene expression and tissue purity - To assess the purity of the tissues and the quality of the dissection we made use of tissue marker genes. Ltp2 is an aleurone-specific gene, which we expected to find in aleurone-containing cells at 20 DPA and which is not expressed at 30 DPA.
Indeed, we observed expression of this gene in AL and TC cells at 20 DPA only (Figure S7A). The transfer cell-specific marker gene end1 showed a substantial enrichment in the TC sample at 20 DPA (Figure S7B). Figure 5 of the main text illustrates that the aleurone cell samples do not contain an appreciable amount of the highly abundant storage proteins found in the starchy endosperm.

Validation of gene expression measurements - We evaluated the accuracy of gene expression measurement by an in silico simulation of a wheat RNA-seq experiment (Figure S8A) using FluxSimulator (version 1.2, (58)). Illumina-like artificial short read pairs (parameters: 101 bp read length; 200 bp paired-end insertion size; FluxSimulator error model for Illumina sequencing reads; random gene expression levels) were generated based on the transcript structure coordinates of the bread wheat annotation and the IWGSC reference genome assembly. The simulated reads were aligned to the reference genome sequence with the same parameters and tools as the experimental RNA-seq data and further filtered as described above (supplementary text section 2). Ninety-four percent (16.5 million) of the 17.5 million simulated RNA-seq read pairs were aligned back to the bread wheat genome assembly. The read filtering step removed a number of simulated reads comparable to that removed from the real expression data. After filtering, more than 99% of RNA-seq reads were aligned correctly to the contig of origin in the wheat genome reference assembly (Figure S8B). To compare the simulated gene expression levels with estimated ones, cufflinks (version 2.0.2, (37)) was used to calculate FPKM values based on the simulated filtered RNA-seq alignments. A polynomial regression fit of the simulated versus estimated FPKM values was carried out using the loess.smooth function implemented in the R environment (span=0.2), both on the complete gene set and on genes of the A, B, and D genomes that form strictly homeologous triplets (see supplementary text section 2). In general we observe good agreement between simulated expression levels and the observed measurements (Figure S8C); however, cuffdiff underestimates gene expression for low-abundance genes. Hence, our FPKM measurement pipeline does not appear to introduce genome-specific bias, which is essential for a reliable analysis of differences in gene expression of homeologous genes.

Hierarchical clustering of samples - Samples were subjected to hierarchical clustering using the correlation distance (1 - Pearson’s correlation coefficient) for log2(FPKM+1)-transformed expression values and average linkage. To assess the significance of the clustering we made use of the pvclust package (59) in R with 1,000 bootstrap resamplings, which provides the approximately unbiased (au) and the bootstrap probability (bp) p-values. Whereas the bp value is determined by normal bootstrap resampling, the au value is calculated by multiscale bootstrap resampling and constitutes an improved significance estimate (59).

Preferentially expressed genes - Preferentially expressed genes (PEG) are genes that tend to be expressed at a higher level at certain developmental stages or in certain tissues compared to other developmental stages or tissues. To select these genes we applied two criteria:

- Selection based on comparing the 95% confidence interval (CI) of gene expression strength (measured in FPKM) between two sample groups as provided by Cufflinks. The lower bound of the 95% CI of group 1 had to be larger than the upper bound of the 95% CI of group 2.

- Differential expression analysis between group 1 and group 2 using cuffdiff. Significant differences were defined as FDR-adjusted p-values below 0.05.

Group definitions are given in Table S9.
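The two PEG criteria can be sketched as a single predicate. The record type and function name below are hypothetical; the inputs are assumed to be the 95% CI bounds reported by Cufflinks for each group and the FDR-adjusted p-value reported by cuffdiff for the comparison.

```python
from typing import NamedTuple


class GroupExpression(NamedTuple):
    """95% confidence interval of a gene's expression in one sample group (FPKM)."""
    ci_low: float
    ci_high: float


def is_preferentially_expressed(
    group1: GroupExpression,
    group2: GroupExpression,
    fdr_adjusted_p: float,
    alpha: float = 0.05,
) -> bool:
    """True if the gene is preferentially expressed in group 1 relative to group 2."""
    # Criterion 1: the confidence intervals do not overlap and group 1 is higher.
    ci_separated = group1.ci_low > group2.ci_high
    # Criterion 2: the differential expression test is significant.
    significant = fdr_adjusted_p < alpha
    return ci_separated and significant
```

Both criteria must hold, so a gene with clearly separated confidence intervals but an adjusted p-value of 0.2 would not be selected.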

3: Identification of spatiotemporal co-expression clusters

k-means co-expression clustering - To identify genes that exhibit similar gene expression across the different spatiotemporal samples, gene expression data were clustered using the k-means clustering algorithm as implemented in R. FPKM expression values of wheat genes expressed under at least one condition were log2-transformed (log2(FPKM+1)) and combined in a single matrix. To determine the appropriate group size k, we repeated the k-means clustering seven times with k ranging between 6 and 13 (maximum 200 iterations [iter.max=200]). For each iteration we calculated the silhouette coefficient (60) according to the given clustering using the silhouette function in the R package cluster. Large silhouette values (close to 1) indicate strong clustering, small values (around 0) indicate that data points fall between two clusters, and negative silhouette values are associated with uncertain clustering of observations. We determined the mean silhouette coefficient over all clusters, discarding poorly defined clusters with a negative silhouette coefficient (Figure S9A). Based on the maximum silhouette value, we selected an initial cluster number of k=10 as most appropriate for the k-means clustering. According to the silhouette coefficient, seven groups showed well-defined cluster assignments and were used in further analysis, whereas three groups with negative silhouette coefficients, representing poorly defined co-expression clusters, were discarded (Figure S9B). As shown in Figure S9, the detected co-expression clusters with a positive average silhouette coefficient show distinct gene expression patterns that characterize the spatiotemporal endosperm samples. In contrast, cluster 0, which represents the poorly defined k-means clusters, shows similar expression levels over all samples. Similar numbers of genes from the A, B, and D genomes were clustered within each co-expression module. Moreover, we observed comparable gene expression levels between genomes, indicating an overall balanced contribution of the three genomes to total expression levels. Additionally, to link co-expression with a functional assignment, we tested whether clusters were enriched for PEGs using Pearson’s chi-squared test. A cluster was annotated as enriched for PEGs if the Bonferroni-adjusted p-value was smaller than 0.05 (Figure S11). We also find that the contribution of the genomes is about equal within each cluster. GO enrichments were performed for each cluster. The resulting overrepresented biological processes and molecular functions are summarized in Table S2.
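The silhouette-based evaluation of a clustering can be sketched as follows. This is a plain-Python illustration rather than the R `cluster::silhouette` call used in the study; Euclidean distance, the convention of assigning 0 to singleton clusters, and all function names are assumptions. The input is any given cluster assignment (for example, the output of a k-means run), and at least two clusters are required.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence

Point = Sequence[float]


def silhouette_values(points: Sequence[Point], labels: Sequence[Hashable]) -> List[float]:
    """Silhouette value s(i) = (b - a) / max(a, b) for every point."""
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)

    values = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            values.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


def cluster_means(values: Sequence[float], labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Average silhouette value per cluster."""
    grouped = defaultdict(list)
    for value, label in zip(values, labels):
        grouped[label].append(value)
    return {label: sum(vals) / len(vals) for label, vals in grouped.items()}


def score_clustering(points: Sequence[Point], labels: Sequence[Hashable]) -> float:
    """Mean silhouette over clusters, ignoring clusters with a negative mean."""
    means = cluster_means(silhouette_values(points, labels), labels)
    kept = [m for m in means.values() if m >= 0]
    return sum(kept) / len(kept) if kept else float("nan")
```

Scoring the assignments obtained for each candidate k with `score_clustering` and keeping the k with the highest score mirrors the selection step described above.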

Analysis of expression transitions of homeologs between co-expression clusters - To investigate the spatiotemporal contribution of homeologs to the endosperm transcriptome, we analyzed the assignment of homeologous single-copy genes (triplets) to the identified co-expression clusters (Figure S12). For 637 triplets, no homeolog was assigned to any co-expression cluster. One or two homeologs were assigned to a cluster in 438 and 589 triplets, respectively. All three homeologs were assigned to co-expression clusters for 4,912 triplets. We then tallied the number of expression transitions (i.e. different cluster assignments) for pairs of homeologous genes. Numbers for transitions between A and B, A and D, and B and D are given in Tables S10 to S12, respectively. For analyzing cluster transitions, we used the sum of transitions (A<->B, A<->D, B<->D) between clusters. Significance was determined using a one-sided Fisher's exact test for transitions from one cluster to another. Within-cluster transitions and transitions involving cluster 0 were not included in the analysis. Significant transitions had Bonferroni-adjusted p-values below 0.05. Tallied transitions and the corresponding p-values are given in Table S13.
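The tallying step can be sketched as below. The input layout (one dictionary per triplet mapping genome to cluster id, with `None` for an unassigned homeolog) and the choice to count transitions as unordered cluster pairs are assumptions made for illustration; the significance test is not reproduced here.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Optional, Tuple


def tally_transitions(
    triplets: Iterable[Dict[str, Optional[int]]],
) -> Counter:
    """Count between-cluster transitions summed over A<->B, A<->D and B<->D."""
    counts: Counter = Counter()
    for assignment in triplets:
        for g1, g2 in combinations("ABD", 2):
            c1, c2 = assignment.get(g1), assignment.get(g2)
            if c1 is None or c2 is None:
                continue  # homeolog not assigned to any cluster
            if c1 == c2 or 0 in (c1, c2):
                continue  # skip within-cluster pairs and cluster 0
            pair: Tuple[int, int] = (min(c1, c2), max(c1, c2))
            counts[pair] += 1
    return counts
```

A triplet with the A homeolog in cluster 1 and the B and D homeologs in cluster 2 contributes two transitions to the pair (1, 2).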

Additionally, we inspected the transition behavior of 3,746 homeologous gene pairs. These are pairs of genes for which we could find homeologs in only two out of three genomes (AB, BD, AD). Homeologous pairs were extracted from an OrthoMCL clustering, which is described in section 7. We found that for 1,840 pairs both copies are expressed, with 867 (47%) being placed into the same k-means co-expression cluster (Table S14). We then calculated the number of between-cluster transitions (Table S15). Again, we found transitions between similar clusters to be more likely (Table S16) than transitions between clusters of highly divergent expression, although due to the low number of transitions the results are less clear than those for the homeologous triplets.

4: Analysis of homeologous gene expression

Co-expression network inference and extraction of modules - To study differences in the expression patterns between homeologous genes, we analyzed expression variation among strict single-copy homeologous triplets (Figure S13). Triplet expression vectors were created by concatenating the observed gene expression (log2(FPKM+1)) values for the A, B and D homeologs and were combined in a triplet expression matrix. We then inferred a weighted undirected co-expression network using the WGCNA method (42) with a soft thresholding power of 12 (Figure S13A). Next, groups of closely connected genes, so-called modules, were identified by clustering genes based on the topological overlap matrix (61) and cutting the resulting dendrogram with the cutreeDynamic method (43) (parameters: deepSplit=2, pamRespectsDendro=FALSE, minModuleSize=50). Non-module genes were summarized in an artificial “grey” module (42). Initial modules whose expression profiles were very similar (eigengene correlation ≥0.75) were merged (Figure S13B,C). Gene expression distributions are visualized in Figure S14 and the corresponding module sizes are listed in Table S4. For visualization in Fig. 3A, the weighted network was exported with an adjacency threshold of 0.1 and nodes were arranged with the “edge-weighted force directed layout algorithm” implemented in Cytoscape (62).

For each co-expression module, we determined significantly overrepresented biological processes and molecular functions (Table S3). Moreover, we compared the distribution of the significantly enriched biological processes for modules dominated by the A, B or D genome (Fig. 3). To this end, we computed the two-dimensional semantic distribution of biological process GO terms that were significantly overrepresented in any transcriptional group using the REVIGO webserver (16). Separately for each genome, we then colored those terms green, purple or orange that are significantly overrepresented in transcriptional groups dominated by the A, B or D genome, respectively.

Analysis of autonomous genome expression regulation - The triplet expression matrix was subjected to hierarchical clustering using the hclust function implemented in R with correlation distance and clustering method “average”. To estimate the uncertainty in the hierarchical clustering for the genome-specific samples (Figure S15), we additionally applied the pvclust command of the R package pvclust (59) with the same clustering parameters and applied bootstrapping (1,000 bootstrap replications). The heat map shown in Figure S15B was created using the heatmap.2 command from the R package gplots. Additionally, we conducted principal component analysis (PCA) using the command prcomp in R (parameter: scale=TRUE) (Figure S16).

Differentially expressed homeologous genes - To identify significantly differentially expressed (DE) homeologous genes, i.e. homeologs that are more highly expressed in one or more tissues, log2 fold changes were calculated for all pairwise comparisons of homeologs in each sampled condition. To determine significance thresholds for DE homeologs, we randomly resampled FPKM values (1,000 iterations) for each genome and determined the 5th and 95th percentiles of log2 fold changes between pairs of homeologs, which corresponds to a one-sided significance test with a significance threshold of 0.05. For each endosperm sample, the number of DE homeologs that were determined is shown in Table S17.

Genome dominance of co-expression modules - Besides comparison of gene expression levels, the genome dominance of each module was assessed using an enrichment test for significantly high numbers of DE homeologous genes, applying a one-sided Fisher’s exact test (Bonferroni corrected, p-value <0.05) (Table S4). To visualize genome dominance in Fig. 3A, the network nodes were colored using a weighted mean based on the genome-specific average expression across all samples.

Assessment of cell type and stage-specificity for network modules and gene centrality - To assign cell type and stage specificity to the network modules, the module eigengenes were correlated to cell type profile vectors using the corPvalueStudent function provided by the R package WGCNA (42). Assignment was then based on positive and significant correlations (Pearson’s correlation coefficient; Student’s t-test; p-value <0.05) and evaluation of module-wise expression patterns (Table S4).

Analysis of network centrality and hub genes - To quantify gene centrality we made use of the graph strength, which is a weighted form of the degree centrality, as implemented in the igraph package (63). Genes within the top 10th percentile of this centrality measure were selected as hub genes. The number of hub genes per module is given in Table S4. Module enrichment for hub genes was assessed using one-sided Fisher’s exact tests (Bonferroni corrected p-value <0.05).
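The hub-gene selection and the module enrichment test can be sketched in plain Python. Graph strength is computed as the sum of the weights of a node's edges; the rank-based handling of the top 10% (including the rounding) and the layout of the enrichment table (hub genes inside versus outside a module) are assumptions, and the function names are illustrative rather than igraph or R calls.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def graph_strength(edges: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """Sum of adjacent edge weights per node of an undirected weighted network."""
    strength: Dict[str, float] = defaultdict(float)
    for gene_a, gene_b, weight in edges:
        strength[gene_a] += weight
        strength[gene_b] += weight
    return dict(strength)


def hub_genes(strength: Dict[str, float], top_fraction: float = 0.10) -> Set[str]:
    """Genes in the top fraction of the strength ranking (at least one gene)."""
    ranked = sorted(strength, key=strength.get, reverse=True)
    n_hubs = max(1, math.ceil(len(ranked) * top_fraction))
    return set(ranked[:n_hubs])


def fisher_enrichment_p(
    hubs_in_module: int, module_size: int, total_hubs: int, total_genes: int
) -> float:
    """One-sided Fisher's exact test: P(at least this many hubs in the module)."""
    upper = min(module_size, total_hubs)
    favourable = sum(
        math.comb(total_hubs, k) * math.comb(total_genes - total_hubs, module_size - k)
        for k in range(hubs_in_module, upper + 1)
    )
    return favourable / math.comb(total_genes, module_size)


def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)
```

A module would then be called enriched for hub genes if `bonferroni(fisher_enrichment_p(...), n_modules)` falls below 0.05.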

Comparison of transcription-based and sequence-based differences of homeologous genes - To test for indications of associations between transcriptional differences of homeologous genes and protein-coding sequence evolution, we performed a sequence divergence analysis based on the number of nonsynonymous substitutions per nonsynonymous site (KA) and synonymous substitutions per synonymous site (KS) between homeologous gene pairs. While KA is a proxy for divergence in protein sequences, KS allows the investigation of evolutionary relationships and distances between homeologs. We used pairwise BLASTP analysis to determine protein alignments between the above identified homeologous wheat genes of the A, B, and D genomes for each triplet. The best-scoring alignments between gene pairs were filtered and KA, KS, and KA/KS values were estimated using the yn00 module of the PAML 4 suite (46). Then, we compared the distributions of transcription-based differences and sequence-based differences of homeologous genes (Figure S17, Figure S18). Expression divergences were determined based on the Pearson’s correlation of log2(FPKM+1)-transformed expression values and on log2 fold changes of mean expression across all tissues. Significant differences between distributions were computed using the Wilcoxon-Mann-Whitney test in R.
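The two expression divergence measures for a pair of homeologs can be sketched as follows. The function names are illustrative, and adding a pseudocount of 1 to the mean FPKM values before taking the log2 ratio is an assumption made here to avoid division by zero; the source does not state how the fold change was stabilized. Profiles with zero variance are not handled.

```python
import math
from typing import List, Sequence, Tuple


def log2_fpkm(values: Sequence[float]) -> List[float]:
    """log2(FPKM + 1) transformation."""
    return [math.log2(v + 1.0) for v in values]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def expression_divergence(
    fpkm_1: Sequence[float], fpkm_2: Sequence[float]
) -> Tuple[float, float]:
    """Return (correlation of log2 profiles, log2 fold change of mean expression)."""
    correlation = pearson(log2_fpkm(fpkm_1), log2_fpkm(fpkm_2))
    mean_1 = sum(fpkm_1) / len(fpkm_1)
    mean_2 = sum(fpkm_2) / len(fpkm_2)
    # Assumption: pseudocount of 1 on both means.
    log2_fold_change = math.log2((mean_1 + 1.0) / (mean_2 + 1.0))
    return correlation, log2_fold_change
```

Two homeologs with the same expression pattern across tissues but different overall levels would yield a correlation near 1 and a nonzero log2 fold change.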

5: Distribution of gene expression along Triticeae prototype (Tp) chromosomes

Creation of Triticeae prototype chromosome scaffolds - High conservation of synteny between different grass genomes and the availability of high-quality reference grass genome sequences facilitate the approximation of the linear gene order (23, 50, 64). To position the wheat genes of the A, B, and D genomes in a common sequential ordering, we generated seven chromosome scaffolds, named the Triticeae prototype (Tp) chromosomes, by integration of syntenic genes of Brachypodium (Brachypodium distachyon), rice (Oryza sativa) and sorghum (Sorghum bicolor) (Figure S19). The bread wheat genes were anchored along these scaffolds, which allows a direct comparative analysis of chromosomal characteristics between the three genomes, discounting genome-specific rearrangements. Because the close evolutionary relationship of barley and wheat (65) results in large conservation of genome structure, we utilized the ordering of more than 21,000 barley genes (50) as a proxy to identify syntenic blocks in Brachypodium, rice and sorghum, which also reflect the ancestral genomic organization of the three wheat genomes (Figure S20). We extracted a total of 21,956 Brachypodium, 22,916 rice and 20,738 sorghum genes located in syntenic blocks that are unambiguously assigned to one barley chromosome, and assigned these to the corresponding Tp chromosomes (Table S18). The assigned genes were arranged based on their ordering in the reference genomes by applying the principle of “closest evolutionary distance”, starting with Brachypodium and successively using rice and sorghum according to the following rules (Figure S19):

1. Arrange all syntenic Brachypodium genes according to their ordering in the Brachypodium genome for each block. Then, all blocks are concatenated as deduced from the comparison to the barley genome.

2. Best bidirectional BLAST hits (BBH) between all rice and Brachypodium genes were determined, and all syntenic rice genes were anchored to their Brachypodium ortholog in the scaffold. Remaining syntenic rice genes (no BBH) were assigned to the anchored rice genes based on the minimal genomic distance.

3. These genes were anchored along the Tp scaffolds ordered by genomic position relative to that of the anchored genes.

4. Steps two and three are repeated for syntenic sorghum genes (first considering BBH to Brachypodium, then to rice).
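Two building blocks of the rules above, the best bidirectional hit test and the assignment of a non-BBH gene to its nearest anchored neighbour, can be sketched in Python as follows. This is an illustration under assumptions, not the pipeline used in the study: the function names, the dictionary layout and the gene identifiers are hypothetical, and genomic distance is simplified to the absolute difference of positions on the same chromosome.

```python
def best_bidirectional_hits(best_a_to_b, best_b_to_a):
    """Return {a: b} for gene pairs that are each other's best hit.

    best_a_to_b maps each gene of species A to its best hit in species B,
    best_b_to_a maps each gene of species B to its best hit in species A.
    """
    return {a: b for a, b in best_a_to_b.items() if best_b_to_a.get(b) == a}


def nearest_anchored_gene(chrom, pos, anchored):
    """Pick the anchored gene on the same chromosome with minimal distance.

    anchored maps gene id -> (chromosome, position) in the reference genome.
    Returns None when no anchored gene lies on that chromosome.
    """
    candidates = [
        (abs(p - pos), gene)
        for gene, (c, p) in anchored.items()
        if c == chrom
    ]
    return min(candidates)[1] if candidates else None


# Hypothetical example: Os3 has a best hit in Brachypodium, but it is not reciprocal.
rice_to_brachy = {"Os1": "Bd1", "Os2": "Bd2", "Os3": "Bd2"}
brachy_to_rice = {"Bd1": "Os1", "Bd2": "Os2"}
bbh = best_bidirectional_hits(rice_to_brachy, brachy_to_rice)

# Os3 (chr1, position 250) is therefore attached to its closest anchored rice gene.
anchored_rice = {"Os1": ("chr1", 100), "Os2": ("chr1", 300)}
neighbour = nearest_anchored_gene("chr1", 250, anchored_rice)
```

In this toy input, Os1 and Os2 are anchored through their reciprocal best hits, and Os3 is placed next to Os2, the closest anchored rice gene on the same chromosome.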

Overall, we identified between 4,133 (Tp chromosome 6) and 6,169 (Tp chromosome 2) loci per Triticeae prototype chromosome, of which approximately one third are supported by all three reference genomes (Table S18 and Figure S21).

Ordering bread wheat genes along the Triticeae prototype scaffolds - We arranged the bread wheat genes along the seven Triticeae prototype chromosomes based on the prototype scaffolds generated from the Tp-ordered Brachypodium, rice and sorghum genes. To this end, wheat genes from each genome were compared separately against the entire protein sets of Brachypodium, rice and sorghum using BLASTP, considering only the first-best reported alignment. We required a minimum alignment similarity of 65% and a minimum alignment length of 30 amino acids. Following the principle of nearest evolutionary distance, wheat genes were anchored to the matched Brachypodium, rice or sorghum gene that had previously been integrated into the prototype scaffolds.
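The hit selection described above can be sketched as a small Python filter. The sketch makes several assumptions that the text does not settle: alignments are given as (query, subject, similarity in percent, alignment length) tuples in the order BLASTP reported them, only the first row per query is inspected, and the thresholds are applied to that row afterwards. The function name is hypothetical.

```python
def first_best_hits(rows, min_similarity=65.0, min_length=30):
    """Keep the first reported alignment per query if it passes both thresholds.

    rows: iterable of (query, subject, similarity_percent, alignment_length),
    assumed to be in the order reported by BLASTP (best alignment first).
    Returns {query: subject}; queries whose first alignment fails are dropped.
    """
    hits = {}
    seen = set()
    for query, subject, similarity, length in rows:
        if query in seen:
            continue  # only the first-best alignment of each query is considered
        seen.add(query)
        if similarity >= min_similarity and length >= min_length:
            hits[query] = subject
    return hits


# Hypothetical identifiers, for illustration only.
example_rows = [
    ("wheat1", "Bd1", 80.0, 120),
    ("wheat1", "Bd9", 95.0, 300),  # ignored: not the first alignment of wheat1
    ("wheat2", "Os3", 60.0, 200),  # rejected: similarity below 65%
    ("wheat3", "Sb2", 70.0, 25),   # rejected: shorter than 30 aa
    ("wheat4", "Sb7", 65.0, 30),   # accepted: exactly at both thresholds
]
anchors = first_best_hits(example_rows)
```

Applying the thresholds before picking the best row would instead let a weaker but passing alignment anchor a gene; which variant matches the original analysis is not stated here.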


As shown in Figure S22, we anchored a total of 57,903 bread wheat genes to the seven Tp chromosomes. Generally, genes of each wheat chromosome were assigned to the corresponding Triticeae prototype chromosome (Figure S23), but we also observed previously known structural rearrangements in the wheat genomes (e.g. the reciprocal translocation between 4AL/5AL and the translocation 7BS/4AL) (66-68). Depending on the number of anchored genes, the total number of wheat genes varies between windows. However, we did not observe any large bias in the number of anchored genes per window between genomes. Small local regions with an extraordinary number of anchored genes are caused by Brachypodium, rice and sorghum genes that are classified as transposable elements and thus attract an increased number of anchored bread wheat genes from all chromosomes. Additionally, we observed a novel deletion on the short arm of chromosome 6D (see also supplementary text section 6).

Overall, we found high agreement between the positional ordering of genes for each prototype chromosome and the gene ordering in bread wheat, which was generated by the IWGSC based on the GenomeZipper approach (Figure S24) (6, 64). More than 60% of the anchored genes were positioned by both approaches in almost complete co-linearity. However, small-scale, genome-specific interruptions of co-linearity, intra-chromosomal inversions (e.g. chromosome 4A) and translocations (e.g. 7BS/4AL) underpin the necessity of projecting the wheat genes onto the Tp chromosomes as a common ordering for further comparative analysis between the A, B, and D genomes.

Measuring gene expression along the Triticeae prototype chromosomes - To monitor gene expression along the Triticeae prototype chromosomes, we implemented a sliding window approach using customized Perl scripts. We identified all bread wheat genes anchored to Tp loci in a window of 50 loci. For these genes, the median gene expression was calculated and visualized. We applied a window shift size of 10 Tp loci. For each endosperm sample, the median gene expression distribution along each chromosome is shown in Figure S25. Furthermore, we counted the number of differentially expressed homeologs (see supplementary text section 4: Analysis of homeologous gene expression) and tested each window for enrichment of DE homeologs using a one-sided Fisher's exact test.
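The windowing and the enrichment test can be sketched in Python as below. The original analysis used Perl scripts that are not reproduced here, so everything in this sketch is an assumption: the function names, the input layout (one list of FPKM values per Tp locus), the handling of a chromosome shorter than one window, and the choice of all anchored homeologs of the chromosome as the background of the test.

```python
from math import comb
from statistics import median


def sliding_windows(n_loci, size=50, step=10):
    """Yield (start, end) index pairs (end exclusive) along one chromosome."""
    for start in range(0, max(n_loci - size, 0) + 1, step):
        yield start, min(start + size, n_loci)


def window_medians(expr_by_locus, size=50, step=10):
    """Median expression of all genes anchored within each window.

    expr_by_locus[i] is the list of FPKM values of genes anchored at Tp locus i.
    Windows without any anchored gene yield None.
    """
    medians = []
    for start, end in sliding_windows(len(expr_by_locus), size, step):
        values = [v for locus in expr_by_locus[start:end] for v in locus]
        medians.append(median(values) if values else None)
    return medians


def fisher_enrichment_p(de_in, total_in, de_all, total_all):
    """One-sided Fisher's exact test for enrichment (hypergeometric upper tail).

    de_in / total_in: DE and total homeologs inside the window.
    de_all / total_all: DE and total homeologs in the background.
    Returns P(X >= de_in) when total_in homeologs are drawn without replacement.
    """
    upper = min(total_in, de_all)
    tail = sum(
        comb(de_all, k) * comb(total_all - de_all, total_in - k)
        for k in range(de_in, upper + 1)
    )
    return tail / comb(total_all, total_in)
```

With the parameters given in the text, `sliding_windows(100)` produces six windows starting at loci 0, 10, 20, 30, 40 and 50. The exact test is written out with binomial coefficients to keep the sketch free of third-party dependencies.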


6: Analysis of wheat grain quality genes

OrthoMCL clustering of bread wheat with Arabidopsis and Aegilops tauschii - OrthoMCL software version 2.0 (39) was used to define gene families containing wheat grain quality gene clusters in the bread wheat genomes, Arabidopsis thaliana and Aegilops tauschii. In the first step, pairwise sequence similarities between all input protein sequences were calculated using BLASTP with an e-value cut-off of 1e-05. Then, the protein data sets for the A, B, and D genomes were treated as distinct genomes in OrthoMCL. Markov clustering of the resulting similarity matrix was applied to define the ortholog cluster structure, using an inflation value (-I) of 1.5 (OrthoMCL default).

Identification of grain quality genes in the bread wheat gene annotation - To identify genes that contribute to the unique baking quality of bread wheat, we used corresponding orthologous genes from Aegilops tauschii (41) and publicly available wheat sequence information deposited in the NCBI sequence database. The wheat orthologs were detected by manual BLAST (45) and GenomeThreader (51) searches and by their membership in corresponding OrthoMCL gene families. Based on this evidence and on alignment information of RNA-seq reads generated within this study, some wheat loci were manually curated and the refined transcript and protein sequences were used for further analysis. Names and descriptions of grain quality genes were assigned to the identified proteins according to the corresponding database entries and respective publications (Table S19 - Table S21). Transcripts related to omega-gliadins were detected using the publicly available sequences AF280605 and AB181300 as previously described (34) and corresponding GenomeThreader (51) searches against contigs from homeologous chromosome 1.

Reconstruction of phylogenetic relationships - To investigate the phylogenetic relationships of the genes of each gene family, we used Jalview (version 2.8, (69)) to produce multiple protein sequence alignments (CLUSTALW algorithm, (53)) and constructed phylogenetic trees using the neighbor-joining algorithm and the average percent identity method.

Adjusted calculation of gene expression for wheat grain quality genes - Due to their repetitive domains, some of the identified grain quality genes have only been partially assembled on contigs of the IWGSC reference genome assembly and are not included in the high-confidence wheat gene set. As a consequence, some grain quality genes had to be manually curated (Table S19 - Table S21) and thus were not considered during the genome-wide calculation of gene expression levels using cuffdiff (supplementary section 0). Using the manually curated gene structures, we estimated mRNA abundance for the grain quality genes. To this end, we manually determined gene expression levels for the corresponding gene loci based on the number of mapped RNA-seq reads. In a custom Python script, we applied the functions of the Python package HTSeq (http://www-huber.embl.de/users/anders/HTSeq) to count the number of mapped reads for each grain quality locus across all endosperm samples and replicates, and calculated mRNA abundances in RPKM (reads per kilobase of exon model per million mapped reads) (52).
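For orientation, the RPKM normalisation named above can be written as a one-line Python function. This is a minimal sketch of the standard definition rather than the authors' script; the read counting with HTSeq is not reproduced, and the function and argument names are hypothetical.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads.

    read_count: reads mapped to the exons of the locus.
    exon_length_bp: summed exon length of the gene model in base pairs.
    total_mapped_reads: all mapped reads of the sample.
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and library size must be positive")
    # 1e9 combines the per-kilobase (1e3) and per-million (1e6) scaling factors.
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)
```

As a worked example, 500 reads on a 2,000 bp exon model in a library of 25 million mapped reads give 500 × 10^9 / (2,000 × 25,000,000) = 10 RPKM.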


For each endosperm sample, we determined the final gene expression level as the mean of all replicate RPKM values. The corresponding values and the relative gene-wise contribution to the overall expression of the gene family are listed in Table S25 - Table S31.

Comparison of gene order between Aegilops tauschii and bread wheat - To compare the gene order between Ae. tauschii and bread wheat, we first identified the putative orthologous gene partners in Ae. tauschii for each of the three wheat genomes A, B, and D (41). To this end, protein sequences of annotated bread wheat genes were compared against Ae. tauschii proteins using BLASTP with a minimum alignment length of 30 amino acids and a minimum alignment identity of 90%, considering only the first-best BLAST alignment. Using a custom R script, matches were visualized along the chromosomes using the published Ae. tauschii gene order (41) and the respective GenomeZipper ordering of bread wheat chromosomes (6).


Supplementary Figures

Figure S1. Statistics of RNA-seq read mapping to the bread wheat reference genome assembly. (A) Number of RNA-seq read pairs of which both reads (dark blue), one read (light blue) or no read (grey) aligned against the IWGSC bread wheat reference genome assembly for each RNA-seq sample. (B) Fraction (percentage) of uniquely mapped reads (exactly one mapping location; dark blue) and multiply mapped reads (multiple mapping locations with identical alignment score; light blue).


Figure S2. Classification of RNA-seq read pair mappings into nine alignment scenarios for stringent read filtering. Alignments of RNA-seq read pairs were categorized into nine groups and filtered in order to reduce the impact of spurious mapping of transcriptome sequences on the gene expression estimation. Contigs of the genome assembly are visualized by bold lines, and the coloring depicts chromosome arm/genome assignment. Reads are visualized as arrows, and read pairs are connected by thin lines. The histogram shows the number of read pairs assigned to the corresponding alignment scenario. The pie chart shows the overall number of read pairs that were accepted or discarded for further analysis.


Figure S3. Number of expressed high-confidence (HC) wheat genes during endosperm development. Genes were classified into confidence levels (HC1-4) with decreasing protein-coding reliability based on sequence homology and coverage of reference plant proteins (6).


Figure S4. Number of homeologous triplets across chromosome arms and structural comparison of the positional location of homeologs across the wheat genome. (A) The bar charts show the number of homeologous triplets observed for the most frequent combinations of different chromosome arms. (B) Number of homeologous triplets that were anchored in the wheat GenomeZipper of the A genome, B genome and/or D genome provided by the IWGSC (6). (C) Pairwise comparison of the ordering of homeologous genes between the A, B, and D genomes along the GenomeZipper ordering for individual chromosome arms.


Figure S5. Distribution of homeologous triplet genes across GOslim molecular function categories (red dots) compared to randomly sampled genes from the wheat gene space (boxplots).


Figure S6. Principal component analysis of individual samples.


Figure S7. Expression of selected tissue marker genes. (A) ltp2 is specific for aleurone-containing cells at 20 DPA. (B) end1 is enriched in transfer cells.


Figure S8. Simulation of an RNA-seq experiment to validate homeologous gene expression measures. (A) Workflow of an RNA-seq simulation experiment implemented to validate the homeologous gene expression measures. (B) Fraction of aligned read pairs that were accepted and discarded, respectively, after the filtering step. (C) Comparison of simulated and measured gene expression levels. Dots show single measurements and lines represent a polynomial fit of the data points. The red solid line visualizes the fit of all measurements, whereas dashed lines show the fit considering only genes that form a homeologous triplet, separately for the A (green), B (purple), and D (orange) genomes.


Figure S9. Selection of the most appropriate cluster number and silhouette plot for the final k-means co-expression clustering (k=10). (A) To select the most appropriate cluster number, k-means clustering was repeated for different initial cluster numbers (k) and the mean silhouette coefficient of all well-defined clusters was calculated. (B) The maximum silhouette value was achieved for k-means clustering with k=10. For each cluster, the mean silhouette coefficient is shown to the right. Seven resulting clusters show positive silhouette coefficients (green) and were used in our analysis, whereas three clusters with negative silhouette coefficients (red) were combined into cluster 0 and not used.


Figure S10. Gene expression profiles for the identified k-means co-expression clusters. For each identified k-means cluster the gene expression profiles are visualized considering all clustered genes as well as considering only clustered genes of the A genome, B genome, and D genome, respectively (from left to right). The number of clustered genes is shown in brackets.


Figure S10 (continued).


Figure S11. Enrichment for PEGs and genome distribution within co-expression clusters. For every cluster we tested for enrichment of PEGs. Asterisks over bars indicate significance (Pearson’s chi-squared test; Bonferroni adjusted p-value <0.05). The lower panel illustrates the genome distribution of genes within clusters (A green, B purple, D orange bars).


Figure S12. Diverged co-expression cluster assignments for homeologous gene triplets. The distribution of co-expression cluster assignments was analysed for a total of 6,576 homeologous gene triplets, each consisting of exactly one homeolog in the A genome, B genome, and D genome, respectively. (A) Observed frequency distribution of homeologous gene copies that retain gene expression, are partially silenced (one or two homeologs expressed) or are completely absent from the transcriptome data set during endosperm development (no homeolog detected), compared to a random simulation assuming complete independence between homeologs. (B) All possible clustering scenarios of a triplet are depicted by the illustrations along the x-axis. Genes that were used in the clustering and placed into any co-expression cluster are visualized by filled circles, whereas empty circles visualize non-expressed genes. Grey backgrounds illustrate cluster assignment: homeologs that were placed in the same co-expression cluster are surrounded by a common background. The histogram shows the total number of homeologous triplets observed for each clustering scenario.


Figure S13. Construction of a co-expression network for homeologous gene triplets. (A) A soft power threshold was selected based on the criterion of approximate scale-free topology by testing different candidate thresholds ranging from 2 to 20. (B) Modules of co-expressed genes were inferred from the network, and highly similar initial modules were merged based on the module eigengene correlation analysis. (C) Clustering dendrogram of homeologous gene triplets together with the assigned module colors.


Figure S14. Gene expression profiles for the identified co-expressed modules in the homeologous co-expression network. The coloring of the boxes indicates genomes (A: green, B: purple, D: orange). Numbers in brackets represent module sizes (i.e. number of triplets). The “grey” module (unspecific cluster assignments) is not shown.


Figure S15. Hierarchical clustering analysis of the triplet expression matrix. (A) The triplet expression matrix was subjected to hierarchical clustering analysis according to samples. Green and red numbers show the significance of the hierarchical clustering determined via normal bootstrapping (bootstrap probability, bp) and multiscale bootstrap resampling (approximately unbiased p-value, au), respectively. (B) The heat map visualizes gene expression levels of the two-dimensionally clustered gene expression matrix.


Figure S16. First and second principal component of the triplet expression matrix.


Figure S17. Distribution of gene expression correlation, gene-level dominance and sequence divergence in pairwise comparisons between genomes. Boxplots show (A) the correlation in gene expression measured by Pearson’s correlation coefficient of expression values, (B) mean log2 fold changes over all endosperm samples, (C) evolutionary distances measured by the number of synonymous substitutions per synonymous site (Ks) and (D) protein divergence measured by the number of non-synonymous substitutions per non-synonymous site (Ka) for all pairwise comparisons of homeologous genes. Significant differences between distributions are marked by red stars (Wilcoxon-Mann-Whitney test, p-value <0.01).


Figure S18. For each co-expression group, boxplots visualize the evolutionary distances measured as the rate of synonymous substitutions per synonymous site (Ks) in pairwise comparisons between homeologs of the A, B, and D genomes. Letters in brackets indicate genome dominance in gene expression (A: A genome, B: B genome, D: D genome, N: similar expression in all genomes).


Figure S19. Workflow for generation of Triticeae prototype chromosomes.


Figure S20. Identification of syntenic blocks in the reference species used for generation of the Triticeae prototype chromosomes. For the generation of the Triticeae prototype chromosomes, syntenic blocks in the reference genomes of (A) Brachypodium, (B) rice and (C) sorghum were identified by using the linear gene order of barley as a proxy.


Figure S21. Number of loci shared by Brachypodium, rice and sorghum genes for each Triticeae prototype chromosome. Each Venn diagram shows the number of loci that are supported by genes of Brachypodium, rice and sorghum for each Triticeae prototype chromosome.


Figure S22. Assignment of wheat genes to the Triticeae prototype chromosomes. (A) Histogram visualizing the number of Tp loci at which any wheat gene of the three genomes (HC1-3) was anchored. (B) Venn diagram depicts the overlap of anchoring between the three wheat genomes (number of Tp loci). (C) Number of wheat genes anchored to any Tp chromosome.


Figure S23. Number of bread wheat genes (HC1-3) anchored per window (sliding window with window size 50 Tp loci; window shift 10 Tp loci) for each Triticeae prototype chromosome.


Figure S24. Structural comparison between the positional ordering of bread wheat genes (HC1-3) along the Triticeae prototype chromosomes and the wheat GenomeZippers (6). Separately for the (A) A genome, (B) B genome and (C) D genome, the dot plots visualize the position of bread wheat genes in the seven Triticeae prototype chromosomes and in the bread wheat genome. The Venn diagrams show the number of wheat genes that were anchored by one or both approaches.


Figure S25. Distribution of gene expression levels along Triticeae prototype chromosomes measured using a sliding window algorithm (window size 50 Tp loci; window shift 10 Tp loci). The top three panels show the number of significantly differentially expressed (DE) homeologous triplets between the A and B genomes, the A and D genomes, and the B and D genomes, respectively (FDR <0.1). Dots in the following three panels mark windows that are significantly enriched for DE homeologous triplets (p <0.05). The heat maps show the pairwise log2 fold change of median gene expression of two windows, with increasing color intensity marking a higher fold change towards one genome. The last panel shows the median gene expression (FPKM) for each window. In all panels the A genome is colored green, the B genome purple and the D genome orange. Panels: (A) 10 DPA W; (B) 20 DPA AL; (C) 20 DPA SE; (D) 20 DPA TC; (E) 20 DPA W; (F) 30 DPA ALSE; (G) 30 DPA SE.


Supplementary Tables

Table S1. Gene Ontology enrichment results for preferentially expressed genes. (This table is available as an Excel file on Science Online.)

Table S2. Gene Ontology enrichment results for k-means co-expression clusters. (This table is available as an Excel file on Science Online.)

Table S3. Gene Ontology enrichment results for homeologous triplet clustering. (This table is available as an Excel file on Science Online.)

Table S4. Triplet co-expression network module characteristics. (This table is available as an Excel file on Science Online.)


Table S5. Overview of the wheat endosperm RNA sequencing data set. The table summarizes the data statistics of the endosperm transcriptome sequences. Replicate numbering indicates biological replicates of one sample, and stars mark additionally sequenced technical replicates. Sums (∑) are given in the last row of each sample group.

| Sample | Room | Repl. | Read pairs | Reads | Sequence (bp) | ∑ Read pairs | ∑ Sequence (Gbp) |
|---|---|---|---|---|---|---|---|
| 10 DPA W | 1 | 1 | 20,361,333 | 40,722,666 | 4,112,989,266 | | |
| 10 DPA W | 1 | 2 | 26,791,465 | 53,582,930 | 5,411,875,930 | | |
| 10 DPA W | 2 | 1 | 30,235,123 | 60,470,246 | 6,107,494,846 | | |
| 10 DPA W | 2 | 2 | 33,413,758 | 66,827,516 | 6,749,579,116 | 110,801,679 | 22.38 |
| 20 DPA AL | 1 | 1 | 32,919,785 | 65,839,570 | 6,649,796,570 | | |
| 20 DPA AL | 1 | 2 | 30,833,988 | 61,667,976 | 6,228,465,576 | | |
| 20 DPA AL | 2 | 1 | 27,753,881 | 55,507,762 | 5,606,283,962 | | |
| 20 DPA AL | 2 | 2 | 31,365,012 | 62,730,024 | 6,335,732,424 | 122,872,666 | 24.82 |
| 20 DPA W | 1 | 1 | 34,617,242 | 69,234,484 | 6,992,682,884 | | |
| 20 DPA W | 1 | 2 | 30,517,594 | 61,035,188 | 6,164,553,988 | | |
| 20 DPA W | 2 | 1 | 28,011,277 | 56,022,554 | 5,658,277,954 | | |
| 20 DPA W | 2 | 2 | 32,249,714 | 64,499,428 | 6,514,442,228 | 125,395,827 | 25.33 |
| 20 DPA SE | 1 | 1 | 30,009,734 | 60,019,468 | 6,061,966,268 | | |
| 20 DPA SE | 1 | 2 | 29,714,230 | 59,428,460 | 6,002,274,460 | | |
| 20 DPA SE | 2 | 1 | 26,664,432 | 53,328,864 | 5,386,215,264 | | |
| 20 DPA SE | 2 | 2 | 27,602,634 | 55,205,268 | 5,575,732,068 | 113,991,030 | 23.03 |
| 20 DPA TC | 1 | 1 | 18,586,985 | 37,173,970 | 3,754,570,970 | | |
| 20 DPA TC | 1 | 2 | 31,121,623 | 62,243,246 | 6,286,567,846 | | |
| 20 DPA TC | 2 | 1 | 29,885,904 | 59,771,808 | 6,036,952,608 | | |
| 20 DPA TC | 2 | 2 | 29,668,161 | 59,336,322 | 5,992,968,522 | 109,262,673 | 22.07 |
| 30 DPA ALSE | 1 | 1 | 31,433,795 | 62,867,590 | 6,349,626,590 | | |
| 30 DPA ALSE | 1 | 2 | 22,422,406 | 44,844,812 | 4,529,326,012 | | |
| 30 DPA ALSE | 2 | 1 | 29,554,700 | 59,109,400 | 5,970,049,400 | | |
| 30 DPA ALSE | 2 | 2 | 29,381,216 | 58,762,432 | 5,935,005,632 | 112,792,117 | 22.78 |
| 30 DPA SE | 1 | 1 | 23,711,650 | 47,423,300 | 4,789,753,300 | | |
| 30 DPA SE | 1 | 2 | 27,182,660 | 54,365,320 | 5,490,897,320 | | |
| 30 DPA SE | 2 | 1 | 37,524,396 | 75,048,792 | 7,579,927,992 | | |
| 30 DPA SE | 2 | 2 | 25,114,866 | 50,229,732 | 5,073,202,932 | 113,533,572 | 22.93 |
| ∑ | | | | | | 808,649,546 | 163.35 |
| 20 DPA AL | 1 | 1* | 32,374,902 | 64,749,804 | 6,539,730,204 | | |
| 20 DPA AL | 1 | 1* | 32,685,090 | 65,370,180 | 6,602,388,180 | 65,059,992 | 13.14 |


Table S6. Overview of the gene, transcript, and exon numbers for the IWGSC reference gene annotation (6) and the refined gene annotation incorporating the wheat endosperm transcriptome RNA-seq data. Predicted bread wheat genes are grouped into four confidence classes (HC1-4) based on sequence similarity and alignment coverage to wheat fl-cDNAs and proteins of Brachypodium, rice, sorghum and barley. Confidence class HC1 includes the most reliable gene predictions (≥70% reference protein coverage), and HC2 (50% ≤ reference protein coverage <70%) and HC3 (30% ≤ reference protein coverage <50%) include intermediate gene predictions. The lowest confidence is assigned to genes of HC4 (<30% reference protein coverage), which includes gene fragments and putative pseudogenes. Columns prefixed "IWGSC" refer to the IWGSC gene annotation (6); columns prefixed "Refined" refer to the refined gene annotation.

| | IWGSC HC1 | IWGSC HC2 | IWGSC HC3 | IWGSC HC4 | IWGSC ∑ HC1-3 | Refined HC1 | Refined HC2 | Refined HC3 | Refined HC4 | Refined ∑ HC1-3 |
|---|---|---|---|---|---|---|---|---|---|---|
| Gene loci | 55,249 | 14,367 | 15,475 | 39,110 | 85,091 | 55,254 | 14,383 | 15,536 | 39,429 | 85,173 |
| Single exon | 9,181 (17%) | 3,230 (22%) | 4,906 (32%) | 20,375 (52%) | 17,317 (20%) | 9,160 (17%) | 3,237 (23%) | 4,937 (32%) | 20,578 (52%) | 17,334 (20%) |
| Multi exon | 46,068 (83%) | 11,137 (78%) | 10,569 (68%) | 18,735 (48%) | 67,774 (80%) | 46,094 (83%) | 11,146 (77%) | 10,599 (68%) | 18,851 (48%) | 67,839 (80%) |
| Alternative spliced | 38,059 (69%) | 7,916 (55%) | 6,465 (42%) | 8,728 (22%) | 52,440 (62%) | 38,413 (70%) | 8,016 (56%) | 6,513 (42%) | 8,664 (22%) | 52,942 (62%) |
| Transcripts | 194,624 | 37,116 | 31,957 | 61,450 | 263,697 | 206,601 | 38,472 | 32,494 | 62,205 | 277,567 |
| Distinct exons | 538,250 | 94,864 | 74,630 | 117,530 | 707,744 | 550,031 | 96,383 | 75,273 | 118,376 | 721,687 |


Table S7. Pearson's correlation coefficient between biological replicates and across rooms.

| Sample | Room 1 | Room 2 | Mean of pairwise comparison of replicates between rooms |
|---|---|---|---|
| 10 DPA W | 0.9249 | 0.9285 | 0.9110 |
| 20 DPA AL | 0.9541 | 0.9263 | 0.9053 |
| 20 DPA W | 0.9399 | 0.9242 | 0.8717 |
| 20 DPA SE | 0.9252 | 0.9125 | 0.8926 |
| 20 DPA TC | 0.9367 | 0.9340 | 0.9018 |
| 30 DPA ALSE | 0.9182 | 0.9229 | 0.8033 |
| 30 DPA SE | 0.9163 | 0.9078 | 0.8991 |


Table S8. Expression level statistics for HC1-3 genes. The columns Mean through 95th report the gene expression level (FPKM).

| Sample | Genes with FPKM >0 | Mean | Median | 5th | 10th | 90th | 95th | Expressed genes |
|---|---|---|---|---|---|---|---|---|
| 10 DPA W | 41,642 (49%) | 2.55 ± 44.39 | 0.17 | 0.01 | 0.02 | 1.92 | 4.58 | 37,046 (44%) |
| 20 DPA AL | 42,896 (50%) | 1.91 ± 29.87 | 0.17 | 0.01 | 0.02 | 1.88 | 4.19 | 37,381 (44%) |
| 20 DPA W | 38,040 (45%) | 2.29 ± 50.20 | 0.16 | 0.02 | 0.02 | 1.61 | 3.62 | 35,153 (41%) |
| 20 DPA SE | 37,293 (44%) | 2.92 ± 72.07 | 0.17 | 0.02 | 0.03 | 1.66 | 3.93 | 35,097 (41%) |
| 20 DPA TC | 43,435 (51%) | 2.18 ± 56.39 | 0.12 | 0.01 | 0.01 | 1.30 | 2.92 | 37,384 (44%) |
| 30 DPA ALSE | 37,075 (44%) | 1.77 ± 23.84 | 0.19 | 0.02 | 0.03 | 1.74 | 3.85 | 34,588 (41%) |
| 30 DPA SE | 38,994 (46%) | 2.38 ± 55.58 | 0.15 | 0.01 | 0.02 | 1.52 | 3.51 | 35,736 (42%) |
| Overall | 50,510 (59%) | 2.28 ± 47.48 | 0.16 | 0.01 | 0.02 | 1.66 | 3.80 | 46,487 (55%) |


Table S9. Number of preferentially expressed genes (PEG) for developmental stages and cell types.

| Stage/Tissue PEG | Group 1 | Group 2 | No. of PEGs |
|---|---|---|---|
| 10 DPA W | 10 DPA W | All other samples | 314 |
| 20 DPA SE | 20 DPA SE | 20 DPA TC, 20 DPA AL | 83 |
| 20 DPA AL | 20 DPA AL | 20 DPA TC, 20 DPA SE | 644 |
| 20 DPA TC | 20 DPA TC | 20 DPA AL, 20 DPA SE | 136 |
| 30 DPA AL | 30 DPA ALSE | 30 DPA SE | 430 |
| 30 DPA SE | 30 DPA SE | 30 DPA ALSE | 243 |

Table S10. Number of A (rows) and B (columns) homeologs that were placed into the same (diagonal) or into different k-means clusters.

- 0 I II III IV V VI VII

- 771 66 58 19 53 41 42 44 56

0 60 1840 187 251 100 85 33 133 87

I 51 150 220 58 34 21 17 22 16

II 29 199 29 77 29 42 11 21 17

III 42 106 43 32 30 18 7 12 28

IV 28 120 17 37 18 78 9 24 12

V 25 31 13 12 21 8 32 12 4

VI 58 177 20 20 16 21 7 162 22

VII 39 96 21 21 21 11 7 27 42

Table S11. Number of A (rows) and D (columns) homeologs that were placed into the same or into different k-means clusters.

- 0 I II III IV V VI VII

- 787 50 58 26 44 39 32 61 53

0 48 1822 169 237 98 107 28 171 96

I 61 154 202 59 41 18 21 18 15

II 29 205 35 83 34 22 11 17 18

III 34 99 37 32 44 14 17 18 23

IV 28 110 25 32 18 84 8 19 19

V 33 29 15 10 9 10 29 6 17

VI 59 168 17 18 16 24 13 152 36

VII 44 89 25 15 14 14 14 29 41

Table S12. Number of B (rows) and D (columns) homeologs that were placed into the same or into different k-means clusters.

- 0 I II III IV V VI VII

- 791 44 53 25 41 26 24 49 50

0 56 1804 173 237 102 105 42 177 89

I 58 166 217 51 40 20 15 19 22

II 23 252 38 97 32 40 15 13 17

III 40 100 31 29 47 17 16 18 24

IV 32 104 15 33 7 77 10 25 22

V 42 20 18 9 17 7 27 6 19

VI 42 143 14 18 16 22 15 155 32

VII 39 93 24 13 16 18 9 29 43

Table S13. Number of aggregated transitions between clusters. Bonferroni-adjusted p-values are given in parentheses. Green cells highlight significant numbers (p<0.05), while grey cells show the unconsidered within-cluster transitions (ND: not determined).

I II III IV V VI VII

I ND 168 (<0.001) 115 (<0.001) 59 (1.00) 53 (1.00) 59 (1.00) 53 (1.00)

II 102 (0.004) ND 95 (0.003) 104 (<0.001) 37 (1.00) 51 (1.00) 52 (1.00)

III 111 (<0.001) 93 (1.00) ND 49 (1.00) 40 (1.00) 48 (1.00) 75 (0.275)

IV 57 (1.00) 102 (<0.001) 43 (1.00) ND 27 (1.00) 68 (0.005) 53 (1.00)

V 46 (1.00) 31 (1.00) 47 (0.173) 25 (1.00) ND 24 (1.00) 40 (1.00)

VI 51 (1.00) 56 (1.00) 48 (1.00) 67 (0.021) 35 (1.00) ND 90 (<0.001)

VII 70 (0.542) 49 (1.00) 51 (1.00) 43 (1.00) 30 (1.00) 85 (<0.001) ND

Table S14. Summary of expression behavior for homeologous gene pairs.

1A:1B:0D 1A:0B:1D 0A:1B:1D ∑ %

Both expressed 605 614 621 1,840

Same cluster 287 275 305 867 47%

Different clusters 318 339 316 973 52%

Table S15. Summarized transitions between clusters. Numbers for AB, BD, and AD transitions were tallied in this table.

- 0 I II III IV V VI VII

- 1318 42 39 29 33 36 34 38 43

0 35 538 35 70 31 42 18 38 26

I 47 57 79 17 13 6 15 8 6

II 29 88 21 47 9 19 12 9 6

III 33 25 21 15 19 7 8 5 10

IV 37 34 8 20 5 65 5 6 7

V 38 21 11 9 7 11 32 7 9

VI 26 44 10 7 2 8 4 52 5

VII 49 34 12 9 6 8 11 16 35
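The tallied matrix above can be reduced to the summary counts of Table S14. The sketch below is one way to do that, under the assumption (not stated in this table) that the "-" row and column hold pairs in which a homeolog is not expressed, so they are skipped. The names `LABELS`, `TABLE_S15` and `tally` are hypothetical. With that assumption the totals agree with the 1,840 / 867 / 973 reported in Table S14.

```python
# Row and column labels of Table S15: "-" followed by clusters 0 and I..VII.
LABELS = ["-", "0", "I", "II", "III", "IV", "V", "VI", "VII"]

# Values transcribed from Table S15 (rows and columns in LABELS order).
TABLE_S15 = [
    [1318, 42, 39, 29, 33, 36, 34, 38, 43],
    [35, 538, 35, 70, 31, 42, 18, 38, 26],
    [47, 57, 79, 17, 13, 6, 15, 8, 6],
    [29, 88, 21, 47, 9, 19, 12, 9, 6],
    [33, 25, 21, 15, 19, 7, 8, 5, 10],
    [37, 34, 8, 20, 5, 65, 5, 6, 7],
    [38, 21, 11, 9, 7, 11, 32, 7, 9],
    [26, 44, 10, 7, 2, 8, 4, 52, 5],
    [49, 34, 12, 9, 6, 8, 11, 16, 35],
]

def tally(table, labels, skip="-"):
    """Count pairs on and off the diagonal, ignoring the `skip` row and column."""
    keep = [i for i, label in enumerate(labels) if label != skip]
    same = sum(table[i][i] for i in keep)
    total = sum(table[i][j] for i in keep for j in keep)
    return total, same, total - same

total, same, different = tally(TABLE_S15, LABELS)
print(total, same, different)  # 1840 867 973
```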

Table S16. P-values for enrichment of between-cluster transitions. Note that these are raw p-values. Entries whose p-value remains smaller than 0.1 after multiple testing correction are marked with '*' (ND: not determined).

I II III IV V VI VII

I ND 0.0719 0.0073 0.9375 0.0143 0.5814 0.7098

II 0.0558 ND 0.3703 0.0045 0.3057 0.6329 0.8481

III 0.0106 0.2310 ND 0.8771 0.6946 0.9419 0.1305

IV 0.8546 0.0002* 0.6223 ND 0.8491 0.6345 0.2759

V 0.5518 0.7240 0.3076 0.1295 ND 0.5217 0.0927

VI 0.1673 0.5309 0.9061 0.1264 0.7422 ND 0.3214

VII 0.6327 0.8681 0.6364 0.7035 0.1864 0.0013* ND
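The caption of Table S16 does not name the multiple testing correction. As a hedged illustration, the sketch below assumes a Bonferroni correction over the 42 off-diagonal tests (the method named for Table S13); `CLUSTERS`, `RAW_P` and `bonferroni` are hypothetical names. Under that assumption, the entries that stay below 0.1 are the two marked with '*' in the table.

```python
CLUSTERS = ["I", "II", "III", "IV", "V", "VI", "VII"]

# Raw p-values transcribed from Table S16; None marks the ND diagonal.
RAW_P = [
    [None, 0.0719, 0.0073, 0.9375, 0.0143, 0.5814, 0.7098],
    [0.0558, None, 0.3703, 0.0045, 0.3057, 0.6329, 0.8481],
    [0.0106, 0.2310, None, 0.8771, 0.6946, 0.9419, 0.1305],
    [0.8546, 0.0002, 0.6223, None, 0.8491, 0.6345, 0.2759],
    [0.5518, 0.7240, 0.3076, 0.1295, None, 0.5217, 0.0927],
    [0.1673, 0.5309, 0.9061, 0.1264, 0.7422, None, 0.3214],
    [0.6327, 0.8681, 0.6364, 0.7035, 0.1864, 0.0013, None],
]

def bonferroni(p_matrix):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    n_tests = sum(p is not None for row in p_matrix for p in row)
    return [[None if p is None else min(1.0, p * n_tests) for p in row]
            for row in p_matrix]

adjusted = bonferroni(RAW_P)

# (row cluster, column cluster) pairs whose adjusted p-value is below 0.1.
flagged = [(CLUSTERS[i], CLUSTERS[j])
           for i, row in enumerate(adjusted)
           for j, p in enumerate(row)
           if p is not None and p < 0.1]
print(flagged)  # [('IV', 'II'), ('VII', 'VI')]
```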

Table S17. Number of up-regulated differentially expressed homeologs for each pairwise comparison of genomes and across all endosperm samples (p-value <0.05).

Sample  A vs. B: A  B  A vs. D: A  D  B vs. D: B  D

10 DPA W 89 71 94 92 83 90

20 DPA AL 86 81 94 96 82 98

20 DPA W 119 98 113 111 89 115

20 DPA SE 101 99 119 119 99 108

20 DPA TC 114 92 109 106 103 112

30 DPA ALSE 80 87 75 94 71 101

30 DPA SE 97 95 97 105 99 105

Table S18. Number of Brachypodium, rice and sorghum genes in the seven Triticeae prototype chromosome scaffolds and number of loci per chromosome.

Genes (#) Tp1 Tp2 Tp3 Tp4 Tp5 Tp6 Tp7 ∑

Brachypodium 3,108 3,647 3,391 3,003 3,135 2,538 3,134 21,956

rice 3,158 3,806 4,003 3,306 2,991 2,651 3,001 22,916

sorghum 2,746 3,276 3,471 2,242 3,217 2,473 3,313 20,738

∑ of Tp loci 5,210 6,169 5,972 4,907 5,654 4,133 5,563 37,608

Table S19. Low molecular weight glutenin subunit (LMW-Glu).

Genome  LMW-Glu(1) queries: Subunit  Gene  Bread wheat gene annotation: IWGSC locus  Gene (Transcript)  Genome assembly  Pseudogene  Comment

A  A3-4  JF339169  Ta1asLoc014212  LOC_1as_363472.1  3313506_1as  no  -

A  PG  JF339156  Ta1asLoc015866  LOC_1as_504315.90  398094_1as  yes  novel transcript annotated based on RNA-seq alignments supporting single-exon gene structure; disrupted protein sequence by frameshift in frame +1

B  B3-1  JF339163  Ta1bsLoc019184  LOC_1bs_504417.90  39874_1bs  no  novel transcript annotated based on RNA-seq alignments supporting single-exon gene structure

B  B3-2  JF339170  Ta1bsLoc018593  LOC_1bs_425313.90  3483987_1bs  no  novel transcript annotated based on RNA-seq alignments supporting single-exon gene structure

B  PG  JF339166  Ta1bsLoc010284  LOC_1bs_416501.90  3452147_1bs  yes  novel transcript annotated based on RNA-seq alignments supporting single-exon gene structure; internal stop codon interrupts predicted protein sequences

B  PG  JF339164  Ta1bsLoc000742  LOC_1bs_172535.1  1987634_1bs  yes  internal stop codon interrupts predicted protein sequences

B  -  -  Ta1bsLoc008360  LOC_1bs_412998.1  3442889_1bs  no  gene locus not identical to any wheat LMW-Glu gene reported in Zhang et al. 2011

D  D3-1  JF339165  Ta1dsLoc011735  LOC_1ds_168790.1  1913693_1ds  no  locus was not used for further analysis, because it is located on duplicated sequences between contigs 1913693_1ds, 1913694_1ds and 1876036_1ds

D  D3-1  JF339165  Ta1dsLoc001713  LOC_1ds_157624.1  1876036_1ds  no  locus was not used for further analysis, because it is located on duplicated sequences between contigs 1913693_1ds, 1913694_1ds and 1876036_1ds

D  D3-3  JF339167  -  -  -  -  query not found in bread wheat gene annotation as well as in the reference genome assembly; ortholog to AEGTA27321(2)

D  D3-4  JF339155  Ta1dsLoc009245  LOC_1ds_166028.1  1904775_1ds  no  ortholog to AEGTA28761(2)

D  D3-2  JF339160  Ta1dsLoc013431  LOC_1ds_299090.1  294262_1ds  no  ortholog to AEGTA27322(2)

D  D3-6  JF339162  Ta1dsLoc006831  LOC_1ds_163362.90  1895533_1ds  no  novel transcript annotated based on RNA-seq alignments supporting single-exon gene structure; ortholog to AEGTA05053(2)

D  D3-7  JF339158  Ta1dsLoc010852  LOC_1ds_167826.2  1910908_1ds  no  ortholog to AEGTA26574(2)

D  PG  JF339157  -  -  -  -  query not found in bread wheat gene annotation as well as in the reference genome assembly

D  PG  JF339168  -  -  -  -  query not found in bread wheat gene annotation as well as in the reference genome assembly

(1) BLAST search of LMW-Glu candidate genes identified by Zhang et al. (2011) against the wheat gene set as well as the reference genome assemblies

(2) Putative orthologous genes from Aegilops tauschii (Jia et al. 2013) identified by BLASTP sequence searches and multiple alignments/phylogenetic trees

Table S20. High molecular weight glutenin subunit (HMW-Glu).

Genome  HMW-Glu queries: Ta Glu orthologs(1)  Aegilops tauschii(2)  Bread wheat gene annotation: IWGSC locus  Gene (Transcript)  Genome assembly  Pseudogene  Comment

A  x: 2e-62|86% y: 0.0|88%  x: - y: -  Ta1alLoc006256  LOC_1al_462374.1  3892569_1al  no  -

A  x: e-127|85% y: 6e-84|88%  x: 0.529 y: 0.822  Ta1alLoc011638  LOC_1al_481131.90  3923345_1al  yes  novel transcript annotated based on RNA-seq alignments supporting single-exon gene structure; internal stop codon interrupts predicted protein sequences

B  x: e-160|84% y: 5e-69|89%  x: - y: -  Ta1blLoc001936  LOC_1bl_431351.1  3794722_1bl  no  -

B  x: 6e-84|87% y: 0.0|99%  x: 0.776 y: 0.814  Ta1blLoc018914  LOC_1bl_462530.90  3892878_1bl  no  novel transcript annotated based on RNA-seq alignments supporting single-exon gene structure

D  x: 6e-84|89% y: 0.0|93%  x: 0.663 y: 0.948  Ta1dlLoc010520  LOC_1dl_205398.90  2251676_1dl  no  novel transcript annotated based on RNA-seq alignments supporting single-exon gene structure

D  x: 6e-84|89% y: 0.0|93%  x: 0.663 y: 0.948  Ta1dlLoc017934  LOC_1dl_221768.90  2284912_1dl  no  novel transcript annotated based on RNA-seq alignments supporting single-exon gene structure; locus was not used for further analysis, because it is located on duplicated sequences between contigs 2284912_1dl and 2251676_1dl

D  x: e-103|92% y: 1e-51|86%  x: 0.956 y: 0.603  Ta1dlLoc019931  LOC_1dl_225816.90  2289899_1dl  no  protein sequence of transcript LOC_1dl_225816.6 updated, because in silico prediction selected wrong reading frame

(1) BLASTP search of candidate genes from De Bustos et al. (2001) against wheat gene set (e-value and alignment identity are shown); x = GI:14329761 (high molecular weight glutenin subunit x), y = GI:14329763 (high molecular weight glutenin subunit y)

(2) GenomeThreader alignments of Aegilops tauschii HMW-Glu genes reported by Jia et al. (2013) against wheat reference genome assembly (minimum 50% query coverage; GenomeThreader alignment score is shown); x = Contig118695, y = Contig97145 (protein sequence manually curated because of wrongly selected open reading frame in Jia et al. (2013))

Table S21. Puroindoline (pin) A, B and B-2, grain softness proteins (GSP) and storage protein activators (SPA).

Genome  Genes: Subunit  Query gene(s)  Bread wheat gene annotation: IWGSC locus  Gene (Transcript)  Genome assembly  Pseudogene  Comment

D  Pin-A  AEGTA26570(1)  Ta5dsLoc012937  LOC_5ds_282930.1  no  -

D  Pin-B  AEGTA26569(1)  Ta5dsLoc012939  LOC_5ds_282932.1  no  -

A  Pin-B2 (2v4)  AEGTA13909(1) HM780498.1(2)  Ta7alLoc010232  LOC_7al_599404.90  4486146_7al  ?  novel transcript annotated based on RNA-seq alignments supporting single-exon gene structure extending 5' sequence; 5' sequence region is truncated due to insertion of two repeats (TIGR_TRSiTERTOOT00060, LTR_HVVMRX83KhA0166P08_ipk23_c2_528)

A  Pin-B2 (2v4)  AEGTA13909(1) HM780498.1(2)  Ta7alLoc010233  LOC_7al_599405.90  4486147_7al  ?  see comment for LOC_7al_599404.90; locus was not used for further analysis, because it is located on duplicated sequences between contigs 4486147_7al and 4486146_7al

B  Pin-B2 (2v2)  AEGTA13909(1) GQ496617.1(2)  Ta7blLoc004687  LOC_7bl_812773.1  6632749_7bl  no  -

B  Pin-B2 (2v3)  AEGTA13909(1) GQ496618.1(2)  -  -  -  -  query not found in bread wheat gene annotation as well as in the reference genome assembly

D  Pin-B2 (2v1)  AEGTA13909(1) GQ496616.1(2)  Ta7dlLoc003299  LOC_7dl_367169.90  33291727_7dl  no  novel transcript annotated based on RNA-seq alignments supporting single-exon gene structure

D  Pin-B2 (2v1)  AEGTA13909(1) GQ496616.1(2)  Ta7dlLoc005356  LOC_7dl_371835.90  3318143_7dl  no  novel transcript annotated based on RNA-seq alignments supporting single-exon gene structure; locus was not used for further analysis, because it is located on duplicated sequences between contigs 3318143_7dl and 33291727_7dl

A  GSP  AEGTA26568(1)  Ta5asLoc012000  LOC_5as_407535.1  342860_5as  no  -

B  GSP  AEGTA26568(1)  Ta5bsLoc006218  LOC_5bs_210968.1  2265004_5bs  no  -

D  GSP  AEGTA26568(1)  Ta5dsLoc010845  LOC_5ds_279057.1  2772740_5ds  no  -

A  SPA  AEGTA25550(1) Y09013(3)  Ta1alLoc012037  LOC_1al_482034.6  3926248_1al  no  -

B  SPA  AEGTA25550(1) Y09013(3)  Ta1blLoc022748  LOC_1bl_473748.2  3915060_1bl  no  -

B  SPA  AEGTA25550(1) Y09013(3)  Ta1blLoc018836  LOC_1bl_462299.2  3892408_1bl  no  locus was not used for further analysis, because it is located on duplicated sequences between contigs 3915060_1bl and 3892408_1bl

B  SPA  AEGTA25550(1) Y09013(3)  LOC_1bl_462300  LOC_1bl_462300.2  3892408_1bl  no  locus was not used for further analysis, because it is located on duplicated sequences between contigs 3915060_1bl and 3892408_1bl

D  SPA  AEGTA25550(1) Y09013(3)  Ta1dlLoc011015  LOC_1dl_206309.4  2253910_1dl  no  -

Table S22. α-gliadins.

ID  Query: Ref.(1)  ID  Cov. (%)(2)  Annotation(3): IWGSC locus  Gene ID  Assembly locus(4)

α-A1 [1] U50984.1 100 Ta6asLoc005162 LOC_6as_553553 m: 4354750_6as(1799-2660)

α-A2 [1] U50984.1 46 Ta6asLoc009806 LOC_6as_563968 m: 4382162_6as(196-603)

α-A3 [2] Sample1_alpha_contig1 26 Ta6asLoc001794 LOC_6as_544266 m: 4323718_6as(1-252)

α-B1 [1] U50984.1 100 Ta6bsLoc010939 LOC_6bs_306732 m: 2997322_6bs(2741-3589)

α-B2 [1] U50984.1 100 Ta6bsLoc007953 LOC_6bs_303321 m: 2972598_6bs(1887-2715)

α-B3 [1] U50984.1 100 Ta6bsLoc018432 LOC_6bs_315815 m: 3046196_6bs(297-1160)

α-B4 [1] U50984.1 52 Ta6bsLoc019648 LOC_6bs_951525 m: 841447_6bs(1-427)

α-B5 [1] U50984.1 44 Ta6bsLoc009369 LOC_6bs_304963 m: 2984541_6bs(88-486)

α-B6 [1] U50984.1 37 Ta6bsLoc016924 LOC_6bs_314262 m: 3043657_6bs(1-320)

α-B7 [2] Sample1_alpha_contig9 25 Ta6bsLoc018105 LOC_6bs_315479 m: 3045611_6bs(2323-2554)

α-B8 [1] U50984.1 40 Ta6bsLoc006214 LOC_6bs_301302 u: 2957250_6bs(274-894)

α-B9 [1] U50984.1 43 Ta6bsLoc010246 LOC_6bs_305956 u: 2991407_6bs(3-387)

α-B10 [1] U50984.1 69 Ta6bsLoc008026 LOC_6bs_303421 m: 2973254_6bs(153-752)

α-B11 [1] U50984.1 75 Ta6bsLoc012848 LOC_6bs_309094 m: 3014203_6bs(3-635)

α-B12 [1] U50984.1 70 Ta6bsLoc003554 LOC_6bs_298198 m: 2936214_6bs(193-767)

α-B13 [2] Sample1_alpha_contig4 33 Ta6bsLoc000315 LOC_6bs_104325 u: 1321572_6bs(1-298)

α-B14 [1] U50984.1 93 Ta6bsLoc018846 LOC_6bs_316269 m: 3048908_6bs(1041-1933)

α-B15 [2] Sample1_alpha_contig15 31 Ta6bsLoc003188 LOC_6bs_297782 m: 2932966_6bs(597-856)

α-B16 [2] Sample1_alpha_contig15 30 Ta6bsLoc000622 LOC_6bs_150552 u: 1660622_6bs(146-402)

α-B17 [2] Sample1_alpha_contig11 69 Ta6bsLoc013250 LOC_6bs_309637 m: 3017932_6bs(1-647)

α-B18 [1] U50984.1 100 Ta6bsLoc018432 LOC_6bs_315815 m: 3046196_6bs(297-1160)

(1) Reference queries used for GenomeThreader alignments to bread wheat genome assembly: [1] Anderson et al. 1996; [2] Zhang et al. 2013

(2) Coverage of query sequence reported by GenomeThreader

(3) Overlap to gene loci of the IWGSC and updated gene annotation

(4) Locus (contig_chromosome(start-end)) in the repeat-masked (m) and unmasked (u) genome sequence assembly

Table S23. ω-gliadins.

ID  QueryRef(1)  Genome(2)  Coverage(3)  Min. alignment identity(4): 98%  95%  90%  80%  Assembly

ω-A1 AF280605 A/D 0.825 - - X X 3275229_1as

ω-A2 AF280605 A/D 0.881 - X X X 3281757_1as

ω-A3 AF280605 A/D 0.71 - - - X 3299090_1as

ω-A4 AF280605 A/D 0.865 - - X X 3312479_1as

ω-A5 AF280605 A/D 0.825 - - X X 3275229_1as

ω-A6 AF280605 A/D 0.893 - X X X 3314504_1as

ω-B1 AF280605 A/D 0.702 - - - X 3438818_1bs

ω-B2 AB181300 B 0.792 - - X X 3413201_1bs

ω-B3 AB181300 B 0.745 - - X X 3420099_1bs

ω-B4 AB181300 B 0.761 - - - X 3424359_1bs

ω-B5 AB181300 B 0.697 X X X X 3424360_1bs

ω-B6 AB181300 B 0.667 - - X X 3431184_1bs

ω-B7 AB181300 B 0.608 - - - X 3450071_1bs

ω-B8 AB181300 B 0.851 - - - X 3457881_1bs

ω-D1 AF280605 A/D 0.658 X X X X 1871682_1ds

ω-D2 AF280605 A/D 0.942 X X X X 1914507_1ds

ω-D3 AB181300 B 0.657 - - - X 1895025_1ds

(1) Reference queries used for GenomeThreader alignments to bread wheat genome assembly taken from Anderson et al., 2009

(2) Genome assignment of query gene by Anderson et al. 2013

(3) Coverage of query sequence reported by GenomeThreader

(4) Minimum BLASTP sequence identity between query genes from Anderson et al. 2013 and contig

Table S24. γ-gliadins.

ID  Reference gene: Query(1)  Ass.(2)  Type(3)  Min. alignment identity(4): 98%  95%  90%  80%  Gene annotation: Gene-ID  Assembly  Comment

γ-A1  JX679678.1 JX679679.1 (g6a, g6b)  A  II  X  X  X  X  LOC_1as_242328  2418887_1as  5' fragment

γ-A2  gamma-14 (pseudogene)  -  I  -  -  X  X  LOC_1as_364348  3314543_1as  best match to functional gene: JX679673.1 (94% identity)

γ-A3  JX679673.1 (g1)  -  I  X  X  X  X  LOC_1as_352304  3284622_1as  3' fragment

γ-B1  gamma-13 (pseudogene)  -  II  -  X  X  X  LOC_1bs_403475  3414793_1bs  best match to functional gene: JX679682.1 (95% identity)

γ-B2  gamma-13 (pseudogene)  -  II  X  X  X  X  LOC_1bs_417383  3456068_1bs

γ-B3  JX679677.1 (g5)  -  II  X  X  X  X  LOC_1bs_417504  3456729_1bs

γ-B4  JX679677.1 (g5)  -  II  -  X  X  X  LOC_1bs_424383  3482337_1bs

γ-B5  JX679680.1 (g7)  -  II  X  X  X  X  LOC_1bs_103207  1288180_1bs  5' end missing

γ-B6  JX679675.1 (g3)  -  I  X  X  X  X  LOC_1bs_403859  3416197_1bs

γ-B7  JX679681.1 (g9)  -  I  X  X  X  X  LOC_1bs_410447  3438230_1bs  5' fragment of gene

γ-D1  JX679674.1 (g2)  D  II  X  X  X  X  LOC_1ds_157729  1876432_1ds

γ-D2  JX679683.1 (g12)  D  II  X  X  X  X  LOC_1ds_157919  1877171_1ds

γ-D3  JX679682.1 (g11)  D  II  X  X  X  X  LOC_1ds_161972  1890655_1ds

γ-D4  JX679676.1 (g4)  D  I  X  X  X  X  LOC_1ds_159209  1881608_1ds

-  JX679680.1 (g7)  -  X  X  X  X  LOC_1bs_417384  3456068_1bs  only very short fragments on contig

-  JX679673.1 (g1)  -  X  X  X  X  LOC_1as_335122  3191141_1as  Putative 5' fragment to γ-A3

-  JX679677.1 (g5)  -  -  X  X  X  LOC_1bs_424610  3482688_1bs  Putative 3' fragment to γ-B3

-  JX679678.1 JX679679.1 (g6a, g6b)  A  X  X  X  X  LOC_1as_159735  188346_1as  Putative 3' fragment to γ-A1

-  JX679678.1 JX679679.1 (g6a, g6b)  A  X  X  X  X  LOC_1as_913249  730462_1as  Putative center part of γ-A1

-  JX679681.1 (g9)  -  X  X  X  X  LOC_1bs_421787  3476038_1bs  Putative 3' fragment to γ-B7

(1) Reference queries used for GenomeThreader alignments to bread wheat genome assembly taken from Anderson et al. 2013

(2) Genome assignment of query gene by Anderson et al. 2013 ("-": genome of query unknown)

(3) Type of gene (γ-Gli type I or γ-Gli type II) due to best BLASTP alignment to protein sequences from Wang et al. 2012

(4) Minimum BLASTP sequence identity between query genes from Anderson et al. 2013 and the gene annotation

Table S25. Adjusted RPKM values for LMW-Glu.

10DPA 20DPA 30DPA

geneId W AL REF SE TC ALSE SE sum %

LOC_1ds_299090 11,933.54 3,379.59 14,324.64 20,122.52 10,688.84 7,122.99 22,643.84 90,215.96 14.73

LOC_1ds_167826 1,289.19 490.51 1,446.14 1,804.64 1,519.49 1,208.75 1,561.37 9,320.10 1.52

LOC_1ds_168791 3,586.20 936.41 3,698.59 4,145.65 4,972.95 2,418.12 4,198.53 23,956.45 3.91

LOC_1bs_504417 5,983.22 1,161.66 5,737.38 6,359.53 7,010.43 3,518.33 5,485.54 35,256.10 5.76

LOC_1bs_172535 582.01 104.71 331.12 442.48 425.21 169.65 353.64 2,408.82 0.39

LOC_1as_363472 1,365.49 714.03 2,559.63 3,017.76 1,995.02 908.76 2,464.71 13,025.39 2.13

LOC_1bs_425313 13,049.48 8,359.90 31,800.26 36,556.14 25,566.09 15,284.38 39,606.24 170,222.50 27.80

LOC_1bs_412998 11,917.18 7,780.44 32,940.36 39,122.58 27,531.97 13,746.32 37,131.99 170,170.85 27.79

LOC_1bs_416501 3,252.85 1,674.83 7,024.22 8,757.96 5,831.68 3,095.36 8,440.60 38,077.49 6.22

LOC_1ds_163362 5,410.35 2,140.77 9,036.99 10,208.85 9,966.13 3,355.50 8,106.18 48,224.77 7.88

LOC_1as_504315 65.53 23.58 58.14 68.60 52.12 28.92 35.30 332.18 0.05

LOC_1ds_166028 2,172.78 397.90 1,818.59 2,227.99 2,002.22 797.04 1,646.59 11,063.12 1.81

Table S26. Adjusted RPKM for HMW-Glu.

10DPA 20DPA 30DPA

geneId W AL REF SE TC ALSE SE sum %

LOC_1al_481131 56.37 27.94 58.51 78.68 55.45 56.23 54.64 387.82 0.32

LOC_1dl_225816 5,096.71 1,859.34 6,076.59 8,373.50 5,321.19 2,559.56 6,282.95 35,569.83 29.20

LOC_1bl_431351 2,906.07 374.82 2,862.94 4,006.84 2,377.68 896.79 3,479.25 16,904.40 13.88

LOC_1dl_205398 6,326.17 2,659.88 6,747.04 10,735.72 7,728.81 3,196.98 7,066.04 44,460.64 36.50

LOC_1al_462374 346.38 162.01 302.35 546.54 330.79 99.11 211.26 1,998.45 1.64

LOC_1bl_462530 3,037.98 1,212.33 2,934.37 5,152.34 3,023.26 2,337.18 4,783.94 22,481.41 18.46

Table S27. Adjusted RPKM for puroindoline.

10DPA 20DPA 30DPA

geneId W AL REF SE TC ALSE SE sum %

LOC_5ds_282932 2,518.02 1,457.32 8,139.10 8,738.59 9,947.38 2,496.52 8,312.93 41,609.85 43.69

LOC_5ds_282930 2,338.70 1,907.62 10,867.31 11,493.33 9,845.03 1,953.57 9,670.80 48,076.37 50.48

LOC_7al_599404 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.02 0.00

LOC_7dl_371835 309.83 80.36 165.58 197.28 157.91 23.33 32.12 966.41 1.01

LOC_7bl_812773 231.84 251.24 1,059.04 1,183.63 981.00 236.06 645.18 4,587.98 4.82

Table S28. Adjusted RPKM for SPA.

10DPA 20DPA 30DPA

geneId W AL REF SE TC ALSE SE sum %

LOC_1bl_473748 36.47 8.74 12.92 14.13 11.49 8.02 6.12 97.88 54.90

LOC_1dl_206309 10.72 3.34 4.13 4.55 4.12 3.15 2.92 32.94 18.47

LOC_1al_482034 8.89 2.62 7.52 8.26 8.57 5.23 6.39 47.49 26.63
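The "%" column in the adjusted-RPKM tables appears to be each gene's share of the gene family total. The sketch below assumes that definition (each row's "sum" divided by the total of all "sum" values) and applies it to the Table S28 values. `SPA_SUMS` and `percent_share` are hypothetical names. Under this assumption the result matches the printed percentages up to rounding.

```python
# "sum" column of Table S28 (adjusted RPKM summed over the seven samples).
SPA_SUMS = {
    "LOC_1bl_473748": 97.88,
    "LOC_1dl_206309": 32.94,
    "LOC_1al_482034": 47.49,
}

def percent_share(sums):
    """Return each gene's share of the family total, in percent."""
    total = sum(sums.values())
    return {gene: 100.0 * value / total for gene, value in sums.items()}

shares = percent_share(SPA_SUMS)
for gene, share in shares.items():
    print(f"{gene}\t{share:.2f}")
```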

Table S29. Adjusted RPKM for α-gliadins.

10DPA 20DPA 30DPA

geneId name W AL REF SE TC ALSE SE sum %

gth2_U50984.1_4354750_6as α-A1 1,790.62 755.82 2,666.39 3,149.54 2,233.73 930.33 2,329.93 13,856.37 6.17

gth1_U50984.1_4382162_6as α-A2 2,474.59 3,517.85 10,053.09 11,870.57 8,109.94 6,497.11 14,664.83 57,187.97 25.45

gth313_contig1_4323718_6as α-A3 1,074.06 941.39 2,708.29 3,692.36 2,173.55 1,326.34 3,428.72 15,344.73 6.83

gth12_U50984.1_2997322_6bs α-B1 63.32 31.46 35.38 44.21 20.53 25.74 29.69 250.33 0.11

gth15_U50984.1_2972598_6bs α-B2 120.56 106.86 260.09 315.60 151.61 92.06 213.90 1,260.68 0.56

gth7_U50984.1_3040422_6bs α-B3 97.28 41.38 81.30 126.51 62.38 32.29 68.62 509.75 0.23

gth10_U50984.1_841447_6bs α-B4 2,061.81 186.06 342.75 607.55 393.41 196.37 558.47 4,346.42 1.93

gth11_U50984.1_2984541_6bs α-B5 6,058.60 1,805.83 8,289.71 11,035.18 7,534.39 2,703.84 10,535.41 47,962.96 21.35

gth6_U50984.1_3043657_6bs α-B6 2,777.62 531.09 2,272.54 4,413.67 1,994.06 526.87 2,505.79 15,021.63 6.69

gth561_contig9_3045611_6bs α-B7 0.22 0.09 1.63 1.08 0.63 0.42 1.42 5.50 0.00

gth9_U50984.1_2957250_6bs α-B8 21.11 16.13 24.01 35.73 19.94 14.28 21.10 152.29 0.07

gth13_U50984.1_2991407_6bs α-B9 24.49 6.23 14.03 38.48 21.61 4.08 28.87 137.80 0.06

gth8_U50984.1_2973254_6bs α-B10 150.82 89.73 365.16 428.82 237.50 69.77 278.84 1,620.63 0.72

gth5_U50984.1_3014203_6bs α-B11 69.24 49.98 98.52 163.23 67.52 30.51 84.04 563.03 0.25

gth4_U50984.1_2936214_6bs α-B12 35.68 26.61 55.02 85.93 40.27 21.98 54.08 319.57 0.14

gth367_contig4_1321572_6bs α-B13 639.91 84.69 211.28 250.84 191.68 81.48 224.11 1,683.98 0.75

gth3_U50984.1_3048908_6bs α-B14 0.07 0.10 0.12 0.22 0.12 0.24 0.18 1.07 0.00

gth383_contig15_2932966_6bs α-B15 355.99 349.47 1,614.32 1,457.95 1,027.92 283.47 1,079.24 6,168.37 2.75

gth385_contig15_1660622_6bs α-B16 784.35 239.42 438.53 504.87 396.53 241.61 586.67 3,191.99 1.42

gth408_contig11_3017932_6bs α-B17 2,764.46 2,800.49 11,771.99 12,103.82 9,784.99 3,755.91 12,108.99 55,090.64 24.52

gth14_U50984.1_3046196_6bs α-B18 0.00 0.02 0.02 0.06 0.01 0.00 0.04 0.15 0.00

Table S30. Adjusted RPKM for γ-gliadins.

10DPA 20DPA 30DPA

geneId name W AL REF SE TC ALSE SE sum %

LOC_1as_242328 γ-A1 6,686.70 3,375.86 7,585.07 8,169.26 9,462.71 6,405.66 8,152.29 49,837.54 6.42

LOC_1bs_403475 γ-B1 108.87 32.95 48.67 72.65 39.52 30.05 39.28 372.01 0.05

LOC_1bs_417383 γ-B2 11.27 17.35 21.98 20.13 10.13 13.80 11.97 106.61 0.01

LOC_1bs_417504 γ-B3 13,748.79 8,092.90 32,263.44 36,713.30 30,097.64 21,595.21 49,651.88 192,163.17 24.76

LOC_1bs_424383 γ-B4 193.92 176.20 219.35 284.00 198.13 252.45 248.13 1,572.17 0.20

LOC_1bs_103207 γ-B5 11,285.92 3,874.48 13,402.90 15,111.28 14,808.05 4,385.03 13,093.36 75,961.02 9.79

LOC_1ds_157729 γ-D1 13,379.98 8,728.73 21,280.16 25,937.53 16,640.13 17,409.33 32,653.23 136,029.09 17.53

LOC_1ds_157919 γ-D2 6,236.85 435.78 514.82 717.68 548.64 722.28 663.55 9,839.60 1.27

LOC_1ds_161972 γ-D3 12,529.86 1,939.30 6,482.44 7,555.89 9,373.29 5,786.33 10,151.43 53,818.54 6.94

LOC_1ds_159209 γ-D4 9,989.12 4,394.45 10,703.52 11,984.00 10,550.20 9,150.93 11,746.70 68,518.92 8.83

LOC_1bs_403859 γ-B6 8,669.35 4,217.19 12,497.64 15,382.60 12,297.18 7,141.37 14,415.96 74,621.29 9.62

LOC_1bs_410447 γ-B7 8,611.11 2,769.66 4,969.81 6,188.71 5,273.84 6,067.35 6,466.88 40,347.36 5.20

LOC_1as_364348 γ-A2 655.74 314.69 647.95 835.71 847.72 541.91 771.79 4,615.51 0.59

LOC_1as_352304 γ-A3 7,335.26 4,452.57 11,876.04 13,124.21 10,222.87 8,256.40 12,881.37 68,148.74 8.78

Table S31. Adjusted RPKM for ω-gliadins.

10DPA 20DPA 30DPA

geneId name W AL REF SE TC ALSE SE sum %

gth5_3275229_1as ω-A1 71.26 47.15 26.99 46.00 28.89 24.44 32.91 277.65 0.11

gth1_3281757_1as ω-A2 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.01 0.00

gth3_3299090_1as ω-A3 3,189.47 1,037.68 1,078.65 3,207.61 1,779.71 902.37 2,489.72 13,685.20 5.49

gth4_3312479_1as ω-A4 284.07 162.99 122.22 222.46 121.95 71.41 175.72 1,160.82 0.47

gth6_3275229_1as ω-A5 93.68 65.68 80.27 116.24 77.87 53.25 143.48 630.46 0.25

gth2_3314504_1as ω-A6 0.03 0.00 0.00 0.00 0.00 0.00 0.00 0.03 0.00

gth16_1871682_1ds ω-D1 18,653.53 6,006.88 8,997.11 17,816.48 11,225.16 9,050.87 21,444.64 93,194.68 37.39

gth17_1914507_1ds ω-D2 1,569.39 161.36 267.22 1,096.97 436.75 246.00 782.50 4,560.19 1.83

gth15_1895025_1ds ω-D3 1,752.33 1,109.26 614.39 1,460.90 737.61 245.89 556.48 6,476.87 2.60

gth11_3438818_1bs ω-B1 1,163.14 349.85 411.27 885.97 565.46 382.89 1,177.13 4,935.70 1.98

gth12_3413201_1bs ω-B2 202.86 99.67 86.24 189.53 86.42 84.19 139.05 887.95 0.36

gth8_3420099_1bs ω-B3 5,341.34 3,297.42 3,241.83 5,491.54 2,569.01 2,638.92 4,476.92 27,056.98 10.86

gth13_3424359_1bs ω-B4 701.05 220.39 64.40 219.09 94.94 55.31 82.68 1,437.84 0.58

gth9_3424360_1bs ω-B5 13,734.20 12,466.63 11,021.07 17,449.57 8,979.08 10,234.58 16,257.63 90,142.74 36.17

gth10_3431184_1bs ω-B6 1.90 0.88 0.87 1.98 0.82 1.00 1.66 9.12 0.00

gth14_3450071_1bs ω-B7 468.34 596.05 697.67 849.85 449.34 492.90 885.48 4,439.64 1.78

gth7_3457881_1bs ω-B8 143.54 53.03 19.31 57.60 32.54 7.42 14.72 328.16 0.13

The International Wheat Genome Sequencing Consortium (IWGSC)

Klaus F. X. Mayer,1 Jane Rogers,2 Jaroslav Doležel,3 Curtis Pozniak,4 Kellye Eversole,2 Catherine Feuillet,5 Bikram Gill,6 Bernd Friebe,6 Adam J. Lukaszewski,7 Pierre Sourdille,14 Takashi R. Endo,8 Marie Kubaláková,3 Jarmila Číhalíková,3 Zdeňka Dubská,3 Jan Vrána,3 Romana Šperková,3 Hana Šimková,3 Melanie Febrer,9 Leah Clissold,10 Kirsten McLay,10 Kuldeep Singh,11 Parveen Chhuneja,11 Nagendra K. Singh,12 Jitendra Khurana,13 Eduard Akhunov,6 Frédéric Choulet,14 Adriana Alberti,15 Valérie Barbe,15 Patrick Wincker,15 Hiroyuki Kanamori,16 Fuminori Kobayashi,16 Takeshi Itoh,16 Takashi Matsumoto,16 Hiroaki Sakai,16 Tsuyoshi Tanaka,16 Jianzhong Wu,16 Yasunari Ogihara,17 Hirokazu Handa,16 P. Ron Maclachlan,4 Andrew Sharpe,18 Darrin Klassen,18 David Edwards,19 Jacqueline Batley,19 Odd-Arne Olsen,20,21 Simen Rød Sandve,20 Sigbjørn Lien,37 Burkhard Steuernagel,22 Brande Wulff,22 Mario Caccamo,10 Sarah Ayling,10 Ricardo H. Ramirez-Gonzalez,10 Bernardo J. Clavijo,10 Jonathan Wright,10 Matthias Pfeifer,1 Manuel Spannagl,1 Mihaela M. Martis,1 Martin Mascher,23 Jarrod Chapman,24 Jesse A. Poland,25 Uwe Scholz,23 Kerrie Barry,24 Robbie Waugh,26 Daniel S. Rokhsar,24 Gary J. Muehlbauer,27 Nils Stein,28 Heidrun Gundlach,1 Matthias Zytnicki,29 Véronique Jamilloux,29 Hadi Quesneville,29 Thomas Wicker,30 Primetta Faccioli,31 Moreno Colaiacovo,31 Antonio Michele Stanca,31 Hikmet Budak,32 Luigi Cattivelli,31 Natasha Glover,14 Lise Pingault,14 Etienne Paux,14 Sapna Sharma,1 Rudi Appels,33 Matthew Bellgard,33 Brett Chapman,33 Thomas Nussbaumer,1 Kai Christian Bader,1 Hélène Rimbert,36 Shichen Wang,6 Ron Knox,34 Andrzej Kilian,35 Michael Alaux,29 Françoise Alfama,29 Loïc Couderc,29 Nicolas Guilhot,14 Claire Viseux,29 Mikaël Loaec,29 Beat Keller,30 Sebastien Praud36

1 Plant Genome and Systems Biology, Helmholtz Zentrum Munich, Ingolstädter Landstrasse 1, 85764 Neuherberg, Germany.
2 IWGSC, Eversole Associates, 5207 Wyoming Road, Bethesda, MD 20816, USA.
3 Institute of Experimental Botany, Center of Plant Structural and Functional Genomics, Šlechtitelů 31, 783 71 Olomouc, Czech Republic.
4 Crop Development Centre, Department of Plant Sciences, College of Agriculture and Bioresources, University of Saskatchewan, 51 Campus Drive, Saskatoon SK, Canada.
5 Bayer Crop Science, 3500 Paramount Parkway, Morrisville, NC 27560, USA.
6 Kansas State University, Department of Plant Pathology, Manhattan, KS 66506–5502, USA.
7 College of Natural and Agricultural Sciences, Botany and Plant Sciences, University of California, Riverside, CA 92521, USA.
8 Laboratory of Plant Genetics, Graduate School of Agriculture, Kyoto University, Kyoto 606-8502, Japan.
9 Genomic Sequencing Unit, University of Dundee, Dow Street, Dundee DD1 5EH, UK.
10 Genome Analysis Centre, Norwich Research Park, Norwich, NR4 7UH, UK.
11 School of Agricultural Biotechnology, Punjab Agricultural University, Ludhiana 141 004, India.
12 National Research Centre on Plant Biotechnology, Indian Agricultural Research Institute, New Delhi 110 012, India.
13 Interdisciplinary Centre for Plant Genomics and Department of Plant Molecular Biology, University of Delhi, South Campus, New Delhi 110 021, India.
14 INRA–University Blaise Pascal UMR1095 Genetics, Diversity and Ecophysiology of Cereals, 5 chemin de Beaulieu, 63039 Clermont-Ferrand, France.
15 Commissariat à l’Energie Atomique Genoscope, Centre National de Séquençage, 2 rue Gaston Crémieux, CP5706, 91057 Evry, France.

16Plant Genome Research Unit, National Institute of Agrobiological Sciences, 2-1-2, Kan-non-dai, Tsukuba 305-8602, Japan. 17Kihara Institute for Biological Research, Yokohama City University, Maioka-cho 641-12, Totsuka-ku, 244-0813 Yokohama, Japan. 18National Research Council Canada, 110 Gymnasium Place, Saskatoon, SK, S7N 0W9, Canada. 19Australian Centre for Plant Functional Genomics, School of Agriculture and Food Sciences, University of Queensland, St. Lucia, QLD 4072, Australia, and School of Plant Biology, University of Western Australia, WA 6009, Australia. 20Department of Plant Science, Center for Integrative Genetics (CIGENE), Norwegian University of Life Sciences, 1432 Ås, Norway. 21Department of Natural Science and Technology, Hedmark University College, N-2318, Norway. 22Sainsbury Laboratory, Norwich Research Park, Norwich, NR4 7UH, UK. 23Bioinformatics and Information Technology, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), D-06466 Seeland OT Gatersleben, Germany. 24U.S. Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA. 25USDA-ARS Hard Winter Wheat Genetics Research Unit and Department of Agronomy, Kansas State University, Manhattan KS, 66506-5502, USA. 26James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK. 27Department of Agronomy and Plant Genetics, Department of Plant Biology, University of Minnesota, St. Paul, MN 55108, USA. 28Genome Diversity, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), D-06466 Seeland OT Gatersleben, Germany. 29INRA, UR1164 URGI–Research Unit in Genomics-Info, INRA de Versailles, Route de Saint-Cyr, Versailles, 78026, France. 30Institute of Plant Biology, University of Zurich, Zollikerstrasse 107, CH-8008 Zurich, Switzerland. 31Consiglio per la Ricerca e la sperimentazione in Agricoltura–Genomics Research Centre, via San Protaso 302, I-29017 Fiorenzuola d’Arda, Italy.
32Sabanci University Biological Sciences and Bioengineering Program, 34956 Istanbul, Turkey. 33Centre for Comparative Genomics, Murdoch University, Perth, WA 6150, Australia. 34Semiarid Prairie Agricultural Research Centre, Post Office Box 1030, Swift Current, Saskatchewan S9H 3X2, Canada. 35Diversity Arrays Technology Pty Limited, 1 Wilf Crane Crescent, Yarralumla ACT 2600, Australia. 36Biogemma, Centre de Recherche de Chappes, Route d’Ennezat, 63720 Chappes, France. 37Department of Animal and Aquacultural Sciences, CIGENE, Norwegian University of Life Sciences, Arboretveien 6, 1432 Ås, Norway.

References and Notes

1. M. Feldman, A. A. Levy, T. Fahima, A. Korol, Genomic asymmetry in allopolyploid plants: Wheat as a model. J. Exp. Bot. 63, 5045–5059 (2012). Medline doi:10.1093/jxb/ers192

2. A. R. Akhunova, R. T. Matniyazov, H. Liang, E. D. Akhunov, Homoeolog-specific transcriptional bias in allopolyploid wheat. BMC Genomics 11, 505 (2010). Medline doi:10.1186/1471-2164-11-505

3. T. K. Pellny, A. Lovegrove, J. Freeman, P. Tosi, C. G. Love, J. P. Knox, P. R. Shewry, R. A. Mitchell, Cell walls of developing wheat starchy endosperm: Comparison of composition and RNA-Seq transcriptome. Plant Physiol. 158, 612–627 (2012). Medline doi:10.1104/pp.111.189191

4. S. A. Gillies, A. Futardo, R. J. Henry, Gene expression in the developing aleurone and starchy endosperm of wheat. Plant Biotechnol. J. 10, 668–679 (2012). Medline doi:10.1111/j.1467-7652.2012.00705.x

5. S. Drea, D. J. Leader, B. C. Arnold, P. Shaw, L. Dolan, J. H. Doonan, Systematic spatial analysis of gene expression during wheat caryopsis development. Plant Cell 17, 2172–2185 (2005). Medline doi:10.1105/tpc.105.034058

6. International Wheat Genome Sequencing Consortium, A chromosome-based draft sequence of the hexaploid bread wheat genome. Science 345, 1251788 (2014).

7. Materials and methods are available as supplementary materials on Science Online.

8. M. F. Belmonte, R. C. Kirkbride, S. L. Stone, J. M. Pelletier, A. Q. Bui, E. C. Yeung, M. Hashimoto, J. Fei, C. M. Harada, M. D. Munoz, B. H. Le, G. N. Drews, S. M. Brady, R. B. Goldberg, J. J. Harada, Comprehensive developmental profiles of gene activity in regions and subregions of the Arabidopsis seed. Proc. Natl. Acad. Sci. U.S.A. 110, E435–E444 (2013). Medline doi:10.1073/pnas.1222061110

9. N. Sreenivasulu, B. Usadel, A. Winter, V. Radchuk, U. Scholz, N. Stein, W. Weschke, M. Strickert, T. J. Close, M. Stitt, A. Graner, U. Wobus, Barley grain maturation and germination: Metabolic pathway and regulatory network commonalities and differences highlighted by new MapMan/PageMan profiling tools. Plant Physiol. 146, 1738–1758 (2008). Medline doi:10.1104/pp.107.111781

10. M. Ashburner, C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, A. P. Davis, K. Dolinski, S. S. Dwight, J. T. Eppig, M. A. Harris, D. P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J. C. Matese, J. E. Richardson, M. Ringwald, G. M. Rubin, G. Sherlock, The Gene Ontology Consortium, Gene ontology: Tool for the unification of biology. Nat. Genet. 25, 25–29 (2000). Medline doi:10.1038/75556

11. W. H. Vensel, C. K. Tanaka, N. Cai, J. H. Wong, B. B. Buchanan, W. J. Hurkman, Developmental changes in the metabolic protein profiles of wheat endosperm. Proteomics 5, 1594–1611 (2005). Medline doi:10.1002/pmic.200401034

12. A. Serna, M. Maitz, T. O’Connell, G. Santandrea, K. Thevissen, K. Tienens, G. Hueros, C. Faleri, G. Cai, F. Lottspeich, R. D. Thompson, Maize endosperm secretes a novel antifungal protein into adjacent maternal tissue. Plant J. 25, 687–698 (2001). Medline doi:10.1046/j.1365-313x.2001.01004.x

13. O. A. Olsen, Nuclear endosperm development in cereals and Arabidopsis thaliana. Plant Cell 16, S214–S227 (2004). Medline doi:10.1105/tpc.017111

14. P. R. Shewry, N. G. Halford, Cereal seed storage proteins: Structures, properties and role in grain utilization. J. Exp. Bot. 53, 947–958 (2002). Medline doi:10.1093/jexbot/53.370.947

15. S. Horvath, J. Dong, Geometric interpretation of gene coexpression network analysis. PLOS Comput. Biol. 4, e1000117 (2008). Medline doi:10.1371/journal.pcbi.1000117

16. F. Supek, M. Bošnjak, N. Škunca, T. Šmuc, REVIGO summarizes and visualizes long lists of gene ontology terms. PLOS ONE 6, e21800 (2011). Medline doi:10.1371/journal.pone.0021800

17. T. Marcussen et al., Ancient hybridizations among the ancestral genomes of bread wheat. Science 345, 1250092 (2014).

18. L. Comai, The advantages and disadvantages of being polyploid. Nat. Rev. Genet. 6, 836–846 (2005). Medline doi:10.1038/nrg1711

19. N. Shitsukawa, C. Tahira, K. Kassai, C. Hirabayashi, T. Shimizu, S. Takumi, K. Mochida, K. Kawaura, Y. Ogihara, K. Murai, Genetic and epigenetic alteration among three homoeologous genes of a class E MADS box gene in hexaploid wheat. Plant Cell 19, 1723–1737 (2007). Medline doi:10.1105/tpc.107.051813

20. Z. Hu, Z. Han, N. Song, L. Chai, Y. Yao, H. Peng, Z. Ni, Q. Sun, Epigenetic modification contributes to the expression divergence of three TaEXPA1 homoeologs in hexaploid wheat (Triticum aestivum). New Phytol. 197, 1344–1352 (2013). Medline doi:10.1111/nph.12131

21. E. J. Finnegan, C. C. Sheldon, F. Jardinaud, W. J. Peacock, E. S. Dennis, A cluster of Arabidopsis genes with a coordinate response to an environmental stimulus. Curr. Biol. 14, 911–916 (2004). Medline doi:10.1016/j.cub.2004.04.045

22. G. Moore, T. Foote, T. Helentjaris, K. Devos, N. Kurata, M. Gale, Was there a single ancestral cereal chromosome? Trends Genet. 11, 81–82 (1995). Medline doi:10.1016/S0168-9525(00)89005-8

23. S. Bolot, M. Abrouk, U. Masood-Quraishi, N. Stein, J. Messing, C. Feuillet, J. Salse, The ‘inner circle’ of the cereal genomes. Curr. Opin. Plant Biol. 12, 119–125 (2009). Medline doi:10.1016/j.pbi.2008.10.011

24. K. Nakabayashi, M. Okamoto, T. Koshiba, Y. Kamiya, E. Nambara, Genome-wide profiling of stored mRNA in Arabidopsis thaliana seed germination: Epigenetic and genetic regulation of transcription in seed. Plant J. 41, 697–709 (2005). Medline doi:10.1111/j.1365-313X.2005.02337.x

25. C. Rustenholz, F. Choulet, C. Laugier, J. Safár, H. Simková, J. Dolezel, F. Magni, S. Scalabrin, F. Cattonaro, S. Vautrin, A. Bellec, H. Bergès, C. Feuillet, E. Paux, A 3,000-loci transcription map of chromosome 3B unravels the structural and functional features of gene islands in hexaploid wheat. Plant Physiol. 157, 1596–1608 (2011). Medline doi:10.1104/pp.111.183921

26. C. F. Morris, Puroindolines: The molecular genetic basis of wheat grain hardness. Plant Mol. Biol. 48, 633–647 (2002). Medline doi:10.1023/A:1014837431178

27. C. Ravel, P. Martre, I. Romeuf, M. Dardevet, R. El-Malki, J. Bordes, N. Duchateau, D. Brunel, F. Balfourier, G. Charmet, Nucleotide polymorphism in the wheat transcriptional activator Spa influences its pattern of expression and has pleiotropic effects on grain protein composition, dough viscoelasticity, and grain hardness. Plant Physiol. 151, 2133–2144 (2009). Medline doi:10.1104/pp.109.146076

28. P. Payne, G. Lawrence, Catalogue of alleles for the complex gene loci, Glu-A1, Glu-B1, and Glu-D1 which code for high-molecular-weight subunits of glutenin in hexaploid wheat. Cereal Res. Commun. 11, 29–35 (1983).

29. R. D. Thompson, D. Bartels, N. P. Harberd, R. B. Flavell, Characterization of the multigene family coding for HMW glutenin subunits in wheat using cDNA clones. Theor. Appl. Genet. 67, 87–96 (1983). Medline doi:10.1007/BF00303930

30. N. Chantret, J. Salse, F. Sabot, S. Rahman, A. Bellec, B. Laubin, I. Dubois, C. Dossat, P. Sourdille, P. Joudrier, M. F. Gautier, L. Cattolico, M. Beckert, S. Aubourg, J. Weissenbach, M. Caboche, M. Bernard, P. Leroy, B. Chalhoub, Molecular basis of evolutionary events that shaped the hardness locus in diploid and polyploid wheat species (Triticum and Aegilops). Plant Cell 17, 1033–1045 (2005). Medline doi:10.1105/tpc.104.029181

31. M. Wilkinson, Y. Wan, P. Tosi, M. Leverington, J. Snape, R. A. C. Mitchell, P. R. Shewry, Identification and genetic mapping of variant forms of puroindoline b expressed in developing wheat grain. J. Cereal Sci. 48, 722–728 (2008). doi:10.1016/j.jcs.2008.03.007

32. F. M. Dupont, W. H. Vensel, C. K. Tanaka, W. J. Hurkman, S. B. Altenbach, Deciphering the complexities of the wheat flour proteome using quantitative two-dimensional electrophoresis, three proteases and tandem mass spectrometry. Proteome Sci. 9, 10 (2011). Medline doi:10.1186/1477-5956-9-10

33. O. D. Anderson, N. Huo, Y. Q. Gu, The gene space in wheat: The complete γ-gliadin gene family from the wheat cultivar Chinese Spring. Funct. Integr. Genomics 13, 261–273 (2013). Medline doi:10.1007/s10142-013-0321-8

34. O. D. Anderson, Y. Q. Gu, X. Kong, G. R. Lazo, J. Wu, The wheat omega-gliadin genes: Structure and EST analysis. Funct. Integr. Genomics 9, 397–410 (2009). Medline doi:10.1007/s10142-009-0122-2

35. B. Langmead, C. Trapnell, M. Pop, S. L. Salzberg, Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009). Medline doi:10.1186/gb-2009-10-3-r25

36. C. Trapnell, L. Pachter, S. L. Salzberg, TopHat: Discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009). Medline doi:10.1093/bioinformatics/btp120

37. C. Trapnell, B. A. Williams, G. Pertea, A. Mortazavi, G. Kwan, M. J. van Baren, S. L. Salzberg, B. J. Wold, L. Pachter, Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515 (2010). Medline doi:10.1038/nbt.1621

38. C. Trapnell, D. G. Hendrickson, M. Sauvageau, L. Goff, J. L. Rinn, L. Pachter, Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat. Biotechnol. 31, 46–53 (2013). Medline doi:10.1038/nbt.2450

39. L. Li, C. J. Stoeckert Jr., D. S. Roos, OrthoMCL: Identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003). Medline doi:10.1101/gr.1224503

40. The Arabidopsis Genome Initiative, Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815 (2000). Medline doi:10.1038/35048692

41. J. Jia, S. Zhao, X. Kong, Y. Li, G. Zhao, W. He, R. Appels, M. Pfeifer, Y. Tao, X. Zhang, R. Jing, C. Zhang, Y. Ma, L. Gao, C. Gao, M. Spannagl, K. F. Mayer, D. Li, S. Pan, F. Zheng, Q. Hu, X. Xia, J. Li, Q. Liang, J. Chen, T. Wicker, C. Gou, H. Kuang, G. He, Y. Luo, B. Keller, Q. Xia, P. Lu, J. Wang, H. Zou, R. Zhang, J. Xu, J. Gao, C. Middleton, Z. Quan, G. Liu, J. Wang, H. Yang, X. Liu, Z. He, L. Mao, J. Wang, International Wheat Genome Sequencing Consortium, Aegilops tauschii draft genome sequence reveals a gene repertoire for wheat adaptation. Nature 496, 91–95 (2013). Medline doi:10.1038/nature12028

42. P. Langfelder, S. Horvath, WGCNA: An R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008). Medline doi:10.1186/1471-2105-9-559

43. P. Langfelder, B. Zhang, S. Horvath, Defining clusters from a hierarchical cluster tree: The Dynamic Tree Cut package for R. Bioinformatics 24, 719–720 (2008). Medline doi:10.1093/bioinformatics/btm563

44. S. Falcon, R. Gentleman, Using GOstats to test gene lists for GO term association. Bioinformatics 23, 257–258 (2007). Medline doi:10.1093/bioinformatics/btl567

45. S. F. Altschul, W. Gish, W. Miller, E. W. Myers, D. J. Lipman, Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990). Medline doi:10.1016/S0022-2836(05)80360-2

46. Z. Yang, PAML 4: Phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007). Medline doi:10.1093/molbev/msm088

47. International Brachypodium Initiative, Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature 463, 763–768 (2010). Medline doi:10.1038/nature08747

48. International Rice Genome Sequencing Project, The map-based sequence of the rice genome. Nature 436, 793–800 (2005). Medline doi:10.1038/nature03895

49. A. H. Paterson, J. E. Bowers, R. Bruggmann, I. Dubchak, J. Grimwood, H. Gundlach, G. Haberer, U. Hellsten, T. Mitros, A. Poliakov, J. Schmutz, M. Spannagl, H. Tang, X. Wang, T. Wicker, A. K. Bharti, J. Chapman, F. A. Feltus, U. Gowik, I. V. Grigoriev, E. Lyons, C. A. Maher, M. Martis, A. Narechania, R. P. Otillar, B. W. Penning, A. A. Salamov, Y. Wang, L. Zhang, N. C. Carpita, M. Freeling, A. R. Gingle, C. T. Hash, B. Keller, P. Klein, S. Kresovich, M. C. McCann, R. Ming, D. G. Peterson, Mehboob-ur-Rahman, D. Ware, P. Westhoff, K. F. Mayer, J. Messing, D. S. Rokhsar, The Sorghum bicolor genome and the diversification of grasses. Nature 457, 551–556 (2009). Medline doi:10.1038/nature07723

50. K. F. Mayer, M. Martis, P. E. Hedley, H. Simková, H. Liu, J. A. Morris, B. Steuernagel, S. Taudien, S. Roessner, H. Gundlach, M. Kubaláková, P. Suchánková, F. Murat, M. Felder, T. Nussbaumer, A. Graner, J. Salse, T. Endo, H. Sakai, T. Tanaka, T. Itoh, K. Sato, M. Platzer, T. Matsumoto, U. Scholz, J. Dolezel, R. Waugh, N. Stein, Unlocking the barley genome by chromosomal and comparative genomics. Plant Cell 23, 1249–1263 (2011). Medline doi:10.1105/tpc.110.082537

51. G. Gremme, V. Brendel, M. E. Sparks, S. Kurtz, Engineering a software tool for gene structure prediction in higher organisms. Inf. Softw. Technol. 47, 965–978 (2005). doi:10.1016/j.infsof.2005.09.005

52. A. Mortazavi, B. A. Williams, K. McCue, L. Schaeffer, B. Wold, Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621–628 (2008). Medline doi:10.1038/nmeth.1226

53. M. A. Larkin, G. Blackshields, N. P. Brown, R. Chenna, P. A. McGettigan, H. McWilliam, F. Valentin, I. M. Wallace, A. Wilm, R. Lopez, J. D. Thompson, T. J. Gibson, D. G. Higgins, Clustal W and Clustal X version 2.0. Bioinformatics 23, 2947–2948 (2007). Medline doi:10.1093/bioinformatics/btm404

54. N. Darzentas, Circoletto: Visualizing sequence similarity with Circos. Bioinformatics 26, 2620–2621 (2010). Medline doi:10.1093/bioinformatics/btq484

55. X. J. Min, G. Butler, R. Storms, A. Tsang, OrfPredictor: Predicting protein-coding regions in EST-derived sequences. Nucleic Acids Res. 33 (Web Server), W677–W680 (2005). Medline doi:10.1093/nar/gki394

56. P. S. Schnable, D. Ware, R. S. Fulton, J. C. Stein, F. Wei, S. Pasternak, C. Liang, J. Zhang, L. Fulton, T. A. Graves, P. Minx, A. D. Reily, L. Courtney, S. S. Kruchowski, C. Tomlinson, C. Strong, K. Delehaunty, C. Fronick, B. Courtney, S. M. Rock, E. Belter, F. Du, K. Kim, R. M. Abbott, M. Cotton, A. Levy, P. Marchetto, K. Ochoa, S. M. Jackson, B. Gillam, W. Chen, L. Yan, J. Higginbotham, M. Cardenas, J. Waligorski, E. Applebaum, L. Phelps, J. Falcone, K. Kanchi, T. Thane, A. Scimone, N. Thane, J. Henke, T. Wang, J. Ruppert, N. Shah, K. Rotter, J. Hodges, E. Ingenthron, M. Cordes, S. Kohlberg, J. Sgro, B. Delgado, K. Mead, A. Chinwalla, S. Leonard, K. Crouse, K. Collura, D. Kudrna, J. Currie, R. He, A. Angelova, S. Rajasekar, T. Mueller, R. Lomeli, G. Scara, A. Ko, K. Delaney, M. Wissotski, G. Lopez, D. Campos, M. Braidotti, E. Ashley, W. Golser, H. Kim, S. Lee, J. Lin, Z. Dujmic, W. Kim, J. Talag, A. Zuccolo, C. Fan, A. Sebastian, M. Kramer, L. Spiegel, L. Nascimento, T. Zutavern, B. Miller, C. Ambroise, S. Muller, W. Spooner, A. Narechania, L. Ren, S. Wei, S. Kumari, B. Faga, M. J. Levy, L. McMahan, P. Van Buren, M. W. Vaughn, K. Ying, C. T. Yeh, S. J. Emrich, Y. Jia, A. Kalyanaraman, A. P. Hsia, W. B. Barbazuk, R. S. Baucom, T. P. Brutnell, N. C. Carpita, C. Chaparro, J. M. Chia, J. M. Deragon, J. C. Estill, Y. Fu, J. A. Jeddeloh, Y. Han, H. Lee, P. Li, D. R. Lisch, S. Liu, Z. Liu, D. H. Nagel, M. C. McCann, P. SanMiguel, A. M. Myers, D. Nettleton, J. Nguyen, B. W. Penning, L. Ponnala, K. L. Schneider, D. C. Schwartz, A. Sharma, C. Soderlund, N. M. Springer, Q. Sun, H. Wang, M. Waterman, R. Westerman, T. K. Wolfgruber, L. Yang, Y. Yu, L. Zhang, S. Zhou, Q. Zhu, J. L. Bennetzen, R. K. Dawe, J. Jiang, N. Jiang, G. G. Presting, S. R. Wessler, S. Aluru, R. A. Martienssen, S. W. Clifton, W. R. McCombie, R. A. Wing, R. K. Wilson, The B73 maize genome: Complexity, diversity, and dynamics. Science 326, 1112–1115 (2009). Medline doi:10.1126/science.1178534

57. R. L. Tatusov, E. V. Koonin, D. J. Lipman, A genomic perspective on protein families. Science 278, 631–637 (1997). Medline doi:10.1126/science.278.5338.631

58. T. W. Binsl, K. M. Mullen, I. H. van Stokkum, J. Heringa, J. van Beek, FluxSimulator: An R package to simulate isotopomer distributions in metabolic networks. J. Stat. Softw. 18, 1–18 (2007).

59. R. Suzuki, H. Shimodaira, Pvclust: An R package for assessing the uncertainty in hierarchical clustering. Bioinformatics 22, 1540–1542 (2006). Medline doi:10.1093/bioinformatics/btl117

60. P. J. Rousseeuw, Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987). doi:10.1016/0377-0427(87)90125-7

61. A. M. Yip, S. Horvath, Gene network interconnectedness and the generalized topological overlap measure. BMC Bioinformatics 8, 22 (2007). Medline doi:10.1186/1471-2105-8-22

62. P. Shannon, A. Markiel, O. Ozier, N. S. Baliga, J. T. Wang, D. Ramage, N. Amin, B. Schwikowski, T. Ideker, Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003). Medline doi:10.1101/gr.1239303

63. G. Csardi, T. Nepusz, The igraph software package for complex network research. InterJournal Complex Systems 1695, 1695 (2006).

64. K. F. Mayer, S. Taudien, M. Martis, H. Simková, P. Suchánková, H. Gundlach, T. Wicker, A. Petzold, M. Felder, B. Steuernagel, U. Scholz, A. Graner, M. Platzer, J. Dolezel, N. Stein, Gene content and virtual gene order of barley chromosome 1H. Plant Physiol. 151, 496–505 (2009). Medline doi:10.1104/pp.109.142612

65. J. Salse, S. Bolot, M. Throude, V. Jouffe, B. Piegu, U. M. Quraishi, T. Calcagno, R. Cooke, M. Delseny, C. Feuillet, Identification and characterization of shared duplications between rice and wheat provide new insight into grass genome evolution. Plant Cell 20, 11–24 (2008). Medline doi:10.1105/tpc.107.056309

66. P. Hernandez, M. Martis, G. Dorado, M. Pfeifer, S. Gálvez, S. Schaaf, N. Jouve, H. Šimková, M. Valárik, J. Doležel, K. F. Mayer, Next-generation sequencing and syntenic integration of flow-sorted arms of wheat chromosome 4A exposes the chromosome structure and gene content. Plant J. 69, 377–386 (2012). Medline doi:10.1111/j.1365-313X.2011.04808.x

67. K. Miftahudin, K. Ross, X. F. Ma, A. A. Mahmoud, J. Layton, M. A. Milla, T. Chikmawati, J. Ramalingam, O. Feril, M. S. Pathan, G. S. Momirovic, S. Kim, K. Chema, P. Fang, L. Haule, H. Struxness, J. Birkes, C. Yaghoubian, R. Skinner, J. McAllister, V. Nguyen, L. L. Qi, B. Echalier, B. S. Gill, A. M. Linkiewicz, J. Dubcovsky, E. D. Akhunov, J. Dvorák, M. Dilbirligi, K. S. Gill, J. H. Peng, N. L. Lapitan, C. E. Bermudez-Kandianis, M. E. Sorrells, K. G. Hossain, V. Kalavacharla, S. F. Kianian, G. R. Lazo, S. Chao, O. D. Anderson, J. Gonzalez-Hernandez, E. J. Conley, J. A. Anderson, D. W. Choi, R. D. Fenton, T. J. Close, P. E. McGuire, C. O. Qualset, H. T. Nguyen, J. P. Gustafson, Analysis of expressed sequence tag loci on wheat chromosome group 4. Genetics 168, 651–663 (2004). Medline doi:10.1534/genetics.104.034827

68. K. M. Devos, J. Dubcovsky, J. Dvořák, C. N. Chinoy, M. D. Gale, Structural evolution of wheat chromosomes 4A, 5A, and 7B and its impact on recombination. Theor. Appl. Genet. 91, 282–288 (1995). Medline doi:10.1007/BF00220890

69. A. M. Waterhouse, J. B. Procter, D. M. Martin, M. Clamp, G. J. Barton, Jalview Version 2—A multiple sequence alignment editor and analysis workbench. Bioinformatics 25, 1189–1191 (2009). Medline doi:10.1093/bioinformatics/btp033