2013 WENGER Supplementary information GBE


Wenger and Galliot Supplementary Information

Emergence of Genetic and Phenotypic Innovations

in Eumetazoan, Bilaterian, Euteleostomi and Hominidae Ancestors

Yvan WENGER and Brigitte GALLIOT

Department of Genetics and Evolution, Institute of Genetics and Genomics in Geneva (iGE3),

University of Geneva, Geneva, Switzerland.

Corresponding author: [email protected]

Table S1: Human orthologs in 23 species as retrieved by RBH (BlastP+, E-value ≤10⁻¹⁰, soft masking). The numbers in a given row are the RBH scores obtained by the human protein sequence indicated on the left (SwissProt AC). A value >10 indicates orthology, whereas a value of 1 indicates that no BLAST hit was retrieved and a value of 10 that no RBH was retrieved. This table is provided as supplementary data to Fig. 2, Fig. 3 and Fig. 4.
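Table S1 is built from reciprocal best hits (RBHs). A human protein and a protein from another species are taken as orthologs when each one is the other's best BLASTP hit below the E-value cut-off. The sketch below makes two assumptions. First, the best hits have already been parsed from BLAST output into dictionaries. Second, the input layout and the function names are hypothetical and do not reproduce the authors' pipeline. The sketch also decodes cell values as described in the legend. The legend does not define values between 1 and 10, so those are left unclassified.

```python
from typing import Dict, Tuple

# Hypothetical input layout: {query_id: (best_subject_id, e_value)}
BestHits = Dict[str, Tuple[str, float]]

E_VALUE_CUTOFF = 1e-10  # threshold stated in the Table S1 legend


def reciprocal_best_hits(human_to_sp: BestHits, sp_to_human: BestHits,
                         cutoff: float = E_VALUE_CUTOFF) -> Dict[str, str]:
    """Map each human protein to the species protein it forms an RBH with."""
    rbh = {}
    for human_id, (sp_id, e_value) in human_to_sp.items():
        if e_value > cutoff:
            continue  # forward best hit is not significant
        back = sp_to_human.get(sp_id)
        # The reverse search must return the same human protein, also below the cut-off.
        if back is not None and back[0] == human_id and back[1] <= cutoff:
            rbh[human_id] = sp_id
    return rbh


def classify_cell(value: float) -> str:
    """Decode a Table S1 value according to the codes given in its legend."""
    if value == 1:
        return "no BLAST hit"
    if value == 10:
        return "no RBH"
    if value > 10:
        return "ortholog"
    return "not defined in legend"
```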

Table S2: List of novel human biological processes (huBPs) as deduced from protein-enriched gene ontologies (GOs) at selected evolutionary steps in inferred ancestors. Protein-enriched GOs were assessed against the human background. GOs for Biological Processes, Molecular Functions and Cellular Compartments with a corrected P-value ≤0.05 and fold enrichment ≥1.5x are shown; GOs with corrected P-values ≤10⁻⁵ and fold enrichment ≥2x are highlighted in color. This table is provided as supplementary data to Fig. 5. A view of this table restricted to the 10 most significantly protein-enriched BPs is provided as Table 2.
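The legend gives the cut-offs but names neither the statistical test nor the multiple-testing correction. For illustration only, the sketch below assumes a one-sided hypergeometric test and a Benjamini-Hochberg correction. Fold enrichment is computed as the fraction of study proteins annotated with a term, divided by the same fraction in the background. All function names are hypothetical. Only the default thresholds (0.05 and 1.5x for inclusion, 10⁻⁵ and 2x for highlighting) come from the legend.

```python
from math import comb
from typing import Dict, List, Set, Tuple


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N proteins, K annotated, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enriched_terms(study: Set[str], background: Set[str],
                   annotations: Dict[str, Set[str]],
                   max_p: float = 0.05, min_fold: float = 1.5,
                   hi_p: float = 1e-5, hi_fold: float = 2.0
                   ) -> List[Tuple[str, float, float, bool]]:
    """(term, fold enrichment, corrected P, highlighted) for terms passing the cut-offs."""
    study = study & background  # only proteins present in the background count
    N, n = len(background), len(study)
    terms, folds, pvals = [], [], []
    for term, proteins in annotations.items():
        annotated = proteins & background
        K, k = len(annotated), len(annotated & study)
        if k == 0:
            continue  # term absent from the study set
        terms.append(term)
        folds.append((k / n) / (K / N))
        pvals.append(hypergeom_sf(k, N, K, n))
    results = []
    for term, fold, p in zip(terms, folds, benjamini_hochberg(pvals)):
        if p <= max_p and fold >= min_fold:
            results.append((term, fold, p, p <= hi_p and fold >= hi_fold))
    return results
```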

Table S3: List of novel biological processes (BPs) as deduced from protein-enriched gene ontologies (GOs) in cnidarian proteomes (Acropora, Nematostella, Hydra and Clytia). Protein-enriched GOs were assessed against the non-bilaterian background (Groups I+II+III in Fig. 3). Protein-enriched GOs with a corrected P-value ≤0.05 and fold enrichment ≥1.5x are shown; those with corrected P-values ≤10⁻⁵ and fold enrichment ≥2x are highlighted in color. This table is provided as supplementary data to Fig. 6.

FIGURE S1: Human orthologomes obtained after RBH computing (this study) or using the InParanoid software.

FIGURE S2: Comparative analysis of the timing of emergence of founder domains and human orthologs of 900 human gatekeeper cancer genes obtained by two different methods, phylostratigraphy (Domazet-Loso and Tautz, BMC Biol. 2010) and RBHs (this study).

FIGURE S3: Limited impact of testing a higher number of Fungi species on the timing of emergence of human orthologs in unikont evolution.

FIGURE S4: Low level of redundancy between human Biological Processes (huBPs) detected in the LCAs of each group.

FIGURE S5: Robustness of protein-enriched Biological Processes (BPs) in cnidarians (Group-III) when different backgrounds are used.


FIGURE S1: Human orthologomes obtained after RBH computing (this study) or using the InParanoid software. Comparison of the number of human orthologs retrieved either by InParanoid (red) or by RBH (blue). Both types of analysis were performed either with the datasets listed in Table S1 or with the InParanoid datasets (version 7.0, http://inparanoid.sbc.su.se/cgi-bin/summary.cgi).


FIGURE S2: Comparative analysis of two different methods, phylostratigraphy (Domazet-Loso and Tautz, BMC Biol. 2010) and RBHs (this study), to map the emergence of founder domains and human orthologs of 900 human gatekeeper cancer genes. Domazet-Loso and Tautz (2010) deduced the emergence of the "founder domains" of 900 gatekeeper proteins from a BLASTp search against all sequences present in the NCBI nr database at the time of their study (E-value threshold 10⁻³, blue bars). Their results were compared to those obtained by a BLASTp search of the same 900 proteins against the 23-species protein dataset used in this study (E-value threshold 10⁻¹⁰, green bars; see Materials and Methods for the protein dataset). The results are broadly similar, with a majority of protein domains already present before the emergence of opisthokonts. However, emergences traced between the opisthokont LCA and the bilaterian LCA tend to be shifted to more recent periods in the second test, likely as a consequence of the more stringent (lower) BLAST E-value threshold used in this analysis. When these 900 proteins were extracted from the RBH dataset (see Table S1), their emergence was more broadly distributed across the LCAs of pre-opisthokonts, opisthokonts, eumetazoans and vertebrates (red bars). In practice, the 900 human Entrez gatekeeper genes were retrieved from the supplementary tables of Domazet-Loso and Tautz (2010) (see "Entrez Gatekeepers" in Table S1); 891 of these 900 genes were present in our dataset. The species indicated in the box are modern descendants of extinct lineages and were used here to infer the gene complement of the last common ancestors (LCAs) of the indicated groups.
The phylostrata 1 to 19 described in Domazet-Loso and Tautz (2010, Figure 1) were regrouped to match the strata covered in the present study: phylostrata 1-3 into pre-opisthokonts, phylostrata 4-5 into the opisthokont LCA, phylostratum 6 into the metazoan LCA, phylostratum 7 into the eumetazoan LCA, phylostratum 8 into the bilaterian LCA, phylostratum 9 into the deuterostome LCA, phylostrata 10-11 into the chordate LCA, phylostrata 12-14 into the tetrapod LCA, phylostratum 15 into the amniote LCA, and phylostrata 16-19 into the catarrhini LCA.
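The regrouping of phylostrata described above is a fixed lookup table. Below is a minimal sketch that turns per-gene phylostrata into per-LCA counts comparable to the bars of Figure S2. The names `PHYLOSTRATUM_TO_LCA` and `counts_per_lca` are hypothetical, and the input is assumed to be one phylostratum number per gene.

```python
from collections import Counter
from typing import Dict, Iterable

# Regrouping of the 19 phylostrata of Domazet-Loso and Tautz (2010),
# transcribed from the Figure S2 legend.
_GROUPS = [
    (range(1, 4), "pre-opisthokonts"),
    (range(4, 6), "opisthokont LCA"),
    (range(6, 7), "metazoan LCA"),
    (range(7, 8), "eumetazoan LCA"),
    (range(8, 9), "bilaterian LCA"),
    (range(9, 10), "deuterostome LCA"),
    (range(10, 12), "chordate LCA"),
    (range(12, 15), "tetrapod LCA"),
    (range(15, 16), "amniote LCA"),
    (range(16, 20), "catarrhini LCA"),
]
PHYLOSTRATUM_TO_LCA: Dict[int, str] = {
    ps: lca for strata, lca in _GROUPS for ps in strata
}


def counts_per_lca(phylostrata: Iterable[int]) -> Counter:
    """Count genes per LCA, given the phylostratum (1-19) of each gene."""
    return Counter(PHYLOSTRATUM_TO_LCA[ps] for ps in phylostrata)
```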


FIGURE S3: Limited impact of testing a higher number of Fungi species on the timing of emergence of human orthologs in unikont evolution. A) As Saccharomyces cerevisiae is known to have undergone severe genome reduction (Cliften et al. 2006; see also the smaller S. cerevisiae orthologome among non-metazoans in Figure 2), we measured the orthologomes of the Zygomycetes Rhizopus oryzae (Ma et al. 2009) and Phycomyces blakesleeanus (Joint Genome Institute assembly v2), the Basidiomycete Cryptococcus neoformans (Loftus et al. 2005) and the Ascomycete Aspergillus fumigatus (Nierman et al. 2005). Except for P. blakesleeanus, which was downloaded from the Joint Genome Institute website, the new fungal datasets are UniProt reference proteome sets obtained on August 14, 2013.

B) Timing of emergence of human orthologs in metazoan evolution when Fungi are represented by S. cerevisiae only (green bars, as in Fig. 4A) or by five fungal species (R. oryzae, P. blakesleeanus, C. neoformans, A. fumigatus and S. cerevisiae; blue bars). As in Figure 4, gains in human orthologs were obtained by testing the complete human proteome against the proteomes of species belonging to phyla that branch at successive steps of metazoan evolution. As throughout the main study, the non-metazoan species considered are A. thaliana, D. discoideum, S. cerevisiae, C. owczarzaki, M. brevicollis and S. rosetta.

Cliften, PF, RS Fulton, RK Wilson, M Johnston. 2006. After the duplication: gene loss and adaptation in Saccharomyces genomes. Genetics 172:863-872.

Loftus, BJ, et al. 2005. The genome of the basidiomycetous yeast and human pathogen Cryptococcus neoformans. Science 307:1321-1324.

Ma, LJ, et al. 2009. Genomic analysis of the basal lineage fungus Rhizopus oryzae reveals a whole-genome duplication. PLoS Genet 5:e1000549.

Nierman, WC, et al. 2005. Genomic sequence of the pathogenic and allergenic filamentous fungus Aspergillus fumigatus. Nature 438:1151-1156.
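Panel B of Figure S3 assigns each human ortholog to the oldest ancestor in which it is detected. Below is a minimal sketch of that assignment. It assumes a hypothetical input listing, for each human protein, the species in which it has an RBH, plus an ordered list of ancestors, each represented by a set of species. The layout and function names are illustrative, not the authors' code. Pooling the five fungal species into one set means that an RBH in any of them is enough. This can only move an emergence to an earlier ancestor, never a later one.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Set, Tuple

# Ancestors ordered from the oldest to the most recent, each represented by the
# set of species searched for it (hypothetical layout).
Groups = Sequence[Tuple[str, Set[str]]]


def emergence_lca(species_with_rbh: Set[str], groups: Groups) -> Optional[str]:
    """Oldest ancestor with at least one representative species carrying an RBH."""
    for lca, species in groups:
        if species_with_rbh & species:
            return lca
    return None  # no RBH in any listed species


def gains_per_lca(rbh_table: Dict[str, Set[str]], groups: Groups) -> Counter:
    """Number of human proteins whose inferred emergence falls at each ancestor."""
    assigned = (emergence_lca(species, groups) for species in rbh_table.values())
    return Counter(lca for lca in assigned if lca is not None)
```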


FIGURE S4: Low level of redundancy between human Biological Processes (huBPs) detected in the LCAs of each group. A-G) To assess whether the number of novel BPs shown in Fig. 4A was inflated by highly similar BPs, the protein-enriched BPs (pink circles) at each phylostratum were compared for their protein content and linked (blue edges) when each shared more than 90% of its proteins with the other (bidirectional criterion). H) To evaluate the effect of redundancy among protein-enriched BPs, BPs sharing more than 90% of their proteins were clustered. Bars represent the percentages of protein-enriched BPs over the total number of BPs, measured without (grey) and with (orange) clustering at the 90% redundancy level.
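The bidirectional 90% criterion and the clustering of panel H can be sketched as follows. The legend does not say how clusters were formed. This sketch assumes single-linkage clustering, i.e. the connected components of the redundancy graph, found with union-find. The function names and the input layout ({BP: set of proteins}) are hypothetical.

```python
from typing import Dict, List, Set


def redundant(a: Set[str], b: Set[str], threshold: float = 0.9) -> bool:
    """True when each BP shares more than `threshold` of its proteins with the other."""
    shared = len(a & b)
    return shared > threshold * len(a) and shared > threshold * len(b)


def cluster_bps(bps: Dict[str, Set[str]], threshold: float = 0.9) -> List[Set[str]]:
    """Group BPs into connected components of the redundancy graph (union-find)."""
    parent = {bp: bp for bp in bps}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    names = list(bps)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if redundant(bps[a], bps[b], threshold):
                parent[find(a)] = find(b)  # merge the two components

    clusters: Dict[str, Set[str]] = {}
    for bp in names:
        clusters.setdefault(find(bp), set()).add(bp)
    return list(clusters.values())
```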


FIGURE S5: Robustness of protein-enriched Biological Processes (BPs) in cnidarians (Group-III) when different backgrounds are used. A) Comparison of the corrected P-values of the 30 most significantly protein-enriched BPs when the non-bilaterian (blue bars) or the human (pink bars) background is used. The non-bilaterian background corresponds to all proteins having an RBH in non-bilaterian species (Groups I+II+III as defined in Fig. 3A), which are thus assumed to have been present in the eumetazoan last common ancestor (LCA). Most protein-enriched BPs are stable, except cell-cell signaling (bold), which shows a lower enrichment when tested on the human background. This is explained by the expansion of signaling proteins after the divergence of cnidarians, such as the numerous proteins involved in the human immune response (e.g. chemokines). Thus, the enrichment in proteins involved in cell-cell signaling is highly significant in cnidarians when compared to Ur-eumetazoa, but less pronounced when compared to human.
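The non-bilaterian background can be derived directly from Table S1. The sketch below makes three assumptions. First, Table S1 has been loaded as a hypothetical nested dictionary {SwissProt AC: {species: score}}, with scores coded as in its legend (>10 = RBH). Second, missing cells are treated as "no hit". Third, the species names passed in stand for the Group I-III species.

```python
from typing import Dict, Iterable, Set

# Hypothetical layout of Table S1: {SwissProt AC: {species: score}}
TableS1 = Dict[str, Dict[str, float]]


def non_bilaterian_background(table: TableS1, species: Iterable[str]) -> Set[str]:
    """Human proteins with an RBH (score > 10) in at least one listed species."""
    names = list(species)  # materialize so it can be scanned for every protein
    return {
        accession for accession, row in table.items()
        # Missing cells are treated like the legend's code 1 ("no BLAST hit").
        if any(row.get(sp, 1) > 10 for sp in names)
    }
```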

B) Total number of protein-enriched BPs in cnidarians when different corrected P-value thresholds are applied. Note the limited effect of the background on the number of significantly protein-enriched BPs. The enrichment obtained with the 10⁻³ threshold and the human background is shown in Fig. 6A.
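Panel B amounts to counting, for each background, the BPs whose corrected P-value falls at or below each threshold. Below is a minimal sketch. It assumes a hypothetical input layout {background: {BP: corrected P-value}}. The threshold values a user passes in are also illustrative.

```python
from typing import Dict, Sequence


def counts_by_threshold(p_by_background: Dict[str, Dict[str, float]],
                        thresholds: Sequence[float]) -> Dict[str, Dict[float, int]]:
    """For each background, count BPs with corrected P-value <= each threshold."""
    return {
        background: {t: sum(p <= t for p in pvals.values()) for t in thresholds}
        for background, pvals in p_by_background.items()
    }
```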
