
Punctuated emergences of genetic and phenotypic innovations in eumetazoan, bilaterian, euteleostome, and hominidae ancestors.

Yvan Wenger, Brigitte Galliot.

Abstract

Phenotypic traits derive from the selective recruitment of genetic materials over macroevolutionary times, and protein-coding genes constitute an essential component of these materials. We took advantage of the recent production of genomic-scale data from sponges and cnidarians, sister groups to eumetazoans and bilaterians, respectively, to date the emergence of human proteins and to infer the timing of acquisition of novel traits through metazoan evolution. Comparing the proteomes of 23 eukaryotes, we find that 33% of human proteins have an ortholog in nonmetazoan species. This premetazoan proteome associates with 43% of all annotated human biological processes. Subsequently, four major waves of innovations can be inferred in the last common ancestors of eumetazoans, bilaterians, euteleostomi (bony vertebrates), and hominidae, largely specific to each epoch, whereas early branching deuterostome and chordate phyla show very few innovations. Interestingly, groups of proteins that act together in their modern human functions often originated concomitantly, although the corresponding human phenotypes frequently emerged later. For example, the three cnidarians Acropora, Nematostella, and Hydra express a highly similar protein inventory, and their protein innovations can be affiliated either to traits shared by all eumetazoans (gut differentiation, neurogenesis), or to bilaterian traits present in only some cnidarians (eyes, striated muscle), or to traits not yet identified in this phylum (mesodermal layer, endocrine glands). The variable correspondence between phenotypes predicted from protein enrichments and observed phenotypes suggests that a parallel mechanism repeatedly produces similar phenotypes, thanks to novel regulatory events that independently tie preexisting conserved genetic modules.


Keywords:  eumetazoan innovations; gene ontology enrichment; macroevolution of human orthologs; orthologomes; reciprocal best hits (RBHs); regulatory-based parallel evolution


Year: 2013; PMID: 24065732; PMCID: PMC3814200; DOI: 10.1093/gbe/evt142

Source DB: PubMed; Journal: Genome Biol Evol; ISSN: 1759-6653; Impact factor: 3.416


Introduction

The appearance of novel phenotypic traits results from genetic changes that affect developmental processes; if subsequently selected, innovations are maintained as robust attributes or modulated to introduce morphological variations (Gould 1992; Carroll 2008). At the molecular level, genetic changes mostly result from rearrangements of preexisting genetic material producing novel coding units and/or novel regulations, which contribute to a variable extent to new phenotypes (Lowe et al. 2011). The production of novel genetic coding units can arise from DNA-based or RNA-based gene duplication, a major evolutionary driving force in bilaterian animals (Ohno 1999; Conant and Wolfe 2008), as well as from the transformation of noncoding sequences to coding ones (Kaessmann 2010). When resulting from duplication of preexisting coding units, novel genes can either remain functional with highly similar functions, creating redundancy or subfunctionalization, or experience relaxed constraints, being free to diverge and to lead to novel functions through neofunctionalization (Conant and Wolfe 2008). Besides gene gains, gene losses also contribute to shaping phenotypic traits, for example, by creating taxon-specific genetic landscapes (Foret et al. 2010). The identification and precise dating of these changes represent the groundwork for understanding complex evolutionary mechanisms, and the recent accumulation of genomic-scale data from species that represent a variety of nonmetazoan and metazoan phyla provides the material to measure genomic changes and to investigate when those changes took place. Previous work showed that phylostratigraphy data associated with gene annotation are useful to uncover macroevolutionary adaptive events (Domazet-Loso et al. 2007). The phylostratigraphy approach is designed to capture the birth of “founder” genes or protein domains (Domazet-Loso et al. 2007; Domazet-Loso and Tautz 2010).
In practice, it consists of dating the emergence of proteins or protein domains of a reference organism by identifying the contemporary organism with the greatest phylogenetic distance that possesses corresponding proteins retrieved by the Basic Local Alignment Search Tool (Blast). The timing of the origin of each protein is then deduced from the evolutionary position of the last common ancestor (LCA) of these two species. This approach detects the first occurrence in macroevolutionary times of proteins harboring similar domains but not necessarily orthologs. To systematically trace the origins of all human orthologs, Huerta-Cepas et al. (2007) established a genomewide pipeline to derive the human phylome, defined as a complete collection of all gene phylogenies of the human genome. To detect orthology, they automated many steps of a “classical” phylogenetic analysis using phylogenetic trees and developed pipelines to reduce computing requirements. Both approaches provided fruitful methodologies for complementing time- and power-consuming traditional phylogenetic approaches and for extending them to a genomic scale. However, these studies were performed at a time when proteomic data from poriferan and cnidarian species were not yet available and thus largely ignored the eumetazoan transition. Metazoans are characterized by multicellularity and embryonic development involving gastrulation. Phylogenomic studies recently confirmed the basal origin of Porifera (sponges) among metazoans and the sister group position of bilaterians and coelenterates (cnidarians, ctenophores) to form eumetazoans (Philippe et al. 2009). Compared with poriferans, eumetazoans develop a tissue-layered anatomy, differentiate a gut, and possess a nervous system that regulates their muscle activity. Moreover, numerous cnidarian species differentiate sensory organs including eyes (Collins et al. 2006; Galliot et al. 2009; Technau and Steele 2011).
However, compared with bilaterians, cnidarians lack a typical mesodermal layer and show a radially organized body anatomy, although sea anemones exhibit some bilaterality. The nervous systems of most cnidarian species include nerve rings but lack the typical central nervous system present in most bilaterian species. As sister group to bilaterians, cnidarians are particularly suitable to trace back the emergence of eumetazoan traits. Bilaterians, whose LCAs arose after Cnidaria divergence, are characterized by a triploblastic body organization along two axes, anterior–posterior and dorsal–ventral, and a centralized nervous system. They are divided into two major groups, the protostomes, themselves divided into lophotrochozoans and ecdysozoans, and the deuterostomes (Adoutte et al. 1999; Philippe et al. 2009). Deuterostomes, which include echinoderms, hemichordates, and chordates, show a large variety of body plans but share a mouth opening secondarily formed during embryonic development as a synapomorphy (Gerhart et al. 2005; Swalla and Smith 2008; Hejnol et al. 2009). Here we took advantage of the genomic and transcriptomic material recently made available, including that from poriferan and cnidarian species, to trace the emergence of human orthologs and predict innovations in prebilaterians and deuterostomes. We deliberately chose to focus on the emergence of human genes, given the current quality and completeness of the human proteome and its good annotation. In practice, we used the human proteome as a reference data set in reciprocal best hits (RBHs, also referred to as bidirectional best hits) (Overbeek et al. 1999) to identify human orthologs among 22 species chosen for their phylogenetic positions and for the completeness of their proteomes. We computed “orthologomes,” defined as collections of 1:1 orthologs between two species, to deduce protein gains and losses based on the presence or absence of RBH orthologs.
By combining the data on orthology with phylogenetic information, we inferred the gains of the modern human proteins as well as losses in sister groups at nine evolutionary steps. We relied on the widely accepted assumption that multiple independent events of gene loss represent a more parsimonious scenario than the convergent evolution of protein sequences. Previous phylostratigraphic analyses (Domazet-Loso et al. 2007; Huerta-Cepas et al. 2007) rely on the idea that the coordinated emergence of genes sharing an involvement in a particular phenotype represents a “footprint” of the emergence of this phenotypic trait. Here, we used the inferences on human ortholog origins with the detailed annotation of the human proteome and the grouping of its proteins into biological processes (BPs) (Ashburner et al. 2000) to quantify the most significant protein enrichments for BPs active in humans (huBPs) at specific evolutionary steps. Next, we interpreted the protein-enriched huBPs as molecular signatures of phenotypic innovations and thus predicted the different types of innovations that possibly emerged at each considered period. As a result, we identified three periods of high innovations in metazoan LCAs, eumetazoan LCAs, and euteleostome LCAs, and two periods of low innovations, in deuterostome LCAs and chordate LCAs. Finally, considering the variable phenotypes observed in cnidarian species that nevertheless share a similar protein complement, we discuss a scenario of parallelism (Gould 2002) to explain the recurrent but independent emergence of similar phenotypes in periods of high innovation.

Materials and Methods

Selection of the Sequence Data Sets Used to Support the Inferences of Protein Gains and Losses over Time

To form a reference data set, we selected proteomes for their completeness and limited redundancy as indicated in table 1. To represent plants, amoeba, and fungi, we used the proteomes from Arabidopsis thaliana (Arabidopsis Genome Initiative 2000), Dictyostelium discoideum (Eichinger et al. 2005), and Saccharomyces cerevisiae (Giaever et al. 2002), respectively. To infer the gene complement of metazoan-LCAs, we used the proteomes of the unicellular amoeba Capsaspora owczarzaki (Ruiz-Trillo et al. 2008; Suga et al. 2013) and of two species from the sister group choanoflagellates, the solitary Monosiga brevicollis and the colonial Salpingoeca rosetta (King et al. 2008; Dayel et al. 2011). To infer metazoan and eumetazoan innovations, we used the proteomes of the sponge Amphimedon queenslandica (Srivastava et al. 2010) and of four cnidarian species, the sea anemone Nematostella vectensis (Putnam et al. 2007), the coral Acropora digitifera (Shinzato et al. 2011), the hydrozoan polyp Hydra, and the hydrozoan jellyfish Clytia hemisphaerica (Foret et al. 2010). For Hydra proteins, a single comprehensive set of 57,611 low-redundancy sequences that include splice variants was produced from the Hydra magnipapillata genome-predicted transcriptome (Chapman et al. 2010) and the H. vulgaris RNA-seq transcriptome (Wenger and Galliot 2013). To infer the protein complement of the bilaterian-LCAs, we tested five protostome proteomes, from Drosophila melanogaster (Adams et al. 2000), Tribolium castaneum (Richards, Gibbs, et al. 2008), Caenorhabditis elegans (Chervitz et al. 1998; Thomas 2008), Trichinella spiralis (Mitreva et al. 2011), and Capitella teleta (Blake et al. 2009). For both insects and nematodes, we used two species data sets, as the classical model systems D. melanogaster and C. elegans express fast-evolving genes whereas T. castaneum and T. spiralis express slow-evolving ones (Aboobaker and Blaxter 2003; Savard et al. 2006).
To infer gene gains that took place in deuterostome-LCAs, we used the proteome of the hemichordate Saccoglossus kowalevskii (Lowe 2008; Pani et al. 2012). For tracing innovations that took place in chordate-LCAs, we selected species from two additional invertebrate deuterostome phyla, the cephalochordate amphioxus Branchiostoma floridae (Putnam et al. 2008; Louis et al. 2012) and the urochordate Ciona intestinalis (Dehal et al. 2002; Delsuc et al. 2006). For predicting proteins that emerged with vertebrates, we used the proteomes of Danio rerio (Howe et al. 2013), Xenopus tropicalis (Hellsten et al. 2010), and Gallus gallus (Groenen et al. 2000), and for tracing primate innovations, we used the Macaca mulatta proteome (Gibbs et al. 2007). Each of these proteomes was compared with the human proteome (Lander et al. 2001; Venter et al. 2001); specifically, the Swiss-Prot release 2011_07 (20,231 proteins) was used. After conceptual translation, redundant sequences were removed using the usearch software (Edgar 2010) when necessary.
Table 1

Sources and Characteristics of the Different Proteome Data Sets Used in This Study

Lineage | Species Included | Number of Sequences | Total Sequences per Group | Type of Sequences | Repository
Hominidae | H. sapiens | 20,231 | 20,231 | Reference proteome set | UniProtKB
Nonhominidae primates | M. mulatta | 34,434 | 34,434 | Reference proteome set | UniProtKB
Nonprimate vertebrates | X. tropicalis | 23,344 | 85,224 | Reference proteome set | UniProtKB
 | G. gallus | 21,541 | | Reference proteome set | UniProtKB
 | D. rerio | 40,339 | | Reference proteome set | UniProtKB
Cephalochordates | B. floridae | 28,545 | 42,547 | Reference proteome set | UniProtKB
Urochordates | C. intestinalis | 14,002 | | Genome-predicted proteome | JGI
Hemichordates | S. kowalevskii | 43,572 | 56,156 | Genome-predicted proteome | JGI
 | | 12,584 | | RefSeq | NCBI
Protostomes | T. spiralis | 16,246 | 106,607 | Proteome | UniProtKB
 | D. melanogaster | 17,563 | | Reference proteome set | UniProtKB
 | T. castaneum | 16,986 | | Complete proteome set | UniProtKB
 | C. teleta | 32,415 | | Genome-predicted proteome | JGI
 | C. elegans | 23,397 | | Reference proteome set | UniProtKB
Cnidarians | N. vectensis | 24,435 | 199,482 | Reference proteome set | UniProtKB
 | A. digitifera | 30,666 | | Assembled ESTs | Compagen
 | | 23,677 | | Genome-predicted proteome | OIST-MGU
 | H. vulgaris, H. magnipapillata | 36,780 | | RNA-seq | OIST-MGU
 | | 57,611 | | RNA-seq | ENA
 | | | | Genome predicted | NCBI, JGI
 | C. hemisphaerica | 26,313 | | Single-pass ESTs | NCBI, Compagen
Poriferans | A. queenslandica | 30,060 | 30,060 | Genome-predicted proteome | JGI
Non-metazoans | A. thaliana | 27,416 | 75,642 | Genome-predicted proteome | TAIR
 | S. cerevisiae | 6,643 | | Reference proteome set | UniProtKB
 | D. discoideum | 12,318 | | Proteome | dictyBase
 | C. owczarzaki | 8,374 | | Complete proteome set | UniProtKB
 | M. brevicollis | 9,188 | | Reference proteome set | UniProtKB
 | S. rosetta | 11,703 | | Complete proteome set | UniProtKB
Total | | | 650,383 | |

Note.—ENA, European Nucleotide Archive; JGI, Joint Genome Institute; NCBI, National Center for Biotechnology Information; OIST-MGU, Okinawa Institute of Science and Technology—Marine Genomics Unit. For references, see Results.


Analysis of RBHs

The human, Capitella, Drosophila, and Hydra proteomes were used as input for BlastP+ using a maximum e value threshold of 10⁻¹⁰, with soft masking as suggested by Moreno-Hagelsieb and Latimer (2008). Relations retained as RBHs fulfilled two criteria (fig. 1): 1) best score between a given query and the different hits (red arrow) and 2) best score between a given hit and the different queries (blue arrow). Query/hit pairs satisfying only one of the two criteria above were assigned an alignment bit score of 10, whereas queries with no Blast hit were assigned an alignment bit score of 1.
Fig. 1

RBH computing. The RBH process takes place after a reasonably complete proteome (here human) is aligned unidirectionally to another whole proteome (here Hydra). (A) After BlastP+ (e value 10⁻¹⁰), relations between the human and Hydra protein sets are established, represented by a series of basal hits between either a given human protein and several Hydra proteins (black arrows) or inferred between a given Hydra protein and several human proteins (gray arrows). Each of these relationships receives a Blast score (numbers next to the arrows) that is valid for both the query–hit and the hit–query relationships. (B) Relations that are retained as RBHs fulfill two criteria: 1) best score between a given query and the different hits (red arrow) and 2) best score between a given hit and the different queries (blue arrow). (C) In the case where two or more query/hit relationships with a shared query or hit qualify as RBH, one pair is selected randomly. This scenario typically takes place when nearly identical paralog sequences are present in the query or target proteomes.
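The two RBH criteria can be illustrated with a minimal Python sketch. It assumes the Blast output has already been filtered by e value and reduced to a dictionary of bit scores; the function name, the input layout, and the protein identifiers are hypothetical. Unlike the procedure described in the legend, ties are broken deterministically (first pair in sorted order) rather than randomly.

```python
def reciprocal_best_hits(scores):
    """Return the set of (query, hit) pairs that are reciprocal best hits.

    scores: dict mapping (query_id, hit_id) -> bit score from a single
    unidirectional alignment run (hypothetical input layout).
    """
    best_for_query = {}  # query -> hit with the highest score for that query
    best_for_hit = {}    # hit -> query with the highest score for that hit
    # Sorted iteration plus strict ">" keeps the first pair in case of a tie.
    for (q, h), s in sorted(scores.items()):
        if q not in best_for_query or s > scores[(q, best_for_query[q])]:
            best_for_query[q] = h
        if h not in best_for_hit or s > scores[(best_for_hit[h], h)]:
            best_for_hit[h] = q
    # A pair is retained only when both criteria point at each other.
    return {(q, h) for q, h in best_for_query.items() if best_for_hit.get(h) == q}


# Toy example with made-up identifiers and scores.
toy_scores = {
    ("hsA", "hyX"): 200, ("hsA", "hyY"): 150,
    ("hsB", "hyX"): 180, ("hsB", "hyY"): 90,
    ("hsC", "hyZ"): 50,
}
rbh_pairs = reciprocal_best_hits(toy_scores)
```

In the toy data, hsB has hyX as its best hit, but hyX scores higher with hsA, so the hsB/hyX pair satisfies only one criterion and is not retained.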


Analysis of Gene Ontology Enrichments

The human UniProt accessions were used as input to the Gene Ontology (GO) software GO::TermFinder (Boyle et al. 2004). The gene ontology file (format 1.2, data version v1.1.2455) and the gene association file (UniprotKB-GOA v1.220) were downloaded from the Gene Ontology Consortium website (www.geneontology.org, last accessed October 7, 2013). The background used for the identification of protein-enriched BPs is the full human reference proteome. For emergences inferred in early-branching eumetazoans, the background comprises human Swiss-Prot proteins with an RBH in any of the nonbilaterian species. GO::TermFinder provides P values calculated using the hypergeometric distribution. In addition, the program corrects for multiple hypothesis testing by multiplying raw P values by the total number of nodes to which the provided list of genes is annotated (only nodes containing two or more annotations in the background are counted). These corrected P values were used in the analyses performed here, and only protein enrichments with a corrected P value ≤10⁻³ were considered.
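As an illustration of the enrichment test, the following Python sketch computes a one-sided hypergeometric P value and applies a Bonferroni-style correction. It is not the GO::TermFinder code: the function names are hypothetical, and the correction is assumed here to multiply the raw P value by the number of tested nodes, capped at 1.

```python
from math import comb


def hypergeom_pvalue(N, K, n, k):
    """Probability of drawing at least k annotated proteins.

    N: proteins in the background; K: background proteins annotated to the BP;
    n: proteins in the tested list; k: tested proteins annotated to the BP.
    """
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


def bonferroni(p_raw, n_nodes):
    """Assumed correction: raw P value times the number of tested nodes, capped at 1."""
    return min(1.0, p_raw * n_nodes)


def is_enriched(p_corrected, threshold=1e-3):
    """Apply the corrected P value cutoff stated in the text (<= 10^-3)."""
    return p_corrected <= threshold


# Small check case: all 5 drawn proteins fall in a 5-protein BP out of 10.
p_small = hypergeom_pvalue(N=10, K=5, n=5, k=5)  # equals 1 / C(10, 5)
```

Exact integer binomials keep the sketch free of external dependencies; for proteome-sized backgrounds a statistics library would be the more practical choice.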

Results

The RBH Strategy to Trace the Emergence of Genetic Novelties

To monitor the origin and evolution of eumetazoan genes, we relied on orthology prediction based on the RBH approach (Overbeek et al. 1999; Moreno-Hagelsieb and Latimer 2008). RBH is a fast and efficient method geared toward detecting the closest 1:1 orthologs (fig. 1). However, being designed to detect orthology between two species at a time, it may also identify in one of the tested species a close paralog rather than the genuine ortholog, when the data set is incomplete or when the ortholog was lost in one of the two tested lineages (see Discussion). To characterize the number of shared proteins with plants, fungi, choanoflagellates, and metazoans, we selected four species with rather complete proteomes, human, Capitella, Drosophila, and Hydra, and independently aligned their respective sequences to the protein sequences of 23 species. The number of shared proteins (RBH orthologs) between two species defines the size of the orthologome. The computation of orthologome sizes yielded similar results when compared with the inParanoid software on independent data sets or with the data sets provided by inParanoid (Ostlund et al. 2010) (supplementary fig. S1, Supplementary Material online).

We also performed a comparative analysis of the phylostratigraphic and RBH approaches. For this, we analyzed by both methods the timing of emergences of founder domains and human orthologs of 900 human gatekeeper cancer genes as reported by Domazet-Loso and Tautz (2010). We performed a BLASTp analysis of these 900 proteins on the species data set used in this study and found results roughly similar to those obtained by Domazet-Loso and Tautz on the NCBI nr data set, with a majority of protein domains already present in preopisthokonts (supplementary fig. S2, Supplementary Material online, compare green bars with blue bars). However, the emergence of founders identified by BLASTp differs from the emergence of orthologs identified by RBH (supplementary fig. S2, Supplementary Material online, red bars): the latter appear more recent, with four major periods of emergences (>100 genes), in the LCAs of preopisthokonts, opisthokonts, eumetazoans, and vertebrates, respectively. The different distributions yielded by the “founder domains” and the RBH ortholog detection methods indicate that most protein domains indeed originated in preopisthokonts, whereas fewer than 20% of the gatekeeper genes can be identified as orthologs in this period. This result suggests that the founder domains were secondarily recruited as gatekeeper genes. These later genetic rearrangements led to the appearance of proteins that have then been under sufficient selective pressure to be characterized as orthologs in the crown organisms considered here.

Comparative Analysis of Orthologomes in Opisthokonts

To measure the variations in orthologome sizes between different phyla, we first tested the human, Drosophila, Capitella, and Hydra proteomes on noneumetazoan species and recorded similar orthologome sizes: 3,000–3,200 with Arabidopsis and Dictyostelium, 2,000 with S. cerevisiae, 3,500–4,000 with Capsaspora and the choanoflagellates Monosiga and Salpingoeca, and up to 5,200–5,500 with Amphimedon (fig. 2). The Saccharomyces orthologomes do not reflect the protein equipment of the fungal LCA, because the S. cerevisiae genome underwent a drastic reduction when compared with other fungi (Cliften et al. 2006). Indeed, we detected larger orthologomes for four other fungi, ranging from 2,410 to 3,315 (supplementary fig. S3, Supplementary Material online). Similarly, Drosophila orthologomes are consistently smaller (yellow bars), indicating that this species also underwent significant gene losses. Indeed, none of the Drosophila-cnidarian orthologomes reaches 5,000, whereas the human, Capitella, and Hydra orthologomes tested on cnidarian proteomes exhibit significantly larger sizes (6,696, 7,191, and 7,138, respectively, with Nematostella).
Fig. 2

Evolution of the respective sizes of the human, Drosophila, Capitella, and Hydra orthologomes. Sequences of the Hydra, Drosophila, Capitella, and human proteomes were used to size independently orthologomes on representative eukaryotes. Timings of radiations were taken from Battacharya et al. (2009) for holozoans; from Peterson et al. (2008) for metazoans, eumetazoans, bilaterians, deuterostomes, and vertebrates; from Steiper and Young (2009) for primates. We arbitrarily placed Chordata origin at midtime between Deuterostomia and Vertebrata origins in agreement with Ayala et al. (1998). Each bar represents the number of RBHs obtained between human (black), Capitella (red), Drosophila (yellow), and Hydra (gray) proteomes and the indicated species. Size of the orthologomes is given for human and Hydra. Note the impact of proteome completeness with the two Saccoglossus data sets.

When tested on bilaterian invertebrates, human, Capitella, and Hydra share the largest orthologomes with the hemichordate Saccoglossus (7,830, 8,254, and 6,631, respectively), the cephalochordate Branchiostoma (7,508, 7,640, and 5,976, respectively), but also the polychaete Capitella (7,444 for human, 6,361 for Hydra, fig. 2). As previously noted, the Drosophila orthologomes are significantly smaller, reaching 5,950 with Capitella, but never exceed 6,000 except with the closely related beetle Tribolium (7,158). When tested on nematodes, the orthologome sizes drop even more drastically, Capitella orthologomes showing the highest numbers with 4,693 on C. elegans and 3,701 on T. spiralis. In fact, all ecdysozoan proteomes tested here exhibit smaller orthologomes than the Capitella, deuterostome, or cnidarian proteomes used here, suggesting that ecdysozoan LCAs lost a significant number of metazoan gene families and/or were subjected to faster sequence evolution.
As expected, human orthologomes become much larger when tested on vertebrate proteomes (∼12,000 for nonprimates, 16,930 for Macaca), reflecting their closer evolutionary relationships. The complete human RBH orthologomes are detailed in supplementary table S1, Supplementary Material online. In conclusion, the RBH approach provides a fast, efficient, and stringent, although not exhaustive, methodology to identify pools of orthologs between species when extensive proteomes are available. The concurrent increase in the sizes of the human, Capitella, and Drosophila orthologomes when tested on sponge and cnidarian proteomes indicates that both the metazoan-LCAs and the eumetazoan-LCAs acquired a significant number of novel genes.

Emergence of Human Orthologs in Metazoan, Eumetazoan, and Bilaterian LCAs

To analyze the origins of the human protein complement, we first extracted the core metazoan orthologome, which comprises orthologs shared between humans and at least one cnidarian and one noneumetazoan species (fig. 3, Group I). This core metazoan orthologome contains 6,701 proteins that account for 33.1% of the 20,231 human proteins used in this study; 4,043 proteins (60%) are affiliated to huBPs containing the word “metabolic” and are thus presumably involved in metabolic functions (ribosome biogenesis, transcription, translation, cell cycle regulation). We then inferred two complementary groups that originated prior to bilaterians. Group II contains 1,087 human orthologs (5.4%) detected in noneumetazoan species but no longer found in cnidarians, thus originating before eumetazoans but lost or highly divergent in cnidarians. Group III contains 2,422 human proteins (12%) that emerged with eumetazoan LCAs as evidenced by their presence in at least one cnidarian species but their absence in noneumetazoan species (figs. 3A and 3B). Thus, Group III represents potential eumetazoan novelties. Finally, 10,021 human proteins (49.5%, Group IV) could not be affiliated to orthologs in nonbilaterian proteomes, indicating that they most likely emerged after Cnidaria divergence. Hence, by analyzing the orthologous relationships of each human protein, we could deduce the period when most of them emerged, premetazoan for 38.5%, protoeumetazoan for 12%, and protobilaterian or bilaterian for 49.5%.
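The four groups defined above amount to a simple presence/absence rule, sketched below in Python. The function name and the input layout (the set of species in which a human protein has an RBH) are hypothetical; the species lists follow the seven noneumetazoan and four cnidarian proteomes named in Materials and Methods.

```python
NONEUMETAZOANS = {
    "A. thaliana", "D. discoideum", "S. cerevisiae", "C. owczarzaki",
    "M. brevicollis", "S. rosetta", "A. queenslandica",
}
CNIDARIANS = {"N. vectensis", "A. digitifera", "Hydra", "C. hemisphaerica"}


def classify_group(species_with_rbh):
    """Assign a human protein to Group I-IV from the species holding an RBH."""
    in_noneumetazoan = bool(species_with_rbh & NONEUMETAZOANS)
    in_cnidarian = bool(species_with_rbh & CNIDARIANS)
    if in_noneumetazoan and in_cnidarian:
        return "I"    # core metazoan orthologome
    if in_noneumetazoan:
        return "II"   # premetazoan origin, not detected in cnidarians
    if in_cnidarian:
        return "III"  # potential eumetazoan novelty
    return "IV"       # no ortholog detected in nonbilaterian proteomes


def percent(count, total=20231):
    """Share of the 20,231 human proteins, rounded to one decimal."""
    return round(100 * count / total, 1)
```

Applied to the counts reported in the text, `percent` gives 33.1 for Group I (6,701), 5.4 for Group II (1,087), 12.0 for Group III (2,422), and 49.5 for Group IV (10,021).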
Fig. 3

Expansion of human orthologs in the LCAs of metazoans, eumetazoans, and bilaterians. (A) Plot showing the RBH scores obtained by 20,231 human proteins tested on seven noneumetazoan proteomes (x axis, Groups I and II) and on four cnidarian proteomes (y axis, Group III). Among these, 7,789 were present in the LCAs of metazoans (I, II), 2,422 (12%) originated in the LCAs of eumetazoans (Group III), and 10,020 (49.5%) represent postcnidarian novelties (Group IV). Note the distribution of proteins involved in human mesoderm development (blue) and ribosome biogenesis (red). (B) Scheme recapitulating the prebilaterian evolutionary events of human proteins: Emergences (star), losses (empty square), and family expansions (triangle). For species abbreviations, see figure 2.


Gene Expansions and Gene Losses across Metazoans

Next, we focused our interest on the innovations that took place in metazoan, eumetazoan, deuterostome, chordate, vertebrate, and primate LCAs. To characterize gains and losses of proteins over each evolutionary period, we mapped the 20,231 human proteins to the proteomes of 21 holozoan species, as shown in figure 2, and inferred that protein gain had taken place in the LCA of a given clade when i) species derived from this LCA possess a human ortholog and ii) no occurrence is observed in species branching from more ancient ancestors (figs. 4A and 4B). As S. cerevisiae underwent severe genome reduction, a complementary analysis was performed on 25 holozoan species that include four additional fungal species. This analysis yields highly similar results (supplementary fig. S3, Supplementary Material online). We also inferred protein loss within a given clade when orthologs to human proteins were not found in species of this clade but were present in sister groups or in phyla having diverged earlier (fig. 4B). In this study, losses that affect branches or ancestors with human descendants cannot be traced.
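Criteria i) and ii) can be read as "the gain is assigned to the earliest-branching lineage in which an ortholog is detected". The Python sketch below illustrates that reading only; the tier labels, the grouping of the three nonprimate vertebrates into one tier, and the function name are simplifying assumptions, and loss inference is not covered.

```python
# Outgroup tiers ordered from the earliest-branching to the latest-branching
# lineage relative to human (simplified, hypothetical grouping).
TIERS = [
    ("premetazoan", {"A. thaliana", "D. discoideum", "S. cerevisiae",
                     "C. owczarzaki", "M. brevicollis", "S. rosetta"}),
    ("metazoan LCA", {"A. queenslandica"}),
    ("eumetazoan LCA", {"N. vectensis", "A. digitifera", "Hydra",
                        "C. hemisphaerica"}),
    ("bilaterian LCA", {"D. melanogaster", "T. castaneum", "C. elegans",
                        "T. spiralis", "C. teleta"}),
    ("deuterostome LCA", {"S. kowalevskii"}),
    ("chordate LCA", {"B. floridae", "C. intestinalis"}),
    ("vertebrate LCAs", {"D. rerio", "X. tropicalis", "G. gallus"}),
    ("primate LCA", {"M. mulatta"}),
]


def date_gain(species_with_rbh):
    """Return the step at which a human protein is inferred to have emerged."""
    for label, species in TIERS:
        # The first tier holding an ortholog dates the gain, because no
        # occurrence exists in lineages that branched off earlier.
        if species & species_with_rbh:
            return label
    # No ortholog in any tested species: emergence after Macaca divergence.
    return "hominidae LCA"
```

For example, a protein with RBHs in N. vectensis, D. rerio, and M. mulatta would be dated to the eumetazoan LCA, since its absence from later-branching lineages is treated as loss or divergence rather than as evidence of a later origin.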
Fig. 4

Timing of emergences of human orthologs and related Biological Processes (huBPs) in metazoan evolution. (A) Parallel bursts in human orthologs’ gains (green bars) and emergence of huBPs (gray bars, corrected P value ≤10⁻³). (B) Gains (green bars) and losses (blue bars) in human orthologs obtained by testing the complete human proteome against the proteomes of species belonging to phyla branching at various steps of metazoan evolution. (C) Rates of emergence of human orthologs across metazoan evolution expressed as numbers of novel ortholog proteins (y axis) detected per million years (Myr). Rates were deduced from the protein gains shown in A and B over the time periods separating the LCAs of two clades as indicated by inverted arrows at the bottom. References for each time period are given in the legend of figure 2.

This approach confirmed important gains (>1,000 novel proteins) in the LCAs of metazoans (1,119), eumetazoans (2,422), bilaterians (1,054), euteleostomes (2,347), amniotes (1,119), primates (2,446), and hominidae (2,206) (fig. 4A). By contrast, in the hemichordate, cephalochordate, and urochordate species used to infer novelties in the LCAs of deuterostomes and chordates, we detected important protein losses (>3,000) and limited genetic gains (fig. 4B). Similar patterns were observed for developmental proteins, except for hominidae, which show a limited gain in such proteins (data not shown).

Unequal Rates of Protein Repertoire Diversifications across Evolution

To verify the nonlinear pattern of emergence of genetic novelties across metazoans, we evaluated the rates of human ortholog gains per million years (Myr) and indeed found highly variable rates of genetic change (fig. 4C). We measured the highest rate of innovation during the hominidae transition after Macaca divergence (88 novel proteins/Myr [np/Myr]); we then observed high rates (>20 np/Myr) in the LCAs of eumetazoans, bilaterians, and euteleostomes. By contrast, we recorded low rates (<12 np/Myr) at five distinct periods, in the LCAs of metazoans, deuterostomes, chordates, amniotes, and primates. The large numbers of novel proteins detected in Xenopus, Gallus (1,119), and Macaca (2,446) emerged over long periods of time (∼450 Myr), causing the acquisition rate to be low (fig. 4C). Given the uncertainty in the dating of some periods, for example the chordate speciation (Ayala et al. 1998; Peterson et al. 2008), the absolute values of these rates should be considered with caution. However, the contrast between the various periods is striking, particularly for the protoeumetazoan and the protoeuteleostome periods, when a high number of novel orthologs (>2,300) emerged at a high rate (>20 np/Myr) and associate with high numbers of huBPs (>180).
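The rate used here is simply the number of novel orthologs divided by the length of the interval separating two LCAs. A minimal Python sketch of that arithmetic follows; the function name is hypothetical, and the 25 Myr interval in the example is an illustrative value chosen to land near the reported 88 np/Myr, not a dating taken from this study.

```python
def acquisition_rate(novel_proteins: int, interval_myr: float) -> float:
    """Return the number of novel proteins gained per million years (np/Myr)."""
    if interval_myr <= 0:
        raise ValueError("interval_myr must be positive")
    return novel_proteins / interval_myr


# 2,206 is the hominidae gain reported in the text.
# 25.0 Myr is an ASSUMED interval, used only to illustrate the calculation.
hominidae_rate = acquisition_rate(2206, 25.0)
print(f"hominidae: {hominidae_rate:.1f} np/Myr")
```

Because the rate scales inversely with the interval length, any uncertainty in the dating of a period propagates directly into the rate.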

Sequential Emergence of Innovations in Metazoans Predicted from Protein Enrichment

To predict innovations linked to the emergence of novel human orthologs, we compared the quantitative representation of proteins associated with huBPs gained in each lineage (observed frequency) to their quantitative representation in the human proteome (expected frequency). We then extracted groups that were significantly enriched for huBPs (fig. 4A and supplementary table S2, Supplementary Material online). Overall, we recorded a significant correlation (R² = 82%, P < 0.001) between the number of novel human orthologs and the number of BPs that show protein enrichment (protein-enriched BPs) at the various evolutionary steps investigated here, but this correlation does not hold for the most recent expansions within vertebrates. To assess the potential bias introduced by proteins involved in multiple but closely related BPs, we identified BPs that share a large number of proteins (>90%, supplementary fig. S4, Supplementary Material online) and found that protein redundancy between BPs indeed affects the numbers of protein-enriched BP novelties but does not alter the historical profiles of gene gains and the associated conclusions, except for the hominidae category, where the reduction is important (supplementary fig. S4, Supplementary Material online).
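The comparison of observed and expected frequencies can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in this study: the function names are hypothetical, the one-sided hypergeometric test is assumed because it is a common choice for this kind of enrichment, and the multiple-testing correction behind the corrected P values is not shown.

```python
from math import comb


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """Observed frequency (k of n novel proteins annotated to a BP)
    divided by expected frequency (K of N background proteins annotated)."""
    observed = k / n
    expected = K / N
    return observed / expected


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) when n proteins are drawn without replacement
    from a background of N proteins, K of which carry the annotation."""
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy numbers, not data from the study: 20 of 100 novel proteins carry an
# annotation found on 100 of 1,000 background proteins.
print(fold_enrichment(20, 100, 100, 1000))
print(hypergeom_pvalue(20, 100, 100, 1000))
```

A BP would then be retained when its fold enrichment and corrected P value pass the chosen thresholds.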

Sixty Predicted Innovations in Metazoan-LCAs Point to Embryonic Development

The 6,670 human orthologs detected in at least one nonmetazoan species distribute into 530 protein-enriched BPs (fig. 4A), which, similarly to the core metazoan orthologome (Group I, fig. 3A), associate with huBPs predominantly related to metabolic processes (75%, table 2, supplementary table S2, Supplementary Material online). By contrast, the 1,119 novel orthologs identified in the Amphimedon proteome associate with 60 protein-enriched BPs (fig. 4A), mostly related to embryonic development (fig. 5A, table 2 and supplementary table S2, Supplementary Material online). This rather low number of novel proteins and associated BPs in porifers is in agreement with the notion that transitions from unicellularity to multicellularity might have required a limited number of genetic innovations (Grosberg and Strathmann 2007; Ratcliff et al. 2012). However, the data set from porifers is still limited; therefore, some of the protein gains currently mapped to the eumetazoan transition might be assigned an earlier origin once genomic information is extended to more porifer species.
Table 2

List of the 10 Most Significantly Protein-Enriched BPs Detected at Nine Evolutionary Periods

BP Number | BP Name | Corr. P Value | Enrichment

Nonmetazoans (A. thaliana, S. cerevisiae, D. discoideum, C. owczarzaki, M. brevicollis, S. rosetta)
GO:0044248 | Cellular catabolic process | 1.4E-140 | 2.1
GO:0016070 | RNA metabolic process | 2.0E-134 | 2
GO:0006396 | RNA processing | 1.1E-113 | 2.4
GO:0046907 | Intracellular transport | 1.5E-105 | 2.1
GO:0016071 | mRNA metabolic process | 5.1E-105 | 2.4
GO:0009057 | Macromolecule catabolic process | 8.2E-86 | 2.2
GO:0044265 | Cellular macromolecule catabolic process | 3.1E-81 | 2.2
GO:0042180 | Cellular ketone metabolic process | 1.5E-79 | 2.1
GO:0006082 | Organic acid metabolic process | 1.0E-76 | 2.1
GO:0019752 | Carboxylic acid metabolic process | 4.0E-76 | 2.1

Porifer LCA (A. queenslandica)
GO:0009790 | Embryo development | 2.6E-07 | 2.1
GO:0009887 | Organ morphogenesis | 3.2E-06 | 2.1
GO:0009792 | Embryo development ending in birth or egg hatching | 8.1E-06 | 2.4

Cnidarian LCA (N. vectensis, A. digitifera, H. vulgaris, C. hemisphaerica)
GO:0009653 | Anatomical structure morphogenesis | 7.3E-40 | 2
GO:0007399 | Nervous system development | 5.5E-37 | 2
GO:0007417 | Central nervous system development | 3.4E-29 | 2.6
GO:0048699 | Generation of neurons | 8.5E-26 | 2.1
GO:0022008 | Neurogenesis | 9.5E-26 | 2.1
GO:0009887 | Organ morphogenesis | 2.6E-25 | 2.4
GO:0007420 | Brain development | 7.2E-25 | 2.8
GO:0030182 | Neuron differentiation | 3.7E-24 | 2.3
GO:0010628 | Positive regulation of gene expression | 1.1E-23 | 2.1
GO:0045893 | Positive regulation of transcription, DNA-dependent | 2.3E-23 | 2.1

Bilaterian LCA (T. spiralis, C. elegans, D. melanogaster, T. castaneum, C. teleta)
GO:0007399 | Nervous system development | 1.2E-14 | 2
GO:0010628 | Positive regulation of gene expression | 1.3E-14 | 2.4
GO:0031327 | Negative regulation of cellular biosynthetic process | 1.4E-14 | 2.4
GO:0045944 | Positive regulation of transcription from RNA polymerase II promoter | 2.0E-14 | 2.8
GO:0006357 | Regulation of transcription from RNA polymerase II promoter | 2.5E-14 | 2.3
GO:0045893 | Positive regulation of transcription, DNA-dependent | 4.0E-14 | 2.4
GO:0009890 | Negative regulation of biosynthetic process | 4.7E-14 | 2.4
GO:0051254 | Positive regulation of RNA metabolic process | 2.0E-12 | 2.3
GO:0009892 | Negative regulation of metabolic process | 3.4E-12 | 2.1
GO:0010557 | Positive regulation of macromolecule biosynthetic process | 3.6E-12 | 2.2

Deuterostome LCA (S. kowalevskii)
None | - | - | -

Chordate LCA (B. floridae, C. intestinalis)
GO:0060537 | Muscle tissue development | 9.6E-06 | 4.9

Nonprimate vertebrates (D. rerio, X. tropicalis, G. gallus)
GO:0007155 | Cell adhesion | 5.0E-38 | 2.3
GO:0022610 | Biological adhesion | 5.0E-38 | 2.3
GO:0007186 | G-protein coupled receptor signaling pathway | 1.3E-35 | 2.5
GO:0016337 | Cell–cell adhesion | 4.9E-18 | 2.5
GO:0032101 | Regulation of response to external stimulus | 1.0E-15 | 2.4
GO:0051050 | Positive regulation of transport | 1.8E-15 | 2.2
GO:0050730 | Regulation of peptidyl-tyrosine phosphorylation | 9.9E-15 | 3.3
GO:0006873 | Cellular ion homeostasis | 1.7E-14 | 2
GO:0006954 | Inflammatory response | 2.2E-14 | 2.3
GO:0050731 | Positive regulation of peptidyl-tyrosine phosphorylation | 3.4E-14 | 3.5

Nonhominidae primates (M. mulatta)
GO:0042742 | Defense response to bacterium | 1.6E-17 | 3.8
GO:0051707 | Response to other organism | 1.6E-14 | 2.2
GO:0009607 | Response to biotic stimulus | 1.6E-13 | 2.1
GO:0009617 | Response to bacterium | 5.2E-09 | 2.2
GO:0050909 | Sensory perception of taste | 4.9E-08 | 4.4
GO:0007606 | Sensory perception of chemical stimulus | 6.3E-08 | 3

Hominidae (H. sapiens)
GO:0006958 | Complement activation, classical pathway | 1.8E-78 | 8.5
GO:0002455 | Humoral immune response mediated by circulating Ig | 4.2E-77 | 8.4
GO:0006956 | Complement activation | 5.0E-75 | 8.1
GO:0072376 | Protein activation cascade | 1.6E-67 | 7.2
GO:0016064 | Immunoglobulin mediated immune response | 5.8E-67 | 7.2
GO:0019724 | B cell mediated immunity | 2.7E-66 | 7.2
GO:0002449 | Lymphocyte mediated immunity | 2.0E-62 | 6.7
GO:0006959 | Humoral immune response | 1.1E-60 | 6.5
GO:0002460 | Adaptive immune response based on somatic recombination of immune receptors built from immunoglobulin superfamily domains | 5.1E-59 | 6.3
GO:0002443 | Leukocyte mediated immunity | 3.1E-56 | 5.9

Note.—All processes listed here are enriched at least 2-fold with P values lower than 10−5. For the complete list of protein-enriched BPs, see supplementary table S2, Supplementary Material online.


Characterization of the ortholog-deduced Biological Processes (huBPs) emerged in the LCAs of metazoans (A), bilaterians (B), chordates (C), vertebrates (D), and primates (E). BPs showing protein enrichment ≥2 times (horizontal scale) are depicted by a circle whose surface is proportional to the number of proteins. The color code indicates two levels of statistical significance (see inset). Note the significantly enriched huBPs in LCAs of each period (see table 2): Embryonic development in protometazoans; neurogenesis, organ morphogenesis and regulation of transcription in protoeumetazoans; nervous system development and regulation of biosynthetic process in protobilaterians; muscle tissue development in protochordates; cell adhesion, response to external stimulus, G-protein coupled receptor signaling pathway and inflammatory response in vertebrates; sensory perception and defense response to bacterium in primates; complement activation, humoral immune response, and leukocyte-mediated immunity in hominidae. For the full list of protein-enriched BPs, see supplementary table S2, Supplementary Material online.


The 242 Predicted Innovations in Eumetazoan-LCAs Point to Cell–Cell Signaling, Morphogenesis, and Neurogenesis

The 2,422 novel eumetazoan proteins identified in cnidarians (fig. 3) associate with 242 protein-enriched huBPs; this is the largest number observed throughout the metazoan evolutionary steps selected here (figs. 4A and 4B). To test the robustness of these protein-enriched huBPs, we measured the enrichment of cnidarian proteins either over the human background (as in every other condition) or over the nonbilaterian background. The two methods yielded very similar results on strongly significant BPs (supplementary fig. S5, Supplementary Material online). However, cell–cell signaling had a lower significance when tested on the human background than on the nonbilaterian background. A major difference exists between these two backgrounds, namely a second wave of vertebrate-specific expansion of protein families involved in signaling, such as cytokines involved in immune response (fig. 5D), which “dilutes” the original enrichment signal. Besides cell–cell signaling, novel BPs in eumetazoan-LCAs include processes linked to epithelium tube morphogenesis, pattern specification, organ morphogenesis, sensory organ development, regulation of ossification, cell-fate commitment, neurogenesis, and eye development (fig. 6A). At the molecular level, the diversification of the Wnt and BMP signaling pathways and the presence of 183 novel transcription factors appear as robust eumetazoan innovations (supplementary table S3, Supplementary Material online), in agreement with previous reports (Kusserow et al. 2005; Saina et al. 2009; Galliot and Quiquand 2011).

Characterization of the huBPs associated with major protein gains in cnidarians. (A) Enrichments in proteins for a given huBP were identified in cnidarians (2,422 proteins in Group III) over the 10,211 protobilaterian proteins (Groups I + II + III) as background. The huBPs showing protein enrichment ≥2 times with corrected P values ≤10⁻⁵ are shown for cnidarians (red) and noneumetazoans (blue, green). The numbers after each huBP indicate the number of proteins in Groups I, II, and III, respectively. The huBPs with protein enrichment ≥2.5x are written in bold. For details, see supplementary table S3, Supplementary Material online. (B) Similar gains of novel human orthologs associated with selected huBPs in anthozoan (Acropora, Nematostella) and medusozoan (Hydra) cnidarian proteomes. These three cnidarian species exhibit widely different lifestyles and morphologies. The scale represents the number of proteins identified in the proteome of each cnidarian species for each indicated huBP.


The 95 Predicted Innovations in Bilaterian-LCAs Relate to Organogenesis, Skeletal Development, and Nervous System Development

The emergence of bilaterians correlates with 1,054 bilaterian-specific proteins present in human and at least one protostome proteome but absent from nonbilaterian proteomes. The analysis of these proteins points to 95 protein-enriched BPs (fig. 4A). As anticipated, those scoring highest are regarded as bilaterian-specific and relate to nervous system development, embryonic organ morphogenesis, and embryonic skeletal system development (fig. 5B and supplementary table S2, Supplementary Material online). Molecular innovations in bilaterians also include regulation of biosynthetic processes, regulation of transcription, and novel nuclear receptors, in particular steroid hormone receptors, as previously reported (Bridgham et al. 2010; Lowe et al. 2011).

Few Protein-Predicted Innovations in Deuterostome-LCAs and Chordate-LCAs

To study the genetic modifications in deuterostome LCAs, we used the S. kowalevskii proteome, which despite significant losses (fig. 4B) represents the nonchordate deuterostomes well (Pani et al. 2012). The number of novel human RBH orthologs in Saccoglossus is low (361 proteins, fig. 4B) and does not exhibit any protein-enriched huBPs (corrected P values ≤10⁻³), although at a lower level of significance, some proteins associate with glycolipid metabolism (supplementary table S2, Supplementary Material online). Similarly, the proteomes of the cephalochordate B. floridae and the urochordate C. intestinalis contain a rather low number of human orthologs absent from nonchordate proteomes (440, fig. 4B). These proteomes show a massive loss or divergence of human orthologs, including developmental proteins (fig. 4B and not shown). The gain of 440 novel proteins is associated with 10 huBP novelties restricted to striated muscle development (figs. 4A and 5C). Hence, at these two periods, the emergence of deuterostomes and of chordates, novel huBPs inferred from protein enrichment appear very limited (supplementary table S2, Supplementary Material online), suggesting that innovations in deuterostome and chordate ancestors relied instead on mechanisms distinct from gene repertoire expansion. However, given the massive loss (or divergence) of proteins noted in these three species, this conclusion should be confirmed by testing the proteomes of additional species to definitively distinguish phylum-specific from lineage-specific events (see Discussion).

The 222 Predicted Innovations in Nonprimate Vertebrates Point to Signaling, Cell Adhesion, Wound Healing, and Coagulation

By contrast, nonprimate vertebrate proteomes, represented here by D. rerio, X. tropicalis, and G. gallus, contain a large number of novel proteins, 3,466 (17.1%), as deduced from their absence from all invertebrate proteomes. These proteins show a significant enrichment for 222 BPs (fig. 4A): 73 of these BPs (32%) are related to cell communication, signal transduction, and cell surface receptor signaling pathway, represented by 957 proteins, including 270 linked to G-protein coupled receptor activity (fig. 5D). Strongly protein-enriched BPs in vertebrates also point to wound healing, blood coagulation, calcium-independent cell–cell adhesion, and organization of adherens junctions (cadherins, cell adhesion proteins).

Limited Number of Predicted Innovations in the LCAs of Primates and Hominidae

Finally, despite a high number of novel proteins in primates and Hominidae, 2,446 and 2,206 respectively, by definition absent from all nonprimate and nonhominidae proteomes, we found a rather low number of protein-enriched BPs, 22 for primates, and 48 for hominidae (fig. 4A). This association between large numbers of novel proteins and low numbers of huBPs reflects the affiliation of most primate novel proteins to few BPs. Indeed, we found that the huBPs showing a protein enrichment >2 times in primates are all associated with sensory perception, response to other organism, and response to bacteria (fig. 5E and supplementary table S2, Supplementary Material online). Similarly, novel proteins enriched >2 times in humans are all dedicated to immune response (supplementary table S2, Supplementary Material online).

Similar Predicted Innovations in Cnidarian Species with Distinct Phenotypes

Next we analyzed whether the predicted eumetazoan innovations deduced from protein-enriched huBPs correspond to actual phenotypes in cnidarians. We found that the predicted innovations correspond to three distinct types of phenotypes: constrained when observed in all cnidarians and maintained in all bilaterians such as neurogenesis or gut development; labile when observed in some but not all cnidarian species, and frequently expressed in bilaterians such as eye development, mesodermal derivatives, and biomineralization; latent when not observed in cnidarians but widely conserved in bilaterians, for example, proteins directing central nervous system, skeletal, or endocrine development (fig. 8). Hence, protein-based predicted innovations in cnidarians are actually expressed with high variability.
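The three-way classification can be expressed as a simple decision rule. The sketch below is only an illustration of the definitions given above; the function name, its arguments, and the "unclassified" fallback are hypothetical, and real traits are rarely scored as cleanly as a species count suggests.

```python
def classify_innovation(cnidarians_expressing: int,
                        cnidarians_total: int,
                        in_bilaterians: bool) -> str:
    """Classify a predicted eumetazoan innovation from where the
    corresponding phenotype is actually observed."""
    if cnidarians_total <= 0:
        raise ValueError("cnidarians_total must be positive")
    if not 0 <= cnidarians_expressing <= cnidarians_total:
        raise ValueError("cnidarians_expressing out of range")
    if not in_bilaterians:
        # The text defines the three classes only for traits seen in bilaterians.
        return "unclassified"
    if cnidarians_expressing == cnidarians_total:
        return "constrained"   # observed in all cnidarians examined
    if cnidarians_expressing > 0:
        return "labile"        # observed in some but not all cnidarians
    return "latent"            # not observed in cnidarians


# Toy scoring over three cnidarian species (illustrative counts only).
print(classify_innovation(3, 3, True))  # e.g., neurogenesis
print(classify_innovation(1, 3, True))  # e.g., eye development
print(classify_innovation(0, 3, True))  # e.g., endocrine development
```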

Model of a regulatory-based parallel mechanism for the emergence of innovations as deduced from the comparison of predicted and observed phenotypes in cnidarians. (A) Innovations in cnidarians, i.e., absent in nonmetazoans or in earlier branching metazoans as porifers, were sorted in four categories of phenotypes: Constrained (dark green) when present in all cnidarians and maintained in bilaterians; labile (light purple) when expressed in some but not all cnidarian species, and largely expressed in bilaterians including vertebrates; latent (light blue) when observed in bilaterians but not in cnidarians; cnidarian-specific (orange) when restricted to cnidarians. Some eumetazoan innovations evolved differently in cnidarians and bilaterians as the sensory motoneurons and the myoepithelial cells that remained multifunctional in cnidarians (light green) but differentiated in more specialized cell types in bilaterians (Arendt 2008). (B) Cnidarian proteomes contain similar numbers of human orthologs (dots), labelled here according to their origin as premetazoan (green), metazoan (blue), eumetazoan (red), or taxon-restricted (yellow). These proteins can participate in genetic modules (GM) that can give rise to constrained phenotypes (gray backgrounds) when regulations between the different proteins are tightly linked, with a limited potential for innovation (upper panel). When forming GM with loose links between preexisting tight GM (middle panel), these protein networks can give rise to labile phenotypes (purple background), prone to innovation through parallel evolution. When not included in predicted GM, the corresponding phenotypes are latent (lowest panel). However, these proteins likely form also taxon-restricted GM that support taxon-specific phenotype (orange background) as anatomical and life cycle differences (Foret et al. 2010; Wenger and Galliot 2013).

To test whether these eumetazoan-specific novel huBPs correspond to unique genetic sets or rather involve proteins that participate in several phenotypes, we performed an overlap analysis of the protein-enriched BPs that were significant. We found a high variability in the protein overlaps depending on the huBPs combinations considered (fig. 7): few huBP combinations exhibit almost complete overlaps (shown in red), whereas most groups show a limited overlap, in a range from 0% to 50%, illustrating the fact that a subset of proteins may participate in multiple BPs. As a consequence, we assume that proteins related to huBPs that are not expressed in cnidarians yet are nevertheless constrained by their participation in other BPs.
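A pairwise overlap analysis of this kind can be sketched as below. The overlap measure (shared proteins divided by the size of the smaller set) is an assumption made for illustration, as are the function names and the toy protein identifiers; the measure actually used for fig. 7 is not specified in this section.

```python
from itertools import combinations


def overlap_fraction(a: set, b: set) -> float:
    """Shared proteins as a fraction of the smaller of the two sets."""
    smaller = min(len(a), len(b))
    if smaller == 0:
        return 0.0
    return len(a & b) / smaller


def pairwise_overlaps(bp_to_proteins: dict) -> dict:
    """Return the overlap fraction for every unordered pair of BPs."""
    return {
        (x, y): overlap_fraction(bp_to_proteins[x], bp_to_proteins[y])
        for x, y in combinations(sorted(bp_to_proteins), 2)
    }


# Toy protein sets, not data from the study.
toy = {
    "neurogenesis": {"P1", "P2", "P3", "P4"},
    "neuron differentiation": {"P1", "P2", "P3"},
    "eye development": {"P4", "P5"},
}
for pair, fraction in pairwise_overlaps(toy).items():
    print(pair, f"{fraction:.0%}")
```

In this toy case, the nested pair (neurogenesis, neuron differentiation) overlaps completely, whereas the other pairs share half of their proteins or none.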

Versatility of novel eumetazoan proteins: Heatmap showing a limited overlap between the protein contents of the BPs that are enriched in novel cnidarian human orthologs.

We then asked whether the number of novel proteins presumably involved in “labile” traits, that is, traits expressed with high variability among cnidarian species, such as mesodermal features in the absence of a true mesodermal layer, sensory organ differentiation, and biomineralization (see fig. 8), correlates with the expression of these traits. To do so, we compared the number of human orthologs involved in these labile processes in the proteomes of cnidarian species that exhibit different anatomies and different life cycles (fig. 6B). The coral Acropora, the sea anemone Nematostella, and the Hydra polyp exhibit very similar numbers of proteins predicted to be involved in embryonic morphogenesis, cell adhesion, regulation of ossification, skeletal system development, sensory organ development, and eye development. This result indicates that the observed phenotypes are not predominantly dependent on the proteome content but may rather rely on variable genetic regulations.

Discussion

RBHs, a Potent Strategy to Deduce Innovations from the Evolution of Proteomes

Thanks to the RBH method applied here, we retrieved 244,861 ortholog pairs from a diverse set of eukaryotes in a reasonable amount of time. Phylogenetic analyses performed on a limited number of Hydra sequences identified through RBHs indeed confirmed the orthology of these sequences (Wenger and Galliot 2013). Two types of methods are generally used to assign orthology: tree-based methods, which rely on building phylogenetic trees (Page and Holmes 1998; Huerta-Cepas et al. 2007; Hejnol et al. 2009), and graph-based methods, which rely on pairwise comparisons of large data sets (Overbeek et al. 1999; Altenhoff and Dessimoz 2012). Building trees from large data sets is computationally intensive and requires supervision to include meaningful outgroups. Among the graph-based methods, we selected RBH as it is recognized as a sensitive and highly specific method (Chen et al. 2007; Wolf and Koonin 2012). To compare various orthology detection methods, Chen et al. measured the rates of false-positive and false-negative orthologs retrieved by each of them. They show that the RBH method combines a good sensitivity (about 70% of the orthologs are detected) with an excellent specificity, as the number of confirmed orthologs reaches ∼95%. This means that RBH retrieves a very low number of false positives (∼5%) but misses a substantial fraction of orthologs (∼30%). The decision to select a method where false positives are kept as low as possible was critical in our study, motivated by the second step of this analysis, that is, the inference of the emergence of the human BPs (huBPs). As a consequence of ortholog underprediction, the calculated enrichments of huBPs might suffer from a reduced statistical power, but this should reinforce the reliability of the huBPs that are detected as significantly enriched. Chen et al. (2007) also show that InParanoid exhibits a similar specificity but a higher sensitivity than RBH (detecting about 80% of orthologs). 
Here, we also compared the sensitivity of InParanoid and RBH (BlastP+ 2.2.25, e value ≤10⁻¹⁰, and soft masking), and unlike the results presented in the study by Chen et al., we found that the two methods yield extremely similar results (supplementary fig. S1, Supplementary Material online). The RBH method was chosen for its simplicity, high specificity, and low or no supervision requirements while processing large amounts of data efficiently.
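The principle of the RBH procedure can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in this study: the function names and toy identifiers are hypothetical, hits are assumed to be available as (query, subject, e value, bit score) tuples, and ranking hits by bit score is an assumption. Only the e value cutoff of 10⁻¹⁰ comes from the text.

```python
def best_hits(hits, max_evalue=1e-10):
    """Keep, for each query, the highest-scoring subject passing the cutoff.

    hits: iterable of (query, subject, evalue, bitscore) tuples.
    """
    best = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > max_evalue:
            continue  # too weak to be retained
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, max_evalue=1e-10):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b, max_evalue)
    reverse = best_hits(b_vs_a, max_evalue)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy hit tables (hypothetical identifiers and scores).
human_vs_cnidarian = [
    ("h1", "c1", 1e-50, 200.0),
    ("h1", "c2", 1e-20, 90.0),
    ("h2", "c2", 1e-30, 120.0),
    ("h3", "c3", 1e-5, 40.0),   # rejected by the e value cutoff
]
cnidarian_vs_human = [
    ("c1", "h1", 1e-48, 195.0),
    ("c2", "h1", 1e-22, 95.0),  # c2 prefers h1, so (h2, c2) is not reciprocal
    ("c3", "h3", 1e-6, 42.0),
]
print(reciprocal_best_hits(human_vs_cnidarian, cnidarian_vs_human))
```

The toy example shows both sources of underprediction discussed here: a pair lost to the stringent cutoff (h3, c3) and a pair lost because the best hit is not reciprocated (h2, c2).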

Limits of Orthology Detection

However, some potential limitations of this large-scale proteome RBH analysis should be considered. One is the underestimation of true orthologs. Because of the conservative e value of 10⁻¹⁰, a number of genuine orthologs were not retained during the process if they matched a sequence in the target proteome with an e value higher than 10⁻¹⁰. As a consequence, the final number of orthologs retrieved by the RBH procedure is likely underestimated. On the other hand, setting a stringent e value is beneficial for function prediction, as orthologous pairs with high similarity are more likely to share functions. Another limitation is the incorrect attribution of orthology to paralogous sequences. When a duplication precedes speciation, if one copy is kept in one species and the other copy is kept in another species, RBH identifies them as orthologs although they are in fact “out paralogs” (Gabaldon and Koonin 2013). Similarly, in the case of “in paralogs,” that is, recent duplications that originated independently after speciation, tracing the origin of each paralogous branch is not always trivial, even with phylogenetic analyses, and matching the ancestral-like sequence among recent paralogs might not necessarily reflect orthology, although all sequences evolved from the same founder sequence. Finally, as a consequence of the limited number of tested species in each phylum and of lineage-specific gene losses, some orthologs might have been attributed too recent an origin. For example, if a protein was present in the LCAs of either deuterostomes or chordates but was subsequently lost in S. kowalevskii, B. floridae, and C. intestinalis, then the origin of this gene would be incorrectly assigned to the vertebrate LCA. To alleviate this bias, first we only selected species with high-quality proteomes, and second, we considered groups of species rather than individual species to infer protein gains and losses (figs. 3A, 4A, and 4B). 
As a consequence, only the loss of a considered ortholog in all members of a group leads to the incorrect allocation of its origin along the evolutionary time scale. This pitfall, due to a lack of available data (i.e., not specific to the RBH method), will be largely resolved once a larger number of genomes from a wide variety of phyla is available.
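The group-based dating described here, and the way a loss in every member of a group shifts the inferred origin, can be illustrated as follows. This is a sketch under stated assumptions: the function and group labels are hypothetical, the species groupings follow those listed in table 2, and a protein is dated to the earliest-branching group in which at least one species carries an RBH ortholog.

```python
# Groups ordered from the earliest-branching to the closest to human.
GROUPS = [
    ("premetazoan", {"A. thaliana", "S. cerevisiae", "D. discoideum",
                     "C. owczarzaki", "M. brevicollis", "S. rosetta"}),
    ("metazoan", {"A. queenslandica"}),
    ("eumetazoan", {"N. vectensis", "A. digitifera", "H. vulgaris",
                    "C. hemisphaerica"}),
    ("bilaterian", {"T. spiralis", "C. elegans", "D. melanogaster",
                    "T. castaneum", "C. teleta"}),
    ("deuterostome", {"S. kowalevskii"}),
    ("chordate", {"B. floridae", "C. intestinalis"}),
    ("vertebrate", {"D. rerio", "X. tropicalis", "G. gallus"}),
    ("primate", {"M. mulatta"}),
]


def infer_origin(species_with_ortholog: set) -> str:
    """Date a human protein to the earliest group with at least one ortholog."""
    for name, members in GROUPS:
        if members & species_with_ortholog:
            return name
    return "hominidae"  # no ortholog detected outside human


# One hit in a single cnidarian is enough to date the protein to eumetazoans.
print(infer_origin({"H. vulgaris", "D. rerio"}))
# A protein lost in S. kowalevskii, B. floridae, and C. intestinalis and
# absent from earlier groups is dated to vertebrates, even if it is older.
print(infer_origin({"D. rerio", "G. gallus"}))
```

Adding species to a group can only move an inferred origin earlier, which is why broader sampling is expected to reduce this bias.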

Waves of Specific Innovations in Metazoan, Eumetazoan, and Vertebrate LCAs

The analysis of human orthologomes reported here shows how innovations that built modern bodies progressively emerged during animal evolution. Based on the timing of emergence of human orthologs, we assessed whether groups of proteins related to huBPs were statistically enriched when compared with the human background. We reasoned that strong overrepresentations possibly provide molecular signatures of phenotypic changes. However, throughout this work, we remained cautious about the fact that statistical enrichments of huBPs over time do not necessarily imply that ancestors exhibited the phenotype nowadays associated in humans. Indeed, neofunctionalization and novel genetic regulation can associate with the emergence of novel phenotypes. At the quantitative level, we found that a large proportion of the 1,235 huBPs identified in this study were possibly already active in nonmetazoan species (42.9%), a significant number of innovations took place in eumetazoan (19.6%) and euteleostome (18.5%) ancestors, and to a lesser extent, in bilaterian (7.7%) and metazoan (4.9%) ancestors. Given the major innovations that accompanied the emergence of eumetazoan-LCAs (see fig. 8), that is, the differentiation of myoepithelial cells as well as mesodermal derivatives (Seipel and Schmid 2006; Arendt 2008; Steinmetz et al. 2012), the differentiation of a nervous system (Kass-Simon and Pierobon 2007; Marlow et al. 2009; Galliot and Quiquand 2011), the development of sensory organs including eyes (Nilsson 2004; Piatigorsky and Kozmik 2004), the specification of an oral-aboral axis (Ball et al. 2004; Technau and Steele 2011), this result was anticipated although never quantified. Similarly, the large number of innovations recorded at the base of the vertebrate branch is consistent with the two rounds of genome duplication previously traced in early chordates (Ohno 1999; McLysaght et al. 2002). 
Surprisingly, our study does not trace any innovation at the protodeuterostome period and only rare ones in the protochordate period (figs. 4 and 5C). In primates and hominidae, the situation is different, as the significant protein gains seem to contribute to a limited number of huBPs (1.8% and 3.9%, respectively). This result actually fits well with the previously described massive duplication and fast evolution of proteins involved in recently evolved BPs in primates and hominidae, such as olfactory sensing (Niimura 2009) or immune and inflammatory responses (Eichler 2001; Rodriguez et al. 2012). At the qualitative level, this approach points to the successive emergence of enriched huBPs, with innovations that are specific to each evolutionary period (figs. 5 and 6). Hence, genes involved in human phenotypes appeared in coordinated waves over well-defined periods of time rather than emerging continuously. Interestingly, a recent analysis of vertebrate conserved nonexonic elements (CNEE) points to a similar conclusion (Lowe et al. 2011). The authors show that these CNEE are noncoding regulatory sequences that also exhibit punctuated evolution rates, leading to coordinated waves of regulatory innovations during vertebrate evolution.

Latent Phenotypes to Trace Lineage- or Invertebrate-Specific Phenotypes

Species and phyla that originated in periods of massive genetic changes provide attractive experimental frameworks to decipher the mechanisms of emergence and stabilization of phenotypic innovations. To consider novelties linked to the eumetazoan transition, we analyzed cnidarian proteome repertoires and found phenotypic novelties at three distinct levels, which we named constrained, labile, and latent. Consistent with traditional inference views, the presence of evolutionarily conserved phenotypes across eumetazoans (e.g., neurogenesis) indicates that the underlying regulatory networks were already implemented in eumetazoan ancestors (Richards, Simionato, et al. 2008; Galliot et al. 2009; Marlow et al. 2009). However, the evolutionary “latent” status of protein families involved in neurogenesis was previously documented in unicellular choanoflagellates, which express cell signaling and cell adhesion proteins (King et al. 2003, 2008), and also in choanoflagellates and poriferans, which express most components of the postsynaptic scaffold although they do not differentiate synapses (Sakarya et al. 2007; Alie and Manuel 2010). One possible explanation for this “protoneurogenic” status might be the absence of a large number of neurogenic genes in these species, as most families of transcription factors involved in neurogenesis actually emerged later, after the divergence of Porifera (Galliot and Quiquand 2011). The strong conservation of the proteins affiliated to “latent phenotypes” in cnidarians indicates that evolutionary constraints already act in cnidarians on functions that remain largely unknown. Thus, investigating the function of evolutionarily conserved proteins related to human phenotypes that remain cryptic in cnidarians should help uncover functions presumably coopted for different tasks in bilaterians and cnidarians.

Labile Phenotypes as a Result of Independent Genetic Regulations Tying Conserved Genetic Modules

The conservation of Hydra-human RBH orthologs in cnidarians affiliated to “labile phenotypes” indicates evolutionary constraints already at work in cnidarians, on functions that most likely partially differ from the human ones. Eye differentiation provides a typical case of a labile phenotype. First, jellyfish eyes express crystallin proteins (Kostrouch et al. 1998; Kozmik et al. 2003) and the c-opsin signaling cascade (Suga et al. 2008) as an “effector” module. The analysis of the cnidarian opsin signaling cascade showed that in jellyfish opsins are expressed not only in photoreceptor cells but also in gonads, suggesting that this pathway is involved in spawning, a light-regulated process that is distinct from vision (Suga et al. 2008). Similarly, in Hydra, a hydrozoan polyp that shows phototactic behavior but does not differentiate eyes, light appears to negatively regulate nematocyst discharge through opsins (Plachetzki et al. 2012). These results indicate that the molecular components of a genetic module (here the opsin signaling cascade) are already subject to several distinct regulations in cnidarians: one possibly plesiomorphic, the light regulation of sexual reproduction; another possibly phylum-specific, the regulation of nematocyst discharge; and a third linked to vision, present in only a few cnidarian species but fixed in most bilaterian phyla, where two distinct opsin signaling cascades are active (c-opsin and r-opsin) and variably conserved (Shubin et al. 2009). Similarly, the Six and Eya transcription factors, regulators of eye formation in bilaterians, are expressed in jellyfish independently of eye formation (Stierwald et al. 2004; Graziussi et al. 2012), whereas the Pax regulators are deployed with some flexibility in jellyfish eyes, with PaxB, the Pax2/5/8 ortholog, in the scyphozoan eye and PaxA, a Pax-related gene, in the hydrozoan eye (Kozmik et al. 2003; Suga et al. 2010). In fact, in eyeless jellyfish (Stierwald et al. 2004), Six and Pax perform neurogenic functions independently of vision, similar to what is observed in anthozoans (Matus et al. 2007), nematodes (Chisholm and Horvitz 1995; Zhang and Emmons 1995), or planarians (Pineda et al. 2002). The cnidarian transcription factors orthologous to regulators of vision in bilaterians would thus already exhibit several functions in cnidarians, one related to neurogenesis and present in most if not all cnidarians, another related to eye development in cnidarian jellyfish endowed with vision. Thus, cnidarian vision relies on two modules, neurogenic and signaling, both “constrained” as they appear conserved from cnidarians to bilaterians. As noneyed cnidarian species also express these two modules, we assume that induction of eye formation would require a limited number of novel evolutionary steps, establishing regulatory connections between these two preconstrained genetic modules (fig. 8B). As such connections would require minimal molecular adjustments, they could easily occur several times independently and thus promote similar innovations in parallel in related organisms. A comparative analysis of the regulations of eye differentiation in several cnidarian species should test the validity of this model. It should also tell us which ancestral regulations were robust enough to be maintained in cnidarians and vertebrates.

A Regulatory-Based Parallel Mechanism as a Source of Innovation

The refined analysis of the innovations predicted to emerge in eumetazoan ancestors pointed to phenotypes expressed at highly variable levels in cnidarians. On the one hand, orthologs to human proteins involved in specific functions emerge before these functions can be observed (latent phenotypes); on the other hand, cnidarian species that potentially express similar sets of human orthologs exhibit distinct phenotypes (labile phenotypes). These two observations suggest that a parallel mechanism associates plesiomorphic and convergent processes to generate similar phenotypic innovations in periods when genetic novelties emerge. Briefly, the de novo association, through novel regulatory connections, between preconstrained genetic modules that already perform one or several subfunctions would allow the emergence of novel BPs (fig. 8B). This connecting process between preconstrained modules might arise multiple times independently, in agreement with the deep homology model, based on developmental genetics, whereby distinct taxa that share ancestral regulatory mechanisms evolve similar structures in parallel (Gould 2002; Shubin et al. 2009). This model does not rule out the scenario where similar phenotypes/functions result from fully convergent processes, that is, supported by different genes in distinct clades (Gompel and Prud’homme 2009). A recent study analyzed the emergence of functional regulatory sequences and protein-coding genes in vertebrates (Lowe et al. 2011). By analyzing the enrichment of regulatory sequences in the vicinity of well-identified classes of vertebrate genes (coding for transcription factors, developmental genes, nuclear receptors, and posttranslational protein modifications), Lowe et al. (2011) identify three distinct robust evolutionary patterns: for example, a massive expansion of the regulatory elements in the vicinity of “trans-dev” genes (i.e., transcription factors and developmental genes) at early times of vertebrate evolution followed by a sharp decline, together with an expansion of elements regulating receptors, both events observed independently in tetrapods and ray-finned fish. By contrast, genes involved in posttranslational protein modifications show an inverted pattern, with a progressive and later expansion of their regulatory elements, again occurring independently in several clades (Lowe et al. 2011). These results indicate that specific regulatory innovations peaked over three restricted periods of time along vertebrate evolution. Gene births do not systematically parallel the expansion of regulatory elements, indicating that regulatory innovations do not require novel proteins, as they can act on ancient proteins. However, their data show that the reverse situation is rather rare, as most gene births, whatever the GO annotation, are accompanied by a marked increase in regulatory sequences. These results strongly support the hypothesis of a regulatory-based parallel mechanism as proposed in this study, since, at least in vertebrate evolution, the emergence of regulatory innovations at restricted periods, and independently in distinct clades, accompanies the expansion of protein-coding genes. In cnidarians, this scenario might apply to eye differentiation but also to other labile phenotypes such as differentiation of striated muscles, which is suspected to have evolved multiple times (Steinmetz et al. 2012), sensory organ development, and regionalization (fig. 6B). In the case of eye differentiation, it would predict that the regulatory connections between the regulatory module (neurogenic genes) and the effector module (opsin signaling) should differ in eyed and eyeless species or even between eyed species.
Once these regulatory connections are identified, it should be possible to trigger eye differentiation in an eyeless cnidarian species. This scenario would fit with models predicting a higher potential for innovation when robustness is intermediate, that is, when regulatory connections are established but still loose (Ciliberti et al. 2007). Hence, the strategy presented here identified candidate proteins and phenotypes linked to epoch-specific innovations, pointing to the emergence of BPs. Further studies deciphering the regulatory connections between groups of proteins forming functional modules involved in these BPs should help uncover the mechanisms that allowed the emergence of such innovations.

Supplementary Material

Supplementary figures S1–S5 and tables S1–S3 are available at Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org/).
  97 in total

1.  Evolution of key cell signaling and adhesion protein families predates animal origins.

Authors:  Nicole King; Christopher T Hittinger; Sean B Carroll
Journal:  Science       Date:  2003-07-18       Impact factor: 47.728

2.  A phylostratigraphy approach to uncover the genomic history of major adaptations in metazoan lineages.

Authors:  Tomislav Domazet-Loso; Josip Brajković; Diethard Tautz
Journal:  Trends Genet       Date:  2007-11       Impact factor: 11.639

Review 3.  The evolution of cell types in animals: emerging principles from molecular studies.

Authors:  Detlev Arendt
Journal:  Nat Rev Genet       Date:  2008-11       Impact factor: 53.242

4.  Origin of the metazoan phyla: molecular clocks confirm paleontological estimates.

Authors:  F J Ayala; A Rzhetsky; F J Ayala
Journal:  Proc Natl Acad Sci U S A       Date:  1998-01-20       Impact factor: 11.205

5.  Ancient deuterostome origins of vertebrate brain signalling centres.

Authors:  Ariel M Pani; Erin E Mullarkey; Jochanan Aronowicz; Stavroula Assimacopoulos; Elizabeth A Grove; Christopher J Lowe
Journal:  Nature       Date:  2012-03-14       Impact factor: 49.962

6.  Cubozoan jellyfish: an Evo/Devo model for eyes and other sensory systems.

Authors:  Joram Piatigorsky; Zbynek Kozmik
Journal:  Int J Dev Biol       Date:  2004       Impact factor: 2.203

7.  The genome of the choanoflagellate Monosiga brevicollis and the origin of metazoans.

Authors:  Nicole King; M Jody Westbrook; Susan L Young; Alan Kuo; Monika Abedin; Jarrod Chapman; Stephen Fairclough; Uffe Hellsten; Yoh Isogai; Ivica Letunic; Michael Marr; David Pincus; Nicholas Putnam; Antonis Rokas; Kevin J Wright; Richard Zuzow; William Dirks; Matthew Good; David Goodstein; Derek Lemons; Wanqing Li; Jessica B Lyons; Andrea Morris; Scott Nichols; Daniel J Richter; Asaf Salamov; J G I Sequencing; Peer Bork; Wendell A Lim; Gerard Manning; W Todd Miller; William McGinnis; Harris Shapiro; Robert Tjian; Igor V Grigoriev; Daniel Rokhsar
Journal:  Nature       Date:  2008-02-14       Impact factor: 49.962

8.  Hox Gene Loss during Dynamic Evolution of the Nematode Cluster.

Authors:  A Aziz Aboobaker; Mark L Blaxter
Journal:  Curr Biol       Date:  2003-01-08       Impact factor: 10.834

9.  A phylogenomic investigation into the origin of metazoa.

Authors:  Iñaki Ruiz-Trillo; Andrew J Roger; Gertraud Burger; Michael W Gray; B Franz Lang
Journal:  Mol Biol Evol       Date:  2008-01-09       Impact factor: 16.240

10.  Genome-wide acceleration of protein evolution in flies (Diptera).

Authors:  Joël Savard; Diethard Tautz; Martin J Lercher
Journal:  BMC Evol Biol       Date:  2006-01-25       Impact factor: 3.260

