Yvan Wenger, Brigitte Galliot.
Abstract
Phenotypic traits derive from the selective recruitment of genetic materials over macroevolutionary times, and protein-coding genes constitute an essential component of these materials. We took advantage of the recent production of genomic scale data from sponges and cnidarians, sister groups of eumetazoans and bilaterians, respectively, to date the emergence of human proteins and to infer the timing of acquisition of novel traits through metazoan evolution. Comparing the proteomes of 23 eukaryotes, we find that 33% of human proteins have an ortholog in nonmetazoan species. This premetazoan proteome associates with 43% of all annotated human biological processes. Subsequently, four major waves of innovations can be inferred in the last common ancestors of eumetazoans, bilaterians, euteleostomi (bony vertebrates), and hominidae, largely specific to each epoch, whereas early branching deuterostome and chordate phyla show very few innovations. Interestingly, groups of proteins that act together in their modern human functions often originated concomitantly, although the corresponding human phenotypes frequently emerged later. For example, the three cnidarians Acropora, Nematostella, and Hydra express a highly similar protein inventory, and their protein innovations can be affiliated either to traits shared by all eumetazoans (gut differentiation, neurogenesis); or to bilaterian traits present in only some cnidarians (eyes, striated muscle); or to traits not identified yet in this phylum (mesodermal layer, endocrine glands). The variable correspondence between phenotypes predicted from protein enrichments and observed phenotypes suggests that a parallel mechanism repeatedly produces similar phenotypes, thanks to novel regulatory events that independently tie preexisting conserved genetic modules.
Keywords: eumetazoan innovations; gene ontology enrichment; macroevolution of human orthologs; orthologomes; reciprocal best hits (RBHs); regulatory-based parallel evolution
Year: 2013 PMID: 24065732 PMCID: PMC3814200 DOI: 10.1093/gbe/evt142
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Table 1. Sources and Characteristics of the Different Proteome Data Sets Used in This Study
| Lineage | Species Included | Number of Sequences | Total Sequences per Group | Type of Sequences | Repository |
|---|---|---|---|---|---|
| Hominidae | | 20,231 | 20,231 | Reference proteome set | UniProtKB |
| Nonhominidae primates | | 34,434 | 34,434 | Reference proteome set | UniProtKB |
| Nonprimate vertebrates | | 23,344 | 85,224 | Reference proteome set | UniProtKB |
| | | 21,541 | | Reference proteome set | UniProtKB |
| | | 40,339 | | Reference proteome set | UniProtKB |
| Cephalochordates | | 28,545 | 42,547 | Reference proteome set | UniProtKB |
| Urochordates | | 14,002 | | Genome-predicted proteome | JGI |
| Hemichordates | | 43,572 | 56,156 | Genome-predicted proteome | JGI |
| | | 12,584 | | RefSeq | NCBI |
| Protostomes | | 16,246 | 106,607 | Proteome | UniProtKB |
| | | 17,563 | | Reference proteome set | UniProtKB |
| | | 16,986 | | Complete proteome set | UniProtKB |
| | | 32,415 | | Genome-predicted proteome | JGI |
| | | 23,397 | | Reference proteome set | UniProtKB |
| Cnidarians | | 24,435 | 199,482 | Reference proteome set | UniProtKB |
| | | 30,666 | | Assembled ESTs | Compagen |
| | | 23,677 | | Genome-predicted proteome | OIST-MGU |
| | | 36,780 | | RNA-seq | OIST-MGU |
| | | 57,611 | | RNA-seq | ENA |
| | | | | Genome predicted | NCBI, JGI |
| | | 26,313 | | Single-pass ESTs | NCBI, Compagen |
| Poriferans | | 30,060 | 30,060 | Genome-predicted proteome | JGI |
| Non-metazoans | | 27,416 | 75,642 | Genome-predicted proteome | TAIR |
| | | 6,643 | | Reference proteome set | UniProtKB |
| | | 12,318 | | Proteome | dictyBase |
| | | 8,374 | | Complete proteome set | UniProtKB |
| | | 9,188 | | Reference proteome set | UniProtKB |
| | | 11,703 | | Complete proteome set | UniProtKB |
| Total | | | 650,383 | | |
Note.—ENA, European Nucleotide Archive; JGI, Joint Genome Institute; NCBI, National Center for Biotechnology Information; OIST-MGU, Okinawa Institute of Science and Technology—Marine Genomics Unit. For references, see Results.
Fig. 1. RBH computing. The RBH process takes place after a reasonably complete proteome (here human) is aligned unidirectionally to another whole proteome (here Hydra). (A) After BlastP+ (E value 10^-10), relations between the human and Hydra protein sets are established, represented by a series of basal hits either between a given human protein and several Hydra proteins (black arrows) or inferred between a given Hydra protein and several human proteins (gray arrows). Each of these relationships receives a Blast score (numbers next to the arrows) that is valid for both the query–hit and the hit–query relationships. (B) Relations that are retained as RBHs fulfill two criteria: 1) best score between a given query and the different hits (red arrow) and 2) best score between a given hit and the different queries (blue arrow). (C) When two or more query/hit relationships with a shared query or hit qualify as RBH, one pair is selected randomly. This scenario typically takes place when nearly identical paralog sequences are present in the query or target proteomes.
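The two retention criteria and the random tie-break described in this legend can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `reciprocal_best_hits`, the input layout (one Blast score per query/hit pair, already filtered by E value), and the protein identifiers are all assumptions made for the example.

```python
import random

def reciprocal_best_hits(scores, seed=0):
    """Return one RBH per query and per hit.

    `scores` maps (query_id, hit_id) -> Blast score; as in the legend, a
    single score is taken to be valid for both directions (assumed input).
    """
    rng = random.Random(seed)
    best_for_query, best_for_hit = {}, {}
    for (query, hit), score in scores.items():
        best_for_query[query] = max(best_for_query.get(query, float("-inf")), score)
        best_for_hit[hit] = max(best_for_hit.get(hit, float("-inf")), score)

    # Criteria 1 and 2: the pair carries the best score for both partners.
    candidates = [
        (query, hit)
        for (query, hit), score in scores.items()
        if score == best_for_query[query] and score == best_for_hit[hit]
    ]

    # Panel C: if several candidates share a query or a hit, keep one at random.
    rng.shuffle(candidates)
    used_queries, used_hits, rbh = set(), set(), {}
    for query, hit in candidates:
        if query not in used_queries and hit not in used_hits:
            rbh[query] = hit
            used_queries.add(query)
            used_hits.add(hit)
    return rbh

# Hypothetical identifiers and scores, for illustration only.
scores = {
    ("HUMAN_A", "HYDRA_1"): 250.0,
    ("HUMAN_A", "HYDRA_2"): 120.0,
    ("HUMAN_B", "HYDRA_1"): 180.0,  # best for HUMAN_B, but HYDRA_1 prefers HUMAN_A
    ("HUMAN_B", "HYDRA_3"): 90.0,
    ("HUMAN_C", "HYDRA_4"): 300.0,  # tie between two near-identical paralogs
    ("HUMAN_C", "HYDRA_5"): 300.0,
}
rbh = reciprocal_best_hits(scores)
```

In this toy input, HUMAN_B receives no RBH because its best hit scores higher with another query, and HUMAN_C is paired with only one of its two tied paralogs.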
Fig. 2. Evolution of the respective sizes of the human, Drosophila, Capitella, and Hydra orthologomes. Sequences of the Hydra, Drosophila, Capitella, and human proteomes were used to independently size orthologomes on representative eukaryotes. Timings of radiations were taken from Battacharya et al. (2009) for holozoans; from Peterson et al. (2008) for metazoans, eumetazoans, bilaterians, deuterostomes, and vertebrates; and from Steiper and Young (2009) for primates. We arbitrarily placed the origin of Chordata midway between the origins of Deuterostomia and Vertebrata, in agreement with Ayala et al. (1998). Each bar represents the number of RBHs obtained between the human (black), Capitella (red), Drosophila (yellow), or Hydra (gray) proteome and the indicated species. The size of the orthologomes is given for human and Hydra. Note the impact of proteome completeness with the two Saccoglossus data sets.
Fig. 3. Expansion of human orthologs in the LCAs of metazoans, eumetazoans, and bilaterians. (A) Plot showing the RBH scores obtained by 20,231 human proteins tested on seven noneumetazoan proteomes (x axis, Groups I and II) and on four cnidarian proteomes (y axis, Group III). Among these, 7,789 were present in the LCAs of metazoans (I, II), 2,422 (12%) originated in the LCAs of eumetazoans (Group III), and 10,020 (49.5%) represent postcnidarian novelties (Group IV). Note the distribution of proteins involved in human mesoderm development (blue) and ribosome biogenesis (red). (B) Scheme recapitulating the prebilaterian evolutionary events of human proteins: emergences (star), losses (empty square), and family expansions (triangle). For species abbreviations, see figure 2.
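The group sizes quoted for panel A can be checked against the 20,231 human proteins with a few lines of arithmetic. The sketch below only restates numbers given in the legends; the dictionary labels are shorthand chosen for the example.

```python
# Counts quoted in the legend for panel A.
groups = {
    "I + II (present in the metazoan LCA)": 7789,
    "III (originated in the eumetazoan LCA)": 2422,
    "IV (postcnidarian novelties)": 10020,
}

total = sum(groups.values())  # should match the 20,231 human proteins tested
shares = {name: round(100 * count / total, 1) for name, count in groups.items()}

# Groups I + II + III together form the protobilaterian set that is used
# as the enrichment background for the cnidarian analysis (10,211 proteins).
protobilaterian_background = (
    groups["I + II (present in the metazoan LCA)"]
    + groups["III (originated in the eumetazoan LCA)"]
)

for name, share in shares.items():
    print(f"Group {name}: {groups[name]:,} proteins, {share}% of {total:,}")
```

The three groups partition the human proteome exactly, and the shares reproduce the 12% and 49.5% stated in the legend, with the remaining 38.5% already present in the metazoan LCA.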
Fig. 4. Timing of emergence of human orthologs and related Biological Processes (huBPs) in metazoan evolution. (A) Parallel bursts in human orthologs' gains (green bars) and emergence of huBPs (gray bars, corrected P value ≤10^-3). (B) Gains (green bars) and losses (blue bars) in human orthologs obtained by testing the complete human proteome against the proteomes of species belonging to phyla branching at various steps of metazoan evolution. (C) Rates of emergence of human orthologs across metazoan evolution, expressed as the number of novel ortholog proteins (y axis) detected per million years (Myr). Rates were deduced from the protein gains shown in A and B over the time periods separating the LCAs of two clades, as indicated by inverted arrows at the bottom. References for each time period are given in the legend of figure 2.
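The rate in panel C is a simple ratio of gains to elapsed time between two LCAs. A minimal sketch follows; the function name and the numbers in the usage line are placeholders chosen for illustration, not values from the study.

```python
def emergence_rate(gained_orthologs, older_lca_mya, younger_lca_mya):
    """Novel human orthologs gained per million years between two LCAs.

    Ages are in million years ago (Mya); the older LCA has the larger value.
    """
    interval_myr = older_lca_mya - younger_lca_mya
    if interval_myr <= 0:
        raise ValueError("the older LCA must predate the younger LCA")
    return gained_orthologs / interval_myr

# Placeholder values (hypothetical, NOT taken from the paper):
# 1,000 gains over a 100 Myr interval.
example_rate = emergence_rate(gained_orthologs=1000,
                              older_lca_mya=700.0,
                              younger_lca_mya=600.0)
```

Because the denominator is the interval between divergence dates, the rate is sensitive to the dating references chosen for each node.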
Table 2. List of the 10 Most Significantly Protein-Enriched BPs Detected at Nine Evolutionary Periods
| Evolutionary Period | BP Number | BP Name | Corr. P Value | Enrichment |
|---|---|---|---|---|
| Nonmetazoans | GO:0044248 | Cellular catabolic process | 1.4E-140 | 2.1 |
| | GO:0016070 | RNA metabolic process | 2.0E-134 | 2 |
| | GO:0006396 | RNA processing | 1.1E-113 | 2.4 |
| | GO:0046907 | Intracellular transport | 1.5E-105 | 2.1 |
| | GO:0016071 | mRNA metabolic process | 5.1E-105 | 2.4 |
| | GO:0009057 | Macromolecule catabolic process | 8.2E-86 | 2.2 |
| | GO:0044265 | Cellular macromolecule catabolic process | 3.1E-81 | 2.2 |
| | GO:0042180 | Cellular ketone metabolic process | 1.5E-79 | 2.1 |
| | GO:0006082 | Organic acid metabolic process | 1.0E-76 | 2.1 |
| | GO:0019752 | Carboxylic acid metabolic process | 4.0E-76 | 2.1 |
| Porifer LCA | GO:0009790 | Embryo development | 2.6E-07 | 2.1 |
| | GO:0009887 | Organ morphogenesis | 3.2E-06 | 2.1 |
| | GO:0009792 | Embryo development ending in birth or egg hatching | 8.1E-06 | 2.4 |
| Cnidarian LCA | GO:0009653 | Anatomical structure morphogenesis | 7.3E-40 | 2 |
| | GO:0007399 | Nervous system development | 5.5E-37 | 2 |
| | GO:0007417 | Central nervous system development | 3.4E-29 | 2.6 |
| | GO:0048699 | Generation of neurons | 8.5E-26 | 2.1 |
| | GO:0022008 | Neurogenesis | 9.5E-26 | 2.1 |
| | GO:0009887 | Organ morphogenesis | 2.6E-25 | 2.4 |
| | GO:0007420 | Brain development | 7.2E-25 | 2.8 |
| | GO:0030182 | Neuron differentiation | 3.7E-24 | 2.3 |
| | GO:0010628 | Positive regulation of gene expression | 1.1E-23 | 2.1 |
| | GO:0045893 | Positive regulation of transcription, DNA-dependent | 2.3E-23 | 2.1 |
| Bilaterian LCA | GO:0007399 | Nervous system development | 1.2E-14 | 2 |
| | GO:0010628 | Positive regulation of gene expression | 1.3E-14 | 2.4 |
| | GO:0031327 | Negative regulation of cellular biosynthetic process | 1.4E-14 | 2.4 |
| | GO:0045944 | Positive regulation of transcription from RNA polymerase II promoter | 2.0E-14 | 2.8 |
| | GO:0006357 | Regulation of transcription from RNA polymerase II promoter | 2.5E-14 | 2.3 |
| | GO:0045893 | Positive regulation of transcription, DNA-dependent | 4.0E-14 | 2.4 |
| | GO:0009890 | Negative regulation of biosynthetic process | 4.7E-14 | 2.4 |
| | GO:0051254 | Positive regulation of RNA metabolic process | 2.0E-12 | 2.3 |
| | GO:0009892 | Negative regulation of metabolic process | 3.4E-12 | 2.1 |
| | GO:0010557 | Positive regulation of macromolecule biosynthetic process | 3.6E-12 | 2.2 |
| Deuterostome LCA | None | - | - | |
| Chordate LCA | GO:0060537 | Muscle tissue development | 9.6E-06 | 4.9 |
| Nonprimate vertebrates | GO:0007155 | Cell adhesion | 5.0E-38 | 2.3 |
| | GO:0022610 | Biological adhesion | 5.0E-38 | 2.3 |
| | GO:0007186 | G-protein coupled receptor signaling pathway | 1.3E-35 | 2.5 |
| | GO:0016337 | Cell–cell adhesion | 4.9E-18 | 2.5 |
| | GO:0032101 | Regulation of response to external stimulus | 1.0E-15 | 2.4 |
| | GO:0051050 | Positive regulation of transport | 1.8E-15 | 2.2 |
| | GO:0050730 | Regulation of peptidyl-tyrosine phosphorylation | 9.9E-15 | 3.3 |
| | GO:0006873 | Cellular ion homeostasis | 1.7E-14 | 2 |
| | GO:0006954 | Inflammatory response | 2.2E-14 | 2.3 |
| | GO:0050731 | Positive regulation of peptidyl-tyrosine phosphorylation | 3.4E-14 | 3.5 |
| Nonhominidae primates | GO:0042742 | Defense response to bacterium | 1.6E-17 | 3.8 |
| | GO:0051707 | Response to other organism | 1.6E-14 | 2.2 |
| | GO:0009607 | Response to biotic stimulus | 1.6E-13 | 2.1 |
| | GO:0009617 | Response to bacterium | 5.2E-09 | 2.2 |
| | GO:0050909 | Sensory perception of taste | 4.9E-08 | 4.4 |
| | GO:0007606 | Sensory perception of chemical stimulus | 6.3E-08 | 3 |
| Hominidae | GO:0006958 | Complement activation, classical pathway | 1.8E-78 | 8.5 |
| | GO:0002455 | Humoral immune response mediated by circulating Ig | 4.2E-77 | 8.4 |
| | GO:0006956 | Complement activation | 5.0E-75 | 8.1 |
| | GO:0072376 | Protein activation cascade | 1.6E-67 | 7.2 |
| | GO:0016064 | Immunoglobulin mediated immune response | 5.8E-67 | 7.2 |
| | GO:0019724 | B cell mediated immunity | 2.7E-66 | 7.2 |
| | GO:0002449 | Lymphocyte mediated immunity | 2.0E-62 | 6.7 |
| | GO:0006959 | Humoral immune response | 1.1E-60 | 6.5 |
| | GO:0002460 | Adaptive immune response based on somatic recombination of immune receptors built from immunoglobulin superfamily domains | 5.1E-59 | 6.3 |
| | GO:0002443 | Leukocyte mediated immunity | 3.1E-56 | 5.9 |
Note.—All processes listed here are enriched at least 2-fold with P values lower than 10^-5. For the complete list of protein-enriched BPs, see supplementary table S2, Supplementary Material online.
Fig. 5. Characterization of the ortholog-deduced Biological Processes (huBPs) that emerged in the LCAs of metazoans (A), bilaterians (B), chordates (C), vertebrates (D), and primates (E). BPs showing protein enrichment ≥2 times (horizontal scale) are depicted by a circle whose surface is proportional to the number of proteins. The color code indicates two levels of statistical significance (see inset). Note the significantly enriched huBPs in LCAs of each period (see table 2): embryonic development in protometazoans; neurogenesis, organ morphogenesis, and regulation of transcription in protoeumetazoans; nervous system development and regulation of biosynthetic process in protobilaterians; muscle tissue development in protochordates; cell adhesion, response to external stimulus, G-protein coupled receptor signaling pathway, and inflammatory response in vertebrates; sensory perception and defense response to bacterium in primates; complement activation, humoral immune response, and leukocyte-mediated immunity in hominidae. For the full list of protein-enriched BPs, see supplementary table S2, Supplementary Material online.
Fig. 6. Characterization of the huBPs associated with major protein gains in cnidarians. (A) Enrichments in proteins for a given huBP were identified in cnidarians (2,422 proteins in Group III) using the 10,211 protobilaterian proteins (Groups I + II + III) as background. The huBPs showing protein enrichment ≥2 times with corrected P values ≤10^-5 are shown for cnidarians (red) and noneumetazoans (blue, green). The numbers after each huBP indicate the number of proteins in Groups I, II, and III, respectively. The huBPs with protein enrichment ≥2.5 times are written in bold. For details, see supplementary table S3, Supplementary Material online. (B) Similar gains of novel human orthologs associated with selected huBPs in anthozoan (Acropora, Nematostella) and medusozoan (Hydra) cnidarian proteomes. These three cnidarian species exhibit widely different lifestyles and morphologies. The scale represents the number of proteins identified in the proteome of each cnidarian species for each indicated huBP.
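The enrichment in panel A compares the share of Group III proteins annotated with a huBP to the share in the 10,211-protein background. A minimal sketch of that ratio follows, with a one-sided hypergeometric tail as one plausible significance test. The record does not state which test or multiple-testing correction the authors used, so the test is an assumption, and the per-process counts below are hypothetical placeholders.

```python
from math import comb

def fold_enrichment(k, n, K, N):
    """Share of annotated proteins in the test set divided by the share in the background."""
    return (k / n) / (K / N)

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n proteins are drawn from N, of which K carry the annotation.

    Assumed test for illustration; exact integer arithmetic, uncorrected P value.
    """
    upper = min(n, K)
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)

# Sizes taken from the legend.
N_BACKGROUND = 10211   # protobilaterian proteins (Groups I + II + III)
N_GROUP_III = 2422     # proteins that originated in the eumetazoan LCA

# Hypothetical counts for a single huBP (NOT from the paper).
annotated_in_background = 200
annotated_in_group_iii = 100

fold = fold_enrichment(annotated_in_group_iii, N_GROUP_III,
                       annotated_in_background, N_BACKGROUND)
p_value = hypergeom_upper_tail(annotated_in_group_iii, N_GROUP_III,
                               annotated_in_background, N_BACKGROUND)
print(f"fold enrichment = {fold:.2f}, uncorrected P = {p_value:.3g}")
```

With these placeholder counts, the process would pass the ≥2 times enrichment threshold; comparing `p_value` to the 10^-5 cutoff would additionally require the correction for multiple testing, which is not shown here.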
Fig. 7. Model of a regulatory-based parallel mechanism for the emergence of innovations, as deduced from the comparison of predicted and observed phenotypes in cnidarians. (A) Innovations in cnidarians, i.e., absent in nonmetazoans or in earlier branching metazoans such as porifers, were sorted into four categories of phenotypes: constrained (dark green) when present in all cnidarians and maintained in bilaterians; labile (light purple) when expressed in some but not all cnidarian species, and largely expressed in bilaterians including vertebrates; latent (light blue) when observed in bilaterians but not in cnidarians; cnidarian-specific (orange) when restricted to cnidarians. Some eumetazoan innovations evolved differently in cnidarians and bilaterians, such as the sensory motoneurons and the myoepithelial cells, which remained multifunctional in cnidarians (light green) but differentiated into more specialized cell types in bilaterians (Arendt 2008). (B) Cnidarian proteomes contain similar numbers of human orthologs (dots), labelled here according to their origin as premetazoan (green), metazoan (blue), eumetazoan (red), or taxon-restricted (yellow). These proteins can participate in genetic modules (GM) that can give rise to constrained phenotypes (gray backgrounds) when regulations between the different proteins are tightly linked, with a limited potential for innovation (upper panel). When forming GM with loose links between preexisting tight GM (middle panel), these protein networks can give rise to labile phenotypes (purple background), prone to innovation through parallel evolution. When not included in predicted GM, the corresponding phenotypes are latent (lowest panel). However, these proteins likely also form taxon-restricted GM that support taxon-specific phenotypes (orange background), such as anatomical and life cycle differences (Foret et al. 2010; Wenger and Galliot 2013).
Fig. 8. Versatility of novel eumetazoan proteins: Heatmap showing a limited overlap between the protein contents of the BPs that are enriched in novel cnidarian human orthologs.