José Luis Villanueva-Cañas, Jorge Ruiz-Orera, M Isabel Agea, Maria Gallo, David Andreu, M Mar Albà.
Abstract
The birth of genes that encode new protein sequences is a major source of evolutionary innovation. However, we still understand relatively little about how these genes come into being and which functions they are selected for. To address these questions, we have obtained a large collection of mammalian-specific gene families that lack homologues in other eukaryotic groups. We have combined gene annotations and de novo transcript assemblies from 30 different mammalian species, obtaining ∼6,000 gene families. In general, the proteins in mammalian-specific gene families tend to be short and depleted in aromatic and negatively charged residues. Proteins which arose early in mammalian evolution include milk and skin polypeptides, immune response components, and proteins involved in reproduction. In contrast, the functions of proteins which have a more recent origin remain largely unknown, despite the fact that these proteins also have extensive proteomics support. We identify several previously described cases of genes originated de novo from noncoding genomic regions, supporting the idea that this mechanism frequently underlies the evolution of new protein-coding genes in mammals. Finally, we show that most young mammalian genes are preferentially expressed in testis, suggesting that sexual selection plays an important role in the emergence of new functional genes.
Keywords: adaptive evolution; de novo gene; evolutionary innovation; lineage-specific gene; mammals; species-specific gene
Year: 2017 PMID: 28854603 PMCID: PMC5554394 DOI: 10.1093/gbe/evx136
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Fig. 1.—Mammalian tree and number of mammalian-specific gene families. The tree depicts the phylogenetic relationships between 30 mammalian species from different major groups. The values in each node indicate the number of families that were mapped to the branch ending in the node. We define three conservation levels: “mam-basal” (class 2, approximately older than 100 Myr, red), “mam-young” (class 1, green) and “species-specific” (class 0, blue). The branch length represents the approximate number of substitutions per site as inferred from previous studies (see Materials and Methods). The scale bar on the bottom left corner represents 6 substitutions per 100 nucleotides. Dotted lines have been added to some branches to improve readability.
Examples of Mammalian-Specific Gene Families
| Gene Name | Description | Tree Node | Features | References |
|---|---|---|---|---|
| SCGB | Secretoglobin | Mammalia | Gene family, modulation of inflammation | ( |
| PRM3 | Protamine 3 | Theria | Affects sperm motility | ( |
| CSN1S1 | Casein alpha s1 | Eutheria | Ca-sensitive milk protein, related to vertebrate calcium-binding protein SPARCL1 | ( |
| LCE6A | Late cornified envelope 6A | Eutheria | Formation of the skin, part of the epidermal differentiation complex | ( |
| IL2 | Interleukin 2 | Eutheria | Cytokine, rapid sequence divergence | ( |
| MUC7 | Mucin 7 | Eutheria | Antimicrobial peptide, secreted in saliva | ( |
| NNAT | Neuronatin | Eutheria | Neural development | ( |
| IGIP | IgA-inducing protein | Eutheria | Activates the production of immunoglobulin A by B cells | ( |
| SMCP | Sperm mitochondrial-associated cysteine-rich protein | Eutheria | Involved in sperm motility | ( |
| CLLU1 | Chronic lymphocytic leukemia upregulated 1 | Primates | Overexpressed in leukemia, de novo origin | ( |
| HMHB1 | Histocompatibility (minor) HB-1 | Primates | Precursor of the histocompatibility antigen HB-1, de novo origin | ( |
| DCD | Dermcidin | Haplorrhini | Antimicrobial peptide, secreted in the skin | ( |
| MYEOV | Myeloma overexpressed | Haplorrhini | Overexpressed in myeloma, de novo origin | ( |
| RP11-429E11.3 | Uncharacterized protein | Great apes | De novo origin | ( |
| RP11-45H22.3 | Uncharacterized protein | Hum/Chimp | De novo origin | ( |
Note.—We indicate the node under which we classified the gene. Tree Node numbers: Mammalia 1; Theria 2; Eutheria 4; Primates 7; Haplorrhini 16; Great apes 26; Hum/Chimp 28.
Fig. 2.—Gene expression patterns of genes from different conservation levels. (A) Proportion of broadly expressed and tissue-specific genes in different conservation classes. (B) Fraction of genes with maximum expression in a given tissue for different conservation classes. (C) Box-plot showing the distribution of FPKM gene expression values, on a logarithmic scale, in different conservation classes and for the tissue with the highest expression value. Data in (B) and (C) are for tissue-specific genes. All data shown are for mouse genes. See supplementary figure S3, Supplementary Material online, for the same data for human genes.
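The quantities plotted in panels (B) and (C) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, gene identifiers, tissue names, and FPKM values are all hypothetical, and base-10 logarithms are an assumption, since the caption only says "logarithmic scale".

```python
import math
from collections import Counter

def max_tissue_fractions(expression):
    """Map {gene: {tissue: FPKM}} to {tissue: fraction of genes peaking there}."""
    # For each gene, pick the tissue with the highest FPKM value.
    peaks = Counter(max(tissues, key=tissues.get) for tissues in expression.values())
    total = sum(peaks.values())
    return {tissue: count / total for tissue, count in peaks.items()}

def log_peak_fpkm(expression):
    """Return log10 of each gene's FPKM in its highest-expressing tissue."""
    # Assumes every gene has a strictly positive maximum FPKM.
    return {gene: math.log10(max(tissues.values())) for gene, tissues in expression.items()}

# Hypothetical tissue-specific genes, for illustration only.
expression = {
    "geneA": {"testis": 12.0, "brain": 0.5, "liver": 0.1},
    "geneB": {"testis": 3.0, "brain": 0.2, "liver": 0.4},
    "geneC": {"testis": 0.3, "brain": 8.0, "liver": 0.1},
    "geneD": {"testis": 100.0, "brain": 1.0, "liver": 1.0},
}
fractions = max_tissue_fractions(expression)
peak_logs = log_peak_fpkm(expression)
print(fractions)
print(peak_logs)
```

In this toy input three of the four genes peak in testis, so the testis fraction is 0.75. Real data would first need the broadly expressed genes filtered out, using a tissue-specificity criterion that the caption does not specify.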
Fig. 3.—Sequence properties of mammalian-specific genes. (A) Sequence length in amino acids. (B) Aromaticity. (C) Isoelectric point (IP). Protein sequences were extracted from the complete gene families set. We used the following gene groups: A: ancestral; R: random; 2: “mam-basal”; 1: “mam-young”; 0: species-specific.
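As a rough illustration of the properties in panels (A) and (B), the sketch below computes sequence length and aromaticity, together with the fraction of negatively charged residues mentioned in the abstract. It assumes aromaticity is the fraction of Phe, Trp, and Tyr residues, which is a common definition but is not stated in the caption. The function name and the example peptide are hypothetical.

```python
AROMATIC = set("FWY")  # assumed definition of aromatic residues: Phe, Trp, Tyr
NEGATIVE = set("DE")   # negatively charged residues: Asp, Glu

def sequence_properties(seq):
    """Return length, aromaticity, and negative-residue fraction of a protein."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    aromatic = sum(1 for aa in seq if aa in AROMATIC)
    negative = sum(1 for aa in seq if aa in NEGATIVE)
    return {
        "length": n,
        "aromaticity": aromatic / n,
        "negative_fraction": negative / n,
    }

# Hypothetical 10-residue peptide, for illustration only.
props = sequence_properties("MKFWYDEAAA")
print(props)
```

The isoelectric point in panel (C) depends on a chosen set of pK values and an iterative charge calculation, so it is left out of this sketch.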
Fig. 4.—Proteomics and Gene Ontology information. (A) Proportion of mouse genes with proteomics or Gene Ontology (GO) data for different gene groups. Validated proteins were those with at least two different peptides that matched perfectly and did not map to any other protein when allowing for up to two mismatches. (B) Number of unique peptides for validated proteins from different groups. (C) Number of total peptides for validated proteins from different groups.
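One reading of the validation rule in panel (A) can be sketched as below. This is an assumption-laden illustration rather than the authors' implementation: mismatches are counted in ungapped alignments only, and the function names, protein identifiers, and sequences are hypothetical.

```python
def maps_with_mismatches(peptide, protein, max_mismatches):
    """True if peptide aligns ungapped anywhere in protein with <= max_mismatches."""
    k = len(peptide)
    for i in range(len(protein) - k + 1):
        mismatches = 0
        for a, b in zip(peptide, protein[i:i + k]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many mismatches in this window
        else:
            return True  # window finished without exceeding the limit
    return False

def is_validated(target_id, proteins, peptides, min_peptides=2, max_mismatches=2):
    """Apply the two-peptide rule to one protein in a {id: sequence} dictionary."""
    target = proteins[target_id]
    unique = set()
    for pep in set(peptides):
        if pep not in target:  # a perfect match to the target is required
            continue
        maps_elsewhere = any(
            maps_with_mismatches(pep, seq, max_mismatches)
            for pid, seq in proteins.items()
            if pid != target_id
        )
        if not maps_elsewhere:
            unique.add(pep)
    return len(unique) >= min_peptides

# Hypothetical sequences and peptides, for illustration only.
proteins = {"new1": "MKTLLPAGWRSTVEKLQ", "old1": "MSTAGWRSAVEKPPLL"}
validated = is_validated("new1", proteins, ["MKTLLP", "GWRSTVEK", "TLLPAG"])
not_validated = is_validated("new1", proteins, ["MKTLLP", "GWRSTVEK"])
print(validated, not_validated)
```

Here `GWRSTVEK` is discarded because it also maps to `old1` with a single mismatch, so the second call is left with only one unique peptide and fails the rule.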
Main Functions of Mammalian-Specific Genes
| Enriched Function | Representative Terms | No. of Genes | Corrected P Value |
|---|---|---|---|
| 1. Immune response | 1.1 immune response (GO) | 14 | 2.2E-3 |
| | 1.2 cytokine activity (GO) | 13 | 1.6E-10 |
| | 1.3 Jak-STAT signaling pathway (KEGG) | 6 | 5.5E-4 |
| 2. Reproduction | 2.1 reproductive process in a multicellular organism (GO) | 12 | 1.3E-3 |
| | 2.2 spermatogenesis (GO) | 10 | 9.2E-4 |
| 3. Secreted protein | 3.1 extracellular region (GO) | 64 | 1.8E-15 |
| | 3.2 secreted (Uniprot) | 59 | 2.8E-14 |
| | 3.3 signal peptide (Uniprot) | 60 | 7.0E-10 |
Note.—The results shown are for human genes classified as “mam-basal” (class 2).