Daniele Guerzoni, Aoife McLysaght.
Abstract
De novo protein-coding gene origination is increasingly recognized as an important evolutionary mechanism. However, there remains a large amount of uncertainty regarding the frequency of these events and the mechanisms and speed of gene establishment. Here, we describe a rigorous search for cases of de novo gene origination in the great apes. We analyzed annotated proteomes as well as full genomic DNA and transcriptional and translational evidence. Notably, results vary between database updates because the annotation of these genes fluctuates. Nonetheless, we identified 35 de novo genes: 16 human-specific; 5 human- and chimpanzee-specific; and 14 that originated prior to the divergence of human, chimpanzee, and gorilla and are found in all three genomes. The taxonomically restricted distribution of these genes cannot be explained by loss in other lineages. Each gene is supported by an open reading frame (ORF)-creating mutation that occurred within the primate lineage and is not polymorphic in any species. Similar to previous studies, we find that the de novo genes identified are short and frequently located near pre-existing genes. They may also be associated with Alu elements and with prior transcription and RNA splicing at the locus. Additionally, we report the first case of apparent incomplete lineage sorting of a de novo gene: the gene is present in human and gorilla, whereas chimpanzee has the ancestral noncoding sequence. This indicates a long period of polymorphism prior to fixation and thus supports a model in which de novo genes may, at least initially, have a neutral effect on fitness.
Keywords: de novo genes; human; incomplete lineage sorting; new genes; primates
Year: 2016 PMID: 27056411 PMCID: PMC4860702 DOI: 10.1093/gbe/evw074
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
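De Novo Genes that Originated Recently in the Primate Lineage. Lineage codes: H, human-specific; HC, human and chimpanzee; HCG, human, chimpanzee, and gorilla; ILS (H+G), present in human and gorilla through incomplete lineage sorting.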
| Ensembl ID (Human) | Lineages | Exons | Length (aa) | Alu Elements Found within Exons | Overlap with Other Genes | Transcriptional Evidence^a | Peptide Evidence^b |
|---|---|---|---|---|---|---|---|
| ENSG00000226452 | H | 1 | 65 | Yes, overlapping the CDS | Opposite strand overlap | Hs.617350 | No |
| ENSG00000203863 | H | 1 | 144 | Yes, overlapping the CDS | No | Hs.640013 | gpmDB, PRIDE |
| ENSG00000214780 | H | 2 | 195 | Yes, in UTR regions | Same strand overlap | Hs.676126 | No |
| ENSG00000219410 | H | 4 | 139 | No | Opposite strand overlap | Hs.714839 | gpmDB |
| ENSG00000250091 | H | 2 | 163 | Yes, in UTR regions | Opposite strand overlap | Hs.548335*, Hs.679261, Hs.728379 | gpmDB, PRIDE |
| ENSG00000205148 | H | 1 | 126 | No | No | Hs.58690 | PRIDE |
| ENSG00000196273 | H | 2 | 105 | No | No | Hs.379802, Hs.662255 | gpmDB, PRIDE |
| ENSG00000213904 | H | 5 | 138 | No | Opposite strand overlap | Hs.600453*, Hs.624933 | No |
| ENSG00000179253 | H | 2 | 140 | Yes, in UTR regions | Opposite strand overlap | Hs.683806 | gpmDB, PRIDE |
| ENSG00000176912 | H | 2 | 123 | No | Opposite strand overlap | No | gpmDB, PRIDE |
| ENSG00000233889 | H | 1 | 75 | No | No | Hs.573631 | No |
| ENSG00000255869 | H | 1 | 140 | No | No | Hs.654784 | No |
| ENSG00000256842 | H | 2 | 158 | No | Same strand overlap | Hs.721335 | No |
| ENSG00000256707 | H | 1 | 243 | No | No | Hs.496083 | No |
| ENSG00000258961 | H | 1 | 181 | No | No | Hs.531264 | PRIDE |
| ENSG00000247270 | H | 1 | 201 | No | Opposite strand overlap | Hs.730232*, Hs.97805 | PRIDE |
| ENSG00000198685 | HC | 3 | 149 | Yes, in UTR regions | No | Hs.194283 | PRIDE |
| ENSG00000170647 | HC | 1 | 129 | No | No | Hs.44004 | PRIDE |
| ENSG00000205414 | HC | 2 | 140 | Yes, in UTR regions | Opposite strand overlap | Hs.689579 | gpmDB |
| ENSG00000256831 | HC | 2 | 170 | No | Same strand overlap | No | No |
| ENSG00000255766 | HC | 1 | 266 | No | Opposite strand overlap | Hs.602995, Hs.712217 | No |
| ENSG00000216839 | HCG | 1 | 153 | No | Same strand overlap | No | No |
| ENSG00000187461 | HCG | 2 | 136 | No | Same strand overlap | Hs.674313* | No |
| ENSG00000176424 | HCG | 1 | 234 | No | Opposite strand overlap | Hs.708964 | gpmDB, PRIDE |
| ENSG00000227273 | HCG | 1 | 117 | No | Same strand overlap | No | No |
| ENSG00000229429 | HCG | 2 | 158 | No | Both strands overlap | No | No |
| ENSG00000176236 | HCG | 2 | 155 | Yes, in UTR regions | Opposite strand overlap | No | gpmDB, PRIDE |
| ENSG00000203779 | HCG | 7 | 152 | No | Opposite strand overlap | Hs.646701 | No |
| ENSG00000176984 | HCG | 2 | 323 | Yes, in UTR regions | No | Hs.638417 | gpmDB |
| ENSG00000206105 | HCG | 1 | 44 | No | No | Hs.580879 | gpmDB, PRIDE |
| ENSG00000255646 | HCG | 1 | 182 | Yes, in UTR regions | No | No | No |
| ENSG00000256345 | HCG | 1 | 181 | No | Opposite strand overlap | Hs.610961 | No |
| ENSG00000255953 | HCG | 2 | 140 | Yes, in UTR regions | Opposite strand overlap | Hs.730330, Hs.730455* | No |
| ENSG00000257100 | HCG | 2 | 163 | No | Same strand overlap | Hs.534504 | No |
| ENSG00000259119 | HCG | 2 | 114 | No | No | Hs.631462 | gpmDB |
| ENSG00000256247 | ILS (H+G) | 2 | 162 | Yes, on junction | Same strand overlap | No | gpmDB |
^a Transcriptional evidence is indicated by the identifiers of associated UniGene clusters. Currently retired clusters are marked with *.
^b Peptide evidence is indicated by the name of the repository in which it can currently be found.
The gene is still annotated and has the same exonic structure, but it is not classified as protein-coding in Ensembl release 70 (e70).
The e70 gene ID is ENSG00000259498.
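As a quick consistency check, the Length (aa) column can be tallied by lineage. The Python sketch below uses values transcribed by hand from the table (the helper names `summarize` and `main_set` are illustrative, not from the paper). It should reproduce the 16 + 5 + 14 = 35 split reported in the abstract, plus the single ILS case.

```python
from statistics import median

# Protein lengths (aa) transcribed row by row from the table, keyed by lineage code.
LENGTHS_BY_LINEAGE = {
    "H": [65, 144, 195, 139, 163, 126, 105, 138, 140, 123, 75, 140, 158, 243, 181, 201],
    "HC": [149, 129, 140, 170, 266],
    "HCG": [153, 136, 234, 117, 158, 155, 152, 323, 44, 182, 181, 140, 163, 114],
    "ILS (H+G)": [162],
}


def summarize(lengths_by_lineage):
    """Return {lineage: (gene count, shortest, longest, median length)}."""
    return {
        lineage: (len(lengths), min(lengths), max(lengths), median(lengths))
        for lineage, lengths in lengths_by_lineage.items()
    }


def main_set(lengths_by_lineage):
    """Pool the H, HC and HCG genes, i.e. the 35-gene set reported in the abstract."""
    return [
        n
        for lineage, lengths in lengths_by_lineage.items()
        if not lineage.startswith("ILS")
        for n in lengths
    ]


if __name__ == "__main__":
    for lineage, stats in summarize(LENGTHS_BY_LINEAGE).items():
        print(lineage, stats)
    pooled = main_set(LENGTHS_BY_LINEAGE)
    print("total:", len(pooled), "median length:", median(pooled))
```

On these transcribed values, the pooled 35 genes range from 44 to 323 aa with a median of 149 aa, consistent with the abstract's observation that the de novo genes are short.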
Figure: Incomplete lineage sorting (ILS) of a de novo gene. (A) Segment of the alignment of the de novo gene ENSG00000256247 with the orthologous region from other primates. The ORF is present only in human and gorilla. It was created by a single base-pair insertion found uniquely in human and gorilla (indicated by an orange box). Because of this frameshift, the TGA stop codon (boxed in red) is no longer in frame in human and gorilla. These two species also uniquely share a three base-pair difference (GTG vs. CCC) very close to the insertion site. The start and stop codons in human and gorilla are not shown in this segment. Numbers at the side of the alignment indicate base-pair positions starting from the human start codon. (B) Inferred evolutionary history of this de novo gene: the one base-pair insertion occurred in the ancestor of the great apes. The substitutions resulting in the downstream "GTG" were either already present in that individual or occurred later in an individual carrying the insertion. The ORF thus created remained polymorphic (indicated by the dashed orange lines) until after the human–chimpanzee divergence. Subsequent incomplete lineage sorting led to fixation of the original locus lacking the gene (black) in the chimpanzee lineage, whereas the de novo gene (orange) was independently fixed in human and gorilla. Alignment visualized using Jalview (Waterhouse et al. 2009). Species Latin names are shown in the alignment, and the corresponding common names are shown in the phylogenetic tree.
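The frameshift logic in panel (A) can be illustrated with a minimal Python sketch. The sequences are invented toy examples, not the real ENSG00000256247 alignment, and `first_stop_codon` and `insert_base` are hypothetical helpers. A single inserted base moves a premature TGA out of the reading frame, so translation from the ATG continues to a later stop codon.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(seq: str) -> Optional[int]:
    """Return the index of the first in-frame stop codon, reading from position 0."""
    for codon_index, pos in enumerate(range(0, len(seq) - 2, 3)):
        if seq[pos:pos + 3] in STOP_CODONS:
            return codon_index
    return None  # no in-frame stop codon in this sequence


def insert_base(seq: str, pos: int, base: str) -> str:
    """Insert a single nucleotide before position `pos`."""
    return seq[:pos] + base + seq[pos:]


# Hypothetical toy sequences, NOT the real ENSG00000256247 locus.
ANCESTRAL = "ATGGCATGACGACGACTAAGTAA"     # ATG GCA TGA ... -> stops at codon 2
DERIVED = insert_base(ANCESTRAL, 6, "A")  # 1-bp insertion just upstream of the TGA

if __name__ == "__main__":
    for name, seq in (("ancestral", ANCESTRAL), ("derived", DERIVED)):
        stop = first_stop_codon(seq)
        tga = seq.find("TGA")
        # tga % 3 == 0 means the TGA is in the reading frame that starts at the ATG
        print(name, "first stop at codon", stop, "| TGA at", tga, "frame", tga % 3)
```

Run as a script, the ancestral toy sequence stops at codon 2 because its TGA sits in frame 0, while the derived sequence reads through to a TAA at codon 7 because the same TGA now sits in frame 1.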
Figure: Alu elements and de novo gene origins. The de novo gene AL079342 (Ensembl ID ENSG00000203863) overlaps two Alu elements. (A) Schematic of the region on chromosome 6 that includes ENSG00000203863 (coding sequence shown in red). Two Alu elements (shaded green) overlap the gene sequence. The area shaded orange is shown in detail in panel (B). (B) Multiple sequence alignment of the orthologous region in several primates. AluJb provides the start codon for the human ORF; this codon is also present, cryptically, in all other species examined (boxed in green). A human-specific frameshift is caused by a four-base deletion (boxed in orange). The human ORF continues beyond the alignment segment shown.
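Why a four-base deletion shifts the reading frame reduces to simple arithmetic: any indel whose length is not a multiple of three changes every downstream codon. The hypothetical Python helpers below use a toy sequence, not the real chromosome 6 region. They compute the frame offset as the indel length modulo 3 and compare the codons after a 4-bp deletion with those after a 3-bp deletion.

```python
def frame_offset(indel_length: int) -> int:
    """Reading-frame offset of an indel (insertions positive, deletions negative).

    0 means the downstream reading frame is unchanged.
    """
    return indel_length % 3


def delete_bases(seq: str, pos: int, n: int) -> str:
    """Delete `n` nucleotides starting at position `pos`."""
    return seq[:pos] + seq[pos + n:]


def codons(seq: str) -> list:
    """Split a sequence into complete codons in frame 0 (trailing bases dropped)."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


TOY = "ATGAAACCCGGGTTT"  # hypothetical toy ORF, not the real AL079342 region

if __name__ == "__main__":
    print(frame_offset(-4), codons(delete_bases(TOY, 3, 4)))  # 4-bp deletion: frameshift
    print(frame_offset(-3), codons(delete_bases(TOY, 3, 3)))  # 3-bp deletion: frame kept
```

The 4-bp deletion gives an offset of 2 (the same as a 1-bp deletion) and changes every downstream codon, whereas the 3-bp deletion removes one codon and leaves the rest intact.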