Angelo Pavesi, Alberto Vianelli, Nicola Chirico, Yiming Bao, Olga Blinkova, Robert Belshaw, Andrew Firth, David Karlin.
Abstract
Overlapping genes represent a fascinating evolutionary puzzle, since they encode two functionally unrelated proteins from the same DNA sequence. They originate by a mechanism of overprinting, in which point mutations in an existing frame allow the expression (the "birth") of a completely new protein from a second frame. In viruses, in which overlapping genes are abundant, these new proteins often play a critical role in infection, yet they are frequently overlooked during genome annotation. This results in erroneous interpretation of mutational studies and in a significant waste of resources. Therefore, overlapping genes need to be correctly detected, especially since they are now thought to be abundant in eukaryotes as well. Developing better detection methods and conducting systematic evolutionary studies require a large, reliable benchmark dataset of known cases. We thus assembled a high-quality dataset of 80 viral overlapping genes whose expression is experimentally proven. Many of them were not present in databases. We found that overall, overlapping genes differ significantly from non-overlapping genes in their nucleotide and amino acid composition. In particular, the proteins they encode are enriched in high-degeneracy amino acids and depleted in low-degeneracy ones, which may alleviate the evolutionary constraints acting on overlapping genes. Principal component analysis revealed that the vast majority of overlapping genes follow a similar composition bias, despite their heterogeneity in length and function. Six proven mammalian overlapping genes also followed this bias. We propose that this apparently near-universal composition bias may favour the birth of overlapping genes, result from selection pressure acting on them, or both.
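To make the idea of two proteins encoded by the same DNA sequence concrete, the following minimal sketch translates one toy sequence in two reading frames using the standard genetic code. The sequence and the function name `translate` are hypothetical illustrations, not data or code from the study.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered
# TTT, TTC, TTA, TTG, TCT, ... with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate a DNA string starting at offset `frame` (0, 1 or 2).

    Trailing bases that do not fill a complete codon are ignored.
    Stop codons are rendered as '*'.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


# Hypothetical toy sequence: the same nucleotides read in two frames
# yield two unrelated amino acid strings.
toy_dna = "ATGGCTCGATAA"
print(translate(toy_dna, frame=0))  # frame 0: ATG GCT CGA TAA
print(translate(toy_dna, frame=1))  # frame +1: TGG CTC GAT
```

A point mutation in this toy sequence changes a codon in both frames at once, which is the constraint the abstract refers to.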
Year: 2018 PMID: 30339683 PMCID: PMC6195259 DOI: 10.1371/journal.pone.0202513
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
General properties of the overlapping gene dataset.
| Nature of the genome | Number of families | Number of genera | Number of overlapping gene pairs | Number of proteins affected by overlap |
|---|---|---|---|---|
| | 16 | 24 (26) | 37 | 70 |
| | 6 | 12 (13) | 15 | 29 |
| | 3 | 9 (9) | 14 | 26 |
| | 2 | 5 (5) | 5 | 10 |
| | 1 | 1 (1) | 1 | 2 |
| | 1 | 3 (6) | 6 | 12 |
| | 1 | 1 (1) | 2 | 3 |
| Total | 30 | 55 (61) | 80 | 152 |
a. Unassigned genera or unassigned families are counted as bona fide genera or families.
b. Some genera contain several overlapping gene pairs.
c. Some genes overlap with more than one gene.
Fig 1. Frequency distribution of the length of the 80 overlapping genes of the dataset.
List of the 11 pairs of overlapping genes encoding interacting proteins.
| Virus species | Protein product 1 | Protein product 2 | Function | Bibliographic references |
|---|---|---|---|---|
| Adeno-associated virus 2 | Capsid protein (VP1) | AAP (Assembly Activating Protein) | Virion assembly | [ |
| Borna disease virus 1 | X protein | Phosphoprotein | Virus replication | [ |
| Chicken anemia virus | Capsid protein (VP2) | Nucleocapsid protein | Virion assembly | [ |
| Chicken anemia virus | Capsid protein (VP2) | Apoptin (VP3) | Host cell apoptosis | [ |
| East African cassava virus | AV2 protein | Capsid protein (AV1) | Within-host virus movement | [ |
| Hepatitis E virus | Phosphoprotein (ORF3) | Capsid protein (ORF2) | Virion assembly | [ |
| Human papillomavirus type 16 | E2 protein | E4 protein | Stabilization of the E2 protein | [ |
| Influenza virus A | RNA-dependent RNA polymerase (subunit PB1) | PB1-F2 protein | Virus replication | [ |
| Rotavirus A | Phosphoprotein (NSP5) | NSP6 protein | Viroplasm formation | [ |
| Sesbania mosaic virus | Polyprotein P2a (ATPase P10 domain) | Polyprotein P2ab (RdRp domain) | Virus replication | [ |
| Simian hemorrhagic fever virus | GP3 protein | GP4 protein | Virus entry | [ |
a. The interaction was established in a virus species from the same genus, Cotton leaf curl Kokhran virus-Dabawali.
b. The interaction was established in a virus species from the same genus, Equine arteritis virus.
Fig 2. Main mechanisms used to express the proteins encoded by overlapping genes.
Mechanisms of expression of overlapping genes.
| Category | Mechanism | Number of cases |
|---|---|---|
| Translational mechanisms (54 cases: 38 proven and 16 suspected) | Alternative start codon | 29 proven and 16 suspected cases |
| Translational mechanisms | IRES | 2 proven cases |
| Translational mechanisms | Ribosomal frameshifting | 7 proven cases |
| Transcriptional mechanisms (17 cases: 16 proven and 1 suspected) | Subgenomic RNAs | 12 proven and 1 suspected cases |
| Transcriptional mechanisms | Transcriptional slippage | 4 proven cases |
a. Results in a completely new coding sequence.
b. Results in the fusion of a new coding sequence downstream of an existing coding sequence.
Fig 3. Difference between the pooled sets of overlapping and non-overlapping genes for the 20 most critical composition features.
(A) Nucleotides and dinucleotides. (B) Amino acids and amino acids grouped according to codon degeneracy. (C) Synonymous codons.
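Grouping amino acids by codon degeneracy, as in panel (B), can be sketched as follows. The classes come from the standard genetic code (the number of codons encoding each amino acid). Which classes the study treats as "high" or "low" degeneracy is not stated here, so the function simply reports the fraction in each class. The function name and the example sequence are hypothetical.

```python
from collections import Counter

# Number of synonymous codons per amino acid in the standard genetic code.
DEGENERACY = {
    6: "LRS",
    4: "AGPTV",
    3: "I",
    2: "CDEFHKNQY",
    1: "MW",
}
AA_TO_DEG = {aa: deg for deg, aas in DEGENERACY.items() for aa in aas}


def degeneracy_fractions(protein):
    """Return the fraction of residues in each codon-degeneracy class.

    Characters that are not one of the 20 standard amino acids
    (for example '*' or 'X') are skipped.
    """
    counts = Counter(AA_TO_DEG[aa] for aa in protein.upper() if aa in AA_TO_DEG)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {deg: counts.get(deg, 0) / total for deg in DEGENERACY}


# Hypothetical toy protein: four 6-fold residues, one 4-fold, one 1-fold.
fractions = degeneracy_fractions("LLRSAM")
for deg, frac in fractions.items():
    print(f"{deg}-fold degenerate: {frac:.2f}")
```

Comparing these fractions between a set of overlapping and a set of non-overlapping proteins would give one of the composition contrasts summarised in the figure.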
Fig 4. Principal component analysis (PCA) of overlapping genes.
The star symbol near the origin of the axes indicates the pooled dataset of overlapping genes, while the black circles indicate the individual overlapping genes. Circles outside the ellipse are outliers, that is, overlaps with a composition significantly different from the rest (P<0.05). (A) Map yielded by the first (PC1) and second (PC2) principal components. (B) Map yielded by the first (PC1) and third (PC3) principal components.
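As a rough illustration of how such PC maps are obtained, the sketch below runs a PCA on a matrix with one row per gene and one column per composition feature. It assumes standardised features and uses NumPy's singular value decomposition. The random matrix stands in for real composition data, and the function name `pca_scores` is hypothetical. The study's own software, scaling choices and outlier test are not described here.

```python
import numpy as np


def pca_scores(X, n_components=3):
    """PCA of a (samples x features) matrix via SVD.

    Features are centred and scaled to unit variance (an assumption).
    Returns the sample scores on the leading components and the
    fraction of total variance each of those components explains.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]


# Placeholder data: 80 genes x 20 composition features (random, not real).
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 20))
scores, explained = pca_scores(X)

# scores[:, 0] against scores[:, 1] corresponds to a PC1-PC2 map,
# scores[:, 0] against scores[:, 2] to a PC1-PC3 map.
print(scores.shape, explained)
```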
Correlation between the 20 critical composition features of overlapping genes and the first (PC1), second (PC2), and third (PC3) principal components.
| Composition feature | PC1 | PC2 | PC3 |
|---|---|---|---|
| | 0.76 | 0.24 | -0.26 |
| | -0.18 | -0.94 | 0.19 |
| | -0.70 | 0.59 | 0.23 |
| | 0.23 | -0.70 | 0.09 |
| | 0.18 | -0.67 | -0.08 |
| | -0.12 | -0.83 | 0.18 |
| | -0.41 | 0.41 | -0.24 |
| | -0.58 | 0.64 | 0.16 |
| | 0.05 | 0.20 | -0.84 |
| | -0.49 | -0.30 | -0.03 |
| | -0.56 | 0.52 | 0.14 |
| | 0.18 | -0.55 | 0.00 |
| | 0.10 | -0.55 | 0.14 |
| | -0.48 | -0.49 | -0.60 |
| | 0.91 | -0.02 | 0.16 |
| | 0.22 | 0.10 | -0.61 |
| | -0.18 | -0.13 | 0.07 |
| | -0.42 | 0.52 | 0.17 |
| | -0.46 | 0.27 | 0.07 |
| | 0.24 | -0.48 | -0.09 |
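Each entry in a table of this kind is a correlation coefficient between one composition feature and the scores on one principal component, both measured across genes. The sketch below assumes a Pearson coefficient (the coefficient used in the study is not stated here). The two short input lists are hypothetical values, not data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length, at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical values: one composition feature and the PC1 scores
# of the same five genes.
feature_values = [0.21, 0.25, 0.19, 0.30, 0.27]
pc1_scores = [-0.8, 0.4, -1.5, 2.1, 0.9]
print(round(pearson(feature_values, pc1_scores), 2))
```

A coefficient near +1 or -1 means the feature varies almost in step with that component, while a value near 0 means the component carries little information about the feature.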
Fig 5. Location of the 6 mammalian overlapping genes in the PC1-PC2 map of viral overlapping genes.
The mammalian overlapping genes are indicated by bold triangles, the viral overlapping genes by empty circles. The 3 circles and the triangle outside the ellipse are outliers, that is, overlaps with a composition significantly different from the rest (P<0.05).