Abstract
Viruses may evolve to increase the amount of encoded genetic information by means of overlapping genes, which utilize several reading frames. Such overlapping genes may be especially impactful for genomes of small size, often serving as a source of novel accessory proteins, some of which play a crucial role in viral pathogenicity or in promoting the systemic spread of the virus. Diverse genome-based metrics have been proposed to facilitate recognition of overlapping genes that may otherwise be overlooked during genome annotation. They can detect the atypical codon bias associated with the overlap (e.g. a statistically significant reduction in variability at synonymous sites) or other sequence-composition features peculiar to overlapping genes. In this review, I compare nine computational methods, discuss their strengths and limitations, and survey how they were applied to detect candidate overlapping genes in the genome of SARS-CoV-2, the etiological agent of the COVID-19 pandemic.
Year: 2021 PMID: 34798370 PMCID: PMC8594276 DOI: 10.1016/j.coviro.2021.10.009
Source DB: PubMed Journal: Curr Opin Virol ISSN: 1879-6257 Impact factor: 7.090
Figure 1. Orientation of same-strand overlapping genes.
(a) Overlapping gene with the downstream ORF shifted one nucleotide 3′ with respect to the upstream ORF (+1 overlap, also known as −2). It contains three types of codon position (cp): i) cp13, in which the first codon position of the upstream ORF overlaps the third codon position of the downstream ORF; ii) cp21, in which the second codon position of the upstream ORF overlaps the first codon position of the downstream ORF; iii) cp32, in which the third codon position of the upstream ORF overlaps the second codon position of the downstream ORF. (b) Overlapping gene with the downstream ORF shifted two nucleotides 3′ with respect to the upstream ORF (+2 overlap, also known as −1). It contains three types of codon position (cp): i) cp12, in which the first codon position of the upstream ORF overlaps the second codon position of the downstream ORF; ii) cp23, in which the second codon position of the upstream ORF overlaps the third codon position of the downstream ORF; iii) cp31, in which the third codon position of the upstream ORF overlaps the first codon position of the downstream ORF. According to the genetic code and on average, a substitution at the first codon position causes an amino acid change in 95% of cases, at the second position in 100% of cases, and at the third position in 28% of cases.
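The codon-position classes described in the caption, and the rough per-position impact of a substitution, can be reproduced with a short Python sketch. The function names `overlap_position_classes` and `nonsynonymous_fraction` are hypothetical, introduced only for illustration. The second function makes a simplifying assumption: it takes an unweighted count over the 61 sense codons of the standard genetic code and treats a mutation to a stop codon as an amino acid change, so its values may differ somewhat from the averages quoted in the caption, which may rest on a different weighting.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def overlap_position_classes(shift):
    """Label the three codon-position classes for a downstream ORF
    shifted `shift` nucleotides (1 or 2) 3' of the upstream ORF."""
    labels = []
    for i in range(3):                  # nucleotide index within an upstream codon
        up = i % 3 + 1                  # codon position in the upstream ORF
        down = (i - shift) % 3 + 1      # codon position in the downstream ORF
        labels.append(f"cp{up}{down}")
    return labels


def nonsynonymous_fraction(position):
    """Fraction of single-nucleotide substitutions at `position` (0, 1 or 2)
    that change the encoded amino acid, over all sense codons (unweighted)."""
    changed = total = 0
    for codon, aa in CODE.items():
        if aa == "*":                   # skip stop codons as starting points
            continue
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            changed += CODE[mutant] != aa   # mutation to stop counts as a change
    return changed / total


if __name__ == "__main__":
    print(overlap_position_classes(1))
    print(overlap_position_classes(2))
    for pos in range(3):
        print(pos + 1, round(nonsynonymous_fraction(pos), 3))
```

Under these assumptions the second codon position is always nonsynonymous, the first almost always, and the third only in a minority of cases, which is why a class such as cp32 or cp23 constrains both reading frames differently from cp13 or cp31.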
Computational methods to detect overlapping genes in viruses
| Name of the method | References | Description | Features |
|---|---|---|---|
| SeqComp | [ | Detects overlapping genes based on their peculiar nucleotide and amino acid composition. | High sensitivity and low specificity. |
| CodScr + SeqComp | [ | Detects overlapping genes based on their peculiar nucleotide and amino acid composition and a statistically significant bias in codon usage. | Good sensitivity and specificity. |
| Codon test | [ | Detects overlapping genes on the basis of a length significantly longer than expected by chance; includes a codon-permutation test and a synonymous-mutation test. | High sensitivity for long overlapping genes but intermediate for short overlapping genes. Low specificity. |
| GOFIX | [ | Detects overlapping ORFs on the basis of a significant enrichment in a set of 20 codons that are overrepresented in the protein-coding genes. | Sensitivity and specificity not reported. |
| Synplot2 | [ | Detects overlapping genes by selecting regions with a significantly enhanced conservation at synonymous sites, compared to a null model of neutral evolution. | High sensitivity. Less effective when sequences are too divergent or too similar. |
| FRESCo | [ | Detects overlapping genes by selecting regions with an excess of synonymous constraints, under models of neutral and non-neutral evolution. | Good sensitivity and high specificity. |
| PhyloCSF | [ | Detects overlapping genes by selecting regions evolving under strong protein-coding constraint. | Sensitivity and specificity not reported. |
| cRegions | [ | Detects overlapping functional elements by identifying regions where the nucleotide sequence is significantly more conserved than expected. | Sensitivity and specificity not reported. |
| OLGenie | [ | Detects overlapping genes by estimating signs of strong purifying selection. | Intermediate sensitivity and specificity. |
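To make the idea of a codon-permutation test more concrete, here is a minimal Python sketch. It is a simplified illustration written for this summary, not the published Codon test script: the function names, the use of the longest stop-free stretch as the test statistic, and the empirical P-value formula are all assumptions. The codons of the reference frame are shuffled, which preserves the codon composition of the known gene, and the sketch asks how often a shuffled sequence yields a stop-free stretch in the alternative frame at least as long as the observed one.

```python
import random

STOPS = {"TAA", "TAG", "TGA"}


def longest_open_run(seq, frame):
    """Longest stop-free stretch (in codons) when `seq` is read in `frame` (0, 1 or 2)."""
    best = run = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            run = 0
        else:
            run += 1
            best = max(best, run)
    return best


def codon_permutation_pvalue(ref_seq, alt_frame, n_perm=1000, seed=0):
    """Empirical P-value for the observed alternative-frame open run,
    obtained by shuffling the codons of the reference frame."""
    rng = random.Random(seed)
    usable = len(ref_seq) - len(ref_seq) % 3        # drop any trailing partial codon
    codons = [ref_seq[i:i + 3] for i in range(0, usable, 3)]
    observed = longest_open_run("".join(codons), alt_frame)
    hits = 0
    for _ in range(n_perm):
        shuffled = codons[:]
        rng.shuffle(shuffled)
        if longest_open_run("".join(shuffled), alt_frame) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    toy = "ATGGCC" * 20      # toy sequence: frame +1 reads TGG CCA ... with no stop
    print(codon_permutation_pvalue(toy, alt_frame=1))
```

In the toy sequence, any shuffle that places two ATG codons side by side creates a TGA stop in the +1 frame, so an uninterrupted +1 frame is very unlikely by chance and the estimated P-value is small.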
Further features of the computational methods to detect overlapping genes in viruses
| Name of the method and references | Does it require as input single or multiple sequences? | Type of overlap detected | Does it provide a P-value in the results? | Availability |
|---|---|---|---|---|
| SeqComp [ | Single nucleotide sequences | Protein-protein coding | No | Not implemented |
| CodScr + SeqComp [ | Single nucleotide sequences | Protein-protein coding | No | Not implemented |
| Codon test [ | Single nucleotide sequences | Protein-protein coding | Yes | Script at |
| GOFIX [ | Single nucleotide sequences | Protein-protein coding | No | Not implemented |
| Synplot2 [ | Multiple homologous sequences | Protein-protein coding or functional RNA element | Yes | Web site at |
| FRESCo [ | Multiple homologous sequences | Protein-protein coding or functional RNA element | No | Script at |
| PhyloCSF [ | Multiple homologous sequences | Protein-protein coding | Yes | Script at |
| cRegions [ | Multiple homologous sequences | Protein-protein coding or functional RNA element | Yes | Web site at |
| OLGenie [ | Multiple homologous sequences | Protein-protein coding | Yes | Script at |
List of the candidate overlapping ORFs detected in SARS-CoV-2 using six prediction methods
| Candidate overlapping ORF | Length (nt) | Ancestral overlapping gene | Boundaries of the candidate overlapping ORF | Prediction methods |
|---|---|---|---|---|
| ORF3c | 126 | ORF3a | 25457−25582 | CodScr + SeqComp [ |
| ORF3d | 174 | ORF3a | 25524−25697 | Codon test [ |
| ORF9b | 294 | Nucleocapsid | 28284−28577 | CodScr + SeqComp [ |
| ORF9c | 222 | Nucleocapsid | 28734−28955 | GOFIX [ |
| ORF-Sh | 120 | Spike | 24051−24170 | CodScr + SeqComp [ |
| ORF-Mh | 180 | Membrane | 26693−26872 | CodScr + SeqComp [ |
Boundaries of the predicted ORFs refer to the reference genome sequence of SARS-CoV-2 (NC_045512.2).
The suffix ‘h’ stands for hypothetical.
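The boundaries in the table can be checked against the listed lengths with a few lines of Python. The sketch assumes the coordinates are 1-based and inclusive, which is consistent with every length in the table. The names `CANDIDATES`, `span_length` and `extract` are hypothetical, and the genome string passed to `extract` is assumed to be the NC_045512.2 sequence loaded by the reader.

```python
# (length in nt, start, end) as listed in the table; coordinates assumed 1-based inclusive.
CANDIDATES = {
    "ORF3c": (126, 25457, 25582),
    "ORF3d": (174, 25524, 25697),
    "ORF9b": (294, 28284, 28577),
    "ORF9c": (222, 28734, 28955),
    "ORF-Sh": (120, 24051, 24170),
    "ORF-Mh": (180, 26693, 26872),
}


def span_length(start, end):
    """Length of a 1-based inclusive interval."""
    return end - start + 1


def extract(genome, start, end):
    """Slice a 1-based inclusive interval out of a genome string."""
    return genome[start - 1:end]


if __name__ == "__main__":
    for name, (length, start, end) in CANDIDATES.items():
        ok = span_length(start, end) == length and length % 3 == 0
        print(f"{name}: {start}-{end}, {length} nt, {length // 3} codons, consistent={ok}")
```

Each length is a multiple of three, as expected for an ORF that includes its stop codon.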