Abstract
Identification of the full complement of genes in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a crucial step towards gaining a fuller understanding of its molecular biology. However, short and/or overlapping genes can be difficult to detect using conventional computational approaches, whereas high-throughput experimental approaches - such as ribosome profiling - cannot distinguish translation of functional peptides from regulatory translation or translational noise. By studying regions showing enhanced conservation at synonymous sites in alignments of SARS-CoV-2 and related viruses (subgenus Sarbecovirus) and correlating the results with the conserved presence of an open reading frame (ORF) and a plausible translation mechanism, a putative new gene - ORF3c - was identified. ORF3c overlaps ORF3a in an alternative reading frame. A recently published ribosome profiling study confirmed that ORF3c is indeed translated during infection. ORF3c is conserved across the subgenus Sarbecovirus, and encodes a 40-41 amino acid predicted transmembrane protein.
Keywords: 3c; ORF3c; SARS-CoV; coronavirus; overlapping gene; sarbecovirus
Year: 2020 PMID: 32667280 PMCID: PMC7660454 DOI: 10.1099/jgv.0.001469
Source DB: PubMed Journal: J Gen Virol ISSN: 0022-1317 Impact factor: 3.891
Fig. 1. Synonymous site conservation analysis of sarbecoviruses. (a) Map of the SARS-CoV-2 genome (29 903 nt; black rectangle). Known ORFs are overlaid in red, blue or green depending on their relative reading frames (+0, +1, +2, respectively). Below are shown the positions of AUG (green) and stop (black) codons in each of the three reading frames, as indicated, in the reference sequence NC_045512.2. The putative ORF3c is indicated in pink (reading frame +2). (b) Synonymous site conservation analysis of 54 aligned sarbecovirus sequences. The red line shows the probability that the observed conservation could occur under a null model of neutral evolution at synonymous sites, whereas the brown line depicts the ratio of the observed number of substitutions to the number expected under the null model. The horizontal dashed grey line indicates a P=0.05 threshold after an approximate correction for multiple testing, namely scaling by (25 codon window size)/(length of plot in codons). Prior to analysis, the alignment was mapped to NC_045512.2 coordinates by removing alignment positions in which NC_045512.2 contained a gap character. NCBI accession numbers: NC_045512.2, AY274119.3, DQ022305.2, DQ071615.1, DQ084199.1, DQ084200.1, DQ412042.1, DQ412043.1, DQ648856.1, DQ648857.1, GQ153539.1, GQ153540.1, GQ153541.1, GQ153542.1, GQ153543.1, GQ153544.1, GQ153545.1, GQ153546.1, GQ153547.1, GQ153548.1, GU190215.1, JX993987.1, JX993988.1, KC881006.1, KF294457.1, KF367457.1, KF569996.1, KJ473811.1, KJ473812.1, KJ473813.1, KJ473816.1, KP886808.1, KP886809.1, KT444582.1, KY352407.1, KY417142.1, KY417143.1, KY417145.1, KY417146.1, KY417147.1, KY417148.1, KY417149.1, KY417150.1, KY417151.1, KY417152.1, KY770858.1, KY770859.1, KY770860.1, MG772933.1, MG772934.1, MK211374.1, MK211376.1, MK211377.1 and MK211378.1.
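The two preprocessing steps named in the Fig. 1 caption (mapping the alignment to reference coordinates, and scaling the P=0.05 threshold by window size over plot length) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `map_to_reference` and `corrected_threshold`, the use of `-` as the gap character, and the toy alignment are all assumptions made for illustration.

```python
def map_to_reference(alignment, ref_index=0, gap="-"):
    """Drop every alignment column in which the reference row has a gap.

    `alignment` is a list of equal-length aligned strings; `ref_index`
    selects the reference row (NC_045512.2 in the caption; here, hypothetical).
    """
    ref = alignment[ref_index]
    keep = [i for i, ch in enumerate(ref) if ch != gap]
    return ["".join(seq[i] for i in keep) for seq in alignment]


def corrected_threshold(alpha, window_codons, plot_codons):
    """Approximate multiple-testing correction described in the caption:
    scale alpha by (window size in codons) / (plot length in codons)."""
    return alpha * window_codons / plot_codons


# Toy alignment (made up): column 2 is a gap in the reference row, so it is removed.
toy = ["AT-GC", "ATTGC", "A--GC"]
mapped = map_to_reference(toy)

# Hypothetical plot length of 1000 codons with the 25-codon window.
threshold = corrected_threshold(0.05, 25, 1000)
```

For a hypothetical plot of 1000 codons, the scaled threshold is 0.05 × 25 / 1000 = 0.00125; the actual plot length used in the figure is not stated here, so no real threshold value is implied.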
Fig. 2. Amino acid alignment of sarbecovirus 3c sequences. Amino acids are colour-coded according to their physicochemical properties. Asterisks indicate completely conserved columns in the alignment. The transmembrane region predicted by Phobius is indicated with a black bar below the alignment. Numbers at the right indicate the number of times the particular sequence occurs among the 54 sarbecovirus sequences (see Fig. 1 caption for accession numbers). †, SARS-CoV-1. ‡, SARS-CoV-2. For the sequence beginning with GUG instead of AUG, the genetic decoding (i.e. valine) is shown, even though non-AUG initiation codons are normally expected to be decoded as methionine by initiator Met-tRNA.
Fig. 3. Conservation analyses of the sarbecovirus S and 3a ORFs. In each plot, the upper three panels show the positions of alignment gaps (grey), stop codons (black) and AUG codons (green) in each reading frame in each of the 54 aligned sequences. In these plots the canonical ORFs (i.e. S and 3a) are taken as reading frame +0. Below is shown the analysis of conservation at synonymous sites (see Fig. 1 caption for details). In contrast to Fig. 1, here all alignment gaps were retained instead of mapping to NC_045512.2 coordinates. Novel alternative-frame translated ORFs identified in the SARS-CoV-2 ribosome profiling study of Finkel et al. [7] are indicated with yellow rectangles; for S, the two in-frame alternative initiation sites are indicated with pink arrows. ORF3c is labelled. Note that only ORF3c has conserved start and stop codon positions across sarbecoviruses and only ORF3c coincides with a region of enhanced synonymous site conservation.
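The per-frame AUG and stop codon tracks described in the captions can be sketched as a simple codon scan. The Python below is only an illustration under stated assumptions: it works on a single ungapped sequence (gap handling as in Fig. 3 is not attempted), the function name `codon_positions` and the example sequence are hypothetical, and positions are reported as 0-based offsets of each codon's first nucleotide.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_positions(seq, frame):
    """Return (aug_positions, stop_positions) for one reading frame.

    `frame` is 0, 1 or 2 (the +0, +1, +2 frames relative to the first
    nucleotide of `seq`). RNA input is accepted: U is converted to T.
    """
    seq = seq.upper().replace("U", "T")
    augs, stops = [], []
    # Step through complete codons only, starting at the frame offset.
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon == "ATG":
            augs.append(i)
        elif codon in STOP_CODONS:
            stops.append(i)
    return augs, stops


# Made-up 14-nt example sequence, scanned in all three frames.
example = "AUGAAAUAGCAUGA"
tracks = {frame: codon_positions(example, frame) for frame in (0, 1, 2)}
```

Running such a scan over every sequence in an alignment, and comparing where AUG and stop codons fall, is one way to check whether an ORF's start and stop positions are conserved, which is the property the caption highlights for ORF3c.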