
Conflicting and ambiguous names of overlapping ORFs in the SARS-CoV-2 genome: A homology-based resolution.

Irwin Jungreis, Chase W Nelson, Zachary Ardern, Yaara Finkel, Nevan J Krogan, Kei Sato, John Ziebuhr, Noam Stern-Ginossar, Angelo Pavesi, Andrew E Firth, Alexander E Gorbalenya, Manolis Kellis.

Abstract

At least six small alternative-frame open reading frames (ORFs) overlapping well-characterized SARS-CoV-2 genes have been hypothesized to encode accessory proteins. Researchers have used different names for the same ORF or the same name for different ORFs, resulting in erroneous homological and functional inferences. We propose standard names for these ORFs and their shorter isoforms, developed in consultation with the Coronaviridae Study Group of the International Committee on Taxonomy of Viruses. We recommend calling the 39 codon Spike-overlapping ORF ORF2b; the 41, 57, and 22 codon ORF3a-overlapping ORFs ORF3c, ORF3d, and ORF3b; the 33 codon ORF3d isoform ORF3d-2; and the 97 and 73 codon Nucleocapsid-overlapping ORFs ORF9b and ORF9c. Finally, we document conflicting usage of the name ORF3b in 32 studies, and consequent erroneous inferences, stressing the importance of reserving identical names for homologs. We recommend that authors referring to these ORFs provide lengths and coordinates to minimize ambiguity caused by prior usage of alternative names.
Copyright © 2021 The Author(s). Published by Elsevier Inc. All rights reserved.

Keywords:  Accessory protein; Alternative reading frame; Nomenclature; ORF2b; ORF3b; ORF3c; ORF3d; ORF9a; ORF9b; Open reading frame; Overlapping ORF; SARS-CoV-2

Year:  2021        PMID: 33774510      PMCID: PMC7967279          DOI: 10.1016/j.virol.2021.02.013

Source DB:  PubMed          Journal:  Virology        ISSN: 0042-6822            Impact factor:   3.616


Introduction

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the recently identified strain (F.Wu et al., 2020a; Zhou et al., 2020; Zhu et al., 2020) of the species Severe acute respiratory syndrome-related coronavirus in the family Coronaviridae (subgenus Sarbecovirus, genus Betacoronavirus, subfamily Orthocoronavirinae) (Gorbalenya et al., 2020) that is the causative agent of coronavirus disease 2019 (COVID-19). Characterization of the SARS-CoV-2 proteome is vital for understanding its molecular biology and for development of countermeasures against the COVID-19 pandemic. Of particular interest are proteins that are unique to SARS-CoV-2, differ substantially from their SARS-CoV homologs, or have not been well characterized in other viruses of this species. Coronaviruses have positive-sense single-stranded RNA genomes that encode proteins expressed from genomic and subgenomic RNAs using complex regulation at the transcriptional, translational, and post-translational levels (Fung et al., 2016; Fung and Liu, 2018; Sola et al., 2015). Some of the protein-coding open reading frames (ORFs) are conserved across coronaviruses, with homologs in all strains, and were named according to a uniform coronavirus-wide nomenclature (de Groot et al., 2012). At the 5′ end are two large ORFs, ORF1a and ORF1b. ORF1a encodes polyprotein pp1a, and the combination of ORF1a and ORF1b encodes polyprotein pp1ab via a programmed frameshift. Polyproteins pp1a and pp1ab are proteolytically processed to yield 11 and 15 non-structural proteins (“nsp’s”), respectively (16 unique, nsp1-nsp16). These include the 3C-like cysteine proteinase (nsp5), RNA-dependent RNA polymerase (nsp12), helicase (nsp13), and exonuclease (nsp14) (Snijder et al., 2003). The name ORF1ab is sometimes used to refer to the two ORFs combined via the frameshift. 
However, we refer to ORF1a and ORF1b as separate ORFs following common practice in the nidovirus field motivated by their large sizes and small overlap, despite the fact that ORF1b begins at a frameshift site rather than a start codon, unlike the other ORFs we discuss here. The other ORFs conserved across coronaviruses encode, from 5′ to 3′, S (Spike protein), E (Envelope), M (Membrane), and N (Nucleocapsid). Other “accessory” ORFs, located in the region downstream of ORF1b, may be species-specific or present only in some strains of a species. SARS-CoV-2 has a full complement of ORFs previously identified in other viruses of the species Severe acute respiratory syndrome-related coronavirus, which includes the prototype SARS-CoV, the causative agent of the 2002–2003 SARS outbreak. In addition to the ORFs common to all coronaviruses these include, from 5′ to 3′, the accessory genes ORF3a, ORF6, ORF7a, ORF7b, and ORF8 (split into ORF8a and ORF8b in some SARS-CoV isolates) (Cui et al., 2019; Liu et al., 2014; Wu et al., 2020a). Because of the unprecedented interest in SARS-CoV-2, its proteome has been extensively investigated by various experimental and computational techniques. One additional independent ORF, ORF10, and several additional ORFs overlapping S, ORF3a, and N in alternative positive-sense reading frames have been hypothesized to encode functional proteins. Some of these alternative-frame ORFs are unique to SARS-CoV-2, some are completely or partially homologous to ones already described for SARS-CoV, and one is present in SARS-CoV but was not identified until the SARS-CoV-2 genome was investigated. 
Alternative-frame ORFs could be translated from the same subgenomic RNA as the main ORF via leaky scanning or internal ribosome entry, from a subgenomic RNA specific to the alternative-frame ORF, or via a translational frameshift (Di et al., 2017; Firth and Brierley, 2012; Irigoyen et al., 2016; Kim et al., 2020b; Liu and Inglis, 1992; O’Connor and Brian, 2000; Thiel and Siddell, 1994). Due in part to the biological and evolutionary complexity of these ORFs and the incremental nature of scientific discovery, different research groups have assigned different names to the alternative-frame ORFs, which has complicated clear scientific communication. Here we propose a standard set of names for these overlapping SARS-CoV-2 ORFs for use by the scientific community in order to facilitate unambiguous communication and minimize confusion while the coding potential and biological function of these ORFs continues to be investigated.

Results and discussion

SARS-CoV-2 overlapping ORFs and their ambiguous names

The term “open reading frame” or “ORF” has been used with slightly different meanings by different authors. Here we use the term to mean any contiguous stretch of RNA codons beginning with a start codon, ending with a stop codon, and with no intermediate in-frame stop codons. Given appropriate evidence, the 5′ end of the ORF might be moved to a site with a known stop codon readthrough or frameshift signal, as in the case of ORF1b, in order to accommodate the complexity of genome expression in viruses. (Note that, although we require an ORF to end with a stop codon, we do not include the stop codon when we report the lengths and coordinates of the ORF.) We do not require that an ORF exceeds some minimum length or that undisputed evidence is available for its translation into a protein. In what follows, we will only be discussing ORFs with AUG start codons, but our definition would include ORFs with other start codons (typically near-cognate to AUG, such as CUG). By this definition, the conceptual translation of the nucleotide sequence using a codon table determines whether a genome region is an ORF, whereas experimental or computational evidence is needed to determine if an ORF is indeed translated and encodes a functional protein during virus infection. This evidence may come from, but is not limited to, ribosome profiling, protein or peptide detection, and observation of evolutionary signals. Although a large number of ORFs satisfy our definition, we will only be discussing ORFs for which some evidence has suggested translation. Their consideration would benefit from having agreed nomenclature, even if for some of them this evidence may not pass the test of time. At least six ORFs overlapping S, ORF3a, and N in alternative reading frames have been hypothesized to encode functional proteins. These ORFs are detailed in Fig. 1 and Table 1 , and issues relating to their naming are discussed in the following paragraphs.
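The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used by any of the cited studies: the function name `find_orfs` and its return format are hypothetical, only the positive-sense strand is scanned, and the standard stop codons UAA, UAG, and UGA are assumed. As in the text, reported coordinates and codon counts exclude the stop codon, and every in-frame start codon is reported, so nested isoforms appear as separate ORFs.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_orfs(rna, start_codons=("AUG",)):
    """Return (start, end, n_codons) for each ORF on the given strand.

    Coordinates are 1-based and inclusive and, like the codon count,
    exclude the terminating stop codon. A start codon with no downstream
    in-frame stop codon does not define an ORF.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    orfs = []
    for i in range(len(rna) - 2):
        if rna[i:i + 3] not in start_codons:
            continue
        # Walk codon by codon from the start until the first in-frame stop.
        for j in range(i + 3, len(rna) - 2, 3):
            if rna[j:j + 3] in STOP_CODONS:
                orfs.append((i + 1, j, (j - i) // 3))
                break
    return orfs


if __name__ == "__main__":
    # Toy sequence: AUG AAA AUG CCC UAA, with two nested in-frame starts.
    print(find_orfs("AUGAAAAUGCCCUAA"))  # [(1, 12, 4), (7, 12, 2)]
```

Passing `start_codons=("AUG", "CUG")` would extend the scan to a near-cognate start codon, in line with the broader definition given above. Whether any ORF found this way is translated remains a separate, evidence-based question.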
Fig. 1

Browser image of recommended names for overlapping ORFs. UCSC Genome Browser images showing our recommended names and the number of amino acids (below name) for overlapping ORFs (light green or pink background for ORFs whose codons are shifted 1 or 2 nucleotides, respectively, in the 3′ direction from those of the main ORF, white background). AUG (green) and stop codons (red) are shown in each of three positive-sense genomic reading frames. A. 5′ end of Spike ORF (S) containing ORF2b (39 codons). B. ORF3a containing ORFs 3c (41 codons), 3d (57 codons), 3d-2 (33 codons), and 3b (22 codons). The region homologous to SARS-CoV ORF3b, which overlaps the 3′ half of ORF3a and the 5′ end of the envelope protein ORF (E), is also shown (light blue background). Note that ORFs 3a, 3c, and 3d are in different reading frames (+0, +1, and +2, respectively), so the 59 nucleotide region common to all three could be a rare example of RNA translated in three different reading frames. C. Nucleocapsid ORF (N) containing ORFs 9b (97 codons) and 9c (73 codons). (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)

Table 1

Recommended standard names. Recommended standard names for each of six ORFs overlapping S, ORF3a, or N, in 5′–3′ order, and the shorter isoform of ORF3d, with number of codons, coordinates, and a list of other names that have been used in previous publications or preprints. Codon counts and coordinates do not include the stop codon. Coordinates are with respect to the Wuhan-Hu-1 reference genome (NCBI: NC_045512.2). Frame +1 or +2 indicates that codons are shifted one or two nucleotides, respectively, in the 3′ direction from codons in the main (larger) ORF, which occupies frame +0.

Recommended ORF name | Length (codons) | Coordinates | Frame relative to main (+0) ORF | Other names used (not recommended) | SARS-CoV homolog
ORF2b | 39 | 21744-21860 | +1 (S) | S.iORF1 | None
ORF3c | 41 | 25457-25579 | +1 (ORF3a) | ORF3h, 3a.iORF1, ORF3b | Unnamed
ORF3d | 57 | 25524-25694 | +2 (ORF3a) | ORF3b, ORF3c | None
ORF3d-2 | 33 | 25596-25694 | +2 (ORF3a) | 3a.iORF2 | None
ORF3b | 22 | 25814-25879 | +1 (ORF3a) | - | 5′ end of ORF3b
ORF9b | 97 | 28284-28574 | +1 (N) | ORF9a, N.iORF1 | ORF9b
ORF9c | 73 | 28734-28952 | +1 (N) | ORF9b, ORF14 | ORF9c, ORF14
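The lengths, coordinates, and frames in Table 1 are linked by simple arithmetic, which the following sketch checks. The rows are transcribed from the table; the main-ORF start coordinates for S, ORF3a, and N are not given in the table and are assumed here from the NC_045512.2 annotation, so they should be verified against that record. All function names are hypothetical. The last lines reproduce the 59 nucleotide region shared by ORF3c (with its stop codon) and ORF3d that is noted in the Fig. 1 legend.

```python
# Rows transcribed from Table 1: (name, codons, start, end, frame, main ORF).
TABLE_1 = [
    ("ORF2b",   39, 21744, 21860, 1, "S"),
    ("ORF3c",   41, 25457, 25579, 1, "ORF3a"),
    ("ORF3d",   57, 25524, 25694, 2, "ORF3a"),
    ("ORF3d-2", 33, 25596, 25694, 2, "ORF3a"),
    ("ORF3b",   22, 25814, 25879, 1, "ORF3a"),
    ("ORF9b",   97, 28284, 28574, 1, "N"),
    ("ORF9c",   73, 28734, 28952, 1, "N"),
]

# ASSUMPTION: first nucleotide of each main ORF, taken from the NC_045512.2
# annotation rather than from Table 1 itself.
MAIN_ORF_START = {"S": 21563, "ORF3a": 25393, "N": 28274}


def codon_count(start, end):
    """Codons spanned by inclusive coordinates that exclude the stop codon."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("coordinates do not span a whole number of codons")
    return length // 3


def relative_frame(start, main_start):
    """Shift (0, 1 or 2 nt in the 3' direction) relative to the main ORF."""
    return (start - main_start) % 3


def overlap_nt(a, b):
    """Number of positions shared by two inclusive (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def check_table(rows=TABLE_1):
    """Return names of rows whose length or frame disagrees with coordinates."""
    bad = []
    for name, codons, start, end, frame, main in rows:
        same_length = codon_count(start, end) == codons
        same_frame = relative_frame(start, MAIN_ORF_START[main]) == frame
        if not (same_length and same_frame):
            bad.append(name)
    return bad


if __name__ == "__main__":
    print(check_table())  # [] means every row is internally consistent
    orf3c_with_stop = (25457, 25579 + 3)  # extend by the 3 nt stop codon
    orf3d = (25524, 25694)
    print(overlap_nt(orf3c_with_stop, orf3d))  # 59
```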
UniProt (The UniProt Consortium, 2019) annotates two ORFs overlapping N in a different reading frame, namely a 97 codon ORF with coordinates 28284-28574, which they call ORF9b, and a 73 codon ORF with coordinates 28734-28952, which they call ORF14. (As a result of our recommendation, the 73 codon ORF is called ORF9c beginning with UniProt release 2021_01.) The name ORF14, which is out of sequence from the other SARS-CoV-2 ORF names, dates back to the 2003 paper that introduced the SARS-CoV genome (Marra et al., 2003), which numbered all ORFs sequentially, including overlapping ORFs. Later papers renumbered so that overlapping ORFs were distinguished using different letters following a shared number, but the name ORF14 continued to be used by some authors whereas others used the name ORF9c. Various authors have referred to the 97 and 73 codon SARS-CoV-2 ORFs overlapping N, respectively, as ORF9a and ORF9b (Cagliani et al., 2020; Davidson et al., 2020; Wu et al., 2020a), ORF9b and ORF9c (Gordon et al., 2020; Michel et al., 2020; Nelson et al., 2020b), or ORF13 and ORF14 (Lu et al., 2020), resulting in ambiguity about whether ORF9b refers to the 97 or 73 codon ORF.

Biological and evolutionary complexity have engendered even greater confusion about the names of ORFs overlapping ORF3a. SARS-CoV contains an alternative-frame 154 codon ORF, ORF3b, that partially overlaps both ORF3a and E (Chan et al., 2005), but the homologous 155 codon region in SARS-CoV-2 contains several in-frame stop codons. The longest alternative-frame ORF overlapping SARS-CoV-2 ORF3a is the 57 codon ORF with coordinates 25524-25694 that overlaps a 5′-proximal portion of ORF3a that has no homology to SARS-CoV ORF3b (Fig. 1), though this ORF is truncated to 13 codons in a substantial fraction of isolates (Lu et al., 2020). Because there is no SARS-CoV-2 ORF of comparable length in the region homologous to SARS-CoV ORF3b, Chan et al. (2020) referred to this 57 codon ORF as ORF3b (the paper does not explicitly state the length or coordinates, and ORF3b is not included in the corresponding NCBI record, accession MN975262, but the ORF can be inferred from the amino acid sequence specified in their Fig. 4). However, Konno et al. used the name ORF3b to refer to the 22 codon ORF with coordinates 25814-25879 at the 5′ end of the region homologous to SARS-CoV ORF3b, which they reported to be an interferon antagonist when expressed from a plasmid (Konno et al., 2020b), a property that had previously been reported for the much longer SARS-CoV ORF3b (Kopecky-Bromberg et al., 2007). Adding to the potential for confusion, the non-homologous 57 codon ORF overlapping ORF3a has also been reported to function as an interferon antagonist in a paper that also referred to it as ORF3b (Lu et al., 2020). The 57 codon ORF was predicted to be translated and functional based on a statistical test (Schlub et al., 2018) for unexpectedly long overlapping ORFs, a ribosome profiling analysis, and a dN/dS analysis comparing SARS-CoV-2 to pangolin-CoV GX/P5L, and was named ORF3c in an early preprint (Nelson et al., 2020a), but its name was changed to ORF3d in the final published version (Nelson et al., 2020b) to reflect the consensus reported here. It was also predicted to be a bona fide gene using an independent sequence composition analysis method, but left unnamed (Pavesi, 2020). Complicating matters further, a ribosome profiling study reported evidence for translation of a 33 codon isoform of the 57 codon ORF that starts at a downstream in-frame AUG (coordinates 25596-25694), calling it ORF3a.iORF2, but did not obtain evidence that the full 57 codon isoform is translated (Finkel et al., 2020). As this example illustrates, some proteins have more than one potential start site and it can be difficult to determine which are functionally important. Other studies that discuss the 22 or 57 codon ORFs are listed in Table 2 and Supplementary Table 1.
Table 2

ORF3b studies. Thirty two studies that use the name “ORF3b”, but do not distinguish the 22 codon and 57 codon ORFs as separate entities. Information about what each study was investigating and how we determined the ORF referred to is provided in Supplementary Table 1.

ORF called “ORF3b” in study | Recommended name | Study type | Studies (first author and citation)
22 codon ORF | ORF3b | Genome report | (Wu et al., 2020a)
22 codon ORF | ORF3b | Empirical | (Konno et al., 2020b; Lokugamage et al., 2020; Xia et al., 2020; Zhang et al., 2020)
22 codon ORF | ORF3b | Review | (Sa Ribero et al., 2020)
57 codon ORF | ORF3d | Genome report | (Chan et al., 2020)
57 codon ORF | ORF3d | Empirical | (Banerjee et al., 2020; Gordon et al., 2020; Hachim et al., 2020; Hayn et al., 2020; Lam et al., 2020; Laurent et al., 2020; Samavarchi-Tehrani et al., 2020; St-Germain et al., 2020)
57 codon ORF | ORF3d | Laboratory resource - sequence clone collection | (Kim et al., 2020a)
57 codon ORF | ORF3d | Computational | (Michel et al., 2020; Pasquier and Robichon, 2020; Sadegh et al., 2020)
57 codon ORF | ORF3d | Review | (Celik et al., 2020; Garofalo et al., 2020; Helmy et al., 2020; Taefehshokr et al., 2020; Wu et al., 2020b; Yang et al., 2020; Yi et al., 2020; Yoshimoto, 2020; Zinzula, 2020)
Unclear | Unclear | Empirical | (Lei et al., 2020; Nabeel-Shah et al., 2020; Yuen et al., 2020)
Unclear | Unclear | Computational | (Sun, 2020)
A third ORF overlapping ORF3a, the 41 codon ORF with coordinates 25457-25579, was proposed to be translated based on synonymous constraint across 6 closely-related strains of the species and called ORF3h (Cagliani et al., 2020). This ORF was independently identified using ribosome profiling (Finkel et al., 2020), by synonymous constraint in a larger group of strains (Firth, 2020), and using evolutionary signatures of protein-coding regions (Jungreis et al., 2021), and referred to as ORF3a.iORF1, ORF3c, and ORF3c in these three respective studies, with additional evidence of purifying selection reported by comparing different SARS-CoV-2 isolates (Nelson et al., 2020b). Adding to the confusion, this ORF has also been referred to as 3b protein (Pavesi, 2020). Interestingly, the 41 codon and 57 codon ORFs have a 59-nucleotide overlap (including the stop codon of the former), so if both encode functional proteins then this region of ORF3a would be translated in the main reading frame and both alternative reading frames, three frames in total (ORFs 3a, 3c and 3d, Fig. 1B). Translation of three genes overlapping the same sites in different reading frames is rare but known to occur in at least one other virus, namely Env, Tat, and Rev in HIV-2 (Bakouche et al., 2013). Lastly, the 39 codon ORF with coordinates 21744-21860 overlapping the Spike protein was found to show evidence of translation in a ribosome profiling experiment (Finkel et al., 2020). The sequence of this ORF displays evidence of purifying selection between human hosts, using a πN/πS method intended for use with overlapping genes (Nelson et al., 2020b).

Ambiguity in the usage of the name ORF3b has been particularly confusing.
Two of the earliest papers about the SARS-CoV-2 genome used the term ORF3b to mean different genomic regions: Wu et al. show ORF3b as the region homologous to SARS-CoV ORF3b in their Fig. 1 and their Supplementary Table 6 without noting the in-frame stop codons (F. Wu et al., 2020a), whereas Chan et al. use the name ORF3b to refer to the 57 codon ORF, which has no homology to SARS-CoV ORF3b (Chan et al., 2020). At the time of writing, at least 30 subsequent published papers or preprints have used the term ORF3b to refer to one or both of these two ORFs, many of which provide little if any information from which the reader might deduce which ORF is being referred to (Table 2, Supplementary Table 1). For example, a recent report about the antibody response to SARS-CoV-2 ORF3b (Hachim et al., 2020) does not state which ORF3b is referred to, only that proteins were chosen based on previous studies, and cites the two aforementioned papers (Chan et al., 2020; Wu et al., 2020a), which have different definitions of ORF3b. We were able to infer from the PCR primers in their Supplementary Table 5 (Hachim et al., 2020) (confirmed by personal communication) that it refers to the 57 amino acid protein. A subsequent preprint about the 22 amino acid protein cited the Hachim et al. study (Hachim et al., 2020) of the 57 amino acid protein as evidence of expression (Konno et al., 2020a), though this was corrected in the published version (Konno et al., 2020b). On the other hand, a report about interferon evasion (Xia et al., 2020) refers to one of the investigated ORFs as ORF3b, without providing coordinates, but we were able to infer from the PCR primers in their Supplementary Table S1 that the amplified region was the 155 codon SARS-CoV-2 region homologous to SARS-CoV ORF3b, so presumably in this case it was the 22 codon ORF that was expressed, since that is at the 5′ end of this region.
Furthermore, at least five review papers discuss the 57 codon and 22 codon ORFs overlapping ORF3a as if they were the same entity (Supplementary Table 1). Particularly confusing is one review that mentions ORF3b without providing coordinates and depicts it graphically as the ORF at the 3′ end of the SARS-CoV-2 region homologous to SARS-CoV ORF3b (overlapping the 3′ end of ORF3a and the 5′ end of E), but we can infer from the cited reference that they are referring to the 22 codon ORF at the 5′ end of the homologous region (Sa Ribero et al., 2020). We know of at least one instance of similar confusion for ORF9b: Davidson et al. (2020) report that they “could not detect peptides from ORF9b as described by Bojkova et al., 2020”, but “peptides corresponding to the ORF9a protein were identified”; however, Davidson's ORF9a and Bojkova's ORF9b are different names for the same 97 amino acid protein, and Davidson's ORF9b is the 73 amino acid protein, for which Bojkova et al. also did not find peptides (Bojkova et al., 2020), so the two studies detected the same N-overlapping protein after all. Examples of confusion caused by inconsistent naming continue to accumulate rapidly.
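The ambiguity described in this section can be made concrete with a small lookup. The sketch below inverts the "Other names used" column of Table 1 so that a name found in a paper maps to every recommended ORF it might denote. The dictionary transcribes only Table 1 (names that appear solely in the running text, such as ORF13, are not included), and the function name `candidates` is hypothetical.

```python
from collections import defaultdict

# Recommended name -> other names used in the literature (from Table 1).
OTHER_NAMES = {
    "ORF2b": ["S.iORF1"],
    "ORF3c": ["ORF3h", "3a.iORF1", "ORF3b"],
    "ORF3d": ["ORF3b", "ORF3c"],
    "ORF3d-2": ["3a.iORF2"],
    "ORF3b": [],
    "ORF9b": ["ORF9a", "N.iORF1"],
    "ORF9c": ["ORF9b", "ORF14"],
}


def build_index(other_names=OTHER_NAMES):
    """Map every name seen in the literature to the ORFs it may refer to."""
    index = defaultdict(set)
    for recommended, aliases in other_names.items():
        index[recommended].add(recommended)  # a name may be used as recommended
        for alias in aliases:
            index[alias].add(recommended)
    return index


def candidates(name_in_paper, index=None):
    """Return the sorted recommended names that a published name could mean."""
    index = build_index() if index is None else index
    return sorted(index.get(name_in_paper, ()))


if __name__ == "__main__":
    print(candidates("ORF3b"))  # ['ORF3b', 'ORF3c', 'ORF3d']
    print(candidates("ORF9b"))  # ['ORF9b', 'ORF9c']
    print(candidates("ORF14"))  # ['ORF9c']
```

A name that maps to more than one candidate cannot be resolved from the name alone, which is why lengths or coordinates are needed.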

Consensus nomenclature for SARS-CoV-2 overlapping ORFs and isoforms

After discussions with many of the researchers that have published evidence related to overlapping ORFs in SARS-CoV-2, and in consultation with members of the Coronaviridae Study Group of the International Committee on Taxonomy of Viruses, we propose consensus nomenclature for the six aforementioned overlapping ORFs, and the shorter isoform of ORF3d (Table 1, Fig. 1). Our naming decisions were motivated by the following considerations. First, we strongly recommend using the same name as the SARS-CoV homolog where one exists. This rule is in agreement with the prevailing practice for ORF and protein naming by the Coronaviridae Study Group. This facilitates the transfer of knowledge in the coronavirus field and cross-communication between research on SARS-CoV-2, SARS-CoV, and other viruses of the species Severe acute respiratory syndrome-related coronavirus. The importance of this rule can be appreciated by imagining a scenario in which researchers use the name “hemoglobin” to refer to hemoglobin (homologous) in one eukaryotic species, but to insulin (not homologous) in another — accurate transfer of knowledge would be error-prone indeed. Practically, homology recognition is based on analysis of sequence and structure similarity which is not always straightforward for small ORFs unless assisted by other considerations like genome collinearity (synteny). However, we recommend that researchers naming ORFs make every possible effort to determine whether there is homology to ORFs in related strains or species, and avoid names that could lead to mistaken assumptions of homology or lack thereof. At least eight studies using the name ORF3b to refer to the 57 codon SARS-CoV-2 ORF overlapping ORF3a (following Chan et al. (2020)) have mistakenly assumed homology and a possible functional relationship to SARS-CoV ORF3b due to this departure from the homology rule (Supplementary Table 1). 
The application of the same-name-for-homologs rule to ORF9b is unambiguous, because SARS-CoV and SARS-CoV-2 encode full-length homologs. On the other hand, there are several small ORFs within the region of the SARS-CoV-2 genome homologous to SARS-CoV ORF3b, but to our knowledge only the ORF at the 5′ end of this region, beginning at the AUG codon homologous to the start codon of SARS-CoV ORF3b, has been proposed to be protein-coding (Konno et al., 2020b). Thus, we assign the name ORF3b to the 22 codon ORF, in line with prior studies of ORFs of various lengths in bat viruses of this species homologous to the 5′ end of SARS-CoV ORF3b (Zhou et al., 2012). Second, we maintain the convention of naming overlapping ORFs as ORF{Number1}{letter}, where “Number1” is the number of the main ORF (note that the numeric names of S and N are ORF2 and ORF9a, respectively (Inberg and Linial, 2004)) and “letter” is a lower case letter. We reserve “a” for the main ORF and default to sequential (alphabetical) letters to name additional overlapping ORFs in 5′–3′ order, but retain the flexibility to accommodate the same-name-for-homologs rule or historical usage. In the case of ORFs overlapping ORF3a, “a” is taken by the main ORF and “b” by the SARS-CoV homolog. Thus, we have named the remaining two overlapping ORFs 3c and 3d in 5′–3′ order even though they occur 5′ of ORF3b. In the case of ORF9c, we choose to use “c” because it is 3′ of ORF9b, and because the homologous ORF in SARS-CoV has sometimes been called ORF9c (though it has also been called ORF14). Finally, we extend our convention by naming smaller isoforms of overlapping ORFs using alternative start codons according to the template ORF{Number1}{letter}{-}{Number2}. Specifically, we introduce the name ORF3d-2 for the 33 codon isoform of ORF3d. 
Whether either, both, or neither of these two isoforms encode a functional protein has yet to be determined, so we have chosen to name the shorter isoform in case it is the only functional isoform, but use a name that relates it to ORF3d in case both are functional. There are several other sub-ORFs that have been proposed to be translated (Finkel et al., 2020) but we have not assigned names to them because, as far as we know, ORF3d-2 is the only one for which anyone has proposed that the shorter form is translated but the longer one is not. If, in the future, names are needed for other smaller isoforms of overlapping ORFs using alternative start codons, we suggest using a naming strategy that is analogous to ORF3d-2. While researchers have presented experimental or computational evidence of translation or function for each of the six overlapping ORFs discussed here, a final consensus in the community has not yet been achieved. We would like to emphasize that in choosing names we are not intending to imply anything about the strength of evidence for translation or function of any of these ORFs, or parts thereof. With the humbling recognition that our knowledge of coronavirus biology in general and the SARS-CoV-2 genome in particular is far from complete, we have tried to suggest naming rules with sufficient flexibility to handle future discoveries.
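The two naming templates, ORF{Number1}{letter} and ORF{Number1}{letter}{-}{Number2}, can be expressed as a single regular expression. The sketch below is illustrative only: the pattern and the function name `parse_orf_name` are assumptions, and the pattern covers just the lettered names described here, so sequential names such as ORF14 or independent ORFs such as ORF10 are deliberately not matched.

```python
import re

# ORF{Number1}{letter} with an optional {-}{Number2} isoform suffix.
ORF_NAME = re.compile(r"ORF(?P<number>\d+)(?P<letter>[a-z])(?:-(?P<isoform>\d+))?")


def parse_orf_name(name):
    """Split a lettered ORF name into its parts, or return None if it
    does not follow the template."""
    match = ORF_NAME.fullmatch(name)
    if match is None:
        return None
    isoform = match.group("isoform")
    return {
        "number": int(match.group("number")),  # number of the main ORF
        "letter": match.group("letter"),       # "a" is reserved for the main ORF
        "isoform": int(isoform) if isoform is not None else None,
    }


if __name__ == "__main__":
    print(parse_orf_name("ORF3d-2"))  # {'number': 3, 'letter': 'd', 'isoform': 2}
    print(parse_orf_name("ORF9c"))    # {'number': 9, 'letter': 'c', 'isoform': None}
    print(parse_orf_name("ORF14"))    # None
```

Note that matching the template says nothing about homology: the letter is assigned by the same-name-for-homologs rule first and by 5′–3′ order only as a default.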

Conclusions

We have proposed standard names for six SARS-CoV-2 ORFs and one shorter isoform that have been hypothesized to encode accessory proteins. The ORF names we have recommended here have been endorsed by several members of the Coronaviridae Study Group of the International Committee on Taxonomy of Viruses, namely Stanley Perlman, Bart L Haagmans, and Benjamin W Neuman, and two coauthors John Ziebuhr and Alexander E. Gorbalenya; by the other coauthors of this paper, who represent many of the groups that initially proposed or reported additional evidence for the protein-coding status of some of these ORFs; and by the virus curator of SwissProt/UniProt, Philippe Lemercier. We hope that future publications will adopt the recommended names, including the published versions of any current preprints that refer to these ORFs, in order to facilitate unambiguous communication and minimize confusion. We also recommend that authors referring to any of these ORFs explicitly provide the length or genome coordinates with respect to the reference SARS-CoV-2 genome Wuhan-Hu-1 (NCBI: NC_045512.2), and report the name used in any cited paper if it is different. These practices should help to resolve ambiguities caused by names that have already appeared in the literature.
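As one possible way to follow the reporting recommendation above, a small helper could generate a mention that carries the length, the coordinates, and the name used in a cited paper. The function name `describe_orf` and the output wording are hypothetical; only the coordinates and the reference accession come from the text.

```python
REFERENCE = "NC_045512.2"  # Wuhan-Hu-1 reference genome named in the text


def describe_orf(name, start, end, cited_name=None):
    """Build an unambiguous mention from inclusive coordinates that
    exclude the stop codon."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("coordinates do not span a whole number of codons")
    text = f"{name} ({length // 3} codons, coordinates {start}-{end} in {REFERENCE})"
    if cited_name is not None and cited_name != name:
        text += f", called {cited_name} in the cited paper"
    return text


if __name__ == "__main__":
    print(describe_orf("ORF3d", 25524, 25694, cited_name="ORF3b"))
    # ORF3d (57 codons, coordinates 25524-25694 in NC_045512.2), called ORF3b in the cited paper
```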

Author contributions

Irwin Jungreis: Conceptualization, Writing - Original Draft, Writing - Review & Editing, Project administration. Chase W. Nelson: Conceptualization, Writing - Review & Editing. Zachary Ardern: Data Curation, Writing - Review & Editing. Yaara Finkel: Writing - Review & Editing. Nevan J. Krogan: Writing - Review & Editing. Kei Sato: Writing - Review & Editing. John Ziebuhr: Writing - Review & Editing. Noam Stern-Ginossar: Writing - Review & Editing. Angelo Pavesi: Writing - Review & Editing. Andrew E. Firth: Conceptualization, Writing - Review & Editing. Alexander E. Gorbalenya: Conceptualization, Supervision, Writing - Review & Editing. Manolis Kellis: Supervision, Writing - Review & Editing.