Diego Forni¹, Rachele Cagliani¹, Cristian Molteni¹, Federica Arrigoni², Alessandra Mozzi¹, Mario Clerici³,⁴, Luca De Gioia², Manuela Sironi¹.
Abstract
Coronaviruses (CoVs) have complex genomes that encode a fixed array of structural and nonstructural components, as well as a variety of accessory proteins that differ even among closely related viruses. Accessory proteins often play a role in the suppression of immune responses and may represent virulence factors. Despite their relevance for CoV phenotypic variability, information on accessory proteins is fragmentary. We applied a systematic approach based on homology detection to create a comprehensive catalogue of accessory proteins encoded by CoVs. Our analyses grouped accessory proteins into 379 orthogroups and 12 super-groups. No orthogroup was shared by the four CoV genera, and very few were present in all or most viruses in the same genus, reflecting the dynamic evolution of CoV genomes. We observed differences in the distribution of accessory proteins in CoV genera. Alphacoronaviruses harboured the largest diversity of accessory open reading frames (ORFs), deltacoronaviruses the smallest. However, the average number of accessory proteins per genome was highest in betacoronaviruses. Analysis of the evolutionary history of some orthogroups indicated that the different CoV genera adopted similar evolutionary strategies. Thus, alphacoronaviruses and betacoronaviruses acquired phosphodiesterases and spike-like accessory proteins independently, whereas horizontal gene transfer from reoviruses endowed betacoronaviruses and deltacoronaviruses with fusion-associated small transmembrane (FAST) proteins. Finally, analysis of accessory ORFs in annotated CoV genomes indicated ambiguity in their naming. This complicates cross-communication among researchers and hinders automated searches of large data sets (e.g., PubMed, GenBank). We suggest that orthogroup membership be used together with a naming system to provide information on protein function.
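The abstract does not spell out how homology hits were turned into orthogroups. A minimal sketch, assuming pairwise homology calls are already available and that a group is simply a connected component of the resulting graph (single linkage, an illustrative assumption rather than the authors' documented procedure), is a union-find pass. The function name and the protein IDs are hypothetical.

```python
from collections import defaultdict


def cluster_orthogroups(proteins, homologous_pairs):
    """Return connected components of a homology graph as sorted lists.

    Assumption for illustration: any recorded homologous pair places both
    proteins in the same group (single linkage).
    """
    parent = {p: p for p in proteins}

    def find(x):
        # Walk to the root, halving the path as we go to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in homologous_pairs:
        union(a, b)

    groups = defaultdict(list)
    for p in proteins:
        groups[find(p)].append(p)
    # Sort members and groups so the output is deterministic.
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    # Hypothetical protein IDs and homology calls.
    proteins = ["virA_p1", "virB_p1", "virC_p1", "virA_p2", "virD_p2", "virE_p3"]
    pairs = [("virA_p1", "virB_p1"), ("virB_p1", "virC_p1"), ("virA_p2", "virD_p2")]
    for group in cluster_orthogroups(proteins, pairs):
        print(group)
```

Single linkage is transitive: virA_p1 and virC_p1 end up together without a direct hit between them. That is convenient for remote homologs, but it also means one spurious hit can merge two unrelated groups.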
Keywords: accessory proteins; coronavirus; naming system; phosphodiesterase; remote homology
Year: 2022 PMID: 35575901 PMCID: PMC9328142 DOI: 10.1111/mec.16531
Source DB: PubMed Journal: Mol Ecol ISSN: 0962-1083 Impact factor: 6.622
FIGURE 1. Overview of the applied workflow. Schematic representation of the workflow applied in this study for the characterization of CoV accessory proteins and for the identification of orthogroups/super‐groups
FIGURE 2. AlphaCoV and betaCoV genome organization. A schematic genome organization of representative alphaCoV (a) and betaCoV (b) genomes. Super‐groups and relevant orthogroups are coloured as shown in the key; known and unknown orthogroups are coloured in black and grey, respectively, and orthogroup names are reported. For all viruses, ORF1ab is not shown to scale and structural proteins are coloured in white. For human coronaviruses, accessory protein names are also reported. Viruses reported in the figure are as follows: Common shrew coronavirus Tibet‐2014: KY370053; mink coronavirus 1: MN535737; Lucheng Rn rat coronavirus: MT820627; HCoV‐NL63: NC_005831; HCoV‐229E: NC_002645; Alphacoronavirus bat‐CoV/P.kuhlii/Italy/3398–19/2015: NC_046964; coronavirus BtRs‐AlphaCoV/YN2018: MK211373; Rousettus bat coronavirus HKU10: NC_018871; Rousettus bat coronavirus GCCDC1: MT350598; Zaria bat coronavirus: HQ166910; SARS‐CoV‐2: NC_045512; SARS‐CoV: NC_004718; MERS‐CoV: NC_019843; Longquan aa mouse coronavirus: KF294357; HCoV‐OC43: NC_006213; HCoV‐HKU1: NC_006577; bat Hp‐betacoronavirus/Zhejiang2013: NC_025217; Rousettus bat coronavirus isolate GCCDC1 356: NC_030886
Super‐group classification and description
| Super‐group | OG | Contributing viruses | Description |
|---|---|---|---|
| ORF3a‐like | alphaOG01 | All alphaCoVs | Homology to a number of proteins encoded by betaCoVs (betaOG02 and betaOG22) |
| | betaOG02 | Most sarbecoviruses; includes SARS‐CoV‐2 ORF3a | Homology to proteins encoded by alphaCoVs (alphaOG01) and betaCoVs (betaOG22), as well as to the coronavirus/torovirus M protein |
| | betaOG08 | All merbecoviruses; includes MERS‐CoV ORF5/NS5 | Homology to HKU9 proteins in betaOG22 |
| | betaOG22 | Most hibecoviruses and nobecoviruses | Homology to a number of proteins encoded by alphaCoVs (alphaOG01) and betaCoVs (betaOG02), as well as to the coronavirus/torovirus M protein |
| ORF7a/ORF8‐like | alphaOG18 | Rodent coronaviruses (luchacoviruses) | Homology to SARS‐CoV‐2 ORF7a (betaOG01) |
| | betaOG01 | All sarbecoviruses; includes SARS‐CoV‐2 ORF7a/ORF8 | Homology only to ORFs in the same OG |
| PDE | alphaOG21 | Most rodent alphaCoVs (luchacoviruses) | Homology to coronavirus/torovirus/rotavirus PDEs; homology to cellular PDEs (AKAP7) |
| | alphaOG107 | Only one alphaCoV (Shrew‐CoV) | Homology to coronavirus/torovirus/rotavirus PDEs; homology to cellular PDEs (AKAP7) |
| | betaOG10 | All merbecoviruses; includes MERS‐CoV NS4b | Homology to coronavirus/torovirus/rotavirus PDEs; homology to cellular PDEs (AKAP7) |
| | betaOG16 | Most embecoviruses | Homology to coronavirus/torovirus/rotavirus PDEs; homology to cellular PDEs (AKAP7) |
| N internal protein | alphaOG02 | Several alphaCoVs | Homology to proteins encoded by betaCoVs (betaOG17) |
| | betaOG04 | Subset of sarbecoviruses; includes SARS‐CoV‐2 ORF9b | Homology only to ORFs in the same OG |
| | betaOG12 | Most merbecoviruses | Homology to proteins encoded by betaCoVs (betaOG17); also homology (84% probability) to the sarbecovirus ORF9b protein (betaOG04) |
| | betaOG17 | Most embecoviruses | Homology to proteins encoded by betaCoVs (betaOG12) |
| 4.8‐kDa protein | betaOG18 | Subset of embecoviruses | No homology (excluding embecovirus proteins) |
| | betaOG66 | Only one embecovirus (Buffalo coronavirus B1‐28F) | Homology to the HS4 protein of MHV (betaOG18) |
| Ns7a‐like | betaOG33 | Subset of nobecoviruses | Homology to HKU9 nonstructural protein 7a |
| | betaOG40 | Subset of nobecoviruses | Homology to HKU9 nonstructural protein 7a |
| | betaOG48 | Subset of nobecoviruses; includes HKU9 | No homology |
| ORF7b‐like | betaOG05 | Several sarbecoviruses; includes SARS‐CoV‐2 ORF7b | Homology (92% probability) to uncharacterized baculovirus proteins; however, the homology spans only a short low‐complexity region, suggesting a false‐positive result |
| | betaOG65 | Subset of sarbecoviruses | Homology to SARS‐CoV and SARS‐CoV‐2 ORF7b |
| ORF8 Zaria | alphaOG11 | Subset of decacoviruses; one duvinacovirus | Homology to Zaria bat coronavirus ORF8 (betaOG76) |
| | betaOG76 | Only Zaria bat coronavirus | No homology |
| ORF3‐like | alphaOG50 | Two nyctacoviruses; one unclassified alphaCoV | Homology to MERS‐CoV and HKU5 ORF3 (betaOG11) |
| | betaOG11 | All merbecoviruses; includes MERS‐CoV ORF3/NS3 | No homology |
| 3x‐like | alphaOG08 | Carnivore CoVs (minacoviruses) | No homology |
| | alphaOG172 | FIPV | Homology to 3x‐like proteins of ferret/mink CoVs (alphaOG08) |
| 4b‐like | gammaOG03 | Most gammaCoVs; includes IBV 4b | Homology to deltaCoV NS6 proteins (deltaOG01) |
| | deltaOG01 | All deltaCoVs | Homology to IBV 4b protein (gammaOG03) |
| 3b‐like | gammaOG05 | Most gammaCoVs; includes IBV 3b | Homology to deltaCoV 7b proteins (deltaOG11) |
| | deltaOG11 | Subset of deltaCoVs | Homology to IBV 3b protein (gammaOG05) |
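The ORF7b‐like row shows why a high homology probability is not treated as proof on its own: a 92% hit confined to a short low‐complexity region was judged a likely false positive. A minimal sketch of that kind of screen follows. The `Hit` fields, the three cut-offs and the group names are all hypothetical; they illustrate the reasoning and are not the criteria used in the study.

```python
from dataclasses import dataclass

# Hypothetical cut-offs chosen for illustration; the source does not state them.
MIN_PROBABILITY = 80.0      # percent
MIN_ALIGNED_LENGTH = 40     # residues
MAX_LOW_COMPLEXITY = 0.5    # fraction of aligned residues


@dataclass
class Hit:
    query: str
    target: str
    probability: float              # homology probability, in percent
    aligned_length: int             # residues in the aligned region
    low_complexity_fraction: float  # share of aligned residues flagged as low complexity


def is_credible(hit: Hit) -> bool:
    """Keep a hit only if it is probable, long enough and not dominated by low-complexity sequence."""
    return (
        hit.probability >= MIN_PROBABILITY
        and hit.aligned_length >= MIN_ALIGNED_LENGTH
        and hit.low_complexity_fraction <= MAX_LOW_COMPLEXITY
    )


if __name__ == "__main__":
    # Hypothetical hits between placeholder groups.
    hits = [
        Hit("OG_A", "OG_B", 95.0, 120, 0.05),
        Hit("OG_C", "protein_X", 92.0, 25, 0.90),  # high probability, short low-complexity span
        Hit("OG_D", "OG_E", 60.0, 150, 0.00),      # long alignment, low probability
    ]
    for h in hits:
        print(h.query, h.target, "kept" if is_credible(h) else "rejected")
```

Combining probability with alignment length and sequence complexity, rather than relying on probability alone, is the point of the sketch; the actual thresholds would need to be tuned against known true and false relationships.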
FIGURE 3. GammaCoV and deltaCoV genome organization. A schematic genome organization of representative gammaCoV (a) and deltaCoV (b) genomes. Super‐groups and relevant orthogroups are coloured as shown in the key; unknown orthogroups are coloured in grey and orthogroup names are reported. For all viruses, ORF1ab is not shown to scale and structural proteins are coloured in white. Viruses reported in the figure are as follows: Turkey coronavirus: NC_010800; Canada goose coronavirus: NC_046965; beluga whale coronavirus: NC_010646; infectious bronchitis virus: KP662631; sparrow deltacoronavirus: MG812376; porcine deltacoronavirus HN2019‐C132: MN520206; porcine coronavirus HKU15: NC_039208; common moorhen coronavirus: NC_016996; Wigeon coronavirus: NC_016995
FIGURE 4. OG representation in coronaviruses. (a) Counts of OGs shared by different fractions of viruses (from less than 1% to more than 50%). (b) Distribution of OGs by host group. OG counts in different virus host groups are shown (dark bars). Light grey bars indicate the number of OGs that are host group‐specific
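Panel (a) bins each OG by the fraction of viruses that carry it. A minimal sketch of that tally, assuming OG membership is available as a set of virus IDs per OG, is shown below. Only the two extremes ("less than 1%" and "more than 50%") come from the caption; the intermediate bin edges, the function names and the example data are hypothetical.

```python
from collections import Counter

# Hypothetical bin edges as fractions of viruses, [low, high); Figure 4a's exact bins are not given here.
BINS = [
    (0.00, 0.01, "<1%"),
    (0.01, 0.10, "1-10%"),
    (0.10, 0.50, "10-50%"),
    (0.50, float("inf"), ">=50%"),
]


def bin_label(fraction: float) -> str:
    """Return the label of the half-open bin that contains `fraction`."""
    for low, high, label in BINS:
        if low <= fraction < high:
            return label
    raise ValueError(f"fraction out of range: {fraction}")


def count_ogs_by_sharing(og_members: dict[str, set[str]], n_viruses: int) -> Counter:
    """Count OGs per sharing bin, where sharing = viruses carrying the OG / all viruses."""
    counts = Counter()
    for viruses in og_members.values():
        counts[bin_label(len(viruses) / n_viruses)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical data: 200 viruses and four OGs of different sizes.
    members = {
        "OG1": {"v0"},
        "OG2": {f"v{i}" for i in range(10)},
        "OG3": {f"v{i}" for i in range(150)},
        "OG4": {f"v{i}" for i in range(100)},
    }
    print(count_ogs_by_sharing(members, n_viruses=200))
```

Using sets for membership means a virus with several paralogous ORFs in the same OG is counted once, which matches counting viruses rather than ORFs.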
FIGURE 5. AlphaCoV and betaCoV phylogenies and orthogroup distribution. Maximum‐likelihood phylogenetic tree of the RdRp region of alphaCoVs (a) and betaCoVs (b) generated by IQ‐TREE using the LG + I + F + G4 and LG + R5 models. Internal nodes with bootstrap values >80% are shown as black dots and tips are coloured according to viral hosts (see key). The distribution of all orthogroups among viral species is also reported. Super‐groups and relevant orthogroups are shown with the same colours as in Figure 2; known and unknown orthogroups are coloured in black and grey, respectively. Asterisks indicate orthogroups with split/paralogous ORFs. Scale bars are expressed as substitutions per site
FIGURE 6. DeltaCoV and gammaCoV phylogenies and orthogroup distribution. Maximum‐likelihood phylogenetic tree of the RdRp region of deltaCoVs (a) and gammaCoVs (b) generated by IQ‐TREE using the LG + G4 and JTT + F + R6 models. Relevant internal nodes with bootstrap values >80% are shown as black dots. Super‐groups and relevant orthogroups are shown with the same colours as in Figure 3; unknown orthogroups are coloured in grey. Asterisks indicate orthogroups with split/paralogous ORFs. Scale bars are expressed as substitutions per site
List of relevant orthogroups
| OG | Viruses | Homology results |
|---|---|---|
| alphaOG16 | Seven bat viruses | Family of viral and host proteins containing Ig‐like domains (e.g., HCMV RL11 and adenovirus protein E3) |
| alphaOG20 | All rodent CoVs (luchacoviruses) | Spike protein of alphaCoVs (SADS‐CoV and HKU2) |
| alphaOG63 | Two rodent viruses | C‐type lectin domain |
| alphaOG86 | FIPV and TGEV | Phosphoribosylamine–glycine ligase domain of bacterial and eukaryotic organisms |
| betaOG09 | Most merbecoviruses; includes MERS‐CoV ORF4a/NS4a | RNA‐binding proteins, either from viruses (e.g., nonstructural protein 3 of Rotavirus C, Pernambuco mycovirus 1, Thika virus, NSP1 from Porcine rotavirus) or from cellular organisms (e.g., rat dsRNA‐specific editase 1, human interferon‐inducible double‐stranded RNA‐dependent protein kinase activator A, human TAR RNA‐binding protein 2) |
| betaOG13 | All embecoviruses | Influenza virus and torovirus haemagglutinin‐esterase |
| betaOG36 | Subset of merbecoviruses hosted by hedgehogs | Several viral and cellular proteins carrying Ig‐like domains. Most viral proteins with high similarity are encoded by human herpesviruses |
| betaOG56 | Two bat nobecoviruses | Rotavirus ns1‐1 proteins |
| betaOG78 | Only one hibecovirus | SARS‐CoV‐2 spike protein |
| gammaOG15 | Three cegacoviruses | Capsid protein (VP90) of mamastoviruses and human/porcine artroviruses |
| gammaOG18 | Three cegacoviruses | Uridine kinases from many different cellular organisms |
| deltaOG12 | Six buldecoviruses | ns1‐1 product of Rotavirus B |
FIGURE 7. FAST proteins in nobecoviruses and buldecoviruses. Maximum‐likelihood phylogenetic tree showing the relationship among coronavirus and reovirus FAST proteins. The tree with bootstrap values was generated by IQ‐TREE using the best‐fitting substitution model (LG + G4). In addition to nobecovirus (betaOG56) and buldecovirus (deltaOG12) proteins, the FAST proteins of human rotavirus B (YP_008126848 and AAF69263), porcine rotavirus B (BAL04357 and AUG44809), bat orthoreovirus (JF811580 and NC_020448) and avian orthoreovirus (AIA57457 and DQ996607) were included. Orthogroups are shown with the same colours as in Figures 2 and 3. Scale bar is expressed as substitutions per site
FIGURE 8. Spike‐like proteins in alphaCoVs and betaCoVs. (a) Alignment of the spike‐like proteins of luchacoviruses (dark grey) with the spike proteins of SADS‐CoV and BtRf‐AlphaCoV/YN2012 (blue). Fully conserved residues are shaded in yellow. The red triangles denote the start and end of the predicted RBD of the SADS‐CoV spike protein. The green bar indicates the predicted signal peptide in spike‐like proteins. (b) 3D structure superimposition of (left) the ab initio 3D model of the alphaOG20 spike‐like protein (NC_032730) onto the RBD of the SADS‐CoV spike protein (PDB ID: 6M39) and of (right) the ab initio 3D model of the betaOG78 spike‐like protein (NC_025217) onto the N‐terminal domain of the SARS‐CoV‐2 spike protein (PDB ID: 7B62). The lDDT scores for the alphaOG20 and betaOG78 spike‐like protein models were 0.75 and 0.73, respectively. Dark grey cartoons correspond to spike‐like ab initio models, and blue cartoons to spike protein structures. (c) Phylogenetic trees of spike proteins and spike‐like proteins identified in luchacoviruses (left, alphaOG20) and hibecoviruses (right, betaOG78). Tip colour indicates alpha (red) and beta (blue) CoVs. The position of SADS‐CoV is indicated by a yellow triangle, along with relevant CoV subgenera. The trees were generated with IQ‐TREE using the WAG + I + G4 model. Bootstrap values for relevant internal nodes are also reported. Scale bars are expressed as substitutions per site
FIGURE 9. Phylogenetic relationship among phosphodiesterases. Phylogenetic trees showing the relationship among viral and host phosphodiesterases. A maximum‐likelihood tree with bootstrap values (a) generated by IQ‐TREE using the JTT + I + G4 model and a Bayesian tree with posterior probabilities (b) generated by BAli‐Phy are reported. Values are shown for relevant internal nodes only. IDs for human AKAP7, mouse AKAP7, porcine torovirus PDE and rotavirus A VP3 are NP_057461.2, NP_001366167, YP_008798231 and YP_002302228, respectively. Scale bars are expressed as substitutions per site
FIGURE 10. Ambiguous nomenclature of accessory proteins. Word clouds for gene and/or product annotations for 181 proteins belonging to the ORF3a‐like super‐group (a) and 61 viral PDEs (b). Word clouds were generated with WordItOut (https://worditout.com/)
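A word cloud scales each word by its frequency, so the ambiguity in Figure 10 can also be expressed as plain counts. The sketch below computes the word frequencies behind such a plot, plus the number of distinct labels after trivial normalisation, as one rough measure of naming inconsistency within a group. It does not reproduce WordItOut's processing; the function names and the example annotations are hypothetical.

```python
import re
from collections import Counter


def normalise(label: str) -> str:
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(label.lower().split())


def annotation_summary(annotations: list[str]) -> tuple[int, Counter]:
    """Return (number of distinct normalised labels, word-token frequencies)."""
    distinct = len({normalise(a) for a in annotations})
    words = Counter()
    for a in annotations:
        # Split on anything that is not a lower-case letter or digit.
        words.update(re.findall(r"[a-z0-9]+", a.lower()))
    return distinct, words


if __name__ == "__main__":
    # Hypothetical annotations for proteins assumed to belong to one orthogroup.
    labels = ["ORF3a protein", "orf3a  protein", "NS3", "accessory protein 3", "hypothetical protein"]
    n_distinct, freq = annotation_summary(labels)
    print(n_distinct, freq.most_common(3))
```

Generic tokens such as "protein" tend to dominate these counts, so a stop-word list would usually be applied before plotting; a high number of distinct labels for proteins in the same orthogroup is the signal of inconsistent naming.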