Abstract
The nucleocapsid (N) protein is conserved in all four genera of the coronaviruses, namely alpha, beta, gamma, and delta, and is essential for genome functionality. Bioinformatic analysis of coronaviral N sequences revealed two intrinsically disordered regions (IDRs) at the center of the polypeptide. While both IDR structures were found in alpha-, beta-, and gammacoronaviruses, the second IDR was absent in deltacoronaviruses. Two novel coronaviruses, currently placed in the Gammacoronavirus genus, appeared intermediate in this regard, as the second IDR structure could barely be discerned and had a low probability of disorder. Interestingly, these two are the only coronaviruses thus far isolated from marine mammals, namely the beluga whale and the bottlenose dolphin, two closely related species; the N proteins of the two viruses were also virtually identical, differing by a single amino acid. These two unique viruses remain phylogenetic oddities, since gammacoronaviruses are generally avian (bird) in nature. Lastly, both IDRs, regardless of the coronavirus genus in which they occurred, were rich in Ser and Arg, in agreement with their disordered structure. It is postulated that the central IDRs make cardinal contributions to the multitasking role of the nucleocapsid protein, which likely requires structural plasticity, and perhaps also influence coronavirus host tropism and cross-species transmission.
Keywords: CoV, Coronavirus; CoVID-19, Coronavirus disease-2019; Coronavirus; Host tropism; IDR, Intrinsically disordered region; Infection; Intrinsic disorder; MERS, Middle East Respiratory Syndrome; N, Nucleocapsid; Phosphorylation; Phylogenetic; RNA-binding; SARS, Severe acute respiratory syndrome; Ser/Arg-rich; TGEV, Transmissible Gastroenteritis Virus
Year: 2020 PMID: 32765822 PMCID: PMC7366112 DOI: 10.1016/j.csbj.2020.07.005
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Table 1. N protein sequences analyzed.
| Coronavirus (CoV) genus | GenBank Accession# | Graph color legend (Fig. 1) | ID region analysed (aa residue#) |
|---|---|---|---|
| Alpha-CoV | | | |
| Lucheng Rn rat CoV | | Series 1 | 147–266 |
| Feline CoV | | Series 2 | 138–257 |
| Canine enteric CoV K378 | | Series 3 | 141–260 |
| TGEV | ANR94935.1 | Series 4 | 144–263 |
| Mink CoV strain WD1127 | | Series 5 | 141–259 |
| Ferret CoV | | Series 6 | 143–262 |
| Human CoV 229E | AGW80953.1 | Series 7 | 144–263 |
| Rhinolophus bat CoV HKU2 | | Series 8 | 130–249 |
| BtRf-AlphaCoV/YN2012 | | Series 9 | 131–250 |
| Beta-CoV | | | |
| Pipistrellus bat CoV HKU5 | | Series 1 | 150–270 |
| Tylonycteris bat CoV HKU4 | | Series 2 | 149–269 |
| Bat (unclassified species) CoV | | Series 3 | 150–270 |
| MERS CoV | | Series 4 | 150–270 |
| Bat Hp-betaCoV/Zhejiang2013 | | Series 5 | 155–275 |
| Bat CoV BM48-31/BGR/2008 | | Series 6 | 160–280 |
| SARS-CoV-2 | | Series 7 | 161–281 |
| SARS-CoV-1 | P59595.1 | Series 8 | 162–282 |
| Horseshoe bat SARSr-CoV | Q3LZX4.1 | Series 9 | 160–280 |
| Gamma-CoV | | | |
| Canada goose CoV | | Series 1 | 152–272 |
| Turkey CoV | | Series 2 | 146–266 |
| AIBV | | Series 3 | 146–266 |
| Beluga whale CoV SW1 | YP_001876448.1 | Series 4 | 145–234 |
| Bottlenose dolphin CoV | QII89031.1 | Series 5 | 145–234 |
| Delta-CoV | | | |
| Common moorhen CoV | | Series 1 | 132–252 |
| Munia CoV HKU13-3514 | YP_002308510.1 | Series 2 | 127–247 |
| Magpie-robin CoV HKU18 | | Series 3 | 126–246 |
| Porcine CoV HKU15 | YP_009513025.1 | Series 4 | 131–251 |
| Thrush CoV HKU12-600 | | Series 5 | 127–247 |
| White-eye CoV HKU16 | | Series 6 | 128–248 |
| Night heron CoV HKU19 | YP_005352867.1 | Series 7 | 122–231 |
| Wigeon CoV HKU20 | YP_005352875.1 | Series 8 | 130–239 |
Representative N proteins of the four coronavirus genera; their IDRs, spanning the indicated amino acid residues, are plotted in Fig. 1. Color codes for Fig. 1 are given here as Excel-assigned Series numbers. A subset of these sequences, marked in bold, is aligned in Fig. 2 in the same order as shown (i.e., top to bottom). ID = Intrinsic Disorder; aa = amino acid; TGEV = Transmissible Gastroenteritis Virus; MERS = Middle East Respiratory Syndrome; AIBV = Avian Infectious Bronchitis Virus.
Fig. 1. Intrinsically disordered region (IDR) at the center of the coronaviral N protein. IDRs were predicted by several methods, as described previously [28] and in Materials and Methods. These methods produced similar results; those from PrDOS are presented here (see Fig. 2 for the corresponding primary structures). The cut-off threshold was set at a relatively stringent false positive rate (FPR) of 5%, corresponding to a disorder probability of 0.5 (Y-axis). Thus, only regions with a disorder probability >0.5 (above the red dotted line) were considered significant. The “Series” designations of the color-coded graphs are listed in Table 1. As explained in Materials and Methods, the amino acid numbers (X-axis) refer not to the full-length protein but only to the displayed sequence portion, so as to provide a scale for the relative location and length of the two IDR peaks; the actual residue numbers are listed in Table 1. Note that alpha, beta, and gamma coronaviruses have two central IDRs, whereas deltacoronaviruses have only one, roughly corresponding to the first. Two highly similar coronaviruses, isolated from beluga whale and bottlenose dolphin, are currently placed in the gamma genus [38], [39], but because their second IDR is weak they are marked as ‘deviants’ within this genus. They are plotted in the same box as the gammacoronaviruses to show the difference in the peak clearly; see also Fig. 2, where their sequences are placed in a separate category to indicate this difference. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
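The thresholding and renumbering described for Fig. 1 can be illustrated with a short sketch. It assumes that per-residue disorder probabilities for the displayed sequence portion are already available as a plain list (parsing actual PrDOS output is not shown), and it calls a region disordered wherever consecutive residues exceed 0.5. It then shifts the relative X-axis positions to full-length numbering using the first residue of the analysed region from Table 1 (e.g., 161 for SARS-CoV-2). The function names and the probability values are hypothetical and for illustration only; this is not the authors' pipeline.

```python
from typing import List, Tuple

# Cut-off from the Fig. 1 legend: disorder probability > 0.5 (5% FPR).
THRESHOLD = 0.5


def disordered_segments(probs: List[float],
                        threshold: float = THRESHOLD) -> List[Tuple[int, int]]:
    """Return 1-based (start, end) runs of consecutive residues with prob > threshold.

    Positions are relative to the displayed sequence portion (the Fig. 1 X-axis).
    A value exactly equal to the threshold is NOT counted as disordered.
    """
    segments = []
    start = None
    for i, p in enumerate(probs, start=1):
        if p > threshold:
            if start is None:
                start = i                       # a new disordered run begins
        elif start is not None:
            segments.append((start, i - 1))     # run ended on the previous residue
            start = None
    if start is not None:                       # run extends to the last residue
        segments.append((start, len(probs)))
    return segments


def to_actual_residues(segments: List[Tuple[int, int]],
                       region_start: int) -> List[Tuple[int, int]]:
    """Shift relative positions to full-length residue numbers.

    `region_start` is the first residue of the analysed region (Table 1).
    """
    offset = region_start - 1
    return [(s + offset, e + offset) for s, e in segments]


if __name__ == "__main__":
    # Toy probabilities, NOT PrDOS output for any real N protein.
    toy = [0.2, 0.6, 0.8, 0.7, 0.4, 0.3, 0.55, 0.9, 0.1]
    rel = disordered_segments(toy)
    print(rel)                           # [(2, 4), (7, 8)]
    print(to_actual_residues(rel, 161))  # [(162, 164), (167, 168)]
```

Using a strict `>` mirrors the legend's “>0.5” criterion. Real predictors often also impose a minimum segment length, which this sketch deliberately omits.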
Fig. 2. Representative primary structures of the centrally located intrinsically disordered regions (IDRs, in bold letters) in N proteins, with the Ser/Arg residues inside them shown in red. The identities and accession numbers of the sequences are listed in bold in Table 1. The numbers are actual amino acid positions in the full-length N protein. To draw attention to the abundant Lys (K) residues inside the IDRs, these are colored blue. The two underlined Ser residues (in SRGGS) are GSK3 phosphorylation sites [18]. All four established coronavirus genera (A–D) are shown, together with a schematic map of the protein depicting only the relevant features (top). The terminal IDRs are not shown; the central IDR area is indicated by the parallelogram, shaded in a gradient to indicate a greater propensity for disorder on its N-terminal side that gradually tapers off towards the C-terminal side. Note that the alpha, beta, and gamma coronaviruses have two IDRs, whereas deltacoronaviruses have one, roughly corresponding to the first (also see Fig. 1). The distribution of Ser/Arg parallels this trend, with these residues more concentrated in the common N-terminal half than in the C-terminal half. The two coronaviruses isolated from beluga whale and bottlenose dolphin are tentatively placed in a separate category (E) because of their apparently unique IDR arrangement (also see Fig. 1, where they are shown as ‘deviants’ in the gammacoronavirus box to illustrate the difference graphically). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
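The Ser/Arg enrichment noted in Fig. 2, being higher in the N-terminal half of the central IDR, can be quantified with a simple composition count, and motifs such as SRGGS can be located in full-length numbering. The Python sketch below is illustrative only. The function names are hypothetical, and the sequence is a made-up toy string rather than any N protein from Table 1. Splitting the region at its midpoint is also a simplifying assumption, not necessarily the authors' procedure.

```python
from typing import List, Tuple


def sr_fraction(seq: str) -> float:
    """Fraction of residues that are Ser (S) or Arg (R); 0.0 for an empty string."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for aa in seq if aa in "SR") / len(seq)


def sr_by_half(seq: str) -> Tuple[float, float]:
    """S/R fraction of the N-terminal and C-terminal halves of a region.

    Splitting at the midpoint is an assumption made for illustration.
    """
    mid = len(seq) // 2
    return sr_fraction(seq[:mid]), sr_fraction(seq[mid:])


def motif_positions(seq: str, motif: str, region_start: int) -> List[int]:
    """Full-length (1-based) positions at which `motif` begins in `seq`.

    `region_start` is the residue number of seq[0] in the full-length protein,
    as in the 'ID region analysed' column of Table 1. Overlapping hits are kept.
    """
    seq, motif = seq.upper(), motif.upper()
    hits = []
    i = seq.find(motif)
    while i != -1:
        hits.append(region_start + i)
        i = seq.find(motif, i + 1)   # continue one residue later to allow overlaps
    return hits


if __name__ == "__main__":
    # Hypothetical toy region; NOT a real coronaviral N-protein sequence.
    toy = "SRSRGGSRKA"
    print(sr_fraction(toy))                    # 0.6
    print(sr_by_half(toy))                     # (0.8, 0.4)
    print(motif_positions(toy, "SRGGS", 100))  # [102]
```

In this toy example the N-terminal half is richer in Ser/Arg than the C-terminal half, which is the kind of asymmetry the legend describes. Applying it to real data would require the actual IDR sequences shown in Fig. 2.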