Ayal B Gussow, Noam Auslander, Guilhem Faure, Yuri I Wolf, Feng Zhang, Eugene V Koonin.
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) poses an immediate, major threat to public health across the globe. Here we report an in-depth molecular analysis to reconstruct the evolutionary origins of the enhanced pathogenicity of SARS-CoV-2 and other coronaviruses that are severe human pathogens. Using integrated comparative genomics and machine learning techniques, we identify key genomic features that differentiate SARS-CoV-2 and the viruses behind the two previous deadly coronavirus outbreaks, SARS-CoV and Middle East respiratory syndrome coronavirus (MERS-CoV), from less pathogenic coronaviruses. These features include enhancement of the nuclear localization signals in the nucleocapsid protein and distinct inserts in the spike glycoprotein that appear to be associated with high case fatality rate of these coronaviruses as well as the host switch from animals to humans. The identified features could be crucial contributors to coronavirus pathogenicity and possible targets for diagnostics, prognostication, and interventions.
Keywords: COVID-19; coronaviruses; nucleocapsid; pathogenicity; spike protein
Year: 2020 PMID: 32522874 PMCID: PMC7334499 DOI: 10.1073/pnas.2008176117
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Fig. 1. Searching coronavirus genomes for determinants of pathogenicity. (A) Phylogenetic tree of coronavirus species, based on the alignment of complete nucleotide sequences of virus genomes. Blue font corresponds to alphacoronaviruses, and magenta font corresponds to betacoronaviruses. (B) A schematic illustration of the pipeline applied for detection of genomic regions predictive of high-CFR strains. (C) (Top) Pie chart showing the percentage of identified genomic determinants in each protein. (Bottom) Map of SARS-CoV-2 genome with detected regions. (D) Bar plot showing the significance of the distribution of detected regions across each protein. (E) Percentage of detected predictive regions in each protein.
Fig. 2. Putative determinants of coronavirus pathogenicity in the nucleocapsid and the spike protein. (A) (Left) Phylogenetic tree and protein alignment of the nucleocapsid protein across coronavirus species with the unrooted tree built based on the nucleocapsid amino acid sequences. NLS and NES are outlined in orange and green, respectively. The circle next to each signal sequence denotes peptide charge, with red denoting higher charge and blue denoting a lower charge. (Right) The overall charge of each full protein sequence. (B) Map of SARS-CoV-2 nucleocapsid protein with relevant NLS (orange) and NES (green) motifs marked. (C) Boxplots displaying (Left, Center Left, and Center Right) the charge of the three NLS motifs and (Right) that of the complete nucleocapsid protein for SARS-CoV-2, SARS-CoV, MERS-CoV, and low-CFR strains. The one-sided rank sum P values are shown when significant between any two groups, supporting a gradual increase of the charge.
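The two quantities in the Fig. 2 caption, peptide charge and a one-sided rank sum P value, can be illustrated with a minimal Python sketch. It rests on assumptions that do not come from the record: net charge is approximated by counting +1 for each K or R and -1 for each D or E (histidine and termini are ignored), the P value is computed exactly by enumerating all group relabelings (practical only for small samples), and the function names and toy inputs are hypothetical. The authors' actual charge model and test implementation may differ.

```python
from itertools import combinations

# Assumption: a simple residue-count approximation of net charge.
POSITIVE = set("KR")  # lysine, arginine counted as +1
NEGATIVE = set("DE")  # aspartate, glutamate counted as -1


def net_charge(peptide: str) -> int:
    """Approximate net charge: (#K + #R) - (#D + #E)."""
    peptide = peptide.upper()
    pos = sum(aa in POSITIVE for aa in peptide)
    neg = sum(aa in NEGATIVE for aa in peptide)
    return pos - neg


def _u_stat(a, b) -> float:
    """Mann-Whitney U for `a` versus `b`; ties count as 0.5."""
    return sum((x > y) + 0.5 * (x == y) for x in a for y in b)


def rank_sum_one_sided(high, low) -> float:
    """Exact one-sided P value for the alternative that `high` exceeds `low`.

    Enumerates every way to assign len(high) of the pooled values to the
    first group, so it is only suitable for small samples.
    """
    observed = _u_stat(high, low)
    pooled = list(high) + list(low)
    indices = range(len(pooled))
    as_extreme = total = 0
    for chosen in combinations(indices, len(high)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        if _u_stat(a, b) >= observed:
            as_extreme += 1
    return as_extreme / total
```

With hypothetical toy inputs, `net_charge("KRKRDE")` counts four positive and two negative residues, and `rank_sum_one_sided([5, 6, 7], [1, 2, 3])` considers the 20 possible relabelings of six values into two groups of three.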
Fig. 3. The signature inserts in the spike glycoproteins of the high-CFR coronaviruses. (A) Map of SARS-CoV-2 spike protein with relevant protein regions and the features detected by the present analysis with the unrooted tree built based on the spike amino acid sequences. The relevant regions include the RBD, the RBM, the furin recognition site, a hydrophobic residue preceding the first heptad repeat, and both heptad repeats. The two features detected by this analysis are the insertions in the RBM found in pathogenic strains before the zoonotic transmission to humans, and the insertion in the high-CFR strains preceding the heptad repeat. (B) Phylogenetic tree and protein alignment of the spike protein insertion preceding the first heptad repeat with the unrooted tree built based on the spike amino acid sequences. (C) Phylogenetic tree and protein alignment of the spike protein zoonotic insertions in the RBM of high-CFR coronaviruses (disulfide bonds are shown for human strains). (D) Structure of the SARS-CoV-2 spike glycoprotein trimer with the inserts mapped to the heptad repeat-containing domain and the receptor-binding domain. Top Inset (red rectangle) shows the locations of the inserts in the RBM, designated as in A, that are located within segments bordered by orange spheres (unresolved in the structure). Middle Inset (blue rectangle) shows the GAAL insert upstream of the first heptad-repeat region. Bottom Inset (blue rectangle) shows a close view of the GAAL insert (in orange). (E) (Top) Structures of the receptor-binding motifs of SARS-CoV, SARS-CoV-2, and MERS-CoV. The inserts are highlighted in wheat for MERS-CoV, orange for SARS-CoV, and purple for SARS-CoV-2, and the PC doublets and disulfide bonds are shown. (Bottom) Interactions between the inserts in the RBM of the spike glycoproteins of SARS-CoV, SARS-CoV-2, and MERS-CoV, and the corresponding human receptors. Residues shown with stick models are within a 5-Å distance from the interacting residues in the inserts. The salt bridge is highlighted in red with a thick red border (in MERS-CoV), charge interaction is highlighted in red with a thin blue border (SARS-CoV-2), and H-bond network is highlighted in yellow (Y473, T27, S19).
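The 5-Å criterion in the Fig. 3E caption amounts to a distance-cutoff selection, which can be sketched in Python as below. This is only an illustration under stated assumptions: the function name `residues_within`, the `(label, (x, y, z))` atom representation, and the toy coordinates in the usage note are hypothetical and are not taken from any structure in the paper, and the record does not say which software or atom selection the authors used.

```python
import math


def residues_within(cutoff, insert_atoms, receptor_atoms):
    """Return sorted receptor residue labels that have at least one atom
    within `cutoff` angstroms of any atom in the insert.

    Each atom is a (residue_label, (x, y, z)) pair (hypothetical format).
    """
    hits = set()
    for label, coord in receptor_atoms:
        # math.dist gives the Euclidean distance between two points.
        if any(math.dist(coord, c) <= cutoff for _, c in insert_atoms):
            hits.add(label)
    return sorted(hits)
```

For example, with a single toy insert atom at the origin, `residues_within(5.0, [("X1", (0, 0, 0))], [("R1", (3, 0, 0)), ("R2", (0, 6, 0))])` selects only the residue whose atom lies 3 Å away.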