Camila Pontes, Victoria Ruiz-Serra, Rosalba Lepore, Alfonso Valencia.
Abstract
The recent emergence of the novel SARS-CoV-2 in China and its rapid spread in the human population have led to a public health crisis worldwide. As with SARS-CoV, horseshoe bats currently represent the most likely candidate animal source for SARS-CoV-2. Yet, the specific mechanisms of cross-species transmission and adaptation to the human host remain unknown. Here we show that the unsupervised analysis of conservation patterns across the β-CoV spike protein family, using sequence information alone, can provide valuable insights into the molecular basis of the specificity of β-CoVs to different host cell receptors. More precisely, our results indicate that host cell receptor usage is encoded in the amino acid sequences of different CoV spike proteins in the form of a set of specificity determining positions (SDPs). Furthermore, by integrating structural data, in silico mutagenesis and coevolution analysis, we could elucidate the role of SDPs in mediating ACE2 binding across the Sarbecovirus lineage, either by engaging the receptor through direct intermolecular interactions or by affecting the local environment of the receptor binding motif. Finally, by analyzing coevolving mutations across a paired MSA, we were able to identify key intermolecular contacts occurring at the spike-ACE2 interface. These results show that effective mining of the evolutionary records held in the sequence of the spike protein family can help trace the molecular mechanisms behind the evolution and host-receptor adaptation of circulating and future novel β-CoVs.
Keywords: APC, average product correction; CoVs, Coronaviruses; EV, evolutionary rate; Functional specificity; MCA, multiple correspondence analysis; MI, mutual information; MSA, multiple sequence alignment; NTD, N-terminal domain; Phylogenetic analysis; Protein subfamilies; RBD, receptor binding domain; RBM, receptor binding motif; SARS-CoV-2; SDPs, specificity determining positions; Specificity Determining Positions; Spike protein evolution; hACE2, human angiotensin converting enzyme 2
Year: 2021 PMID: 33456724 PMCID: PMC7802526 DOI: 10.1016/j.csbj.2021.01.006
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1. Results of the S3Det MCA analysis based on the full β-CoVs family and Sarbecovirus subgroup. (A-B) Results of the S3Det MCA analysis showing the subfamily segregation and associated amino acid positions obtained for the Sarbecovirus subgroup and for the full β-CoV family, respectively. (C) Phylogenetic tree obtained for the complete β-CoV spike protein family. S3Det subfamilies are shown in different colors and reflect the phylogenetic classification of Betacoronavirus into five subgenera. (D) Phylogenetic tree obtained for the Sarbecovirus subgroup. S3Det clusters are highlighted in red, green and blue. Both SARS-CoV-2 and RaTG13 are clustered together with SARS-CoV and other members of Sarbecovirus clade 1. Phylogenetic trees were built using PhyML [35]. Spike protein sequences from human pathogenic CoVs are indicated in bold. Host species are shown for some of the nodes as dark silhouettes. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 2. Frequency distribution of SDPs across different domains of the spike sequence from five human pathogenic β-CoVs. Protein domains are denoted as follows: SS, signal sequence; NTD, N-terminal domain; RBD, receptor binding domain; FP, fusion peptide; HR1, heptad repeat 1; HR2, heptad repeat 2. Interdomain regions are denoted by ID followed by an integer according to the order in which they appear in the sequence. Dashed vertical lines denote the S1/S2 subunit boundaries.
Fig. 3. Structural localization of SDPs. 3D structure of the spike protein from SARS-CoV-2 (blue; PDB ID: 6LZG) and SARS-CoV (green; PDB ID: 2AJF) in complex with the human ACE2 cell receptor (yellow). Amino acid residues at the interface are shown as sticks. Intermolecular contacts are shown as dashed black lines. SDPs are highlighted as spheres. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 4. Mutational impact at SDPs across the RBM. (A) Pairwise sequence alignment of SARS-CoV-2 RBM and consensus sequence of Sarbecovirus clade 2. Black triangles indicate amino acid mismatches. SDP positions are depicted as pink letters. (B) Boxplot distributions of ΔΔG values resulting from mutating SDPs and non-SDPs using FoldX (PDB ID: 6LZG). Significant differences were computed using a Wilcoxon unpaired two-sample test (p-value < 0.01). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
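For readers unfamiliar with the unpaired Wilcoxon test named in the Fig. 4 legend, the following is a minimal Python sketch of a two-sided rank-sum (Mann-Whitney) comparison under a normal approximation with tie correction and no continuity correction. It is not the authors' code. The function name `wilcoxon_rank_sum` and the two lists of ΔΔG-like values are hypothetical, made up purely for illustration, and do not come from the paper.

```python
import math
from collections import Counter


def wilcoxon_rank_sum(x, y):
    """Two-sided unpaired Wilcoxon (Mann-Whitney) test, normal approximation.

    Returns (U, z, p). Ties are handled in the variance; no continuity
    correction is applied, so p-values for very small samples are approximate.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # U counts the pairs in which a value from x exceeds a value from y
    # (ties between the two samples count one half).
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    # Tie correction: groups of t equal values reduce the variance of U.
    tie_counts = Counter(list(x) + list(y)).values()
    tie_term = sum(t ** 3 - t for t in tie_counts) / (n * (n - 1))
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term)
    z = (u - n1 * n2 / 2.0) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, z, p


# Hypothetical values for illustration only (not data from the study).
sdp_ddg = [2.1, 3.4, 1.8, 2.9, 4.0]
non_sdp_ddg = [0.3, 0.9, 1.1, 0.2, 0.7, 1.5]
u_stat, z_score, p_value = wilcoxon_rank_sum(sdp_ddg, non_sdp_ddg)
print(f"U = {u_stat}, z = {z_score:.2f}, significant at 0.01: {p_value < 0.01}")
```

For real analyses, an established statistics package with exact small-sample p-values would normally be preferable to this approximation.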
Fig. 5. Coevolution analysis within the RBM. Contact map (8 Å distance cutoff, any atom) over the RBM of the SARS-CoV-2 spike protein. MI-APC contact predictions (among top 500 scores) are shown in blue (true positives) and in red (false positives). SDPs are highlighted in green. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
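The MI-APC score named in the Fig. 5 legend can be illustrated with a short Python sketch: mutual information (MI) is computed for every pair of alignment columns, and the average product correction (APC) subtracts the background expected from each column's mean MI. This assumes the standard APC formulation and is not the authors' pipeline. The function names and the four-sequence toy alignment are hypothetical, and gap characters would simply be treated as an extra symbol here.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_i, col_j):
    """MI (in nats) between two alignment columns of equal length."""
    n = len(col_i)
    freq_i, freq_j = Counter(col_i), Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


def mi_apc(msa):
    """Return {(i, j): MI(i, j) - APC(i, j)} for all column pairs i < j."""
    length = len(msa[0])
    cols = [[seq[k] for seq in msa] for k in range(length)]
    mi = {(i, j): mutual_information(cols[i], cols[j])
          for i, j in combinations(range(length), 2)}
    # Mean MI of each column with every other column.
    col_sum = [0.0] * length
    for (i, j), value in mi.items():
        col_sum[i] += value
        col_sum[j] += value
    col_mean = [s / (length - 1) for s in col_sum]
    overall_mean = sum(mi.values()) / len(mi)
    if overall_mean == 0.0:  # no covariation signal anywhere
        return {pair: 0.0 for pair in mi}
    # APC(i, j) = mean MI of column i * mean MI of column j / overall mean MI.
    return {(i, j): value - col_mean[i] * col_mean[j] / overall_mean
            for (i, j), value in mi.items()}


# Hypothetical toy alignment: columns 0 and 1 covary perfectly,
# column 2 varies independently, column 3 is fully conserved.
toy_msa = ["ACDA", "ACEA", "GTDA", "GTEA"]
scores = mi_apc(toy_msa)
top_pair = max(scores, key=scores.get)
print(top_pair, round(scores[top_pair], 3))
```

In an analysis like the one in the figure, the highest-ranked pairs would then be compared against a structure-derived contact map to label each prediction as a true or false positive.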