Abstract
Emerging mutations and genotypes of the SARS-CoV-2 virus, responsible for the COVID-19 pandemic, have been reported globally. In Costa Rica during 2020, a predominant genotype carrying the mutation T1117I in the spike (S:T1117I) was previously identified. To investigate the possible effects of this mutation on the function of the spike, and thus on the biology of the virus, different bioinformatic pipelines based on phylogeny, natural selection and co-evolutionary models, molecular docking, and epitope prediction were implemented. The phylogeny of sequences carrying S:T1117I worldwide showed a polyphyletic group, with the emergence of local lineages. In Costa Rica, the mutation is found in the lineage B.1.1.389 and is suggested to be a product of positive/adaptive selection. Different changes in the function of the spike protein and a more stable interaction with a ligand (the drug nelfinavir) were found. Only one epitope out of 742 in the spike was affected by the mutation, with some different properties, but suggesting scarce changes in the immune response and no influence on vaccine effectiveness. Jointly, these results suggest a partial benefit of the mutation for the spread of the virus with this genotype during 2020 in Costa Rica, although possibly not strong enough to withstand the introduction of new lineages during early 2021, which later became predominant. In addition, the bioinformatic analyses used here can be applied as an in silico strategy to study other mutations of interest for the SARS-CoV-2 virus and other pathogens.
Keywords: COVID-19; Costa Rica; Lineage B.1.1.389; SARS-CoV-2; T1117I
Year: 2022 PMID: 35155843 PMCID: PMC8824091 DOI: 10.1016/j.genrep.2022.101554
Source DB: PubMed Journal: Gene Rep ISSN: 2452-0144
Fig. 1. Phylogenetic tree of SARS-CoV-2 genome sequences from around the world carrying the mutation T1117I in the spike (S:T1117I). The 1155 sequences with this variant available in the GISAID database (until April 30th, 2021) are distributed according to PANGOLIN lineages. Six lineages are predominant with a frequency > 5% (colors), each one with a monophyletic origin. The remaining lineages (frequency < 5%) are represented in gray. B.1.1.389 was the only lineage that carries S:T1117I as a characteristic mutation (marker for the lineage), unlike the other groups, in which this mutation is not widely found among the genomes of the lineage. In addition, B.1.1.389 is the only S:T1117I-carrying lineage with a relatively high prevalence of 22% in a specific location (Costa Rica), unlike other lineages with a prevalence < 0.5% in other countries.
Fig. 2. Epidemiological and genomic determinants of the B.1.1.389 lineage. (A) The whole sequence of the SARS-CoV-2 reference genome is represented, with genes identified by different colors. Mutations of B.1.1.389 are represented using circles. The lineage is characterized by the presence of eight mutations, including two in the spike: the D614G and T1117I variants. (B–C) The B.1.1.389 lineage has been found in all seven provinces of Costa Rica (prevalence range 10–41%), reaching up to 22% of all sequences from Costa Rica. (D) Relative frequencies of different lineages among all the sequences from Costa Rica over time are represented using different colors. New lineages started to circulate during 2021 in Costa Rica, including A.2.5, A.2.5.1, A.2.5.2, B.1.1.7 (alpha variant), and P.1 (gamma variant), with a subsequent reduction of B.1.1.389, which was dominant during 2020. Details of the dominant lineages over time are found in Supplementary Table S1.
Analysis of positive selection of mutations by a Fast Unconstrained Bayesian AppRoximation (FUBAR) among protein sequences of the spike of the SARS-CoV-2 from Costa Rican cases of COVID-19.
| Codon | Probability [dN/dS > 1] | Empirical Bayes factor (EBF) [dN/dS > 1] | Potential scale reduction factor (PSRF) | Effective sample size (N_eff) |
|---|---|---|---|---|
| 26 | 0.96 | 31.57 | 1.01 | 220.44 |
| 5 | 0.95 | 29.79 | 1.01 | 218.56 |
| 80 | 0.95 | 28.63 | 1.01 | 239.95 |
| 1117 | 0.94 | 22.93 | 1.01 | 244.71 |
| 677 | 0.94 | 22.04 | 1.01 | 245.94 |
| 1118 | 0.93 | 18.41 | 1 | 639.92 |
| 501 | 0.91 | 14.36 | 1 | 337.09 |
| 1027 | 0.9 | 13.24 | 1 | 499.11 |
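
As a reading aid for the table above, the following minimal sketch filters the reported codons by their posterior probability of dN/dS > 1. The cutoff of 0.9 and the names `fubar_sites` and `selected_codons` are assumptions made for illustration; the values are copied from the table, and this is not the authors' pipeline.

```python
# Posterior probabilities and empirical Bayes factors copied from the table.
fubar_sites = [
    {"codon": 26, "prob": 0.96, "ebf": 31.57},
    {"codon": 5, "prob": 0.95, "ebf": 29.79},
    {"codon": 80, "prob": 0.95, "ebf": 28.63},
    {"codon": 1117, "prob": 0.94, "ebf": 22.93},
    {"codon": 677, "prob": 0.94, "ebf": 22.04},
    {"codon": 1118, "prob": 0.93, "ebf": 18.41},
    {"codon": 501, "prob": 0.91, "ebf": 14.36},
    {"codon": 1027, "prob": 0.90, "ebf": 13.24},
]

THRESHOLD = 0.9  # assumed cutoff, not stated in the source


def selected_codons(sites, threshold=THRESHOLD):
    """Return the codons whose posterior probability reaches the threshold."""
    return sorted(s["codon"] for s in sites if s["prob"] >= threshold)


if __name__ == "__main__":
    print(selected_codons(fubar_sites))
    print(1117 in selected_codons(fubar_sites))
```

Under this assumed cutoff, all eight listed codons, including codon 1117, are retained; a stricter cutoff would keep fewer sites.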
Fig. 3. Mutual coevolutionary relationship between residues in the spike protein of SARS-CoV-2 from Costa Rican cases of COVID-19. After a multiple sequence alignment of the protein sequences, the corrected Mutual Information (MI) was used to identify correlations between positions and their possible effect on the structure or function of the spike protein. The protein sequence is presented as a circular plot, residue by residue. The most impacted residues (orange connections) are part of the RBD (positions 319–541), including the site of the mutation N501Y. For the other residues, including the site of the mutation S:T1117I, no drastic effects are predicted according to the correlation metrics, which are presented with gray connections.
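
To make the idea of mutual information between alignment positions concrete, here is a minimal sketch that computes plain (uncorrected) MI between two columns of a tiny, hypothetical alignment. The correction applied in the analysis described above is not reproduced here, and `toy_alignment` and `mutual_information` are illustrative names only.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Plain mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi


# Hypothetical toy alignment: columns 0 and 1 co-vary, column 2 is conserved.
toy_alignment = ["ATK", "ATK", "GSK", "GSK"]
columns = list(zip(*toy_alignment))

if __name__ == "__main__":
    print(mutual_information(columns[0], columns[1]))  # co-varying pair
    print(mutual_information(columns[0], columns[2]))  # conserved column
```

In this toy case the co-varying pair carries one bit of information and the pair involving the conserved column carries none, which is the kind of contrast the circular plot encodes with orange and gray connections.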
Fig. 4. Molecular docking of the protein-ligand complex using the mutated spike protein (lineage B.1.1.389) and the drug nelfinavir. The drug was docked into the HR1 region of the S2 domain of the spike, using the WT or the mutated (S:D614G and S:T1117I) protein. The predicted affinity energy indicated a more stable complex for the mutated protein than for the wild type.
Molecular docking between nelfinavir and the spike protein of the SARS-CoV-2 for WT and the mutated sequences.
| Parameter | WT spike sequence (reference NC_045512.2) | Mutated spike from lineage B.1.1.389 (mutations S:D614G and S:T1117I) |
|---|---|---|
| Affinity (kcal/mol) | −9.231 | −9.656 |
| Total energy (kcal/mol) | 46,498.778 | 46,499.431 |
| van der Waals energy (kcal/mol) | −22.798 | −28.593 |
| Electrostatic energy (kcal/mol) | −25.596 | −19.015 |
Best score for stability.
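
The comparison in the docking table can be restated as differences (mutated minus WT) for each parameter. The sketch below only does this arithmetic on the tabulated values; the names `docking` and `delta` are illustrative, and a more negative affinity is read here as a more stable predicted complex, in line with the Fig. 4 caption.

```python
# (WT, mutated) values in kcal/mol, copied from the docking table.
docking = {
    "affinity": (-9.231, -9.656),
    "total_energy": (46498.778, 46499.431),
    "van_der_waals": (-22.798, -28.593),
    "electrostatic": (-25.596, -19.015),
}


def delta(parameter):
    """Difference mutated minus WT, rounded to the table's precision."""
    wt, mutated = docking[parameter]
    return round(mutated - wt, 3)


if __name__ == "__main__":
    for name in docking:
        print(f"{name}: {delta(name):+.3f} kcal/mol")
```

On these numbers, the affinity and van der Waals terms are lower for the mutated spike, while the electrostatic term is higher and the total energy is nearly unchanged.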
Comparison of epitopes associated with the mutation S:T1117I of the SARS-CoV-2.
| Feature | Spike WT sequence | Spike with the mutation T1117I (lineage B.1.1.389) |
|---|---|---|
| Peptides in the spike as candidate epitopes | 742 peptides (linear or conformational) | |
| Peptides overlapping position 1117 in the spike | Only one: QRNFYEPQIITTDNTFVSGN | |
| Sequence of the selected epitope | QRNFYEPQIITTDNTFVSGN | QRNFYEPQIITIDNTFVSGN |
| Hydrophobicity | −0.19 | −0.15 |
| Charge | −1 | −1 |
| Molecular weight | 2344.82 | 2356.88 |
| Toxicity and allergenicity analysis | Non-toxic and probable allergen | Non-toxic and probable allergen |
| B-cell epitope prediction: antigenic region and global score (average) | YEPQIITTDNTF | YEP |
| MHC-I processing prediction (HLA-A*01:01): sequence with highest affinity | NFYEPQIITTDNTF | NFYEPQIITIDNTF |
| MHC-I binding prediction (HLA-A*01:01): sequence with highest affinity | TTDNTFVS | FYEPQIITI |
| MHC-II binding prediction (HLA-DRB1*01:01): sequence with highest score | EPQIITTDNTFVSGN | EPQIITIDNTFVSGN |
Best score in the comparison.
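
As a small check on the epitope table, the sketch below locates the residue that differs between the WT and mutated peptides and, assuming that this single substitution is the one at spike position 1117, derives the spike position of the first residue of the peptide. The function name `substitutions` is illustrative.

```python
WT_PEPTIDE = "QRNFYEPQIITTDNTFVSGN"
MUT_PEPTIDE = "QRNFYEPQIITIDNTFVSGN"
MUTATED_POSITION = 1117  # spike position of the substitution, per the table


def substitutions(wt, mut):
    """List (0-based index, WT residue, mutated residue) for each mismatch."""
    if len(wt) != len(mut):
        raise ValueError("peptides must have the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(wt, mut)) if a != b]


diffs = substitutions(WT_PEPTIDE, MUT_PEPTIDE)
index, wt_residue, mut_residue = diffs[0]
# Assumption: the single mismatch corresponds to spike position 1117.
peptide_start = MUTATED_POSITION - index

if __name__ == "__main__":
    print(diffs)
    print(f"{wt_residue}{MUTATED_POSITION}{mut_residue}, "
          f"peptide spans {peptide_start}-{peptide_start + len(WT_PEPTIDE) - 1}")
```

The two peptides differ at a single residue (T to I), consistent with the table's statement that only one candidate epitope overlaps the mutated position.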