Ankit K Pathak, Gyan Prakash Mishra, Bharathram Uppili, Safal Walia, Saman Fatihi, Tahseen Abbas, Sofia Banu, Arup Ghosh, Amol Kanampalliwar, Atimukta Jha, Sana Fatma, Shifu Aggarwal, Mahesh Shanker Dhar, Robin Marwal, Venkatraman Srinivasan Radhakrishnan, Kalaiarasan Ponnusamy, Sandhya Kabra, Partha Rakshit, Rahul C Bhoyar, Abhinav Jain, Mohit Kumar Divakar, Mohamed Imran, Mohammed Faruq, Divya Tej Sowpati, Lipi Thukral, Sunil K Raghav, Mitali Mukerji.
Abstract
During the COVID-19 pandemic, large-scale genome sequencing of SARS-CoV-2 has been useful in tracking its spread and in identifying variants of concern (VOC). Viral and host factors could contribute to variability within a host, which can be captured in next-generation sequencing reads as intra-host single nucleotide variations (iSNVs). Analysing 1347 samples collected up to June 2020, we recorded 16 410 iSNV sites throughout the SARS-CoV-2 genome. We found that ∼42% of these iSNV sites had been reported as SNVs in consensus sequences submitted to GISAID by 30 September 2020, a share that increased to ∼80% by 30 June 2021. Following this, analysis of another set of 1774 samples sequenced in India between November 2020 and May 2021 revealed that the majority of the Delta (B.1.617.2) and Kappa (B.1.617.1) lineage-defining variations appeared as iSNVs before becoming fixed in the population. In addition, mutations in RdRp, as well as RNA editing by APOBEC and ADAR deaminases, seem to contribute to the differential prevalence of iSNVs in hosts. We also observed hyper-variability at functionally critical residues in the Spike protein that could alter antigenicity and may contribute to immune escape. Thus, tracking and functional annotation of iSNVs in ongoing genome surveillance programs could be important for the early identification of potential variants of concern and for actionable interventions.
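The abstract separates intra-host variation (iSNVs) from consensus-level SNVs. A minimal sketch of that distinction follows, assuming per-position read counts are already available. The thresholds `MIN_DEPTH`, `MIN_ALT_FREQ` and `MAX_ALT_FREQ`, the class `SiteCounts` and the helper names are hypothetical illustrations, not the cut-offs or code used in the study. The sketch also shows how the share of iSNV sites later reported as SNVs could be computed as a set overlap, assuming sites are keyed by (position, ref, alt).

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical thresholds for illustration only; not the study's actual cut-offs.
MIN_DEPTH = 100       # minimum read depth needed to evaluate a position
MIN_ALT_FREQ = 0.05   # below this, the alternate allele is treated as noise
MAX_ALT_FREQ = 0.95   # above this, the change is treated as fixed (consensus SNV)


@dataclass
class SiteCounts:
    position: int
    ref: str
    counts: dict[str, int]  # read counts per base, e.g. {"C": 700, "T": 300}


def classify_site(site: SiteCounts) -> tuple[str, str | None, float]:
    """Return (call, alt_base, alt_frequency); call is 'low_depth', 'ref', 'iSNV' or 'SNV'."""
    depth = sum(site.counts.values())
    if depth < MIN_DEPTH:
        return "low_depth", None, 0.0
    alt_counts = {base: n for base, n in site.counts.items() if base != site.ref}
    if not alt_counts:
        return "ref", None, 0.0
    # Use the most abundant non-reference base as the candidate alternate allele.
    alt = max(alt_counts, key=alt_counts.get)
    freq = alt_counts[alt] / depth
    if freq < MIN_ALT_FREQ:
        return "ref", alt, freq
    if freq > MAX_ALT_FREQ:
        return "SNV", alt, freq
    return "iSNV", alt, freq


def fraction_reported(isnv_sites: set[tuple[int, str, str]],
                      snv_sites: set[tuple[int, str, str]]) -> float:
    """Fraction of iSNV sites that also appear among consensus-level SNV sites."""
    if not isnv_sites:
        return 0.0
    return len(isnv_sites & snv_sites) / len(isnv_sites)


# Made-up read counts at position 14408, for illustration only.
print(classify_site(SiteCounts(14408, "C", {"C": 700, "T": 300})))
```

With these toy counts the site is called an iSNV with alternate base T at frequency 0.3. Picking the single most abundant non-reference base keeps the sketch simple; a real caller would also account for strand bias and base quality.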
Year: 2022 PMID: 35048970 PMCID: PMC8860616 DOI: 10.1093/nar/gkab1297
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Cohort-wise share of samples for Phase 1
| Cohort | No. of samples processed | No. of samples with SARS-CoV-2 reads | No. of samples with iSNVs |
|---|---|---|---|
| East India | 246 | 245 | 190 |
| North India | 77 | 61 | 49 |
| South India | 166 | 166 | 162 |
Number of processed samples for each population, curated from samples submitted to NCBI SRA up to 23 May 2020 (China, Germany, Malaysia, United Kingdom, USA) and from samples sequenced in laboratories in Bhubaneswar, Delhi, and Hyderabad (India) by 11 June 2020.
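As a worked example of reading the table, the share of samples carrying iSNVs can be computed per cohort. The sketch below uses only the three Indian rows listed in the table and takes the samples with SARS-CoV-2 reads as the denominator; that denominator is one reasonable choice, not a definition stated by the authors, and `isnv_shares` is a hypothetical helper.

```python
# Phase 1 counts for the Indian cohorts, copied from the table:
# (processed, with SARS-CoV-2 reads, with iSNVs)
COHORTS = {
    "East India": (246, 245, 190),
    "North India": (77, 61, 49),
    "South India": (166, 166, 162),
}


def isnv_shares(cohorts: dict[str, tuple[int, int, int]]) -> dict[str, float]:
    """Share of samples with SARS-CoV-2 reads that carry at least one iSNV."""
    shares = {name: with_isnvs / with_reads
              for name, (_, with_reads, with_isnvs) in cohorts.items()}
    # Pooled share across all listed cohorts.
    total_reads = sum(row[1] for row in cohorts.values())
    total_isnvs = sum(row[2] for row in cohorts.values())
    shares["All listed cohorts"] = total_isnvs / total_reads
    return shares


for name, share in isnv_shares(COHORTS).items():
    print(f"{name}: {share:.1%}")
```

This prints 77.6% for East India (190/245), 80.3% for North India (49/61), 97.6% for South India (162/166) and 85.0% pooled (401/472).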
Figure 3. Functional impact of hyper-variation on the Spike protein. (A) Dot plot illustrating cohort-wise outliers in Z-score values based on the distribution of the number of iSNVs per sample. (B) Bar plot representing the distribution of iSNV sites (n = 6356) with respect to nucleotide changes in the SARS-CoV-2 genome in hyper-variable samples. (C) Needle plot depicting the distribution of protein-coding changes due to iSNVs in the Spike protein. Non-synonymous, synonymous, and stop variants are shown in orange, blue, and red, respectively. The length of each needle depicts the number of hyper-variable samples (out of 22) in which the residue is altered. Protein domain architecture is indicated as horizontal boxes. (D) Counts of amino acid substitutions at each residue position in the Spike protein. (E) Conservation score of residues in the Spike protein; a score of 9 denotes a highly conserved position and a score of 1 a highly variable one. (F) Selected iSNVs with missense changes mapped onto the Spike trimer structure, where red dots represent altered sites; the other two chains of the trimer are shown in surface representation. (G) Close-up view of the receptor-binding domain (RBD) in surface representation, harbouring the residues altered in hyper-variable samples. Green highlights the ACE2-binding surface and orange highlights the antibody-binding surface. (H) Close-up view of the N-terminal domain (NTD) with altered residues shown in red and N-glycosylation sites in blue. Altered residues are labelled in black with amino acid name and position.
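Panel (A) flags hyper-variable samples as cohort-wise Z-score outliers in the number of iSNVs per sample. A minimal sketch of that idea follows. The cut-off `z_cutoff=3.0`, the use of the sample standard deviation, and the function names are assumptions, since the caption does not state them.

```python
from statistics import mean, stdev


def hypervariable_samples(isnv_counts: dict[str, int], z_cutoff: float = 3.0) -> list[str]:
    """Return sample IDs whose iSNV count has a Z-score above z_cutoff within one cohort.

    z_cutoff and the use of the sample standard deviation are illustrative assumptions.
    """
    if len(isnv_counts) < 2:
        return []  # stdev needs at least two samples
    values = list(isnv_counts.values())
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        return []  # identical counts: no outliers
    return [sample for sample, n in isnv_counts.items() if (n - mu) / sd > z_cutoff]


def hypervariable_by_cohort(cohorts: dict[str, dict[str, int]],
                            z_cutoff: float = 3.0) -> dict[str, list[str]]:
    """Apply the outlier test separately within each cohort (cohort-wise Z-scores)."""
    return {cohort: hypervariable_samples(counts, z_cutoff)
            for cohort, counts in cohorts.items()}


# Toy data: 20 samples with 5 iSNVs each and one sample with 200.
toy = {f"s{i}": 5 for i in range(20)}
toy["s_hyper"] = 200
print(hypervariable_by_cohort({"toy cohort": toy}))
```

The toy cohort yields `{'toy cohort': ['s_hyper']}`. Standardising within each cohort keeps differences in sequencing depth or protocol between cohorts from dominating the outlier call. Note that with the sample standard deviation the largest attainable Z-score is (n − 1)/√n, so a fixed cut-off cannot flag anything in very small cohorts.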
Figure 1. Spectrum of iSNVs in Phase 1 samples. (A) Split plot depicting the distribution of unique iSNV sites (n = 16 410) with respect to the nucleotide change in the SARS-CoV-2 genome and across samples. (B) Box plots comparing iSNV prevalence between wild-type, C14408T and C13730T RdRp mutant samples in India (* P < 3e–3, ** P < 1e–4, *** P < 2e–6) and between C14408T RdRp mutant samples in India and the USA (**** P < 2e–8). (C) Radial plot showing the frequency distribution for selected iSNV sites in China, India and the USA, denoted in red, blue and yellow, respectively. Each concentric ring represents an iSNV frequency range of 0.2; the colour gradients distinguish the populations, and each cell shows the percentage of samples with the iSNV at a given position. The outer labels denote the position, the nucleotide change and the amino acid change. Variations that define the B.1 and B.6 lineages are marked (*) and (**), respectively.
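Panel (A) summarises iSNV sites by nucleotide change, and the abstract links part of this spectrum to RNA editing by APOBEC and ADAR deaminases. One common way to summarise this, shown here as a hedged sketch, is to count unique iSNV sites per substitution class and group C>T/G>A (consistent with APOBEC C-to-U editing on either strand) and A>G/T>C (consistent with ADAR A-to-I editing on either strand). This grouping is a heuristic rather than the authors' analysis; the site list is made up, and `substitution_spectrum` and `signature_fractions` are hypothetical names.

```python
from collections import Counter

# Heuristic: substitution classes consistent with RNA-editing signatures,
# expressed in positive-strand reference coordinates.
SIGNATURE = {
    "C>T": "APOBEC",  # C-to-U on the positive strand
    "G>A": "APOBEC",  # C-to-U on the negative strand
    "A>G": "ADAR",    # A-to-I (read as G) on the positive strand
    "T>C": "ADAR",    # A-to-I on the negative strand
}


def substitution_spectrum(sites: list[tuple[int, str, str]]) -> Counter:
    """Count unique (position, ref, alt) sites per substitution class such as 'C>T'."""
    return Counter(f"{ref}>{alt}" for _, ref, alt in set(sites))


def signature_fractions(spectrum: Counter) -> dict[str, float]:
    """Fraction of unique sites attributable to each candidate editing signature."""
    totals = {"APOBEC": 0, "ADAR": 0, "other": 0}
    for cls, n in spectrum.items():
        totals[SIGNATURE.get(cls, "other")] += n
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {key: 0.0 for key in totals}
    return {key: n / grand_total for key, n in totals.items()}


# Made-up iSNV sites; the duplicated entry is counted once.
toy_sites = [(10, "C", "T"), (20, "C", "T"), (30, "G", "A"),
             (40, "A", "G"), (50, "G", "T"), (20, "C", "T")]
spectrum = substitution_spectrum(toy_sites)
print(signature_fractions(spectrum))
```

For the toy list, five unique sites remain and the result is `{'APOBEC': 0.6, 'ADAR': 0.2, 'other': 0.2}`. A substitution class alone cannot prove enzymatic editing; sequence context around each site is usually needed to support that attribution.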
Figure 2. Spatio-temporal dynamics of iSNVs. (A) Temporal (fortnightly) distribution of prominent lineages identified in SARS-CoV-2 samples submitted across India up to May 2021. Only lineages representing at least 10% of the samples submitted in a given fortnight were included. (B) Line plot showing temporal trends in iSNV frequency and SNV incidence at Delta (B.1.617.2) and Kappa (B.1.617.1) lineage-defining positions in India from November 2020 to May 2021. The left y-axis denotes the average iSNV frequency and the right y-axis the percentage of samples with SNVs, both on a 14-day rolling basis. Red and blue lines illustrate the average iSNV frequency and the percentage of samples with SNVs at the site, respectively.
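Panel (B) tracks, for one lineage-defining position, the average iSNV frequency and the percentage of samples carrying a fixed SNV on a 14-day rolling basis. The sketch below shows one way to compute such a trend. The bounds `ISNV_MIN`/`ISNV_MAX`, the choice to average iSNV frequency only over samples in which the site is an iSNV, the window covering the current day plus the preceding 13 days, and the name `rolling_site_trend` are all assumptions, not details taken from the paper.

```python
from datetime import date, timedelta
from statistics import mean

# Hypothetical frequency bounds separating iSNVs from fixed SNVs (assumption).
ISNV_MIN, ISNV_MAX = 0.05, 0.95


def rolling_site_trend(samples: list[tuple[date, float]], window_days: int = 14):
    """Daily rolling summary for one genomic position.

    samples: (collection_date, alt_allele_frequency) per sample at that position.
    Returns (day, mean iSNV frequency or None, % of samples with an SNV or None)
    for every day from the first to the last collection date. Each window covers
    the current day and the preceding window_days - 1 days.
    """
    if not samples:
        return []
    first = min(d for d, _ in samples)
    last = max(d for d, _ in samples)
    result = []
    day = first
    while day <= last:
        start = day - timedelta(days=window_days - 1)
        window = [f for d, f in samples if start <= d <= day]
        isnv = [f for f in window if ISNV_MIN <= f <= ISNV_MAX]
        n_snv = sum(1 for f in window if f > ISNV_MAX)
        avg_isnv = mean(isnv) if isnv else None
        pct_snv = 100 * n_snv / len(window) if window else None
        result.append((day, avg_isnv, pct_snv))
        day += timedelta(days=1)
    return result


# Made-up allele frequencies at a single lineage-defining position.
toy = [(date(2021, 1, 1), 0.2), (date(2021, 1, 1), 0.98),
       (date(2021, 1, 10), 0.4), (date(2021, 1, 20), 0.99)]
for day, avg_isnv, pct_snv in rolling_site_trend(toy):
    print(day, avg_isnv, pct_snv)
```

Under these assumptions a rise in the red (iSNV) curve that precedes a rise in the blue (SNV) curve is the pattern the caption describes for Delta and Kappa positions. The quadratic scan over samples is fine for a sketch; a sorted list with two pointers would scale better.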