Sanket Desai1,2, Aishwarya Rane1, Asim Joshi1,2, Amit Dutt3,4,5.
Abstract
BACKGROUND: Rapid analysis of SARS-CoV-2 genomic data plays a crucial role in surveillance and in the adoption of measures for controlling the spread of COVID-19. Fast, inclusive and adaptive methods are required for the heterogeneous SARS-CoV-2 sequence data generated at an unprecedented rate.
Entities:
Keywords: Next-generation sequencing; Pathogen analysis pipeline; Phylogenetic clade analysis; SARS-CoV-2
Year: 2021 PMID: 33985433 PMCID: PMC8118100 DOI: 10.1186/s12859-021-04172-x
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Global distribution and gene-wise mutation analysis of the SARS-CoV-2 genome mutations. a Genomic hotspot mutations (recurrence > 40,000 samples) distribution across the genome. Mutations have been labelled with protein change in the plot. The intergenic and synonymous mutations are colored grey. The gene annotation track on the x-axis is not to scale. b Proportion of synonymous and nonsynonymous mutations across all the SARS-CoV-2 genes. c Proportion of mutated/non-mutated bases across the SARS-CoV-2 gene features. The dotted line indicates an average fraction of mutated residues per feature (~0.8)
Fig. 2 Overlap of variants recurring among the emerging strains (B.1.1.7, B.1.351 and P.1) and Indian samples. a Variants recurring in at least 50 per cent of analyzed samples are overlapped with variants in Indian samples. b Variants common across all the strains, including Indian samples, and private clade-defining variants in the S protein across the emerging SARS-CoV-2 strains
Fig. 3 Factors affecting the accuracy of the IPD 2.0 SARS-CoV-2 clade prediction module. a Clade prediction accuracy for samples with different genome coverage. b Comparison of prediction accuracy based on the number of variants obtained per sample. c Variation in the clade prediction accuracy based on the background mutation rate of the SARS-CoV-2 genomes