Jayanti Saha, Sukanya Bhattacharjee, Monalisha Pal Sarkar, Barnan Kumar Saha, Hriday Kumar Basak, Samarpita Adhikary, Vivek Roy, Parimal Mandal, Abhik Chatterjee, Ayon Pal.
Abstract
The novel coronavirus disease, COVID-19, caused by a positive strand RNA virus (PRV) called SARS-CoV-2, is plaguing the entire planet as we conduct this study. In this study, a multifaceted analysis was carried out employing dinucleotide signature, codon usage and codon context to unravel the genomic as well as genic characteristics of the SARS-CoV-2 isolates and to compare them with other PRVs, which represent some of the most pathogenic human viruses. The main emphasis of this study was to comprehend the codon biology of SARS-CoV-2 against the backdrop of other PRVs such as Poliovirus, Japanese encephalitis virus, Hepatitis C virus, Norovirus, Rubella virus, Semliki Forest virus, Zika virus, Dengue virus, Human rhinoviruses and the Betacoronaviruses, since the codon usage pattern, along with the nucleotide composition prevalent within the viral genome, helps in understanding the biology and evolution of viruses. Our results suggest discrete genomic dinucleotide signatures within the PRVs. Some of the genes from the different SARS-CoV-2 isolates were also found to demonstrate heterogeneity in terms of their dinucleotide signature. The SARS-CoV-2 isolates also demonstrated a codon context trend characteristically dissimilar to the other PRVs. The findings of this study are expected to contribute to the developing global knowledge base in countering COVID-19.
Keywords: CAI, Codon Adaptation Index; CNS, Central Nervous System; COVID-19; CRS, Congenital Rubella Syndrome; CUB, Codon Usage Bias; Codon context; Codon usage bias; Coronaviruses; Fop, Frequency of optimal codons; GC1, Guanine and Cytosine content on the first position of the codon; GC2, Guanine and Cytosine content on the second position of the codon; GC3, Guanine and Cytosine content on the third position of the codon; HCV, Hepatitis C Virus; MERS, Middle East Respiratory Syndrome; MFE, Minimum Free Energy; Nc, Effective Number of Codons; PCA, Principal Component Analysis; PRV, Positive strand RNA Virus; Positive strand RNA virus; RCDI, Relative Codon De-Optimization Index; RSCU, Relative Synonymous Codon Usage; SARS, Severe Acute Respiratory Syndrome; SARS-CoV-2; SARS-CoV-2, Severe Acute Respiratory Syndrome Coronavirus 2; SCUO, Synonymous Codon Usage Order; SiD, Similarity Index
Year: 2021 PMID: 33615042 PMCID: PMC7887452 DOI: 10.1016/j.genrep.2021.101055
Source DB: PubMed Journal: Gene Rep ISSN: 2452-0144
Fig. 1. A parallel plot depicting the relation between genome size and AU/GC content of the positive strand RNA viruses (PRVs) included in this study.
Supplementary Fig. 1. A bar plot depicting the contributions (in percentage) of the different dinucleotide combinations in the first and second dimension of the PCA showing the segregation of the positive strand RNA viruses (PRVs) based on the over- and underrepresentation of the 16 dinucleotide combinations.
Fig. 2. A PCA bi-plot showing the segregation of the positive strand RNA viruses (PRVs) based on the over- and underrepresentation of the 16 dinucleotide combinations. The x-axis represents the 1st dimension, which accounts for 33.9% of the total variance, and the y-axis represents the 2nd dimension, accounting for 20.2% of the total variance.
Fig. 3. A PCA bi-plot showing the segregation of the SARS-CoV-2 isolates based on the over- and underrepresentation of the 16 dinucleotide combinations. The x-axis represents the 1st dimension, which accounts for 16% of the total variance, and the y-axis represents the 2nd dimension, accounting for 11.1% of the total variance.
Supplementary Fig. 2. A PCA bi-plot showing the segregation of the different genes and ORFs of the SARS-CoV-2 isolates based on the base model data of over- and underrepresentation of the different dinucleotide combinations. The dinucleotide combinations contributing most to the variation in the first two dimensions are shown in the plot background.
Supplementary Fig. 3. A PCA plot of the different genes and ORFs of the SARS-CoV-2 isolates based on the codon model data of over- and underrepresentation of the different dinucleotide combinations, demonstrating segregation of the ORFs based on geographical location.
Supplementary Fig. 4. A PCA bi-plot showing the segregation of the different genes and ORFs of the SARS-CoV-2 isolates based on the syncodon model data of over- and underrepresentation of the different dinucleotide combinations. The dinucleotide combinations contributing most to the variation in the first two dimensions are shown in the plot background.
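The dinucleotide over- and underrepresentation referred to in the captions above can be illustrated with a small sketch. This record does not state how the authors computed it, so the code below assumes the widely used observed/expected odds ratio (the frequency of a dinucleotide divided by the product of its two base frequencies) applied to a whole sequence. The function name `dinucleotide_odds_ratios` is hypothetical, and the base, codon and syncodon models named in the captions are not reproduced here.

```python
from collections import Counter
from itertools import product


def dinucleotide_odds_ratios(seq: str) -> dict:
    """Observed/expected ratio for each of the 16 dinucleotides.

    Assumption: ratio = f(XY) / (f(X) * f(Y)), a common convention;
    the source record does not confirm that this exact measure was used.
    """
    seq = seq.upper().replace("T", "U")  # treat DNA-style input as RNA
    bases = "ACGU"
    mono = Counter(b for b in seq if b in bases)
    # Count overlapping pairs, skipping any pair with an ambiguous base.
    pairs = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in bases and seq[i + 1] in bases
    )
    n_mono = sum(mono.values())
    n_pairs = sum(pairs.values())
    if n_mono == 0 or n_pairs == 0:
        raise ValueError("sequence contains no usable dinucleotides")

    ratios = {}
    for x, y in product(bases, repeat=2):
        expected = (mono[x] / n_mono) * (mono[y] / n_mono)
        observed = pairs[x + y] / n_pairs
        # A base that never occurs makes the expectation zero: report NaN.
        ratios[x + y] = observed / expected if expected > 0 else float("nan")
    return ratios
```

Values well above 1 would indicate overrepresentation and values well below 1 underrepresentation. A table of the 16 ratios per genome or gene is the kind of input a PCA such as the ones in these figures could take.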
Fig. 4. An Nc-plot depicting the relationship between GC content at the 3rd codon position, or GC3 (x-axis), and the effective number of codons, or Nc (y-axis), for all the genes present in the genomes of the positive strand RNA viruses (PRVs) included in this study. The dashed blue line represents the Nc expected under the null hypothesis that codon usage bias is due solely to mutation and not selection (Wright, 1990).
Fig. 5. An Nc-plot depicting the relationship between GC content at the 3rd codon position, or GC3 (x-axis), and the effective number of codons, or Nc (y-axis), for all the genes present in the genomes of the SARS-CoV-2 isolates included in this study. The dashed blue line represents the Nc expected under the null hypothesis that codon usage bias is due solely to mutation and not selection (Wright, 1990).
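For readers unfamiliar with the dashed null curve in the Nc-plots above, the following is a minimal sketch of the expected Nc as a function of GC3 given by Wright (1990), Nc = 2 + s + 29 / (s² + (1 − s)²), where s is the GC3 fraction between 0 and 1. The function name `expected_nc` is illustrative and not taken from the paper.

```python
def expected_nc(gc3: float) -> float:
    """Expected effective number of codons under mutation-only bias.

    Implements Wright's (1990) approximation:
        Nc = 2 + s + 29 / (s**2 + (1 - s)**2)
    where s is the GC3 fraction (0 to 1).
    """
    if not 0.0 <= gc3 <= 1.0:
        raise ValueError("gc3 must be a fraction between 0 and 1")
    s = gc3
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)


# Points along the null curve, e.g. for drawing the dashed line.
curve = [(i / 100, expected_nc(i / 100)) for i in range(101)]
```

Genes that fall well below this curve use fewer codons than base composition alone would predict, which is usually read as a sign that forces other than mutational bias are at work.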
Fig. 6. Plots depicting the relationship between GC content at the 3rd codon position, or GC3 (x-axis), and synonymous codon usage order, or SCUO (y-axis), for the genes N, S, orf1ab, orf6, orf7 and orf10 of the SARS-CoV-2 isolates.
Fig. 7. Neutrality plot showing the relationship between GC1/GC2 and GC3 for the different genes of the SARS-CoV-2 genome.
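The quantities behind a neutrality plot can be sketched as follows. The code computes GC1, GC2 and GC3 for a coding sequence and then the least-squares slope of GC12 on GC3 across genes. Averaging GC1 and GC2 into GC12 is a common convention and is an assumption here, since the caption does not say whether the two positions were plotted separately or combined. Both function names are hypothetical.

```python
def gc_by_codon_position(cds: str) -> tuple:
    """Return (GC1, GC2, GC3) as fractions for an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    fractions = []
    for pos in range(3):
        bases = cds[pos:usable:3]  # every base at this codon position
        fractions.append(sum(b in "GC" for b in bases) / len(bases))
    return tuple(fractions)


def neutrality_slope(genes: list) -> float:
    """Least-squares slope of GC12 (y) regressed on GC3 (x) across genes.

    Assumption: GC12 is the mean of GC1 and GC2 for each gene.
    """
    points = []
    for cds in genes:
        gc1, gc2, gc3 = gc_by_codon_position(cds)
        points.append((gc3, (gc1 + gc2) / 2))
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("GC3 does not vary across the supplied genes")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx
```

In the usual reading of such plots, a slope near 1 suggests that mutational pressure shapes all three codon positions alike, while a slope near 0 suggests that the first two positions are constrained independently of GC3.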
Fig. 8. A scatter plot showing the similarity index (SiD) values of the different positive strand RNA viruses (PRVs) included in this study.
Supplementary Fig. 5. A histogram depicting the codon pair ratio (CPR) data of the SARS-CoV-2 isolates, showing five different types of CPR clusters identified using k-means clustering.
Supplementary Fig. 6. A PCA plot showing the segregation of the different SARS-CoV-2 isolates included in this study based on the occurrence of the different codon pairs. The x-axis represents the 1st dimension, which accounts for 55.8% of the total variance, and the y-axis represents the 2nd dimension, accounting for 33.9% of the total variance. The different coloured circles represent the five different clusters of the SARS-CoV-2 isolates based on their codon pair frequency.
Fig. 9. A PCA plot showing the segregation of the different positive strand RNA viruses (PRVs) included in this study based on the occurrence of the different codon pairs. The x-axis represents the 1st dimension, which accounts for 23.3% of the total variance, and the y-axis represents the 2nd dimension, accounting for 11.7% of the total variance. The red dashed square depicts the distribution of the SARS-CoV-2 isolates relative to the other coronaviruses (in the blue square) and the other PRVs (depicted by the green square).
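The codon pair occurrences underlying the codon context plots above can be tallied with a short sketch like the one below. It only counts adjacent codon pairs in a single reading frame. The codon pair ratio (CPR) is not defined in this record, so no attempt is made to reproduce it, and the function name `codon_pair_counts` is hypothetical.

```python
from collections import Counter


def codon_pair_counts(cds: str) -> Counter:
    """Count adjacent codon pairs (5' codon, 3' codon) in a coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # Pair each codon with its immediate downstream neighbour.
    return Counter(zip(codons, codons[1:]))


# Example: four codons give three overlapping pairs.
pairs = codon_pair_counts("ATGGCCATGGCC")
```

Counts of this kind, collected per isolate into a matrix with one column per codon pair, could serve as input to a PCA or to k-means clustering of the sort described in the captions.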