Ruksana Aziz1, Piyali Sen2, Pratyush Kumar Beura1, Saurav Das1, Debapriya Tula3, Madhusmita Dash4, Nima Dondu Namsa1,5, Ramesh Chandra Deka5,6, Edward J Feil7, Siddhartha Sankar Satapathy2,5, Suvendra Kumar Ray1,5.
Abstract
A common approach to estimating the strength and direction of selection acting on protein-coding sequences is to calculate the dN/dS ratio. The method has been widely used, and many critical reviews of its application have appeared since it was proposed by Nei and Gojobori in 1986. However, the method is still evolving to account for non-uniform substitution rates and pretermination codons. In our study of SNPs in 586 genes across 156 Escherichia coli strains, synonymous polymorphism was higher in 2-fold degenerate codons than in 4-fold degenerate codons. This could be attributed to the difference between transition (Ti) and transversion (Tv) substitution rates: in general, the average rate of a transition is four times that of a transversion. We considered both the Ti/Tv ratio and nonsense mutations in pretermination codons to improve the estimates of synonymous (S) and non-synonymous (NS) sites, which improved the accuracy of the dN/dS estimate. We showed that applying the modified approach, based on the Ti/Tv ratio and pretermination codons, results in higher values of dN/dS in 29 common genes of equal reading frames between E. coli and Salmonella enterica. This study emphasizes the robustness of amino acid composition with varying codon degeneracy, as well as the role of pretermination codons, when calculating dN/dS values.
Keywords: dN/dS; pretermination codon; synonymous/non-synonymous sites; transition; transversion
Year: 2022 PMID: 35920776 PMCID: PMC9358017 DOI: 10.1093/dnares/dsac023
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.477
S and NS sites of codons in the genetic code table by the original method
| First base | U: S | U: NS | C: S | C: NS | A: S | A: NS | G: S | G: NS | Third base |
|---|---|---|---|---|---|---|---|---|---|
| U | 0.333 | 2.667 | 1.000 | 2.000 | 0.333 | 2.667 | 0.333 | 2.667 | U |
| U | 0.333 | 2.667 | 1.000 | 2.000 | 0.333 | 2.667 | 0.333 | 2.667 | C |
| U | 0.667 | 2.333 | 1.000 | 2.000 | N/A | N/A | N/A | N/A | A |
| U | 0.667 | 2.333 | 1.000 | 2.000 | N/A | N/A | 0.000 | 3.000 | G |
| C | 1.000 | 2.000 | 1.000 | 2.000 | 0.333 | 2.667 | 1.000 | 2.000 | U |
| C | 1.000 | 2.000 | 1.000 | 2.000 | 0.333 | 2.667 | 1.000 | 2.000 | C |
| C | 1.333 | 1.667 | 1.000 | 2.000 | 0.333 | 2.667 | 1.333 | 1.667 | A |
| C | 1.333 | 1.667 | 1.000 | 2.000 | 0.333 | 2.667 | 1.333 | 1.667 | G |
| A | 0.667 | 2.333 | 1.000 | 2.000 | 0.333 | 2.667 | 0.333 | 2.667 | U |
| A | 0.667 | 2.333 | 1.000 | 2.000 | 0.333 | 2.667 | 0.333 | 2.667 | C |
| A | 0.667 | 2.333 | 1.000 | 2.000 | 0.333 | 2.667 | 0.667 | 2.333 | A |
| A | 0.000 | 3.000 | 1.000 | 2.000 | 0.333 | 2.667 | 0.667 | 2.333 | G |
| G | 1.000 | 2.000 | 1.000 | 2.000 | 0.333 | 2.667 | 1.000 | 2.000 | U |
| G | 1.000 | 2.000 | 1.000 | 2.000 | 0.333 | 2.667 | 1.000 | 2.000 | C |
| G | 1.000 | 2.000 | 1.000 | 2.000 | 0.333 | 2.667 | 1.000 | 2.000 | A |
| G | 1.000 | 2.000 | 1.000 | 2.000 | 0.333 | 2.667 | 1.000 | 2.000 | G |

Column groups U, C, A and G give the second base of the codon; N/A marks a stop codon.
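The S and NS values in the table above can be reproduced with a short calculation. The following is a minimal sketch, assuming the standard genetic code and assuming that each of the nine possible single-nucleotide changes of a codon carries an equal weight of 1/3, with changes to a stop codon counted as non-synonymous (so that NS = 3 - S). The function name `sites_equal_weight` is illustrative and does not come from the paper.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons enumerated in UCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def sites_equal_weight(codon):
    """Return (S, NS) for a sense codon, weighting every single change by 1/3.

    Assumption: changes to a stop codon are counted as non-synonymous,
    which is what NS = 3 - S implies.
    """
    aa = CODE[codon]
    if aa == "*":
        raise ValueError("stop codons have no S/NS sites")
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == aa:
                s += 1.0 / 3.0
    return round(s, 3), round(3.0 - s, 3)


if __name__ == "__main__":
    for codon in ("UUU", "UUA", "CUA", "UGG"):
        print(codon, sites_equal_weight(codon))
```

Under these assumptions, UUU has a single synonymous change (to UUC), whereas CUA has four (one at the first position and three at the third).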
S and NS sites of codons in the genetic code table, accounting for a transition being four times more frequent than a transversion and for nonsense substitutions in the pretermination codons
| First base | U: S | U: NS | C: S | C: NS | A: S | A: NS | G: S | G: NS | Third base |
|---|---|---|---|---|---|---|---|---|---|
| U | 0.667 | 2.333 | 1.000 | 2.000 | 0.667 | 2.000 | 0.667 | 2.167 | U |
| U | 0.667 | 2.333 | 1.000 | 2.000 | 0.667 | 2.000 | 0.667 | 2.167 | C |
| U | 1.333 | 1.333 | 1.000 | 1.667 | N/A | N/A | N/A | N/A | A |
| U | 1.333 | 1.500 | 1.000 | 1.833 | N/A | N/A | 0.000 | 1.667 | G |
| C | 1.000 | 2.000 | 1.000 | 2.000 | 0.667 | 2.333 | 1.000 | 2.000 | U |
| C | 1.000 | 2.000 | 1.000 | 2.000 | 0.667 | 2.333 | 1.000 | 2.000 | C |
| C | 1.667 | 1.333 | 1.000 | 2.000 | 0.667 | 1.667 | 1.167 | 1.167 | A |
| C | 1.667 | 1.333 | 1.000 | 2.000 | 0.667 | 1.667 | 1.167 | 1.833 | G |
| A | 0.833 | 2.167 | 1.000 | 2.000 | 0.667 | 2.333 | 0.667 | 2.333 | U |
| A | 0.833 | 2.167 | 1.000 | 2.000 | 0.667 | 2.333 | 0.667 | 2.333 | C |
| A | 0.333 | 2.667 | 1.000 | 2.000 | 0.667 | 2.167 | 0.833 | 2.000 | A |
| A | 0.000 | 3.000 | 1.000 | 2.000 | 0.667 | 2.167 | 0.833 | 2.167 | G |
| G | 1.000 | 2.000 | 1.000 | 2.000 | 0.667 | 2.333 | 1.000 | 2.000 | U |
| G | 1.000 | 2.000 | 1.000 | 2.000 | 0.667 | 2.333 | 1.000 | 2.000 | C |
| G | 1.000 | 2.000 | 1.000 | 2.000 | 0.667 | 2.167 | 1.000 | 1.833 | A |
| G | 1.000 | 2.000 | 1.000 | 2.000 | 0.667 | 2.167 | 1.000 | 2.000 | G |

Column groups U, C, A and G give the second base of the codon; N/A marks a stop codon.
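The modified site counts can be illustrated in the same way. The sketch below rests on an inferred weighting scheme, not one stated in this record: at each codon position the single transition receives weight 4/6 and each of the two transversions 1/6 (a transition four times as frequent as a transversion), and changes that create a stop codon are tallied separately instead of being counted as S or NS. Under those assumptions the output agrees with the tabulated values for the codons checked here. The names `sites_weighted` and `ti_tv` are illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons enumerated in UCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
PURINES = {"A", "G"}


def is_transition(a, b):
    """A<->G and C<->U are transitions; every other change is a transversion."""
    return (a in PURINES) == (b in PURINES)


def sites_weighted(codon, ti_tv=4.0):
    """Return (S, NS, nonsense) for a sense codon.

    Assumptions: a transition is `ti_tv` times as frequent as a single
    transversion, and stop-creating changes are excluded from S and NS.
    """
    aa = CODE[codon]
    if aa == "*":
        raise ValueError("stop codons have no S/NS sites")
    w_ti = ti_tv / (ti_tv + 2.0)  # the one transition at a position
    w_tv = 1.0 / (ti_tv + 2.0)    # each of the two transversions
    s = ns = nonsense = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            weight = w_ti if is_transition(codon[pos], base) else w_tv
            mutant_aa = CODE[codon[:pos] + base + codon[pos + 1:]]
            if mutant_aa == "*":
                nonsense += weight
            elif mutant_aa == aa:
                s += weight
            else:
                ns += weight
    return round(s, 3), round(ns, 3), round(nonsense, 3)


if __name__ == "__main__":
    for codon in ("UUU", "UUA", "UGG", "CGA"):
        print(codon, sites_weighted(codon))
```

For a pretermination codon such as UUA, S and NS no longer sum to 3, because the two stop-creating changes at the second position (to UAA and UGA) are set aside.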
Figure 1. fS values of different codons in E. coli. The fS value of a codon is the ratio of the observed number of synonymous polymorphisms to the expected number of synonymous polymorphisms in that codon. The 59 codons are on the x-axis, and each vertical bar represents the fS value of an individual codon. The values of the two-fold degenerate (TFD) codons (on the right side of the graph) are evidently higher than those of the four-fold degenerate (FFD) codons. In the case of Ser, the fS values of the split-box codons (AGY) resemble those of TFD codons, whereas those of the family-box codons (UCN) resemble those of FFD codons.
Figure 2. Box-plot comparison of fS values between TFD and FFD codons. Box plot of the fS values (So/Se) of synonymous polymorphism observed across TFD and FFD codons. The fS values of TFD codons range between 5.19 and 8.36, with mean 7.0713, s.d. 7.3, whereas those of FFD codons range between 2.3 and 2.77, with mean 2.48, s.d. 2.49. The values are significantly different (P < 0.01) between TFD and FFD codons. The Mann–Whitney U-test was performed using the site https://www.socscistatistics.com/tests/mannwhitney/default2.aspx (25 June 2022, date last accessed).
Figure 3. (a) Scatter plot of the dN/dS values of the 29 sample genes by the proposed method. (b) Scatter plot of the dN/dS values of the 29 sample genes using MEGA-X. In (a), dN/dS values of the 29 sample genes between E. coli and S. enterica were calculated using the simple approach and the method proposed in this study (Materials and Methods). dN/dS by the proposed method, which accounts for Ti/Tv and nonsense substitutions, is higher than that by the old method in every gene. Only single substitutions were considered in the calculation of dN/dS. In (b), dN/dS values of the same 29 genes of E. coli and S. enterica were calculated in MEGA-X using the Nei-Gojobori method and the modified Nei-Gojobori method (considering Ti/Tv ratio = 2); in these methods, double and triple mutations are also considered along with single substitutions.
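To make the link between site counts and dN/dS concrete, the sketch below follows the general Nei-Gojobori recipe: the proportions of synonymous and non-synonymous differences (pS = Sd/S, pN = Nd/N) are converted to distances with the Jukes-Cantor correction, and their ratio is taken. Whether the study applied this correction is not stated in this record, so it is an assumption here. The function names and the counts in the usage lines are hypothetical and are not data from the paper.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor distance for a proportion of differences p (requires p < 0.75)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("proportion must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(syn_diffs, nonsyn_diffs, syn_sites, nonsyn_sites):
    """Return dN/dS from observed differences and site totals for a gene pair."""
    d_s = jukes_cantor(syn_diffs / syn_sites)
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    if d_s == 0.0:
        raise ZeroDivisionError("dS is zero; dN/dS is undefined")
    return d_n / d_s


if __name__ == "__main__":
    # Hypothetical counts: identical observed differences, two sets of site totals.
    print(dn_ds(10, 5, syn_sites=100, nonsyn_sites=300))
    print(dn_ds(10, 5, syn_sites=110, nonsyn_sites=280))
```

With the observed differences held fixed, raising the synonymous site total and lowering the non-synonymous one both push the ratio upward, which is the direction of change the caption reports for the proposed method.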