Guangyu Wang, Shixiang Sun, Zhang Zhang.
Abstract
The second law of thermodynamics states that entropy, as a measure of randomness in a system, increases over time. Although studies have investigated biological sequence randomness from different aspects, it remains unknown whether sequence randomness changes over time and whether this change is consistent with the second law of thermodynamics. To capture the dynamics of randomness in molecular sequence evolution, we detect sequence randomness with a collection of eight statistical randomness tests and investigate how the randomness of coding sequences varies in Escherichia coli. Given that core/essential genes are more ancient than specific/non-essential genes, our results show that core/essential genes are more random than specific/non-essential genes and accordingly indicate that sequence randomness increases over time, consistent with the second law of thermodynamics. We further find that an increase in sequence randomness leads to increasing randomness of GC content and longer sequence length. Taken together, our study presents an important finding, for the first time, that sequence randomness increases over time, which may provide insights into the underlying mechanisms of molecular sequence evolution.
Year: 2016 PMID: 27224236 PMCID: PMC4880282 DOI: 10.1371/journal.pone.0155935
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
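The abstract describes detecting randomness with a collection of statistical tests. As an illustration of one such test, the following is a minimal sketch of a monobit frequency test applied to a nucleotide sequence. The 2-bit encoding in `ENCODING` is a hypothetical choice made only for this example; the record does not say how sequences were converted to bits, and the function names are likewise illustrative.

```python
import math

# Hypothetical 2-bit encoding, assumed for illustration only;
# the source does not specify how nucleotides map to bits.
ENCODING = {"A": "00", "C": "01", "G": "10", "T": "11"}


def to_bits(seq):
    """Translate a DNA string into a list of 0/1 integers using ENCODING."""
    return [int(bit) for base in seq.upper() for bit in ENCODING[base]]


def frequency_test(bits):
    """Monobit frequency test: P-value for the balance of ones and zeros."""
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)  # +1 for each 1, -1 for each 0
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


# Toy inputs, not data from the study.
balanced = frequency_test(to_bits("ACGT" * 50))  # equal numbers of 0s and 1s
skewed = frequency_test(to_bits("T" * 200))      # all bits are 1
```

A small P-value suggests that the proportions of ones and zeros deviate from what a random bit stream would give; the threshold used to call a sequence random is a separate choice.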
The content-centric re-organization of the genetic code.
| 2nd base | 1st base A | 1st base T | 1st base G | 1st base C |
|---|---|---|---|---|
| A | AAR(K) | TAR(St) | GAR(E) | CAR(Q) |
| A | AAY(N) | TAY(Y) | GAY(D) | CAY(H) |
| T | ATR(I, M) | TTR(L) | GTN(V) | CTN(L) |
| T | ATY(I) | TTY(F) | GTN(V) | CTN(L) |
| G | AGR(R) | TGR(St, W) | GGN(G) | CGN(R) |
| G | AGY(S) | TGY(C) | GGN(G) | CGN(R) |
| C | ACN(T) | TCN(S) | GCN(A) | CCN(P) |
Note: N represents any nucleotide; R represents A or G; Y represents T or C; St indicates a stop codon.
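The degenerate symbols defined in the note can be expanded mechanically into concrete codons. The sketch below does this for the R/Y patterns listed in the table above; the names `DEGENERATE`, `expand`, and `RY_CELLS` are illustrative and not taken from the source.

```python
from itertools import product

# Degenerate symbols as defined in the table note.
DEGENERATE = {
    "A": "A", "T": "T", "G": "G", "C": "C",
    "R": "AG",    # A or G
    "Y": "TC",    # T or C
    "N": "ATGC",  # any nucleotide
}

# R/Y patterns and their amino acids (St = stop), copied from the table.
RY_CELLS = {
    "AAR": "K", "TAR": "St", "GAR": "E", "CAR": "Q",
    "AAY": "N", "TAY": "Y", "GAY": "D", "CAY": "H",
    "ATR": "I, M", "TTR": "L", "ATY": "I", "TTY": "F",
    "AGR": "R", "TGR": "St, W", "AGY": "S", "TGY": "C",
}


def expand(pattern):
    """Return every concrete codon matched by a degenerate pattern."""
    return ["".join(c) for c in product(*(DEGENERATE[s] for s in pattern))]


# Collect all concrete codons covered by the R/Y cells.
ry_codons = {codon for pattern in RY_CELLS for codon in expand(pattern)}
```

For example, `expand("AAR")` yields the two lysine codons AAA and AAG, and a pattern ending in N expands to four codons.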
Fig 1. Random and non-random clusters based on 8 statistical tests.
Bars are color-coded by different ranges of P-value.
Fig 2. Proportion of random sequences in E. coli.
Statistical test of randomness between essential and non-essential genes.
| Cluster | Essential Genes | Non-essential Genes | P-value (2×2 test) |
|---|---|---|---|
| Random | 418 | 2120 | <0.0001 |
| Nonrandom | 109 | 836 | |
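The comparison in this table can be worked through from the four counts. The table does not name the 2×2 test that was used, so the sketch below assumes a Pearson chi-square test without continuity correction; under that assumption the P-value it computes need not match the reported figure exactly. The function name is illustrative.

```python
import math

# Counts copied from the table: rows are clusters, columns are gene classes.
random_ess, random_non = 418, 2120
nonrandom_ess, nonrandom_non = 109, 836


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) and P-value, 1 df.

    Assumed test; the source only labels the column "2x2".
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function for 1 df
    return chi2, p


chi2, p = pearson_chi2_2x2(random_ess, random_non, nonrandom_ess, nonrandom_non)

# Share of genes in each class that fall in the random cluster.
frac_ess = random_ess / (random_ess + nonrandom_ess)
frac_non = random_non / (random_non + nonrandom_non)

print(f"random fraction: essential {frac_ess:.1%}, non-essential {frac_non:.1%}")
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```

The two fractions make the direction of the effect explicit: a larger share of essential genes than of non-essential genes falls in the random cluster.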
Percentage of genes that equally use codons in PDH and PRH.
| Pan-genome group | Percentage |
|---|---|
| Core | 77.6% |
| Medium-Core | 75.9% |
| Medium | 65.4% |
| Medium-Specific | 53.5% |
| Specific | 46.8% |
* P-value < 0.05 (the frequency test)
Fig 3. Variation of GC contents in the E. coli pan-genome.
Random and nonrandom sequences are examined separately and each dot represents the average of GC content across a specific gene set.
Fig 4. Length of coding sequences in the E. coli pan-genome.
Random and nonrandom sequences are examined separately and each bar represents the average of sequence length across a specific gene set.
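The per-set averages plotted in Figs 3 and 4 are simple summaries of GC content and sequence length. A minimal sketch of how such averages could be computed follows; the function names and the toy sequences are invented for illustration and are not data from the study.

```python
from statistics import mean


def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def summarize(gene_set):
    """Mean GC content and mean length over a list of coding sequences."""
    return mean(gc_content(s) for s in gene_set), mean(len(s) for s in gene_set)


# Toy sequences, invented for illustration only.
toy_set = ["ATGGCGTAA", "ATGAAATTTTGA"]
mean_gc, mean_len = summarize(toy_set)
```

Applying `summarize` separately to the random and nonrandom sequences of each pan-genome group would give one point per group and cluster, which is the layout the two captions describe.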