Tatiana Tatarinova, Bilal Salih, Jennifer Dien Bard, Irit Cohen, Alexander Bolshoy.
Abstract
Proteins of the same functional family (for example, kinases) may have significantly different lengths. It is an open question whether such variation in length is random or arises in response to unknown evolutionary driving factors. The main purpose of this paper is to demonstrate the existence of factors affecting prokaryotic gene lengths. We believe that ranking genomes according to the lengths of their genes, followed by calculating coefficients of association between genome rank and genome property, is a reasonable approach to revealing such evolutionary driving factors. As we demonstrated earlier, our chosen approach, Bubble Sort, combines stability, accuracy, and computational efficiency as compared to other ranking methods. Application of Bubble Sort to a set of 1390 prokaryotic genomes confirmed that genes of Archaeal species are generally shorter than Bacterial ones. We observed that gene lengths are affected by various factors: within each domain, different phyla have preferences for short or long genes; thermophiles tend to have shorter genes than soil-dwellers; halophiles tend to have longer genes. We also found that species with overrepresentation of cytosines and guanines in the third position of the codon (GC3 content) tend to have longer genes than species with low GC3 content.
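The ranking idea described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the pairwise comparison rule, so the one used here is an assumption: genome A is placed before genome B when A has the shorter gene in a strict majority of the gene families the two genomes share. The input layout (genome name mapped to a dictionary of family identifier and gene length), the function names, and the toy data are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical input layout: gene family id -> gene length (bp) for one genome.
Genome = Dict[str, int]


def has_shorter_genes(a: Genome, b: Genome) -> bool:
    """Assumed rule: `a` wins if it has the shorter gene in a strict
    majority of the gene families shared by both genomes."""
    shared = a.keys() & b.keys()
    shorter = sum(1 for fam in shared if a[fam] < b[fam])
    longer = sum(1 for fam in shared if a[fam] > b[fam])
    return shorter > longer


def bubble_sort_rank(genomes: Dict[str, Genome]) -> List[str]:
    """Order genome names from shortest to longest genes using
    adjacent swaps only (classic bubble sort passes)."""
    order = list(genomes)
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(order) - 1):
            left, right = order[i], order[i + 1]
            # Swap when the right-hand genome has shorter genes than the left.
            if has_shorter_genes(genomes[right], genomes[left]):
                order[i], order[i + 1] = right, left
                swapped = True
    return order


# Toy data, invented for illustration only.
genomes = {
    "g_long": {"f1": 900, "f2": 1200, "f3": 600},
    "g_short": {"f1": 700, "f2": 1000, "f3": 650},
    "g_mid": {"f1": 800, "f2": 1100, "f3": 500},
}
print(bubble_sort_rank(genomes))  # ['g_short', 'g_mid', 'g_long']
```

A majority-vote comparison of this kind need not be transitive, which is one reason an adjacent-swap procedure is a natural fit: each swap repairs exactly one out-of-order pair, so the loop always terminates, although the final order may depend on the starting order.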
Year: 2015 PMID: 26114113 PMCID: PMC4465819 DOI: 10.1155/2015/786861
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. Consistency of Bubble Sort ranks in the 1390-genome and 100-genome datasets. Pearson's correlation coefficient between the two ranks is 0.95; Kendall's tau correlation coefficient is 0.82.
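For readers who want to reproduce this kind of consistency check on their own rankings, the two coefficients named in the caption can be computed as in the sketch below. It uses plain Python rather than any particular statistics package, the Kendall coefficient is the tau-a variant (which assumes no tied ranks), and the two short rank lists are invented for illustration; they are not the paper's data.

```python
from itertools import combinations
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / total pairs.
    Assumes there are no ties in either sequence."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Invented ranks of the same five genomes in two hypothetical runs.
rank_run_a = [1, 2, 3, 4, 5]
rank_run_b = [1, 3, 2, 4, 5]
print(pearson(rank_run_a, rank_run_b))
print(kendall_tau(rank_run_a, rank_run_b))
```

Kendall's tau counts only whether each pair of genomes keeps its relative order, so it is typically lower than Pearson's coefficient on the same data, as in the figure.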
Figure 2. Violin plots of Bubble Sort ranks of Archaea and Bacteria. The average rank of the 1276 Bacterial genomes is 735 and the average rank of the 114 Archaeal genomes is 254.
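The two averages in this caption can be sanity-checked with simple arithmetic: if the 1390 genomes occupy ranks 1 to 1390 without ties (an assumption, since the caption does not say how ties are handled), the pooled mean rank must equal (1390 + 1) / 2. The sketch below uses only the numbers quoted in the caption; the variable names are illustrative.

```python
# Values quoted in the Figure 2 caption.
n_bacteria, mean_rank_bacteria = 1276, 735
n_archaea, mean_rank_archaea = 114, 254

n_total = n_bacteria + n_archaea

# Mean rank over both domains, weighted by the number of genomes.
pooled_mean = (
    n_bacteria * mean_rank_bacteria + n_archaea * mean_rank_archaea
) / n_total

# Mean of the ranks 1..n_total, assuming every rank is used exactly once.
expected_mean = (n_total + 1) / 2

print(n_total, round(pooled_mean, 2), expected_mean)
```

The pooled value agrees with the expected one to within the rounding of the reported averages, so the two domain means are mutually consistent.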
B-sort results (one run) for 1390 genomes, archaea.
| Phylum | Average rank | Standard deviation | Median rank | Rank range | Number of genomes |
|---|---|---|---|---|---|
| Crenarchaeota | 189 | 179 | 77 | 7–492 | 35 |
| Euryarchaeota | 312 | 297 | 233 | 5–1263 | 74 |
| Korarchaeota | 169 | NA | 169 | 169–169 | 1 |
| Nanoarchaeota | 5 | NA | 5 | 5–5 | 1 |
| Thaumarchaeota | 347 | 239 | 347 | 178–516 | 2 |
| Unclassified archaea | 771 | NA | 771 | 771–771 | 1 |
B-sort results for 1390 genomes, bacteria.
| Phylum | Average rank | Standard deviation | Median rank | Rank range | Number of genomes |
|---|---|---|---|---|---|
| Actinobacteria | 1223 | 166 | 1260 | 343–1390 | 137 |
| Aquificae | 182 | 79 | 168 | 82–306 | 8 |
| Bacteroidetes/Chlorobi | 992 | 188 | 1071 | 502–1359 | 71 |
| Candidatus Cloacamonas | 1054 | NA | 1054 | 1054–1054 | 1 |
| Chlamydiae/Verrucomicrobia | 1076 | 81 | 1079 | 835–1223 | 25 |
| Chloroflexi | 774 | 520 | 1109 | 70–1274 | 15 |
| Chrysiogenetes | 545 | NA | 545 | 545–545 | 1 |
| Cyanobacteria | 938 | 209 | 975 | 619–1276 | 40 |
| Deferribacteres | 205 | NA | 205 | 205–205 | 1 |
| Deinococcus-Thermus | 607 | 282 | 566 | 263–1126 | 12 |
| Dictyoglomi | 207 | 49 | 207 | 172–242 | 2 |
| Elusimicrobia | 412 | 143 | 412 | 311–513 | 2 |
| Fibrobacteres/Acidobacteria | 1171 | 172 | 1240 | 839–1293 | 6 |
| Firmicutes | 307 | 188 | 286 | 21–1387 | 271 |
| Fusobacteria | 462 | 100 | 461 | 361–564 | 4 |
| Gemmatimonadetes | 1214 | NA | 1214 | 1214–1214 | 1 |
| Nitrospirae | 563 | 418 | 563 | 267–858 | 2 |
| Planctomycetes | 1364 | 29 | 1368 | 1319–1389 | 5 |
| Proteobacteria | 759 | 325 | 775 | 1–1379 | 588 |
| Spirochaetes | 1050 | 155 | 1066 | 700–1317 | 31 |
| Synergistetes | 466 | 40 | 466 | 438–494 | 2 |
| Tenericutes | 657 | 223 | 631 | 92–1092 | 36 |
| Thermobaculum | 1049 | NA | 1049 | 1049–1049 | 1 |
| Thermodesulfobacteria | 458 | 32 | 458 | 435–480 | 2 |
| Thermotogae | 253 | 165 | 203 | 45–566 | 12 |
List of Archaeal (A) and Bacterial (B) genomes in the Bubble Sort ordering rank, 100 genomes dataset. Hyperthermophiles, Streptococci, and Enterococci are marked in the Note column.
| Rank | Domain | Note | Organism |
|---|---|---|---|
| 1 | A | Hyperthermophile | |
| 2 | A | Hyperthermophile | |
| 3 | B | Hyperthermophile | |
| 4 | A | Hyperthermophile | |
| 5 | B | Hyperthermophile | |
| 6 | A | Hyperthermophile | |
| 7 | B | | |
| 8 | B | | |
| 9 | B | Hyperthermophile | |
| 10 | B | Hyperthermophile | |
| 11 | B | | |
| 12 | B | | |
| 13 | B | | |
| 14 | A | Hyperthermophile | |
| 15 | B | | |
| 16 | B | | |
| 17 | B | | |
| 18 | A | Hyperthermophile | |
| 19 | B | | |
| 20 | A | Hyperthermophile | |
| 21 | B | Streptococcus | |
| 22 | B | Streptococcus | |
| 23 | B | | |
| 24 | B | | |
| 25 | B | Streptococcus | |
| 26 | A | | |
| 27 | B | | |
| 28 | B | Streptococcus | |
| 29 | B | | |
| 30 | B | | |
| 31 | B | | |
| 32 | B | Enterococcus | |
| 33 | B | | |
| 34 | B | | |
| 35 | B | | |
| 36 | B | | |
| 37 | B | | |
| 38 | B | | |
| 39 | B | | |
| 40 | B | | |
| 41 | B | | |
| 42 | B | | |
| 43 | B | | |
| 44 | B | | |
| 45 | B | | |
| 46 | B | | |
| 47 | B | | |
| 48 | B | | |
| 49 | A | | |
| 50 | B | | |
| 51 | B | | |
| 52 | B | | |
| 53 | B | | |
| 54 | B | | |
| 55 | B | | |
| 56 | B | | |
| 57 | B | | |
| 58 | B | | |
| 59 | B | | |
| 60 | B | | |
| 61 | B | | |
| 62 | B | | |
| 63 | B | | |
| 64 | B | | |
| 65 | B | | |
| 66 | B | | |
| 67 | B | | |
| 68 | B | | |
| 69 | B | | |
| 70 | B | | |
| 71 | B | | |
| 72 | B | | |
| 73 | B | | |
| 74 | B | | |
| 75 | B | | |
| 76 | B | | |
| 77 | B | | |
| 78 | B | | |
| 79 | B | | |
| 80 | B | | |
| 81 | B | | |
| 82 | B | | |
| 83 | B | | |
| 84 | B | | |
| 85 | B | | |
| 86 | B | | |
| 87 | B | | |
| 88 | B | | |
| 89 | B | | |
| 90 | B | | |
| 91 | B | | |
| 92 | B | | |
| 93 | B | | |
| 94 | B | | |
| 95 | B | | |
| 96 | B | | |
| 97 | B | | |
| 98 | B | | |
| 99 | B | | |
| 100 | B | | |