| Literature DB >> 21386877 |
Sara P Garcia1, Armando J Pinho, João M O S Rodrigues, Carlos A C Bastos, Paulo J S G Ferreira.
Abstract
Minimal absent words have been computed in genomes of organisms from all domains of life. Here, we explore different sets of minimal absent words in the genomes of 22 organisms (one archaeota, thirteen bacteria and eight eukaryotes). We investigate if the mutational biases that may explain the deficit of the shortest absent words in vertebrates are also pervasive in other absent words, namely in minimal absent words, as well as to other organisms. We find that the compositional biases observed for the shortest absent words in vertebrates are not uniform throughout different sets of minimal absent words. We further investigate the hypothesis of the inheritance of minimal absent words through common ancestry from the similarity in dinucleotide relative abundances of different sets of minimal absent words, and find that this inheritance may be exclusive to vertebrates.Entities:
Mesh:
Substances:
Year: 2011 PMID: 21386877 PMCID: PMC3031530 DOI: 10.1371/journal.pone.0016065
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Genomic data.
| Organism | Abbreviation | Genome reference |
|
| ||
|
| Mj | NC000909 |
|
| ||
|
| Ba | NC003997 |
|
| Bs | NC000964 |
|
| Ec | NC000913 |
|
| Hi | NC000907 |
|
| Hp | NC000915 |
|
| Lc | NC010999 |
|
| Ll | NC002662 |
|
| Mg | NC000908 |
|
| Sa | NC002745 |
| methicillin-resistant | MRSa | NC002952 |
| methicillin-susceptible | MSSa | NC002953 |
|
| Sp | NC010582 |
|
| Xc | NC007086 |
|
| ||
|
| Sc | SGD release 1 |
|
| At | AGI release 7.2 |
|
| Ce | WormBase release 170 |
|
| Dm | FlyBase release 5 |
|
| Gg | build 2.1 |
|
| Mm | build 37.1 |
|
| Pt | build 2.1 |
|
| Hs | build 36.3 |
Organisms selected for this study, with reference to the respective abbreviation, and identification of genome sequence data by accession number (euryarchaeota and bacteria) or genome assembly project (eukaryotes).
Figure 1Number of minimal absent words in genomic sequences:.
Distributions of the number of minimal absent words (MAWs) in the genomes of selected organisms for word length up to 50.
Compositional biases.
| Genome |
|
|
|
| ||||||
| Size (bp) | G+C | Size | G+C | Size | G+C | Size | G+C | Size | G+C | |
| Ba | 5,227,293 | 0.36 | 873,180 | 0.56 | 4,485,825 | 0.32 | 330,770 | 0.26 | 694 | 0.36 |
| Bs | 4,214,630 | 0.44 | 935,554 | 0.54 | 3,089,102 | 0.40 | 126,496 | 0.36 | 162 | 0.40 |
| Ec | 4,639,675 | 0.50 | 839,014 | 0.50 | 3,504,611 | 0.52 | 141,556 | 0.54 | 702 | 0.54 |
| Hi | 1,830,023 | 0.38 | 888,594 | 0.48 | 947,347 | 0.32 | 43,098 | 0.32 | 312 | 0.38 |
| Hp | 1,667,825 | 0.38 | 722,762 | 0.48 | 943,139 | 0.36 | 71,452 | 0.32 | 218 | 0.36 |
| Lc | 3,079,196 | 0.46 | 1,048,894 | 0.52 | 1,715,556 | 0.44 | 49,896 | 0.44 | 630 | 0.54 |
| Ll | 2,365,589 | 0.36 | 919,156 | 0.50 | 1,446,921 | 0.30 | 93,684 | 0.26 | 516 | 0.28 |
| Mj | 1,664,957 | 0.32 | 539,920 | 0.46 | 1,051,171 | 0.28 | 96,626 | 0.22 | 428 | 0.32 |
| Mg | 580,076 | 0.32 | 391,682 | 0.40 | 215,544 | 0.26 | 10,658 | 0.24 | 66 | 0.42 |
| Sa | 2,814,816 | 0.32 | 852,402 | 0.50 | 1,969,819 | 0.28 | 123,642 | 0.24 | 362 | 0.34 |
| MRSa | 2,902,619 | 0.32 | 852,542 | 0.50 | 1,988,247 | 0.28 | 131,756 | 0.24 | 354 | 0.34 |
| MSSa | 2,799,802 | 0.32 | 851,978 | 0.50 | 1,988,247 | 0.28 | 126,348 | 0.22 | 418 | 0.36 |
| Sp | 2,209,198 | 0.40 | 1,019,764 | 0.50 | 1,078,877 | 0.36 | 39,170 | 0.32 | 576 | 0.38 |
| Xc | 5,148,708 | 0.64 | 853,780 | 0.48 | 3,936,720 | 0.66 | 642,460 | 0.72 | 1,762 | 0.66 |
| Sc | 12,739,648 | 0.38 | 423,500 | 0.64 | 12,398,033 | 0.38 | 901,182 | 0.28 | 5,619 | 0.20 |
| At | 118,973,747 | 0.36 | 22,900 | 0.80 | 56,034,743 | 0.48 | 56,311,864 | 0.30 | 363,823 | 0.28 |
| Ce | 100,269,917 | 0.36 | 7,668 | 0.70 | 53,359,766 | 0.48 | 43,423,752 | 0.30 | 648,884 | 0.28 |
| Dm | 162,348,295 | 0.42 | 104 | 0.70 | 74,357,742 | 0.50 | 54,260,892 | 0.36 | 506,678 | 0.34 |
| Gg | 984,856,238 | 0.42 | 700 | 0.60 | 38,646,642 | 0.56 | 1,006,332,266 | 0.42 | 6,768,820 | 0.36 |
| Mm | 2,559,165,832 | 0.42 | 190 | 0.62 | 26,244,051 | 0.56 | 1,788,521,026 | 0.44 | 47,781,970 | 0.40 |
| Pt | 2,752,354,403 | 0.40 | 116 | 0.62 | 26,194,501 | 0.56 | 1,767,172,092 | 0.44 | 58,934,573 | 0.36 |
| Hs | 2,858,029,377 | 0.40 | 104 | 0.60 | 25,778,756 | 0.56 | 1,788,484,146 | 0.44 | 61,816,985 | 0.36 |
GC content (G+C) and total (haploid) genome size in units of base pairs (bp) for selected genomes, followed by the GC content and total number of words (size) in sets of minimal absent words of word length 11 (), 14 (), 17 () and 24 ().
Dinucleotide relative abundances.
| TT | GT | CT | TG | GG | TC | TT | GT | CT | TG | GG | TC | |||||||||
| AA | AC | AG | AT | CA | CC | CG | GA | GC | TA | AA | AC | AG | AT | CA | CC | CG | GA | GC | TA | |
|
|
| |||||||||||||||||||
| Ba | 0.8674 | 1.0829 | 1.0923 | 0.9579 | 1.0367 | 0.9270 | 0.9426 | 1.0532 | 0.9268 | 1.0778 | 1.1632 | 0.8850 | 0.9131 | 0.9778 | 1.0685 | 0.9252 | 0.9343 | 0.9982 | 1.1398 | 0.8450 |
| Bs | 0.8389 | 1.1625 | 1.0671 | 0.9096 | 0.9544 | 0.9941 | 0.9771 | 0.9754 | 0.8734 | 1.2592 | 1.3249 | 0.6594 | 0.8978 | 1.0097 | 1.1116 | 0.9227 | 1.0001 | 1.0920 | 1.3769 | 0.5639 |
| Ec | 0.8653 | 1.0761 | 1.1074 | 0.9460 | 0.9447 | 1.0551 | 0.8959 | 1.0539 | 0.8099 | 1.1434 | 1.2753 | 0.8326 | 0.7758 | 1.1317 | 1.1428 | 0.8806 | 1.2016 | 0.8909 | 1.3901 | 0.6799 |
| Hi | 0.9717 | 1.0392 | 1.0115 | 1.0223 | 1.0561 | 0.8915 | 0.9939 | 0.9865 | 1.0328 | 1.0298 | 1.3343 | 0.7593 | 0.7542 | 0.9443 | 1.1501 | 1.0040 | 1.0393 | 0.8656 | 1.5976 | 0.6944 |
| Hp | 0.9999 | 0.8946 | 1.0342 | 1.1011 | 1.0897 | 0.9942 | 0.8301 | 0.9418 | 1.1336 | 1.0129 | 1.4790 | 0.5966 | 0.9617 | 0.8030 | 0.9171 | 1.1756 | 0.9360 | 0.8583 | 1.6716 | 0.6833 |
| Lc | 0.8694 | 1.0764 | 1.0688 | 0.9881 | 0.9256 | 0.9919 | 1.0015 | 1.0188 | 0.9064 | 1.2046 | 1.3264 | 0.7931 | 0.7564 | 1.0573 | 1.2951 | 0.8888 | 1.0202 | 0.9474 | 1.4023 | 0.4950 |
| Ll | 0.9394 | 1.0483 | 1.0543 | 1.0123 | 1.0562 | 0.9297 | 0.9069 | 1.0110 | 0.9543 | 1.0446 | 1.3057 | 0.7430 | 0.9376 | 0.8808 | 1.1690 | 0.9871 | 0.5607 | 1.0806 | 1.1910 | 0.6240 |
| Mj | 1.0080 | 0.9651 | 1.0950 | 1.0086 | 1.1583 | 1.0592 | 0.5254 | 0.9743 | 0.9255 | 0.9509 | 1.1675 | 0.6589 | 1.1160 | 0.9607 | 0.9928 | 1.3265 | 0.1362 | 1.0894 | 1.0554 | 0.8397 |
| Mg | 1.0729 | 0.9656 | 1.0820 | 0.9640 | 1.1730 | 1.0249 | 0.4096 | 0.9425 | 0.9614 | 0.9203 | 1.3516 | 0.9641 | 1.0373 | 0.6887 | 1.1480 | 0.9439 | 0.2021 | 0.8445 | 1.2541 | 0.6935 |
| Sa | 0.9518 | 1.0724 | 1.0640 | 0.9759 | 1.0598 | 0.8798 | 0.9294 | 1.0273 | 0.9538 | 1.0301 | 1.1225 | 0.9195 | 0.8297 | 1.0227 | 1.2305 | 0.8174 | 0.7703 | 0.9571 | 1.2221 | 0.8439 |
| MRSa | 0.9512 | 1.0725 | 1.0643 | 0.9753 | 1.0583 | 0.8838 | 0.9293 | 1.0283 | 0.9506 | 1.0308 | 1.1228 | 0.9182 | 0.8331 | 1.0222 | 1.2262 | 0.8181 | 0.7731 | 0.9591 | 1.2198 | 0.8442 |
| MSSa | 0.9510 | 1.0714 | 1.0641 | 0.9781 | 1.0582 | 0.8792 | 0.9316 | 1.0292 | 0.9530 | 1.0308 | 1.1229 | 0.9187 | 0.8311 | 1.0221 | 1.2291 | 0.8185 | 0.7690 | 0.9570 | 1.2240 | 0.8440 |
| Sp | 0.9238 | 1.0612 | 1.0455 | 1.0114 | 1.0453 | 0.9319 | 0.9319 | 1.0047 | 0.9616 | 1.0690 | 1.2614 | 0.7412 | 1.0843 | 0.8686 | 1.1094 | 1.0437 | 0.4806 | 1.1595 | 1.0175 | 0.6272 |
| Xc | 0.9980 | 1.0233 | 1.0372 | 0.9046 | 0.9933 | 1.0837 | 0.9363 | 1.0844 | 0.8420 | 0.8905 | 0.9987 | 0.9314 | 0.8578 | 1.2561 | 1.3101 | 0.7968 | 1.1579 | 0.9357 | 1.3168 | 0.3718 |
| Sc | 0.7361 | 1.1548 | 1.0227 | 0.9554 | 0.8656 | 0.9264 | 1.1417 | 1.0688 | 0.9380 | 1.3886 | 1.1430 | 0.8883 | 0.9940 | 0.9657 | 1.1245 | 1.0058 | 0.6842 | 1.0659 | 0.9637 | 0.7815 |
| At | 0.6870 | 1.1120 | 0.9022 | 1.0019 | 0.8643 | 0.9710 | 1.1041 | 0.7442 | 1.0904 | 2.4828 | 0.9740 | 1.0247 | 1.0508 | 0.9970 | 1.0531 | 0.9553 | 0.8830 | 1.0528 | 0.9136 | 0.9670 |
| Ce | 0.7467 | 1.0779 | 1.1695 | 0.7336 | 0.5345 | 1.0924 | 1.0418 | 0.6806 | 0.9795 | 3.1638 | 0.9972 | 1.0064 | 1.0168 | 1.0173 | 1.0793 | 0.9359 | 0.9132 | 1.0663 | 0.9413 | 0.9046 |
| Dm | 0.2031 | 1.4083 | 1.3361 | 0.3250 | 0.1444 | 1.1315 | 1.1395 | 0.5597 | 0.8185 | 4.6312 | 0.9694 | 1.0001 | 1.0178 | 1.0264 | 1.0255 | 0.9717 | 0.9731 | 1.0342 | 0.9837 | 0.9804 |
| Gg | 0.7215 | 1.2696 | 0.4839 | 1.3320 | 0.1294 | 0.3009 | 2.7760 | 1.6576 | 1.0173 | 1.7020 | 0.9875 | 1.0633 | 0.8567 | 1.1065 | 0.8177 | 0.9293 | 1.3506 | 1.0541 | 0.9569 | 1.1795 |
| Mm | 0.8047 | 1.4168 | 0.0980 | 1.6093 | 0.0802 | 0.1989 | 3.1720 | 1.4792 | 1.1052 | 2.0116 | 1.0112 | 1.1047 | 0.7685 | 1.1408 | 0.7452 | 0.8347 | 1.5629 | 1.0451 | 1.0283 | 1.2707 |
| Pt | 0.5187 | 1.4746 | 0.2044 | 1.7917 | 0.0292 | 0.2080 | 3.1465 | 1.7082 | 0.8499 | 1.9804 | 0.9765 | 1.1250 | 0.7935 | 1.1155 | 0.7496 | 0.8127 | 1.5620 | 1.0659 | 1.0202 | 1.2721 |
| Hs | 0.6270 | 1.5013 | 0.1453 | 1.5549 | 0.0969 | 0.2078 | 3.1589 | 1.4529 | 0.9975 | 2.3073 | 0.9770 | 1.1258 | 0.7904 | 1.1168 | 0.7466 | 0.8113 | 1.5697 | 1.0670 | 1.0197 | 1.2732 |
|
|
| |||||||||||||||||||
| Ba | 1.2532 | 0.7720 | 0.9573 | 0.8747 | 1.0238 | 0.9385 | 0.8698 | 1.0798 | 1.2244 | 0.7420 | 1.2009 | 0.8422 | 0.9177 | 0.9518 | 0.8565 | 1.0275 | 1.3330 | 1.0763 | 1.0910 | 0.8396 |
| Bs | 1.4571 | 0.5755 | 0.9314 | 0.8589 | 1.0798 | 0.9053 | 0.9921 | 1.1213 | 1.5339 | 0.4535 | 1.2538 | 0.7191 | 1.0631 | 0.8939 | 1.0631 | 0.9644 | 0.8647 | 1.0943 | 1.2904 | 0.6419 |
| Ec | 1.3243 | 0.7623 | 0.8105 | 1.1538 | 1.1810 | 0.8463 | 1.1719 | 0.8565 | 1.4930 | 0.6207 | 1.0957 | 0.8330 | 0.8121 | 1.2909 | 1.0156 | 0.8584 | 1.3054 | 0.8330 | 1.4561 | 1.0463 |
| Hi | 1.3925 | 0.7487 | 0.7372 | 0.8735 | 1.1157 | 1.0156 | 1.1798 | 0.8351 | 1.7471 | 0.6578 | 1.6130 | 0.8853 | 0.7172 | 0.6313 | 1.0249 | 1.2034 | 1.1918 | 0.6367 | 1.5788 | 0.6023 |
| Hp | 1.6071 | 0.5276 | 1.0325 | 0.6202 | 0.7679 | 1.2301 | 1.0894 | 0.8674 | 1.9277 | 0.5925 | 1.4888 | 0.6517 | 1.0665 | 0.6898 | 0.8469 | 1.0904 | 1.0143 | 0.9968 | 1.4454 | 0.6208 |
| Lc | 1.3740 | 0.7606 | 0.7478 | 1.0331 | 1.3132 | 0.8841 | 1.0198 | 0.9273 | 1.4838 | 0.4472 | 1.3225 | 0.9470 | 1.0247 | 0.6921 | 0.9837 | 0.9648 | 1.0462 | 0.8282 | 1.2417 | 0.8722 |
| Ll | 1.3628 | 0.6761 | 0.9463 | 0.7994 | 1.1622 | 0.9457 | 0.4694 | 1.1463 | 1.2745 | 0.5643 | 1.2463 | 1.1724 | 1.2465 | 0.6195 | 1.2349 | 0.3842 | 0.2861 | 1.1048 | 0.8257 | 0.6208 |
| Mj | 1.2579 | 0.5432 | 1.1750 | 0.8504 | 0.8545 | 1.4438 | 0.1240 | 1.1772 | 1.1479 | 0.7630 | 1.3002 | 0.6199 | 0.9459 | 0.9135 | 0.7801 | 1.6623 | 0.9276 | 1.0307 | 0.9851 | 0.7976 |
| Mg | 1.3999 | 0.9789 | 1.0396 | 0.6213 | 1.1343 | 0.9374 | 0.1796 | 0.8249 | 1.3549 | 0.6482 | 1.2241 | 1.1004 | 0.9062 | 0.7949 | 1.1327 | 1.3469 | 0.5563 | 0.7228 | 0.8492 | 0.9062 |
| Sa | 1.1715 | 0.8083 | 0.8213 | 0.9710 | 1.2210 | 0.8432 | 0.6931 | 1.0024 | 1.4168 | 0.7926 | 1.1583 | 0.8588 | 0.8889 | 0.9751 | 1.2354 | 1.2402 | 0.4757 | 0.9836 | 1.0193 | 0.7482 |
| MRSa | 1.1749 | 0.8022 | 0.8264 | 0.9672 | 1.2122 | 0.8459 | 0.6956 | 1.0064 | 1.4191 | 0.7918 | 1.2262 | 0.8094 | 0.9032 | 0.9319 | 1.2391 | 1.2731 | 0.4355 | 1.0450 | 0.9464 | 0.6387 |
| MSSa | 1.1734 | 0.8047 | 0.8238 | 0.9691 | 1.2136 | 0.8474 | 0.6976 | 1.0086 | 1.4048 | 0.7913 | 1.1488 | 0.7910 | 0.8930 | 1.0286 | 1.2138 | 1.3481 | 0.4349 | 0.9586 | 1.0972 | 0.7625 |
| Sp | 1.3139 | 0.6674 | 1.0855 | 0.8313 | 1.0782 | 1.0787 | 0.4590 | 1.2045 | 1.0577 | 0.5800 | 1.1551 | 0.9672 | 1.1612 | 0.7942 | 1.0862 | 0.7218 | 0.8312 | 1.1224 | 1.0762 | 0.7239 |
| Xc | 0.8915 | 0.8372 | 0.9190 | 1.5356 | 1.4737 | 0.7039 | 1.1750 | 0.7927 | 1.4666 | 0.2290 | 1.1805 | 0.7672 | 0.8686 | 1.4490 | 1.0932 | 0.8619 | 1.1768 | 1.0129 | 1.2730 | 0.5404 |
| Sc | 1.2873 | 0.7379 | 0.9707 | 0.8648 | 1.0771 | 1.0636 | 0.6238 | 1.1633 | 0.9710 | 0.6474 | 1.1300 | 0.7822 | 0.7601 | 1.0212 | 1.0169 | 1.5590 | 0.8439 | 0.8460 | 1.4507 | 0.9346 |
| At | 1.1882 | 0.8758 | 1.0276 | 0.8891 | 1.1467 | 0.9497 | 0.4811 | 1.1619 | 0.7885 | 0.7090 | 1.3495 | 0.7949 | 1.0314 | 0.7427 | 0.9481 | 1.2108 | 0.7179 | 1.1111 | 0.8849 | 0.6470 |
| Ce | 1.3544 | 0.8051 | 0.8595 | 0.8300 | 1.1331 | 0.9660 | 0.8434 | 1.1434 | 0.9442 | 0.5638 | 1.6479 | 0.6363 | 0.6449 | 0.6548 | 0.9278 | 1.4986 | 1.4449 | 0.9360 | 1.4713 | 0.4245 |
| Dm | 1.3113 | 0.7948 | 0.8135 | 0.9479 | 1.1792 | 1.0690 | 0.8591 | 0.8354 | 1.4758 | 0.7088 | 1.2507 | 1.0053 | 0.7587 | 0.8966 | 1.2381 | 1.0439 | 0.8886 | 0.7289 | 1.3756 | 0.7875 |
| Gg | 1.0517 | 0.8999 | 1.2162 | 0.8893 | 1.2984 | 1.0510 | 0.1728 | 0.9849 | 1.0822 | 0.7658 | 1.4832 | 0.7527 | 1.0513 | 0.6496 | 1.1000 | 1.5054 | 0.1512 | 0.9242 | 1.0100 | 0.5236 |
| Mm | 0.9765 | 0.9321 | 1.2167 | 0.9296 | 1.2398 | 1.1517 | 0.2074 | 0.9810 | 0.9470 | 0.8716 | 1.2105 | 0.8866 | 1.1253 | 0.7965 | 1.2101 | 1.3217 | 0.1254 | 0.9683 | 0.8747 | 0.6859 |
| Pt | 0.9868 | 0.9224 | 1.1997 | 0.9418 | 1.2328 | 1.1651 | 0.2274 | 0.9730 | 0.9508 | 0.8739 | 1.2579 | 0.8113 | 1.0631 | 0.8358 | 1.1596 | 1.3135 | 0.2443 | 0.9229 | 1.0897 | 0.7115 |
| Hs | 0.9852 | 0.9238 | 1.1992 | 0.9420 | 1.2319 | 1.1646 | 0.2330 | 0.9715 | 0.9515 | 0.8765 | 1.2583 | 0.8108 | 1.0642 | 0.8353 | 1.1585 | 1.3157 | 0.2448 | 0.9251 | 1.0834 | 0.7101 |
Dinucleotide relative abundances for sets of minimal absent words of word length 11 (), 14 (), 17 () and 24 ().
Figure 2Similarity in dinucleotide compositional biases in prokaryotes:.
Dendograms from dinucleotide relative abundances in sets of minimal absent words of word length 11 (), 14 (), 17 () and 24 () for selected prokaryotic genomes.
Figure 3Similarity in dinucleotide compositional biases in eukaryotes.
Dendograms from dinucleotide relative abundances in sets of minimal absent words of word length 11 (), 14 (), 17 () and 24 () for selected eukaryotic genomes.