Zhao-Hui Qi, Ling Li, Zhi-Meng Zhang, Xiao-Qin Qi.
Abstract
We introduce a weighted graph model to investigate the self-similarity characteristics of eubacterial genomes. Similarity comparisons of genomes are usually used to estimate the evolutionary distances between different genomes. Little attention has been paid to the overall statistical characteristics of each gene relative to the other genes in the same genome. In our model, each genome is represented by a weighted graph whose topology describes the similarity relationships among the genes of that genome. Using weighted graph theory, we extract several quantitative statistical variables from this topology and give the distributions of some variables derived from its largest social structure. The 23 eubacteria recently studied by Sorimachi and Okayasu fall clearly into two groups according to their double-logarithmic plots of the similarity relationships among genes in the largest social structure of each genome. The results show that the proposed model may offer new insights into the structures and evolutionary patterns revealed by complete genomes.
Year: 2011 PMID: 21496459 PMCID: PMC7094106 DOI: 10.1016/j.jtbi.2011.03.033
Source DB: PubMed Journal: J Theor Biol ISSN: 0022-5193 Impact factor: 2.691
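The abstract does not spell out how a social structure is extracted from a genome's graph. The following is a minimal sketch under two assumptions, neither confirmed by the source. First, two genes are linked when their pairwise similarity passes a cutoff. Second, a "social structure" is a connected component of the resulting graph. The function `social_structures` and the `(gene_a, gene_b, weight)` edge format are hypothetical.

```python
from collections import defaultdict, deque


def social_structures(edges):
    """Group genes into connected components ("social structures").

    Assumption: `edges` holds (gene_a, gene_b, weight) tuples, one per pair of
    genes whose similarity passed the chosen cutoff. The weight (for example a
    BLAST E-value) is stored in the adjacency map but does not affect
    connectivity. Genes that never appear in an edge belong to no structure.
    """
    adjacency = defaultdict(dict)
    for a, b, w in edges:
        if a == b:
            continue  # a gene trivially matches itself; skip self-hits
        adjacency[a][b] = w
        adjacency[b][a] = w

    seen = set()
    components = []
    for start in list(adjacency):
        if start in seen:
            continue
        # Breadth-first search collects every gene reachable from `start`.
        component = []
        queue = deque([start])
        seen.add(start)
        while queue:
            gene = queue.popleft()
            component.append(gene)
            for neighbour in adjacency[gene]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)

    # Largest structure first, since the analysis focuses on the largest one.
    components.sort(key=len, reverse=True)
    return dict(adjacency), components


# Toy example with made-up gene names and E-value weights.
edges = [("g1", "g2", 4e-18), ("g2", "g3", 9e-12), ("g4", "g5", 2e-10)]
adjacency, comps = social_structures(edges)
print([sorted(c) for c in comps])  # [['g1', 'g2', 'g3'], ['g4', 'g5']]
```

Breadth-first search is used here only because it is short and dependency-free. A union-find structure or a graph library would give the same components.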
Genomes used for this study.
| GenBank accession | RefSeq accession | Genome length (bp) | Genes |
|---|---|---|---|
|  | NC_002758 | 2,878,529 | 2775 |
|  | NC_002737 | 1,852,441 | 1811 |
|  | NC_000964 | 4,214,630 | 4225 |
|  | NC_003366 | 3,031,430 | 2786 |
|  | NC_003210 | 2,944,528 | 2940 |
|  | NC_002771 | 963,879 | 815 |
| L43967.2 | NC_000908 | 580,076 | 525 |
| U00089.2 | NC_000912 | 816,394 | 733 |
|  | NC_011374 | 874,478 | 692 |
|  | NC_002755 | 4,403,837 | 4293 |
|  | NC_002677 | 3,268,203 | 2770 |
|  | NC_000963 | 1,111,523 | 886 |
|  | NC_001318 | 910,724 | 875 |
| CP000538.1 | NC_008787 | 1,616,554 | 1707 |
|  | NC_000915 | 1,667,867 | 1630 |
|  | NC_000921 | 1,643,831 | 1535 |
| U00096.2 | NC_000913 | 4,639,675 | 4467 |
|  | NC_003198 | 4,809,037 | 4711 |
|  | NC_002505 | 2,961,149 | 2889 |
|  | NC_002506 | 1,072,315 | 1119 |
|  | NC_003143 | 4,653,728 | 4103 |
|  | NC_003116 | 2,184,406 | 2065 |
| L42023.1 | NC_000907 | 1,830,138 | 1789 |
|  | NC_000919 | 1,138,011 | 1095 |
Fig. 1. A fully connected, weighted graph.
Fig. 2. An example of the weighted graph for Treponema pallidum. (a) All links between the gene “gi|3322290” of Treponema pallidum and other genes; the E-value from the detailed alignment statistics is used as the edge weight. (b) The complete weighted graph for Treponema pallidum.
Selected BLAST (bl2seq) alignment results for the gene “gi|3322290” of Treponema pallidum, obtained with an E-value threshold of 0.001.
| Query | Subject | Query length | Subject length | Bit score | E-value | Identities | Positives |
|---|---|---|---|---|---|---|---|
| gi\|3322290 | gi\|3322384 | 238 | 269 | 73.6 | 4e−018 | 49/192(0.25) | 86/192(0.44) |
| gi\|3322290 | gi\|3322406 | 238 | 220 | 52.0 | 9e−012 | 47/194(0.24) | 85/194(0.43) |
| gi\|3322290 | gi\|3322432 | 238 | 266 | 89.0 | 9e−023 | 66/203(0.32) | 106/203(0.52) |
| gi\|3322290 | gi\|3322495 | 238 | 255 | 48.1 | 2e−010 | 47/200(0.23) | 85/200(0.42) |
| gi\|3322290 | gi\|3322575 | 238 | 516 | 45.1 | 3e−009 | 31/92(0.33) | 49/92(0.53) |
| gi\|3322290 | gi\|3322598 | 238 | 533 | 37.4 | 6e−007 | 31/104(0.29) | 54/104(0.51) |
| gi\|3322290 | gi\|3322806 | 238 | 960 | 35.4 | 4e−006 | 27/84(0.32) | 40/84(0.47) |
| gi\|3322290 | gi\|3322874 | 238 | 226 | 51.6 | 1e−011 | 52/207(0.25) | 87/207(0.42) |
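The sketch below shows how rows like these could become the weighted edges of the star in Fig. 2(a). It uses the E-value itself as the edge weight, as that caption states. The column order (query, subject, query length, subject length, bit score, E-value, identities, positives) is inferred from the values, not stated in the source. `parse_hits` and `EVALUE_CUTOFF` are hypothetical names. A real pipeline would more likely read BLAST's tabular output than a copy of this table.

```python
EVALUE_CUTOFF = 1e-3  # the E-value given in the table caption

# Three of the eight rows from the table above.
ROWS = """
| gi|3322290 | gi|3322384 | 238 | 269 | 73.6 | 4e−018 | 49/192(0.25) | 86/192(0.44) |
| gi|3322290 | gi|3322432 | 238 | 266 | 89.0 | 9e−023 | 66/203(0.32) | 106/203(0.52) |
| gi|3322290 | gi|3322806 | 238 | 960 | 35.4 | 4e−006 | 27/84(0.32) | 40/84(0.47) |
"""


def parse_hits(text, cutoff=EVALUE_CUTOFF):
    """Turn pipe-delimited bl2seq summary rows into (query, subject, E-value) edges."""
    edges = []
    for line in text.strip().splitlines():
        # Gene IDs such as gi|3322290 contain "|", so split on " | " only.
        cells = [c.strip() for c in line.strip().strip("|").split(" | ")]
        query, subject = cells[0], cells[1]
        # The E-value is the sixth column. The table prints it with a Unicode
        # minus sign (U+2212); normalize it to an ASCII hyphen-minus first.
        evalue = float(cells[5].replace("\u2212", "-"))
        if evalue <= cutoff:
            edges.append((query, subject, evalue))
    return edges


edges = parse_hits(ROWS)
for edge in edges:
    print(edge)
```

Tightening `cutoff` keeps only the strongest similarities and shrinks the graph accordingly.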
Quantitative statistical variables for each genome. The columns, from left to right, are:

- the total number of genes in the genome;
- the number of social structures;
- the total number of genes in all social structures;
- the proportion of the genome's genes that lie in a social structure;
- the number of genes in the largest social structure;
- the proportion of the genome's genes in the largest social structure;
- the total number of links in the largest social structure;
- A, the average number of links per gene in the largest social structure.
| Genes in genome | Social structures | Genes in all structures | Proportion of genome | Genes in largest structure | Proportion of genome | Links in largest structure | A |
|---|---|---|---|---|---|---|---|
| 2775 | 12 | 2533 | 0.91 | 2505 | 0.90 | 10822 | 4.32 |
| 1811 | 51 | 1379 | 0.76 | 1264 | 0.70 | 3615 | 2.86 |
| 4225 | 16 | 3845 | 0.91 | 3812 | 0.90 | 18850 | 4.94 |
| 2786 | 4 | 2577 | 0.92 | 2571 | 0.92 | 17891 | 6.96 |
| 2940 | 12 | 2679 | 0.91 | 2657 | 0.90 | 11665 | 4.39 |
| 815 | 3 | 702 | 0.86 | 698 | 0.86 | 4493 | 6.44 |
| 525 | 20 | 335 | 0.64 | 295 | 0.56 | 646 | 2.19 |
| 733 | 32 | 493 | 0.67 | 403 | 0.55 | 1555 | 3.86 |
| 692 | 5 | 560 | 0.81 | 550 | 0.79 | 2647 | 4.81 |
| 4293 | 12 | 3901 | 0.91 | 3879 | 0.90 | 22217 | 5.73 |
| 2770 | 44 | 1253 | 0.45 | 1149 | 0.41 | 2290 | 1.99 |
| 886 | 20 | 622 | 0.70 | 573 | 0.65 | 1074 | 1.87 |
| 875 | 4 | 754 | 0.86 | 748 | 0.85 | 2979 | 3.92 |
| 1707 | 9 | 1497 | 0.88 | 1481 | 0.87 | 4860 | 3.28 |
| 1630 | 16 | 1323 | 0.81 | 1291 | 0.79 | 3634 | 2.81 |
| 1535 | 20 | 1265 | 0.82 | 1225 | 0.80 | 3473 | 2.84 |
| 4467 | 34 | 3793 | 0.85 | 3712 | 0.83 | 16643 | 4.48 |
| 4711 | 45 | 3987 | 0.85 | 3892 | 0.83 | 15688 | 4.03 |
| 4008 | 59 | 2238 | 0.56 | 2101 | 0.52 | 6849 | 3.26 |
| 4103 | 54 | 3486 | 0.85 | 3351 | 0.82 | 20108 | 6.00 |
| 2065 | 92 | 1483 | 0.72 | 1243 | 0.60 | 2126 | 1.03 |
| 1789 | 68 | 1316 | 0.74 | 1145 | 0.64 | 2546 | 2.22 |
| 1095 | 82 | 609 | 0.56 | 352 | 0.32 | 695 | 1.97 |
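Given a genome's graph and its social structures, columns like these could be computed roughly as follows. This is a sketch under stated assumptions, and `structure_stats` is a hypothetical helper. A is taken as links divided by genes in the largest structure, not 2 × links / genes. That definition reproduces the first row (10822 / 2505 ≈ 4.32), but it is inferred from the numbers rather than stated in the caption.

```python
def structure_stats(total_genes, adjacency, components):
    """Per-genome summary in the spirit of the statistics table.

    `adjacency` maps gene -> {neighbour: weight} (symmetric, no self-loops);
    `components` lists the social structures, largest first.
    """
    genes_in_structures = sum(len(c) for c in components)
    largest = components[0] if components else []
    # Every undirected link appears twice in `adjacency`, once per endpoint,
    # and all neighbours of a component member lie in the same component.
    links_largest = sum(len(adjacency[g]) for g in largest) // 2
    return {
        "structures": len(components),
        "genes_in_structures": genes_in_structures,
        "prop_in_structures": genes_in_structures / total_genes,
        "genes_in_largest": len(largest),
        "prop_in_largest": len(largest) / total_genes,
        "links_in_largest": links_largest,
        # Assumed definition of A: links divided by genes in the largest structure.
        "A": links_largest / len(largest) if largest else 0.0,
    }


# Toy genome of 10 genes, 5 of which fall into two structures.
adjacency = {
    "g1": {"g2": 4e-18},
    "g2": {"g1": 4e-18, "g3": 9e-12},
    "g3": {"g2": 9e-12},
    "g4": {"g5": 2e-10},
    "g5": {"g4": 2e-10},
}
components = [["g1", "g2", "g3"], ["g4", "g5"]]
stats = structure_stats(10, adjacency, components)
print(stats)

# Cross-check against the first table row: 2533/2775, 2505/2775, 10822/2505.
print(round(2533 / 2775, 2), round(2505 / 2775, 2), round(10822 / 2505, 2))  # 0.91 0.9 4.32
```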
Fig. 3. Double-logarithmic plot of the degree distribution P(k) in the largest social structure of each genome, with the fitted power-law curve $c\,k^{-\gamma}$.
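Here P(k) is the fraction of genes with exactly k links. The caption does not say how the power-law curve was fitted. The sketch below uses an ordinary least-squares line on the log-log points as an illustrative assumption, not as the authors' procedure. Maximum-likelihood estimation is generally considered more reliable for power-law exponents. `degree_distribution`, `fit_power_law` and the exponent name `gamma` are hypothetical.

```python
import math
from collections import Counter


def degree_distribution(adjacency):
    """Return {k: P(k)}: the fraction of genes that have exactly k links."""
    counts = Counter(len(neighbours) for neighbours in adjacency.values())
    n = len(adjacency)
    return {k: c / n for k, c in sorted(counts.items())}


def fit_power_law(dist):
    """Ordinary least squares on (log k, log P(k)).

    Returns (c, gamma) such that P(k) is approximately c * k**(-gamma).
    Points with k == 0 or P(k) == 0 are skipped because their log is undefined.
    """
    pts = [(math.log(k), math.log(p)) for k, p in dist.items() if k > 0 and p > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Toy graph: a path a-b-c has degrees 1, 2, 1.
toy = {"a": {"b": 1.0}, "b": {"a": 1.0, "c": 1.0}, "c": {"b": 1.0}}
print(degree_distribution(toy))  # {1: 0.6666666666666666, 2: 0.3333333333333333}

# Synthetic check: points drawn exactly from 0.5 * k**-2 recover c = 0.5, gamma = 2.
synthetic = {k: 0.5 * k ** -2.0 for k in range(1, 50)}
c, gamma = fit_power_law(synthetic)
print(round(c, 6), round(gamma, 6))  # 0.5 2.0
```

On real degree data the log-log points scatter, especially in the sparse tail. A least-squares fit can therefore be sensitive to which range of k is included.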
Fig. 4. Double-logarithmic plot of the edge-weight distribution P(w) in the largest social structure of each genome, with the fitted power-law curve $c\,w^{-\gamma}$.
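E-value weights span many orders of magnitude, so a histogram of edge weights is commonly taken on logarithmic bins before it is plotted on double-logarithmic axes. The source does not describe its binning. The sketch below assumes decade-based bins and normalizes counts by bin width. `edge_weight_distribution` and `bins_per_decade` are hypothetical names.

```python
import math
from collections import Counter


def edge_weight_distribution(adjacency, bins_per_decade=1):
    """Logarithmically binned density P(w) of edge weights.

    `adjacency` maps gene -> {neighbour: weight} and is assumed symmetric, so
    each undirected edge is counted once (from its lexicographically smaller end).
    Non-positive weights (BLAST can report an E-value of 0.0) are skipped
    because log10 is undefined for them.
    """
    weights = [
        w
        for gene, neighbours in adjacency.items()
        for other, w in neighbours.items()
        if gene < other and w > 0
    ]
    # Bin index b covers [10**(b/bpd), 10**((b+1)/bpd)).
    bins = Counter(math.floor(math.log10(w) * bins_per_decade) for w in weights)
    total = len(weights)
    density = {}
    for b, count in sorted(bins.items()):
        lo = 10 ** (b / bins_per_decade)
        hi = 10 ** ((b + 1) / bins_per_decade)
        # Dividing by the bin width turns counts into a density, so the
        # unequal-width logarithmic bins are comparable.
        density[math.sqrt(lo * hi)] = count / (total * (hi - lo))
    return density


adjacency = {
    "g1": {"g2": 4e-18},
    "g2": {"g1": 4e-18, "g3": 9e-12},
    "g3": {"g2": 9e-12},
    "g4": {"g5": 2e-10},
    "g5": {"g4": 2e-10},
}
dist = edge_weight_distribution(adjacency)
for centre, p in dist.items():
    print(f"{centre:.2e}  {p:.3e}")
```

Each bin is keyed by its geometric centre, which is the natural midpoint on a logarithmic axis.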