Yuncan Ai, Hannan Ai, Fanmei Meng, Lei Zhao.
Abstract
BACKGROUND: No attention has been paid to comparing sets of genome sequences that span genetic components and biological categories, diverge widely, and cover a large size range. We define this as systematic comparative genomics and aim to develop the methodology.
Year: 2013 PMID: 24205026 PMCID: PMC3812135 DOI: 10.1371/journal.pone.0077912
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. A mathematical model for creating a set of coordinates (xn, yn, zn) from a circular genome sequence.
We randomly select a base (the nth) as the first target base (TB) and then move the focusing base (FB, the mth) along the sequence. For the given TB (nth), we define the relative distance (RD) between the selected TB (nth) and the moving FB (mth) (m = 1, 2, …, N).
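To make the indexing concrete, here is a minimal Python sketch of one way the TB/FB relationship could be read on a circular sequence. The RD formula itself is not reproduced in this record, so the one-directional offset `(m - n) mod N` used here is an assumption, and the function name `relative_offsets` is hypothetical. The mapping from RD to the coordinates (xn, yn, zn) is not shown.

```python
from typing import List


def relative_offsets(n: int, genome_length: int) -> List[int]:
    """Return the offset from target base n to every focusing base m.

    Positions are 1-based (m = 1, 2, ..., N). ASSUMPTION: the offset is
    counted in one direction around the circle, i.e. (m - n) mod N. This is
    an illustrative reading, not the formula from the paper.
    """
    if not 1 <= n <= genome_length:
        raise ValueError("target base index must lie in 1..N")
    return [(m - n) % genome_length for m in range(1, genome_length + 1)]


# Toy circular genome of N = 8 bases with the target base at n = 3.
offsets = relative_offsets(3, 8)
print(offsets)  # [6, 7, 0, 1, 2, 3, 4, 5]
```

Under this reading, every offset from 0 to N − 1 appears exactly once, whichever base is picked as the TB, because the sequence is circular.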
Table 1. Features of genome sequences from bacteria and archaea.
| Species and Strain | Sequence ID | Type | Size (bp) |
|---|---|---|---|
| | AC_000091, NC_007779 | Chromosome | 4646332 |
| | NC_010473 | Chromosome | 4686137 |
| | NC_000913 | Chromosome | 4639675 |
| | NC_012947 | Chromosome | 4570938 |
| | NC_013941 | Chromosome | 5386352 |
| | NC_007946 | Chromosome | 5065741 |
| | NC_004431 | Chromosome | 5231428 |
| | NC_010498 | Chromosome | 5068389 |
| | NC_012588 | Chromosome | 2608832 |
| | NC_012726 | Chromosome | 2586647 |
| | NC_012623 | Chromosome | 2812165 |
| | NC_012622 | Chromosome | 2702058 |
| | NC_014222 | Chromosome | 1936387 |
| | NC_007681 | Chromosome | 1767403 |
| | NC_014532 | Chromosome | 4119315 |
| | NC_008789 | Chromosome | 2716716 |
| | NC_013158 | Chromosome | 3161321 |
| | NC_011899 | Chromosome | 2614977 |
| | NC_013422 | Chromosome | 2619785 |
| | NC_014729 | Chromosome | 2860838 |
| | NC_013743 | Chromosome | 3944596 |
| | NC_019962 | Chromosome | 3844629 |
| | NC_008212 | Chromosome | 3177244 |
| | NC_012029 | Chromosome | 2774371 |
| | NC_012028 | Chromosome | 533457 |
| | NC_006396 | Chromosome | 3176463 |
| | NC_006397 | Chromosome | 292165 |
| | NC_006389 | plasmid pNG100 | 33779 |
| | NC_006390 | plasmid pNG200 | 33930 |
| | NC_006391 | plasmid pNG300 | 40086 |
| | NC_006392 | plasmid pNG400 | 50776 |
| | NC_006393 | plasmid pNG500 | 134574 |
| | NC_006394 | plasmid pNG600 | 157519 |
| | NC_006395 | plasmid pNG700 | 416420 |
| | NC_013202 | Chromosome | 3154923 |
| | NC_013201 | plasmid pHmuk01 | 225032 |
| | NC_013967 | Chromosome | 2888440 |
| | NC_013964 | plasmid pHV3 | 444162 |
| | NC_013965 | plasmid pHV2 | 6450 |
| | NC_013966 | plasmid pHV4 | 644869 |
| | NC_013968 | plasmid pHV1 | 86308 |
| | NC_002607 | Chromosome | 2014239 |
| | NC_010364 | Chromosome | 2000962 |
| | | | |
| | 91.1.1 | Chromosome fragment | 227694 |
| | 91.1.61 | Chromosome fragment | 324260 |
| | 91.6.59 | Chromosome fragment | 410186 |
| | 91.7 | Chromosome fragment | 953958 |
| | 913.1.77 | Chromosome fragment | 331163 |
| | 913.5.57 | Chromosome fragment | 408963 |
| | 4431.1.70 | Chromosome fragment | 401260 |
| | 7946.4.7 | Chromosome fragment | 518065 |
| | 10473.1.74 | Chromosome fragment | 325622 |
| | 10473.4.57 | Chromosome fragment | 412818 |
| | 10498.4.86 | Chromosome fragment | 331536 |
| | 12947.1 | Chromosome fragment | 1759795 |
| | 12947.1.50 | Chromosome fragment | 470050 |
| | 12947.5 | Chromosome fragment | 43254 |
| | 13941.1 | Chromosome fragment | 1915479 |
| | 13941.2.60 | Chromosome fragment | 267039 |
Figure 2. The primary genome fingerprint map (P-GFM) for the overall comparison among a number of genome fingerprint maps.
(A). Similar: Sulfolobus islandicus M.14.25 (NC_012588) and M.16.4 (NC_012726); (B). Partly similar: S. islandicus Y.N.15.51 (NC_012623) and Methanococcus voltae A3 (NC_014222); (C). Different: S. islandicus Y.G.57.14 (NC_012622) and Methanosphaera stadtmanae 3091 (NC_007681); (D). Mixture: twelve fragmental genomes of Escherichia coli strains (listed in Table 1): 91.1.1, 91.1.61, 91.6.59, 913.1.77, 913.5.57, 4431.1.70, 7946.4.7, 10473.1.74, 10473.4.57, 10498.4.86, 12947.1.50, 13941.2.60.
Figure 3. The primary genome fingerprint map (P-GFM) (A) and the secondary genome fingerprint maps (S-GFMs) (B to H) for the comparison between the chromosomes of Halobacterium sp. NRC-1 (NC_002607) and Halobacterium salinarum R1 (NC_010364).
(A). xn∼yn∼zn; (B). xn∼yn; (C). xn∼zn; (D). yn∼zn; (E). xn∼n; (F). yn∼n; (G). zn∼n; (H). xn∼n and yn∼n together. Note that the two replication origins (oriC1 and oriC2) are marked by arrows; the other arrows indicate genome-wide evolutionary events.
Figure 4. The universal genome fingerprint map (UGFM) for the comparison among a set of genomes in one sitting.
Twelve fragmental genome sequences (Table 1) are shown in a single UGFM view. Each individual primary genome fingerprint map (P-GFM) is classified into a discrete group solely based on its location: Group (A) (91.1.61, 913.1.77 and 10473.1.74), Group (B) (91.6.59, 913.5.57 and 13941.2.60), Group (C) (7946.4.7 and 12947.1.50), Group (D) (10498.4.86), Group (E) (91.1.1), and Group (F) (4431.1.70).
Figure 5. The conceptual framework of the universal genome fingerprint analysis (UGFA).
The core concepts and tools include UGFM, UGFM-TGCC, and UGFM-TGCC-SCG. Abbreviations: 3D-P: three-dimensional plot; 2D-TP: two-dimensional trajectory projections; GF: genome fingerprint; GFM: genome fingerprint map; P-GFM: primary genome fingerprint map; S-GFM: secondary genome fingerprint map; UGFM: universal genome fingerprint map; TGCC: total genetic component configuration; UGFM-TGCC: universal genome fingerprint map of total genetic component configuration; SCG: systematic comparative genomics; UGFM-TGCC-SCG: universal genome fingerprint map of total genetic component configuration based systematic comparative genomics; UGFA: universal genome fingerprint analysis.
Figure 6. The UGFM-TGCC-SCG of four archaeal strains spanning four genera of halophilic Archaea.
One set (A vs. B): Halorubrum lacusprofundi ATCC49239 [chromosome I (NC_012029), chromosome II (NC_012028), plasmid pHLAC01 (NC_012030)] vs. Haloarcula marismortui ATCC43049 [chromosome I (NC_006396), chromosome II (NC_006397), and seven plasmids pNG100 (NC_006389), pNG200 (NC_006390), pNG300 (NC_006391), pNG400 (NC_006392), pNG500 (NC_006393), pNG600 (NC_006394), pNG700 (NC_006395)], focusing on plasmids (A) and as a universal system (B). The other set (C vs. D): Haloferax volcanii DS2 [chromosome (NC_013967), and four plasmids pHV3 (NC_013964), pHV2 (NC_013965), pHV4 (NC_013966), pHV1 (NC_013968)] vs. Halomicrobium mukohataei DSM 12286 [chromosome (NC_013202), plasmid pHmuk01 (NC_013201)], focusing on plasmids (C) and as a universal system (D). Note that both the tiny spots and the large maps are plotted in one sitting within the same figure.
Table 2. Features of genome sequences from phages and viruses.
| Species and Strain | Sequence ID | Type | Size (bp) |
|---|---|---|---|
| WA5: Coliphage WA5 | NC_007847 | Phage chromosome | 5737 |
| ID11: Coliphage ID11 | NC_006954 | Phage chromosome | 5737 |
| WA3: Coliphage WA3 | NC_007845 | Phage chromosome | 5700 |
| WA2: Coliphage WA2 | NC_007844 | Phage chromosome | 5700 |
| ID41: Coliphage ID41 | NC_007851 | Phage chromosome | 5737 |
| NC10: Coliphage NC10 | NC_007854 | Phage chromosome | 5687 |
| WA6: Coliphage WA6 | NC_007852 | Phage chromosome | 5687 |
| ID12: Coliphage ID12 | NC_007853 | Phage chromosome | 5687 |
| NC13: Coliphage NC13 | NC_007849 | Phage chromosome | 5737 |
| NC2: Coliphage NC2 | NC_007848 | Phage chromosome | 5737 |
| NC6: Coliphage NC6 | NC_007855 | Phage chromosome | 5687 |
| ID52: Coliphage ID52 | NC_007825 | Phage chromosome | 5698 |
| ID8: Coliphage ID8 | NC_007846 | Phage chromosome | 5700 |
| G4: Enterobacteria phage G4 | NC_001420 | Phage chromosome | 5737 |
| ID2: Coliphage ID2 | NC_007817 | Phage chromosome | 5644 |
| WA14: Coliphage WA14 | NC_007857 | Phage chromosome | 5644 |
| ID18: Coliphage ID18 | NC_007856 | Phage chromosome | 5644 |
| WA45: Coliphage WA45 | NC_007822 | Phage chromosome | 6242 |
| ID21: Coliphage ID21 | NC_007818 | Phage chromosome | 6242 |
| NC28: Coliphage NC28 | NC_007823 | Phage chromosome | 6239 |
| ID62: Coliphage ID62 | NC_007824 | Phage chromosome | 6225 |
| NC35: Coliphage NC35 | NC_007820 | Phage chromosome | 6213 |
| NC29: Coliphage NC29 | NC_007827 | Phage chromosome | 6439 |
| NC3: Coliphage NC3 | NC_007826 | Phage chromosome | 6273 |
| alpha3: Enterobacteria phage alpha3 | DQ085810 | Phage chromosome | 6177 |
| WA13: Coliphage WA13 | NC_007821 | Phage chromosome | 6242 |
| phiK: Coliphage phiK | NC_001730 | Phage chromosome | 6263 |
| ID32: Coliphage ID32 | NC_007819 | Phage chromosome | 6245 |
| NC19: Coliphage NC19 | NC_007850 | Phage chromosome | 5737 |
| NC16: Coliphage NC16 | NC_007836 | Phage chromosome | 5540 |
| NC5: Coliphage NC5 | NC_007833 | Phage chromosome | 5540 |
| NC37: Coliphage NC37 | NC_007837 | Phage chromosome | 5540 |
| ID1: Coliphage ID1 | NC_007828 | Phage chromosome | 5540 |
| NC7: Coliphage NC7 | NC_007834 | Phage chromosome | 5540 |
| NC1: Coliphage NC1 | NC_007832 | Phage chromosome | 5540 |
| NC11: Coliphage NC11 | NC_007835 | Phage chromosome | 5540 |
| ID22: Coliphage ID22 | NC_007829 | Phage chromosome | 5540 |
| S13: Enterobacteria phage S13 | NC_001424 | Phage chromosome | 5540 |
| phiX174: Coliphage phiX174 | NC_001422 | Phage chromosome | 5540 |
| WA11: Coliphage WA11 | NC_007843 | Phage chromosome | 5541 |
| WA4: Coliphage WA4 | NC_007841 | Phage chromosome | 5540 |
| ID34: Coliphage ID34 | NC_007830 | Phage chromosome | 5540 |
| NC41: Coliphage NC41 | NC_007838 | Phage chromosome | 5540 |
| NC56: Coliphage NC56 | NC_007840 | Phage chromosome | 5540 |
| WA10: Coliphage WA10 | NC_007842 | Phage chromosome | 5540 |
| NC51: Coliphage NC51 | NC_007839 | Phage chromosome | 5540 |
| ID45: Coliphage ID45 | NC_007831 | Phage chromosome | 5540 |
| | AY283796 | Virus chromosome | 30137 |
| | AY283797 | Virus chromosome | 30132 |
| | AY283798 | Virus chromosome | 30137 |
| | AY283794 | Virus chromosome | 30137 |
| | AY291451 | Virus chromosome | 30155 |
| | AY278741 | Virus chromosome | 30153 |
| | AY283795 | Virus chromosome | 30131 |
| | AY278488 | Virus chromosome | 30151 |
| | AY278491 | Virus chromosome | 30168 |
| | AY278554 | Virus chromosome | 30162 |
| | NC_004718 | Virus chromosome | 30178 |
| | AY282752 | Virus chromosome | 30162 |
| | AF201929 | Virus chromosome | 31724 |
| | AF208066 | Virus chromosome | 31558 |
| | AF208067 | Virus chromosome | 31681 |
| | NC_001846 | Virus chromosome | 31806 |
| | NC_003436 | Virus chromosome | 28435 |
| | NC_001451 | Virus chromosome | 28004 |
| | NC_002306 | Virus chromosome | 29776 |
| | NC_002645 | Virus chromosome | 27709 |
| | AF220295 | Virus chromosome | 31546 |
| | u00735 | Virus chromosome | 31477 |
| | AF391542 | Virus chromosome | 31473 |
| | NC_003045 | Virus chromosome | 31473 |
Figure 7. The landscape of the UGFM-TGCC-SCG views at large scale.
(A). The twelve bacterial fragmental chromosomes of E. coli (II) (Table 1), twenty-four virus genomes (I), and forty-seven phage genomes (III) (Table 2) are shown as three distinct groups; fewer maps are visible because the genomes are very close relatives and their maps almost coincide. (B). The representatives selected from (A) are shown as three distinct groups: two archaeal chromosomes (I); two bacterial fragmental chromosomes of E. coli, two viruses, and two phages (II); three plasmids (III). The strong effects of scale-down and view-angle rotation at large scale are demonstrated.
Table 3. The quantitative analysis of representative taxa used in this study.
| Taxon | | | | | | | |
|---|---|---|---|---|---|---|---|
| | −723.50 | 3286.50 | −25173.50 | 3911.76 | 3.36 | 38846.52 | 11544.34 |
| | −686.50 | 1944.50 | −36639.51 | 3657.07 | 6.39 | 3171.28 | 496.33 |
| | −626.50 | 1452.50 | −36613.51 | 3217.80 | 10.25 | 35994.85 | 3512.41 |
| | 254.00 | −1905.00 | −37151.01 | 2619.59 | 36.42 | 304004.04 | 8348.03 |
| | −3518.50 | 1648.50 | −30606.51 | 5620.22 | 321.11 | 2987021.60 | 9302.08 |
| | −299.00 | −2237.00 | −38421.01 | −2951.00 | 42.76 | 535251.30 | 12518.87 |
| | 4072.00 | 3474.00 | −28174.01 | −7359.17 | 22.74 | 97703.92 | 4296.20 |
| | 1205.00 | 3302.00 | −24979.00 | −4632.12 | 202.03 | 88805749.64 | 439557.42 |
| | −6408.50 | −970.50 | 414491.71 | 13711.64 | 25.10 | 587771.06 | 23419.60 |
| | 4145.50 | 7328.50 | 395302.72 | 22900.28 | 1467.93 | 16494288.43 | 11236.41 |
| | 5251.00 | −3846.00 | 394896.15 | −19979.20 | 19.39 | 451075.63 | 23268.54 |
| | −7837.50 | 757.50 | 413575.65 | −13490.82 | 25.28 | 667551.43 | 26407.22 |
| | 644.00 | −2081.00 | 388729.15 | −8046.40 | 6.42 | 5292.47 | 824.63 |
| | 476.50 | −1916.50 | 387938.65 | −7075.85 | 16.03 | 12422689.29 | 775145.76 |
| | 401.00 | 865.00 | −387202.12 | −5121.13 | 30.68 | 11239486.19 | 366386.62 |
| | −1343.00 | −717.00 | −20823.07 | −2716.74 | 216.56 | 73553959.06 | 339653.28 |
| | −874.00 | 1275.00 | −360470.18 | 7378.43 | 1.24 | 432.16 | 347.72 |
| | −851.50 | 1213.50 | −360811.68 | 7197.28 | 19.19 | 936045.35 | 48776.09 |
| | −2079.00 | 1844.00 | −312055.11 | 10615.71 | 1.58 | 275711.29 | 174157.24 |
| | 1900.50 | −1177.50 | −486140.66 | 10284.83 | 42.76 | 2879398.22 | 67330.93 |
| | −6598.00 | 4630.00 | −552680.14 | 25654.04 | 221.76 | 137405406.45 | 619606.27 |
| | −2613.00 | 5233.00 | 66913.02 | −9708.09 | 36.36 | 17055884.77 | 469043.29 |
| | 5125.50 | 4368.50 | −402065.63 | −20802.81 | 160.79 | 16003171.72 | 99528.02 |
| | −26901.01 | 54859.02 | −481632.18 | 89243.66 | 133.41 | 8899042.85 | 66706.24 |
| | −2129.50 | −2139.50 | −457398.67 | −12773.06 | 42.39 | 17797581.80 | 419838.30 |
| | −1716.50 | −2140.50 | −37560.57 | −5167.70 | 54.44 | 31479912.34 | 578208.73 |
| | −3902.00 | −2238.00 | −615765.16 | −17519.47 | 215.81 | 200314105.28 | 928180.28 |
| | 255.00 | 3329.00 | 312389.12 | 6424.66 | 291.91 | 126733403.82 | 434156.90 |
| | 2821.00 | 6576.00 | −121748.05 | −13120.26 | 17.10 | 6022573.34 | 352284.18 |
| | −4316.50 | −3111.50 | −473826.67 | −18531.35 | / | / | / |
The taxa with GenBank_ID are cross-listed in Table 1.
The Euclidean distance, differentiate rate, and weighted differentiate rate are calculated according to formula (7) from each pair of adjacent sequences; each result is listed in the row of the first sequence of the pair, as shown by the last two rows.
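Formula (7) is not reproduced in this record, but the tabulated numbers can still be cross-checked. In the first three data rows, the fourth value column matches the signed cube root of the product of the first three values, and the last value column matches the Euclidean distance between the first three values of adjacent rows. The Python sketch below reproduces both checks. Reading the first three columns as (x, y, z) coordinates is an assumption made for illustration, and the helper names are hypothetical.

```python
import math

# First four data rows of the table: the first three value columns,
# read here as (x, y, z). ASSUMPTION: that reading is ours, for illustration.
coords = [
    (-723.50, 3286.50, -25173.50),
    (-686.50, 1944.50, -36639.51),
    (-626.50, 1452.50, -36613.51),
    (254.00, -1905.00, -37151.01),
]
# Fourth and last value columns as printed in the first three rows.
fourth_column = [3911.76, 3657.07, 3217.80]
last_column = [11544.34, 496.33, 3512.41]


def signed_cube_root_of_product(x, y, z):
    """Cube root of x*y*z, keeping the sign of the product."""
    product = x * y * z
    return math.copysign(abs(product) ** (1.0 / 3.0), product)


def euclidean_distance(a, b):
    """Straight-line distance between two 3-D points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


means = [signed_cube_root_of_product(*c) for c in coords[:3]]
# Each distance pairs a row with the row that follows it.
distances = [euclidean_distance(coords[i], coords[i + 1]) for i in range(3)]

for got, printed in zip(means, fourth_column):
    print(f"cube root of product {got:.2f} vs table {printed:.2f}")
for got, printed in zip(distances, last_column):
    print(f"adjacent-row distance {got:.2f} vs table {printed:.2f}")
```

In the same three rows, the sixth value is also approximately the product of the fifth and seventh values, which would fit a rate weighted by distance. That relationship is an observation from the numbers, not a statement of the published formula.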