Apostolos Almpanis, Martin Swain, Derek Gatherer, Neil McEwan.
Abstract
Based on complete bacterial genome sequence data, we demonstrate a correlation between bacterial chromosome length and the G+C content of the genome, with longer genomes having higher G+C contents. The correlation value decreases at shorter genome sizes, where there is a wider spread of G+C values. However, although significant (P<0.001), the correlation value (Pearson R=0.58) suggests that other factors also have a significant influence. A similar pattern was seen for plasmids; longer plasmids had higher G+C values, although the large number of shorter plasmids had a wide spread of G+C values. There was also a significant (P<0.0001) correlation between the G+C content of plasmids and the G+C content of their bacterial host. Conversely, the G+C content of bacteriophages tended to reduce with larger genome sizes, and although there was a correlation between host genome G+C content and that of the bacteriophage, it was not as strong as that seen between plasmids and their hosts.
Keywords: bacteria; genome G+C content; genome length; plasmids
Year: 2018 PMID: 29633935 PMCID: PMC5989581 DOI: 10.1099/mgen.0.000168
Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858
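The two quantities the abstract relates, G+C content and sequence length, and the Pearson correlation between them can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names `gc_content` and `pearson_r` are hypothetical, the choice to ignore ambiguous bases is an assumption, and whether the original analysis used raw or transformed lengths is not stated in this record. The ten data points are the longest and shortest bacterial genomes listed in this record's length table, so the resulting value is not comparable to the reported R=0.58 for the full data set.

```python
import math


def gc_content(sequence: str) -> float:
    """Return G+C content as a percentage of the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at  # assumption: ambiguous symbols such as N are ignored
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / total


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# (length in bp, G+C %) pairs copied from this record's table of the
# five longest and five shortest bacterial genomes.
EXTREME_BACTERIAL_GENOMES = [
    (16_040_666, 69.1), (14_782_125, 72.1), (13_047_416, 71.8),
    (13_033_779, 71.4), (12_489_432, 69.4),
    (112_031, 16.6), (112_091, 17.1), (133_698, 46.8),
    (138_927, 58.8), (138_931, 58.8),
]

lengths = [length for length, _ in EXTREME_BACTERIAL_GENOMES]
gc_values = [gc for _, gc in EXTREME_BACTERIAL_GENOMES]
r = pearson_r(lengths, gc_values)
print(f"Pearson R on the ten extreme genomes: {r:.2f}")
```

In practice `gc_content` would be applied to each downloaded sequence to build the list of G+C values before calling `pearson_r`; the hard-coded pairs here only stand in for that step.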
Microbes, plasmids and phages with extreme values of length
The five longest and shortest values are shown in each case. G+C values have been rounded to one decimal place.
| Rank | Length (bp) | G+C content (%) |
|---|---|---|
| Longest bacterial genomes | | |
| 1 | 16 040 666 | 69.1 |
| 2 | 14 782 125 | 72.1 |
| 3 | 13 047 416 | 71.8 |
| 4 | 13 033 779 | 71.4 |
| 5 | 12 489 432 | 69.4 |
| Shortest bacterial genomes | | |
| 1 | 112 031 | 16.6 |
| 2 | 112 091 | 17.1 |
| 3 | 133 698 | 46.8 |
| 4 | 138 927 | 58.8 |
| 5 | 138 931 | 58.8 |
| Longest plasmids | | |
| 1 | 2 580 084 | 63.6 |
| 2 | 2 555 069 | 62.4 |
| 3 | 2 466 951 | 59.4 |
| 4 | 2 430 033 | 62.3 |
| 5 | 2 388 366 | 59.2 |
| Shortest plasmids | | |
| 1 | 744 | 42.2 |
| 2 | 870 | 32.6 |
| 3 | 886 | 31.3 |
| 4 | 1 085 | 30.4 |
| 5 | 1 109 | 59.1 |
| Longest phages | | |
| 1 | 490 380 | 37.1 |
| 2 | 440 001 | 30.0 |
| 3 | 378 379 | 35.9 |
| 4 | 370 920 | 28.7 |
| 5 | 358 663 | 35.6 |
| Shortest phages | | |
| 1 | 2 435 | 33.3 |
| 2 | 3 405 | 48.0 |
| 3 | 3 486 | 46.5 |
| 4 | 3 523 | 48.4 |
| 5 | 3 525 | 51.0 |
Microbes, plasmids and phages with extreme values of G+C content
Only the five highest and lowest values are shown in each case. G+C values have been rounded to one decimal place.
| Rank | Length (bp) | G+C content (%) |
|---|---|---|
| Organisms with highest bacterial genome G+C content | | |
| 1 | 5 013 479 | 74.9 |
| 2 | 5 061 632 | 74.8 |
| 3 | 6 543 262 | 74.8 |
| 4 | 2 594 799 | 74.7 |
| 5 | 4 266 344 | 74.7 |
| Organisms with lowest bacterial genome G+C content | | |
| 1 | 208 564 | 13.5 |
| 2 | 162 589 | 14.0 |
| 3 | 166 163 | 14.2 |
| 4 | 162 504 | 14.2 |
| 5 | 157 543 | 14.6 |
| Plasmids with highest G+C content | | |
| 1 | 30 888 | 87.5 |
| 2 | 15 591 | 83.3 |
| 3 | 1 809 491 | 73.3 |
| 4 | 1 812 548 | 73.3 |
| 5 | 24 272 | 72.9 |
| Plasmids with lowest G+C content | | |
| 1 | 3 465 | 20.3 |
| 2 | 3 674 | 20.6 |
| 3 | 18 777 | 20.6 |
| 4 | 10 702 | 20.9 |
| 5 | 3 260 | 21.0 |
| Phage with highest G+C content | | |
| 1 | 37 612 | 72.7 |
| 2 | 38 411 | 72.6 |
| 3 | 39 522 | 72.6 |
| 4 | 38 496 | 72.5 |
| 5 | 39 693 | 72.5 |
| Phage with lowest G+C content | | |
| 1 | 6 825 | 20.3 |
| 2 | 8 273 | 22.9 |
| 3 | 7 878 | 23.0 |
| 4 | 7 768 | 23.2 |
| 5 | 18 855 | 24.8 |
Fig. 1. Scatterplot of G+C content versus sequence length for bacterial chromosomal sequences, showing an approximately triangular distribution. Pearson’s R (0.58) corresponds to R² ≈ 0.34, so about a third of the G+C content variation can be explained by genome length, although there is also apparent heteroskedasticity. G+C content is plotted to the nearest percentage point.
Fig. 2. Scatterplot of plasmid G+C content versus plasmid sequence length, showing an approximately rotated L-shape. G+C content is shown to the nearest mol% value.
Fig. 3. Comparison of the G+C content of plasmids versus that of their host. G+C content is shown to the nearest mol% value.
Fig. 4. Scatterplot of phage G+C content versus phage sequence length, showing that longer phages tend to have a lower G+C content. G+C content is shown to the nearest mol% value.
Fig. 5. Comparison of the G+C content of phages versus that of their host. G+C content is shown to the nearest mol% value.