Martin Bartas, Michaela Čutová, Václav Brázda, Patrik Kaura, Jiří Šťastný, Jan Kolomazník, Jan Coufal, Pratik Goswami, Jiří Červeň, Petr Pečinka.
Abstract
The role of local DNA structures in the regulation of basic cellular processes is an emerging field of research. Among local non-B DNA structures, the significance of G-quadruplexes has been established over the last decade, and their presence and functional relevance have been demonstrated in many genomes, including the human genome. In this study, we used G4Hunter to analyze the presence and locations of G-quadruplex-forming sequences in all complete bacterial genomes available in the NCBI database. G-quadruplex-forming sequences were identified in all species; however, their frequency differed significantly across evolutionary groups. The highest frequency of G-quadruplex-forming sequences was detected in the subgroup Deinococcus-Thermus, and the lowest in the group Thermotogae. G-quadruplex-forming sequences are non-randomly distributed and are more frequent in some evolutionary groups than in others. They are most enriched in ncRNA segments, followed by mRNAs. Analyses of the surrounding sequences showed G-quadruplex-forming sequences around tRNA and regulatory sequences. These data point to the unique and non-random localization of G-quadruplex-forming sequences in bacterial genomes.
Keywords: G-quadruplex; G4Hunter; bacteria; bioinformatics; deinococcus
Year: 2019 PMID: 31052562 PMCID: PMC6539912 DOI: 10.3390/molecules24091711
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Figure 1. G-quadruplexes: (A) guanine tetrad stabilized by Hoogsteen base pairing and a positively charged central ion; (B) schematic drawing of an intramolecular G4 structure arising from double-stranded DNA; (C) G4Hunter, a new user-friendly web server for high-throughput analyses of G4-forming sequences in DNA; and (D) 3D model of an intramolecular antiparallel G4 formed from the sequence 5′-GGGGTGTGGGGTGTGGGGTGTGGGGTGT-3′, found in Microcystis aeruginosa, built using the 3D-NuS web server [23].
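Panel C refers to G4Hunter. As a minimal sketch of how its score is defined, following the published G4Hunter scheme as we understand it (each G in a run of n guanines scores min(n, 4), each C in a run of cytosines scores -min(n, 4), A and T score 0, and a window's score is the mean of its base scores), the Python below computes whole-sequence and sliding-window scores. The 25-nt window is an assumed default and the function names are hypothetical; the real tool also merges overlapping hits above threshold, which this sketch omits.

```python
import re


def base_scores(seq: str) -> list[int]:
    """Per-base scores: G in a G-run -> min(run, 4); C in a C-run -> -min(run, 4); others -> 0."""
    scores: list[int] = []
    # Split the sequence into maximal runs of G, of C, or of any other characters.
    for match in re.finditer(r"G+|C+|[^GC]+", seq.upper()):
        run = match.group()
        if run[0] == "G":
            scores.extend([min(len(run), 4)] * len(run))
        elif run[0] == "C":
            scores.extend([-min(len(run), 4)] * len(run))
        else:
            scores.extend([0] * len(run))
    return scores


def g4hunter_score(seq: str) -> float:
    """Mean base score over the whole sequence."""
    scores = base_scores(seq)
    return sum(scores) / len(scores)


def window_scores(seq: str, window: int = 25) -> list[float]:
    """Mean base score of every window of `window` nt, using a running sum."""
    scores = base_scores(seq)
    if len(scores) < window:
        return []
    running = sum(scores[:window])
    result = [running / window]
    for i in range(window, len(scores)):
        # Slide by one base: add the entering score, drop the leaving one.
        running += scores[i] - scores[i - window]
        result.append(running / window)
    return result


# Sequence from panel D (Microcystis aeruginosa), with the internal space removed.
microcystis_g4 = "GGGGTGTGGGGTGTGGGGTGTGGGGTGT"
print(round(g4hunter_score(microcystis_g4), 3))  # 2.429
print(max(window_scores(microcystis_g4)))        # 2.68
```

Under these assumptions, both the whole-sequence mean and the best 25-nt window for the panel D sequence lie above 2.0.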
Total number of putative G-quadruplex-forming sequences (PQS) and their resulting frequencies per 1000 bp in all 1547 representative bacteria, grouped by G4Hunter score. Frequency was computed as the total number of PQS in each category divided by the total length of all analyzed sequences, multiplied by 1000.
| Interval of G4Hunter Score | Number of PQS in Dataset | PQS Frequency per 1000 bp |
|---|---|---|
| 1.2–1.4 | 9,009,593 | 1.315033 |
| 1.4–1.6 | 180,395 | 0.025058 |
| 1.6–1.8 | 11,779 | 0.00155 |
| 1.8–2.0 | 511 | 0.000055 |
| ≥2.0 | 86 | 0.000009 |
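The frequency described in the caption is a simple ratio. A minimal Python sketch of the binning and the per-1000-bp calculation, assuming that scores are binned by absolute value (G4Hunter gives C-rich hits negative scores) and that each interval is half-open, [low, high); the function names and toy inputs are hypothetical.

```python
from bisect import bisect_right
from collections import Counter
from typing import Optional

# Lower edges of the G4Hunter score intervals; the last interval is open-ended.
BIN_EDGES = [1.2, 1.4, 1.6, 1.8, 2.0]
BIN_LABELS = ["1.2–1.4", "1.4–1.6", "1.6–1.8", "1.8–2.0", "≥2.0"]


def score_bin(score: float) -> Optional[str]:
    """Place |score| in a half-open interval [low, high); None if below 1.2."""
    idx = bisect_right(BIN_EDGES, abs(score)) - 1
    return BIN_LABELS[idx] if idx >= 0 else None


def frequency_per_kb(pqs_count: int, total_length_bp: int) -> float:
    """PQS count divided by total analyzed length, multiplied by 1000."""
    return pqs_count * 1000 / total_length_bp


def tabulate(scores: list[float], total_length_bp: int) -> dict[str, tuple[int, float]]:
    """Count PQS per score interval and convert each count to a frequency per 1000 bp."""
    counts = Counter(label for label in map(score_bin, scores) if label is not None)
    return {
        label: (counts[label], frequency_per_kb(counts[label], total_length_bp))
        for label in BIN_LABELS
    }


# Toy input (hypothetical scores and length, not data from this study).
print(tabulate([1.25, -1.3, 1.5, 2.3, 0.9], total_length_bp=10_000)[BIN_LABELS[0]])  # (2, 0.2)
```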
Figure 2. Phylogenetic tree of the inspected bacterial groups and subgroups.
Genomic sequence sizes, PQS frequencies, and total counts. Seq (total number of sequences), Median (median sequence length), Short (shortest sequence), Long (longest sequence), GC % (average GC content), PQS (total number of predicted PQS), Mean f (mean frequency of predicted PQS per 1000 bp), Min f (lowest frequency of predicted PQS per 1000 bp), Max f (highest frequency of predicted PQS per 1000 bp). Colors correspond to the phylogenetic tree depiction in Figure 2.
| Domain | Seq | Median (bp) | Short (bp) | Long (bp) | GC % | PQS | Mean f | Min f | Max f |
|---|---|---|---|---|---|---|---|---|---|
| Bacteria | 1627 | 3,307,820 | 83,026 | 13,033,779 | 50.6 | 9,202,364 | 1.342 | 0.013 | 14.213 |

| Group | Seq | Median (bp) | Short (bp) | Long (bp) | GC % | PQS | Mean f | Min f | Max f |
|---|---|---|---|---|---|---|---|---|---|
| Spirochaetes | 38 | 2,646,038 | 277,655 | 4,653,970 | 39.7 | 87,109 | 0.809 | 0.079 | 6.668 |
| Thermotogae | 16 | 2,150,379 | 1,884,562 | 2,974,229 | 39.1 | 13,617 | 0.395 | 0.149 | 0.812 |
| PVC group | 28 | 2,917,407 | 1,041,170 | 9,629,675 | 50.7 | 198,358 | 1.646 | 0.388 | 4.802 |
| FCB group | 117 | 3,914,632 | 605,745 | 9,127,347 | 42.3 | 302,949 | 0.608 | 0.013 | 2.746 |
| Terrabacteria | 659 | 3,018,755 | 91,776 | 11,936,683 | 50.4 | 4,766,517 | 1.601 | 0.016 | 14.213 |
| Proteobacteria | 724 | 3,551,512 | 83,026 | 13,033,779 | 53.4 | 3,688,101 | 1.276 | 0.025 | 5.507 |
| Other | 45 | 2,157,835 | 1,012,010 | 6,237,577 | 44.3 | 145,713 | 1.103 | 0.062 | 5.855 |

| Subgroup | Seq | Median (bp) | Short (bp) | Long (bp) | GC % | PQS | Mean f | Min f | Max f |
|---|---|---|---|---|---|---|---|---|---|
| Spirochaetia | 38 | 2,646,038 | 277,655 | 4,653,970 | 39.7 | 87,109 | 0.809 | 0.079 | 6.668 |
| Thermotogae | 16 | 2,150,379 | 1,884,562 | 2,974,229 | 39.1 | 13,617 | 0.395 | 0.149 | 0.812 |
| Chlamydiae | 12 | 1,168,953 | 1,041,170 | 3,072,383 | 40.3 | 12,453 | 0.646 | 0.388 | 0.957 |
| Bacteroidetes/Chlorobi | 114 | 3,878,527 | 605,745 | 9,127,347 | 41.9 | 282,516 | 0.585 | 0.013 | 2.746 |
| Cyanobacteria/Melainab. | 29 | 5,315,554 | 1,657,990 | 9,673,108 | 42.6 | 193,894 | 1.247 | 0.201 | 6.004 |
| Chloroflexi | 12 | 2,333,610 | 1,252,731 | 5,723,298 | 60 | 62,688 | 1.89 | 1.223 | 3.222 |
| Tenericutes | 52 | 981,001 | 564,395 | 1,877,792 | 28 | 6,460 | 0.136 | 0.016 | 0.834 |
| Actinobacteria | 246 | 3,960,961 | 775,354 | 11,936,683 | 66.2 | 3,590,884 | 2.821 | 0.143 | 8.556 |
| Deinococcus-Thermus | 18 | 2,895,913 | 2,035,182 | 3,881,839 | 66.8 | 311,949 | 6.626 | 1.885 | 14.213 |
| Firmicutes | 298 | 2,835,823 | 91,776 | 8,739,048 | 40.8 | 579,740 | 0.56 | 0.064 | 6.587 |
| delta/epsilon subdiv. | 92 | 3,136,746 | 1,457,619 | 13,033,779 | 50 | 807,281 | 1.681 | 0.034 | 5.282 |
| Betaproteobacteria | 110 | 3,763,620 | 820,037 | 6,987,670 | 60.6 | 585,984 | 1.306 | 0.195 | 4.007 |
| Alphaproteobacteria | 213 | 3,424,964 | 83,026 | 9,207,384 | 61.5 | 126,134 | 1.764 | 0.051 | 5.507 |
| Gammaproteobacteria | 302 | 3,777,066 | 298,471 | 7,783,862 | 48.8 | 31,686 | 0.799 | 0.025 | 4.264 |
| other | 75 | 2,406,157 | 1,012,010 | 9,629,675 | 48.4 | 432,683 | 1.406 | 0.0616 | 5.855 |
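The columns of this table can, in principle, be derived from per-sequence records. The sketch below assumes that Mean f, Min f, and Max f are taken over per-sequence frequencies (PQS count × 1000 / sequence length) and that GC % is an unweighted mean across sequences; the caption states neither choice, and the `SequenceRecord` type and toy values are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class SequenceRecord:
    """Hypothetical per-sequence record: length, GC content, and PQS count."""
    length_bp: int
    gc_percent: float
    pqs_count: int


def gc_percent(seq: str) -> float:
    """Share of G and C bases, in percent."""
    seq = seq.upper()
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


def summarize(records: list[SequenceRecord]) -> dict:
    """Build one table row (Seq ... Max f) from a list of sequence records."""
    lengths = [r.length_bp for r in records]
    # Assumed: frequencies are computed per sequence, then aggregated.
    freqs = [r.pqs_count * 1000 / r.length_bp for r in records]
    return {
        "Seq": len(records),
        "Median": median(lengths),
        "Short": min(lengths),
        "Long": max(lengths),
        "GC %": round(mean(r.gc_percent for r in records), 1),  # assumed unweighted mean
        "PQS": sum(r.pqs_count for r in records),
        "Mean f": round(mean(freqs), 3),
        "Min f": round(min(freqs), 3),
        "Max f": round(max(freqs), 3),
    }


# Toy records (hypothetical, not genomes from this study).
toy = [
    SequenceRecord(2_000_000, 40.0, 1_000),
    SequenceRecord(4_000_000, 60.0, 8_000),
    SequenceRecord(3_000_000, 50.0, 3_000),
]
print(summarize(toy))
```

Under this reading, the group-level Mean f is not the same as pooling all PQS and all base pairs, which is how the per-interval frequencies in the score-interval table are defined.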
Figure 3. Frequencies of PQS in subgroups of the analyzed bacterial genomes. Boxes span the interquartile range (IQR), and whiskers extend to the lowest and highest values within 1.5 × IQR of the box. Black diamonds denote outliers.
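The box-and-whisker convention in this caption is the usual Tukey rule. A minimal Python sketch of how the box, whiskers, and outliers could be computed from per-genome frequencies; the quartile method (`method="inclusive"` in `statistics.quantiles`) is an assumption, since plotting libraries differ slightly, and the input values are hypothetical.

```python
from statistics import quantiles


def box_stats(values: list[float], k: float = 1.5) -> dict:
    """Tukey-style box-plot summary: quartiles, whiskers within k*IQR, and outliers."""
    # Quartile method is an assumption; plotting libraries differ slightly here.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    # Whiskers reach the most extreme points still inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Toy per-genome PQS frequencies (hypothetical values for illustration).
print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```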
Figure 4. Relationship between the observed frequency of PQS per 1000 bp and GC content in all analyzed prokaryotic sequences, shown separately for each G4Hunter score interval. In each interval's miniplot, frequencies were normalized to the highest observed PQS frequency. Organisms whose normalized frequency exceeds 50% of that maximum are labeled and highlighted in color.
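The normalization in each miniplot amounts to dividing by the interval's maximum. A minimal Python sketch under our reading of the caption (an organism is highlighted when its normalized frequency is strictly greater than 0.5); the function name and organism labels are hypothetical.

```python
def normalize_and_flag(freqs: dict[str, float], threshold: float = 0.5):
    """Scale frequencies by the interval's maximum; return normalized values and organisms above threshold."""
    top = max(freqs.values())
    if top == 0:
        # Nothing observed in this interval: all normalized values are zero.
        return {org: 0.0 for org in freqs}, []
    normalized = {org: f / top for org, f in freqs.items()}
    flagged = sorted(org for org, value in normalized.items() if value > threshold)
    return normalized, flagged


# Hypothetical organisms and frequencies (per 1000 bp) within one score interval.
norm, highlighted = normalize_and_flag({"org_A": 2.0, "org_B": 1.0, "org_C": 1.5})
print(norm)         # {'org_A': 1.0, 'org_B': 0.5, 'org_C': 0.75}
print(highlighted)  # ['org_A', 'org_C']
```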
Figure 5. Differences in PQS frequency by DNA locus. The chart shows PQS frequencies according to the “gene” annotation and other annotated locations from the NCBI database. We analyzed the frequencies of all PQS within annotated locations (inside) and in the 100 bp before and the 100 bp after them.
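Assigning each PQS to "inside", "before", or "after" an annotated location is an interval-overlap test. A minimal Python sketch, assuming 0-based half-open coordinates, that any overlap (not full containment) counts, that "inside" takes priority over the flanks, and that "before" means upstream with respect to the feature's strand; the caption does not specify these details, and the function names are hypothetical.

```python
from typing import Optional


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def classify_pqs(pqs_start: int, pqs_end: int, feat_start: int, feat_end: int,
                 strand: str = "+", flank: int = 100) -> Optional[str]:
    """Label a PQS as 'inside', 'before', or 'after' an annotated feature, else None."""
    if overlaps(pqs_start, pqs_end, feat_start, feat_end):
        return "inside"
    left = (max(0, feat_start - flank), feat_start)
    right = (feat_end, feat_end + flank)
    # On the minus strand, "before" (upstream) lies to the right in genome coordinates.
    before, after = (left, right) if strand == "+" else (right, left)
    if overlaps(pqs_start, pqs_end, *before):
        return "before"
    if overlaps(pqs_start, pqs_end, *after):
        return "after"
    return None


# A hypothetical feature spanning 1000-2000 and a PQS at 950-975.
print(classify_pqs(950, 975, 1000, 2000))              # before
print(classify_pqs(950, 975, 1000, 2000, strand="-"))  # after
```

Per-category frequencies per 1000 bp would then follow from the same count-over-length ratio used for the genome-wide tables, with the summed lengths of the inside, before, or after regions as the denominator.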