Natália Bohálová, Alessio Cantara, Martin Bartas, Patrik Kaura, Jiří Šťastný, Petr Pečinka, Miroslav Fojta, Václav Brázda.
Abstract
The importance of G-quadruplex-based regulation of gene expression in viruses may point to its potential use in therapeutic targeting. Here, using the G4Hunter tool, we analyze the occurrence of putative G-quadruplex-forming sequences (PQS) in all reference viral dsDNA genomes and evaluate its dependence on PQS occurrence in the host organisms. PQS frequencies differ across host taxa regardless of GC content. Overlaying PQS with annotated regions reveals that PQS localize to specific regions. While abundance in some, such as repeat regions, is shared by all groups, others are unique. PQS are abundant within introns of Eukaryota-infecting viruses but depleted in introns of bacteria-infecting viruses. We reveal a significant positive correlation between PQS frequencies in dsDNA viruses and their corresponding hosts from archaea, bacteria, and eukaryotes. A strong relationship between PQS in a virus and its host indicates their close coevolution and evolutionarily reciprocal mimicking of genome organization.
Keywords: G-quadruplex; G4Hunter; bioinformatics; coevolution; dsDNA; host; virus
Year: 2021 PMID: 33810462 PMCID: PMC8036883 DOI: 10.3390/ijms22073433
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Full set of viral genomes divided according to host. The number of accessible unique genomes for each domain and group is shown in brackets.
Total number of putative G-quadruplex-forming sequences (PQS) and their resulting frequencies per 1000 nt in all 2903 viral genomes and host categories, grouped by G4Hunter score. Frequencies were calculated as the total number of PQS in each category divided by the total length of all analyzed sequences, multiplied by 1000 and normalized by the number of viruses infecting one genus.
PQS frequency per 1000 nt by host category:

| G4Hunter score | All | Archaea | Bacteria | Eukaryota |
|---|---|---|---|---|
| 1.2–1.4 | 1.27 | 1.74 | 0.88 | 1.46 |
| 1.4–1.6 | 0.039 | 0.025 | 0.026 | 0.047 |
| 1.6–1.8 | 0.0042 | 0 | 0.00088 | 0.0062 |
| 1.8–2.0 | 0.00025 | 0 | 0.000041 | 0.00038 |
| 2.0 and more | 0.00021 | 0 | 0.000050 | 0.00031 |
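The score intervals in the table come from G4Hunter. As a rough illustration of how such a score arises, here is a minimal sketch of G4Hunter-style scoring: each G in a run contributes the run length (capped at 4), each C contributes the negative run length, other bases contribute 0, and the score of a window is the mean over its bases. The window size of 25, the half-open interval boundaries, and all function names are assumptions made for this sketch. It does not reproduce the actual tool, which among other things merges overlapping windows.

```python
from itertools import groupby

# Score intervals from the table; treating them as half-open [lo, hi)
# is an assumption, since the source does not say where boundaries fall.
BINS = [(1.2, 1.4), (1.4, 1.6), (1.6, 1.8), (1.8, 2.0), (2.0, float("inf"))]


def base_scores(seq):
    """Per-base scores: +run length for G runs, -run length for C runs (capped at 4)."""
    scores = []
    for base, run in groupby(seq.upper()):
        n = len(list(run))
        if base == "G":
            value = min(n, 4)
        elif base == "C":
            value = -min(n, 4)
        else:
            value = 0
        scores.extend([value] * n)
    return scores


def window_scores(seq, window=25):
    """Mean base score of every full-length window (window size is an assumption)."""
    s = base_scores(seq)
    if len(s) < window:
        return []
    total = sum(s[:window])
    out = [total / window]
    for i in range(window, len(s)):
        total += s[i] - s[i - window]  # slide the window by one base
        out.append(total / window)
    return out


def score_bin(score):
    """Return the table interval containing |score|, or None if below 1.2."""
    a = abs(score)  # negative scores indicate a C-rich strand
    for lo, hi in BINS:
        if lo <= a < hi:
            return (lo, hi)
    return None
```

For example, the 25-nt telomere-like sequence `GGGTTAGGGTTAGGGTTAGGGTTAG` has a single window whose score is 37/25 = 1.48, which `score_bin` places in the 1.4–1.6 interval.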
Distribution of PQS frequencies in viruses according to host organisms. Genomic length, PQS frequencies, and total counts. Seq (total number of sequences), Median (median length of sequences), GC% (average GC content), PQS (total number of predicted PQS), Mean f (mean frequency of predicted PQS per 1000 nt normalized by the number of viruses infecting one genus), Min f (the lowest frequency of predicted PQS per 1000 nt), Max f (the highest frequency of predicted PQS per 1000 nt), and Cov (% of genome covered by PQS).
| Host | Seq | Median | GC% | PQS | Mean f | Min f | Max f | Cov |
|---|---|---|---|---|---|---|---|---|
| All | 3134 | 44,746.5 | 44.94 | 220,569 | 1.32 | 0 | 11.51 | 3.34 |
| Archaea | 81 | 33,356 | 48.92 | 3137 | 1.76 | 0 | 4.80 | 4.32 |
| Bacteria | 2087 | 49,639 | 48.10 | 112,664 | 0.89 | 0 | 11.51 | 2.11 |
| Eukaryota | 966 | 7951.5 | 43.09 | 104,768 | 1.52 | 0 | 11.44 | 3.93 |
| Crenarchaeota | 54 | 32,047.5 | 40.91 | 1012 | 1.85 | 0 | 4.80 | 4.76 |
| Euryarchaeota | 27 | 49,107 | 54.92 | 2125 | 1.69 | 0.28 | 3.75 | 3.99 |
| Actinobacteria | 524 | 53,403.5 | 60.90 | 61,313 | 2.27 | 0.33 | 7.02 | 5.12 |
| Bacteroidetes | 32 | 47,060 | 38.12 | 477 | 0.41 | 0.03 | 1.14 | 1.01 |
| Cyanobacteria | 89 | 174,079 | 43.33 | 3875 | 0.82 | 0.06 | 3.88 | 2.10 |
| | 6 | 61,150 | 50.26 | 726 | 4.21 | 0.33 | 11.51 | 10.45 |
| Firmicutes | 527 | 41,843 | 38.14 | 7886 | 0.32 | 0 | 1.39 | 0.78 |
| Proteobacteria | 904 | 49,035 | 50.07 | 38,334 | 0.80 | 0 | 4.55 | 1.90 |
| Amoebozoa | 22 | 495,022 | 42.47 | 21,931 | 0.66 | 0 | 1.89 | 1.60 |
| Arthropoda | 345 | 7276 | 38.77 | 4957 | 0.30 | 0 | 1.92 | 0.73 |
| Chordata | 561 | 7852 | 45.48 | 72,420 | 2.18 | 0 | 11.44 | 5.65 |
| Viridiplantae | 21 | 193,301 | 46.91 | 3542 | 1.06 | 0 | 2.01 | 2.54 |
| Humans | 120 | 7344 | 42.55 | 15,996 | 1.75 | 0 | 11.44 | 4.48 |
The colors correspond to the phylogenetic tree depiction in Figure 1 (grey: Archaea, blue: Bacteria, green: Eukaryota as host organisms).
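The Mean f and Cov columns can be illustrated with a short sketch. The pooled frequency follows the caption directly (PQS count divided by sequence length, multiplied by 1000). How exactly the values are "normalized by the number of viruses infecting one genus" is not spelled out here, so `genus_normalized_mean` assumes one plausible reading: average within each host genus first, then across genera. The record layout, the half-open interval convention, and all function names are hypothetical.

```python
from collections import defaultdict


def pqs_frequency(pqs_count, length):
    """PQS per 1000 nt for a single sequence or a pooled set of sequences."""
    return pqs_count / length * 1000


def pooled_frequency(records):
    """Total PQS over total length, times 1000; records are (genus, count, length)."""
    total_pqs = sum(count for _, count, _ in records)
    total_len = sum(length for _, _, length in records)
    return pqs_frequency(total_pqs, total_len)


def genus_normalized_mean(records):
    """ASSUMED reading of the normalization: mean per host genus, then mean of those."""
    per_genus = defaultdict(list)
    for genus, count, length in records:
        per_genus[genus].append(pqs_frequency(count, length))
    genus_means = [sum(v) / len(v) for v in per_genus.values()]
    return sum(genus_means) / len(genus_means)


def coverage_percent(intervals, genome_length):
    """Percent of the genome covered by PQS; intervals are half-open (start, end)."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start  # close the previous merged block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping PQS are counted once
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length * 100
```

With the made-up records `[("A", 10, 10000), ("A", 30, 10000), ("B", 40, 10000)]`, the genus-normalized mean is 3.0 PQS per 1000 nt, whereas a plain mean over the three viruses would be about 2.67. The normalization keeps heavily sampled host genera from dominating a group's mean.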
Figure 2. Frequencies of PQS in host groups of the analyzed viral genomes. Boxes span the interquartile range, and whiskers show the lowest and highest values within 1.5 times the interquartile range. Black diamonds denote outliers. The colors correspond to the phylogenetic tree depiction.
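The box-plot convention described in the caption can be sketched as follows. This is an illustration only: the quartile method (`inclusive`) is an assumption, plotting libraries differ in how they interpolate quartiles, and the sample values in the usage note are invented.

```python
import statistics


def box_stats(values):
    """Quartiles, whisker ends, and outliers under the 1.5 x IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points that lie inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }
```

For the invented frequencies `[0.2, 0.3, 0.3, 0.4, 0.5, 0.6, 0.8, 11.5]`, the whiskers end at 0.2 and 0.8, and 11.5 is reported as the single outlier, which a plot like Figure 2 would draw as a diamond.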
Figure 3. Cluster dendrogram based on PQS characteristics in all viral species by their host. Input data are listed in Supplementary Materials S4. Statistically significant clusters (approximately unbiased p-values above 95%, equivalent to p-values lower than 0.05) are highlighted by rectangles drawn with broken red lines. The colors correspond to the phylogenetic tree depiction.
Figure 4. The ratio of PQS frequencies per 1000 nt between gene annotation and other annotated locations from the NCBI database. PQS frequencies within (inside), before (100 nt), and after (100 nt) annotated locations were analyzed. Detailed results are summarized in Supplementary Materials S5.
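The inside/before/after comparison can be sketched as below. Several details are assumptions made for illustration: features are 0-based half-open `(start, end)` intervals, strand is ignored, a PQS is assigned to a region by its start coordinate, and flanks are clipped at the genome ends. The function name and input layout are hypothetical.

```python
def region_frequencies(features, pqs_starts, genome_length, flank=100):
    """PQS per 1000 nt inside features and in the flanks before and after them."""
    counts = {"before": 0, "inside": 0, "after": 0}
    lengths = {"before": 0, "inside": 0, "after": 0}
    for start, end in features:
        regions = {
            "before": (max(0, start - flank), start),
            "inside": (start, end),
            "after": (end, min(genome_length, end + flank)),
        }
        for name, (lo, hi) in regions.items():
            lengths[name] += hi - lo
            # ASSUMPTION: a PQS belongs to the region containing its start position.
            counts[name] += sum(lo <= p < hi for p in pqs_starts)
    return {
        name: (counts[name] / lengths[name] * 1000 if lengths[name] else 0.0)
        for name in counts
    }
```

A ratio such as the one plotted in Figure 4 could then be formed by dividing the frequency for one annotation type by the frequency obtained for gene annotations.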
Figure 5. Relationships between virus and various hosts as measured by observed PQS frequency per 1000 nt and PQS frequency per 1000 GC. (A) All host–virus pairs, PQS frequencies; (B) All host–virus pairs, PQS per 1000 GC; (C) Archaea–virus pairs, PQS frequencies; (D) Archaea–virus pairs, PQS per 1000 GC; (E) Bacteria–virus pairs, PQS frequencies; (F) Bacteria–virus pairs, PQS per 1000 GC; (G) Eukaryota–virus pairs, PQS frequencies; (H) Eukaryota–virus pairs, PQS per 1000 GC.
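The two quantities compared in Figure 5 can be sketched as follows. The correlation statistic used for the figure is not stated here, so Pearson's r is shown purely as an assumption. "PQS per 1000 GC" is taken to mean the PQS count divided by the number of G and C bases, times 1000, which is also an assumption. The function names are hypothetical.

```python
import math


def per_1000_gc(pqs_count, sequence):
    """PQS count per 1000 G or C bases (assumed definition of 'per 1000 GC')."""
    gc = sum(base in "GC" for base in sequence.upper())
    return pqs_count / gc * 1000 if gc else 0.0


def pearson_r(xs, ys):
    """Pearson correlation between paired virus and host values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least two pairs")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

Here `xs` would hold the PQS frequencies of the viruses and `ys` those of their hosts, one entry per host–virus pair. A positive r close to 1 would correspond to the positive relationship described in the abstract.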