Václav Brázda, Yu Luo, Martin Bartas, Patrik Kaura, Otilia Porubiaková, Jiří Šťastný, Petr Pečinka, Daniela Verga, Violette Da Cunha, Tomio S Takahashi, Patrick Forterre, Hannu Myllykallio, Miroslav Fojta, Jean-Louis Mergny.
Abstract
The importance of unusual DNA structures in the regulation of basic cellular processes is an emerging field of research. Amongst local non-B DNA structures, G-quadruplexes (G4s) have gained in popularity during the last decade, and their presence and functional relevance at the DNA and RNA level have been demonstrated in a number of viral, bacterial, and eukaryotic genomes, including humans. Here, we performed the first systematic search of G4-forming sequences in all archaeal genomes available in the NCBI database. In this article, we investigate the presence and locations of G-quadruplex-forming sequences using the G4Hunter algorithm. G-quadruplex-prone sequences were identified in all archaeal species, with highly significant differences in frequency, from 0.037 to 15.31 potential quadruplex sequences per kb. While G4-forming sequences were extremely abundant in Hadesarchaea archaeon (strikingly, more than 50% of the Hadesarchaea archaeon isolate WYZ-LMO6 genome is a potential part of a G4 motif), they were very rare in the Parvarchaeota phylum. The distribution of G-quadruplex-forming sequences is not random: they are over-represented in non-coding RNA, suggesting possible roles in ncRNA regulation. These data illustrate the unique and non-random localization of G-quadruplexes in Archaea.
Keywords: Archaea; G4-forming motif; genome analysis; sequence prediction; unusual nucleic acid structures
Year: 2020 PMID: 32967357 PMCID: PMC7565180 DOI: 10.3390/biom10091349
Source DB: PubMed Journal: Biomolecules ISSN: 2218-273X
Figure 1. A schematic phylogenetic tree for Archaea. This unrooted evolutionary tree of Archaea is based on the schematic tree of Forterre (2015) [17], updated according to recent phylogenetic analyses [9,18]. BAT stands for Bathyarchaeota, Aigarchaeota, and Thaumarchaeota. DPANN is an acronym based on the first five groups discovered: Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota, and Nanohaloarchaeota. The term BAT superphylum was proposed by Gaia et al. in 2018 [19], and the terms Eury and Cren superphyla are suggested here. The term Cren superphylum is suggested because the phyla Crenarchaeota, Verstraetearchaeota, Marsarchaeota, Nezhaarchaeota, and Geothermarchaeota form a consensus monophyletic clade in all archaeal phylogenies. We included Korarchaeota in this superphylum because they often branch as a sister group of the above phyla in archaeal phylogenies, although their fast evolutionary rate sometimes makes their positioning difficult. We suggest in parallel the term Eury superphylum because Euryarchaeota includes very diverse groups of cultivated and uncultivated Archaea that are difficult to group in a single phylum, especially considering that phyla such as Verstraetearchaeota, Marsarchaeota, or Nezhaarchaeota contain only a few uncultivated species defined by a few metagenome-assembled genomes (MAGs). Names in bold letters correspond to subgroups that include cultivated species; names in thin letters correspond to subgroups that include only MAGs.
Figure 2. A G-quartet involves four coplanar guanines establishing a cyclic array of H-bonds (left). Stacking of two or more (three in this example) quartets leads to the formation of a G-quadruplex structure (right), stabilized by cations, such as potassium (not shown).
Number of putative quadruplex sequences (PQS) found using four different window sizes in three complete archaeal genomes.
| Archaea (GC %) | G4 sequences, 25 nt window | G4 sequences, 30 nt window | G4 sequences, 50 nt window | G4 sequences, 100 nt window |
|---|---|---|---|---|
| | 558 | 171 | 3 | 0 |
| | 6019 | 3197 | 324 | 5 |
| | 4738 | 2313 | 262 | 4 |
Figure 3. Examples of sequences with different G4Hunter scores (G4HS) and distribution of PQS according to threshold category. (A) Examples of 25-nt-long archaeal sequences (corresponding to the window size chosen for the analysis), with G4Hunter scores given in parentheses. Isolated guanines are shown in red, all other guanines in bold red characters. Longer archaeal motifs with high G4Hunter scores are provided in Table 3. (B) Distribution of G4-prone motifs according to the G4Hunter score. 1.2 means any sequence with a score between 1.2 and 1.399; 1.4 between 1.4 and 1.599, etc. The counts are normalized by the total number of PQS found in each group (bacteria and archaea) and compared with Homo sapiens. The first category represents 97.9% and 97.2% of all PQS sequences in bacteria and archaea, respectively. Note the log scale on the Y-axis.
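To make the scoring behind these categories concrete, here is a minimal sketch of a G4Hunter-style score, assuming the commonly described scheme in which each G in a run of n consecutive Gs scores min(n, 4), each C scores the negative equivalent, and A/T score 0, with the sequence score being the mean over all bases. The function names are illustrative, and this is not the G4Hunter implementation used for the analysis; in particular, it does not merge overlapping windows into single PQS.

```python
from itertools import groupby


def g4hunter_per_base(seq):
    """Per-base scores: G runs score +min(n, 4), C runs -min(n, 4), others 0."""
    scores = []
    for base, run in groupby(seq.upper()):
        n = len(list(run))
        if base == "G":
            value = min(n, 4)
        elif base == "C":
            value = -min(n, 4)
        else:
            value = 0
        scores.extend([value] * n)
    return scores


def g4hunter_mean(seq):
    """Mean score of a whole sequence (0.0 for an empty sequence)."""
    scores = g4hunter_per_base(seq)
    return sum(scores) / len(scores) if scores else 0.0


def scan_windows(seq, window=25, threshold=1.2):
    """Return (start, mean score) for every window with |mean| >= threshold.

    The absolute value is used so that C-rich windows (G-rich on the
    complementary strand) are reported as well.
    """
    per_base = g4hunter_per_base(seq)
    hits = []
    for start in range(len(per_base) - window + 1):
        mean = sum(per_base[start:start + window]) / window
        if abs(mean) >= threshold:
            hits.append((start, mean))
    return hits


if __name__ == "__main__":
    example = "GGGTTAGGGTTAGGGTTAGGGTTAA"  # 25 nt, illustrative sequence
    print(g4hunter_mean(example))
    print(scan_windows(example))
```

The 25-nt window and the 1.2 threshold are the values named in the caption above; both are exposed as parameters so other settings can be tried.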
Number of PQS found and their frequencies per 1000 bp in all 3387 archaeal genomes, grouped by G4Hunter score (1.2-1.4 means any sequence with a score between 1.2 and 1.399; 1.4 between 1.4 and 1.599, etc.).
| G4HS | Number of PQS | Fraction of All PQS | PQS Frequency (per 1000 bp) |
|---|---|---|---|
| 1.2–1.4 | 4,344,917 | 0.9718 | 1.19 |
| 1.4–1.6 | 119,233 | 0.0267 | 1.8 × 10⁻² |
| 1.6–1.8 | 6357 | 0.00142 | 9.9 × 10⁻⁴ |
| 1.8–2.0 | 174 | 0.0000389 | 2.5 × 10⁻⁵ |
| >2.0 | 132 | 0.0000295 | 2.2 × 10⁻⁵ |
| Total | 4,470,813 | 1 | |
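As a quick arithmetic check, the fraction column can be recomputed from the counts in the table. The sketch below uses only the counts listed above; the dictionary keys are illustrative labels for the score categories.

```python
# PQS counts per G4Hunter score category, copied from the table above.
counts = {
    "1.2-1.4": 4_344_917,
    "1.4-1.6": 119_233,
    "1.6-1.8": 6_357,
    "1.8-2.0": 174,
    ">2.0": 132,
}

# The total is the sum over all categories.
total = sum(counts.values())

# Fraction of all PQS that falls in each category.
fractions = {category: n / total for category, n in counts.items()}

if __name__ == "__main__":
    print("total:", total)
    for category, fraction in fractions.items():
        print(f"{category}: {fraction:.3g}")
```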
Genomic sequence sizes, GC%, total count of PQS, and mean frequencies of quadruplex motifs. Seq. (total number of sequences), Median (median length of sequences), Short (shortest sequence), Long (longest sequence), GC % (average GC content), PQS (total number of predicted PQS), Mean f (mean frequency of predicted PQS per 1000 bp), Min f (lowest frequency of predicted PQS per 1000 bp), Max f (highest frequency of predicted PQS per 1000 bp). % PQS corresponds to the probability that any given nucleotide in the group or subgroup belongs to a G4-prone region (G4Hunter score > 1.2). Colors correspond to the phylogenetic tree depiction.
| Kingdom | Seq. | Median | Short | Long | GC % | PQS | Mean f | Min f | Max f | % PQS |
|---|---|---|---|---|---|---|---|---|---|---|
| Archaea | 3387 | 1,686,930 | 100,212 | 13,399,915 | 46.51 | 7,927,775 | 1.21 | 0.04 | 15.31 | 3.58 |
| BAT | 320 | 1,180,629 | 164,795 | 3,506,105 | 43.07 | 421,678 | 1.16 | 0.05 | 8.42 | 3.49 |
| Cren | 379 | 1,808,184 | 210,860 | 6,451,204 | 43.05 | 1,009,660 | 1.56 | 0.09 | 9.44 | 4.75 |
| Asgard | 71 | 2,322,715 | 291,515 | 5,684,038 | 38.75 | 74,647 | 0.47 | 0.12 | 1.50 | 1.39 |
| DPANN | 309 | 832,169 | 100,212 | 6,604,953 | 39.22 | 219,058 | 0.70 | 0.08 | 4.20 | 2.18 |
| Eury | 2308 | 1,826,841 | 137,797 | 13,399,915 | 48.77 | 6,202,732 | 1.25 | 0.04 | 15.31 | 3.68 |
| Bathyarchaeota | 128 | 1,208,976.5 | 200,493 | 3,506,105 | 46.29 | 245,162 | 1.54 | 0.23 | 8.42 | 3.00 |
| Thaumarchaeota | 192 | 1,173,909.5 | 164,795 | 3,441,569 | 40.93 | 176,516 | 0.91 | 0.05 | 5.32 | 2.73 |
| Thermoproteales | 147 | 1,581,744 | 242,587 | 3,969,448 | 45.86 | 513,053 | 2.07 | 0.11 | 7.38 | 6.31 |
| Sulfolobales | 118 | 2,223,757.5 | 210,860 | 3,034,024 | 38.20 | 200,842 | 0.79 | 0.34 | 4.58 | 2.38 |
| Desulfurococcales | 29 | 1,580,347 | 807,477 | 2,148,448 | 46.99 | 99,211 | 2.29 | 0.40 | 6.37 | 6.95 |
| Verstraetearchaeota | 18 | 1,171,913.5 | 419,172 | 1,937,662 | 46.76 | 40,586 | 1.83 | 0.10 | 3.43 | 5.50 |
| Marsarchaeota | 15 | 1,915,630 | 351,358 | 3,731,392 | 46.72 | 52,853 | 1.64 | 0.47 | 2.94 | 5.01 |
| Geothermarchaeota | 6 | 1,183,145.5 | 803,797 | 1,671,866 | 42.72 | 16,582 | 2.15 | 0.96 | 7.03 | 6.65 |
| Nezhaarchaeota | 2 | 1,332,140.5 | 1,315,707 | 1,348,574 | 43.53 | 2016 | 0.76 | 0.75 | 0.77 | 2.27 |
| Korarchaeota | 18 | 1,542,873 | 834,209 | 2,942,065 | 48.39 | 68,434 | 2.63 | 1.05 | 9.44 | 7.95 |
| Unclassified Crenarchaeota | 27 | 1,203,892 | 301,027 | 6,451,204 | 37.01 | 19,361 | 0.44 | 0.09 | 1.49 | 1.29 |
| Lokiarchaeota | 29 | 1,892,624 | 320,847 | 5,143,417 | 32.77 | 25,479 | 0.41 | 0.21 | 1.50 | 1.24 |
| Odinarchaeota | 1 | 1,460,710 | 1,460,710 | 1,460,710 | 38.05 | 1038 | 0.71 | 0.71 | 0.71 | 2.16 |
| Thorarchaeota | 29 | 2,770,204 | 291,515 | 4,389,059 | 46.55 | 40,006 | 0.60 | 0.24 | 1.18 | 1.76 |
| Heimdallarchaeota | 12 | 2,167,091 | 432,340 | 5,684,038 | 34.42 | 8124 | 0.27 | 0.12 | 0.50 | 0.82 |
| Aenigmarchaeota | 35 | 751,672 | 248,182 | 1,410,470 | 39.33 | 17,990 | 0.71 | 0.11 | 3.78 | 2.12 |
| Nanohaloarchaeota | 17 | 815,638 | 565,289 | 1,480,846 | 44.53 | 8672 | 0.48 | 0.09 | 1.82 | 1.50 |
| Woesearchaeota | 72 | 966,794.5 | 518,295 | 2,944,567 | 40.77 | 57,833 | 0.66 | 0.08 | 3.92 | 1.96 |
| Pacearchaeota | 60 | 719,507 | 279,432 | 6,604,953 | 33.74 | 37,675 | 0.56 | 0.08 | 2.99 | 1.73 |
| Nanoarchaeota | 25 | 577,110 | 204,081 | 1,162,239 | 32.83 | 9940 | 0.59 | 0.13 | 4.20 | 1.70 |
| Micrarchaeota | 39 | 887,931 | 658,716 | 1,333,875 | 50.41 | 42,298 | 1.17 | 0.15 | 2.86 | 3.47 |
| Diapherotrites | 19 | 568,419 | 302,064 | 1,130,899 | 37.42 | 6077 | 0.49 | 0.11 | 2.33 | 1.46 |
| Unclassified DPANN | 40 | 858,043.5 | 100,212 | 3,188,023 | 35.57 | 33,846 | 0.67 | 0.15 | 2.39 | 2.04 |
| Hadesarchaeota | 12 | 857,575 | 451,393 | 1,241,441 | 53.77 | 56,369 | 4.61 | 1.26 | 15.31 | 14.55 |
| Persephonarchaeota | 33 | 637,942 | 137,797 | 1,412,535 | 44.06 | 34,905 | 1.49 | 0.59 | 2.36 | 4.49 |
| Thermococcales | 60 | 1,867,904.5 | 207,909 | 2,388,527 | 46.77 | 191,492 | 1.72 | 0.47 | 7.53 | 5.15 |
| Theinoarchaeota | 2 | 4,165,806 | 3,559,548 | 4,772,064 | 41.57 | 5480 | 0.66 | 0.65 | 0.67 | 1.94 |
| Methanofastidiosa | 96 | 992,372 | 156,656 | 13,399,915 | 40.71 | 141,192 | 0.83 | 0.08 | 3.64 | 2.54 |
| Methanococcales | 24 | 1,717,483 | 1,207,361 | 1,936,387 | 32.01 | 15,065 | 0.39 | 0.20 | 0.86 | 1.19 |
| Methanobacteriales | 224 | 2,001,036 | 1,157,521 | 3,466,370 | 33.62 | 175,191 | 0.39 | 0.04 | 2.32 | 1.14 |
| Methanopyrales | 3 | 1,430,309 | 1,421,621 | 1,694,969 | 58.94 | 10,798 | 2.34 | 1.97 | 3.00 | 6.84 |
| Methanomassilicoccales | 91 | 1,404,109 | 640,223 | 2,641,216 | 56.22 | 257,340 | 1.85 | 0.22 | 4.41 | 5.38 |
| Thermoplasmatales | 135 | 1,621,237 | 593,453 | 2,816,557 | 42.71 | 246,832 | 1.13 | 0.11 | 7.03 | 3.42 |
| Acidoprofondum/DHV2-2 | 11 | 1,731,076 | 519,420 | 2,981,805 | 40.55 | 16,609 | 1.21 | 0.29 | 4.12 | 3.59 |
| Archaeoglobales | 53 | 1,901,943 | 478,535 | 3,408,041 | 42.98 | 117,470 | 1.22 | 0.57 | 3.29 | 3.66 |
| Methanosarcinales | 279 | 2,913,215 | 208,261 | 5,751,492 | 44.99 | 845,394 | 1.19 | 0.15 | 7.52 | 3.54 |
| Methanomicrobiales | 146 | 2,228,967.5 | 622,799 | 3,978,804 | 54.97 | 783,172 | 2.38 | 0.23 | 7.20 | 7.07 |
| Methanocellales | 5 | 2,957,635 | 1,465,272 | 3,243,770 | 50.96 | 16,825 | 1.21 | 0.41 | 1.88 | 3.51 |
| Halobacteriales | 440 | 3,585,981 | 397,623 | 5,605,381 | 63.95 | 2,271,600 | 1.56 | 0.08 | 4.25 | 4.50 |
| Unclassified Diaforarchaea | 97 | 1,460,542 | 233,168 | 2,294,894 | 47.38 | 136,115 | 1.03 | 0.18 | 2.55 | 3.02 |
| Unclassified other | 597 | 1,400,198 | 258,312 | 7,416,915 | 46.88 | 862,962 | 1.02 | 0.07 | 5.16 | 3.00 |
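The two per-genome quantities in this table, PQS frequency per 1000 bp and % PQS, can be illustrated with a small sketch. It assumes PQS are given as half-open (start, end) intervals on a genome of known length and that overlapping intervals are counted once for coverage; the function names and the toy intervals are hypothetical, and how the published values were aggregated across genomes is not reproduced here.

```python
def pqs_frequency_per_kb(pqs_count, genome_length):
    """Number of PQS per 1000 bp of sequence."""
    return 1000.0 * pqs_count / genome_length


def percent_pqs(intervals, genome_length):
    """Percentage of nucleotides covered by at least one PQS interval.

    Intervals are half-open (start, end) pairs; overlaps are counted once.
    """
    covered = 0
    current_end = 0
    for start, end in sorted(intervals):
        start = max(start, current_end)  # skip the part already counted
        if end > start:
            covered += end - start
            current_end = end
    return 100.0 * covered / genome_length


if __name__ == "__main__":
    # Toy example: three 25-nt PQS on a 1000-bp sequence, two overlapping.
    toy_intervals = [(0, 25), (20, 45), (100, 125)]
    print(pqs_frequency_per_kb(len(toy_intervals), 1000))
    print(percent_pqs(toy_intervals, 1000))
```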
Figure 4. Frequencies of PQS in subgroups of analyzed archaeal genomes. Data within boxes span the interquartile range, and whiskers show the lowest and highest values within 1.5 times the interquartile range. Black points denote outliers. Horizontal black lines inside boxplots are median values.
Figure 5. Cluster dendrogram of PQS characteristics of archaeal subgroups. The cluster dendrogram of PQS characteristics (Supplementary Table S4) was made in R v. 3.6.3 (code provided in Supplementary Table S4) using the pvclust package with these parameters: cluster method 'ward.D2', distance 'euclidean', and 10,000 bootstrap resamplings. AU values are in blue and indicate the statistical significance of particular branching (values above 95 are equivalent to p-values less than 0.05). Statistically significant clusters are highlighted by red dashed rectangles.
Figure 6. Relationship between the observed frequency of PQS per 1000 bp and GC content. Different G4Hunter score intervals are considered. In each G4Hunter score interval miniplot, frequencies were normalized according to the highest observed frequency of PQS. Organisms with a frequency per 1000 bp greater than 50% of the maximum are labeled and highlighted in color.
Long G4-prone motifs with high G4HS found in Hadesarchaea archaeon.
| Name | Sequences (5′ to 3′) | G4 Hunter Score | IDS | CD |
|---|---|---|---|---|
| 038_K | | 2.07 | G4 | Parallel |
| 086_K | | 2.57 | G4 | Parallel |
| 174_K | | 2.07 | G4 | Parallel |
| 175_K | | 2.54 | G4 | Parallel |
| 176_K | | 1.93 | G4 | Parallel |
| 178_K | | 2.89 | G4 | Parallel |
| 195_K | | 1.91 | G4 | Parallel |
| 196_K | | 2.22 | G4 | Parallel |
| 245_K | | 2.33 | G4 | Parallel |
| 640_K | | 2.38 | G4 | Parallel |
| 642_K | | 2.93 | G4 | Hybrid* |
| 643_K | | 2.07 | G4 | Parallel |
| 644_K | | 2.41 | G4 | Parallel |
| 645_K | | 1.74 | G4 | Parallel |
* Sequence 642_K adopts a hybrid structure at room temperature, which is converted to a parallel conformation at high temperatures.
Figure 7. Relationship between GC percentage and % of PQS in genomes of particular archaeal subgroups. The fitted equation with the R² coefficient is shown at the top of the plot.
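The kind of fit described in this caption can be sketched with an ordinary least-squares line and its R² coefficient. The example below is an illustration only: it uses the GC % and % PQS values of the five superphylum rows (BAT, Cren, Asgard, DPANN, Eury) from the table above, not the full set of subgroups plotted in the figure, so it does not reproduce the published equation. The function name is hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, with R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# GC % and % PQS for BAT, Cren, Asgard, DPANN, and Eury (from the table).
gc_percent = [43.07, 43.05, 38.75, 39.22, 48.77]
pqs_percent = [3.49, 4.75, 1.39, 2.18, 3.68]

if __name__ == "__main__":
    slope, intercept, r_squared = linear_fit(gc_percent, pqs_percent)
    print(f"% PQS = {slope:.3f} * GC% + {intercept:.3f} (R^2 = {r_squared:.3f})")
```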
Figure 8. Differences in PQS frequency by DNA locus. The chart shows PQS frequencies normalized per 1000 bp for locations annotated in the NCBI database and compares Archaea with Bacteria. Archaeal G4-prone motifs are strongly over-represented in ncRNA and rRNA compared to the average G4 density in Archaea (mean f = 1.207), and also compared to Bacteria. PQS counts are provided in Supplementary Table S3 (Excel file).
Figure 9. Experimental evidence for quadruplex formation with archaeal sequences. Isothermal differential absorbance (IDS; panel A) and circular dichroism (CD; panels B and C) spectra of Hadesarchaea archaeon DNA sequences were recorded at 20 °C (panels A and B) or at a high temperature (80 °C) for CD (panel C).
Detailed characteristics of archaeal species with PQS frequency per 1000 bp greater than 6.00. Living environments data were obtained from the BioSample NCBI database [83].
| Organism Name | GC Content | PQS f | % PQS | Living Environment |
|---|---|---|---|---|
| | 65.01 | 15.310 | 51.15 | Hot springs sediment, Yellowstone NP, USA |
| | 56.17 | 9.685 | 31.10 | Hot springs sediment, Jinze hot spring, China |
| | 56.04 | 9.581 | 30.69 | Hot springs sediment, Jinze hot spring, China |
| | 65.01 | 9.445 | 28.80 | Deep-sea hydrothermal vent sediments, Guaymas Basin, Gulf of California, Mexico |
| | 61.78 | 8.418 | 26.12 | Deep-sea hydrothermal vent sediments, Guaymas Basin, Gulf of California, Mexico |
| | 58.42 | 7.858 | 24.55 | Deep-sea hydrothermal vent sediments, Guaymas Basin, Gulf of California, Mexico |
| | 57.21 | 7.534 | 24.52 | Solfataric marine water hole on a beach of Vulcano, Italy |
| | 62.01 | 7.518 | 23.12 | Waste water, Suncor tailings pond 6, Canada |
| | 57.67 | 7.397 | 22.90 | Deep-sea hydrothermal vent sediments, Guaymas Basin, Gulf of California, Mexico |
| | 53.14 | 7.381 | 25.59 | Mud from a spring pool, Noji-onsen, Fukushima, Japan |
| | 62.36 | 7.198 | 22.90 | Paddy field soil, Chikugo, Fukuoka, Japan |
| | 61.14 | 7.089 | 21.80 | Wastewater, North Alberta, Canada |
| | 60.54 | 7.032 | 22.01 | Deep-sea hydrothermal vent sediments, Guaymas Basin, Gulf of California, Mexico |
| | 61.82 | 7.028 | 22.57 | Hypersaline soda lake sediment, Kulunda Steppe, Russia |
| | 60.8 | 6.738 | 20.67 | Anaerobic digester metagenome, Australia |
| | 60.6 | 6.721 | 20.66 | Upflow anaerobic sludge blanket reactor treating beer-manufacture wastewater, Beijing, China |
| | 54.25 | 6.673 | 21.15 | Deep-sea hydrothermal vent sediments, Guaymas Basin, Gulf of California, Mexico |
| | 56.73 | 6.370 | 19.72 | Deep-sea hydrothermal vent chimney, the Suiyo Seamount in the Izu-Bonin Arc, Japan |
| | 61.92 | 6.332 | 19.03 | Deep-sea hydrothermal vent sediments, Guaymas Basin, Gulf of California, Mexico |
| | 53.83 | 6.327 | 20.11 | Deep-sea hydrothermal vent sediments, Guaymas Basin, Gulf of California, Mexico |
| | 53.66 | 6.240 | 19.72 | Deep-sea hydrothermal vent sediments, Guaymas Basin, Gulf of California, Mexico |
| | 59.91 | 6.233 | 19.52 | Hot spring, Iceland |
| | 52.98 | 6.164 | 19.65 | Deep-sea hydrothermal vent sediments, Guaymas Basin, Gulf of California, Mexico |