Michaela Dobrovolná, Natália Bohálová, Vratislav Peška, Jiawei Wang, Yu Luo, Martin Bartas, Adriana Volná, Jean-Louis Mergny, Václav Brázda.
Abstract
G-quadruplexes (G4s) were long considered rare in vitro curiosities of no physiological importance, but recent methodological advances have demonstrated their presence and functions in vivo. Moreover, in addition to their functional relevance in bacteria and animals, including humans, their importance has recently been demonstrated in evolutionarily distinct plant species. In this study, we analyzed the genome of Pisum sativum (garden pea, also called green pea), a unique member of the Fabaceae family. Our results showed that this genome contains putative G4 sequences (PQSs) and that these PQSs are located nonrandomly in the nuclear genome. We also found PQSs in mitochondrial (mt) and chloroplast (cp) DNA, and we experimentally confirmed G4 formation for sequences found in these two organelles. The PQS frequency in nuclear DNA was 0.42 PQSs per thousand base pairs (kbp), in the same range as in cpDNA (0.53/kbp) but significantly lower than in mtDNA (1.58/kbp). In the nuclear genome, PQSs were mainly associated with regulatory regions, including 5'UTRs, and with sequences upstream of the rRNA region. In cpDNA and mtDNA, by contrast, PQSs were located around RNA genes. Interestingly, PQSs were also associated with specific transposable elements, such as TIR and LTR elements, and with the regions around them, pointing to their role in their spreading in nuclear DNA. The nonrandom localization of PQSs uncovered their evolutionary and functional significance in the Pisum sativum genome.
Keywords: G-quadruplex; G4 propensity; chloroplast DNA; sequence prediction
Year: 2022 PMID: 35955617 PMCID: PMC9369095 DOI: 10.3390/ijms23158482
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 6.208
Total number and frequencies of PQSs found in P. sativum genome grouped according to G4Hunter score (1.2–1.4 means any sequence with a score between 1.2 and 1.399; 1.4–1.6 between 1.4 and 1.599, etc.).
| G4Hunter Threshold | Number of PQSs | PQS Frequency (PQS/kbp) |
|---|---|---|
| Nuclear DNA | | |
| 1.2–1.4 | 960,462 | 0.30 |
| 1.4–1.6 | 260,428 | 0.081 |
| 1.6–1.8 | 76,552 | 0.024 |
| 1.8–2.0 | 28,513 | 0.0088 |
| 2.0–more | 28,801 | 0.0089 |
| mtDNA | | |
| 1.2–1.4 | 377 | 1.04 |
| 1.4–1.6 | 117 | 0.32 |
| 1.6–1.8 | 47 | 0.13 |
| 1.8–2.0 | 16 | 0.044 |
| 2.0–more | 16 | 0.044 |
| cpDNA | | |
| 1.2–1.4 | 40 | 0.33 |
| 1.4–1.6 | 15 | 0.12 |
| 1.6–1.8 | 8 | 0.066 |
| 1.8–2.0 | 1 | 0.0082 |
| 2.0–more | 1 | 0.0082 |
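The binning convention in the table caption (each bin includes its lower edge and excludes its upper edge, with the last bin open-ended) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names `score_bin` and `pqs_frequencies` are hypothetical, and the input is assumed to be a list of absolute G4Hunter scores for already-detected PQSs.

```python
from bisect import bisect_right
from collections import Counter

# Lower edges of the score bins used in the table; the last bin is open-ended.
BIN_EDGES = [1.2, 1.4, 1.6, 1.8, 2.0]
BIN_LABELS = ["1.2-1.4", "1.4-1.6", "1.6-1.8", "1.8-2.0", ">=2.0"]


def score_bin(score):
    """Return the bin label for an absolute G4Hunter score, or None below 1.2."""
    idx = bisect_right(BIN_EDGES, score) - 1  # index of the last edge <= score
    return BIN_LABELS[idx] if idx >= 0 else None


def pqs_frequencies(scores, length_bp):
    """Map each bin label to (count, PQSs per kbp) for a sequence of length_bp."""
    counts = Counter(score_bin(s) for s in scores)
    length_kbp = length_bp / 1000
    return {label: (counts[label], counts[label] / length_kbp) for label in BIN_LABELS}
```

Using `bisect_right` places a score of exactly 1.4 in the 1.4–1.6 bin, which matches the caption's description of 1.2–1.4 as covering scores up to 1.399.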
Figure 1. Comparison of G4Hunter score distribution across the different phylogenetic groups. Note the stronger counterselection against high-stability G4s in prokaryotes. P. sativum, with an initial slope closer to prokaryotes than to the two other eukaryotes studied here, exhibited an increase in PQS frequency with the highest analyzed G4Hunter score.
The overall number of PQSs found with a G4Hunter score of 1.2 or above; their frequency per kbp; GC content; the total length of all PQSs (all base pairs with the potential to form a G4) divided by the total number of bp in the DNA (PQSs, %); and the number of PQSs per thousand GC base pairs (PQSs/GC%), for each chromosome, mtDNA, and cpDNA.
| DNA Sequence | Length (Mb) | Number of PQS | PQS Frequency (/kbp) | GC Content (%) | PQSs (%) | PQSs/GC% |
|---|---|---|---|---|---|---|
| Chr I | 372.17 | 160,922 | 0.432 | 31.07 | 1.31 | 1.392 |
| Chr II | 427.60 | 175,744 | 0.411 | 29.68 | 1.24 | 1.385 |
| Chr III | 437.56 | 181,878 | 0.416 | 29.72 | 1.26 | 1.399 |
| Chr IV | 446.35 | 184,737 | 0.414 | 29.90 | 1.25 | 1.384 |
| Chr V | 579.27 | 244,737 | 0.422 | 30.13 | 1.28 | 1.402 |
| Chr VI | 480.42 | 200,963 | 0.418 | 29.81 | 1.27 | 1.403 |
| Chr VII | 491.38 | 205,775 | 0.419 | 29.87 | 1.27 | 1.402 |
| Total (Chr I–VII) | 3234.74 | 1,354,756 | 0.419 | 30.02 | 1.27 | 1.395 |
| mtDNA | 0.36 | 573 | 1.58 | 45.07 | 4.81 | 3.494 |
| cpDNA | 0.12 | 65 | 0.533 | 34.78 | 1.65 | 1.531 |
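Two of the derived columns can be recomputed from the other columns of the same row. The sketch below assumes that PQS frequency is the PQS count divided by the length in kbp, and that PQSs/GC% is the PQS count divided by the number of G/C bases in thousands; the function name `pqs_metrics` is hypothetical.

```python
def pqs_metrics(n_pqs, length_mb, gc_percent):
    """Return (PQSs per kbp, PQSs per thousand G/C bases) for one sequence."""
    length_kbp = length_mb * 1000
    freq_per_kbp = n_pqs / length_kbp
    gc_kb = length_kbp * gc_percent / 100  # G/C bases, in thousands
    pqs_per_thousand_gc = n_pqs / gc_kb
    return freq_per_kbp, pqs_per_thousand_gc


# Chr I row: 160,922 PQSs, 372.17 Mb, 31.07% GC
freq, per_gc = pqs_metrics(160_922, 372.17, 31.07)
print(round(freq, 3), round(per_gc, 3))  # 0.432 1.392
```

The organelle rows list lengths rounded to 0.01 Mb, so recomputing their values from the table alone is less precise than for the chromosomes.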
Twelve sequences were analyzed using three different biophysical methods (IDS: isothermal difference spectra; CD: circular dichroism; FRET-MC, a competition fluorescence melting assay). G4Hunter score is indicated in the column labeled “G4H”. Concl. column indicates the conclusion reached based on these three methods. “+” stands for positive, meaning that the method indicated the sequence was forming a G4.
| Name | Sequence | G4H | IDS | CD | FRET-MC | Concl. |
|---|---|---|---|---|---|---|
| mtDNA sequences | | | | | | |
| 40ps1 | TGGGCGTCTGGGGTTGGTTTAAGGAAAAATCGGGGTCGGA | 1.25 | + | + | + | G4 |
| 28ps2 | AGGGATCAAGAAACGGATAGGGAGGGGA | 1.32 | ? | + | - | G4? |
| 37ps3 | AGGGAGGACCGGGGGCCAGAGCAAGTTGGGTTGGGGT | 1.41 | + | + | + | G4 |
| 44ps4 | TGGGGCGAGGGTCTTTCATTAAAGGGGGGAAAAGAGGGGTGGGT | 1.66 | + | + | + | G4 |
| 28ps5 | CGGGGGCGGGTTCTGAGCAGGATGGGGA | 1.68 | + | + | + | G4 |
| 31ps6 | AGGAAGCGGGGGGAGGAACACAGGGGAAGGA | 1.61 | + | + | + | G4 |
| cpDNA sequences | | | | | | |
| 28ps16 | TGGAAGGGGTCAATAAGGGGTTGGGGGA | 1.96 | + | + | + | G4 |
| 32ps17 | CGGGGGGTAGATTGGGGCGTGGACATAAGGGT | 1.62 | + | + | + | G4 |
| 25ps18 | TGGGATCCGGGCGGTCCAGGGGGGA | 1.48 | + | ? | + | G4 |
| 24ps23 | AGGGGTGGGGACAGAGGTTTTGGT | 1.67 | + | + | + | G4 |
| 21ps26 | TGGGGGTGGTGAAGGGAGGGC | 2.00 | + | + | + | G4 |
| 24ps27 | CGGGGTGGAGACGATGGGGTCGGT | 1.62 | + | ? | + | G4 |
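The G4H values in the table can be reproduced with a short sketch of the G4Hunter scoring rule as it is commonly described: each G in a run of n consecutive Gs scores min(n, 4), each C in a run of n consecutive Cs scores −min(n, 4), all other bases score 0, and the sequence score is the mean over all positions. The function name `g4hunter_score` is hypothetical, and the sketch scores a whole oligonucleotide only; it does not attempt the sliding-window scan and thresholding used to search genomes.

```python
import re


def g4hunter_score(seq):
    """Mean per-base G4Hunter score of a DNA sequence (whole-sequence only)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    total = 0
    # Each maximal run of G (or C) contributes +/- min(run length, 4) per base.
    for run in re.finditer(r"G+|C+", seq):
        n = len(run.group())
        sign = 1 if run.group()[0] == "G" else -1
        total += sign * min(n, 4) * n
    return total / len(seq)


print(round(g4hunter_score("TGGGGGTGGTGAAGGGAGGGC"), 2))  # 21ps26 -> 2.0
```

For 21ps26 the per-base scores sum to 42 over 21 nt (2.00), and for 40ps1 they sum to 50 over 40 nt (1.25), in agreement with the G4H column.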
Figure 2. Experimental evidence for G4 formation. (A,B) Isothermal difference spectra (IDS); (C,D) circular dichroism spectra; (E,F) FRET-MC results for the mitochondrial (left) and chloroplast (right) sequences. In panels E and F, ss and ds correspond to single- and double-stranded negative controls, while cmyc and pu24t are G4-forming positive controls. F21T corresponds to the delta Tm observed in the absence of any competitor (S = 1). The red dotted line corresponds to the threshold under which a sequence was considered to form a quadruplex [15,28].
Figure 3. Differences in PQS frequency according to DNA locus. The chart shows PQS frequencies normalized per 1000 bp of annotated locations from the NCBI database. We analyzed the frequencies of all PQSs within (inside), before (100 bp), and after (100 bp) annotated locations in (A) genomic DNA, (B) mtDNA, and (C) cpDNA. Dashed lines denote the average PQS frequency in the corresponding DNA. Statistical significance of annotated locations in genomic DNA was related to the average chromosomal PQS frequencies according to a Kruskal–Wallis test, followed by Dunn’s pairwise comparison with Bonferroni correction of the p-value. Asterisks denote statistical significance: * p-value < 0.05; ** p-value < 0.01.
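The inside/before/after normalization described in the caption can be sketched as below. Several details are assumptions made for illustration and are not stated in the source: coordinates are 0-based and half-open, a PQS is assigned to a window by its start position, strand is ignored, and the names `count_in` and `region_frequencies` are hypothetical.

```python
from bisect import bisect_left


def count_in(sorted_pos, start, end):
    """Number of positions p in sorted_pos with start <= p < end."""
    return bisect_left(sorted_pos, end) - bisect_left(sorted_pos, start)


def region_frequencies(pqs_starts, features, flank=100):
    """PQSs per 1000 bp inside features and in the flank bp before and after them."""
    pos = sorted(pqs_starts)
    counts = {"before": 0, "inside": 0, "after": 0}
    lengths = {"before": 0, "inside": 0, "after": 0}
    for start, end in features:
        windows = {
            "before": (max(0, start - flank), start),  # clipped at sequence start
            "inside": (start, end),
            "after": (end, end + flank),  # not clipped at sequence end
        }
        for name, (lo, hi) in windows.items():
            counts[name] += count_in(pos, lo, hi)
            lengths[name] += hi - lo
    return {
        name: 1000 * counts[name] / lengths[name] if lengths[name] else 0.0
        for name in counts
    }
```

Pooling counts and window lengths over all features of one annotation type before dividing gives a single frequency per category, which is directly comparable to the genome-wide average drawn as a dashed line.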
Figure 4. Differences in PQS frequency by repeat region. The chart shows PQS frequencies normalized per 1000 bp of annotated transposons. We analyzed the frequencies of all PQSs within (inside), before (100 bp), and after (100 bp) annotated transposons in genomic DNA. The dashed line denotes the average PQS frequency in transposons. Statistical significance is shown as in Figure 3: * p < 0.05, ** p < 0.01, *** p < 0.001.