Melissa A Mullen, Kalee J Olson, Paul Dallaire, François Major, Sarah M Assmann, Philip C Bevilacqua.
Abstract
Tandem stretches of guanines can associate in hydrogen-bonded arrays to form G-quadruplexes, which are stabilized by K+ ions. Using computational methods, we searched for G-Quadruplex Sequence (GQS) patterns in the model plant species Arabidopsis thaliana. We found ~1200 GQS with a G3 repeat sequence motif, most of which are located in the intergenic region. Using a Markov modeled genome, we determined that GQS are significantly underrepresented in the genome. Additionally, we found ~43,000 GQS with a G2 repeat sequence motif; notably, 80% of these were located in genic regions, suggesting that these sequences may fold at the RNA level. Gene Ontology functional analysis revealed that GQS are overrepresented in genes encoding proteins of certain functional categories, including enzyme activity. Conversely, GQS are underrepresented in other categories of genes, notably those for non-coding RNAs such as tRNAs and rRNAs. We also found that genes that are differentially regulated by drought are significantly more likely to contain a GQS. CD-detected K+ titrations performed on representative RNAs verified formation of quadruplexes at physiological K+ concentrations. Overall, this study indicates that GQS are present at unique locations in Arabidopsis and that folding of RNA GQS may play important roles in regulating gene expression.
Year: 2010 PMID: 20860998 PMCID: PMC3001093 DOI: 10.1093/nar/gkq804
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. G-quartet and G-quadruplex structures and topologies. (a) G-quartet structure, showing Hoogsteen-to-Watson-Crick face hydrogen bonds and the central dehydrated monovalent ion integral to formation and stabilization. Unimolecular (b) parallel and (c) antiparallel G-quadruplex topologies. Adapted from (1,6,10,11). Dark lines follow the nucleic acid strand, arrowheads denote strand directionality, and gray boxes denote quadruplexes. The examples drawn here are for sequences having three quartets.
Distribution of GQS motifs in the Arabidopsis genome
| GQS Motif | Genome<sup>a</sup> | Intergenic<sup>b</sup> | Genic<sup>c</sup> | Coding<sup>d</sup> | Genic:intergenic ratio |
|---|---|---|---|---|---|
| G3+ L1–7 | 1187 | 827 (70%) | 360 (30%) | 263 (22%) | 0.4 |
| G3+ L1–3 | 237 | 163 (69%) | 74 (31%) | 41 (17%) | 0.4 |
| G2+ L1–4 | 43 117 | 8561 (20%) | 34 556 (80%) | 30 555 (71%) | 4.0 |
| G2+ L1–2 | 12 340 | 1824 (15%) | 10 516 (85%) | 9415 (76%) | 5.8 |
| G2+ L1 | 8188 | 901 (11%) | 7287 (89%) | 6633 (81%) | 8.1 |
Numbers and percentages of GQS in different regions of the Arabidopsis genome. <sup>a</sup>The genome comprises <sup>b</sup>intergenic and <sup>c</sup>genic regions, while <sup>d</sup>coding is a subset of genic and includes all gene models. Quadparser search parameters included G and C patterns to account for both sense and antisense strands.
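The motif notation in this table can be read as a regular expression: "G3+ L1–7" denotes four tracts of three or more guanines separated by three loops of 1 to 7 nucleotides. The following is a minimal sketch of such a search, not the Quadparser implementation itself. The function names, the use of non-overlapping leftmost matches, and the treatment of loops as any nucleotide are assumptions made for illustration. C-patterns on the given strand stand in for G-patterns on the complementary strand.

```python
import re


def gqs_pattern(base, min_tract=3, loop_min=1, loop_max=7):
    """Build a regex for four tracts of `base` separated by three loops.

    Hypothetical helper: tracts are `min_tract` or more copies of `base`,
    loops are `loop_min`..`loop_max` nucleotides of any kind.
    """
    tract = f"{base}{{{min_tract},}}"
    loop = f"[ACGTU]{{{loop_min},{loop_max}}}"
    return re.compile(f"{tract}(?:{loop}{tract}){{3}}")


def count_gqs(seq, min_tract=3, loop_min=1, loop_max=7, bases="GC"):
    """Count non-overlapping GQS matches for each base in `bases`.

    bases="GC" covers both strands of DNA; bases="G" restricts the
    search to G-patterns, as would be appropriate for an RNA sequence.
    """
    seq = seq.upper()
    total = 0
    for base in bases:
        pattern = gqs_pattern(base, min_tract, loop_min, loop_max)
        total += sum(1 for _ in pattern.finditer(seq))
    return total
```

For example, `count_gqs(seq, min_tract=2, loop_max=4)` would correspond to the "G2+ L1–4" motif under these assumptions. How overlapping candidates are counted can change the totals, so counts from this sketch need not match the table.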
Density (D)<sup>a</sup> and Enrichment (E)<sup>b</sup> of GQS in various Arabidopsis genomic regions
| GQS Motif | Genome D (GQS/Mb) | Intergenic D (GQS/Mb) | Intergenic E | Genic D (GQS/Mb) | Genic E | Coding D (GQS/Mb) | Coding E |
|---|---|---|---|---|---|---|---|
| G3+L1–7 | 9.3 | 16.7 | 1.8<sup>c</sup> | 4.6 | 0.5<sup>c</sup> | 6.5 | 0.7 |
| G3+L1–3 | 1.9 | 3.3 | 1.8 | 1.0 | 0.5 | 1.0 | 0.5 |
| G2+L1–4 | 339.4 | 172.9 | 0.5 | 445.8 | 1.3 | 752.6 | 2.2 |
| G2+L1–2 | 97.1 | 36.8 | 0.4 | 135.7 | 1.4 | 231.9 | 2.4 |
| G2+L1 | 64.4 | 18.2 | 0.3 | 94.0 | 1.5 | 163.4 | 2.5 |
Provided are density and enrichment of GQS in different regions of the Arabidopsis genome from all gene models. Genome, intergenic, genic and coding are defined in Table 1.
<sup>a</sup>GQS density is defined as the total number of GQS per Megabase in the specified region; number of Megabases per region: whole genome 124.7 Mb, intergenic region 50.07 Mb, genic region 74.65 Mb, and coding sequence 39.59 Mb.
<sup>b</sup>Enrichment values are calculated as the GQS density of a region divided by the GQS density of the genome.
<sup>c</sup>These calculations are with (G3T3A)3G3 sequences (see text). As with Table 1, Quadparser search parameters included G and C patterns to account for both sense and antisense strands.
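The two definitions in the footnotes reduce to simple divisions. The sketch below restates them with hypothetical function names. The region sizes are those listed in the footnote; densities recomputed from the rounded counts and sizes shown here can differ slightly from the tabulated values, so the tabulated densities are used for the enrichment example.

```python
# Region sizes in megabases, as listed in the table footnote.
REGION_MB = {
    "genome": 124.7,
    "intergenic": 50.07,
    "genic": 74.65,
    "coding": 39.59,
}


def density(gqs_count, region_mb):
    """GQS density: number of GQS per megabase of the region."""
    return gqs_count / region_mb


def enrichment(region_density, genome_density):
    """Enrichment: region density divided by whole-genome density."""
    return region_density / genome_density


# Enrichment of G3+L1-7 in the intergenic region from tabulated densities.
e_intergenic = enrichment(16.7, 9.3)

# Density recomputed from the intergenic G3+ L1-7 count in the
# distribution table (827) and the intergenic size above.
d_intergenic = density(827, REGION_MB["intergenic"])
```

An enrichment above 1 means the motif is denser in that region than in the genome as a whole; a value below 1 means it is sparser.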
Figure 2. Density of GQS in different organisms. Density of GQS in each genome is represented by a bar and given in GQS/Mb. GQS here is defined as G3L1–7. GQS density is provided for each organism at the right-hand end of the bar. All numbers are from this study except H. sapiens, which is from Huppert et al. (21) and Todd et al. (22). Common names: Homo sapiens (human), Drosophila melanogaster (fruitfly), Mus musculus (mouse), Arabidopsis thaliana (mouse ear cress), Arabidopsis lyrata (lyrate rock cress), Manihot esculenta (cassava), Lotus japonicus (Lotus japonicus), Glycine max (soybean), Oryza sativa indica (rice, indica), Zea mays (corn) and Physcomitrella patens (Physcomitrella patens, a moss).
Number of patterns in Arabidopsis and Markov simulated genome
| Genome | Window | X3 L1–7 (GC) | X3 L1–7 (AT) | X2 L1–4 (GC) | X2 L1–4 (AT) |
|---|---|---|---|---|---|
| Real | | 1232 | 147 266 | 43 117 | 746 324 |
| Markov | 50 | 5838 | 195 259 | 63 307 | 816 536 |
| Markov | 75 | 3776 | 165 195 | 50 827 | 771 304 |
| Markov | 150 | 1977 | 125 727 | 36 554 | 708 312 |
| Markov | 200 | 1509 | 113 870 | 32 461 | 687 647 |
| Markov | 400 | 847 | 91 457 | 25 208 | 646 856 |
| Markov | 1000 | 421 | 73 006 | 19 076 | 608 637 |
| Markov | 2000 | 282 | 64 350 | 16 112 | 588 057 |
| Markov | 4000 | 191 | 58 097 | 14 113 | 572 833 |
Number of GC and AT patterns in the real Arabidopsis genome and a windowed Markov model simulated genome. See ‘Materials and Methods’ section for more details. The window size that accurately simulated the AT pattern of the Arabidopsis genome is 100 and is in bold text.
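The idea of a windowed Markov simulation can be sketched as follows. The details of the model are in the 'Materials and Methods' section, which is not reproduced here, so everything in this block is an assumption: a first-order chain, non-overlapping windows, and transition frequencies estimated separately inside each window. The function name `simulate_windowed_markov` is hypothetical.

```python
import random
from collections import defaultdict


def simulate_windowed_markov(seq, window, rng=None):
    """Simulate a sequence window by window with a first-order Markov chain.

    Assumption: each non-overlapping window of `window` bases gets its own
    transition table, estimated from the real sequence in that window.
    """
    rng = rng or random.Random(0)
    simulated = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        # Observed successors of each base; sampling uniformly from this
        # list reproduces the empirical transition probabilities.
        successors = defaultdict(list)
        for current, following in zip(chunk, chunk[1:]):
            successors[current].append(following)
        current = chunk[0]
        out = [current]
        for _ in range(len(chunk) - 1):
            # Fall back to the window's base composition when a base
            # has no observed successor (it only ended the window).
            choices = successors.get(current) or list(chunk)
            current = rng.choice(choices)
            out.append(current)
        simulated.append("".join(out))
    return "".join(simulated)
```

Smaller windows keep the simulated sequence closer to local composition, which is consistent with the larger pattern counts at small window sizes in the table. Counting patterns in the simulated sequence and comparing with the real counts then indicates over- or underrepresentation.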
Figure 3. Ratio of RNA GQS density to non-genic DNA GQS density for different GQS motifs. DNA GQS density includes G and C sequences in the intergenic region only (Table 2, column 3), since this will not lead to GQS in RNA. RNA GQS density includes only the G sequences found in the genic regions (Table 5, column 3). For example, for G3+L1–7 the RNA GQS density is 2.9 and the DNA intergenic region GQS density is 16.7, leading to a ratio of 0.17.
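The ratio in this caption can be reproduced for every motif from the densities tabulated in this record. The sketch below is only that arithmetic; the dictionary names are illustrative, and the values are copied from the intergenic density column and the genic RNA density column.

```python
# Genic RNA GQS densities (G-patterns only), GQS/Mb.
rna_density = {
    "G3+L1-7": 2.9,
    "G3+L1-3": 0.6,
    "G2+L1-4": 257.8,
    "G2+L1-2": 71.1,
    "G2+L1": 45.1,
}

# Intergenic DNA GQS densities (G- and C-patterns), GQS/Mb.
dna_intergenic_density = {
    "G3+L1-7": 16.7,
    "G3+L1-3": 3.3,
    "G2+L1-4": 172.9,
    "G2+L1-2": 36.8,
    "G2+L1": 18.2,
}

# Ratio of RNA GQS density to intergenic DNA GQS density per motif.
ratios = {
    motif: rna_density[motif] / dna_intergenic_density[motif]
    for motif in rna_density
}

for motif, ratio in ratios.items():
    print(f"{motif}: {ratio:.2f}")
```

Ratios below 1 indicate motifs that are denser in intergenic DNA than in genic RNA, and ratios above 1 indicate the reverse.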
Distribution and Density (D) of GQS motifs in Arabidopsis genic RNA
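| GQS Motif | Genic<sup>a</sup> GQS | Genic D<sup>f</sup> | Coding<sup>b</sup> GQS | Coding D | Coding %<sup>g</sup> | 5′-UTR<sup>c</sup> GQS | 5′-UTR D | 5′-UTR % | 3′-UTR<sup>d</sup> GQS | 3′-UTR D | 3′-UTR % | Intron<sup>e</sup> GQS | Intron D | Intron % |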
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| G3+L1–7 | 225 | 2.9 | 174 | 4.3 | 77 | 10 | 2.6 | 4 | 14 | 2.2 | 6 | 27 | 1.0 | 12 |
| G3+L1–3 | 43 | 0.6 | 28 | 0.7 | 65 | 4 | 1.1 | 9 | 4 | 0.6 | 9 | 7 | 0.3 | 16 |
| G2+L1–4 | 19 985 | 257.8 | 17 989 | 443.1 | 90 | 417 | 110.0 | 2 | 657 | 105.5 | 3 | 922 | 34.3 | 5 |
| G2+L1–2 | 5435 | 71.1 | 4897 | 120.6 | 90 | 124 | 32.7 | 2 | 128 | 20.5 | 3 | 286 | 10.6 | 5 |
| G2+L1 | 3496 | 45.1 | 3174 | 78.2 | 91 | 74 | 19.5 | 2 | 63 | 10.1 | 2 | 185 | 6.9 | 5 |
Provided are distribution and density of GQS in different regions of the genes from all gene models.
<sup>a</sup>The genic region comprises <sup>b</sup>coding, <sup>c</sup>5′-UTR, <sup>d</sup>3′-UTR and <sup>e</sup>intron regions. <sup>f</sup>GQS density is defined as the total number of GQS per Megabase in the specified region. Number of Megabases per region: genic region 74.65 Mb, CDS 39.59 Mb, 5′-UTR 3.62 Mb, 3′-UTR 6.02 Mb and intron 25.43 Mb. <sup>g</sup>Percentages were calculated relative to the genic region. Raw numbers (GQS), GQS densities (D) and percentages (%) of GQS are given. Quadparser search parameters were set to include only G-patterns, which will be found in RNA, and exclude C-patterns.
Distribution and Density (D) of GQS motifs in Arabidopsis intergenic regions
| GQS Motif | Intergenic<sup>a</sup> GQS | TU<sup>b</sup> GQS | TU D<sup>d</sup> | TU %<sup>e</sup> | Non-TU<sup>c</sup> GQS | Non-TU D | Non-TU % | D non-TU / D TU |
|---|---|---|---|---|---|---|---|---|
| G3 L1–7 | 827 | 22 | 2.3 | 3 | 805 | 20 | 97 | 8.7 |
| G2 L1–4 | 8561 | 686 | 72.0 | 8 | 7875 | 194 | 92 | 2.7 |
Provided are distribution and density of GQS in transcribed (TU) and non-transcribed (non-TU) regions of the intergenic region.
<sup>a</sup>The intergenic region comprises <sup>b</sup>transcribed units and <sup>c</sup>non-transcribed units. Raw numbers (GQS) are provided.
<sup>d</sup>GQS density (D) is defined as the total number of GQS per Megabase in the specified region.
<sup>e</sup>Percentages were calculated relative to the intergenic region. Number of Megabases per region: intergenic region 50.070 Mb, TU 9.53 Mb, non-TU 40.54 Mb. Quadparser search parameters were set to include both G- and C-patterns.
Functional analysis of genes with at least one G2L1–4 GQS present in the RNA
| GO ID<sup>a</sup> | GO Cat<sup>b</sup> | GO term<sup>c</sup> | GQS genes<sup>d</sup> | All genes<sup>e</sup> | % GQS genes<sup>f</sup> | P-value<sup>g</sup> |
|---|---|---|---|---|---|---|
| Overrepresented | ||||||
| 0003824 | MF | Catalytic activity | 2894<sup>h</sup> | 6393<sup>i</sup> | 45 | 9E-65 |
| 0006468 | BP | Protein amino acid phosphorylation | 506 | 798 | 63 | 1E-53 |
| 0016740 | MF | Transferase activity | 1118 | 2176 | 51 | 2E-49 |
| 0016301 | MF | Kinase activity | 661 | 1151 | 57 | 2E-48 |
| 0043687 | BP | Post-translational protein modification | 579 | 985 | 59 | 2E-46 |
| 0005478 | MF | Transporter activity | 504 | 993 | 51 | 1E-19 |
| 0000166 | MF | Nucleotide binding | 490 | 980 | 50 | 2E-17 |
| 0048856 | BP | Anatomical structure development | 359 | 681 | 53 | 5E-17 |
| 0007275 | BP | Multicellular organismal development | 441 | 871 | 51 | 7E-17 |
| 0016020 | CC | Membrane | 1011 | 2266 | 45 | 3E-16 |
| 0022414 | BP | Reproductive process | 275 | 502 | 55 | 8E-16 |
| Underrepresented | ||||||
| 0000496 | MF | Base pairing | 0 | 631 | 0 | <1E-99 |
| 0006412 | BP | Translation | 174 | 1129 | 15 | 4E-51 |
| 0010467 | BP | Gene expression | 293 | 1487 | 20 | 1E-41 |
| 0003723 | MF | RNA binding | 179 | 983 | 18 | 7E-30 |
| 0000154 | BP | rRNA modification | 1 | 70 | 1 | 3E-9 |
Provided are overrepresented and underrepresented <sup>a</sup>gene ontology (GO) ID numbers, <sup>b</sup>GO categories (Cat: MF, molecular function; BP, biological process; CC, cellular component) and <sup>c</sup>GO terms for gene products encoded by pre-mRNA with at least one G2+L1–4 GQS. Included are <sup>d</sup>the number of genes (scored if GQS is in CDS, 5′-UTR, 3′-UTR, or introns) with a GQS that are annotated for the listed GO term, and <sup>e</sup>the total number of genes in Arabidopsis with the listed GO term. Also included are <sup>f</sup>the percentage of genes with a given GO term that contain a GQS and <sup>g</sup>the corresponding P-value, as determined using the BiNGO program. <sup>h</sup>The total number of GO-annotated genes with a GQS in G2L1–4 is 9097. <sup>i</sup>The total number of GO-annotated genes in A. thaliana is 25 179. Table is sorted in order of increasing P-value. Some GO terms are sub-categories of others. Complete list is provided in Supporting Information Supplementary Table S2.
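To illustrate how an overrepresentation P-value of this kind can arise from the four counts in the table, the sketch below computes a one-sided hypergeometric tail probability. This is an assumption: the hypergeometric test is one of the tests BiNGO offers, but this record does not state which test or multiple-testing correction was used, so the result is not expected to equal the tabulated P-value. The function names are hypothetical.

```python
import math


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeom_upper_tail(k, population, with_term, sample):
    """P(X >= k) for a hypergeometric draw.

    population: all GO-annotated genes
    with_term:  genes annotated with the GO term
    sample:     GO-annotated genes containing a GQS
    k:          GQS-containing genes annotated with the GO term
    """
    log_denominator = log_comb(population, sample)
    lowest = max(k, 0, sample - (population - with_term))
    highest = min(with_term, sample)
    log_terms = [
        log_comb(with_term, i)
        + log_comb(population - with_term, sample - i)
        - log_denominator
        for i in range(lowest, highest + 1)
    ]
    if not log_terms:
        return 0.0
    # Sum in log space to avoid underflow of very small probabilities.
    largest = max(log_terms)
    return math.exp(largest) * sum(math.exp(t - largest) for t in log_terms)


# Catalytic activity row: 2894 of 6393 term genes contain a GQS, against
# 9097 GQS genes among 25 179 GO-annotated genes overall.
p_catalytic = hypergeom_upper_tail(2894, 25179, 6393, 9097)
```

The background fraction of GQS-containing genes is 9097/25 179, about 36%, so the 45% observed for catalytic activity is an excess; underrepresentation would be assessed with the lower tail instead.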
Experimental values for G-quartet formation in Arabidopsis RNA
| GQS Motif | Gene ID<sup>a</sup> | K+1/2 (mM)<sup>b</sup> | Tm<sup>c</sup> in Li+ | Tm<sup>c</sup> in Na+ | Tm<sup>c</sup> in K+ |
|---|---|---|---|---|---|
| G3 L221 | At1g07180 | 8.0 ± 0.7 | 54 | 62 | >85 |
| G3 L444 | At5g53580 | 42 ± 5 | 30 | 45 | 74 |
| G3 L444 + FLANK<sup>d</sup> | At5g53580 | 220 ± 70 | ND<sup>e</sup> | ND | ND |
| G2 L111 | At2g39320 | 30 ± 2 | 32 | 43 | >85 |
| G2 L444<sup>d</sup> | At1g44020 | 316 ± 1 | ND | ND | 62 |
Provided are thermodynamic parameters for G-quartet formation for RNA oligonucleotides from representative Arabidopsis genes. See ‘Materials and Methods’ section for full sequences.
<sup>a</sup>Gene ID identifies the particular gene from the Arabidopsis genome and sequences are provided in ‘Materials and Methods’ section.
<sup>b</sup>K+1/2 (K+ concentration needed to fold half the RNA) values were determined by CD titrations and using Equation (1).
<sup>c</sup>Tm (melting temperature) values were determined by UV thermal melts at 100 mM monovalent salt concentration using the chloride salt.
<sup>d</sup>Melts for these oligonucleotides were performed at 1 M salt concentration owing to their higher K+1/2 values.
<sup>e</sup>ND indicates that the Tm value could not be determined due to absence of a well-defined folding transition.
Figure 4. Formation of GQS RNA oligonucleotides. (a) CD spectra of RNA GQS in 10 mM LiCacodylate (pH 7.0) and 150 mM KCl at 20°C: G3L221 (black), G2L111 (red), G3L444 (gold), G3L444 + FLANK (green) and G2L444 (blue). The positive peak at 260 nm and the negative peak at 240 nm suggest that the RNA GQS adopt a parallel conformation. The G2L111 RNA most likely has some antiparallel character due to the shoulder in the spectrum extending to 300 nm. See ‘Materials and Methods’ section for full sequences. (b) Sample K+ titration. Titration is of G3L444 RNA GQS with KCl additions from 0 mM to 700 mM. The arrows indicate increasing K+ concentrations. Also included are UV thermal denaturations of G3L444 RNA in 100 mM LiCl (black), NaCl (red) and KCl (blue) at (c) 260 nm and (d) 295 nm. Absorbances are normalized to the highest absorbance.