Sigve Nakken, Torbjørn Rognes, Eivind Hovig.
Abstract
Specific guanine-rich sequence motifs in the human genome have considerable potential to form four-stranded structures known as G-quadruplexes or G4 DNA. The enrichment of these motifs in key chromosomal regions has suggested a functional role for the G-quadruplex structure in genomic regulation. In this work, we have examined the spectrum of nucleotide substitutions in G4 motifs and related this spectrum to G4 prevalence. Data collected from the large repository of human SNPs indicate that the core feature of G-quadruplex motifs, 5'-GGG-3', exhibits specific mutational patterns that preserve the potential for G4 formation. In particular, we find a genome-wide pattern in which sites that disrupt the guanine triplets are more conserved and less polymorphic than their neutral counterparts. This also holds when considering non-CpG sites only. However, the low level of polymorphism in guanine tracts is not confined to G4 motifs. A complete mapping of DNA three-mers at guanine polymorphisms indicated that short guanine tracts are the most under-represented sequence context at polymorphic sites. Furthermore, we provide evidence for a strand bias upstream of human genes: a significantly lower rate of G4-disruptive SNPs on the non-template strand supports a higher relative influence of G4 formation on this strand during transcription.
Year: 2009 PMID: 19617376 PMCID: PMC2761265 DOI: 10.1093/nar/gkp590
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. (a) Simplified illustration of a human gene, showing how the gene 5′ and gene 3′ regions were defined. (b) Example of a G4 sequence motif. G4-disruptive sites are shown in grey and G4-neutral sites in black. Underlined guanines lie within tracts where a substitution does not disrupt the G4 consensus.
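The distinction between G4-disruptive and G4-neutral sites can be illustrated with a small in silico mutagenesis sketch. It assumes the commonly used G4 consensus of four runs of at least three guanines separated by loops of 1 to 7 nucleotides (the exact pattern is not given in this section), and it treats a site as disruptive if at least one substitution there leaves no G4 match anywhere in the motif. The names `G4_PATTERN` and `classify_sites` are hypothetical.

```python
import re

# Assumed G4 consensus: four G-runs (>= 3 G) separated by loops of 1-7 nt.
# This is an illustrative assumption, not the study's documented pattern.
G4_PATTERN = re.compile(r"(?:G{3,}[ACGT]{1,7}){3,}G{3,}")


def classify_sites(motif: str) -> tuple[list[int], list[int]]:
    """Split 0-based motif positions into G4-disruptive and G4-neutral sites.

    A site counts as disruptive if at least one base substitution there
    leaves the mutated motif without any G4 match.
    """
    motif = motif.upper()
    disruptive, neutral = [], []
    for i, ref in enumerate(motif):
        breaks = False
        for alt in "ACGT":
            if alt == ref:
                continue
            mutant = motif[:i] + alt + motif[i + 1:]
            if G4_PATTERN.search(mutant) is None:
                breaks = True
                break
        (disruptive if breaks else neutral).append(i)
    return disruptive, neutral


if __name__ == "__main__":
    d, n = classify_sites("GGGGAGGGAGGGAGGG")
    print("disruptive:", d)
    print("neutral:", n)
```

For the example motif, the outer guanines of the four-guanine tract (positions 0 and 3) and all loop bases come out as neutral, which mirrors the underlined guanines described in the caption.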
Density of SNPs and CpG dinucleotides in G4 motifs
| Region | Number of G4 motifs (total length) | Number of SNPs^a | SNPs/CpG^b | CpG island coverage^c | CpG/kb^d |
|---|---|---|---|---|---|
| Genome | 282 501 | – | – | – | – |
| First introns | 17 926 (0.33 Mb) | 555 (441) | 0.00413 (0.014) | 0.093 (0.014) | 58.4 (28.7) |
| Gene 5′ | 31 694 (0.55 Mb) | 1157 (874) | 0.0052 (0.012) | 0.044 (0.010) | 57.7 (34.9) |
| Gene 3′ | 17 458 (0.30 Mb) | 906 (639) | 0.0190 (0.038) | 0.048 (0.008) | 29.5 (13.6) |
| Intergenic | 103 911 (2.01 Mb) | 5001 (4096) | 0.023 (0.064) | 0.036 (0.002) | 22.6 (7.6) |
^a Total number of SNPs that map to G4 motifs; the number of unique (non-redundant) SNPs is given in parentheses.
^b SNP density at G4 CpGs was estimated using only C/T and A/G SNPs, since most substitutions at hypermutable CpG sites are methylation-dependent transitions. The corresponding density at CpGs in the genomic background is given in parentheses.
^c Coverage is the fraction of CpG-island bases covered by G4 bases; G4 coverage outside CpG islands is given in parentheses.
^d CpG density in the genomic background is given in parentheses.
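The quantities in the table footnotes (SNPs per CpG, island coverage, CpG/kb) can be computed along the following lines. This is a minimal sketch under stated assumptions: SNPs are supplied as a hypothetical mapping from 0-based position to an allele pair such as `"C/T"`, a SNP counts toward a CpG if it falls on either base of the dinucleotide, and intervals are half-open `(start, end)` pairs on one chromosome. All function names are illustrative.

```python
def cpg_positions(seq: str) -> list[int]:
    """0-based start positions of every CpG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def cpg_per_kb(seq: str) -> float:
    """CpG dinucleotides per 1000 bases."""
    return 1000 * len(cpg_positions(seq)) / len(seq)


def snps_per_cpg(seq: str, snps: dict[int, str]) -> float:
    """SNPs per CpG, counting only C/T and A/G SNPs on either CpG base.

    `snps` is an assumed input format: {position: "C/T"}.
    """
    cpgs = cpg_positions(seq)
    transitions = {"C/T", "T/C", "A/G", "G/A"}
    cpg_bases = {p for start in cpgs for p in (start, start + 1)}
    hits = sum(1 for pos, alleles in snps.items()
               if pos in cpg_bases and alleles.upper() in transitions)
    return hits / len(cpgs) if cpgs else 0.0


def merge(intervals):
    """Merge overlapping half-open intervals so bases are not double counted."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def island_coverage(islands, g4s) -> float:
    """Fraction of CpG-island bases that are also G4 bases."""
    islands, g4s = merge(islands), merge(g4s)
    total = sum(end - start for start, end in islands)
    covered = 0
    for s1, e1 in islands:
        for s2, e2 in g4s:
            covered += max(0, min(e1, e2) - max(s1, s2))
    return covered / total if total else 0.0
```

Merging both interval sets first makes the pairwise overlap sum equal to the size of their intersection.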
Figure 2. SNP density in G4 sequences versus randomly picked non-G4 sequences. The non-G4 sequences were drawn so that their GC content matched that of the G4 sequences.
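One way to draw GC-matched control sequences, as described for Figure 2, is to bin candidate non-G4 sequences by GC fraction and sample from the bin of each G4 sequence. The binning scheme, the bin count and the function names below are assumptions for illustration; the caption states only that the GC-richness of the two sets was equivalent.

```python
import random
from collections import defaultdict


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_matched_sample(g4_seqs, candidates, n_bins=20, seed=0):
    """Draw one non-G4 candidate per G4 sequence from the same GC bin.

    Sampling is without replacement; a G4 sequence whose bin is empty
    simply gets no partner. This matching scheme is an assumption.
    """
    rng = random.Random(seed)

    def bin_of(seq):
        return min(int(gc_fraction(seq) * n_bins), n_bins - 1)

    pool = defaultdict(list)
    for seq in candidates:
        pool[bin_of(seq)].append(seq)

    sample = []
    for seq in g4_seqs:
        bucket = pool[bin_of(seq)]
        if bucket:
            sample.append(bucket.pop(rng.randrange(len(bucket))))
    return sample
```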
Figure 3. SNP density at G4-disruptive sites versus G4-neutral sites (see Figure 1b for definitions of G4-disruptive and G4-neutral).
Density of SNPs in disruptive and neutral sites of G4 sequence motifs
| Region | G4-neutral CpGs/kb | G4-neutral SNPs/kb^a | P^b | G4-disruptive CpGs/kb | G4-disruptive SNPs/kb^a |
|---|---|---|---|---|---|
| First introns | 166.5 | 1.52 (1.38) | <0.00001 | 73.1 | 0.90 (0.91) |
| Gene 5′ | 166.1 | 1.68 (1.48) | <0.05 | 71.4 | 1.28 (1.29) |
| Gene 3′ | 83.9 | 2.37 (1.72) | <0.05 | 36.1 | 1.63 (1.45) |
| Intergenic | 65.0 | 2.28 (1.65) | <0.001 | 25.6 | 1.62 (1.46) |
^a SNP densities at non-CpG sites are given in parentheses.
^b P-value for the difference in non-CpG SNP density between G4-disruptive and G4-neutral sites (chi-squared test).
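The table footnotes report SNP densities per kb and a chi-squared comparison between site classes. A sketch of a 2x2 Pearson test on polymorphic versus non-polymorphic site counts follows; whether the study applied a continuity correction is not stated, so none is applied here, and the helper names are hypothetical. For one degree of freedom the upper-tail probability is erfc(√(χ²/2)), which avoids a SciPy dependency.

```python
import math


def snp_density_per_kb(n_snps: int, n_sites: int) -> float:
    """SNPs per 1000 sites."""
    return 1000 * n_snps / n_sites


def chi2_2x2(snps_a: int, sites_a: int, snps_b: int, sites_b: int):
    """Pearson chi-squared test (1 df, no continuity correction) comparing
    the proportion of polymorphic sites in two site classes."""
    table = [[snps_a, sites_a - snps_a], [snps_b, sites_b - snps_b]]
    n = sites_a + sites_b
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-squared variable with 1 df: P(X > x) = erfc(sqrt(x/2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```

For example, `chi2_2x2(10, 30, 20, 30)` (made-up counts) gives χ² = 20/3 ≈ 6.67 and P ≈ 0.0098.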
Figure 4. Sequence conservation at G4-disruptive versus G4-neutral sites. Shown is the fraction of conserved sites (i.e. all aligned bases identical) at G4-disruptive and G4-neutral sites, extracted from MultiZ alignments of human G4 motifs with rhesus macaque (rheMac2), dog (canFam2) and mouse (mm8). Only non-CpG sites were assessed for conservation.
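The conservation measure in Figure 4 reduces to counting alignment columns in which every species carries the same base. The sketch below assumes each column is given as a string of aligned bases (human, rheMac2, canFam2, mm8) and, as an added assumption, skips columns containing gaps or N. Applying it separately to disruptive and neutral columns gives the two fractions being compared; the function name is hypothetical.

```python
def conserved_fraction(columns) -> float:
    """Fraction of alignment columns where all species share the same base.

    Columns with a gap ('-') or N are skipped (an illustrative assumption).
    """
    usable = [c.upper() for c in columns
              if "-" not in c and "N" not in c.upper()]
    if not usable:
        return 0.0
    conserved = sum(1 for c in usable if len(set(c)) == 1)
    return conserved / len(usable)
```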
Figure 5. Ratio of DNA three-mers at polymorphic to non-polymorphic sites. Only non-CpG three-mers are plotted, and each ratio combines a three-mer with its reverse complement. Only SNPs validated as polymorphic by the HapMap project were used in the calculation.
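A sketch of the three-mer comparison in Figure 5 follows. It assumes the three-mer is centred on each site, merges each three-mer with its reverse complement by taking the lexicographically smaller string, drops three-mers that contain a CpG, and normalises counts to frequencies within each class before taking the ratio. These choices and the function names are assumptions; the input set of polymorphic positions stands in for HapMap-validated SNPs.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Collapse a three-mer and its reverse complement into one key."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def threemer_ratios(seq: str, polymorphic: set[int]) -> dict[str, float]:
    """Frequency of each (canonical) three-mer at polymorphic sites divided
    by its frequency at non-polymorphic sites.

    Three-mers are centred on the site; those containing CpG or non-ACGT
    bases are skipped.
    """
    seq = seq.upper()
    poly, nonpoly = Counter(), Counter()
    for i in range(1, len(seq) - 1):
        kmer = seq[i - 1:i + 2]
        if "CG" in kmer or any(b not in "ACGT" for b in kmer):
            continue
        (poly if i in polymorphic else nonpoly)[canonical(kmer)] += 1
    n_poly, n_non = sum(poly.values()), sum(nonpoly.values())
    ratios = {}
    for key in poly.keys() | nonpoly.keys():
        if n_poly and n_non and nonpoly[key]:
            ratios[key] = (poly[key] / n_poly) / (nonpoly[key] / n_non)
    return ratios
```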
Figure 6. Ratio of SNP density (non-CpG) in non-template motifs to SNP density in template motifs. The dashed line marks a ratio of 1, i.e. equal SNP rates on both strands (no strand bias).
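The strand-bias ratio in Figure 6 can be sketched as follows, assuming each motif is described by the strand carrying its guanine tracts, the strand of the associated gene, and its non-CpG SNP and site counts. A motif on the same strand as the gene lies on the coding (non-template) strand; otherwise it lies on the template strand. The input format and function names are hypothetical, and the sketch assumes both classes contain at least one site and one SNP-free denominator is nonzero.

```python
def motif_strand_relative_to_gene(motif_strand: str, gene_strand: str) -> str:
    """A G-rich motif on the gene's own (coding) strand is non-template."""
    return "nontemplate" if motif_strand == gene_strand else "template"


def strand_bias_ratio(motifs) -> float:
    """motifs: iterable of (motif_strand, gene_strand, n_snps, n_sites).

    Returns SNP density in non-template motifs divided by SNP density in
    template motifs; 1.0 means no strand bias.
    """
    snps = {"template": 0, "nontemplate": 0}
    sites = {"template": 0, "nontemplate": 0}
    for m_strand, g_strand, n_snps, n_sites in motifs:
        key = motif_strand_relative_to_gene(m_strand, g_strand)
        snps[key] += n_snps
        sites[key] += n_sites
    density = {k: snps[k] / sites[k] for k in snps}
    return density["nontemplate"] / density["template"]
```

A ratio below 1 corresponds to fewer SNPs in non-template motifs, the direction described in the abstract for regions upstream of genes.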