Mohammadmersad Ghorbani¹, Simon J E Taylor¹, Mark A Pook², Annette Payne¹.
Abstract
Previous studies have examined DNA methylation in different trinucleotide repeat diseases. We combined these data and used a pattern-searching algorithm to identify motifs in the DNA surrounding aberrantly methylated CpGs in patients with one of three trinucleotide repeat (TNR) expansion diseases: fragile X syndrome (FRAXA), myotonic dystrophy type I (DM1), or Friedreich's ataxia (FRDA). We examined sequences surrounding both the variably methylated (VM) CpGs, which are hypermethylated in patients compared with unaffected controls, and the nonvariably methylated CpGs, which remain either always methylated (AM) or never methylated (NM) in both patients and controls. Using the J48 algorithm in WEKA, we found that two patterns suffice to classify our three regions: CCGG∗, which is found in VM but not in AM regions, and AATT∗, which distinguishes NM from VM + AM using proportional frequency. Furthermore, we show that our software identifies more patterns than MEME in these short DNA sequences. Thus, we present evidence that the DNA sequence surrounding a CpG can influence its susceptibility to de novo methylation in a disease state associated with a trinucleotide repeat.
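The first step described above, collecting the DNA surrounding each CpG, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' software: the function name `cpg_flanks` and the `flank` width are hypothetical, since the window size used in the study is not given here.

```python
def cpg_flanks(sequence: str, flank: int) -> list[tuple[int, str]]:
    """Return (position, window) for every CpG dinucleotide in `sequence`.

    Each window holds `flank` bases on either side of the CpG (a hypothetical
    parameter) and is clipped at the ends of the sequence.
    """
    seq = sequence.lower()
    windows = []
    for i in range(len(seq) - 1):
        if seq[i:i + 2] == "cg":
            start = max(0, i - flank)
            end = min(len(seq), i + 2 + flank)
            windows.append((i, seq[start:end]))
    return windows


# Two CpGs, each with 2 bp of context on either side.
print(cpg_flanks("AACGTTCGAA", 2))  # [(2, 'aacgtt'), (6, 'ttcgaa')]
```

Clipping at the sequence ends keeps CpGs near a region boundary instead of silently dropping them; a fixed-width variant would discard those sites instead.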
Year: 2013 PMID: 24455203 PMCID: PMC3884633 DOI: 10.1155/2013/689798
Source DB: PubMed Journal: J Nucleic Acids ISSN: 2090-0201
Figure 1. The three gene regions under investigation: (a) DMPK 3′ UTR region, (b) FMR1 5′ UTR region, and (c) FXN intron 1 region. Each panel shows a scheme of the DNA sequence, the transcriptional start site, and the regions analysed. Grey shading marks the always methylated (AM) regions, blue the never methylated (NM) regions, and yellow the variably methylated (VM) regions. CpG sites are underlined, and bold numbers at the start and end of each line give the base-pair position in the sequence. A triangle marks the location of the repeat expansion, and the box above the triangle shows the TNR repeats. The green highlighted region in the FMR1 gene indicates the promoter region. CTCF binding sites are shown in red.
Table of discriminatory patterns.
| Top 10 patterns that separate VM class from AM and NM | Top 10 patterns that separate AM class from VM and NM | Top 10 patterns that separate NM class from AM and VM |
|---|---|---|
| ccgg[agct]{0,1} | [agct]{0,1}tttt | ta[agct]{0,2}a |
| g[agct]{0,1}gcg | t[agct]{0,1}cat | [agct]{0,1}gcgg |
| g[agct]{0,1}ctc | catg[agct]{0,1} | [agct]{0,1}ccgc |
| cgg[agct]{0,1}t | ga[agct]{0,1}at | c[agct]{0,1}ccg |
| ag[agct]{0,1}ct | at[agct]{0,1}ca | ag[agct]{0,1}gg |
| cg[agct]{0,1}tc | taa[agct]{0,1}t | c[agct]{0,1}gcg |
| cga[agct]{0,1}c | tct[agct]{0,1}a | ag[agct]{0,1}ac |
| t[agct]{0,1}cga | [agct]{0,1}tgca | tt[agct]{0,1}aa |
| gt[agct]{0,1}ac | ta[agct]{0,1}ta | ctt[agct]{0,1}a |
| tcga[agct]{0,1} | ac[agct]{0,1}ta | ttt[agct]{0,1}a |
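The patterns in this table read as regular expressions: `[agct]{0,1}` is an optional single base, so `ccgg[agct]{0,1}` matches CCGG with or without one trailing base. The sketch below counts such patterns with Python's `re` module. It assumes that "proportional frequency" means total hits divided by total bases scanned; the normalisation actually used in the study is not stated here, and `count_overlapping` and `proportional_frequency` are hypothetical names.

```python
import re


def count_overlapping(pattern: str, sequence: str) -> int:
    """Count start positions at which `pattern` matches in `sequence`.

    Wrapping the pattern in a zero-width lookahead means overlapping
    occurrences are all counted instead of being consumed by earlier matches.
    """
    return len(re.findall(f"(?=(?:{pattern}))", sequence.lower()))


def proportional_frequency(pattern: str, sequences: list[str]) -> float:
    """Total hits divided by total bases scanned (assumed normalisation)."""
    hits = sum(count_overlapping(pattern, s) for s in sequences)
    bases = sum(len(s) for s in sequences)
    return hits / bases if bases else 0.0


print(count_overlapping("ccgg[agct]{0,1}", "CCGGACCGG"))  # 2
print(proportional_frequency("ccgg[agct]{0,1}", ["CCGGACCGG", "TTTT"]))
```

The lookahead matters for short, repetitive motifs such as `[agct]{0,1}tttt`: a plain `re.findall` would skip occurrences that overlap an earlier match.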
Figure 2. Decision tree created by the WEKA package. AM, always methylated; NM, never methylated; VM, variably methylated.
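J48 is WEKA's implementation of C4.5, which chooses splits by gain ratio: information gain divided by the entropy of the split itself. The sketch below scores one boolean split such as "pattern present in this sequence". That boolean feature is a simplification, since the study's features are proportional frequencies and a numeric attribute would instead be split at a threshold (e.g. `freq > t`). The labels in the usage lines are toy data, not values from the study.

```python
from collections import Counter
from math import log2


def entropy(labels: list[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values()) if n else 0.0


def gain_ratio(labels: list[str], has_pattern: list[bool]) -> float:
    """C4.5-style gain ratio of splitting `labels` on a boolean feature."""
    n = len(labels)
    left = [lab for lab, f in zip(labels, has_pattern) if f]
    right = [lab for lab, f in zip(labels, has_pattern) if not f]
    # Weighted entropy remaining after the split.
    remainder = sum(len(part) / n * entropy(part) for part in (left, right))
    gain = entropy(labels) - remainder
    # Split information penalises splits that fragment the data.
    split_info = entropy([str(f) for f in has_pattern])
    return gain / split_info if split_info > 0 else 0.0


labels = ["VM", "VM", "AM", "AM"]
print(gain_ratio(labels, [True, True, False, False]))  # 1.0
print(gain_ratio(labels, [True, False, True, False]))  # 0.0
```

A feature that separates the classes perfectly scores 1.0 here, while one that is independent of the class scores 0.0, which is why a tree can need as few as two pattern features when each splits the classes cleanly.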
(a) Patterns unique to VM
| Pattern | Sum (VM) | Sum (AM) | Sum (NM) |
|---|---|---|---|
| ga[agct]{0,1}tc | 0.006803 | 0 | 0 |
| g[agct]{0,1}gac | 0.010544 | 0 | 0 |
| gt[agct]{0,1}ac | 0.014286 | 0 | 0 |
(b) Top 10 proportionally most frequently occurring patterns unique to AM
| Pattern | Sum (VM) | Sum (AM) | Sum (NM) |
|---|---|---|---|
| taa[agct]{0,1}t | 0 | 0.043757 | 0 |
| tct[agct]{0,1}a | 0 | 0.043119 | 0 |
| ta[agct]{0,1}ta | 0 | 0.038785 | 0 |
| ga[agct]{0,1}aa | 0 | 0.034818 | 0 |
| ta[agct]{0,1}at | 0 | 0.030483 | 0 |
| t[agct]{0,1}tat | 0 | 0.028634 | 0 |
| at[agct]{0,1}ag | 0 | 0.026786 | 0 |
| [agct]{0,1}atac | 0 | 0.024938 | 0 |
| a[agct]{0,1}tag | 0 | 0.024938 | 0 |
| tat[agct]{0,1}a | 0 | 0.024938 | 0 |
(c) The 8 proportionally most frequently occurring patterns which are more prevalent in NM than in VM and AM
| Pattern | Sum (VM) | Sum (AM) | Sum (NM) |
|---|---|---|---|
| [agct]{0,1}atct | 0 | 0.001848 | 0.004348 |
| agat[agct]{0,1} | 0 | 0.001848 | 0.004348 |
| a[agct]{0,1}gat | 0 | 0.003697 | 0.008696 |
| atc[agct]{0,1}t | 0 | 0.003697 | 0.008696 |
| [agct]{0,1}atcg | 0 | 0.005545 | 0.008696 |
| cgat[agct]{0,1} | 0 | 0.005545 | 0.008696 |
| atcg[agct]{0,1} | 0 | 0.005545 | 0.008696 |
| [agct]{0,1}cgat | 0 | 0.005545 | 0.008696 |
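Tables (a) to (c) can be read as filters over per-class sums. The helpers below sketch two such filters, assuming that "unique to a class" means a non-zero sum in that class and zero in the others, and that "more prevalent" means a strictly larger sum than in every other class. The function names are hypothetical; the example values are copied from the tables above.

```python
def unique_to(target: str, sums: dict[str, dict[str, float]]) -> list[str]:
    """Patterns seen only in `target`, most frequent first."""
    hits = [
        pattern for pattern, by_class in sums.items()
        if by_class[target] > 0
        and all(v == 0 for cls, v in by_class.items() if cls != target)
    ]
    return sorted(hits, key=lambda p: sums[p][target], reverse=True)


def more_prevalent_in(target: str, sums: dict[str, dict[str, float]]) -> list[str]:
    """Patterns whose sum in `target` exceeds the sum in every other class."""
    hits = [
        pattern for pattern, by_class in sums.items()
        if all(by_class[target] > v for cls, v in by_class.items() if cls != target)
    ]
    return sorted(hits, key=lambda p: sums[p][target], reverse=True)


# A few rows copied from tables (a)-(c) above.
sums = {
    "ga[agct]{0,1}tc": {"VM": 0.006803, "AM": 0.0, "NM": 0.0},
    "gt[agct]{0,1}ac": {"VM": 0.014286, "AM": 0.0, "NM": 0.0},
    "taa[agct]{0,1}t": {"VM": 0.0, "AM": 0.043757, "NM": 0.0},
    "a[agct]{0,1}gat": {"VM": 0.0, "AM": 0.003697, "NM": 0.008696},
}
print(unique_to("VM", sums))          # ['gt[agct]{0,1}ac', 'ga[agct]{0,1}tc']
print(more_prevalent_in("NM", sums))  # ['a[agct]{0,1}gat']
```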
(a) 10 best 5 bp motifs in variably methylated regions
| Pattern | MEME: occurrences by strand (+/−) | Our software: occurrences by strand (+/−) in VM regions |
|---|---|---|
| TGTTT | FXN+, FMR1+ | FMR1+, FXN+ |
| AAACT | FXN++− | FXN++− |
| TATTT | FXN++ | FXN++ |
| TCCAA | FXN+DMPK− | FXN+DMPK+ |
| TCGAA | DMPK+DMPK− | DMPK+DMPK− |
| CTGAA | FMR1−DMPK+ | DMPK+FMR1− |
| TA[CG]AA | DMPK−DMPK+ | DMPK−DMPK+FXN− |
| ACCCA | DMPK− − | DMPK − − |
(b) 10 best 5 bp motifs in always methylated regions
| Pattern | MEME: occurrences by strand (+/−) | Our software: occurrences by strand (+/−) in AM regions |
|---|---|---|
| AGGGG | FXN_AM_1++FMR1+−− | FMR1+−−FXN_AM_1++ |
| CCGCC | FXN_AM_1−FMR1+ | FXN_AM_1−FMR1+ |
| AGAAA | FXN_AM_2+FMR1++−−− − | FXN_AM_2+FMR1++−−− − |
(c) 10 best 5 bp motifs in never methylated regions
| Pattern | MEME: occurrences by strand (+/−) | Our software: occurrences by strand (+/−) in NM regions |
|---|---|---|
| AGGCA | DMPK−+ | DMPK+− |
| CCATC | DMPK+− | DMPK+− |
| CAGAC | DMPK−+ | DMPK+− |
| TGACG | DMPK++ | DMPK++ |
| ACC[AT]A | DMPK−+ | DMPK+− |
| CTGGG | DMPK++ | DMPK++ |
Key:
+ Positive strand.
− Negative strand.
FXN_AM_1 and FXN_AM_2 are two separated regions with always methylated CpG sites.
Bold: motifs for which MEME does not report all occurrences.
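The strand notation in the key can be approximated by scanning a sequence and its reverse complement. This is a minimal sketch, assuming one `+` per hit on the given strand and one `−` per hit on the reverse complement. The tables also prefix each run with the gene or region name, which is omitted here. `strand_hits` is a hypothetical name, and this is not how MEME or the authors' software reports strands.

```python
import re

# str.maketrans maps each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Lower-case reverse complement of a DNA sequence."""
    return seq.lower().translate(COMPLEMENT)[::-1]


def strand_hits(motif: str, sequence: str) -> str:
    """'+' per hit on the given strand, '−' (U+2212) per hit on the other."""
    lookahead = f"(?=(?:{motif.lower()}))"  # zero-width, so overlaps are counted
    seq = sequence.lower()
    plus = len(re.findall(lookahead, seq))
    minus = len(re.findall(lookahead, reverse_complement(seq)))
    return "+" * plus + "\u2212" * minus


print(strand_hits("TCGAA", "TTCGAA"))   # +−
print(strand_hits("TGTTT", "AAACAGG"))  # −
```

Because bracketed motifs such as `TA[CG]AA` are already valid regular expressions once lower-cased, the same function handles the degenerate motifs in the tables.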