Michal Růžička, Petr Kulhánek, Lenka Radová, Andrea Čechová, Naďa Špačková, Lenka Fajkusová, Kamila Réblová.
Abstract
Mutations in human genes can be responsible for inherited genetic disorders and cancer. Mutations can arise due to environmental factors or spontaneously. It has been shown that certain DNA sequences are more prone to mutate. These sites are termed hotspots and exhibit a higher mutation frequency than expected by chance. In contrast, DNA sequences with lower mutation frequencies than expected by chance are termed coldspots. Mutation hotspots are usually derived from a mutation spectrum, which reflects a particular population in which the effect of a common ancestor plays a role. To detect coldspots/hotspots unaffected by population bias, we analysed the presence of germline mutations obtained from the HGMD database in the 5-nucleotide segments repeatedly occurring in genes associated with common inherited disorders, in particular, the PAH, LDLR, CFTR, F8, and F9 genes. Statistically significant sequences (mutational motifs) rarely associated with mutations (coldspots) and frequently associated with mutations (hotspots) exhibited characteristic sequence patterns, e.g. coldspots contained purine tracts while hotspots showed alternating purine-pyrimidine bases, often with the presence of a CpG dinucleotide. Using molecular dynamics simulations and free energy calculations, we analysed the global bending properties of two selected coldspots and two hotspots with a G/T mismatch. We observed that the coldspots were inherently more flexible than the hotspots. We assume that this property might be critical for effective mismatch repair, as DNA with a mutation recognized by the MutSα protein is noticeably bent.
Year: 2017 PMID: 28767725 PMCID: PMC5540541 DOI: 10.1371/journal.pone.0182377
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Survey of analysed genes, lengths of DNA sequences, number of identified 5-nt segments, and number of germline mutations found in the HGMD database.
| Gene | Total length of analysed DNA sequence (nt) | Number of unique 5-nt segments | Number of nucleotide positions with mutation (2014 dataset) | Number of nucleotide positions with mutation (2016 dataset) |
|---|---|---|---|---|
| | 1528 | 423 | 470 | 525 |
| | 2720 | 464 | 767 | 843 |
| | 4912 | 487 | 862 | 921 |
| | 7357 | 486 | 1351 | 1488 |
| | 1457 | 432 | 539 | 543 |
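The segment counts in the table above can be illustrated with a minimal sketch of how 5-nt segments might be enumerated in a sequence. It is not the authors' pipeline. The function names are hypothetical, and pairing each segment with its reverse complement is an assumption suggested by the 5’→3’/5’→3’ column headings of the motif tables. Under that assumption at most 4^5 / 2 = 512 distinct segments exist, which would be consistent with the counts of 423 to 487 reported here.

```python
from collections import Counter

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(segment: str) -> str:
    """Return the reverse complement of a DNA segment (5'->3')."""
    return segment.translate(COMPLEMENT)[::-1]


def canonical(segment: str) -> str:
    """Assumption: a segment and its reverse complement count as one motif.

    The lexicographically smaller of the two strings is used as the key.
    """
    return min(segment, reverse_complement(segment))


def count_segments(sequence: str, k: int = 5) -> Counter:
    """Count every overlapping k-nt segment, keyed by its canonical form."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        segment = sequence[i:i + k]
        # Skip windows containing ambiguous bases such as N.
        if set(segment) <= set("ACGT"):
            counts[canonical(segment)] += 1
    return counts


if __name__ == "__main__":
    toy = "AAAAATTTTT"  # toy sequence, not taken from any analysed gene
    for motif, n in sorted(count_segments(toy).items()):
        print(motif, reverse_complement(motif), n)
```

The number of unique segments for a sequence is then `len(count_segments(sequence))`.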
Top twenty coldspots in 2014 and 2016 datasets.
| Mutation coldspot 5’→3’/5’→3’ (dataset 2014) | Fisher combined p-value for coldspot (dataset 2014) | Mutation coldspot 5’→3’/5’→3’ (dataset 2016) | Fisher combined p-value for coldspot (dataset 2016) |
|---|---|---|---|
| | 3.23E-09 | | 9.36858E-09 |
| | 5.80E-09 | | 1.07478E-08 |
| | 6.82E-09 | | 1.36135E-08 |
| | 1.07E-08 | | 1.77502E-08 |
| | 1.68E-08 | | 4.49683E-08 |
| | 3.88E-08 | | 6.47074E-07 |
| | 6.47E-07 | | 1.21713E-06 |
| | 1.08E-06 | | 1.26841E-06 |
| | 1.22E-06 | | 3.0346E-06 |
| | 1.56E-06 | | 3.83943E-06 |
| | 1.70E-06 | | 4.40191E-06 |
| | 3.03E-06 | | 8.86768E-06 |
| | 4.40E-06 | | 1.34215E-05 |
| | 4.48E-06 | | 1.62237E-05 |
| | 1.38E-05 | | 2.2854E-05 |
| | 1.40E-05 | | 2.88886E-05 |
| | 1.62E-05 | | 2.91218E-05 |
| | 1.92E-05 | | 4.88009E-05 |
| | 2.29E-05 | | 5.60054E-05 |
| | 2.45E-05 | | 7.23417E-05 |
Motifs containing tracts of four or five purines are in bold. Motifs detected in both datasets are shaded grey.
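The "Fisher combined p-value" reported in the motif tables presumably refers to Fisher's method for combining independent p-values. The sketch below shows that method only. The record does not say how the individual p-values were obtained, so combining one p-value per analysed gene for a given motif is an assumption, and the function name and input values are hypothetical. For k p-values the statistic X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom, whose upper tail has a closed form for even degrees of freedom.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) is chi-square distributed with 2k degrees of
    freedom under the null hypothesis. For even degrees of freedom the
    survival function is exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in the interval (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i  # (X/2)^i / i!, built up incrementally
        total += term
    return math.exp(-half_stat) * total


if __name__ == "__main__":
    # Hypothetical per-gene p-values for one motif (illustration only).
    print(fisher_combined_p([0.01, 0.03, 0.20, 0.04, 0.10]))
```

A single p-value is returned unchanged, which is a quick sanity check on the implementation.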
Top twenty hotspots in 2014 and 2016 datasets.
| Mutation hotspot 5’→3’/5’→3’ (dataset 2014) | Fisher combined p-value for hotspot (dataset 2014) | Mutation hotspot 5’→3’/5’→3’ (dataset 2016) | Fisher combined p-value for hotspot (dataset 2016) |
|---|---|---|---|
| | 3.12E-10 | | 1.24262E-11 |
| | 2.69E-08 | | 2.6884E-08 |
| | 5.51E-05 | | 2.97136E-06 |
| | 7.31E-05 | | 2.11333E-05 |
| | 0.000101 | | 2.24424E-05 |
| | 0.000158 | | 5.50895E-05 |
| | 0.000240 | | 7.3056E-05 |
| | 0.000451 | | 0.000101 |
| | 0.001007 | | 0.000287 |
| | 0.002042 | | 0.000431 |
| | 0.002623 | | 0.000451 |
| | 0.002625 | | 0.000482 |
| | 0.003514 | | 0.000508 |
| | 0.003707 | | 0.000538 |
| | 0.003755 | | 0.000630 |
| | 0.004126 | | 0.000657 |
| | 0.004429 | | 0.000802 |
| | 0.005546 | | 0.001007 |
| | 0.006022 | | 0.001370 |
| | 0.007033 | | 0.002439 |
Motifs containing a CpG dinucleotide in the middle are in italics.
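The sequence patterns named in the abstract and the table footnotes (purine tracts, alternating purine-pyrimidine bases, a central CpG) can be checked programmatically. The sketch below is illustrative only: the function names are hypothetical, and treating "in the middle" of a 5-nt motif as a CG starting at the second or third base is an assumption, since the record does not define it. The example motifs are made up and are not taken from the tables.

```python
PURINES = set("AG")


def longest_purine_run(motif: str) -> int:
    """Length of the longest run of consecutive purines (A or G)."""
    best = run = 0
    for base in motif.upper():
        run = run + 1 if base in PURINES else 0
        best = max(best, run)
    return best


def is_alternating(motif: str) -> bool:
    """True if purines and pyrimidines strictly alternate along the motif."""
    motif = motif.upper()
    return all((a in PURINES) != (b in PURINES)
               for a, b in zip(motif, motif[1:]))


def has_central_cpg(motif: str) -> bool:
    """Assumption: 'central' means CG at 0-based positions 1-2 or 2-3."""
    motif = motif.upper()
    return motif[1:3] == "CG" or motif[2:4] == "CG"


if __name__ == "__main__":
    for motif in ("AAGGA", "ACGTA", "TACGT"):  # made-up example motifs
        print(motif, longest_purine_run(motif),
              is_alternating(motif), has_central_cpg(motif))
```

A run of pyrimidines on one strand corresponds to a purine tract on the complementary strand, so a fuller check would also apply `longest_purine_run` to the reverse complement of each motif.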