Wei Chen, Peng-Mian Feng, Hao Lin, Kuo-Chen Chou.
Abstract
Meiotic recombination is an important biological process. As a main driving force of evolution, recombination provides new natural combinations of genetic variations. Rather than occurring randomly across a genome, meiotic recombination takes place with higher frequency in some genomic regions (the so-called 'hotspots') and with lower frequency in others (the so-called 'coldspots'). Information about hotspots and coldspots would therefore provide useful insights for in-depth study of the mechanism of recombination and of genome evolution. So far, recombination regions have mainly been determined by experiments, which are both expensive and time-consuming. With the avalanche of genome sequences generated in the postgenomic age, it is highly desirable to develop automated methods for rapidly and effectively identifying recombination regions. In this study, a predictor called 'iRSpot-PseDNC' was developed for identifying recombination hotspots and coldspots. In the new predictor, DNA sequence samples are formulated as a novel feature vector, the so-called 'pseudo dinucleotide composition' (PseDNC), which incorporates six local DNA structural properties: three angular parameters (twist, tilt and roll) and three translational parameters (shift, slide and rise). The rigorous jackknife test showed that the overall success rate achieved by iRSpot-PseDNC was >82% in identifying recombination spots in Saccharomyces cerevisiae, indicating that the new predictor is promising or at least may become a complementary tool to the existing methods in this area. Although the benchmark data set used to train and test the current method was from S. cerevisiae, the basic approach can also be extended to other genomes. In particular, it has not escaped our notice that the PseDNC approach can also be used to study many other DNA-related problems.
As a user-friendly web-server, iRSpot-PseDNC is freely accessible at http://lin.uestc.edu.cn/server/iRSpot-PseDNC.
Year: 2013 PMID: 23303794 PMCID: PMC3616736 DOI: 10.1093/nar/gks1450
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. A schematic drawing of the meiotic recombination pathways in a DNA system. Recombination is initiated by a double-strand break (DSB) catalysed by the Spo11 protein (green ball), a relative of archaeal topoisomerase VI. After DSBs are formed, Spo11 is removed from the DNA molecule (blue helix) and single-stranded 3′ ends are formed. These tails undergo strand invasion of intact homologous duplexes (red helix), ultimately yielding mature recombinant products. Repair of a meiotic DSB can result in either reciprocal exchange of the chromosome arms flanking the break (a crossover), as shown in the lower left panel, or no exchange of flanking arms (a non-crossover or parental configuration), as shown in the lower right panel. Adapted from (2).
Figure 2. A schematic illustration of the correlations of dinucleotides along a DNA sequence. (a) The first-tier correlation reflects the sequence-order mode between all the most contiguous dinucleotides. (b) The second-tier correlation reflects the sequence-order mode between all the second-most contiguous dinucleotides. (c) The third-tier correlation reflects the sequence-order mode between all the third-most contiguous dinucleotides.
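The tiered correlations in Figure 2 can be turned into a feature vector along the following lines. This is a minimal sketch, not the authors' implementation: it assumes that PseDNC concatenates the 16 normalized dinucleotide frequencies with `lam` tier-correlation factors scaled by a weight `w`, and that each correlation is the mean squared difference of the six normalized structural properties of the two dinucleotides involved. The function name `pse_dnc`, the defaults `lam=3` (one factor per tier drawn in Figure 2) and `w=0.1`, and the column order of `PROPS` (twist, tilt, roll, shift, slide, rise) are assumptions made for illustration. The property values are copied from the table of normalized values in this record.

```python
from itertools import product

# Normalized structural properties per dinucleotide, copied from the table of
# normalized values. Column order is ASSUMED: twist, tilt, roll, shift, slide, rise.
PROPS = {
    "AA": (0.06, 0.50, 0.27, 1.59, 0.11, -0.11),
    "AC": (1.50, 0.50, 0.80, 0.13, 1.29, 1.04),
    "AG": (0.78, 0.36, 0.09, 0.68, -0.24, -0.62),
    "AT": (1.07, 0.22, 0.62, -1.02, 2.51, 1.17),
    "CA": (-1.38, -1.36, -0.27, -0.86, -0.62, -1.25),
    "CC": (0.06, 1.08, 0.09, 0.56, -0.82, 0.24),
    "CG": (-1.66, -1.22, -0.44, -0.82, -0.29, -1.39),
    "CT": (0.78, 0.36, 0.09, 0.68, -0.24, -0.62),
    "GA": (-0.08, 0.50, 0.27, 0.13, -0.39, 0.71),
    "GC": (-0.08, 0.22, 1.33, -0.35, 0.65, 1.59),
    "GG": (0.06, 1.08, 0.09, 0.56, -0.82, 0.24),
    "GT": (1.50, 0.50, 0.80, 0.13, 1.29, 1.04),
    "TA": (-1.23, -2.37, -0.44, -2.24, -1.51, -1.39),
    "TC": (-0.08, 0.50, 0.27, 0.13, -0.39, 0.71),
    "TG": (-1.38, -1.36, -0.27, -0.86, -0.62, -1.25),
    "TT": (0.06, 0.50, 0.27, 1.59, 0.11, -0.11),
}
DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]  # AA, AC, ..., TT


def pse_dnc(seq, lam=3, w=0.1):
    """Return a (16 + lam)-dimensional vector (hypothetical helper)."""
    seq = seq.upper()
    # Overlapping dinucleotides along the sequence; only A, C, G, T are supported.
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if len(pairs) <= lam:
        raise ValueError("sequence too short for the requested number of tiers")

    # Normalized occurrence frequency of each of the 16 dinucleotides.
    freqs = [pairs.count(d) / len(pairs) for d in DINUCS]

    # Tier-j correlation: mean over all dinucleotide pairs separated by j
    # positions of the mean squared property difference.
    thetas = []
    for j in range(1, lam + 1):
        terms = [
            sum((a - b) ** 2 for a, b in zip(PROPS[pairs[i]], PROPS[pairs[i + j]])) / 6
            for i in range(len(pairs) - j)
        ]
        thetas.append(sum(terms) / len(terms))

    # Joint normalization so that the components sum to 1.
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


vec = pse_dnc("ACGTACGTAGCTAGCT")
print(len(vec))  # 16 composition terms plus 3 tier-correlation terms
```

Under this assumed definition, `w` controls how much the sequence-order terms contribute relative to plain dinucleotide composition, and setting `w=0` reduces the leading 16 components to the ordinary composition.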
Table 1. The original numerical values for the six DNA dinucleotide physical structures
| Dinucleotide | Twist | Tilt | Roll | Shift | Slide | Rise |
|---|---|---|---|---|---|---|
| AA | 0.026 | 0.038 | 0.020 | 1.69 | 2.26 | 7.65 |
| AC | 0.036 | 0.038 | 0.023 | 1.32 | 3.03 | 8.93 |
| AG | 0.031 | 0.037 | 0.019 | 1.46 | 2.03 | 7.08 |
| AT | 0.033 | 0.036 | 0.022 | 1.03 | 3.83 | 9.07 |
| CA | 0.016 | 0.025 | 0.017 | 1.07 | 1.78 | 6.38 |
| CC | 0.026 | 0.042 | 0.019 | 1.43 | 1.65 | 8.04 |
| CG | 0.014 | 0.026 | 0.016 | 1.08 | 2.00 | 6.23 |
| CT | 0.031 | 0.037 | 0.019 | 1.46 | 2.03 | 7.08 |
| GA | 0.025 | 0.038 | 0.020 | 1.32 | 1.93 | 8.56 |
| GC | 0.025 | 0.036 | 0.026 | 1.20 | 2.61 | 9.53 |
| GG | 0.026 | 0.042 | 0.019 | 1.43 | 1.65 | 8.04 |
| GT | 0.036 | 0.038 | 0.023 | 1.32 | 3.03 | 8.93 |
| TA | 0.017 | 0.018 | 0.016 | 0.72 | 1.20 | 6.23 |
| TC | 0.025 | 0.038 | 0.020 | 1.32 | 1.93 | 8.56 |
| TG | 0.016 | 0.025 | 0.017 | 1.07 | 1.78 | 6.38 |
| TT | 0.026 | 0.038 | 0.020 | 1.69 | 2.26 | 7.65 |
(a) The six physical structures of dinucleotides (32) are listed in the order twist, tilt, roll, shift, slide and rise.
Table 2. The normalized values for the six DNA dinucleotide physical structures
| Dinucleotide | Twist | Tilt | Roll | Shift | Slide | Rise |
|---|---|---|---|---|---|---|
| AA | 0.06 | 0.50 | 0.27 | 1.59 | 0.11 | −0.11 |
| AC | 1.50 | 0.50 | 0.80 | 0.13 | 1.29 | 1.04 |
| AG | 0.78 | 0.36 | 0.09 | 0.68 | −0.24 | −0.62 |
| AT | 1.07 | 0.22 | 0.62 | −1.02 | 2.51 | 1.17 |
| CA | −1.38 | −1.36 | −0.27 | −0.86 | −0.62 | −1.25 |
| CC | 0.06 | 1.08 | 0.09 | 0.56 | −0.82 | 0.24 |
| CG | −1.66 | −1.22 | −0.44 | −0.82 | −0.29 | −1.39 |
| CT | 0.78 | 0.36 | 0.09 | 0.68 | −0.24 | −0.62 |
| GA | −0.08 | 0.50 | 0.27 | 0.13 | −0.39 | 0.71 |
| GC | −0.08 | 0.22 | 1.33 | −0.35 | 0.65 | 1.59 |
| GG | 0.06 | 1.08 | 0.09 | 0.56 | −0.82 | 0.24 |
| GT | 1.50 | 0.50 | 0.80 | 0.13 | 1.29 | 1.04 |
| TA | −1.23 | −2.37 | −0.44 | −2.24 | −1.51 | −1.39 |
| TC | −0.08 | 0.50 | 0.27 | 0.13 | −0.39 | 0.71 |
| TG | −1.38 | −1.36 | −0.27 | −0.86 | −0.62 | −1.25 |
| TT | 0.06 | 0.50 | 0.27 | 1.59 | 0.11 | −0.11 |
(a) See footnote a of Table 1 for further explanation.
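The relationship between the original and normalized values can be checked with a standard z-score conversion. The sketch below is an illustration under assumptions, not the authors' code: it takes the first property column of the original table (assumed here to be twist), subtracts the mean over the 16 dinucleotides and divides by the sample standard deviation (n − 1 in the denominator). The names `FIRST_COLUMN` and `standardize` are hypothetical. With this choice, the results rounded to two decimals agree with the first column of the normalized table.

```python
from statistics import mean, stdev

# First property column of the original table (assumed to be twist).
FIRST_COLUMN = {
    "AA": 0.026, "AC": 0.036, "AG": 0.031, "AT": 0.033,
    "CA": 0.016, "CC": 0.026, "CG": 0.014, "CT": 0.031,
    "GA": 0.025, "GC": 0.025, "GG": 0.026, "GT": 0.036,
    "TA": 0.017, "TC": 0.025, "TG": 0.016, "TT": 0.026,
}


def standardize(values):
    """Z-score each entry: (value - mean) / sample standard deviation."""
    mu = mean(values.values())
    sd = stdev(values.values())  # sample standard deviation (n - 1)
    return {key: (val - mu) / sd for key, val in values.items()}


z = standardize(FIRST_COLUMN)
for dinuc in ("AA", "AC", "CG", "TA"):
    print(dinuc, round(z[dinuc], 2))
```

Standardizing each property separately puts the angular and translational parameters on a comparable scale, so no single property dominates a squared-difference comparison between dinucleotides.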
Table 3. A comparison of iRSpot-PseDNC with the existing method
| Predictor | Test method | Sn (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|---|
| iRSpot-PseDNC | Jackknife | 73.06 | 89.49 | 82.04 | 0.638 |
| iRSpot-PseDNC | 5-fold cross | 81.63 | 88.14 | 85.19 | 0.692 |
| IDQD | 5-fold cross | 79.40 | 81.00 | 80.30 | 0.603 |
(a) The parameters used: … and … for Equation 9; … and … for the LIBSVM operation engine (47).
(b) From Liu et al. (6).
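The four measures reported in the comparison table are commonly computed from a binary confusion matrix, with hotspots treated as the positive class. The sketch below uses the usual textbook definitions of sensitivity (Sn), specificity (Sp), accuracy (Acc) and the Matthews correlation coefficient (MCC). The function name `binary_metrics` is hypothetical, and the counts in the usage line are made-up illustrative numbers, not results from the study.

```python
from math import sqrt


def binary_metrics(tp, fn, tn, fp):
    """Return Sn, Sp, Acc and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)                   # fraction of true hotspots recovered
    sp = tn / (tn + fp)                   # fraction of true coldspots recovered
    acc = (tp + tn) / (tp + fn + tn + fp)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column of the matrix is empty;
    # returning 0.0 in that case is a common convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, acc, mcc


# Hypothetical counts, for illustration only.
sn, sp, acc, mcc = binary_metrics(tp=80, fn=20, tn=90, fp=10)
print(f"Sn={sn:.2%} Sp={sp:.2%} Acc={acc:.2%} MCC={mcc:.3f}")
```

Reporting MCC alongside accuracy is useful when the two classes differ in size, because MCC stays near zero for a predictor that simply favours the larger class.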
Figure 3. A semi-screenshot of the top page of the iRSpot-PseDNC web-server, available at http://lin.uestc.edu.cn/server/iRSpot-PseDNC.