Bingquan Liu1, Yumeng Liu2, Xiaopeng Jin3, Xiaolong Wang2,4, Bin Liu2,4.
Abstract
Meiotic recombination is unevenly distributed across the genome. Genomic regions that exhibit relatively high frequencies of recombination are called hotspots, whereas those with relatively low frequencies are called coldspots. Identifying hotspots and coldspots therefore provides useful information for studying the mechanism of recombination. In this study, we propose a computational predictor called iRSpot-DACC to predict hot/cold spots across the yeast genome. It combines Support Vector Machines (SVMs) with a feature called dinucleotide-based auto-cross covariance (DACC), which incorporates global sequence-order information and fifteen local DNA properties into the predictor. Combining the predictor with Principal Component Analysis (PCA) further improved its performance. Experimental results on a benchmark dataset showed that iRSpot-DACC achieves an accuracy of 82.7%, outperforming some highly related methods.
Year: 2016 PMID: 27641752 PMCID: PMC5027590 DOI: 10.1038/srep33483
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The distribution of Acc values achieved by iRSpot-DACC with different lag values on the benchmark dataset, evaluated by five-fold cross-validation.
Results of different predictors on the benchmark dataset.
| Predictor | Test method | Sn (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|---|
| IDQD^a | 5-fold | 79.40 | 81.00 | 80.30 | 0.603 |
| iRSpot-PseDNC^b | Jackknife | 73.06 | 89.49 | 82.04 | 0.638 |
| iRSpot-DACC^c | Jackknife | 75.71 | 88.16 | 82.52 | 0.647 |
| iRSpot-DACC-PCA^d | Jackknife | 76.33 | 87.99 | 82.70 | 0.651 |
^a From Liu et al. (ref. 1).
^b From Chen et al. (ref. 16).
^c Parameters used: lag = 6 for Eq. (4) and Eq. (7); $C = 2^{3}$ and $\gamma = 2^{-3}$ for LIBSVM (ref. 47).
^d Parameters used: lag = 6 for Eq. (4) and Eq. (7); $C = 2^{3}$ and $\gamma = 2^{-3}$ for LIBSVM (ref. 47); w = 0.99 for PCA.
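The four columns in the table above (Sn, Sp, Acc, MCC) can be recomputed from a confusion matrix. The sketch below assumes the usual textbook definitions of sensitivity, specificity, accuracy, and the Matthews correlation coefficient, since this record does not reproduce the paper's own equations for them. The function name `binary_metrics` and the counts in the usage example are hypothetical and are not taken from the benchmark dataset.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Return Sn, Sp, Acc and MCC from confusion-matrix counts.

    Assumes the standard definitions, with hotspots as the positive
    class and coldspots as the negative class.
    """
    def ratio(num, den):
        # Avoid ZeroDivisionError when a class is empty.
        return num / den if den else 0.0

    sn = ratio(tp, tp + fn)                      # sensitivity (recall on hotspots)
    sp = ratio(tn, tn + fp)                      # specificity (recall on coldspots)
    acc = ratio(tp + tn, tp + tn + fp + fn)      # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, denom)        # Matthews correlation coefficient
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical counts, for illustration only.
example = binary_metrics(tp=80, tn=90, fp=10, fn=20)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

Sn and Sp can move in opposite directions between predictors, as they do in the table, which is why Acc and MCC are reported alongside them as single-number summaries.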
Figure 2. An illustration of the discriminant visualization.
The x- and y-axes show how the different features are distributed, and the adjacent color bar maps colors to the sum-score values.
The top ten most important features in iRSpot-DACC for identifying hot/cold spots.
| Features | μ1 | μ2 | lag | Discriminative power |
|---|---|---|---|---|
| DAC | F-tilt | F-tilt | 3 | 78.56 |
| DCC | F-tilt | tilt | 3 | 77.98 |
| DCC | F-tilt | entropy | 1 | 70.56 |
| DCC | F-roll | F-slide | 3 | 67.56 |
| DCC | F-roll | twist | 1 | 66.33 |
| DCC | F-tilt | F-roll | 5 | 63.03 |
| DCC | F-roll | energy | 5 | 60.57 |
| DCC | F-roll | F-rise | 5 | 59.84 |
| DCC | F-tilt | tilt | 1 | 58.81 |
| DCC | F-roll | rise | 2 | 54.74 |
μ1 and μ2 are the indices of the dinucleotide local properties, lag is the distance between two dinucleotides, and the discriminative power value measures how strongly the corresponding feature separates hot spots from cold spots. The larger the value, the stronger the discriminative power. This value is calculated according to Eq. (1).
Figure 3. The process of generating the DACC feature vector.
(a) Generation of the DAC feature vector, which captures the correlation of the same property index between two dinucleotides. (b) Generation of the DCC feature vector, which captures the correlation of different property indices between two dinucleotides.
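The two panels described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the commonly used auto-/cross-covariance form, in which the mean-centered property value of the dinucleotide at position i is multiplied by the mean-centered value at position i + lag and the products are averaged over all valid positions. The paper's Eq. (4) and Eq. (7) are not reproduced in this record, so the exact normalization is an assumption. The names `RAW`, `PROPS`, `cross_covariance`, and `dacc_vector` are hypothetical; the two property rows are copied from the dinucleotide property table in this section.

```python
from itertools import permutations

# Two of the fifteen properties, copied from the property table in this section.
# Each column label covers a dinucleotide and its reverse complement.
RAW = {
    "F-tilt": {"AA/TT": 0.08, "AC/GT": 0.07, "AG/CT": 0.06, "AT": 0.10,
               "CA/TG": 0.06, "CC/GG": 0.06, "CG": 0.06, "GA/TC": 0.07,
               "GC": 0.07, "TA": 0.07},
    "tilt": {"AA/TT": -1.26, "AC/GT": 0.33, "AG/CT": -1.66, "AT": 0.00,
             "CA/TG": 0.14, "CC/GG": -0.77, "CG": 0.00, "GA/TC": 1.44,
             "GC": 0.00, "TA": 0.00},
}

# Expand labels such as "AA/TT" into one entry per dinucleotide (16 in total).
PROPS = {
    name: {dn: value for label, value in row.items() for dn in label.split("/")}
    for name, row in RAW.items()
}


def cross_covariance(seq, prop_a, prop_b, lag):
    """Covariance between prop_a at position i and prop_b at position i + lag.

    With prop_a == prop_b this is the auto covariance (DAC, panel a);
    with different properties it is the cross covariance (DCC, panel b).
    Sequences containing characters other than A, C, G, T raise KeyError.
    """
    dinucs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    a = [PROPS[prop_a][d] for d in dinucs]
    b = [PROPS[prop_b][d] for d in dinucs]
    n = len(dinucs) - lag  # number of valid dinucleotide pairs
    if lag < 1 or n <= 0:
        raise ValueError("lag must be >= 1 and smaller than the number of dinucleotides")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((a[i] - mean_a) * (b[i + lag] - mean_b) for i in range(n)) / n


def dacc_vector(seq, max_lag):
    """Concatenate DAC and DCC features for lags 1..max_lag."""
    names = list(PROPS)
    lags = range(1, max_lag + 1)
    dac = [cross_covariance(seq, p, p, lag) for p in names for lag in lags]
    dcc = [cross_covariance(seq, p, q, lag)
           for p, q in permutations(names, 2) for lag in lags]
    return dac + dcc


features = dacc_vector("ACGTTGCAATGC", max_lag=3)
print(len(features))  # 2*3 DAC values + 2*1*3 DCC values
```

Under this formulation, N properties and a maximum lag of LAG give N × LAG DAC features and N × (N − 1) × LAG DCC features. With the fifteen properties and lag = 6 reported in this section, that would be 90 + 1260 = 1350 features before any PCA step, assuming ordered property pairs are used for DCC.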
The values of the fifteen DNA dinucleotide properties.
| Property | AA/TT | AC/GT | AG/CT | AT | CA/TG | CC/GG | CG | GA/TC | GC | TA |
|---|---|---|---|---|---|---|---|---|---|---|
| F-roll | 0.04 | 0.06 | 0.04 | 0.05 | 0.04 | 0.04 | 0.04 | 0.05 | 0.05 | 0.03 |
| F-tilt | 0.08 | 0.07 | 0.06 | 0.10 | 0.06 | 0.06 | 0.06 | 0.07 | 0.07 | 0.07 |
| F-twist | 0.07 | 0.06 | 0.05 | 0.07 | 0.05 | 0.06 | 0.05 | 0.06 | 0.06 | 0.05 |
| F-slide | 6.69 | 6.80 | 3.47 | 9.61 | 2.00 | 2.99 | 2.71 | 4.27 | 4.21 | 1.85 |
| F-shift | 6.24 | 2.91 | 2.80 | 4.66 | 2.88 | 2.67 | 3.02 | 3.58 | 2.66 | 4.11 |
| F-rise | 21.34 | 21.98 | 17.48 | 24.79 | 14.51 | 14.25 | 14.66 | 18.41 | 17.31 | 14.24 |
| roll | 1.05 | 2.01 | 3.60 | 0.61 | 5.60 | 4.68 | 6.02 | 2.44 | 1.70 | 3.50 |
| tilt | −1.26 | 0.33 | −1.66 | 0.00 | 0.14 | −0.77 | 0.00 | 1.44 | 0.00 | 0.00 |
| twist | 35.02 | 31.53 | 32.29 | 30.72 | 35.43 | 33.54 | 33.67 | 35.67 | 34.07 | 36.94 |
| slide | −0.18 | −0.59 | −0.22 | −0.68 | 0.48 | −0.17 | 0.44 | −0.05 | −0.19 | 0.04 |
| shift | 0.01 | −0.02 | −0.02 | 0.00 | 0.01 | 0.03 | 0.00 | −0.01 | 0.00 | 0.00 |
| rise | 3.25 | 3.24 | 3.32 | 3.21 | 3.37 | 3.36 | 3.29 | 3.30 | 3.27 | 3.39 |
| energy | −1.00 | −1.44 | −1.28 | −0.88 | −1.45 | −1.84 | −2.17 | −1.30 | −2.24 | −0.58 |
| enthalpy | −7.60 | −8.40 | −7.80 | −7.20 | −8.50 | −8.00 | −10.60 | −8.20 | −9.80 | −7.20 |
| entropy | −21.30 | −22.40 | −21.00 | −20.40 | −22.70 | −19.90 | −27.20 | −22.20 | −24.40 | −21.30 |