Bingquan Liu, Yumeng Liu, Dong Huang.
Abstract
Recombination is not uniformly distributed across the genome. Genomic regions with relatively high recombination frequencies are called hotspots, while those with relatively low frequencies are called coldspots. Identifying hotspots and coldspots can therefore provide useful information for studying the mechanism of recombination. In this study, a new computational predictor called SVM-EL is proposed to identify hotspots and coldspots across the yeast genome. It combines Support Vector Machines (SVMs) and Ensemble Learning (EL) based on three feature types: basic kmer (Kmer), dinucleotide-based auto-cross covariance (DACC), and pseudo dinucleotide composition (PseDNC). These features incorporate both nucleic acid composition and sequence-order information into the predictor. SVM-EL achieves an accuracy of 82.89% on a widely used benchmark dataset, outperforming some related methods.
Year: 2016 PMID: 27648451 PMCID: PMC5015011 DOI: 10.1155/2016/8527435
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. An example of kmer feature generation using Pse-in-One.
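As a rough illustration of what a kmer feature vector looks like, the sketch below counts overlapping k-mers in a DNA sequence and normalizes the counts to frequencies. It is a minimal stand-in, not Pse-in-One's implementation: the function name `kmer_frequencies`, the lexicographic ACGT ordering of the output, and the skipping of windows that contain non-ACGT characters are all assumptions made for this example.

```python
from itertools import product


def kmer_frequencies(seq, k=2):
    """Return normalized k-mer frequencies in lexicographic ACGT order.

    Hypothetical helper for illustration; not part of Pse-in-One.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    # Slide a window of width k over the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # assumption: skip windows with ambiguous bases (e.g. N)
            counts[word] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


# "ACGTAC" contains the dinucleotides AC, CG, GT, TA, AC.
features = kmer_frequencies("ACGTAC", k=2)
```

The vector has 4^k entries, so with k = 6 (the value reported for SVM-Kmer) each sequence would map to 4096 features.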
The values of fifteen DNA dinucleotide properties.
| Property | AA/TT | AC/GT | AG/CT | AT | CA/TG | CC/GG | CG | GA/TC | GC | TA |
|---|---|---|---|---|---|---|---|---|---|---|
| F-roll | 0.04 | 0.06 | 0.04 | 0.05 | 0.04 | 0.04 | 0.04 | 0.05 | 0.05 | 0.03 |
| F-tilt | 0.08 | 0.07 | 0.06 | 0.10 | 0.06 | 0.06 | 0.06 | 0.07 | 0.07 | 0.07 |
| F-twist | 0.07 | 0.06 | 0.05 | 0.07 | 0.05 | 0.06 | 0.05 | 0.06 | 0.06 | 0.05 |
| F-slide | 6.69 | 6.80 | 3.47 | 9.61 | 2.00 | 2.99 | 2.71 | 4.27 | 4.21 | 1.85 |
| F-shift | 6.24 | 2.91 | 2.80 | 4.66 | 2.88 | 2.67 | 3.02 | 3.58 | 2.66 | 4.11 |
| F-rise | 21.34 | 21.98 | 17.48 | 24.79 | 14.51 | 14.25 | 14.66 | 18.41 | 17.31 | 14.24 |
| Roll | 1.05 | 2.01 | 3.60 | 0.61 | 5.60 | 4.68 | 6.02 | 2.44 | 1.70 | 3.50 |
| Tilt | −1.26 | 0.33 | −1.66 | 0.00 | 0.14 | −0.77 | 0.00 | 1.44 | 0.00 | 0.00 |
| Twist | 35.02 | 31.53 | 32.29 | 30.72 | 35.43 | 33.54 | 33.67 | 35.67 | 34.07 | 36.94 |
| Slide | −0.18 | −0.59 | −0.22 | −0.68 | 0.48 | −0.17 | 0.44 | −0.05 | −0.19 | 0.04 |
| Shift | 0.01 | −0.02 | −0.02 | 0.00 | 0.01 | 0.03 | 0.00 | −0.01 | 0.00 | 0.00 |
| Rise | 3.25 | 3.24 | 3.32 | 3.21 | 3.37 | 3.36 | 3.29 | 3.30 | 3.27 | 3.39 |
| Energy | −1.00 | −1.44 | −1.28 | −0.88 | −1.45 | −1.84 | −2.17 | −1.30 | −2.24 | −0.58 |
| Enthalpy | −7.60 | −8.40 | −7.80 | −7.20 | −8.50 | −8.00 | −10.60 | −8.20 | −9.80 | −7.20 |
| Entropy | −21.30 | −22.40 | −21.00 | −20.40 | −22.70 | −19.90 | −27.20 | −22.20 | −24.40 | −21.30 |
Figure 2. An example of DACC feature generation using Pse-in-One.
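The auto covariance half of DACC can be sketched with a single property from the table above. The example below uses the Twist row and a commonly used definition of dinucleotide auto covariance: the average product of mean-centered property values for dinucleotides that are `lag` positions apart. Several things here are assumptions rather than details taken from the text: the function name, the use of raw rather than standardized property values (tools often standardize each property first), and the omission of the cross covariance terms between different properties.

```python
# Twist values copied from the dinucleotide property table; paired keys such as
# "AA/TT" share one value.
_TWIST_ROW = {
    "AA/TT": 35.02, "AC/GT": 31.53, "AG/CT": 32.29, "AT": 30.72,
    "CA/TG": 35.43, "CC/GG": 33.54, "CG": 33.67, "GA/TC": 35.67,
    "GC": 34.07, "TA": 36.94,
}
TWIST = {d: v for key, v in _TWIST_ROW.items() for d in key.split("/")}


def dinucleotide_auto_covariance(seq, prop, lag):
    """Auto covariance of one dinucleotide property at a given lag.

    Hypothetical helper: a sketch of the usual definition, not Pse-in-One code.
    """
    seq = seq.upper()
    # Property value of every overlapping dinucleotide (L - 1 values).
    values = [prop[seq[i:i + 2]] for i in range(len(seq) - 1)]
    n = len(values) - lag  # number of dinucleotide pairs separated by `lag`
    if n <= 0:
        raise ValueError("sequence too short for this lag")
    mean = sum(values) / len(values)
    return sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n)) / n


ac_lag1 = dinucleotide_auto_covariance("ACGTACGT", TWIST, lag=1)
```

Repeating this for every property and every lag up to the chosen maximum gives the auto covariance part of the feature vector; the cross covariance part pairs two different properties in the same way.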
Figure 3. An example of PseDNC feature generation using Pse-in-One.
Figure 4. The basic framework for an ensemble classifier.
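One simple way to combine several base classifiers is majority voting, sketched below for three predictors that each emit a binary label per sequence. The combination rule actually used by SVM-EL is not stated in this text, so the voting scheme, the function name `majority_vote`, and the example predictions are assumptions for illustration only.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    `predictions` holds one list of labels per classifier, all of equal length.
    Hypothetical helper; the voting rule is an assumption, not the paper's.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Made-up predictions from three base classifiers (1 = hotspot, 0 = coldspot).
kmer_pred = [1, 0, 1, 0]
dacc_pred = [1, 1, 0, 0]
psednc_pred = [0, 1, 1, 0]

ensemble_pred = majority_vote([kmer_pred, dacc_pred, psednc_pred])
```

With an odd number of binary classifiers a tie cannot occur, which keeps the rule unambiguous.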
Results on benchmark dataset for different predictors proposed in the current study.
| Predictor | Test method | Se (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|---|
| SVM-Kmer (a) | Jackknife | 75.92 | 86.29 | 81.59 | 0.628 |
| SVM-DACC (b) | Jackknife | 76.12 | 87.99 | 82.61 | 0.649 |
| SVM-PseDNC (c) | Jackknife | 72.04 | 90.69 | 82.24 | 0.644 |
| SVM-EL | Jackknife | 76.33 | 88.33 | 82.89 | 0.654 |
(a) The parameters used are k = 6 for SVM-Kmer and C = 2^7 and γ = 2 for LIBSVM [18].
(b) The parameters used are lag = 6 for SVM-DACC and C = 2^3 and γ = 2^−3 for LIBSVM [18].
(c) The parameters used are λ = 7 and w = 0.3 for SVM-PseDNC and C = 2^13 and γ = 2^3 for LIBSVM [18].
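The four columns in the table follow the standard definitions of sensitivity (Se), specificity (Sp), accuracy (Acc), and the Matthews correlation coefficient (MCC). The sketch below computes them from a confusion matrix; the function name and the counts in the example are hypothetical and are not taken from the benchmark dataset.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return Se, Sp, Acc, and MCC from confusion-matrix counts."""
    se = tp / (tp + fn)                    # fraction of hotspots recovered
    sp = tn / (tn + fp)                    # fraction of coldspots recovered
    acc = (tp + tn) / (tp + fn + tn + fp)  # overall fraction correct
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Se": se, "Sp": sp, "Acc": acc, "MCC": mcc}


# Illustrative counts only (assumption): 100 positives and 100 negatives.
metrics = binary_metrics(tp=80, fn=20, tn=90, fp=10)
```

Unlike accuracy, MCC stays informative when the two classes differ in size, which is why it is commonly reported alongside Se and Sp.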
Figure 5. Comparison of different predictors for hotspot/coldspot identification. The areas under the ROC curves (AUC) of SVM-EL, SVM-DACC, SVM-Kmer, and SVM-PseDNC are 0.91, 0.90, 0.89, and 0.87, respectively.
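For readers unfamiliar with AUC, it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name, labels, and scores are made up for the example and do not come from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Each (positive, negative) pair scores 1 if ranked correctly, 0.5 on a tie.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical decision scores: three of the four positive/negative pairs are ordered correctly.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
```

The pairwise form is quadratic in the number of samples, which is fine for a sketch; a rank-based computation is the usual choice for larger datasets.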
Results on benchmark dataset for different predictors.
| Predictor | Test method | Se (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|---|
| IDQD (a) | 5-fold | 79.40 | 81.00 | 80.30 | 0.603 |
| iRSpot-PseDNC (b) | Jackknife | 73.06 | 89.49 | 82.04 | 0.638 |
| SVM-EL | Jackknife | 76.33 | 88.33 | 82.89 | 0.654 |
(a) From Liu et al. [10].
(b) From Chen et al. [12].