Xuan Xiao, Han-Xiao Ye, Zi Liu, Jian-Hua Jia, Kuo-Chen Chou.
Abstract
DNA replication, which occurs in all living organisms and is the basis of biological inheritance, is the process of producing two identical replicas from one original DNA molecule. To understand this important biological process in depth and to use it in developing new strategies against genetic diseases, knowledge of the replication origin sites in DNA is indispensable. With the explosive growth of DNA sequences emerging in the postgenomic age, it is highly desirable to develop high-throughput tools that identify these regions from sequence information alone. In this paper, a new predictor called iROS-gPseKNC is proposed, which incorporates dinucleotide position-specific propensity information into the general pseudo nucleotide composition and uses a random forest classifier. Rigorous cross-validation indicates that the proposed predictor is significantly better than the best existing method in sensitivity, specificity, overall accuracy, and stability. Furthermore, a user-friendly web server for iROS-gPseKNC has been established at http://www.jci-bioinfo.cn/iROS-gPseKNC, through which users can easily obtain their desired results without needing to work through the complicated mathematics, which are presented only for the integrity of the methodology itself.
Keywords: general pseudo nucleotide composition; iROS-gPseKNC; origin of replication; position-specific dinucleotide propensity; random forest
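As a rough illustration of the position-specific dinucleotide propensity idea named in the abstract and keywords, the sketch below builds a propensity table as the difference between the dinucleotide frequency at each position in positive (origin) and negative (non-origin) training sequences, then encodes a sequence by looking up the value of its dinucleotide at each position. This is a minimal sketch under stated assumptions, not the paper's exact formulation: the function names, the toy sequences, and the difference-of-frequencies definition are all assumptions made here for illustration.

```python
from collections import Counter
from itertools import product

# The 16 possible dinucleotides over the DNA alphabet.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def positional_frequencies(seqs):
    """Return one dict per position mapping each dinucleotide to its frequency.

    Assumes all sequences share the same length L, giving L - 1 positions.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    freqs = []
    for i in range(length - 1):
        counts = Counter(s[i:i + 2] for s in seqs)
        freqs.append({d: counts[d] / len(seqs) for d in DINUCLEOTIDES})
    return freqs


def propensity_table(positive, negative):
    """Hypothetical propensity: positive minus negative frequency per position."""
    pos = positional_frequencies(positive)
    neg = positional_frequencies(negative)
    if len(pos) != len(neg):
        raise ValueError("positive and negative sequences must have the same length")
    return [{d: p[d] - n[d] for d in DINUCLEOTIDES} for p, n in zip(pos, neg)]


def encode(seq, table):
    """Encode a sequence as the propensity of its dinucleotide at each position."""
    if len(seq) - 1 != len(table):
        raise ValueError("sequence length does not match the propensity table")
    # Dinucleotides containing non-ACGT symbols (e.g. N) fall back to 0.0.
    return [table[i].get(seq[i:i + 2], 0.0) for i in range(len(seq) - 1)]


if __name__ == "__main__":
    # Toy data only; real inputs would be the benchmark sequences.
    table = propensity_table(["AATT", "AATA"], ["GCGC", "AAGC"])
    print(encode("AATT", table))
```

The resulting fixed-length vector is the kind of input that could then be passed to a random forest classifier. In practice the table would have to be built from training sequences only, so that no test sequence contributes to its own features.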
Year: 2016 PMID: 27147572 PMCID: PMC5085147 DOI: 10.18632/oncotarget.9057
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
Figure 1. A schematic drawing to show the DNA replication origin (RO)
Figure 2. A semi-screenshot of the top page of the web server iROS-gPseKNC at http://www.jci-bioinfo.cn/iROS-gPseKNC
A comparison of the proposed predictor with the existing methods via jackknife tests on the same benchmark dataset of Supporting Information S1
| Predictor | Sn (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|
| BC-based | 81.23 | 80.30 | 80.76 | 61.53 |
| iORI-PseKNC | 84.69 | 82.76 | 83.72 | 67.46 |
| iROS-gPseKNC | | | | |
BC-based: the prediction method developed by Chen [4].
iORI-PseKNC: the prediction method developed by Li et al. [12], which was deemed the most powerful of the existing methods for the same purpose.
iROS-gPseKNC: the prediction method proposed in this paper.
Sn, Sp, Acc, MCC: see Eq. 7 for the definitions of the metrics.
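Eq. 7 is not reproduced in this record, so the sketch below assumes the standard confusion-matrix definitions of sensitivity (Sn), specificity (Sp), accuracy (Acc), and the Matthews correlation coefficient (MCC). The function name and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Compute Sn, Sp, Acc, and MCC from confusion-matrix counts.

    Uses the standard textbook definitions; these are assumed, not copied
    from the paper's Eq. 7.
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    sn = tp / (tp + fn) if (tp + fn) else 0.0   # true positive rate
    sp = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    acc = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


if __name__ == "__main__":
    # Made-up counts for illustration only.
    for name, value in binary_metrics(tp=40, tn=45, fp=5, fn=10).items():
        print(f"{name}: {value:.4f}")
```

MCC defined this way lies in [-1, 1], so the MCC column in the table above is presumably scaled by 100 to match the percentage columns.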
Figure 3. Graph showing the statistical distribution of the dinucleotide occurrence frequency for (A) AA and (B) TT along the 300 bp region. See the text for further explanation.
Figure 4. Graph showing the ROC curves [32, 33]
The red curve is for the iORI-PseKNC predictor [12], while the blue curve is for the proposed predictor iROS-gPseKNC. The area under the blue curve is remarkably larger than that under the red curve. See the text for further explanation.
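The comparison of areas under the two ROC curves can be made concrete with a short sketch. The code below sweeps a decision threshold over predicted scores to obtain ROC points and integrates them with the trapezoidal rule. The function names and the toy labels and scores are assumptions for illustration; they are not the paper's data or code.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the threshold step by step.

    labels are 1 (positive) or 0 (negative); higher scores mean more positive.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    # Toy labels and scores for illustration only.
    labels = [1, 1, 0, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    print(round(auc(roc_points(labels, scores)), 4))
```

An area of 1.0 corresponds to perfect separation of the two classes and 0.5 to random guessing, which is why a larger area is read as better overall performance.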